Weicong Chen

Trust the Typical

2026

Debargha Ganguly, Sreehari Sankar, Biyao Zhang, Vikash Singh, Kanan Gupta, Harshini Kavuru, Alan Luo, Weicong Chen, Warren Morningstar, Raghu Machiraju, Vipin Chaudhary

14th International Conference on Learning Representations (ICLR), April 23-27, 2026, Rio de Janeiro, Brazil

Current approaches to LLM safety rely on a brittle pattern of identifying and blocking known threats via guardrails. This paper introduces Trust The Typical (T3), a framework that reframes safety as an out-of-distribution detection problem, learning the distribution of acceptable prompts in a semantic space and flagging significant deviations as potential threats. Unlike prior methods, T3 requires no training on harmful examples yet achieves state-of-the-art performance across 18 benchmarks spanning toxicity, jailbreaking, multilingual harms, and over-refusal—reducing false positive rates by up to 40× relative to specialized safety models. A single model trained on safe English text transfers effectively to over 14 languages without retraining.

Trustworthy AI, Artificial Intelligence

BibTeX Citation

@inproceedings{ganguly2026trust,
title={Trust The Typical},
author={Debargha Ganguly and Sreehari Sankar and Biyao Zhang and Vikash Singh and Kanan Gupta and Harshini Kavuru and Alan Luo and Weicong Chen and Warren Richard Morningstar and Raghu Machiraju and Vipin Chaudhary},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=vfbeleLBWv}
}

LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision

2025

Debargha Ganguly, Sumit Kumar, Ishwar Balappanawar, Weicong Chen, Shashank Kambhatla, Srinivasan Iyengar, Shivkumar Kalyanaraman, Ponnurangam Kumaraguru, Vipin Chaudhary

2025 IEEE International Conference on Big Data, December 8-11, 2025, Macau, China

Curating high-quality, domain-specific datasets is a major bottleneck for deploying robust vision systems. This paper introduces Labeling Copilot, the first data curation deep research agent for computer vision, powered by a large multimodal language model that uses multi-step reasoning to execute specialized tools across three core capabilities: Calibrated Discovery for sourcing in-distribution data from large repositories, Controllable Synthesis for generating rare-scenario data with robust filtering, and Consensus Annotation for producing accurate labels via a novel multi-model consensus mechanism. On the dense COCO dataset, the Consensus Annotation module achieves an annotation mAP of 37.1%, and on Open Images it discovers 903 new bounding box categories.

Artificial Intelligence, Computer Vision

BibTeX Citation

@misc{ganguly2025labelingcopilotdeepresearch,
      title={LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision}, 
      author={Debargha Ganguly and Sumit Kumar and Ishwar Balappanawar and Weicong Chen and Shashank Kambhatla and Srinivasan Iyengar and Shivkumar Kalyanaraman and Ponnurangam Kumaraguru and Vipin Chaudhary},
      year={2025},
      eprint={2509.22631},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.22631}, 
}

K4: Online Log Anomaly Detection via Unsupervised Typicality Learning

2025

Weicong Chen, Vikash Singh, Zahra Rahmani, Debargha Ganguly, Mohsen Hariri, Vipin Chaudhary

IEEE/ACM International Conference on High Performance Computing (SC25), December 17-20, 2025, Hyderabad, India

Existing log anomaly detection methods are often slow, dependent on error-prone parsing, and use unrealistic evaluation protocols. This paper introduces K4 (Knowing the Unknown by Knowing only the Known), a fully unsupervised, parser-independent framework that transforms arbitrary log embeddings into compact four-dimensional descriptors—Precision, Recall, Density, Coverage—using efficient k-nearest neighbor statistics. Under a realistic online chunk-based evaluation protocol, K4 achieves state-of-the-art AUROC of 0.995–0.999 across HDFS, BGL, and Thunderbird datasets, with training under 4 seconds and inference as low as 4 μs.
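
As an illustration of the four-dimensional descriptor mentioned above, the following is a minimal sketch of Precision, Recall, Density, and Coverage computed from k-nearest-neighbor radii between a reference set of embeddings and a new chunk. It follows the commonly used k-NN formulation of these four quantities; the abstract does not specify how K4 computes or scores them, so the function names (`pairwise_dist`, `kth_nn_radius`, `prdc`), the choice of `k`, and the synthetic Gaussian "embeddings" are all assumptions made for this example rather than the paper's implementation.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distances between rows of a and rows of b, shape (len(a), len(b))."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def kth_nn_radius(x, k):
    """Distance from each row of x to its k-th nearest neighbor within x."""
    d = pairwise_dist(x, x)
    # After sorting, column 0 is the self-distance (0), so column k is the k-th neighbor.
    return np.sort(d, axis=1)[:, k]


def prdc(reference, chunk, k=5):
    """Hypothetical helper: returns [precision, recall, density, coverage]."""
    r_ref = kth_nn_radius(reference, k)   # ball radius around each reference point
    r_chk = kth_nn_radius(chunk, k)       # ball radius around each chunk point
    d = pairwise_dist(reference, chunk)   # shape (N_ref, N_chunk)

    inside_ref = d < r_ref[:, None]       # chunk point j lies in the ball of reference point i
    inside_chk = d < r_chk[None, :]       # reference point i lies in the ball of chunk point j

    precision = inside_ref.any(axis=0).mean()      # chunk points covered by some reference ball
    recall = inside_chk.any(axis=1).mean()         # reference points covered by some chunk ball
    density = inside_ref.sum(axis=0).mean() / k    # average number of reference balls per chunk point, scaled by k
    coverage = (d.min(axis=1) < r_ref).mean()      # reference balls containing at least one chunk point
    return np.array([precision, recall, density, coverage])


# Synthetic stand-ins for log embeddings (assumption: 4-dimensional Gaussian vectors).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 4))            # "known normal" embeddings
typical_chunk = rng.normal(size=(100, 4))        # drawn from the same distribution
shifted_chunk = rng.normal(size=(100, 4)) + 10   # far from the reference distribution

typical = prdc(reference, typical_chunk)
shifted = prdc(reference, shifted_chunk)
print("typical chunk descriptor:", typical)
print("shifted chunk descriptor:", shifted)
```

In this toy setup a chunk drawn from the reference distribution yields a descriptor with high values, while a chunk far from it yields values near zero; any downstream anomaly score or threshold on the descriptor would be a further design choice not covered here.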

Trustworthy AI, HPC, Artificial Intelligence

BibTeX Citation

@misc{chen2025k4onlineloganomaly,
      title={$K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning}, 
      author={Weicong Chen and Vikash Singh and Zahra Rahmani and Debargha Ganguly and Mohsen Hariri and Vipin Chaudhary},
      year={2025},
      eprint={2507.20051},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.20051}, 
}

Mentors

Vipin Chaudhary, PhD