Case Western Reserve University
Biyao Zhang

PhD Student

Trust the Typical

2026

14th International Conference on Learning Representations (ICLR 2026), April 23-27, 2026, Rio de Janeiro, Brazil

Current approaches to LLM safety rely on a brittle pattern of identifying and blocking known threats via guardrails. This paper introduces Trust the Typical (T3), a framework that reframes safety as an out-of-distribution detection problem: it learns the distribution of acceptable prompts in a semantic space and flags significant deviations as potential threats. Unlike prior methods, T3 requires no training on harmful examples yet achieves state-of-the-art performance across 18 benchmarks spanning toxicity, jailbreaking, multilingual harms, and over-refusal, reducing false positive rates by up to 40× relative to specialized safety models. A single model trained on safe English text transfers effectively to over 14 languages without retraining.

Trustworthy AI, Artificial Intelligence
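The core idea of flagging atypical prompts can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's actual method: a random matrix stands in for real sentence embeddings, the "typical" distribution is modeled as a single Gaussian, and deviation is scored by Mahalanobis distance with a threshold calibrated on safe data alone.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for embeddings of known-safe prompts (hypothetical: a real
# system would use a sentence-embedding model, not random vectors).
safe_embeddings = rng.normal(0.0, 1.0, size=(500, 8))

# Fit the "typical" distribution: mean and covariance of safe embeddings.
mu = safe_embeddings.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(safe_embeddings, rowvar=False))

def mahalanobis(x):
    """Distance of an embedding from the center of the safe distribution."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Calibrate the threshold on safe data only (e.g., its 99th percentile),
# so no harmful examples are ever needed at training time.
scores = np.array([mahalanobis(e) for e in safe_embeddings])
threshold = np.quantile(scores, 0.99)

def is_flagged(embedding):
    """True if the prompt embedding deviates significantly from typical."""
    return mahalanobis(embedding) > threshold

atypical = np.full(8, 25.0)        # far outside the safe cluster
print(is_flagged(mu), is_flagged(atypical))
```

The key property this sketch shares with the abstract's framing is that only acceptable data is ever modeled; anything sufficiently far from it is rejected by construction.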

Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

2025

39th Conference on Neural Information Processing Systems (NeurIPS 2025), December 2025

Large language models show remarkable promise for automated reasoning by generating formal specifications, but a fundamental tension exists between their probabilistic nature and the deterministic guarantees required by formal verification. This paper comprehensively investigates failure modes and uncertainty quantification in LLM-generated formal artifacts, revealing that SMT-based autoformalization has highly domain-specific accuracy impacts ranging from +34.8% on logical tasks to −44.5% on factual ones. A probabilistic context-free grammar (PCFG) framework is introduced to model LLM outputs and yield a refined uncertainty taxonomy, finding that uncertainty signals are task-dependent—for example, grammar entropy for logic achieves AUROC > 0.93.

Artificial Intelligence, Trustworthy AI
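The grammar-entropy signal mentioned above can be sketched in a few lines. This is a simplified illustration, not the paper's PCFG framework: production labels are hypothetical, and the entropy is computed over an empirical distribution of productions observed across sampled outputs, with higher entropy suggesting less trustworthy formal artifacts.

```python
import math
from collections import Counter

def production_entropy(productions):
    """Shannon entropy (bits) of the empirical distribution over grammar
    productions observed across sampled LLM outputs."""
    counts = Counter(productions)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical productions extracted by parsing several sampled outputs:
agreeing  = ["S->assert"] * 4                                # samples agree
diverging = ["S->assert", "S->declare", "S->check", "S->push"]  # samples differ
print(production_entropy(agreeing), production_entropy(diverging))
```

When every sample parses to the same structure the entropy is 0; when the samples diverge across four equally likely productions it reaches 2 bits, a simple proxy for the kind of task-dependent uncertainty signal the paper evaluates.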

Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM

2025

IEEE/ACM International Conference on High Performance Computing (SC25), December 17-20, 2025, Hyderabad, India

Training large language models is one of the most compute-intensive tasks in HPC, and predicting end-to-end training time for multi-billion parameter models across hundreds of GPUs is challenging due to complex interactions between transformer components, parallelism strategies, and multi-tier communication. The paper addresses this by decomposing LLMs into core computational primitives, building lightweight hardware-aware prediction models for key operations, and integrating them into an end-to-end prediction system that accounts for complex parallelization strategies. The resulting framework enables accurate distributed LLM training performance prediction without costly full-scale sampling.

HPC, Artificial Intelligence
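The operator-level decomposition idea can be sketched with a toy analytical model. All numbers and function names here are illustrative assumptions, not the paper's fitted models: each dominant GEMM in a transformer layer is costed as FLOPs divided by an empirically calibrated effective throughput, and tensor parallelism simply shards the weight dimensions.

```python
def matmul_time_ms(m, n, k, peak_tflops=312.0, efficiency=0.45):
    """Predict one GEMM's runtime from its FLOP count and a hardware
    efficiency factor (peak and efficiency values are illustrative)."""
    flops = 2 * m * n * k
    return flops / (peak_tflops * 1e12 * efficiency) * 1e3

def transformer_layer_time_ms(batch, seq, hidden, tp=1):
    """Sum the dominant GEMMs of one transformer layer; tensor
    parallelism (tp) shards the weight dimension of each GEMM."""
    tokens = batch * seq
    attn_qkv = matmul_time_ms(tokens, 3 * hidden // tp, hidden)
    attn_out = matmul_time_ms(tokens, hidden, hidden // tp)
    mlp_up   = matmul_time_ms(tokens, 4 * hidden // tp, hidden)
    mlp_down = matmul_time_ms(tokens, hidden, 4 * hidden // tp)
    return attn_qkv + attn_out + mlp_up + mlp_down

# Example: per-layer compute estimate for a GPT-style configuration.
per_layer = transformer_layer_time_ms(batch=4, seq=2048, hidden=4096, tp=8)
print(f"predicted compute time per layer: {per_layer:.2f} ms")
```

Even this crude sketch captures the structural point of the abstract: once per-operator costs are predictable, layer and step times compose from them, so full-scale runs are not needed to estimate training time.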