Research Areas
Current Projects
Intelligent Cyberinfrastructure (ICICLE)
Co-leading the NSF-funded ICICLE institute to advance 'AI-as-a-Service' through a plug-and-play cyberinfrastructure that spans the edge-cloud-HPC continuum to democratize AI.
Trustworthy AI & Speech Recognition
Evaluating the reliability and human-likeness of AI-generated voice clones and ASR scoring methods to ensure robustness in clinical and hearing research applications.
Medical Imaging with Generative AI
Developing unsupervised methods for fully automated segmentation of knee lesions using conditional diffusion models and anomaly detection to eliminate annotator bias in osteoarthritis prognosis.
Materials Data Science
Applying computer vision and deep learning to automate phase identification in synchrotron X-ray diffraction patterns for advanced materials characterization.
Trust the Typical
2026: Current approaches to LLM safety rely on a brittle pattern of identifying and blocking known threats via guardrails. This paper introduces Trust The Typical (T3), a framework that reframes safety as an out-of-distribution detection problem, learning the distribution of acceptable prompts in a semantic space and flagging significant deviations as potential threats. Unlike prior methods, T3 requires no training on harmful examples yet achieves state-of-the-art performance across 18 benchmarks spanning toxicity, jailbreaking, multilingual harms, and over-refusal—reducing false positive rates by up to 40× relative to specialized safety models. A single model trained on safe English text transfers effectively to over 14 languages without retraining.
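The core idea can be sketched with a stand-in semantic space; the Mahalanobis scorer, the random "safe" embeddings, and the 99th-percentile threshold below are illustrative assumptions, not the paper's actual detector:

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in embeddings of "safe" prompts (the paper uses a learned semantic space)
safe = rng.normal(0.0, 1.0, size=(500, 8))

mu = safe.mean(axis=0)
cov = np.cov(safe, rowvar=False) + 1e-6 * np.eye(8)
cov_inv = np.linalg.inv(cov)

def typicality_score(x):
    """Squared Mahalanobis distance to the safe distribution;
    large values mean the prompt is atypical."""
    d = x - mu
    return float(d @ cov_inv @ d)

# Flag prompts scoring beyond, e.g., the 99th percentile of safe-data scores
threshold = np.percentile([typicality_score(s) for s in safe], 99)

far_out = typicality_score(np.full(8, 10.0))  # a clearly atypical point
```

Note that nothing here is trained on harmful examples: the detector only models what "typical" looks like, which is the reframing the paper argues for.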
Real-Time Online Learning Trajectory Prediction via Efficient Latent Predictor
2026: Presents an efficient latent predictor for real-time online trajectory prediction in autonomous vehicles, achieving high accuracy with reduced computational overhead.
Using AI to Increase Efficiency of Multilingual Test Materials: Spanish BEL Sentences
2026: This work-in-progress explores how AI can improve the efficiency of creating multilingual auditory test materials, with a focus on Spanish BEL sentences. The project investigates workflow acceleration and quality support for multilingual assessment design. It sits at the intersection of language technology, hearing research, and educational test development. The aim is to reduce manual burden while preserving the validity of test materials.
Scorio.jl: A Julia package for ranking stochastic responses
2026: Scorio.jl is a Julia package for evaluating and ranking systems from repeated stochastic responses on shared tasks. It provides a common tensor-based interface for direct score-based, pairwise, psychometric, voting, graph, and listwise ranking methods. The package supports methodological studies of ranking stability as well as day-to-day leaderboard construction. It makes ranking under repeated stochastic observation easier to analyze across different assumptions and ranking families.
Ranking Reasoning LLMs under Test-Time Scaling
2026: This paper studies how to rank reasoning large language models when evaluation uses multiple stochastic samples per prompt under test-time scaling. It formalizes dense benchmark ranking in this repeated-trial setting and introduces Scorio, a library that implements Bayesian, paired-comparison, psychometric, voting, and spectral ranking methods. Across twenty reasoning models and four Olympiad-style math benchmarks, the study shows that many full-trial rankings closely match a Bayesian gold standard while low-budget methods can be less stable. The results provide practical guidance for reliable model ranking under both high- and low-budget evaluation settings.
QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting
2026: Presents QuMod, a parallel quantum job scheduling framework for modular QPUs leveraging circuit cutting to improve throughput on heterogeneous quantum hardware.
Quantize What Counts: More for Keys, Less for Values
2026: This work studies asymmetric KV-cache quantization for large language models and shows that key tensors carry more information than value tensors. The analysis motivates allocating more bits and stronger outlier handling to keys than to values, instead of quantizing both sides identically. Experiments show that key-favored bit allocation preserves much more accuracy at the same memory budget. The paper provides both theoretical motivation and practical guidance for more efficient LLM inference.
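As a toy illustration of the bit-width/error trade-off being allocated here, assuming a plain uniform symmetric quantizer and random stand-in tensors (not the paper's scheme or real KV caches):

```python
import numpy as np

def fake_quantize(x, bits):
    """Uniformly quantize to `bits` bits and dequantize, so the
    round-trip error shows how much precision each budget loses."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.round(x / scale) * scale

rng = np.random.default_rng(0)
cache = rng.normal(size=(64, 128))  # stand-in for a key or value tensor

# Round-trip MSE at three bit budgets: error shrinks as bits grow
mse = {b: float(np.mean((cache - fake_quantize(cache, b)) ** 2)) for b in (2, 4, 6)}
```

Under a fixed total budget, the paper's finding suggests spending more of this precision on keys than on values rather than splitting it evenly.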
Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDAQ: Performance and Expressiveness Advantages
2026: Presents an efficient transpilation approach for converting OpenQASM 3.0 dynamic circuits to CUDAQ, demonstrating performance and expressiveness advantages.
Medical Image Spatial Grounding with Semantic Sampling
2026: This work studies spatial grounding for vision-language models in 3D medical imaging, where anatomy, modality, slice direction, and coordinate systems create unique challenges. It introduces MIS-Ground, a benchmark for analyzing failure modes in medical image spatial grounding, and MIS-SemSam, an inference-time semantic sampling method that improves grounding accuracy without retraining. The paper evaluates how visual and textual prompting choices influence grounding performance across clinical imaging settings. It advances reproducible evaluation and practical improvement of medical VLM grounding.
LRD-Net: A Lightweight Real-Centered Detection Network for Cross-Domain Face Forgery Detection
2026: Introduces LRD-Net, a lightweight real-centered detection network that generalizes face forgery and deepfake detection across domains.
Less Prune, MoRE Experts: Recognizing and Restructuring Latent Experts for Model Compression
2026: Proposes recognizing and restructuring latent expert structures within large models for compression, achieving efficiency while preserving accuracy.
K^4-Serve: Robust Streaming Log Anomaly Detection for HPC & AI Infrastructure
2026: K^4-Serve operationalizes the K^4 framework for streaming anomaly detection on production HPC and AI infrastructure logs. It combines Kafka-based ingestion, versioned normalization, sliding-window scoring, retraining, and observability features to support robust real-world deployment. The system achieves stable deployment on real HPC logs with near-perfect event-level detection and only one false alert in the reported study. The work bridges anomaly-detection methodology and production cyberinfrastructure practice.
HugRAG: Hierarchical Causal Knowledge Graph Design for RAG
2026: HugRAG rethinks knowledge organization for graph-based RAG through causal gating across hierarchical modules. It explicitly models causal relationships to suppress spurious correlations while enabling scalable reasoning over large-scale knowledge graphs. Extensive experiments demonstrate that HugRAG consistently outperforms competitive graph-based RAG baselines across multiple datasets and evaluation metrics, establishing a principled foundation for structured, scalable, and causally grounded RAG systems.
Geom@k: Fast to Converge, Slow to Drift
2026: This paper studies evaluation metrics for test-time scaling by separating answer discovery from repeated correctness. It derives Geom@k and the broader GeoSpectrum@K family from a common hypergeometric view of fixed-budget metrics. Across aggregate settings, Geom@2 provides a strong balance of fast convergence and low ranking drift relative to alternative summaries. The work offers a compute-aware perspective on stable evaluation under repeated sampling.
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
2026: Pass@k is widely used to report LLM reasoning performance but often yields unstable and misleading rankings, especially when trial counts are limited and compute is constrained. This paper proposes a principled Bayesian evaluation framework that replaces Pass@k with posterior estimates of a model's underlying success probability and credible intervals, using a Dirichlet prior to give closed-form expressions for posterior mean and uncertainty under any weighted rubric. Empirically, on AIME'24/'25, HMMT'25, and BrUMO'25, the Bayesian approach achieves faster convergence and greater rank stability than Pass@k, enabling reliable model comparisons at far smaller sample counts. The framework also naturally extends to graded, rubric-based evaluations, making uncertainty explicit.
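For the binary special case, the closed-form posterior is a Beta distribution (the two-category Dirichlet); a minimal sketch, with the uniform prior as an assumption:

```python
def beta_posterior(successes, trials, alpha=1.0, beta=1.0):
    """Posterior mean and variance of a model's underlying success
    probability under a Beta(alpha, beta) prior (uniform by default)."""
    a = alpha + successes
    b = beta + trials - successes
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# 7 correct answers out of 10 sampled attempts on a benchmark
mean, var = beta_posterior(7, 10)
```

The posterior variance makes the uncertainty at small trial counts explicit, which is exactly what a raw Pass@k number hides.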
Sweeping Promptable Spoofs under the DirtyRAG
2026: This paper studies security vulnerabilities in retrieval-augmented generation through DirtyRAG, a query-blind benign-passage attack that can be steered by prompting. It shows that promptable spoof passages remain effective against strong defenses and exposes a practical attack surface for real-world RAG systems. The work also introduces RAG-ATag, a benchmark for evaluating RAG security under these attack conditions. It highlights the need for more robust retrieval and generation defenses in deployed LLM systems.
Categorical Evaluation of LLMs under Test-Time Scaling
2026: This work argues that binary pass-based metrics are too coarse for evaluating reasoning models under test-time scaling. It introduces a categorical Bayesian framework that scores rubric-defined outcomes with uncertainty rather than collapsing all outputs into pass-or-fail labels. The study shows that lightweight runtime signals can support accurate categorical evaluation without relying on a judge model and that rubric design can materially change model rankings. The paper extends uncertainty-aware LLM evaluation beyond binary correctness.
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
2025: Recent advances in post-training enhance model reasoning but require costly training pipelines and produce inefficient, overly lengthy outputs. This paper introduces Speculative Thinking, a training-free framework enabling large reasoning models to guide smaller ones during inference at the reasoning level—distinct from token-level speculative decoding—by identifying structural cues such as paragraph breaks followed by reflective phrases where small models struggle and delegating those steps to a larger model. The method significantly boosts smaller model reasoning accuracy while shortening output length, offering an efficient inference-time paradigm that preserves the small model's compute efficiency.
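A toy sketch of the delegation trigger; the cue phrases and the plain string matching are illustrative assumptions, not the paper's actual detection rule:

```python
# Illustrative reflective phrases; the paper identifies such cues empirically
REFLECTIVE_CUES = ("Wait,", "Hmm,", "Let me double-check")

def needs_delegation(draft):
    """Heuristic: hand the next reasoning step to the large model when
    a paragraph break is followed by a reflective phrase."""
    for para in draft.split("\n\n")[1:]:
        if para.lstrip().startswith(REFLECTIVE_CUES):
            return True
    return False

small_model_draft = "Compute 12*13.\n\nWait, I should re-derive that product."
```

Because the check runs on already-generated text, the small model keeps doing the bulk of the decoding and the large model is consulted only at these structural cue points.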
Novel Adaptation of Video Segmentation to 3D MRI: Efficient Zero-Shot Knee Segmentation with SAM2
2025: Medical image segmentation methods face the challenge of domain transfer, where performance degrades due to distribution shifts between source and target domains. This paper adapts SAM2, a general-purpose video segmentation model, for zero-shot single-prompt 3D knee MRI segmentation by treating volumetric slices as individual video frames and leveraging SAM2's memory mechanism to generate motion- and spatially-aware predictions across the volume. Experiments on the OAI-ZIB dataset demonstrate a Dice similarity coefficient of 0.9643 on the tibia using only a single prompt and no task-specific training or fine-tuning.
QuFlex: Parallel Quantum Job Scheduling Using Adaptive Circuit-Cutting
2025: Parallel quantum job scheduling across multiple QPUs is critical for maximizing throughput in heterogeneous quantum computing environments. QuFlex introduces an adaptive circuit-cutting approach that dynamically partitions quantum circuits based on available QPU resources, enabling efficient parallel scheduling across heterogeneous quantum hardware. The framework demonstrates improved QPU utilization and reduced job completion times compared to static partitioning approaches.
MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations
2025: Multi-hop knowledge editing in LLMs has been evaluated using benchmarks with unreliable protocols that conflate editing success with benchmark artifacts, producing misleading results. This paper presents MQuAKE-Remastered, which corrects systematic flaws in prior multi-hop knowledge editing assessments and demonstrates that reliable evaluation methodology is largely absent—and essential—for advancing the field. Accepted as a Spotlight at ICLR 2025, the work shows that many reported gains in multi-hop editing do not hold under rigorous evaluation, calling for a reset of evaluation standards.
Masked-speech recognition using human and synthetic cloned speech
2025: This study evaluates the intelligibility and human-likeness of AI-generated voice clones compared to human speech. Using transformer-based language models, the research demonstrates that synthetic speech can achieve similar recognition scores and perceptual similarity to original human talkers, even in noisy environments. The findings suggest that voice synthesis and automatic speech recognition (ASR) are promising tools for evaluating speech recognition in both clinical audiology and hearing research.
LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem
2025: Fine-tuning LLMs with LoRA has created a convenient share-and-play ecosystem where users download community-shared adapters to enhance base models, but this also introduces a new attack surface for distributing malicious LoRAs. This paper demonstrates that a backdoor LoRA can be trained once and then seamlessly merged in a training-free fashion with multiple task-enhancing LoRAs, retaining both malicious behavior and legitimate downstream capabilities. Such merged LoRAs are particularly dangerous because malicious intent is concealed behind improved downstream performance, creating strong incentive for voluntary adoption, and no safety measures exist to intervene during local deployment.
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
2025: Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. This paper hypothesizes that current reasoning limitations stem partly from insufficient long-context capacity, motivated by observations that higher context window lengths correlate with stronger reasoning performance and that failed reasoning cases resemble failed long-context cases. Controlled experiments comparing architecturally identical models with varying long-context capacities confirm that enhancing long-context ability before supervised fine-tuning leads to improved reasoning, advocating for long-context capacity as a first-class design objective.
Labeling Copilot: A Deep Research Agent for Automated Data Curation in Computer Vision
2025: Curating high-quality, domain-specific datasets is a major bottleneck for deploying robust vision systems. This paper introduces Labeling Copilot, the first data curation deep research agent for computer vision, powered by a large multimodal language model that uses multi-step reasoning to execute specialized tools across three core capabilities: Calibrated Discovery for sourcing in-distribution data from large repositories, Controllable Synthesis for generating rare-scenario data with robust filtering, and Consensus Annotation for producing accurate labels via a novel multi-model consensus mechanism. On the dense COCO dataset, the Consensus Annotation module achieves an annotation mAP of 37.1%, and on Open Images it discovers 903 new bounding box categories.
K4: Online Log Anomaly Detection via Unsupervised Typicality Learning
2025: Existing log anomaly detection methods are often slow, dependent on error-prone parsing, and use unrealistic evaluation protocols. This paper introduces K4 (Knowing the Unknown by Knowing only the Known), a fully unsupervised, parser-independent framework that transforms arbitrary log embeddings into compact four-dimensional descriptors—Precision, Recall, Density, Coverage—using efficient k-nearest neighbor statistics. Under a realistic online chunk-based evaluation protocol, K4 achieves state-of-the-art AUROC of 0.995–0.999 across HDFS, BGL, and Thunderbird datasets, with training under 4 seconds and inference as low as 4 μs.
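A much-simplified kNN typicality statistic in the same spirit (distance to the k-th nearest known-normal embedding); the random embeddings are stand-ins, and the paper's four descriptors are richer than this single number:

```python
import numpy as np

rng = np.random.default_rng(1)
train = rng.normal(size=(200, 4))  # stand-in embeddings of known-normal log lines

def knn_radius(x, data, k=5):
    """Distance to the k-th nearest neighbor in the normal data:
    small for typical points, large for anomalies."""
    d = np.sqrt(((data - x) ** 2).sum(axis=1))
    return float(np.sort(d)[k - 1])

normal_score = knn_radius(train[0], train, k=5)   # self-distance 0 is included
anomaly_score = knn_radius(np.full(4, 8.0), train, k=5)
```

Because the statistic only compares against known-normal data, it needs no anomaly labels and no log parser, matching the fully unsupervised, parser-independent setting above.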
HOPPS: Hardware-Aware Optimal Phase Polynomial Synthesis with Blockwise Optimization for Quantum Circuits
2025: Blocks composed of CNOT and Rz gates are ubiquitous in modern quantum applications such as QAOA ansatzes and quantum adders, but after compilation they often exhibit large CNOT counts or depths that lower fidelity. This paper introduces HOPPS, a SAT-based hardware-aware optimal phase polynomial synthesis algorithm that generates CNOT/Rz blocks with CNOT count or depth optimality under hardware topology constraints. To address scalability for large circuits, an iterative blockwise optimization strategy partitions large circuits into smaller blocks and optimally refines each—achieving CNOT count reductions up to 50% and depth reductions up to 57.1% when used as a peephole optimizer.
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
2025: Large language models show remarkable promise for automated reasoning by generating formal specifications, but a fundamental tension exists between their probabilistic nature and the deterministic guarantees required by formal verification. This paper comprehensively investigates failure modes and uncertainty quantification in LLM-generated formal artifacts, revealing that SMT-based autoformalization has highly domain-specific accuracy impacts ranging from +34.8% on logical tasks to −44.5% on factual ones. A probabilistic context-free grammar (PCFG) framework is introduced to model LLM outputs and yield a refined uncertainty taxonomy, finding that uncertainty signals are task-dependent—for example, grammar entropy for logic achieves AUROC > 0.93.
Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM
2025: Training large language models is one of the most compute-intensive tasks in HPC, and predicting end-to-end training time for multi-billion parameter models across hundreds of GPUs is challenging due to complex interactions between transformer components, parallelism strategies, and multi-tier communication. This paper addresses this by decomposing LLMs into core computational primitives and modeling them with operator-level decomposition, lightweight hardware-aware prediction models for key operations, and an end-to-end prediction system integrating these across complex parallelization strategies. The resulting framework enables accurate distributed LLM training performance prediction without costly full-scale sampling.
Forte: Finding Outliers with Representation Typicality Estimation
2025: Generative models can now produce photorealistic synthetic data virtually indistinguishable from real training data, challenging OOD detectors that rely on generative model likelihoods due to likelihood misestimation and typicality issues. This paper introduces Forte, which hypothesizes that estimating typical sets using self-supervised learners leads to better OOD detection, using representation learning and informative summary statistics based on manifold estimation to address these issues. Forte outperforms other unsupervised approaches and achieves state-of-the-art performance on established challenging benchmarks as well as new synthetic data detection tasks, requiring no class labels.
Flexible Group Count Enables Hassle-Free Structured Pruning
2025: Densely structured pruning methods maintain pruned models in a fully dense format, allowing immediate compression benefits, but existing grouped kernel pruning approaches introduce dynamic operations that add complications or impose limitations such as requiring expensive clustering schemes or custom architecture support. This paper argues that making Conv2d group count flexible under an integral optimization is the best practice for grouped kernel pruning, leveraging its ideal alignment with grouped convolution infrastructure. The resulting one-shot, post-train, data-agnostic method is more performant, adaptive, and user-friendly than its predecessors, requiring little to no hyperparameter tuning or handcrafted criteria.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
2025: Large-scale AI models have grown rapidly in size, creating significant challenges for deployment on resource-constrained hardware. This paper introduces Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM size by 30% while preserving outputs that are bit-for-bit identical to the original model, exploiting the low entropy in BFloat16 weight representations through entropy coding and dynamic-length encodings. A custom GPU kernel enables fast online decompression, and experiments on Llama 3.3, Qwen 3, and Mistral 3 validate 30% size reduction with 2.3–46.2× higher throughput than CPU offloading—notably enabling lossless inference of Llama 3.1 405B on a single 8×80GB GPU node.
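The entropy-coding intuition can be illustrated with a toy skewed symbol stream standing in for BFloat16 exponent fields; the counts below are made up for illustration, not measured from real weights:

```python
import math
from collections import Counter

def shannon_entropy_bits(symbols):
    """Average bits per symbol achievable by an ideal entropy code."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# Trained weights cluster around a few exponent values, so the
# 8-bit exponent field carries far less than 8 bits of information.
exponents = [126] * 700 + [125] * 200 + [127] * 90 + [120] * 10

bits = shannon_entropy_bits(exponents)
```

When the achievable bits-per-symbol is far below the 8 bits actually stored, replacing fixed-width fields with dynamic-length codes yields lossless compression, which is the mechanism the framework exploits.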
CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation
2025: Traditional RAG systems face critical limitations including disrupted contextual integrity from text chunking and over-reliance on semantic similarity for retrieval. This paper proposes CausalRAG, a novel framework that incorporates causal graphs into the retrieval process, constructing and tracing cause-effect relationships to preserve contextual continuity and improve retrieval precision. Evaluated against regular RAG and graph-based RAG approaches across multiple metrics including answer faithfulness and context precision, CausalRAG demonstrates consistent superiority, showing that causal grounding is a promising direction for knowledge-intensive tasks.
100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?
2025: Existing long-context evaluation benchmarks fail to separate long-context performance from a model's baseline ability, making cross-model comparisons unclear, and are typically constructed with fixed input lengths that limit applicability across models with different context windows. This paper introduces 100-LongBench, a length-controllable long-context benchmark with a novel metric that disentangles baseline knowledge from true long-context capability across multiple task categories. Experiments demonstrate that existing benchmarks significantly conflate baseline model strength with genuine long-context ability, revealing a widespread evaluation gap.
Visual Concept Networks: A Graph-Based Approach to Detecting Anomalous Data in Deep Neural Networks
2024: Deep neural networks struggle with robustness against anomalous and out-of-distribution data, and current OOD benchmarks often oversimplify by focusing on single-object tasks. This paper introduces Visual Concept Networks, a graph-based method that converts images into networks of interconnected human-understandable visual concepts and uses topological features to detect both far-OOD and near-OOD data. Extensive testing on two novel complex real-world tasks with ablation studies using large vocabularies demonstrates the method's effectiveness for detecting anomalous data in DNNs.
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
2024: Releasing LLM weights poses a dilemma: open-sourcing compromises ownership while closed APIs raise data privacy concerns. This paper introduces TaylorMLP, which protects LLM ownership by transforming weights into Taylor-series parameters that can be released instead of the original weights, and prevents unauthorized use by inducing low-speed token generation through increasing the number of Taylor-series terms. Empirical experiments across five datasets and three LLM architectures demonstrate that TaylorMLP induces over 4× latency increase while producing tokens precisely matching the original models, effectively defending against weight reconstruction from downstream datasets.
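The accuracy-versus-work knob can be illustrated with a plain Taylor series of exp; this shows only the generic mechanism (more terms, more work, better fidelity), not TaylorMLP's actual parameterization:

```python
import math

def taylor_exp(x, n_terms):
    """Approximate exp(x) with the first n_terms of its Taylor series.
    More terms cost more work per evaluation but improve accuracy,
    mirroring the latency/fidelity trade-off described above."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)
    return total

coarse = taylor_exp(1.0, 4)   # 1 + 1 + 1/2 + 1/6
fine = taylor_exp(1.0, 20)    # very close to e
```

Releasing series parameters instead of raw weights means a user who wants exact behavior must evaluate many terms, which is where the deliberate slowdown comes from.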
Phase Identification in Synchrotron X-ray Diffraction Patterns of Ti-6Al-4V Using Computer Vision and Deep Learning
2024This research utilizes convolutional neural networks (CNNs) to automate the phase identification of titanium alloys from synchrotron X-ray diffraction (XRD) patterns. By treating XRD patterns as one-dimensional images, the deep learning model achieves high accuracy in distinguishing between alpha and beta phases, significantly reducing the time required for manual analysis in materials characterization.
QGroup: Parallel Quantum Job Scheduling Using Dynamic Programming
2024: Scheduling quantum circuits across multiple QPUs requires efficient algorithms that minimize idle time while respecting hardware constraints. QGroup uses dynamic programming to optimally group and schedule quantum circuits across multiple QPUs, maximizing throughput and minimizing idle time through principled combinatorial optimization. Evaluated on realistic quantum workloads, QGroup achieves improved scheduling efficiency compared to greedy and heuristic-based baseline approaches.
Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision
2024: The genomic domain stands to benefit greatly from advances in AI and data science, but increasing privacy and cybersecurity concerns necessitate robust solutions for sensitive collaborative research. This paper presents a practical deployment of a privacy-preserving framework for genomic research developed in collaboration with Lynx.MD, a secure health data collaboration platform, addressing challenges of enabling joint analysis of genomic data while mitigating data breach risks. The framework demonstrates scalable, privacy-preserving data sharing and analysis that maintains utility while satisfying rigorous security requirements in a real production environment.
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
2024: Long-context capability is critical for LLMs, but transformer architectures face significant challenges due to growing KV cache size and the complexity of attending to extended inputs. This paper provides a comprehensive taxonomy and benchmark evaluation of 10+ state-of-the-art approaches across seven long-context task categories—including KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures—evaluated in a unified, aligned environment. The work reveals numerous previously unknown phenomena and offers a practical workbench and insights for the future development of long-context-capable LLMs.
Knowledge Graphs Can be Learned with Just Intersection Features
2024: Knowledge graph completion can be framed as link prediction where structural information is key, but quantifying this structural information poses a challenge. This paper demonstrates that the intersection among k-hop neighborhoods of the head, relation, and tail is the critical structural signal for valid triple prediction, and proposes a novel randomized algorithm to efficiently generate these intersection features. A straightforward fully-connected network leveraging these features outperforms established KG embedding models and graph neural network baselines, while also achieving substantial training time efficiency gains.
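A minimal sketch of the structural signal on a toy undirected graph; the BFS helper and the graph itself are illustrative, and the paper's randomized feature generation is more involved than this exact computation:

```python
def k_hop_neighborhood(adj, start, k):
    """Set of nodes reachable from `start` in at most k hops (BFS)."""
    frontier, seen = {start}, {start}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj.get(u, ())} - seen
        seen |= frontier
    return seen

# Tiny toy graph as undirected adjacency lists
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
       "D": ["B", "C", "E"], "E": ["D"]}

head_nb = k_hop_neighborhood(adj, "A", 2)
tail_nb = k_hop_neighborhood(adj, "E", 2)
feature = len(head_nb & tail_nb)  # intersection size as a structural feature
```

A large head/tail neighborhood overlap suggests the two entities sit in a shared local structure, which is the signal a simple fully-connected network can then score.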
Unsupervised Segmentation of Knee Bone Marrow Edema-like Lesions Using Conditional Generative Models
2024: This study proposes a novel unsupervised method for the fully automated segmentation of Bone Marrow Edema-like Lesions (BMEL) in knee MRI. By leveraging conditional diffusion models and anomaly detection, the approach eliminates the need for labor-intensive and bias-prone manual annotations. The research sets new benchmarks for BMEL segmentation performance and provides a more reliable, quantitative tool for early diagnosis and prognosis of knee osteoarthritis.
GNNs Also Deserve Editing, and They Need It More Than Once
2024: Model editing—updating specific factual knowledge—has been extensively studied for LLMs but has received little attention for graph neural networks, which present unique challenges due to their relational structure. This paper extends model editing to GNNs, showing that they require iterative multi-round editing to maintain accuracy after knowledge updates, unlike LLMs where single-pass editing is often sufficient. The work proposes efficient multi-round GNN editing methods and demonstrates that both graph structure and node attributes must be carefully managed across editing rounds to prevent knowledge degradation.
An Automated Approach for Improving the Inference Latency and Energy Efficiency of Pretrained CNNs by Removing Irrelevant Pixels with Focused Convolutions
2024: Computer vision CNNs achieve high accuracy but face ever-increasing energy and computation requirements, and making them more energy-efficient typically requires costly retraining. This paper proposes an automated method to improve the inference latency and energy efficiency of pretrained CNNs without retraining, by inserting a threshold layer that identifies irrelevant image regions and replacing subsequent convolutional layers with focused convolutions that ignore those regions entirely. The approach saves inference latency by up to 25% and energy costs by up to 22% on popular pretrained CNNs including ResNet, VGG, and ConvNeXt, with little to no accuracy loss.
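A toy version of the idea, assuming a simple intensity threshold as the "relevance" layer and a 3x3 mean filter in place of a learned convolution (the actual method operates inside pretrained CNNs, not on raw images like this):

```python
import numpy as np

def focused_mean_filter(img, thresh):
    """Apply a 3x3 mean filter only where a threshold 'layer' marks
    the pixel as relevant; irrelevant positions are skipped entirely."""
    mask = img > thresh
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if mask[i, j]:  # compute only for relevant pixels
                out[i, j] = img[i - 1:i + 2, j - 1:j + 2].mean()
    return out

img = np.zeros((6, 6))
img[2:4, 2:4] = 9.0  # a small bright "object" on a dark background
out = focused_mean_filter(img, thresh=0.5)
```

The savings come from the skipped positions: on images where most pixels are irrelevant background, most of the convolution work is never done.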
Creating Intelligent Cyberinfrastructure for Democratizing AI
2024: This paper provides an overview of the NSF-funded ICICLE AI Institute, which aims to fundamentally advance 'edge-to-center' AI-as-a-Service. By developing intelligent cyberinfrastructure (CI) that spans the edge-cloud-HPC computing continuum, the project seeks to enable plug-and-play AI that is accessible to a wider population. The work highlights high-impact applications in animal ecology, digital agriculture, and smart foodsheds as primary drivers for democratizing next-generation AI.
Materials Data Science Using CRADLE: A Distributed, Data-Centric Approach
2024: This paper introduces CRADLE, a distributed framework designed to support data-centric AI and materials data science at scale. By integrating heterogeneous data management with elastic scaling, CRADLE addresses the challenges of massive datasets generated by modern experiments and simulations. The study demonstrates the framework's capabilities through five applications, including phase identification in X-ray diffraction and defect segmentation in computed tomography, emphasizing scalable and reproducible scientific insights.
Efficient Circuit Wire Cutting Based on Commuting Groups
2024: Current quantum devices face challenges with large circuits due to increasing error rates as circuit size and qubit count grow. Inspired by ancilla-assisted quantum process tomography and MUBs-based grouping for simultaneous measurement, this paper proposes a new circuit wire cutting approach that uses ancillary qubits to transform quantum input initializations into quantum output measurements, allowing multiple measurements to be grouped and executed simultaneously. The technique significantly reduces subcircuit execution overhead and classical reconstruction complexity compared to standard wire cutting.
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
2023: Fine-tuning large pre-trained language models has become increasingly difficult due to extensive memory usage, with the primary bottleneck being the storage of activation feature maps needed for gradient computation. This paper proposes WTA-CRS (Winner-Take-All Column-Row Sampling), a new family of unbiased estimators for matrix products with reduced variance that only requires storing sub-sampled activations for gradient calculation, applied during the backward pass to maintain unbiased gradient estimation. Applied to LLM fine-tuning, WTA-CRS significantly reduces activation memory requirements while maintaining training convergence, enabling adaptation of large models on hardware that would otherwise lack sufficient memory.
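Plain column-row (CR) sampling, the estimator family that WTA-CRS refines, can be sketched as follows; the winner-take-all variance reduction itself is not reproduced here:

```python
import numpy as np

def cr_sample_product(A, B, idx, probs):
    """Unbiased estimate of A @ B from sampled column-row pairs:
    sum over samples of outer(A[:, i], B[i, :]) / (k * p_i)."""
    k = len(idx)
    est = np.zeros((A.shape[0], B.shape[1]))
    for i in idx:
        est += np.outer(A[:, i], B[i, :]) / (k * probs[i])
    return est

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 6))
B = rng.normal(size=(6, 3))

# Sampling every index exactly once under uniform probabilities
# recovers A @ B exactly, confirming the estimator's normalization.
uniform = np.full(6, 1 / 6)
exact = cr_sample_product(A, B, list(range(6)), uniform)
```

Storing only the sampled columns of the activation matrix is what cuts the memory bill, since the full activations never need to be kept for the backward pass.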
Accelerating VQE Algorithms via Parameters and Measurement Reuse
2023 · Variational Quantum Eigensolver algorithms require many quantum circuit executions to converge, creating significant overhead on current quantum hardware. This paper accelerates VQE by reusing parameters and measurement results across iterations, reducing the number of quantum circuit executions required for convergence without sacrificing solution quality. The approach is validated on standard molecular simulation benchmarks, demonstrating meaningful reduction in quantum resource requirements.
Online Detection of Golden Circuit Cutting Points
2023 · Quantum circuit cutting enables large circuits to run on small quantum devices, but reconstructing measurement statistics requires computational resources that grow exponentially with the number of cuts. This paper introduces the concept of a golden cutting point: a circuit structure that induces negligible basis components during reconstruction, allowing those downstream computations to be avoided entirely. A hypothesis-testing scheme is proposed for online detection of golden cutting points, with robustness guarantees against low-probability test failures; on Qiskit's Aer simulator, identifying and skipping the resulting obsolete measurements reduces reconstruction wall time.
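The online-detection idea can be sketched as a one-sided test: after observing a candidate basis component for some number of shots, declare the cut "golden" only if an upper confidence bound on the component's true probability falls below a negligibility threshold. The rule below (a Hoeffding-style bound, with hypothetical parameter names `eps` and `alpha`) illustrates the flavor of such a test, not the paper's exact scheme.

```python
import math

def is_golden(k, n, eps=0.01, alpha=0.05):
    """Declare a basis component negligible ('golden') if its observed
    frequency k/n plus a one-sided Hoeffding confidence margin stays
    below the threshold eps, at confidence level 1 - alpha."""
    ucb = k / n + math.sqrt(math.log(1 / alpha) / (2 * n))
    return ucb <= eps
```

With enough shots and zero observations the bound tightens below `eps` and the downstream reconstruction terms for that component can be skipped; a frequently observed component is never declared golden.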
One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning
2023 · Structured pruning has traditionally been viewed as trading accuracy for efficiency, often assumed to come at the expense of adversarial robustness. This paper reveals that structured grouped kernel pruning inherently confers adversarial robustness as a byproduct, without any adversarial training, showing that pruning and robustness are not competing objectives but complementary ones. By demonstrating one less reason to avoid filter pruning, the work shows practitioners can gain free adversarial robustness simply by adopting structured grouped kernel pruning as their compression strategy.
Accelerating Time to Science using CRADLE: A Framework for Materials Data Science
2023 · Accelerating materials data science requires scalable frameworks that can manage heterogeneous data and computation across distributed systems. This paper presents CRADLE, a distributed data-centric framework for materials data science workflows that integrates data management, computation, and analysis pipelines to significantly reduce time-to-science. Presented at the 30th IEEE HiPC, CRADLE demonstrates substantial throughput improvements and workflow simplification for materials characterization and discovery tasks in HPC environments.
Quantum Noise in the Flow of Time: A Temporal Study of the Noise in Quantum Computers
2022 · Quantum noise in quantum computers is not static but evolves over time, yet most error characterization treats noise as temporally fixed. This paper conducts a temporal study of noise characteristics in quantum computers, revealing how quantum noise patterns change over time and analyzing the implications for circuit fidelity and error mitigation strategies. The findings provide insights for developing more effective time-aware calibration and error mitigation approaches for near-term quantum hardware.
Pinpointing the System Reliability Degradation in NISQ Machines
2022 · Noise in quantum hardware causes significant reliability degradation in NISQ machines, but the systematic patterns of this degradation are not well understood. This paper investigates the sources and temporal patterns of reliability degradation in NISQ machines, identifying when and where noise causes significant performance drops in quantum circuits. The analysis provides guidance for developing error mitigation strategies targeted at the most impactful reliability degradation patterns in near-term quantum hardware.
MARS: Malleable Actor-Critic Reinforcement Learning Scheduler
2022 · Scheduling jobs in HPC environments requires handling dynamic, time-varying workloads that challenge static scheduling policies. MARS (Malleable Actor-Critic Reinforcement Learning Scheduler) uses malleable actor-critic RL to adaptively schedule computing jobs, dynamically resizing allocations in response to changing workloads to optimize throughput and resource utilization. Evaluated against standard scheduling baselines, MARS demonstrates consistent improvements in job completion time and cluster utilization across varied workload scenarios.
Irrelevant Pixels are Everywhere: Find and Exclude Them for More Efficient Computer Vision
2022 · CNNs are compute-intensive because they indiscriminately compute features on all pixels of an input image, yet many pixels are irrelevant to the vision task at hand. This paper demonstrates through analysis of three popular computer vision datasets that approximately 48% of pixels are irrelevant, and proposes the focused convolution, a drop-in CNN replacement that operates only on relevant pixels identified by an area of interest mask. On an embedded device, the approach achieves no loss in accuracy while reducing inference latency, energy consumption, and multiply-add count by approximately 45%.
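The core idea of a mask-guided convolution can be shown in a few lines: compute outputs only at positions the area-of-interest mask marks as relevant, skipping the rest. This is a naive single-channel NumPy sketch of the concept (function name and layout are illustrative), not the paper's optimized implementation.

```python
import numpy as np

def focused_conv2d(img, kernel, mask):
    """Valid-mode 2D convolution computed only where mask is True.
    Masked-out output positions are left at zero and cost no
    multiply-adds, mirroring the 'focused convolution' idea."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    # Only iterate over output positions inside the area of interest
    ys, xs = np.nonzero(mask[: h - kh + 1, : w - kw + 1])
    for y, x in zip(ys, xs):
        out[y, x] = np.sum(img[y : y + kh, x : x + kw] * kernel)
    return out
```

If roughly half the mask is False, roughly half the multiply-adds disappear, which is the source of the reported latency and energy savings.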
A Unified Framework to Assess Market Implications of Institutional Investments
2022 · Understanding the market implications of large institutional investment decisions requires modeling complex interactions between institutional behavior and market dynamics. This paper presents a unified framework using machine learning and statistical analysis to assess how large-scale institutional investment decisions affect market prices, volatility, and liquidity. The approach integrates multiple data sources to provide a comprehensive assessment of market implications across different investment types and market conditions.
Practical Implications of Dequantization on Machine Learning Algorithms
2022 · Quantum computing algorithms offer theoretical speedups for certain machine learning tasks, but dequantization results show that classical algorithms can sometimes achieve comparable performance. This paper examines the practical implications of dequantization on machine learning algorithms, providing a systematic analysis of when quantum approaches offer genuine advantages versus when classical alternatives are sufficient. The work offers guidance for practitioners on determining which ML tasks are promising candidates for quantum speedup versus those where dequantization renders quantum approaches redundant.
System-Auditing, Data Analysis and Characteristics of Cyber Attacks for Big Data System
2022 · Big data distributed computing systems such as Apache Hadoop must process massive amounts of data to support business and research applications, so ensuring their cyber security is critical. To better defend against advanced cyber attacks that threaten even well-protected enterprises, system-auditing techniques have been adopted for monitoring system activities and assisting attack investigation. This demo presents a system that collects system-auditing logs from a big data system and analyzes them to understand how auditing can more effectively assist attack investigation on big systems. A companion demo application detects unexpected file deletions and presents the root causes of each deletion.
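Root-cause analysis over audit logs boils down to walking causal links (process spawned process, process touched file) backwards from the suspicious event. The toy records and field names below are invented for illustration; real audit frameworks such as Linux auditd emit far richer events, but the backward traversal looks the same in spirit.

```python
# Hypothetical audit records: each execve/unlink event with its process ids
events = [
    {"pid": 101, "syscall": "execve", "target": "/usr/bin/cleanup.sh", "ppid": 1},
    {"pid": 102, "syscall": "execve", "target": "/bin/rm", "ppid": 101},
    {"pid": 102, "syscall": "unlink", "target": "/data/hdfs/block_0042", "ppid": 101},
]

def deletion_root_cause(events, path):
    """Walk parent-process links backwards from an unlink event,
    returning the causal chain of targets that led to the deletion."""
    unlink = next(e for e in events
                  if e["syscall"] == "unlink" and e["target"] == path)
    chain, pid = [unlink], unlink["pid"]
    while True:
        parent = next((e for e in events
                       if e["syscall"] == "execve" and e["pid"] == pid), None)
        if parent is None:
            break
        chain.append(parent)
        pid = parent["ppid"]
    return [e["target"] for e in reversed(chain)]
```

Here the chain surfaces that a cleanup script spawned `rm`, which deleted the data block, exactly the kind of provenance an investigator needs.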
Approximate Quantum Circuit Reconstruction
2022 · Current and imminent quantum hardware lacks reliability due to noise and limited qubit counts, and quantum circuit cutting, which divides large circuits into smaller subcircuits, faces exponential classical post-processing overhead. This paper introduces approximate circuit reconstruction using a sampling-based method (MCMC) to probabilistically select high-probability bit strings during reconstruction, avoiding excessive calculations for the full probability distribution. Results show that this sampling-based post-processing holds great potential for fast and reliable circuit reconstruction in the NISQ era and beyond.
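The sampling idea can be illustrated with a Metropolis chain over bit strings: propose single-bit flips and accept them by the ratio of unnormalized probabilities, so the chain spends its time on high-probability outcomes without ever enumerating all 2^n strings. This is a generic MCMC sketch of that principle (function and parameter names are illustrative), not the paper's reconstruction pipeline.

```python
import random

def mcmc_bitstrings(unnorm_p, n_bits, steps=5000, seed=0):
    """Metropolis sampler over n-bit strings with single-bit-flip
    proposals, drawing from an unnormalized distribution unnorm_p."""
    rng = random.Random(seed)
    state = [0] * n_bits
    p_cur = unnorm_p(state)
    samples = []
    for _ in range(steps):
        cand = state.copy()
        cand[rng.randrange(n_bits)] ^= 1  # flip one bit
        p_new = unnorm_p(cand)
        # Accept with probability min(1, p_new / p_cur)
        if p_cur == 0 or rng.random() < min(1.0, p_new / p_cur):
            state, p_cur = cand, p_new
        samples.append(tuple(state))
    return samples
```

Against a distribution sharply peaked on one outcome, the chain quickly finds and stays near that bit string, which is what makes sampling-based reconstruction cheap relative to full enumeration.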
TQEA: Temporal Quantum Error Analysis
2021 · Quantum errors in NISQ hardware vary temporally, but most error analysis tools treat noise as time-invariant. TQEA (Temporal Quantum Error Analysis) characterizes how quantum errors evolve over time by systematically measuring and modeling the temporal dynamics of noise in quantum computers. The framework provides insights for improving error mitigation strategies that account for drift and time-varying noise characteristics, supporting progress toward more reliable quantum computing.
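A minimal way to see why time-aware analysis matters: track a rolling baseline of measured error rates and flag calibration cycles that deviate sharply from the recent window. The z-score rule below is a toy drift detector under assumed parameters (`window`, `z_thresh`), not TQEA's actual methodology.

```python
import statistics

def detect_drift(error_rates, window=5, z_thresh=3.0):
    """Flag each measurement whose error rate deviates from the mean of
    the trailing window by more than z_thresh standard deviations."""
    flags = []
    for t in range(window, len(error_rates)):
        hist = error_rates[t - window : t]
        mu, sd = statistics.mean(hist), statistics.pstdev(hist)
        flags.append(sd > 0 and abs(error_rates[t] - mu) > z_thresh * sd)
    return flags
```

A mitigation pipeline could use such flags to trigger recalibration instead of relying on a stale, time-invariant noise model.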
A Predictive Analytics Framework for Multi-Horizon Financial Crises Forecasting using Macro-Economic Data
2021 · Predictive analytics framework for multi-horizon financial crisis forecasting using macroeconomic data and ML to provide early warning signals.
With or Without Total Knee Arthroplasty? A Deep Learning-powered Strategy to Detect TKA in Plain Radiographs
2020 · Deep learning approach for automatically detecting TKA implants in plain radiographs, enabling efficient large-scale retrospective analysis of orthopedic registries.
Towards Performant Workflows, Monitoring and Measuring
2020 · Scientific HPC workflows require robust monitoring and measurement infrastructure to understand performance characteristics and enable optimization. This paper presents approaches for building performant scientific workflows with integrated monitoring and measurement, enabling better characterization and optimization of HPC workflow performance across distributed computing environments. The work provides practical methodologies for workflow developers to identify bottlenecks and improve end-to-end throughput.
A Predictive Analytics Framework for Insider Trading Events
2020 · Detecting and forecasting insider trading events using traditional methods is limited by their reliance on predefined rules and inability to capture subtle market signals. This paper presents a predictive analytics framework for insider trading events using machine learning applied to financial transaction data and market signals, demonstrating the ability to identify patterns predictive of insider trading. The approach provides early warning capabilities that can complement traditional regulatory surveillance methods.
Research Staff
PhD Students
Alan Luo
PhD Student
Andrew Yu
PhD Student
Biyao Zhang
PhD Student
Chaoda Song
PhD Student
Cheng Guo
PhD Student
Debargha Ganguly
PhD Student
Jierui Peng
PhD Student
Nahal Shahani
PhD Student
Nengbo Wang
PhD Student
Shouren Wang
PhD Student
Srihari Sankar
PhD Student
Thomas Zhang
PhD Student
Vikash Singh
PhD Student
Vinooth Kulkarni
PhD Student
Wang (Van) Yang
PhD Student
Xinpeng Li
PhD Student
Yanyan Zhang
PhD Student
Yuting Shao
PhD Student
Zahra Rahmani
PhD Student