Case Western Reserve University

Publications & Code


2026

AI-Guided Detection of Persuasion Strategies in Internal Marketing Communications

2026

Seventh AI and Strategy Consortium

This work presents an AI-guided approach to detecting persuasion strategies in internal marketing communications.

Artificial Intelligence

Alternative Decomposed Message Passing for Efficient Geometric GNNs

2026

IEEE IPDPS 2026 Workshops (GrAPL)

This paper proposes an alternative decomposed message-passing framework for improving the efficiency of geometric graph neural networks. Instead of relying on concatenation-based message generation, the method decomposes node-, edge-, and angle-level transformations into reusable components, reducing redundant computation and memory overhead while preserving algebraic equivalence. The framework is designed for efficient GPU execution and serves as a drop-in replacement for representative architectures such as EGNN and CHGNet. Experimental results show up to 2x training speedup and 60% end-to-end memory reduction with no loss in accuracy across diverse geometric learning workloads.

Artificial Intelligence HPC

Categorical Evaluation of LLMs under Test-Time Scaling

2026

COLM 2026 (under review)

This work argues that binary pass-based metrics are too coarse for evaluating reasoning models under test-time scaling. It introduces a categorical Bayesian framework that scores rubric-defined outcomes with uncertainty rather than collapsing all outputs into pass-or-fail labels. The study shows that lightweight runtime signals can support accurate categorical evaluation without relying on a judge model and that rubric design can materially change model rankings. The paper extends uncertainty-aware LLM evaluation beyond binary correctness.

Artificial Intelligence Trustworthy AI

Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation

2026

14th International Conference on Learning Representations (ICLR), April 23-27, 2026, Rio de Janeiro, Brazil

Pass@k is widely used to report LLM reasoning performance but often yields unstable and misleading rankings, especially when trial counts are limited and compute is constrained. This paper proposes a principled Bayesian evaluation framework that replaces Pass@k with posterior estimates of a model's underlying success probability and credible intervals, using a Dirichlet prior to give closed-form expressions for posterior mean and uncertainty under any weighted rubric. Empirically, on AIME'24/'25, HMMT'25, and BrUMO'25, the Bayesian approach achieves faster convergence and greater rank stability than Pass@k, enabling reliable model comparisons at far smaller sample counts. The framework also naturally extends to graded, rubric-based evaluations, making uncertainty explicit.
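As a hedged sketch of the closed-form machinery described above (the binary special case of the Dirichlet prior, i.e. a Beta prior; the `posterior_summary` helper, prior parameters `a`, `b`, and the sample counts are illustrative assumptions, not the paper's code):

```python
from math import sqrt

def posterior_summary(successes, trials, a=1.0, b=1.0):
    """Posterior mean and std of a model's success probability under a
    Beta(a, b) prior, after observing `successes` out of `trials`.
    The posterior is Beta(a + successes, b + trials - successes)."""
    a_post = a + successes
    b_post = b + trials - successes
    mean = a_post / (a_post + b_post)
    var = (a_post * b_post) / ((a_post + b_post) ** 2 * (a_post + b_post + 1))
    return mean, sqrt(var)

# 7 correct answers in 10 trials: report a mean plus uncertainty,
# rather than a single Pass@k point estimate.
mean, std = posterior_summary(successes=7, trials=10)
```

Unlike Pass@k, the posterior standard deviation makes explicit how much the ranking could move under more trials.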

Artificial Intelligence Trustworthy AI

Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDAQ: Performance and Expressiveness Advantages

2026

IEEE International Conference on Quantum Communications, Networking, and Computing (QCNC 2026), April 6-8, 2026, Kobe, Japan

Presents an efficient transpilation approach for converting OpenQASM 3.0 dynamic circuits to CUDAQ, demonstrating performance and expressiveness advantages.

Quantum Computing HPC

Empirical evaluation of variability and multi-institutional generalizability of deep learning survival models: Application to renal cancer CT scans

2026

Computers in Biology and Medicine

This paper systematically studies how methodological choices affect the robustness and external generalization of CT-based deep learning survival models for renal cancer. It examines data partitioning, data order, random initialization, and augmentation strategies on a multi-institutional cohort spanning nine institutions. The study finds that covariate-balanced partitioning and carefully chosen augmentations materially improve external validation performance, while initialization meaningfully affects variance. These results provide practical recommendations for building more stable and generalizable survival models from medical imaging.

Medical Imaging Computer Vision Artificial Intelligence

GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning

2026

arXiv

Effective embodied exploration requires agents to accumulate and retain spatial knowledge over time. However, existing scene representations, such as discrete scene graphs or static view-based snapshots, lack post-hoc re-observability. If an initial observation misses a target, the resulting memory omission is often irrecoverable. To bridge this gap, we propose GSMem, a zero-shot embodied exploration and reasoning framework built upon 3D Gaussian Splatting (3DGS). By explicitly parameterizing continuous geometry and dense appearance, 3DGS serves as a persistent spatial memory that endows the agent with Spatial Recollection: the ability to render photorealistic novel views from optimal, previously unoccupied viewpoints. To operationalize this, GSMem employs a retrieval mechanism that simultaneously leverages parallel object-level scene graphs and semantic-level language fields. This complementary design robustly localizes target regions, enabling the agent to “hallucinate” optimal views for high-fidelity Vision-Language Model (VLM) reasoning. Furthermore, we introduce a hybrid exploration strategy that combines VLM-driven semantic scoring with a 3DGS-based coverage objective, balancing task-aware exploration with geometric coverage. Extensive experiments on embodied question answering and lifelong navigation demonstrate the robustness and effectiveness of our framework.

Computer Vision Embodied AI

Geom@k: Fast to Converge, Slow to Drift

2026

COLM 2026 (under review)

This paper studies evaluation metrics for test-time scaling by separating answer discovery from repeated correctness. It derives Geom@k and the broader GeoSpectrum@K family from a common hypergeometric view of fixed-budget metrics. Across aggregate settings, Geom@2 provides a strong balance of fast convergence and low ranking drift relative to alternative summaries. The work offers a compute-aware perspective on stable evaluation under repeated sampling.

Artificial Intelligence Trustworthy AI

HugRAG: Hierarchical Causal Knowledge Graph Design for RAG

2026

HugRAG rethinks knowledge organization for graph-based RAG through causal gating across hierarchical modules. It explicitly models causal relationships to suppress spurious correlations while enabling scalable reasoning over large-scale knowledge graphs. Extensive experiments demonstrate that HugRAG consistently outperforms competitive graph-based RAG baselines across multiple datasets and evaluation metrics, establishing a principled foundation for structured, scalable, and causally grounded RAG systems.

Artificial Intelligence

K^4-Serve: Robust Streaming Log Anomaly Detection for HPC & AI Infrastructure

2026

ACM PEARC 2026 (under review)

K^4-Serve operationalizes the K^4 framework for streaming anomaly detection on production HPC and AI infrastructure logs. It combines Kafka-based ingestion, versioned normalization, sliding-window scoring, retraining, and observability features to support robust real-world deployment. The system achieves stable deployment on real HPC logs with near-perfect event-level detection and only one false alert in the reported study. The work bridges anomaly-detection methodology and production cyberinfrastructure practice.

HPC Artificial Intelligence

LRD-Net: A Lightweight Real-Centered Detection Network for Cross-Domain Face Forgery Detection

2026

14th International Symposium on Digital Forensics and Security, March 19-20, 2026, Boston, USA

Introduces LRD-Net, a lightweight real-centered detection network for cross-domain deepfake detection, generalizing face forgery detection across domains.

Artificial Intelligence Computer Vision Trustworthy AI

Less Prune, MoRE Experts: Recognizing and Restructuring Latent Experts for Model Compression

2026

Texas NLP Symposium, April 3, 2026, College Station, Texas

Proposes recognizing and restructuring latent expert structures within large models for compression, achieving efficiency while preserving accuracy.

Artificial Intelligence

Medical Image Spatial Grounding with Semantic Sampling

2026

MICCAI 2026 (under review)

This work studies spatial grounding for vision-language models in 3D medical imaging, where anatomy, modality, slice direction, and coordinate systems create unique challenges. It introduces MIS-Ground, a benchmark for analyzing failure modes in medical image spatial grounding, and MIS-SemSam, an inference-time semantic sampling method that improves grounding accuracy without retraining. The paper evaluates how visual and textual prompting choices influence grounding performance across clinical imaging settings. It advances reproducible evaluation and practical improvement of medical VLM grounding.

Medical Imaging Computer Vision Artificial Intelligence

NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?

2026

arXiv preprint arXiv:2510.16263

This paper introduces NEBULA, a unified ecosystem for evaluating Vision-Language-Action (VLA) agents beyond coarse end-task success metrics. It proposes a novel dual-axis evaluation framework that combines fine-grained capability tests for skill-specific diagnosis with systematic stress tests to measure robustness under real-world perturbations. In addition, NEBULA standardizes fragmented embodied AI datasets through a unified data format and API, enabling reproducible cross-dataset training and benchmarking. Experimental results reveal that state-of-the-art VLA models exhibit significant hidden weaknesses in critical capabilities such as spatial reasoning and dynamic adaptation, highlighting the need for more interpretable and reliability-aware evaluation.

Artificial Intelligence

QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting

2026

IEEE International Conference on Quantum Communications, Networking, and Computing (QCNC 2026), April 6-8, 2026, Kobe, Japan

Presents QuMod, a parallel quantum job scheduling framework for modular QPUs leveraging circuit cutting to improve throughput on heterogeneous quantum hardware.

Quantum Computing HPC

Quantize What Counts: More for Keys, Less for Values

2026

ACL 2026 Findings

This work studies asymmetric KV-cache quantization for large language models and shows that key tensors carry more information than value tensors. The analysis motivates allocating more bits and stronger outlier handling to keys than to values, instead of quantizing both sides identically. Experiments show that key-favored bit allocation preserves much more accuracy at the same memory budget. The paper provides both theoretical motivation and practical guidance for more efficient LLM inference.
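A minimal sketch of the key-favored allocation idea (uniform per-tensor quantization on synthetic tensors; the `quantize` helper and the specific bit widths are illustrative assumptions, not the paper's method):

```python
import random

def quantize(xs, bits):
    """Uniform quantization to `bits` levels, returning the
    dequantized reconstruction for error measurement."""
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / (2**bits - 1)
    return [round((x - lo) / scale) * scale + lo for x in xs]

def mean_abs_err(orig, deq):
    return sum(abs(a - b) for a, b in zip(orig, deq)) / len(orig)

random.seed(0)
keys = [random.gauss(0, 1) for _ in range(1024)]
values = [random.gauss(0, 1) for _ in range(1024)]

# Same 5-bit average budget, but keys receive the extra bits:
k_err = mean_abs_err(keys, quantize(keys, 6))      # keys: 6 bits
v_err = mean_abs_err(values, quantize(values, 4))  # values: 4 bits
```

Under this toy split the key reconstruction error is several times smaller than the value error, illustrating why spending the budget asymmetrically can preserve accuracy when keys matter more.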

Artificial Intelligence HPC

Ranking Reasoning LLMs under Test-Time Scaling

2026

ACL 2026 Main

This paper studies how to rank reasoning large language models when evaluation uses multiple stochastic samples per prompt under test-time scaling. It formalizes dense benchmark ranking in this repeated-trial setting and introduces Scorio, a library that implements Bayesian, paired-comparison, psychometric, voting, and spectral ranking methods. Across twenty reasoning models and four Olympiad-style math benchmarks, the study shows that many full-trial rankings closely match a Bayesian gold standard while low-budget methods can be less stable. The results provide practical guidance for reliable model ranking under both high- and low-budget evaluation settings.

Artificial Intelligence Trustworthy AI

Real-Time Online Learning Trajectory Prediction via Efficient Latent Predictor

2026

Autonomous Vehicles and Machines Conference, March 1-5, 2026, Burlingame, CA, USA

Presents an efficient latent predictor for real-time online trajectory prediction in autonomous vehicles, achieving high accuracy with reduced computational overhead.

Artificial Intelligence Computer Vision

Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

2026

arXiv

Bird's-Eye-View (BEV) perception serves as a cornerstone for autonomous driving, offering a unified spatial representation that fuses surrounding-view images to enable reasoning for various downstream tasks, such as semantic segmentation, 3D object detection, and motion prediction. However, most existing BEV perception frameworks adopt an end-to-end training paradigm, where image features are directly transformed into the BEV space and optimized solely through downstream task supervision. This formulation treats the entire perception process as a black box, often lacking explicit 3D geometric understanding and interpretability, leading to suboptimal performance. In this paper, we claim that an explicit 3D representation matters for accurate BEV perception, and we propose Splat2BEV, a Gaussian Splatting-assisted framework for BEV tasks. Splat2BEV aims to learn BEV feature representations that are both semantically rich and geometrically precise. We first pre-train a Gaussian generator that explicitly reconstructs 3D scenes from multi-view inputs, enabling the generation of geometry-aligned feature representations. These representations are then projected into the BEV space to serve as inputs for downstream tasks. Extensive experiments on the nuScenes and Argoverse datasets demonstrate that Splat2BEV achieves state-of-the-art performance and validate the effectiveness of incorporating explicit 3D reconstruction into BEV perception.

Computer Vision Autonomous Driving

Scorio.jl: A Julia package for ranking stochastic responses

2026

JuliaCon 2026

Scorio.jl is a Julia package for evaluating and ranking systems from repeated stochastic responses on shared tasks. It provides a common tensor-based interface for direct score-based, pairwise, psychometric, voting, graph, and listwise ranking methods. The package supports methodological studies of ranking stability as well as day-to-day leaderboard construction. It makes ranking under repeated stochastic observation easier to analyze across different assumptions and ranking families.

Artificial Intelligence HPC

Sweeping Promptable Spoofs under the DirtyRAG

2026

ICML 2026 (under review)

This paper studies security vulnerabilities in retrieval-augmented generation through DirtyRAG, a query-blind benign-passage attack that can be steered by prompting. It shows that promptable spoof passages remain effective against strong defenses and exposes a practical attack surface for real-world RAG systems. The work also introduces RAG-ATag, a benchmark for evaluating RAG security under these attack conditions. It highlights the need for more robust retrieval and generation defenses in deployed LLM systems.

Artificial Intelligence Trustworthy AI

Technological and Digitalization Forces Shaping B2B Sales: Confluence, Challenges, Promises, and Pitfalls

2026

Handbook of Interorganizational Relationships

This handbook chapter examines the technological and digitalization forces shaping B2B sales, analyzing the confluence of emerging technologies, the challenges they present, their promises for transforming sales processes, and potential pitfalls in implementation.

Artificial Intelligence

Trust the Typical

2026

14th International Conference on Learning Representations (ICLR), April 23-27, 2026, Rio de Janeiro, Brazil

Current approaches to LLM safety rely on a brittle pattern of identifying and blocking known threats via guardrails. This paper introduces Trust The Typical (T3), a framework that reframes safety as an out-of-distribution detection problem, learning the distribution of acceptable prompts in a semantic space and flagging significant deviations as potential threats. Unlike prior methods, T3 requires no training on harmful examples yet achieves state-of-the-art performance across 18 benchmarks spanning toxicity, jailbreaking, multilingual harms, and over-refusal—reducing false positive rates by up to 40× relative to specialized safety models. A single model trained on safe English text transfers effectively to over 14 languages without retraining.

Trustworthy AI Artificial Intelligence

Unpacking Generative AI for B2B Sales: Definitional Perspectives, Multidimensional Framework, and Sales Roles

2026

Journal of Personal Selling and Sales Management

This paper develops a theory-driven framework (AGA typology) to bridge the gap between AI conceptualizations and real-world deployment in B2B sales. Through systematic coding of 45 state-of-the-art applications based on diverse developer guides, the study provides a multidimensional perspective on how generative AI reshapes sales roles. Published in the Journal of Personal Selling and Sales Management (special issue, 14% acceptance rate).

Artificial Intelligence

Using AI to Increase Efficiency of Multilingual Test Materials: Spanish BEL Sentences

2026

Work in progress

This work-in-progress explores how AI can improve the efficiency of creating multilingual auditory test materials, with a focus on Spanish BEL sentences. The project investigates workflow acceleration and quality support for multilingual assessment design. It sits at the intersection of language technology, hearing research, and educational test development. The aim is to reduce manual burden while preserving the validity of test materials.

Artificial Intelligence

2025

100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?

2025

63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), July 27-August 1, 2025, Vienna, Austria

Existing long-context evaluation benchmarks fail to separate long-context performance from a model's baseline ability, making cross-model comparisons unclear, and are typically constructed with fixed input lengths that limit applicability across models with different context windows. This paper introduces 100-LongBench, a length-controllable long-context benchmark with a novel metric that disentangles baseline knowledge from true long-context capability across multiple task categories. Experiments demonstrate that existing benchmarks significantly conflate baseline model strength with genuine long-context ability, revealing a widespread evaluation gap.

Artificial Intelligence

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

2025

39th Conference on Neural Information Processing Systems (NeurIPS 2025), December 2025

Large-scale AI models have grown rapidly in size, creating significant challenges for deployment on resource-constrained hardware. This paper introduces Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM size by 30% while preserving outputs that are bit-for-bit identical to the original model, exploiting the low entropy in BFloat16 weight representations through entropy coding and dynamic-length encodings. A custom GPU kernel enables fast online decompression, and experiments on Llama 3.3, Qwen 3, and Mistral 3 validate 30% size reduction with 2.3–46.2× higher throughput than CPU offloading—notably enabling lossless inference of Llama 3.1 405B on a single 8×80GB GPU node.
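The entropy-coding intuition can be illustrated on a toy exponent stream (the frequency distribution below is assumed for illustration; DFloat11's actual coder operates on the real exponent bits of trained BFloat16 weights):

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Shannon entropy in bits per symbol of a discrete stream."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())

# Toy BFloat16 exponent stream, heavily concentrated on a few values,
# mimicking the low-entropy distribution found in trained LLM weights.
exponents = [126] * 700 + [125] * 200 + [127] * 90 + [120] * 10
h = entropy_bits(exponents)  # about 1.2 bits, far below the 8 bits stored
```

Because the 8-bit exponent field carries only ~1.2 bits of information here, a lossless entropy code can shrink it dramatically, which is the mechanism behind DFloat11's size reduction without any change in outputs.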

Artificial Intelligence HPC

BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

2025

CVPR

3D Gaussian Splatting (3DGS) has shown remarkable potential for static scene reconstruction, and recent advancements have extended its application to dynamic scenes. However, the quality of reconstructions depends heavily on high-quality input images and precise camera poses, which is difficult to guarantee in real-world scenarios. Capturing dynamic scenes with handheld monocular cameras, for instance, typically involves simultaneous movement of both the camera and objects within a single exposure. This combined motion frequently results in image blur that existing methods cannot adequately handle. To address these challenges, we introduce BARD-GS, a novel approach for robust dynamic scene reconstruction that effectively handles blurry inputs and imprecise camera poses. BARD-GS comprises two main components: 1) camera motion deblurring and 2) object motion deblurring. By explicitly decomposing motion blur into camera motion blur and object motion blur and modeling them separately, we achieve significantly improved rendering results in dynamic regions. In addition, we collect a real-world motion blur dataset of dynamic scenes to evaluate our approach. Extensive experiments demonstrate that BARD-GS effectively reconstructs high-quality dynamic scenes under realistic conditions, significantly outperforming existing methods.

Computer Vision 3D Reconstruction

Balancing Fidelity and Diversity: Synthetic data could stand on the shoulder of the real in visual recognition

2025

With the rapid progress of generative models, synthetic data has become a common solution to data scarcity in AI. However, is using it directly without curation ideal for visual recognition? We systematically study how data fidelity and diversity affect recognition performance and show that balancing these factors significantly improves results through a training-free curation pipeline.

Artificial Intelligence Computer Vision Synthetic Data

CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data

2025

True intelligence relies on understanding hidden causal relations, yet current AI and vision models lack benchmarks to assess this ability. We introduce Causal3D, a comprehensive 19-dataset benchmark linking structured and visual data to evaluate causal reasoning, revealing that performance drops sharply as causal complexity increases.

Artificial Intelligence Causality Trustworthy AI Computer Vision

CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation

2025

63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), July 27-August 1, 2025, Vienna, Austria

Traditional RAG systems face critical limitations including disrupted contextual integrity from text chunking and over-reliance on semantic similarity for retrieval. This paper proposes CausalRAG, a novel framework that incorporates causal graphs into the retrieval process, constructing and tracing cause-effect relationships to preserve contextual continuity and improve retrieval precision. Evaluated against regular RAG and graph-based RAG approaches across multiple metrics including answer faithfulness and context precision, CausalRAG demonstrates consistent superiority, showing that causal grounding is a promising direction for knowledge-intensive tasks.

Artificial Intelligence

Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM

2025

IEEE/ACM International Conference on High Performance Computing (SC25), December 17-20, 2025, Hyderabad, India

Training large language models is one of the most compute-intensive tasks in HPC, and predicting end-to-end training time for multi-billion parameter models across hundreds of GPUs is challenging due to complex interactions between transformer components, parallelism strategies, and multi-tier communication. This paper addresses this by decomposing LLMs into core computational primitives and modeling them with operator-level decomposition, lightweight hardware-aware prediction models for key operations, and an end-to-end prediction system integrating these across complex parallelization strategies. The resulting framework enables accurate distributed LLM training performance prediction without costly full-scale sampling.

HPC Artificial Intelligence

FairSense: Long-Term Fairness Analysis of ML-Enabled Systems

2025

47th International Conference on Software Engineering (ICSE)

Algorithmic fairness of machine learning (ML) models has raised significant concern in recent years. Many testing, verification, and bias mitigation techniques have been proposed to identify and reduce fairness issues in ML models. The existing methods are model-centric and designed to detect fairness issues under static settings. However, many ML-enabled systems operate in a dynamic environment where the predictive decisions made by the system impact the environment, which in turn affects future decision-making. Such a self-reinforcing feedback loop can cause fairness violations in the long term, even if the immediate outcomes are fair. In this paper, we propose a simulation-based framework called FairSense to detect and analyze long-term unfairness in ML-enabled systems. Given a fairness requirement, FairSense performs Monte Carlo simulation to enumerate evolution traces for each system configuration. Then, FairSense performs sensitivity analysis on the space of possible configurations to understand the impact of design options and environmental factors on the long-term fairness of the system. We demonstrate FairSense's potential utility through three real-world case studies: loan lending, opioid risk scoring, and predictive policing.

Trustworthy AI Artificial Intelligence

Fix False Transparency by Noise Guided Splatting

2025

NeurIPS

Opaque objects reconstructed by 3D Gaussian Splatting (3DGS) often exhibit a falsely transparent surface, leading to inconsistent background and internal patterns under camera motion in interactive viewing. This issue stems from the ill-posed optimization in 3DGS. During training, background and foreground Gaussians are blended via α-compositing and optimized solely against the input RGB images using a photometric loss. As this process lacks an explicit constraint on surface opacity, the optimization may incorrectly assign transparency to opaque regions, resulting in view-inconsistent and falsely transparent output. This issue is difficult to detect in standard evaluation settings (i.e., rendering static images) but becomes particularly evident in object-centric reconstructions under interactive viewing. Although other causes of view-inconsistency (e.g., popping artifacts) have been explored recently, false transparency has not been explicitly identified. To the best of our knowledge, we are the first to quantify, characterize, and develop solutions for this "false transparency" artifact, an underreported artifact in 3DGS. Our strategy, Noise Guided Splatting (NGS), encourages surface Gaussians to adopt higher opacity by injecting opaque noise Gaussians in the object volume during training, requiring only minimal modifications to the existing splatting process. To quantitatively evaluate false transparency in static renderings, we propose a transmittance-based metric that measures the severity of this artifact. In addition, we introduce a customized, high-quality object-centric scan dataset exhibiting pronounced transparency issues, and we augment popular existing datasets (e.g., DTU) with complementary infill noise specifically designed to assess the robustness of 3D reconstruction methods to false transparency. Experiments across multiple datasets show that NGS substantially reduces false transparency while maintaining competitive performance on standard rendering metrics (e.g., PSNR), demonstrating its overall effectiveness.

Computer Vision 3D Reconstruction

Flexible Group Count Enables Hassle-Free Structured Pruning

2025

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 11-15, 2025, Nashville, USA

Densely structured pruning methods maintain pruned models in a fully dense format, allowing immediate compression benefits, but existing grouped kernel pruning approaches introduce dynamic operations that add complications or impose limitations such as requiring expensive clustering schemes or custom architecture support. This paper argues that making Conv2d group count flexible under an integral optimization is the best practice for grouped kernel pruning, leveraging its ideal alignment with grouped convolution infrastructure. The resulting one-shot, post-train, data-agnostic method is more performant, adaptive, and user-friendly than its predecessors, requiring little to no hyperparameter tuning or handcrafted criteria.

Artificial Intelligence Computer Vision

Forte: Finding Outliers with Representation Typicality Estimation

2025

13th International Conference on Learning Representations (ICLR), April 24-28, 2025, Singapore

Generative models can now produce photorealistic synthetic data virtually indistinguishable from real training data, challenging OOD detectors that rely on generative model likelihoods due to likelihood misestimation and typicality issues. This paper introduces Forte, which hypothesizes that estimating typical sets using self-supervised learners leads to better OOD detection, using representation learning and informative summary statistics based on manifold estimation to address these issues. Forte outperforms other unsupervised approaches and achieves state-of-the-art performance on established challenging benchmarks as well as new synthetic data detection tasks, requiring no class labels.

Trustworthy AI Artificial Intelligence

Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

2025

39th Conference on Neural Information Processing Systems (NeurIPS 2025), December 2025

Large language models show remarkable promise for automated reasoning by generating formal specifications, but a fundamental tension exists between their probabilistic nature and the deterministic guarantees required by formal verification. This paper comprehensively investigates failure modes and uncertainty quantification in LLM-generated formal artifacts, revealing that SMT-based autoformalization has highly domain-specific accuracy impacts ranging from +34.8% on logical tasks to −44.5% on factual ones. A probabilistic context-free grammar (PCFG) framework is introduced to model LLM outputs and yield a refined uncertainty taxonomy, finding that uncertainty signals are task-dependent—for example, grammar entropy for logic achieves AUROC > 0.93.

Artificial Intelligence Trustworthy AI

HOPPS: Hardware-Aware Optimal Phase Polynomial Synthesis with Blockwise Optimization for Quantum Circuits

2025

IEEE/ACM International Conference on High Performance Computing (SC25), December 17-20, 2025, Hyderabad, India

Blocks composed of CNOT and Rz gates are ubiquitous in modern quantum applications such as QAOA ansatzes and quantum adders, but after compilation they often exhibit large CNOT counts or depths that lower fidelity. This paper introduces HOPPS, a SAT-based hardware-aware optimal phase polynomial synthesis algorithm that generates CNOT/Rz blocks with CNOT count or depth optimality under hardware topology constraints. To address scalability for large circuits, an iterative blockwise optimization strategy partitions large circuits into smaller blocks and optimally refines each—achieving CNOT count reductions up to 50% and depth reductions up to 57.1% when used as a peephole optimizer.

Quantum Computing HPC

Integrating self-configuring and foundational deep learning segmentation models for identifying the anal sphincter complex and perianal fistulae on pelvic MRI

2025

SPIE Medical Imaging 2025

This paper introduces an automated pelvic MRI segmentation pipeline that combines nnU-Net with MedSAM for identifying perianal fistulae and the anal sphincter complex. The approach leverages self-configuring and foundation-model segmentation components to improve robustness on a difficult clinical anatomy problem. It is designed to support interventional guidance and surgical planning in Crohn's disease. The work demonstrates how task-specific and foundation-model methods can be integrated for clinically useful MRI analysis.

Medical Imaging Computer Vision Artificial Intelligence

K4: Online Log Anomaly Detection via Unsupervised Typicality Learning

2025

IEEE/ACM International Conference on High Performance Computing (SC25), December 17-20, 2025, Hyderabad, India

Existing log anomaly detection methods are often slow, dependent on error-prone parsing, and use unrealistic evaluation protocols. This paper introduces K4 (Knowing the Unknown by Knowing only the Known), a fully unsupervised, parser-independent framework that transforms arbitrary log embeddings into compact four-dimensional descriptors—Precision, Recall, Density, Coverage—using efficient k-nearest neighbor statistics. Under a realistic online chunk-based evaluation protocol, K4 achieves state-of-the-art AUROC of 0.995–0.999 across HDFS, BGL, and Thunderbird datasets, with training under 4 seconds and inference as low as 4 μs.
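The k-nearest-neighbor machinery behind descriptors such as Density and Coverage can be sketched as follows. This is a minimal illustration of kNN-based typicality scoring on embeddings under invented names, not the paper's actual four-descriptor construction:

```python
import numpy as np

def knn_radii(train, k=3):
    """Distance from each training embedding to its k-th nearest
    training neighbor (column 0 of the sorted matrix is the
    zero self-distance, so index k is the k-th neighbor)."""
    d = np.linalg.norm(train[:, None, :] - train[None, :, :], axis=-1)
    d.sort(axis=1)
    return d[:, k]

def typicality_score(x, train, k=3):
    """k-th nearest-neighbor distance of x among the training
    embeddings; smaller means more typical of the known data."""
    d = np.sort(np.linalg.norm(train - x, axis=-1))
    return d[k - 1]
```

A new log embedding whose score exceeds the typical training radii is flagged as atypical; only "known" (normal) data is ever used to fit the statistics.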

Trustworthy AI HPC Artificial Intelligence

LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision

2025

2025 IEEE International Conference on Big Data, December 8-11, 2025, Macau, China

Curating high-quality, domain-specific datasets is a major bottleneck for deploying robust vision systems. This paper introduces Labeling Copilot, the first data curation deep research agent for computer vision, powered by a large multimodal language model that uses multi-step reasoning to execute specialized tools across three core capabilities: Calibrated Discovery for sourcing in-distribution data from large repositories, Controllable Synthesis for generating rare-scenario data with robust filtering, and Consensus Annotation for producing accurate labels via a novel multi-model consensus mechanism. On the dense COCO dataset, the Consensus Annotation module achieves an annotation mAP of 37.1%, and on Open Images it discovers 903 new bounding box categories.

Artificial Intelligence Computer Vision

LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem

2025

Findings of EMNLP 2025, November 5-9, 2025, Suzhou, China

Fine-tuning LLMs with LoRA has created a convenient share-and-play ecosystem where users download community-shared adapters to enhance base models, but this also introduces a new attack surface for distributing malicious LoRAs. This paper demonstrates that a backdoor LoRA can be trained once and then seamlessly merged in a training-free fashion with multiple task-enhancing LoRAs, retaining both malicious behavior and legitimate downstream capabilities. Such merged LoRAs are particularly dangerous because malicious intent is concealed behind improved downstream performance, creating strong incentive for voluntary adoption, and no safety measures exist to intervene during local deployment.

Trustworthy AI Artificial Intelligence

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

2025

39th Conference on Neural Information Processing Systems (NeurIPS 2025), December 2025

Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. This paper hypothesizes that current reasoning limitations stem partly from insufficient long-context capacity, motivated by observations that higher context window lengths correlate with stronger reasoning performance and that failed reasoning cases resemble failed long-context cases. Controlled experiments comparing architecturally identical models with varying long-context capacities confirm that enhancing long-context ability before supervised fine-tuning leads to improved reasoning, advocating for long-context capacity as a first-class design objective.

Artificial Intelligence

MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations

2025

13th International Conference on Learning Representations (ICLR), April 24-28, 2025, Singapore [Spotlight]

Multi-hop knowledge editing in LLMs has been evaluated using benchmarks with unreliable protocols that conflate editing success with benchmark artifacts, producing misleading results. This paper presents MQuAKE-Remastered, which corrects systematic flaws in prior multi-hop knowledge editing assessments and demonstrates that reliable evaluation methodology is largely absent—and essential—for advancing the field. Accepted as a Spotlight at ICLR 2025, the work shows that many reported gains in multi-hop editing do not hold under rigorous evaluation, calling for a reset of evaluation standards.

Artificial Intelligence

Masked-speech recognition using human and synthetic cloned speech

2025

Trends in Hearing

This study evaluates the intelligibility and human-likeness of AI-generated voice clones compared to human speech. Using transformer-based language models, the research demonstrates that synthetic speech can achieve similar recognition scores and perceptual similarity to original human talkers, even in noisy environments. The findings suggest that voice synthesis and automatic speech recognition (ASR) are promising tools for evaluating speech recognition in both clinical audiology and hearing research.

Artificial Intelligence Trustworthy AI

Novel Adaptation of Video Segmentation to 3D MRI: Efficient Zero-Shot Knee Segmentation with SAM2

2025

SPIE Medical Imaging 2025, February 16-20, 2025, San Diego, USA [Oral]

Medical image segmentation methods face the challenge of domain transfer, where performance degrades due to distribution shifts between source and target domains. This paper adapts SAM2, a general-purpose video segmentation model, for zero-shot single-prompt 3D knee MRI segmentation by treating volumetric slices as individual video frames and leveraging SAM2's memory mechanism to generate motion- and spatially-aware predictions across the volume. Experiments on the OAI-ZIB dataset demonstrate a Dice similarity coefficient of 0.9643 on tibia using only a single prompt and no task-specific training or fine-tuning.

Medical Imaging Artificial Intelligence Computer Vision

QuFlex: Parallel Quantum Job Scheduling Using Adaptive Circuit-Cutting

2025

Supercomputing India Conference, December 9-13, 2025, Hyderabad

Parallel quantum job scheduling across multiple QPUs is critical for maximizing throughput in heterogeneous quantum computing environments. QuFlex introduces an adaptive circuit-cutting approach that dynamically partitions quantum circuits based on available QPU resources, enabling efficient parallel scheduling across heterogeneous quantum hardware. The framework demonstrates improved QPU utilization and reduced job completion times compared to static partitioning approaches.

Quantum Computing HPC

Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting

2025

NeurIPS

Open-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. However, most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. Moreover, they are limited to static scenes and struggle with dynamic scenes due to the complexities of motion modeling. In this paper, we propose Segment-then-Splat, a 3D-aware open-vocabulary segmentation approach for both static and dynamic scenes based on Gaussian Splatting. Segment-then-Splat reverses the long-established approach of segmentation after reconstruction by dividing Gaussians into distinct object sets before reconstruction. Once the reconstruction is complete, the scene is naturally segmented into individual objects, achieving true 3D segmentation. This approach not only eliminates Gaussian-object misalignment issues in dynamic scenes but also accelerates the optimization process, as it eliminates the need for learning a separate language field. After optimization, a CLIP embedding is assigned to each object to enable open-vocabulary querying. Extensive experiments on various datasets demonstrate the effectiveness of our proposed method in both static and dynamic scenarios.

Computer Vision 3D Reconstruction 3D Scene Understanding

Spatial Intelligence in Vision-Language Models: A Comprehensive Survey

2025

Vision-Language Models (VLMs) have achieved great success but still lack spatial intelligence, and this survey provides the first unified overview of recent advances, taxonomies, and evaluations toward building spatially intelligent AI.

Artificial Intelligence Spatial Intelligence Vision Language Model

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

2025

Conference on Language Modeling (COLM), October 7-10, 2025, Montreal, Canada

Recent advances in post-training enhance model reasoning but require costly training pipelines and produce inefficient, overly lengthy outputs. This paper introduces Speculative Thinking, a training-free framework enabling large reasoning models to guide smaller ones during inference at the reasoning level—distinct from token-level speculative decoding—by identifying structural cues such as paragraph breaks followed by reflective phrases where small models struggle and delegating those steps to a larger model. The method significantly boosts smaller model reasoning accuracy while shortening output length, offering an efficient inference-time paradigm that preserves the small model's compute efficiency.
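A crude version of the structural-cue check might look like the sketch below; the cue list and delegation policy here are invented for illustration and differ from the paper's:

```python
# Invented reflective cues; the paper's actual cue set differs.
REFLECTIVE_CUES = ("wait", "alternatively", "hmm", "let me double-check")

def should_delegate(paragraph):
    """Toy structural-cue check: delegate the next reasoning step to
    the large model when a new paragraph opens with a reflective
    phrase, where small models tend to struggle."""
    first = paragraph.strip().lower()
    return any(first.startswith(cue) for cue in REFLECTIVE_CUES)
```

In a full system this predicate would run between paragraphs during the small model's decoding, routing only the flagged steps to the larger model.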

Artificial Intelligence

2024

Achieving Low Latency Inference on High Resolution Images by Exploiting Sparsity in Vision Transformers

2024

40th IEEE International Parallel & Distributed Processing Symposium (IPDPS)

This paper presents a tile-aware sparse attention scheduling framework for improving the efficiency of structured sparse vision transformers on GPUs. The method represents attention masks as adjacency matrices, applies structure-aware reordering to expose dense computation blocks, and uses offline profiling with Integer Linear Programming (ILP) to select optimal tile shapes under hardware constraints. Integrated into models such as Vision Longformer, RegionViT, and DynamicViT, the framework achieves up to 2.1× end-to-end latency speedup over fixed-tile FlashAttention. The results show that aligning sparse attention computation with both sparsity structure and GPU characteristics can substantially improve inference efficiency.

HPC

An Automated Approach for Improving the Inference Latency and Energy Efficiency of Pretrained CNNs by Removing Irrelevant Pixels with Focused Convolutions

2024

29th Asia and South Pacific Design Automation Conference (ASP-DAC 2024), January 2024, South Korea

Computer vision CNNs achieve high accuracy but face ever-increasing energy and computation requirements, and making them more energy-efficient typically requires costly retraining. This paper proposes an automated method to improve the inference latency and energy efficiency of pretrained CNNs without retraining, by inserting a threshold layer that identifies irrelevant image regions and replacing subsequent convolutional layers with focused convolutions that ignore those regions entirely. The approach saves inference latency by up to 25% and energy costs by up to 22% on popular pretrained CNNs including ResNet, VGG, and ConvNeXt, with little to no accuracy loss.

Artificial Intelligence Computer Vision

Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot

2024

46th International Conference on Software Engineering (ICSE)

Code intelligence tools such as GitHub Copilot have begun to bridge the gap between natural language and programming language. A frequent software development task is the management of technical debt: suboptimal solutions or unaddressed issues that hinder future software development. Developers have been found to “self-admit” technical debt (SATD) in software artifacts such as source code comments. Thus, can the information present in these comments enhance code-generative prompts to repay the described SATD? Or does the inclusion of such comments instead cause code-generative tools to reproduce the harmful symptoms of the described technical debt? Does modifying the SATD change this behavior? Despite the heavy maintenance costs caused by technical debt and the recent improvements of code intelligence tools, no prior work has sought to incorporate SATD into prompt engineering. Inspired by this, this paper contributes and analyzes a dataset consisting of 36,381 TODO comments in the latest available revisions of their respective 102,424 repositories, from which we sample and manually generate 1,140 code bodies using GitHub Copilot. Our experiments show that GitHub Copilot can generate code with the symptoms of SATD, both prompted and unprompted. Moreover, we demonstrate the tool's ability to automatically repay SATD under different circumstances and qualitatively investigate the characteristics of successful and unsuccessful comments. Finally, we discuss gaps in which GitHub Copilot's successors and future researchers can improve upon code intelligence tasks to facilitate AI-assisted software maintenance.

Artificial Intelligence

Creating Intelligent Cyberinfrastructure for Democratizing AI

2024

AI Magazine

This paper provides an overview of the NSF-funded ICICLE AI Institute, which aims to fundamentally advance 'edge-to-center' AI-as-a-Service. By developing intelligent cyberinfrastructure (CI) that spans the edge-cloud-HPC computing continuum, the project seeks to enable plug-and-play AI that is accessible to a wider population. The work highlights high-impact applications in animal ecology, digital agriculture, and smart foodsheds as primary drivers for democratizing next-generation AI.

Artificial Intelligence HPC Trustworthy AI

Deep Learning Based Risk Stratification of Pre-operative CT Scans is Prognostic of Overall Survival in Kidney Cancers

2024

AACR Annual Meeting 2024

This abstract reports a deep learning model that uses pre-operative CT scans to predict overall survival in kidney cancer. The model improves pre-operative risk assessment and offers prognostic value beyond standard clinicopathological factors. It represents an early step toward multi-institutional imaging biomarkers for survival-based treatment planning. The work supports broader use of CT-derived representations for oncologic prognostication.

Medical Imaging Computer Vision Artificial Intelligence

Efficient Circuit Wire Cutting Based on Commuting Groups

2024

IEEE International Conference on Quantum Computing and Engineering (QCE24), September 2024, Montreal, Canada

Current quantum devices face challenges with large circuits due to increasing error rates as circuit size and qubit count grow. Inspired by ancilla-assisted quantum process tomography and MUBs-based grouping for simultaneous measurement, this paper proposes a new circuit wire cutting approach that uses ancillary qubits to transform quantum input initializations into quantum output measurements, allowing multiple measurements to be grouped and executed simultaneously. The technique significantly reduces subcircuit execution overhead and classical reconstruction complexity compared to standard wire cutting.

Quantum Computing HPC

Exploring Algorithmic Design Choices for Low Latency CNN Deployment

2024

2024 IEEE 31st International Conference on High Performance Computing, Data, and Analytics (HiPC)

This paper investigates algorithmic design choices for reducing latency in CNN deployment across diverse hardware platforms. Five convolution algorithms are implemented using SYCL and integrated into VGG16, ResNet101, and InceptionV4 by replacing the standard PyTorch Conv2d operator. Their performance is evaluated at both the layer and model level on GPUs against PyTorch and Intel PyTorch Extension baselines. Results show significant execution-time improvements, demonstrating the effectiveness of algorithm-level optimization for low-latency CNN inference.

HPC

Federated Image Quality Assessment of Prostate MRI Scans in a Multi-institutional Setting

2024

AACR Annual Meeting 2024

This work addresses image-quality variability in prostate MRI across multiple institutions using a federated analysis setting. It studies how artifact-related quality differences can affect the reliability and portability of downstream machine learning models. The abstract highlights the importance of multi-institutional quality assessment before model development and deployment. It contributes to more reliable imaging AI in federated and heterogeneous clinical environments.

Medical Imaging Artificial Intelligence

GNNs Also Deserve Editing, and They Need It More Than Once

2024

41st International Conference on Machine Learning (ICML), July 21, 2024, Vienna, Austria

Model editing—updating specific factual knowledge—has been extensively studied for LLMs but has received little attention for graph neural networks, which present unique challenges due to their relational structure. This paper extends model editing to GNNs, showing that they require iterative multi-round editing to maintain accuracy after knowledge updates, unlike LLMs where single-pass editing is often sufficient. The work proposes efficient multi-round GNN editing methods and demonstrates that both graph structure and node attributes must be carefully managed across editing rounds to prevent knowledge degradation.

Artificial Intelligence

Image Color Recognition and Recommendation Method and Device

2024

Patent CN118053001A

A patent describing a method and device for image color recognition and recommendation, applying computer vision techniques for automated color analysis.

Computer Vision

Image Processing Method and Apparatus

2024

Patent CN118053002A

A patent describing an image processing method and apparatus for automated visual data analysis and transformation.

Computer Vision

Intra- and Peri-tumoral Radiomic Features are Predictive of Pathologic Response to Multiple Neoadjuvant Therapy Regimen in Rectal Cancers via Pre-treatment MRI

2024

AACR Annual Meeting 2024

This study analyzes intra-tumoral and peri-tumoral radiomic features from pretreatment MRI to predict pathologic response in rectal cancer. It evaluates whether quantitative imaging phenotypes can identify responders across multiple neoadjuvant treatment regimens. The work aims to improve patient stratification beyond traditional staging and biomarker approaches. It contributes to noninvasive response prediction in rectal cancer management.

Medical Imaging Computer Vision Artificial Intelligence

KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches

2024

Findings of EMNLP 2024, November 12-16, 2024, Miami, USA

Long-context capability is critical for LLMs, but transformer architectures face significant challenges due to growing KV cache size and the complexity of attending to extended inputs. This paper provides a comprehensive taxonomy and benchmark evaluation of 10+ state-of-the-art approaches across seven long-context task categories—including KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures—evaluated in a unified, aligned environment. The work reveals numerous previously unknown phenomena and offers a practical workbench and insights for the future development of long-context-capable LLMs.

Artificial Intelligence HPC

Knowledge Graphs Can be Learned with Just Intersection Features

2024

41st International Conference on Machine Learning (ICML), July 21, 2024, Vienna, Austria

Knowledge graph completion can be framed as link prediction where structural information is key, but quantifying this structural information poses a challenge. This paper demonstrates that the intersection among k-hop neighborhoods of the head, relation, and tail is the critical structural signal for valid triple prediction, and proposes a novel randomized algorithm to efficiently generate these intersection features. A straightforward fully-connected network leveraging these features outperforms established KG embedding models and graph neural network baselines, while also achieving substantial training time efficiency gains.
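The neighborhood-intersection idea can be sketched on a toy undirected graph; this is an illustrative simplification (the paper's features involve head, relation, and tail, and a randomized generation algorithm not shown here):

```python
def k_hop(adj, start, k):
    """Set of nodes within k hops of `start` in an undirected graph
    given as an adjacency dict {node: [neighbors]}."""
    seen, frontier = {start}, {start}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj.get(u, ())} - seen
        seen |= frontier
    return seen

def intersection_features(adj, head, tail, k=2):
    """Toy intersection features: overlap size between the k-hop
    neighborhoods of head and tail, plus the neighborhood sizes."""
    h, t = k_hop(adj, head, k), k_hop(adj, tail, k)
    return [len(h & t), len(h), len(t)]
```

These scalar features would then feed a small fully-connected network instead of learned embeddings.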

Artificial Intelligence

Materials Data Science Using CRADLE: A Distributed, Data-Centric Approach

2024

MRS Communications

This paper introduces CRADLE, a distributed framework designed to support data-centric AI and materials data science at scale. By integrating heterogeneous data management with elastic scaling, CRADLE addresses the challenges of massive datasets generated by modern experiments and simulations. The study demonstrates the framework's capabilities through five applications, including phase identification in X-ray diffraction and defect segmentation in computed tomography, emphasizing scalable and reproducible scientific insights.

HPC Data Science Materials Science

Optimizing Deployment of Unstructured Group Convolutions for Low Latency Inference

2024

2025 IEEE 32nd International Conference on High Performance Computing, Data, and Analytics (HiPC)

This paper presents an optimization framework for deploying unstructured group convolutions efficiently on GPUs. The method combines Knapsack-based partitioning, Integer Linear Programming (ILP), and matrix reordering strategies to improve load balancing and data reuse for irregular input-output channel connections. Evaluated on ShuffleNet and CondenseNet, the framework achieves up to 1.9× speedup over PyTorch, while reordering-enhanced ILP provides an additional 1.3× improvement. The results highlight the importance of hardware-aware scheduling for accelerating irregular CNN workloads.

HPC

Phase Identification in Synchrotron X-ray Diffraction Patterns of Ti-6Al-4V Using Computer Vision and Deep Learning

2024

Integrating Materials and Manufacturing Innovation

This research utilizes convolutional neural networks (CNNs) to automate the phase identification of titanium alloys from synchrotron X-ray diffraction (XRD) patterns. By treating XRD patterns as one-dimensional images, the deep learning model achieves high accuracy in distinguishing between alpha and beta phases, significantly reducing the time required for manual analysis in materials characterization.

Computer Vision Materials Science Artificial Intelligence

Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision

2024

24th Privacy Enhancing Technologies Symposium (PETS), July 15-20, 2024, Bristol, UK

The genomic domain stands to benefit greatly from advances in AI and data science, but increasing privacy and cybersecurity concerns necessitate robust solutions for sensitive collaborative research. This paper presents a practical deployment of a privacy-preserving framework for genomic research developed in collaboration with Lynx.MD, a secure health data collaboration platform, addressing challenges of enabling joint analysis of genomic data while mitigating data breach risks. The framework demonstrates scalable, privacy-preserving data sharing and analysis that maintains utility while satisfying rigorous security requirements in a real production environment.

Trustworthy AI Artificial Intelligence

QGroup: Parallel Quantum Job Scheduling Using Dynamic Programming

2024

IEEE International Conference on Quantum Computing and Engineering (QCE24), September 2024, Montreal, Canada

Scheduling quantum circuits across multiple QPUs requires efficient algorithms that minimize idle time while respecting hardware constraints. QGroup uses dynamic programming to optimally group and schedule quantum circuits across multiple QPUs, maximizing throughput and minimizing idle time through principled combinatorial optimization. Evaluated on realistic quantum workloads, QGroup achieves improved scheduling efficiency compared to greedy and heuristic-based baseline approaches.

Quantum Computing HPC

Radiomics to Detect Inflammation and Fibrosis on Magnetic Resonance Enterography in Stricturing Crohn’s Disease

2024

Journal of Crohn's and Colitis

This study develops radiomics-based machine learning models to characterize inflammation and fibrosis in Crohn's disease strictures from magnetic resonance enterography. The models improve diagnostic discrimination relative to radiologist visual scoring alone and show additional value when combined with expert assessment. The work addresses an important unmet need in noninvasive characterization of stricturing disease. It supports more quantitative and reproducible imaging-based assessment in inflammatory bowel disease.

Medical Imaging Computer Vision Artificial Intelligence

Recommendation Method, Device, and Electronic Apparatus Based on Multimodal Features

2024

Patent CN118093984A

A patent describing a recommendation method and device based on multimodal features, utilizing AI techniques for enhanced recommendation systems in commercial applications.

Artificial Intelligence

Spatial attention wavelon network (SpAWN) for survival-based risk stratification in kidney cancers via CT

2024

SPIE Medical Imaging 2024

SpAWN introduces a survival-risk stratification model for kidney cancer CT that combines spatial attention with wavelon activations. The design aims to improve interpretability and cross-cohort generalization for imaging-based survival prediction. The paper demonstrates that architectural choices tailored to spatial context can strengthen risk modeling from pre-operative scans. It contributes to clinically relevant prognostic modeling in oncologic imaging.

Medical Imaging Computer Vision Artificial Intelligence

Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion

2024

2024 Conference on Empirical Methods in Natural Language Processing (EMNLP 2024), November 12-16, 2024, Miami, USA

Releasing LLM weights poses a dilemma: open-sourcing compromises ownership while closed APIs raise data privacy concerns. This paper introduces TaylorMLP, which protects LLM ownership by transforming weights into Taylor-series parameters that can be released instead of original weights, and prevents unauthorized use by inducing low-speed token generation through increasing the number of Taylor-series terms. Empirical experiments across five datasets and three LLM architectures demonstrate TaylorMLP induces over 4× latency increase while producing tokens precisely matched with original models, effectively defending against weight reconstruction from downstream datasets.
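The speed/fidelity trade-off that TaylorMLP exploits can be illustrated with a one-dimensional analogy; this is not the paper's weight transformation, only the underlying Taylor-truncation idea:

```python
import math

def taylor_exp(x, n_terms):
    """Approximate exp(x) with its first n_terms Taylor terms.
    More terms -> closer to exp(x) but more work per evaluation,
    mirroring the fidelity-vs-latency knob (toy analogy only)."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # next term x^(k+1)/(k+1)!
    return total
```

With few terms the evaluation is fast but inexact; driving up the term count recovers the original function at the cost of slower computation, which is the mechanism the paper uses to throttle unauthorized use.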

Trustworthy AI Artificial Intelligence

Unsupervised Segmentation of Knee Bone Marrow Edema-like Lesions Using Conditional Generative Models

2024

Bioengineering

This study proposes a novel unsupervised method for the fully automated segmentation of Bone Marrow Edema-like Lesions (BMEL) in knee MRI. By leveraging conditional diffusion models and anomaly detection, the approach eliminates the need for labor-intensive and bias-prone manual annotations. The research sets new benchmarks for BMEL segmentation performance and provides a more reliable, quantitative tool for early diagnosis and prognosis of knee osteoarthritis.

Medical Imaging Artificial Intelligence Computer Vision

View-Consistent Object Removal in Radiance Fields

2024

ACM Multimedia (MM)

Radiance Fields (RFs) have emerged as a crucial technology for 3D scene representation, enabling the synthesis of novel views with remarkable realism. However, as RFs become more widely used, the need for effective editing techniques that maintain coherence across different perspectives becomes evident. Current methods primarily depend on per-frame 2D image inpainting, which often fails to maintain consistency across views, thus compromising the realism of edited RF scenes. In this work, we introduce a novel RF editing pipeline that significantly enhances consistency by requiring the inpainting of only a single reference image. This image is then projected across multiple views using a depth-based approach, effectively reducing the inconsistencies observed with per-frame inpainting. However, projections typically assume photometric consistency across views, which is often impractical in real-world settings. To accommodate realistic variations in lighting and viewpoint, our pipeline adjusts the appearance of the projected views by generating multiple directional variants of the inpainted image, thereby adapting to different photometric conditions. Additionally, we present an effective and robust multi-view object segmentation approach as a valuable byproduct of our pipeline. Extensive experiments demonstrate that our method significantly surpasses existing frameworks in maintaining content consistency across views and enhancing visual quality.

Computer Vision 3D Reconstruction 3D Editing

Visual Concept Networks: A Graph-Based Approach to Detecting Anomalous Data in Deep Neural Networks

2024

4th International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), July 3-6, 2024, Jeju Island, South Korea

Deep neural networks struggle with robustness against anomalous and out-of-distribution data, and current OOD benchmarks often oversimplify by focusing on single-object tasks. This paper introduces Visual Concept Networks, a graph-based method that converts images into networks of interconnected human-understandable visual concepts and uses topological features to detect both far-OOD and near-OOD data. Extensive testing on two novel complex real-world tasks with ablation studies using large vocabularies demonstrates the method's effectiveness for detecting anomalous data in DNNs.

Trustworthy AI Artificial Intelligence Computer Vision

2023

Accelerating Time to Science using CRADLE: A Framework for Materials Data Science

2023

30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), December 18-21, 2023, Goa, India

Accelerating materials data science requires scalable frameworks that can manage heterogeneous data and computation across distributed systems. This paper presents CRADLE, a distributed data-centric framework for materials data science workflows that integrates data management, computation, and analysis pipelines to significantly reduce time-to-science. Demonstrated in HPC environments, CRADLE shows substantial throughput improvements and workflow simplification for materials characterization and discovery tasks.

HPC Materials Science

Accelerating VQE Algorithms via Parameters and Measurement Reuse

2023

8th International Conference on Rebooting Computing (ICRC), December 2023, San Diego, CA

Variational Quantum Eigensolver algorithms require many quantum circuit executions to converge, creating significant overhead on current quantum hardware. This paper accelerates VQE by reusing parameters and measurement results across iterations, reducing the number of quantum circuit executions required for convergence without sacrificing solution quality. The approach is validated on standard molecular simulation benchmarks, demonstrating meaningful reduction in quantum resource requirements.

Quantum Computing HPC

Fairify: Fairness Verification of Neural Networks

2023

45th International Conference on Software Engineering (ICSE)

Fairness of machine learning (ML) software has become a major concern in the recent past. Although recent research on testing and improving fairness has demonstrated impact on real-world software, providing fairness guarantees in practice is still lacking. Certification of ML models is challenging because of the complex decision-making process of the models. In this paper, we propose Fairify, an SMT-based approach to verify the individual fairness property in neural network (NN) models. Individual fairness ensures that any two similar individuals receive similar treatment irrespective of their protected attributes, e.g., race, sex, or age. Verifying this fairness property is hard because of the global checking and non-linear computation nodes in NNs. We propose a sound approach that makes individual fairness verification tractable for developers. The key idea is that many neurons in the NN always remain inactive when a smaller part of the input domain is considered. Fairify therefore leverages white-box access to the models in production and applies formal-analysis-based pruning: our approach partitions the input and then prunes the NN for each partition to provide a fairness certification or a counterexample, leveraging interval arithmetic and neuron-activation heuristics to perform the pruning as necessary. We evaluated Fairify on 25 real-world neural networks collected from four different sources and demonstrated its effectiveness, scalability, and performance over baselines and closely related work. Fairify is also configurable based on the domain and size of the NN. Our novel formulation of the problem can answer targeted verification queries with relaxations and counterexamples, which has practical implications.

Trustworthy AI Artificial Intelligence

Fix Fairness, Don't Ruin Accuracy: Performance Aware Fairness Repair using AutoML

2023

31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)

Machine learning (ML) is increasingly being used in critical decision-making software, but incidents have raised questions about the fairness of ML predictions. To address this issue, new tools and methods are needed to mitigate bias in ML-based software. Previous studies have proposed bias mitigation algorithms that only work in specific situations and often result in a loss of accuracy. Our proposed solution is a novel approach that utilizes automated machine learning (AutoML) techniques to mitigate bias. Our approach includes two key innovations: a novel optimization function and a fairness-aware search space. By improving the default optimization function of AutoML and incorporating fairness objectives, we are able to mitigate bias with little to no loss of accuracy. Additionally, we propose a fairness-aware search space pruning method for AutoML to reduce computational cost and repair time. Our approach, built on the state-of-the-art Auto-Sklearn tool, is designed to reduce bias in real-world scenarios. To demonstrate its effectiveness, we evaluated our approach on four fairness problems and 16 different ML models, and our results show a significant improvement over the baseline and existing bias mitigation techniques. Our approach, Fair-AutoML, successfully repaired 60 out of 64 buggy cases, while existing bias mitigation techniques repaired at most 44 out of 64 cases.

Trustworthy AI Artificial Intelligence

One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning

2023

37th Conference on Neural Information Processing Systems (NeurIPS 2023), December 2023

Structured pruning has traditionally been viewed as trading accuracy for efficiency, often assumed to come at the expense of adversarial robustness. This paper reveals that structured grouped kernel pruning inherently confers adversarial robustness as a byproduct—without any adversarial training—showing that pruning and robustness are not competing objectives but complementary ones. By demonstrating one less reason to avoid filter pruning, the work shows practitioners can gain free adversarial robustness simply by adopting structured grouped kernel pruning as their compression strategy.

Artificial Intelligence Trustworthy AI

Online Detection of Golden Circuit Cutting Points

2023

IEEE International Conference on Quantum Computing and Engineering (QCE23), September 2023, Seattle, Washington, USA

Quantum circuit cutting enables large circuits to run on small quantum devices, but reconstructing measurement statistics requires computational resources that grow exponentially with the number of cuts. This paper introduces the concept of a golden cutting point: a circuit structure that induces negligible basis components during reconstruction, allowing those downstream computations to be avoided entirely. A hypothesis-testing scheme is proposed for online detection of golden cutting points, with robustness guarantees against low-probability test failures; its applicability is demonstrated on Qiskit's Aer simulator, where identifying and skipping obsolete measurements reduces wall time.

Quantum Computing HPC

Towards Safe ML-Based Systems in Presence of Feedback Loops

2023

International Workshop on Dependability and Trustworthiness of Safety-Critical Systems with Machine Learned Components (SE4SafeML @ ESEC/FSE)

Machine learning (ML) based software is increasingly being deployed in a myriad of socio-technical systems, such as drug monitoring, loan lending, and predictive policing. Although not commonly considered safety-critical, these systems have the potential to cause serious, long-lasting harm to users and the environment due to their close proximity to and effect on society. One type of emerging problem in these systems is unintended side effects from a feedback loop: the decision of an ML-based system induces certain changes in the environment, which, in turn, generate observations that are fed back into the system for further decision-making. When this cyclic interaction between the system and the environment repeats over time, its effect may be amplified and ultimately result in an undesirable outcome. In this position paper, we bring attention to the safety risks introduced by feedback loops in ML-based systems and the challenges of identifying and addressing them. In particular, due to their gradual and long-term impact, we argue that feedback loops are difficult to detect and diagnose using existing techniques in software engineering. We propose a set of research problems in modeling, analyzing, and testing ML-based systems to identify, monitor, and mitigate the effects of an undesirable feedback loop.

Trustworthy AI Artificial Intelligence

Towards Understanding Fairness and its Composition in Ensemble Machine Learning

2023

45th International Conference on Software Engineering (ICSE)

Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc. Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models. However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where fairness composes in a non-trivial way. How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble? Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex, since they affect how learners are combined in different categories of ensembles. Understanding the impact of ensemble hyperparameters on fairness will help programmers design fair ensembles. Today, we do not fully understand these effects for different ensemble algorithms. In this paper, we comprehensively study popular real-world ensembles: bagging, boosting, stacking, and voting. We have developed a benchmark of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair ensemble design. Finally, our benchmark can be leveraged for further research on fair ensembles. To the best of our knowledge, this is one of the first and largest studies on fairness composition in ensembles presented in the literature.

Trustworthy AI Artificial Intelligence

Transforming temporal-dynamic graphs into time-series data for solving event detection problems

2023

Turkish Journal of Electrical Engineering and Computer Sciences

This paper proposes a workflow for detecting important events in temporal-dynamic graphs by transforming graph snapshots into multivariate time-series data. The method first generates graph-level embeddings for each time step using temporal graph representation learning, and then applies unsupervised time-series anomaly detection models to identify abnormal events. The approach was evaluated on multiple real-world social media datasets and showed competitive or improved performance compared to prior event detection methods. The work demonstrates that graph embeddings can serve as an effective bridge between dynamic graph analysis and time-series anomaly detection.

Artificial Intelligence

Virtual Reality as an Acute Pain Reliever During Laceration Repair in Emergency Departments: A Randomized Controlled Trial

2023

Saudi Journal of Emergency Medicine

This randomized controlled trial investigates whether virtual reality can reduce acute pain during laceration repair in emergency departments. Adult patients undergoing repair were studied in a real clinical setting to assess pain relief during the procedure. The work explores immersive VR as a practical non-pharmacologic intervention for emergency care. It adds clinical evidence on the use of interactive technology for procedural pain management.

Artificial Intelligence

Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model

2023

37th Conference on Neural Information Processing Systems (NeurIPS 2023), December 2023

Fine-tuning large pre-trained language models has become increasingly difficult due to extensive memory usage, with the primary bottleneck being the storage of activation feature maps needed for gradient computation. This paper proposes WTA-CRS (Winner-Take-All Column-Row Sampling), a new family of unbiased estimators for matrix products with reduced variance that only requires storing sub-sampled activations for gradient calculation, applied during the backward pass to maintain unbiased gradient estimation. Applied to LLM fine-tuning, WTA-CRS significantly reduces activation memory requirements while maintaining training convergence, enabling adaptation of large models on hardware that would otherwise lack sufficient memory.
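
The core idea can be sketched as a generic unbiased column-row sampling estimator of a matrix product. This is an illustrative sketch of the estimator family WTA-CRS builds on, not the paper's winner-take-all variant:

```python
import numpy as np

def crs_matmul(A, B, s, rng):
    """Unbiased column-row sampling estimate of A @ B.

    Sample s column-row index pairs k with probability proportional
    to ||A[:, k]|| * ||B[k, :]||, then rescale each sampled outer
    product by 1 / (s * p_k) so that E[estimate] = A @ B while only
    s of the column-row pairs are ever stored or touched.
    """
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=s, p=p)
    return sum(np.outer(A[:, k], B[k, :]) / (s * p[k]) for k in idx)

rng = np.random.default_rng(0)
A, B = rng.normal(size=(8, 64)), rng.normal(size=(64, 8))
# Averaging independent estimates drives the error toward zero,
# which is the unbiasedness property that keeps gradients unbiased
# when only sub-sampled activations are stored.
est = np.mean([crs_matmul(A, B, 32, rng) for _ in range(500)], axis=0)
```

The winner-take-all refinement described in the paper additionally keeps the highest-probability pairs deterministically to reduce the variance of this baseline estimator.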

Artificial Intelligence

2022

23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software

2022

30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)

In software development, the term "technical debt" (TD) is used to characterize short-term solutions and workarounds implemented in source code that may incur a long-term cost. Technical debt has a variety of forms and can thus affect multiple qualities of software, including but not limited to its legibility, performance, and structure. In this paper, we have conducted a comprehensive study on technical debt in machine learning (ML) based software. Technical debt can appear differently in ML software by infecting the data that ML models are trained on, thus affecting the functional performance of ML systems. The growing inclusion of ML components in modern software systems is introducing a new set of TDs. Does ML software have similar TDs to traditional software? If not, what are the new types of machine-learning-specific technical debt? In which ML pipeline stages do those debts appear? Do these debts differ between ML tools and applications, and when do they get removed? Currently, we do not know the state of ML TDs in the wild. To address these questions, we mined 68,821 self-admitted technical debts (SATDs) from all the revisions of a curated dataset of 2,686 mature ML repositories from GitHub, along with their introduction and removal. By applying an open-coding scheme and building on prior work, we provide a comprehensive taxonomy of ML SATDs. Our study analyzes ML SATD type organizations, their frequencies within stages of ML software, the differences between ML SATDs in applications and tools, and the effort of ML SATD removal. The findings suggest implications for ML developers and researchers in creating maintainable ML systems.

Artificial Intelligence

A Unified Framework to Assess Market Implications of Institutional Investments

2022

IEEE International Conference on Big Data, December 17-20, 2022, Osaka, Japan

Understanding the market implications of large institutional investment decisions requires modeling complex interactions between institutional behavior and market dynamics. This paper presents a unified framework using machine learning and statistical analysis to assess how large-scale institutional investment decisions affect market prices, volatility, and liquidity. The approach integrates multiple data sources to provide a comprehensive assessment of market implications across different investment types and market conditions.

Artificial Intelligence

Approximate Quantum Circuit Reconstruction

2022

IEEE International Conference on Quantum Computing and Engineering (QCE22), September 2022, Colorado, USA

Current and imminent quantum hardware lacks reliability due to noise and limited qubit counts, and quantum circuit cutting—which divides large circuits into smaller subcircuits—faces exponential classical post-processing overhead. This paper introduces approximate circuit reconstruction using a sampling-based method (MCMC) to probabilistically select high-probability bit strings during reconstruction, avoiding excessive calculations for the full probability distribution. Results show that this sampling-based post-processing holds great potential for fast and reliable circuit reconstruction in the NISQ era and beyond.

Quantum Computing HPC

Business Classification Method and Device Based on Machine Learning

2022

Patent CN115018405A

A patent describing a business classification method and device based on machine learning, applying ML algorithms for automated business categorization and analysis.

Artificial Intelligence

Image Generation Method and Apparatus Based on Artificial Intelligence

2022

Patent CN114723855A

A patent describing a method and apparatus for AI-based image generation, leveraging artificial intelligence techniques for automated visual content creation.

Computer Vision Artificial Intelligence

Irrelevant Pixels are Everywhere: Find and Exclude Them for More Efficient Computer Vision

2022

IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022), June 13-15, 2022, Incheon, Korea

CNNs are compute-intensive because they indiscriminately compute features on all pixels of an input image, yet many pixels are irrelevant to the vision task at hand. This paper demonstrates through analysis of three popular computer vision datasets that approximately 48% of pixels are irrelevant, and proposes the focused convolution—a drop-in CNN replacement that operates only on relevant pixels identified by an area of interest mask. On an embedded device, the approach achieves no loss in accuracy while reducing inference latency, energy consumption, and multiply-add count by approximately 45%.
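
The compute-only-where-relevant idea can be sketched with a naive masked convolution. This is an illustrative sketch, not the paper's drop-in CNN layer implementation:

```python
import numpy as np

def focused_conv2d(img, kernel, mask):
    """Compute convolution outputs only at positions flagged relevant
    by an area-of-interest mask; irrelevant positions stay zero and
    their multiply-adds are skipped entirely.
    """
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y, x in zip(*np.nonzero(mask[:oh, :ow])):  # relevant outputs only
        out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out
```

The work measured per inference scales with the number of mask-selected positions, which is how a mask covering roughly half the pixels translates into a comparable reduction in multiply-add count.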

Artificial Intelligence Computer Vision

MARS: Malleable Actor-Critic Reinforcement Learning Scheduler

2022

International Performance Computing and Communications Conference (IPCCC), November 11-13, 2022, Austin, TX, USA

Scheduling jobs in HPC environments requires handling dynamic, time-varying workloads that challenge static scheduling policies. MARS (Malleable Actor-Critic Reinforcement Learning Scheduler) uses malleable actor-critic RL to adaptively schedule computing jobs, dynamically resizing allocations in response to changing workloads to optimize throughput and resource utilization. Evaluated against standard scheduling baselines, MARS demonstrates consistent improvements in job completion time and cluster utilization across varied workload scenarios.

Artificial Intelligence HPC

Method and Device for Determining and Evaluating Business Data Categories

2022

Patent CN114219037A

A patent describing a method and device for determining and evaluating business data categories, applying machine learning for automated business intelligence and data classification.

Artificial Intelligence

Pinpointing the System Reliability Degradation in NISQ Machines

2022

IEEE International Conference on Quantum Computing and Engineering (QCE22), September 2022, Colorado, USA

Noise in quantum hardware causes significant reliability degradation in NISQ machines, but the systematic patterns of this degradation are not well understood. This paper investigates the sources and temporal patterns of reliability degradation in NISQ machines, identifying when and where noise causes significant performance drops in quantum circuits. The analysis provides guidance for developing error mitigation strategies targeted at the most impactful reliability degradation patterns in near-term quantum hardware.

Quantum Computing HPC

Practical Implications of Dequantization on Machine Learning Algorithms

2022

7th International Conference on Connected Systems and Intelligence (ISI'22), September 2022, Trivandrum, India

Quantum computing algorithms offer theoretical speedups for certain machine learning tasks, but dequantization results show that classical algorithms can sometimes achieve comparable performance. This paper examines the practical implications of dequantization on machine learning algorithms, providing a systematic analysis of when quantum approaches offer genuine advantages versus when classical alternatives are sufficient. The work offers guidance for practitioners on determining which ML tasks are promising candidates for quantum speedup versus those where dequantization renders quantum approaches redundant.

Quantum Computing Artificial Intelligence

Quantum Noise in the Flow of Time: A Temporal Study of the Noise in Quantum Computers

2022

IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS), September 2022, Torino, Italy

Quantum noise in quantum computers is not static but evolves over time, yet most error characterization treats noise as temporally fixed. This paper conducts a temporal study of noise characteristics in quantum computers, revealing how quantum noise patterns change over time and analyzing the implications for circuit fidelity and error mitigation strategies. The findings provide insights for developing more effective time-aware calibration and error mitigation approaches for near-term quantum hardware.

Quantum Computing HPC

System-Auditing, Data Analysis and Characteristics of Cyber Attacks for Big Data System

2022

31st ACM Conference on Information and Knowledge Management (CIKM), October 2022, Atlanta, USA

Big data distributed computing systems such as Apache Hadoop must process massive amounts of data to support business and research applications, so it is critical to ensure their cyber security. To better defend against advanced cyber attacks that threaten even well-protected enterprises, system-auditing-based techniques have been adopted for monitoring system activities and assisting attack investigation. In this demo, we build a system that collects system-auditing logs from a big data system and performs data analysis to understand how system auditing can be used more effectively to assist attack investigation on big data systems. We also built a demo application that detects unexpected file deletions and presents the root causes of each deletion.

Trustworthy AI HPC

The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large

2022

44th International Conference on Software Engineering (ICSE)

An increasingly large number of software systems today include data science components for descriptive, predictive, and prescriptive analytics. The collection of data science stages, from acquisition to cleaning/curation to modeling and so on, is referred to as a data science pipeline. To facilitate research and practice on data science pipelines, it is essential to understand their nature. What are the typical stages of a data science pipeline? How are they connected? Do the pipelines in theory differ from those in practice? Today we do not fully understand these architectural characteristics of data science pipelines. In this work, we present a three-pronged comprehensive study to answer this for the state-of-the-art, data science in-the-small, and data science in-the-large. Our study analyzes three datasets: a collection of 71 proposals for data science pipelines and related concepts in theory, a collection of over 105 implementations of curated data science pipelines from Kaggle competitions to understand data science in-the-small, and a collection of 21 mature data science projects from GitHub to understand data science in-the-large. Our study has led to three representations of data science pipelines that capture the essence of our subjects in theory, in-the-small, and in-the-large.

Artificial Intelligence

2021

A Predictive Analytics Framework for Multi-Horizon Financial Crises Forecasting using Macro-Economic Data

2021

IEEE International Conference on Big Data, December 15-18, 2021, Orlando, Florida, USA

This paper presents a predictive analytics framework for multi-horizon financial crisis forecasting, applying machine learning to macroeconomic data to provide early warning signals.

Artificial Intelligence

Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline

2021

29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)

In recent years, many incidents have been reported where machine learning models exhibited discrimination among people based on race, sex, age, etc. Research has been conducted to measure and mitigate unfairness in machine learning models. For a machine learning task, it is common practice to build a pipeline that includes an ordered set of data preprocessing stages followed by a classifier. However, most of the research on fairness has considered a single classifier-based prediction task. What are the fairness impacts of the preprocessing stages in a machine learning pipeline? Furthermore, studies have shown that the root cause of unfairness is often ingrained in the data itself rather than the model, but no research has measured the unfairness caused by a specific transformation made in the data preprocessing stage. In this paper, we introduce a causal method of fairness to reason about the fairness impact of data preprocessing stages in the ML pipeline. We leverage existing metrics to define the fairness measures of the stages. We then conducted a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources. Our results show that certain data transformers cause the model to exhibit unfairness. We identified a number of fairness patterns in several categories of data transformers. Finally, we showed how the local fairness of a preprocessing stage composes into the global fairness of the pipeline, and we used this fairness composition to choose an appropriate downstream transformer that mitigates unfairness in the machine learning pipeline.

Trustworthy AI Artificial Intelligence

TQEA: Temporal Quantum Error Analysis

2021

51st Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), June 2021

Quantum errors in NISQ hardware vary temporally, but most error analysis tools treat noise as time-invariant. TQEA (Temporal Quantum Error Analysis) characterizes how quantum errors evolve over time by systematically measuring and modeling the temporal dynamics of noise in quantum computers. The framework provides insights for improving error mitigation strategies that account for drift and time-varying noise characteristics, supporting progress toward more reliable quantum computing.

Quantum Computing HPC

2020

A Predictive Analytics Framework for Insider Trading Events

2020

IEEE International Conference on Big Data, December 10-13, 2020, Atlanta, Georgia, USA

Detecting and forecasting insider trading events using traditional methods is limited by their reliance on predefined rules and inability to capture subtle market signals. This paper presents a predictive analytics framework for insider trading events using machine learning applied to financial transaction data and market signals, demonstrating the ability to identify patterns predictive of insider trading. The approach provides early warning capabilities that can complement traditional regulatory surveillance methods.

Artificial Intelligence

Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness

2020

28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)

Machine learning models are increasingly being used in important decision-making software such as approving bank loans, recommending criminal sentencing, and hiring employees. It is important to ensure the fairness of these models so that no discrimination is made based on protected attributes (e.g., race, sex, age) during decision making. Algorithms have been developed to measure unfairness and mitigate it to a certain extent. In this paper, we focus on the empirical evaluation of fairness and mitigation in real-world machine learning models. We created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks, and then, using a comprehensive set of fairness metrics, evaluated their fairness. We then applied 7 mitigation techniques to these models and analyzed the fairness, mitigation results, and impacts on performance. We found that some model optimization techniques induce unfairness in the models. On the other hand, although there are some fairness control mechanisms in machine learning libraries, they are not documented. The mitigation algorithms also exhibit common patterns: mitigation in the post-processing stage is often costly (in terms of performance), and mitigation in the pre-processing stage is preferred in most cases. We also present different trade-off choices for fairness mitigation decisions. Our study suggests future research directions to reduce the gap between theoretical fairness-aware algorithms and the software engineering methods to leverage them in practice.
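
As an illustration of the kind of group fairness metric used in such evaluations, statistical parity difference compares favorable-outcome rates across groups. This is a generic sketch of one common metric, not the study's full metric suite:

```python
import numpy as np

def statistical_parity_difference(y_pred, protected):
    """P(favorable | unprivileged) - P(favorable | privileged).

    Zero means both groups receive favorable predictions at the same
    rate; a negative value means the unprivileged group (encoded as
    protected == 0 here, by convention) is favored less often.
    """
    y_pred, protected = np.asarray(y_pred), np.asarray(protected)
    return y_pred[protected == 0].mean() - y_pred[protected == 1].mean()
```

Values near zero indicate parity; evaluations like this one report several such metrics together rather than relying on any single number.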

Trustworthy AI Artificial Intelligence

Towards Performant Workflows, Monitoring and Measuring

2020

International Conference on Computer Communication and Networks (ICCCN 2020), August 3-6, 2020, Honolulu, Hawaii, USA

Scientific HPC workflows require robust monitoring and measurement infrastructure to understand performance characteristics and enable optimization. This paper presents approaches for building performant scientific workflows with integrated monitoring and measurement, enabling better characterization and optimization of HPC workflow performance across distributed computing environments. The work provides practical methodologies for workflow developers to identify bottlenecks and improve end-to-end throughput.

HPC

With or Without Knee Total Knee Arthroplasty? Deep Learning-powered Strategy to detect TKA in plain radiographs

2020

9th Annual International Congress on Arthroplasty Registries, May 2020

This work presents a deep learning approach for automatically detecting TKA implants in plain radiographs, enabling efficient large-scale retrospective analysis of orthopedic registries.

Medical Imaging Artificial Intelligence