Publications & Code
2026
AI-Guided Detection of Persuasion Strategies in Internal Marketing Communications
This work presents an AI-guided approach to detecting persuasion strategies in internal marketing communications.
Alternative Decomposed Message Passing for Efficient Geometric GNNs
This paper proposes an alternative decomposed message-passing framework for improving the efficiency of geometric graph neural networks. Instead of relying on concatenation-based message generation, the method decomposes node-, edge-, and angle-level transformations into reusable components, reducing redundant computation and memory overhead while preserving algebraic equivalence. The framework is designed for efficient GPU execution and serves as a drop-in replacement for representative architectures such as EGNN and CHGNet. Experimental results show up to 2× training speedup and 60% end-to-end memory reduction with no loss in accuracy across diverse geometric learning workloads.
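The claimed algebraic equivalence rests on a standard linear-algebra identity: a weight matrix applied to concatenated features equals the sum of per-component transforms applied separately. A minimal pure-Python sketch with hypothetical toy weights and features (not the paper's implementation):

```python
# Toy check of the identity W @ concat(a, b) == W_a @ a + W_b @ b,
# which underlies replacing concatenation-based messages with
# decomposed per-component transforms.

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

W = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
W_a = [row[:2] for row in W]  # columns acting on the node feature a
W_b = [row[2:] for row in W]  # columns acting on the edge feature b

a, b = [0.5, -1.0], [2.0, 0.25]

concat_msg = matvec(W, a + b)  # concatenation-based message
decomp_msg = [x + y for x, y in zip(matvec(W_a, a), matvec(W_b, b))]

assert concat_msg == decomp_msg  # identical results, so no accuracy loss
```

Because the two forms agree exactly, the decomposed transforms can be precomputed and reused across edges rather than re-multiplying a concatenated tensor each time.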
Categorical Evaluation of LLMs under Test-Time Scaling
This work argues that binary pass-based metrics are too coarse for evaluating reasoning models under test-time scaling. It introduces a categorical Bayesian framework that scores rubric-defined outcomes with uncertainty rather than collapsing all outputs into pass-or-fail labels. The study shows that lightweight runtime signals can support accurate categorical evaluation without relying on a judge model and that rubric design can materially change model rankings. The paper extends uncertainty-aware LLM evaluation beyond binary correctness.
Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
Pass@k is widely used to report LLM reasoning performance but often yields unstable and misleading rankings, especially when trial counts are limited and compute is constrained. This paper proposes a principled Bayesian evaluation framework that replaces Pass@k with posterior estimates of a model's underlying success probability and credible intervals, using a Dirichlet prior to give closed-form expressions for posterior mean and uncertainty under any weighted rubric. Empirically, on AIME'24/'25, HMMT'25, and BrUMO'25, the Bayesian approach achieves faster convergence and greater rank stability than Pass@k, enabling reliable model comparisons at far smaller sample counts. The framework also naturally extends to graded, rubric-based evaluations, making uncertainty explicit.
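In the two-category (pass/fail) case, a Dirichlet prior reduces to a Beta prior, for which the posterior mean has a closed form. A small illustrative sketch, using a normal approximation for the credible interval rather than the paper's exact estimator:

```python
import math

def bayes_success(successes, trials, alpha=1.0, beta=1.0):
    """Posterior mean and an approximate 95% credible interval for a
    model's success probability under a Beta(alpha, beta) prior (the
    two-category case of a Dirichlet prior). Illustrative sketch only."""
    a = alpha + successes
    b = beta + trials - successes
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))  # Beta posterior variance
    half = 1.96 * math.sqrt(var)                # normal approximation
    return mean, max(0.0, mean - half), min(1.0, mean + half)

# 7 correct answers in 10 trials under a uniform Beta(1, 1) prior:
mean, lo, hi = bayes_success(7, 10)
```

Unlike a raw pass rate, the interval width shrinks with trial count, which is what lets rankings stabilize at small sample budgets.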
Efficient Transpilation of OpenQASM 3.0 Dynamic Circuits to CUDAQ: Performance and Expressiveness Advantages
Presents an efficient transpilation approach for converting OpenQASM 3.0 dynamic circuits to CUDAQ, demonstrating performance and expressiveness advantages.
Empirical evaluation of variability and multi-institutional generalizability of deep learning survival models: Application to renal cancer CT scans
This paper systematically studies how methodological choices affect the robustness and external generalization of CT-based deep learning survival models for renal cancer. It examines data partitioning, data order, random initialization, and augmentation strategies on a multi-institutional cohort spanning nine institutions. The study finds that covariate-balanced partitioning and carefully chosen augmentations materially improve external validation performance, while initialization meaningfully affects variance. These results provide practical recommendations for building more stable and generalizable survival models from medical imaging.
GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning
Effective embodied exploration requires agents to accumulate and retain spatial knowledge over time. However, existing scene representations, such as discrete scene graphs or static view-based snapshots, lack post-hoc re-observability. If an initial observation misses a target, the resulting memory omission is often irrecoverable. To bridge this gap, we propose GSMem, a zero-shot embodied exploration and reasoning framework built upon 3D Gaussian Splatting (3DGS). By explicitly parameterizing continuous geometry and dense appearance, 3DGS serves as a persistent spatial memory that endows the agent with Spatial Recollection: the ability to render photorealistic novel views from optimal, previously unoccupied viewpoints. To operationalize this, GSMem employs a retrieval mechanism that simultaneously leverages parallel object-level scene graphs and semantic-level language fields. This complementary design robustly localizes target regions, enabling the agent to “hallucinate” optimal views for high-fidelity Vision-Language Model (VLM) reasoning. Furthermore, we introduce a hybrid exploration strategy that combines VLM-driven semantic scoring with a 3DGS-based coverage objective, balancing task-aware exploration with geometric coverage. Extensive experiments on embodied question answering and lifelong navigation demonstrate the robustness and effectiveness of our framework.
Geom@k: Fast to Converge, Slow to Drift
This paper studies evaluation metrics for test-time scaling by separating answer discovery from repeated correctness. It derives Geom@k and the broader GeoSpectrum@K family from a common hypergeometric view of fixed-budget metrics. Across aggregate settings, Geom@2 provides a strong balance of fast convergence and low ranking drift relative to alternative summaries. The work offers a compute-aware perspective on stable evaluation under repeated sampling.
HugRAG: Hierarchical Causal Knowledge Graph Design for RAG
HugRAG rethinks knowledge organization for graph-based RAG through causal gating across hierarchical modules. It explicitly models causal relationships to suppress spurious correlations while enabling scalable reasoning over large-scale knowledge graphs. Extensive experiments demonstrate that HugRAG consistently outperforms competitive graph-based RAG baselines across multiple datasets and evaluation metrics, establishing a principled foundation for structured, scalable, and causally grounded RAG systems.
K^4-Serve: Robust Streaming Log Anomaly Detection for HPC & AI Infrastructure
K^4-Serve operationalizes the K^4 framework for streaming anomaly detection on production HPC and AI infrastructure logs. It combines Kafka-based ingestion, versioned normalization, sliding-window scoring, retraining, and observability features to support robust real-world deployment. The system achieves stable deployment on real HPC logs with near-perfect event-level detection and only one false alert in the reported study. The work bridges anomaly-detection methodology and production cyberinfrastructure practice.
LRD-Net: A Lightweight Real-Centered Detection Network for Cross-Domain Face Forgery Detection
Introduces LRD-Net, a lightweight real-centered detection network that generalizes face forgery (deepfake) detection across domains.
Less Prune, MoRE Experts: Recognizing and Restructuring Latent Experts for Model Compression
Proposes recognizing and restructuring latent expert structures within large models for compression, achieving efficiency while preserving accuracy.
Medical Image Spatial Grounding with Semantic Sampling
This work studies spatial grounding for vision-language models in 3D medical imaging, where anatomy, modality, slice direction, and coordinate systems create unique challenges. It introduces MIS-Ground, a benchmark for analyzing failure modes in medical image spatial grounding, and MIS-SemSam, an inference-time semantic sampling method that improves grounding accuracy without retraining. The paper evaluates how visual and textual prompting choices influence grounding performance across clinical imaging settings. It advances reproducible evaluation and practical improvement of medical VLM grounding.
NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?
This paper introduces NEBULA, a unified ecosystem for evaluating Vision-Language-Action (VLA) agents beyond coarse end-task success metrics. It proposes a novel dual-axis evaluation framework that combines fine-grained capability tests for skill-specific diagnosis with systematic stress tests to measure robustness under real-world perturbations. In addition, NEBULA standardizes fragmented embodied AI datasets through a unified data format and API, enabling reproducible cross-dataset training and benchmarking. Experimental results reveal that state-of-the-art VLA models exhibit significant hidden weaknesses in critical capabilities such as spatial reasoning and dynamic adaptation, highlighting the need for more interpretable and reliability-aware evaluation.
QuMod: Parallel Quantum Job Scheduling on Modular QPUs using Circuit Cutting
Presents QuMod, a parallel quantum job scheduling framework for modular QPUs leveraging circuit cutting to improve throughput on heterogeneous quantum hardware.
Quantize What Counts: More for Keys, Less for Values
This work studies asymmetric KV-cache quantization for large language models and shows that key tensors carry more information than value tensors. The analysis motivates allocating more bits and stronger outlier handling to keys than to values, instead of quantizing both sides identically. Experiments show that key-favored bit allocation preserves much more accuracy at the same memory budget. The paper provides both theoretical motivation and practical guidance for more efficient LLM inference.
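A minimal illustration of the asymmetric idea: uniform quantization with a larger bit budget for keys than for values. The tensors and bit-widths below are toy assumptions for exposition, not the paper's scheme or real activations:

```python
def quantize(xs, bits):
    """Uniform quantization of a list of floats to 2**bits levels."""
    levels = 2 ** bits - 1
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / levels or 1.0  # guard against constant input
    return [lo + round((x - lo) / scale) * scale for x in xs]

keys = [0.11, -1.7, 0.43, 2.9, -0.08]  # toy key entries (wider range, one outlier)
vals = [0.21, -0.33, 0.5, 0.12, -0.4]  # toy value entries

# Key-favoured budget: 4 bits for keys, 2 bits for values.
k_hat = quantize(keys, 4)
v_hat = quantize(vals, 2)

k_err = max(abs(x - q) for x, q in zip(keys, k_hat))
v_err = max(abs(x - q) for x, q in zip(vals, v_hat))
```

The worst-case error of each side is bounded by half its quantization step, so spending the extra bits on the more informative key tensors shrinks exactly the errors that matter most.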
Ranking Reasoning LLMs under Test-Time Scaling
This paper studies how to rank reasoning large language models when evaluation uses multiple stochastic samples per prompt under test-time scaling. It formalizes dense benchmark ranking in this repeated-trial setting and introduces Scorio, a library that implements Bayesian, paired-comparison, psychometric, voting, and spectral ranking methods. Across twenty reasoning models and four Olympiad-style math benchmarks, the study shows that many full-trial rankings closely match a Bayesian gold standard while low-budget methods can be less stable. The results provide practical guidance for reliable model ranking under both high- and low-budget evaluation settings.
Real-Time Online Learning Trajectory Prediction via Efficient Latent Predictor
Presents an efficient latent predictor for real-time online trajectory prediction in autonomous vehicles, achieving high accuracy with reduced computational overhead.
Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting
Bird's-Eye-View (BEV) perception serves as a cornerstone for autonomous driving, offering a unified spatial representation that fuses surrounding-view images to enable reasoning for various downstream tasks, such as semantic segmentation, 3D object detection, and motion prediction. However, most existing BEV perception frameworks adopt an end-to-end training paradigm, where image features are directly transformed into the BEV space and optimized solely through downstream task supervision. This formulation treats the entire perception process as a black box, often lacking explicit 3D geometric understanding and interpretability, leading to suboptimal performance. In this paper, we claim that an explicit 3D representation matters for accurate BEV perception, and we propose Splat2BEV, a Gaussian Splatting-assisted framework for BEV tasks. Splat2BEV aims to learn BEV feature representations that are both semantically rich and geometrically precise. We first pre-train a Gaussian generator that explicitly reconstructs 3D scenes from multi-view inputs, enabling the generation of geometry-aligned feature representations. These representations are then projected into the BEV space to serve as inputs for downstream tasks. Extensive experiments on the nuScenes and Argoverse datasets demonstrate that Splat2BEV achieves state-of-the-art performance and validate the effectiveness of incorporating explicit 3D reconstruction into BEV perception.
Scorio.jl: A Julia package for ranking stochastic responses
Scorio.jl is a Julia package for evaluating and ranking systems from repeated stochastic responses on shared tasks. It provides a common tensor-based interface for direct score-based, pairwise, psychometric, voting, graph, and listwise ranking methods. The package supports methodological studies of ranking stability as well as day-to-day leaderboard construction. It makes ranking under repeated stochastic observation easier to analyze across different assumptions and ranking families.
Sweeping Promptable Spoofs under the DirtyRAG
This paper studies security vulnerabilities in retrieval-augmented generation through DirtyRAG, a query-blind benign-passage attack that can be steered by prompting. It shows that promptable spoof passages remain effective against strong defenses and exposes a practical attack surface for real-world RAG systems. The work also introduces RAG-ATag, a benchmark for evaluating RAG security under these attack conditions. It highlights the need for more robust retrieval and generation defenses in deployed LLM systems.
Technological and Digitalization Forces Shaping B2B Sales: Confluence, Challenges, Promises, and Pitfalls
This handbook chapter examines the technological and digitalization forces shaping B2B sales, analyzing the confluence of emerging technologies, the challenges they present, their promises for transforming sales processes, and potential pitfalls in implementation.
Trust the Typical
Current approaches to LLM safety rely on a brittle pattern of identifying and blocking known threats via guardrails. This paper introduces Trust The Typical (T3), a framework that reframes safety as an out-of-distribution detection problem, learning the distribution of acceptable prompts in a semantic space and flagging significant deviations as potential threats. Unlike prior methods, T3 requires no training on harmful examples yet achieves state-of-the-art performance across 18 benchmarks spanning toxicity, jailbreaking, multilingual harms, and over-refusal, reducing false positive rates by up to 40× relative to specialized safety models. A single model trained on safe English text transfers effectively to over 14 languages without retraining.
Unpacking Generative AI for B2B Sales: Definitional Perspectives, Multidimensional Framework, and Sales Roles
This paper develops a theory-driven framework (AGA typology) to bridge the gap between AI conceptualizations and real-world deployment in B2B sales. Through systematic coding of 45 state-of-the-art applications based on diverse developer guides, the study provides a multidimensional perspective on how generative AI reshapes sales roles. Published in the Journal of Personal Selling and Sales Management (special issue, 14% acceptance rate).
Using AI to Increase Efficiency of Multilingual Test Materials: Spanish BEL Sentences
This work-in-progress explores how AI can improve the efficiency of creating multilingual auditory test materials, with a focus on Spanish BEL sentences. The project investigates workflow acceleration and quality support for multilingual assessment design. It sits at the intersection of language technology, hearing research, and educational test development. The aim is to reduce manual burden while preserving the validity of test materials.
2025
100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?
Existing long-context evaluation benchmarks fail to separate long-context performance from a model's baseline ability, making cross-model comparisons unclear, and are typically constructed with fixed input lengths that limit applicability across models with different context windows. This paper introduces 100-LongBench, a length-controllable long-context benchmark with a novel metric that disentangles baseline knowledge from true long-context capability across multiple task categories. Experiments demonstrate that existing benchmarks significantly conflate baseline model strength with genuine long-context ability, revealing a widespread evaluation gap.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Large-scale AI models have grown rapidly in size, creating significant challenges for deployment on resource-constrained hardware. This paper introduces Dynamic-Length Float (DFloat11), a lossless compression framework that reduces LLM size by 30% while preserving outputs that are bit-for-bit identical to the original model, exploiting the low entropy in BFloat16 weight representations through entropy coding and dynamic-length encodings. A custom GPU kernel enables fast online decompression, and experiments on Llama 3.3, Qwen 3, and Mistral 3 validate 30% size reduction with 2.3–46.2× higher throughput than CPU offloading, notably enabling lossless inference of Llama 3.1 405B on a single 8×80GB GPU node.
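The compressibility claim rests on the BFloat16 exponent field having low empirical entropy. A self-contained sketch that measures this on synthetic weights (the weight distribution here is a toy assumption; real checkpoints show similarly concentrated exponents):

```python
import collections
import math
import struct

def bf16_exponent(x):
    """8-bit exponent field of the BFloat16 encoding of x,
    i.e. of the top 16 bits of its IEEE-754 float32 form."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0] >> 16
    return (bits >> 7) & 0xFF

# Toy 'weights' clustered near typical small magnitudes.
weights = [0.01 * (i - 500) / 37.0 for i in range(1000)]

counts = collections.Counter(bf16_exponent(w) for w in weights)
n = sum(counts.values())
entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
# entropy well below 8 bits -> the exponent byte can be entropy-coded
# losslessly into fewer bits, which is the opening DFloat11 exploits
```

Since entropy coding is exactly invertible, decompressed weights are bit-for-bit identical to the originals, matching the lossless guarantee in the abstract.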
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
3D Gaussian Splatting (3DGS) has shown remarkable potential for static scene reconstruction, and recent advancements have extended its application to dynamic scenes. However, the quality of reconstructions depends heavily on high-quality input images and precise camera poses, which are difficult to obtain in real-world scenarios. Capturing dynamic scenes with handheld monocular cameras, for instance, typically involves simultaneous movement of both the camera and objects within a single exposure. This combined motion frequently results in image blur that existing methods cannot adequately handle. To address these challenges, we introduce BARD-GS, a novel approach for robust dynamic scene reconstruction that effectively handles blurry inputs and imprecise camera poses. BARD-GS comprises two main components: 1) camera motion deblurring and 2) object motion deblurring. By explicitly decomposing motion blur into camera motion blur and object motion blur and modeling them separately, we achieve significantly improved rendering results in dynamic regions. In addition, we collect a real-world motion blur dataset of dynamic scenes to evaluate our approach. Extensive experiments demonstrate that BARD-GS effectively reconstructs high-quality dynamic scenes under realistic conditions, significantly outperforming existing methods.
Balancing Fidelity and Diversity: Synthetic data could stand on the shoulder of the real in visual recognition
With the rapid progress of generative models, synthetic data has become a common solution to data scarcity in AI. However, is using it directly without curation ideal for visual recognition? We systematically study how data fidelity and diversity affect recognition performance and show that balancing these factors significantly improves results through a training-free curation pipeline.
CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data
True intelligence relies on understanding hidden causal relations, yet current AI and vision models lack benchmarks to assess this ability. We introduce Causal3D, a comprehensive 19-dataset benchmark linking structured and visual data to evaluate causal reasoning, revealing that performance drops sharply as causal complexity increases.
CausalRAG: Integrating Causal Graphs into Retrieval-Augmented Generation
Traditional RAG systems face critical limitations including disrupted contextual integrity from text chunking and over-reliance on semantic similarity for retrieval. This paper proposes CausalRAG, a novel framework that incorporates causal graphs into the retrieval process, constructing and tracing cause-effect relationships to preserve contextual continuity and improve retrieval precision. Evaluated against regular RAG and graph-based RAG approaches across multiple metrics including answer faithfulness and context precision, CausalRAG demonstrates consistent superiority, showing that causal grounding is a promising direction for knowledge-intensive tasks.
Efficient Fine-Grained GPU Performance Modeling for Distributed Deep Learning of LLM
Training large language models is one of the most compute-intensive tasks in HPC, and predicting end-to-end training time for multi-billion parameter models across hundreds of GPUs is challenging due to complex interactions between transformer components, parallelism strategies, and multi-tier communication. This paper addresses this by decomposing LLMs into core computational primitives and modeling them with operator-level decomposition, lightweight hardware-aware prediction models for key operations, and an end-to-end prediction system integrating these across complex parallelization strategies. The resulting framework enables accurate distributed LLM training performance prediction without costly full-scale sampling.
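A generic example of per-operator modeling in this spirit is the classic roofline bound: an operator's runtime is limited by either compute or memory traffic. A sketch with hypothetical GPU figures (peak 300 TFLOP/s, 2 TB/s bandwidth), not the paper's fitted predictors:

```python
def op_time(flops, bytes_moved, peak_flops, mem_bw):
    """Roofline-style lower bound on one operator's runtime in seconds:
    the slower of the compute-limited and memory-limited estimates."""
    return max(flops / peak_flops, bytes_moved / mem_bw)

# A square FP16 GEMM from a transformer block: C = A @ B with n = 4096.
n = 4096
t = op_time(flops=2 * n**3,              # multiply-add count of the GEMM
            bytes_moved=3 * n * n * 2,   # read A and B, write C, 2 bytes each
            peak_flops=300e12,           # hypothetical peak throughput
            mem_bw=2e12)                 # hypothetical HBM bandwidth
# Here the compute term dominates, so this operator is compute-bound.
```

Summing such per-operator estimates across a model's primitives, plus communication terms, is the general shape of end-to-end prediction without full-scale runs.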
FairSense: Long-Term Fairness Analysis of ML-Enabled Systems
Algorithmic fairness of machine learning (ML) models has raised significant concern in recent years. Many testing, verification, and bias mitigation techniques have been proposed to identify and reduce fairness issues in ML models. The existing methods are model-centric and designed to detect fairness issues under static settings. However, many ML-enabled systems operate in a dynamic environment where the predictive decisions made by the system impact the environment, which in turn affects future decision-making. Such a self-reinforcing feedback loop can cause fairness violations in the long term, even if the immediate outcomes are fair. In this paper, we propose a simulation-based framework called FairSense to detect and analyze long-term unfairness in ML-enabled systems. Given a fairness requirement, FairSense performs Monte Carlo simulation to enumerate evolution traces for each system configuration. Then, FairSense performs sensitivity analysis on the space of possible configurations to understand the impact of design options and environmental factors on the long-term fairness of the system. We demonstrate FairSense's potential utility through three real-world case studies: loan lending, opioids risk scoring, and predictive policing.
Fix False Transparency by Noise Guided Splatting
Opaque objects reconstructed by 3D Gaussian Splatting (3DGS) often exhibit a falsely transparent surface, leading to inconsistent background and internal patterns under camera motion in interactive viewing. This issue stems from the ill-posed optimization in 3DGS. During training, background and foreground Gaussians are blended via α-compositing and optimized solely against the input RGB images using a photometric loss. As this process lacks an explicit constraint on surface opacity, the optimization may incorrectly assign transparency to opaque regions, resulting in view-inconsistent and falsely transparent output. This issue is difficult to detect in standard evaluation settings (i.e., rendering static images) but becomes particularly evident in object-centric reconstructions under interactive viewing. Although other causes of view-inconsistency (e.g., popping artifacts) have been explored recently, false transparency has not been explicitly identified. To the best of our knowledge, we are the first to quantify, characterize, and develop solutions for this "false transparency" artifact, an underreported artifact in 3DGS. Our strategy, Noise Guided Splatting (NGS), encourages surface Gaussians to adopt higher opacity by injecting opaque noise Gaussians in the object volume during training, requiring only minimal modifications to the existing splatting process. To quantitatively evaluate false transparency in static renderings, we propose a transmittance-based metric that measures the severity of this artifact. In addition, we introduce a customized, high-quality object-centric scan dataset exhibiting pronounced transparency issues, and we augment popular existing datasets (e.g., DTU) with complementary infill noise specifically designed to assess the robustness of 3D reconstruction methods to false transparency. Experiments across multiple datasets show that NGS substantially reduces false transparency while maintaining competitive performance on standard rendering metrics (e.g., PSNR), demonstrating its overall effectiveness.
Flexible Group Count Enables Hassle-Free Structured Pruning
Densely structured pruning methods maintain pruned models in a fully dense format, allowing immediate compression benefits, but existing grouped kernel pruning approaches introduce dynamic operations that add complications or impose limitations such as requiring expensive clustering schemes or custom architecture support. This paper argues that making Conv2d group count flexible under an integral optimization is the best practice for grouped kernel pruning, leveraging its ideal alignment with grouped convolution infrastructure. The resulting one-shot, post-train, data-agnostic method is more performant, adaptive, and user-friendly than its predecessors, requiring little to no hyperparameter tuning or handcrafted criteria.
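The appeal of adjusting the Conv2d group count is simple arithmetic: grouped convolution divides a layer's weight count by the number of groups while remaining a dense, natively supported operation. A sketch with hypothetical layer sizes:

```python
def conv2d_params(c_in, c_out, kh, kw, groups=1):
    """Weight count of a Conv2d layer. With groups > 1 the channels are
    split into independent convolutions, dividing parameters by groups."""
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * kh * kw

dense = conv2d_params(64, 128, 3, 3)             # standard dense convolution
pruned = conv2d_params(64, 128, 3, 3, groups=4)  # 4x fewer weights, still dense
```

Because the pruned layer is just a grouped convolution, the compression is realized immediately on stock frameworks with no custom kernels or sparse formats.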
Forte: Finding Outliers with Representation Typicality Estimation
Generative models can now produce photorealistic synthetic data virtually indistinguishable from real training data, challenging OOD detectors that rely on generative model likelihoods due to likelihood misestimation and typicality issues. This paper introduces Forte, which hypothesizes that estimating typical sets using self-supervised learners leads to better OOD detection, using representation learning and informative summary statistics based on manifold estimation to address these issues. Forte outperforms other unsupervised approaches and achieves state-of-the-art performance on established challenging benchmarks as well as new synthetic data detection tasks, requiring no class labels.
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
Large language models show remarkable promise for automated reasoning by generating formal specifications, but a fundamental tension exists between their probabilistic nature and the deterministic guarantees required by formal verification. This paper comprehensively investigates failure modes and uncertainty quantification in LLM-generated formal artifacts, revealing that SMT-based autoformalization has highly domain-specific accuracy impacts ranging from +34.8% on logical tasks to −44.5% on factual ones. A probabilistic context-free grammar (PCFG) framework is introduced to model LLM outputs and yield a refined uncertainty taxonomy, finding that uncertainty signals are task-dependent; for example, grammar entropy for logic achieves AUROC > 0.93.
HOPPS: Hardware-Aware Optimal Phase Polynomial Synthesis with Blockwise Optimization for Quantum Circuits
Blocks composed of CNOT and Rz gates are ubiquitous in modern quantum applications such as QAOA ansatzes and quantum adders, but after compilation they often exhibit large CNOT counts or depths that lower fidelity. This paper introduces HOPPS, a SAT-based hardware-aware optimal phase polynomial synthesis algorithm that generates CNOT/Rz blocks with CNOT count or depth optimality under hardware topology constraints. To address scalability for large circuits, an iterative blockwise optimization strategy partitions large circuits into smaller blocks and optimally refines each, achieving CNOT count reductions up to 50% and depth reductions up to 57.1% when used as a peephole optimizer.
Integrating self-configuring and foundational deep learning segmentation models for identifying the anal sphincter complex and perianal fistulae on pelvic MRI
This paper introduces an automated pelvic MRI segmentation pipeline that combines nnU-Net with MedSAM for identifying perianal fistulae and the anal sphincter complex. The approach leverages self-configuring and foundation-model segmentation components to improve robustness on a difficult clinical anatomy problem. It is designed to support interventional guidance and surgical planning in Crohn's disease. The work demonstrates how task-specific and foundation-model methods can be integrated for clinically useful MRI analysis.
K4: Online Log Anomaly Detection via Unsupervised Typicality Learning
Existing log anomaly detection methods are often slow, dependent on error-prone parsing, and use unrealistic evaluation protocols. This paper introduces K4 (Knowing the Unknown by Knowing only the Known), a fully unsupervised, parser-independent framework that transforms arbitrary log embeddings into compact four-dimensional descriptors—Precision, Recall, Density, Coverage—using efficient k-nearest neighbor statistics. Under a realistic online chunk-based evaluation protocol, K4 achieves state-of-the-art AUROC of 0.995–0.999 across HDFS, BGL, and Thunderbird datasets, with training under 4 seconds and inference as low as 4 μs.
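The descriptors are built from k-nearest-neighbour statistics over embedding vectors. A toy pure-Python sketch of two of them, Density and Coverage, in the spirit of the abstract (not the paper's exact definitions, and with 2-D points standing in for log embeddings):

```python
import math

def knn_radius(points, k):
    """Distance from each point to its k-th nearest neighbour (brute force)."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii

def density_coverage(train, test, k=2):
    """Density: how many training k-NN balls contain each test point,
    normalised by k. Coverage: fraction of training points whose k-NN
    ball contains at least one test point."""
    radii = knn_radius(train, k)
    density = [
        sum(math.dist(t, p) <= r for p, r in zip(train, radii)) / (k * len(train))
        for t in test
    ]
    coverage = sum(
        any(math.dist(p, t) <= r for t in test) for p, r in zip(train, radii)
    ) / len(train)
    return density, coverage

# Four normal log embeddings on a tiny square; one in-distribution query.
train = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
density, coverage = density_coverage(train, [(0.05, 0.05)], k=2)
```

An anomalous embedding far from the training manifold would receive near-zero density, which is the signal an unsupervised detector can threshold.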
LABELING COPILOT: A Deep Research Agent for Automated Data Curation in Computer Vision
Curating high-quality, domain-specific datasets is a major bottleneck for deploying robust vision systems. This paper introduces Labeling Copilot, the first data curation deep research agent for computer vision, powered by a large multimodal language model that uses multi-step reasoning to execute specialized tools across three core capabilities: Calibrated Discovery for sourcing in-distribution data from large repositories, Controllable Synthesis for generating rare-scenario data with robust filtering, and Consensus Annotation for producing accurate labels via a novel multi-model consensus mechanism. On the dense COCO dataset, the Consensus Annotation module achieves an annotation mAP of 37.1%, and on Open Images it discovers 903 new bounding box categories.
LoRATK: LoRA Once, Backdoor Everywhere in the Share-and-Play Ecosystem
Fine-tuning LLMs with LoRA has created a convenient share-and-play ecosystem where users download community-shared adapters to enhance base models, but this also introduces a new attack surface for distributing malicious LoRAs. This paper demonstrates that a backdoor LoRA can be trained once and then seamlessly merged in a training-free fashion with multiple task-enhancing LoRAs, retaining both malicious behavior and legitimate downstream capabilities. Such merged LoRAs are particularly dangerous because malicious intent is concealed behind improved downstream performance, creating strong incentive for voluntary adoption, and no safety measures exist to intervene during local deployment.
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. This paper hypothesizes that current reasoning limitations stem partly from insufficient long-context capacity, motivated by observations that higher context window lengths correlate with stronger reasoning performance and that failed reasoning cases resemble failed long-context cases. Controlled experiments comparing architecturally identical models with varying long-context capacities confirm that enhancing long-context ability before supervised fine-tuning leads to improved reasoning, advocating for long-context capacity as a first-class design objective.
MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations
Multi-hop knowledge editing in LLMs has been evaluated using benchmarks with unreliable protocols that conflate editing success with benchmark artifacts, producing misleading results. This paper presents MQuAKE-Remastered, which corrects systematic flaws in prior multi-hop knowledge editing assessments and demonstrates that reliable evaluation methodology is largely absent—and essential—for advancing the field. Accepted as a Spotlight at ICLR 2025, the work shows that many reported gains in multi-hop editing do not hold under rigorous evaluation, calling for a reset of evaluation standards.
Masked-speech recognition using human and synthetic cloned speech
2025This study evaluates the intelligibility and human-likeness of AI-generated voice clones compared to human speech. Using transformer-based language models, the research demonstrates that synthetic speech can achieve similar recognition scores and perceptual similarity to original human talkers, even in noisy environments. The findings suggest that voice synthesis and automatic speech recognition (ASR) are promising tools for evaluating speech recognition in both clinical audiology and hearing research.
Novel Adaptation of Video Segmentation to 3D MRI: Efficient Zero-Shot Knee Segmentation with SAM2
2025Medical image segmentation methods face the challenge of domain transfer, where performance degrades due to distribution shifts between source and target domains. This paper adapts SAM2, a general-purpose video segmentation model, for zero-shot single-prompt 3D knee MRI segmentation by treating volumetric slices as individual video frames and leveraging SAM2's memory mechanism to generate motion- and spatially-aware predictions across the volume. Experiments on the OAI-ZIB dataset demonstrate a Dice similarity coefficient of 0.9643 on the tibia using only a single prompt and no task-specific training or fine-tuning.
QuFlex: Parallel Quantum Job Scheduling Using Adaptive Circuit-Cutting
2025Parallel quantum job scheduling across multiple QPUs is critical for maximizing throughput in heterogeneous quantum computing environments. QuFlex introduces an adaptive circuit-cutting approach that dynamically partitions quantum circuits based on available QPU resources, enabling efficient parallel scheduling across heterogeneous quantum hardware. The framework demonstrates improved QPU utilization and reduced job completion times compared to static partitioning approaches.
Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting
2025Open-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. However, most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. Moreover, they are limited to static scenes and struggle with dynamic scenes due to the complexities of motion modeling. In this paper, we propose Segment-then-Splat, a 3D-aware open-vocabulary segmentation approach for both static and dynamic scenes based on Gaussian Splatting. Segment-then-Splat reverses the long-established approach of segmentation after reconstruction by dividing Gaussians into distinct object sets before reconstruction. Once the reconstruction is complete, the scene is naturally segmented into individual objects, achieving true 3D segmentation. This approach not only eliminates Gaussian-object misalignment issues in dynamic scenes but also accelerates the optimization process, as it eliminates the need for learning a separate language field. After optimization, a CLIP embedding is assigned to each object to enable open-vocabulary querying. Extensive experiments on various datasets demonstrate the effectiveness of our proposed method in both static and dynamic scenarios.
Spatial Intelligence in Vision-Language Models: A Comprehensive Survey
2025Vision-Language Models (VLMs) have achieved great success but still lack spatial intelligence, and this survey provides the first unified overview of recent advances, taxonomies, and evaluations toward building spatially intelligent AI.
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
2025Recent advances in post-training enhance model reasoning but require costly training pipelines and produce inefficient, overly lengthy outputs. This paper introduces Speculative Thinking, a training-free framework enabling large reasoning models to guide smaller ones during inference at the reasoning level—distinct from token-level speculative decoding—by identifying structural cues such as paragraph breaks followed by reflective phrases where small models struggle and delegating those steps to a larger model. The method significantly boosts smaller model reasoning accuracy while shortening output length, offering an efficient inference-time paradigm that preserves the small model's compute efficiency.
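The trigger described above can be pictured with a short sketch. The cue list and function name below are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch: locate the structural cues described above (a
# paragraph break followed by a reflective phrase) at which a stronger
# model could take over. Cue list and names are assumptions.
REFLECTIVE_CUES = ("wait", "hmm", "alternatively", "let me double-check")

def find_delegation_points(reasoning_text: str) -> list[int]:
    """Indices of paragraphs that open with a reflective phrase, i.e.
    candidate steps to hand off to the larger model."""
    paragraphs = reasoning_text.split("\n\n")
    return [i for i, p in enumerate(paragraphs)
            if p.strip().lower().startswith(REFLECTIVE_CUES)]
```

In a full system, each flagged paragraph would be regenerated by the large model before the small model resumes drafting.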
2024
Achieving Low Latency Inference on High Resolution Images by Exploiting Sparsity in Vision Transformers
2024This paper presents a tile-aware sparse attention scheduling framework for improving the efficiency of structured sparse vision transformers on GPUs. The method represents attention masks as adjacency matrices, applies structure-aware reordering to expose dense computation blocks, and uses offline profiling with Integer Linear Programming (ILP) to select optimal tile shapes under hardware constraints. Integrated into models such as Vision Longformer, RegionViT, and DynamicViT, the framework achieves up to 2.1× end-to-end latency speedup over fixed-tile FlashAttention. The results show that aligning sparse attention computation with both sparsity structure and GPU characteristics can substantially improve inference efficiency.
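The reordering step can be loosely illustrated as follows; this is a toy stand-in for the paper's structure-aware reordering, which is additionally coupled with ILP-based tile selection and hardware profiling:

```python
import numpy as np

# Toy sketch: sort rows of a 0/1 attention mask by their sparsity pattern
# and apply the same permutation to columns, so identical patterns cluster
# into dense contiguous tiles that map well onto GPU blocks.
def cluster_reorder(mask: np.ndarray):
    """Symmetrically permute a binary attention mask to group identical rows."""
    keys = [tuple(row) for row in mask]
    order = sorted(range(len(keys)), key=lambda i: keys[i], reverse=True)
    return mask[np.ix_(order, order)], order
```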
An Automated Approach for Improving the Inference Latency and Energy Efficiency of Pretrained CNNs by Removing Irrelevant Pixels with Focused Convolutions
2024Computer vision CNNs achieve high accuracy but face ever-increasing energy and computation requirements, and making them more energy-efficient typically requires costly retraining. This paper proposes an automated method to improve the inference latency and energy efficiency of pretrained CNNs without retraining, by inserting a threshold layer that identifies irrelevant image regions and replacing subsequent convolutional layers with focused convolutions that ignore those regions entirely. The approach saves inference latency by up to 25% and energy costs by up to 22% on popular pretrained CNNs including ResNet, VGG, and ConvNeXt, with little to no accuracy loss.
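The focusing mechanism can be sketched in a few lines (an illustrative reconstruction, not the paper's code): a threshold marks irrelevant pixels, and downstream convolutions are restricted to the bounding box of what remains:

```python
import numpy as np

# Illustrative sketch of the focusing step: threshold the activations to
# find the relevant region; later convolutions then run only inside this
# bounding box (or are skipped entirely when nothing survives).
def focus_bbox(feature_map: np.ndarray, threshold: float):
    """Bounding box (r0, r1, c0, c1) of pixels whose magnitude exceeds
    the threshold, or None if the whole map is irrelevant."""
    rows, cols = np.where(np.abs(feature_map) > threshold)
    if rows.size == 0:
        return None
    return rows.min(), rows.max() + 1, cols.min(), cols.max() + 1
```

Cropping each subsequent layer's input to this box is where the latency and energy savings would come from.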
Are Prompt Engineering and TODO Comments Friends or Foes? An Evaluation on GitHub Copilot
2024Code intelligence tools such as GitHub Copilot have begun to bridge the gap between natural language and programming language. A frequent software development task is the management of technical debt: suboptimal solutions or unaddressed issues that hinder future software development. Developers have been found to “self-admit” technical debt (SATD) in software artifacts such as source code comments. Thus, is it possible that the information present in these comments can enhance code-generative prompts to repay the described SATD? Or does the inclusion of such comments instead cause code-generative tools to reproduce the harmful symptoms of the described technical debt? Does modifying the SATD change this reaction? Despite the heavy maintenance costs caused by technical debt and the recent improvements of code intelligence tools, no prior work has sought to incorporate SATD into prompt engineering. Inspired by this, this paper contributes and analyzes a dataset consisting of 36,381 TODO comments in the latest available revisions of their respective 102,424 repositories, from which we sample and manually generate 1,140 code bodies using GitHub Copilot. Our experiments show that GitHub Copilot can generate code with the symptoms of SATD, both prompted and unprompted. Moreover, we demonstrate the tool's ability to automatically repay SATD under different circumstances and qualitatively investigate the characteristics of successful and unsuccessful comments. Finally, we discuss gaps in which GitHub Copilot's successors and future researchers can improve upon code intelligence tasks to facilitate AI-assisted software maintenance.
Creating Intelligent Cyberinfrastructure for Democratizing AI
2024This paper provides an overview of the NSF-funded ICICLE AI Institute, which aims to fundamentally advance 'edge-to-center' AI-as-a-Service. By developing intelligent cyberinfrastructure (CI) that spans the edge-cloud-HPC computing continuum, the project seeks to enable plug-and-play AI that is accessible to a wider population. The work highlights high-impact applications in animal ecology, digital agriculture, and smart foodsheds as primary drivers for democratizing next-generation AI.
Deep Learning Based Risk Stratification of Pre-operative CT Scans is Prognostic of Overall Survival in Kidney Cancers
2024This abstract reports a deep learning model that uses pre-operative CT scans to predict overall survival in kidney cancer. The model improves pre-operative risk assessment and offers prognostic value beyond standard clinicopathological factors. It represents an early step toward multi-institutional imaging biomarkers for survival-based treatment planning. The work supports broader use of CT-derived representations for oncologic prognostication.
Efficient Circuit Wire Cutting Based on Commuting Groups
2024Current quantum devices face challenges with large circuits due to increasing error rates as circuit size and qubit count grow. Inspired by ancilla-assisted quantum process tomography and MUBs-based grouping for simultaneous measurement, this paper proposes a new circuit wire cutting approach that uses ancillary qubits to transform quantum input initializations into quantum output measurements, allowing multiple measurements to be grouped and executed simultaneously. The technique significantly reduces subcircuit execution overhead and classical reconstruction complexity compared to standard wire cutting.
Exploring Algorithmic Design Choices for Low Latency CNN Deployment
2024This paper investigates algorithmic design choices for reducing latency in CNN deployment across diverse hardware platforms. Five convolution algorithms are implemented using SYCL and integrated into VGG16, ResNet101, and InceptionV4 by replacing the standard PyTorch Conv2d operator. Their performance is evaluated at both the layer and model level on GPUs against PyTorch and Intel PyTorch Extension baselines. Results show significant execution-time improvements, demonstrating the effectiveness of algorithm-level optimization for low-latency CNN inference.
Federated Image Quality Assessment of Prostate MRI Scans in a Multi-institutional Setting
2024This work addresses image-quality variability in prostate MRI across multiple institutions using a federated analysis setting. It studies how artifact-related quality differences can affect the reliability and portability of downstream machine learning models. The abstract highlights the importance of multi-institutional quality assessment before model development and deployment. It contributes to more reliable imaging AI in federated and heterogeneous clinical environments.
GNNs Also Deserve Editing, and They Need It More Than Once
2024Model editing—updating specific factual knowledge—has been extensively studied for LLMs but has received little attention for graph neural networks, which present unique challenges due to their relational structure. This paper extends model editing to GNNs, showing that they require iterative multi-round editing to maintain accuracy after knowledge updates, unlike LLMs where single-pass editing is often sufficient. The work proposes efficient multi-round GNN editing methods and demonstrates that both graph structure and node attributes must be carefully managed across editing rounds to prevent knowledge degradation.
Image Color Recognition and Recommendation Method and Device
2024A patent describing a method and device for image color recognition and recommendation, applying computer vision techniques for automated color analysis.
Image Processing Method and Apparatus
2024A patent describing an image processing method and apparatus for automated visual data analysis and transformation.
Intra- and Peri-tumoral Radiomic Features are Predictive of Pathologic Response to Multiple Neoadjuvant Therapy Regimen in Rectal Cancers via Pre-treatment MRI
2024This study analyzes intra-tumoral and peri-tumoral radiomic features from pretreatment MRI to predict pathologic response in rectal cancer. It evaluates whether quantitative imaging phenotypes can identify responders across multiple neoadjuvant treatment regimens. The work aims to improve patient stratification beyond traditional staging and biomarker approaches. It contributes to noninvasive response prediction in rectal cancer management.
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches
2024Long-context capability is critical for LLMs, but transformer architectures face significant challenges due to growing KV cache size and the complexity of attending to extended inputs. This paper provides a comprehensive taxonomy and benchmark evaluation of 10+ state-of-the-art approaches across seven long-context task categories—including KV cache quantization, token dropping, prompt compression, linear-time sequence models, and hybrid architectures—evaluated in a unified, aligned environment. The work reveals numerous previously unknown phenomena and offers a practical workbench and insights for the future development of long-context-capable LLMs.
Knowledge Graphs Can be Learned with Just Intersection Features
2024Knowledge graph completion can be framed as link prediction where structural information is key, but quantifying this structural information poses a challenge. This paper demonstrates that the intersection among k-hop neighborhoods of the head, relation, and tail is the critical structural signal for valid triple prediction, and proposes a novel randomized algorithm to efficiently generate these intersection features. A straightforward fully-connected network leveraging these features outperforms established KG embedding models and graph neural network baselines, while also achieving substantial training time efficiency gains.
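The structural signal can be illustrated with a minimal sketch using plain breadth-first expansion; the paper's randomized algorithm computes these features far more efficiently, and the names here are illustrative:

```python
# Minimal sketch of the intersection feature: the overlap between the k-hop
# neighborhoods of a candidate triple's head and tail entities. The graph
# is represented as a plain adjacency dict.
def k_hop_neighborhood(adj: dict, start, k: int) -> set:
    """All nodes reachable from `start` within k hops."""
    frontier, seen = {start}, {start}
    for _ in range(k):
        frontier = {v for u in frontier for v in adj.get(u, ())} - seen
        seen |= frontier
    return seen

def intersection_feature(adj: dict, head, tail, k: int = 2) -> int:
    """Size of the k-hop neighborhood overlap between head and tail."""
    return len(k_hop_neighborhood(adj, head, k)
               & k_hop_neighborhood(adj, tail, k))
```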
Materials Data Science Using CRADLE: A Distributed, Data-Centric Approach
2024This paper introduces CRADLE, a distributed framework designed to support data-centric AI and materials data science at scale. By integrating heterogeneous data management with elastic scaling, CRADLE addresses the challenges of massive datasets generated by modern experiments and simulations. The study demonstrates the framework's capabilities through five applications, including phase identification in X-ray diffraction and defect segmentation in computed tomography, emphasizing scalable and reproducible scientific insights.
Optimizing Deployment of Unstructured Group Convolutions for Low Latency Inference
2024This paper presents an optimization framework for deploying unstructured group convolutions efficiently on GPUs. The method combines Knapsack-based partitioning, Integer Linear Programming (ILP), and matrix reordering strategies to improve load balancing and data reuse for irregular input-output channel connections. Evaluated on ShuffleNet and CondenseNet, the framework achieves up to 1.9× speedup over PyTorch, while reordering-enhanced ILP provides an additional 1.3× improvement. The results highlight the importance of hardware-aware scheduling for accelerating irregular CNN workloads.
Phase Identification in Synchrotron X-ray Diffraction Patterns of Ti-6Al-4V Using Computer Vision and Deep Learning
2024This research utilizes convolutional neural networks (CNNs) to automate the phase identification of titanium alloys from synchrotron X-ray diffraction (XRD) patterns. By treating XRD patterns as one-dimensional images, the deep learning model achieves high accuracy in distinguishing between alpha and beta phases, significantly reducing the time required for manual analysis in materials characterization.
Privacy-Preserving Collaborative Genomic Research: A Real-Life Deployment and Vision
2024The genomic domain stands to benefit greatly from advances in AI and data science, but increasing privacy and cybersecurity concerns necessitate robust solutions for sensitive collaborative research. This paper presents a practical deployment of a privacy-preserving framework for genomic research developed in collaboration with Lynx.MD, a secure health data collaboration platform, addressing challenges of enabling joint analysis of genomic data while mitigating data breach risks. The framework demonstrates scalable, privacy-preserving data sharing and analysis that maintains utility while satisfying rigorous security requirements in a real production environment.
QGroup: Parallel Quantum Job Scheduling Using Dynamic Programming
2024Scheduling quantum circuits across multiple QPUs requires efficient algorithms that minimize idle time while respecting hardware constraints. QGroup uses dynamic programming to optimally group and schedule quantum circuits across multiple QPUs, maximizing throughput and minimizing idle time through principled combinatorial optimization. Evaluated on realistic quantum workloads, QGroup achieves improved scheduling efficiency compared to greedy and heuristic-based baseline approaches.
Radiomics to Detect Inflammation and Fibrosis on Magnetic Resonance Enterography in Stricturing Crohn’s Disease
2024This study develops radiomics-based machine learning models to characterize inflammation and fibrosis in Crohn's disease strictures from magnetic resonance enterography. The models improve diagnostic discrimination relative to radiologist visual scoring alone and show additional value when combined with expert assessment. The work addresses an important unmet need in noninvasive characterization of stricturing disease. It supports more quantitative and reproducible imaging-based assessment in inflammatory bowel disease.
Recommendation Method, Device, and Electronic Apparatus Based on Multimodal Features
2024A patent describing a recommendation method and device based on multimodal features, utilizing AI techniques for enhanced recommendation systems in commercial applications.
Spatial attention wavelon network (SpAWN) for survival-based risk stratification in kidney cancers via CT
2024SpAWN introduces a survival-risk stratification model for kidney cancer CT that combines spatial attention with wavelon activations. The design aims to improve interpretability and cross-cohort generalization for imaging-based survival prediction. The paper demonstrates that architectural choices tailored to spatial context can strengthen risk modeling from pre-operative scans. It contributes to clinically relevant prognostic modeling in oncologic imaging.
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
2024Releasing LLM weights poses a dilemma: open-sourcing compromises ownership while closed APIs raise data privacy concerns. This paper introduces TaylorMLP, which protects LLM ownership by transforming weights into Taylor-series parameters that can be released instead of original weights, and prevents unauthorized use by inducing low-speed token generation through increasing the number of Taylor-series terms. Empirical experiments across five datasets and three LLM architectures demonstrate TaylorMLP induces over 4× latency increase while producing tokens precisely matched with original models, effectively defending against weight reconstruction from downstream datasets.
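The throttling mechanism can be pictured with a toy scalar example (not TaylorMLP's actual weight transformation): a Taylor expansion reproduces a function faithfully, but each extra term adds compute, so the term count becomes a speed knob:

```python
import math

# Toy analogue of the latency knob: approximating e^x with its Taylor
# series. Few terms are fast but inaccurate; many terms match the true
# function at proportionally higher cost.
def taylor_exp(x: float, n_terms: int) -> float:
    """Sum of the first n_terms of the Taylor series of e^x around 0."""
    return sum(x ** k / math.factorial(k) for k in range(n_terms))
```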
Unsupervised Segmentation of Knee Bone Marrow Edema-like Lesions Using Conditional Generative Models
2024This study proposes a novel unsupervised method for the fully automated segmentation of Bone Marrow Edema-like Lesions (BMEL) in knee MRI. By leveraging conditional diffusion models and anomaly detection, the approach eliminates the need for labor-intensive and bias-prone manual annotations. The research sets new benchmarks for BMEL segmentation performance and provides a more reliable, quantitative tool for early diagnosis and prognosis of knee osteoarthritis.
View-Consistent Object Removal in Radiance Fields
2024Radiance Fields (RFs) have emerged as a crucial technology for 3D scene representation, enabling the synthesis of novel views with remarkable realism. However, as RFs become more widely used, the need for effective editing techniques that maintain coherence across different perspectives becomes evident. Current methods primarily depend on per-frame 2D image inpainting, which often fails to maintain consistency across views, thus compromising the realism of edited RF scenes. In this work, we introduce a novel RF editing pipeline that significantly enhances consistency by requiring the inpainting of only a single reference image. This image is then projected across multiple views using a depth-based approach, effectively reducing the inconsistencies observed with per-frame inpainting. However, projections typically assume photometric consistency across views, which is often impractical in real-world settings. To accommodate realistic variations in lighting and viewpoint, our pipeline adjusts the appearance of the projected views by generating multiple directional variants of the inpainted image, thereby adapting to different photometric conditions. Additionally, we present an effective and robust multi-view object segmentation approach as a valuable byproduct of our pipeline. Extensive experiments demonstrate that our method significantly surpasses existing frameworks in maintaining content consistency across views and enhancing visual quality.
Visual Concept Networks: A Graph-Based Approach to Detecting Anomalous Data in Deep Neural Networks
2024Deep neural networks struggle with robustness against anomalous and out-of-distribution data, and current OOD benchmarks often oversimplify by focusing on single-object tasks. This paper introduces Visual Concept Networks, a graph-based method that converts images into networks of interconnected human-understandable visual concepts and uses topological features to detect both far-OOD and near-OOD data. Extensive testing on two novel complex real-world tasks with ablation studies using large vocabularies demonstrates the method's effectiveness for detecting anomalous data in DNNs.
2023
Accelerating Time to Science using CRADLE: A Framework for Materials Data Science
2023Accelerating materials data science requires scalable frameworks that can manage heterogeneous data and computation across distributed systems. This paper presents CRADLE, a distributed data-centric framework for materials data science workflows that integrates data management, computation, and analysis pipelines to significantly reduce time-to-science. Presented at the 30th IEEE HiPC, CRADLE shows substantial throughput improvements and workflow simplification for materials characterization and discovery tasks in HPC environments.
Accelerating VQE Algorithms via Parameters and Measurement Reuse
2023Variational Quantum Eigensolver algorithms require many quantum circuit executions to converge, creating significant overhead on current quantum hardware. This paper accelerates VQE by reusing parameters and measurement results across iterations, reducing the number of quantum circuit executions required for convergence without sacrificing solution quality. The approach is validated on standard molecular simulation benchmarks, demonstrating meaningful reduction in quantum resource requirements.
Fairify: Fairness Verification of Neural Networks
2023Fairness of machine learning (ML) software has become a major concern in the recent past. Although recent research on testing and improving fairness has demonstrated impact on real-world software, providing fairness guarantees in practice is still lacking. Certification of ML models is challenging because of their complex decision-making process. In this paper, we propose Fairify, an SMT-based approach to verify the individual fairness property of neural network (NN) models. Individual fairness ensures that any two similar individuals get similar treatment irrespective of their protected attributes, e.g., race, sex, age. Verifying this fairness property is hard because of the global checking and non-linear computation nodes in NNs. We propose a sound approach to make individual fairness verification tractable for developers. The key idea is that many neurons in the NN always remain inactive when a smaller part of the input domain is considered. Fairify therefore leverages white-box access to the models in production and applies formal-analysis-based pruning. Our approach adopts input partitioning and then prunes the NN for each partition to provide a fairness certification or a counterexample. We leverage interval arithmetic and activation heuristics of the neurons to perform the pruning as necessary. We evaluated Fairify on 25 real-world neural networks collected from four different sources, and demonstrated its effectiveness, scalability, and performance over baseline and closely related work. Fairify is also configurable based on the domain and size of the NN. Our novel formulation of the problem can answer targeted verification queries with relaxations and counterexamples, which have practical implications.
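The core pruning trick, proving neurons inactive on an input partition, can be sketched with interval arithmetic; this is an illustrative reconstruction whose names and signatures are assumptions, not Fairify's API:

```python
import numpy as np

# Interval-arithmetic sketch: propagate an input box [lo, hi] through a
# linear layer; any ReLU unit whose upper bound never exceeds zero is
# provably inactive on that partition and can be pruned before verification.
def interval_linear(W, b, lo, hi):
    """Element-wise bounds of W @ x + b for all x with lo <= x <= hi."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return out_lo, out_hi

def inactive_relu_units(W, b, lo, hi):
    """Indices of neurons guaranteed inactive over the input partition."""
    _, out_hi = interval_linear(W, b, lo, hi)
    return np.where(out_hi <= 0)[0]
```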
Fix Fairness, Don't Ruin Accuracy: Performance Aware Fairness Repair using AutoML
2023Machine learning (ML) is increasingly being used in critical decision-making software, but incidents have raised questions about the fairness of ML predictions. To address this issue, new tools and methods are needed to mitigate bias in ML-based software. Previous studies have proposed bias mitigation algorithms that only work in specific situations and often result in a loss of accuracy. Our proposed solution is a novel approach that utilizes automated machine learning (AutoML) techniques to mitigate bias. Our approach includes two key innovations: a novel optimization function and a fairness-aware search space. By improving the default optimization function of AutoML and incorporating fairness objectives, we are able to mitigate bias with little to no loss of accuracy. Additionally, we propose a fairness-aware search space pruning method for AutoML to reduce computational cost and repair time. Our approach, built on the state-of-the-art Auto-Sklearn tool, is designed to reduce bias in real-world scenarios. To demonstrate its effectiveness, we evaluated our approach on four fairness problems and 16 different ML models, and our results show a significant improvement over the baseline and existing bias mitigation techniques. Our approach, Fair-AutoML, successfully repaired 60 out of 64 buggy cases, while existing bias mitigation techniques repaired only up to 44 out of 64 cases.
One Less Reason for Filter Pruning: Gaining Free Adversarial Robustness with Structured Grouped Kernel Pruning
2023Structured pruning has traditionally been viewed as trading accuracy for efficiency, often assumed to come at the expense of adversarial robustness. This paper reveals that structured grouped kernel pruning inherently confers adversarial robustness as a byproduct—without any adversarial training—showing that pruning and robustness are not competing objectives but complementary ones. By demonstrating one less reason to avoid filter pruning, the work shows practitioners can gain free adversarial robustness simply by adopting structured grouped kernel pruning as their compression strategy.
Online Detection of Golden Circuit Cutting Points
2023Quantum circuit cutting enables large circuits to run on small quantum devices, but reconstructing measurement statistics requires computational resources that grow exponentially with the number of cuts. This paper introduces the concept of golden cutting points: circuit structures that induce negligible basis components during reconstruction, allowing those downstream computations to be avoided entirely. A hypothesis-testing scheme is proposed for online detection of golden cutting points, with robustness guarantees against low-probability test failures, and its applicability is demonstrated on Qiskit's Aer simulator, achieving reduced wall time by identifying and avoiding obsolete measurements.
Towards Safe ML-Based Systems in Presence of Feedback Loops
2023Machine learning (ML) based software is increasingly being deployed in a myriad of socio-technical systems, such as drug monitoring, loan lending, and predictive policing. Although not commonly considered safety-critical, these systems have a potential to cause serious, long-lasting harm to users and the environment due to their close proximity and effect on the society. One type of emerging problem in these systems is unintended side effects from a feedback loop; the decision of an ML-based system induces certain changes in the environment, which, in turn, generates observations that are fed back into the system for further decision-making. When this cyclic interaction between the system and the environment repeats over time, its effect may be amplified and ultimately result in an undesirable outcome. In this position paper, we bring attention to the safety risks that are introduced by feedback loops in ML-based systems, and the challenges of identifying and addressing them. In particular, due to their gradual and long-term impact, we argue that feedback loops are difficult to detect and diagnose using existing techniques in software engineering. We propose a set of research problems in modeling, analyzing, and testing ML-based systems to identify, monitor, and mitigate the effects of an undesirable feedback loop.
Towards Understanding Fairness and its Composition in Ensemble Machine Learning
2023Machine Learning (ML) software has been widely adopted in modern society, with reported fairness implications for minority groups based on race, sex, age, etc. Many recent works have proposed methods to measure and mitigate algorithmic bias in ML models. The existing approaches focus on single classifier-based ML models. However, real-world ML models are often composed of multiple independent or dependent learners in an ensemble (e.g., Random Forest), where the fairness composes in a non-trivial way. How does fairness compose in ensembles? What are the fairness impacts of the learners on the ultimate fairness of the ensemble? Can fair learners result in an unfair ensemble? Furthermore, studies have shown that hyperparameters influence the fairness of ML models. Ensemble hyperparameters are more complex since they affect how learners are combined in different categories of ensembles. Understanding the impact of ensemble hyperparameters on fairness will help programmers design fair ensembles. Today, we do not understand these fully for different ensemble algorithms. In this paper, we comprehensively study popular real-world ensembles: bagging, boosting, stacking and voting. We have developed a benchmark of 168 ensemble models collected from Kaggle on four popular fairness datasets. We use existing fairness metrics to understand the composition of fairness. Our results show that ensembles can be designed to be fairer without using mitigation techniques. We also identify the interplay between fairness composition and data characteristics to guide fair ensemble design. Finally, our benchmark can be leveraged for further research on fair ensembles. To the best of our knowledge, this is one of the first and largest studies on fairness composition in ensembles yet presented in the literature.
Transforming temporal-dynamic graphs into time-series data for solving event detection problems
2023This paper proposes a workflow for detecting important events in temporal-dynamic graphs by transforming graph snapshots into multivariate time-series data. The method first generates graph-level embeddings for each time step using temporal graph representation learning, and then applies unsupervised time-series anomaly detection models to identify abnormal events. The approach was evaluated on multiple real-world social media datasets and showed competitive or improved performance compared to prior event detection methods. The work demonstrates that graph embeddings can serve as an effective bridge between dynamic graph analysis and time-series anomaly detection.
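A minimal sketch of the two stages, with hand-crafted features standing in for learned graph embeddings and a z-score rule standing in for a dedicated anomaly detector:

```python
import statistics

# Sketch of the workflow: each graph snapshot becomes a feature vector, and
# the resulting series is scanned for abnormal time steps. The features and
# detector here are deliberately simple stand-ins.
def snapshot_features(edges) -> tuple:
    """Summarize one snapshot (an edge list) as (node count, edge count)."""
    nodes = {v for e in edges for v in e}
    return (len(nodes), len(edges))

def zscore_anomalies(series, threshold=2.0):
    """Indices where a univariate series deviates > threshold std devs."""
    mu, sd = statistics.mean(series), statistics.pstdev(series)
    if sd == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) / sd > threshold]
```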
Virtual Reality as an Acute Pain Reliever During Laceration Repair in Emergency Departments: A Randomized Controlled Trial
2023This randomized controlled trial investigates whether virtual reality can reduce acute pain during laceration repair in emergency departments. Adult patients undergoing repair were studied in a real clinical setting to assess pain relief during the procedure. The work explores immersive VR as a practical non-pharmacologic intervention for emergency care. It adds clinical evidence on the use of interactive technology for procedural pain management.
Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model
2023Fine-tuning large pre-trained language models has become increasingly difficult due to extensive memory usage, with the primary bottleneck being the storage of activation feature maps needed for gradient computation. This paper proposes WTA-CRS (Winner-Take-All Column-Row Sampling), a new family of unbiased estimators for matrix products with reduced variance that only requires storing sub-sampled activations for gradient calculation, applied during the backward pass to maintain unbiased gradient estimation. Applied to LLM fine-tuning, WTA-CRS significantly reduces activation memory requirements while maintaining training convergence, enabling adaptation of large models on hardware that would otherwise lack sufficient memory.
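The core primitive, an unbiased sampled estimator of a matrix product, can be illustrated with classic norm-proportional column-row sampling; this sketch shows only that baseline idea, not the winner-take-all refinement WTA-CRS builds on top of it.

```python
import numpy as np

def cr_sample_matmul(A, B, k, rng):
    # Unbiased column-row estimator of A @ B: sample k column-row pairs i
    # with probability p_i proportional to ||A[:, i]|| * ||B[i, :]||, then
    # average the importance-weighted outer products A[:, i] B[i, :] / p_i.
    norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = norms / norms.sum()
    idx = rng.choice(A.shape[1], size=k, p=p)
    est = np.zeros((A.shape[0], B.shape[1]))
    for i in idx:
        est += np.outer(A[:, i], B[i, :]) / p[i]
    return est / k
```

Only the sampled columns and rows need to be kept, which is the kind of memory saving the paper exploits when storing sub-sampled activations for the backward pass.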
2022
23 Shades of Self-Admitted Technical Debt: An Empirical Study on Machine Learning Software
2022In software development, the term "technical debt" (TD) is used to characterize short-term solutions and workarounds implemented in source code that may incur a long-term cost. Technical debt takes a variety of forms and can affect multiple qualities of software, including but not limited to its legibility, performance, and structure. In this paper, we conduct a comprehensive study of technical debt in machine learning (ML) based software. Technical debt can appear differently in ML software by infecting the data that ML models are trained on, thus affecting the functional performance of ML systems. The growing inclusion of ML components in modern software systems is introducing a new set of TDs. Does ML software have TDs similar to traditional software? If not, what are the new types of machine-learning-specific technical debt? In which ML pipeline stages do those debts appear? Do these debts differ between ML tools and applications, and when are they removed? Currently, we do not know the state of ML TDs in the wild. To address these questions, we mined 68,821 self-admitted technical debts (SATDs), along with their introduction and removal, from all the revisions of a curated dataset of 2,686 mature ML repositories from GitHub. By applying an open-coding scheme and building on prior work, we provide a comprehensive taxonomy of ML SATDs. Our study analyzes how ML SATD types are organized, their frequencies within stages of ML software, the differences between ML SATDs in applications and tools, and the effort of ML SATD removal. The findings suggest implications for ML developers and researchers who aim to create maintainable ML systems.
A Unified Framework to Assess Market Implications of Institutional Investments
2022Understanding the market implications of large institutional investment decisions requires modeling complex interactions between institutional behavior and market dynamics. This paper presents a unified framework using machine learning and statistical analysis to assess how large-scale institutional investment decisions affect market prices, volatility, and liquidity. The approach integrates multiple data sources to provide a comprehensive assessment of market implications across different investment types and market conditions.
Approximate Quantum Circuit Reconstruction
2022Current and imminent quantum hardware lacks reliability due to noise and limited qubit counts, and quantum circuit cutting—which divides large circuits into smaller subcircuits—faces exponential classical post-processing overhead. This paper introduces approximate circuit reconstruction using a sampling-based method (MCMC) to probabilistically select high-probability bit strings during reconstruction, avoiding excessive calculations for the full probability distribution. Results show that this sampling-based post-processing holds great potential for fast and reliable circuit reconstruction in the NISQ era and beyond.
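The sampling idea, visiting high-probability bit strings with MCMC instead of enumerating the full reconstructed distribution, can be sketched with a generic Metropolis sampler over bit strings. The log-probability function used in the usage example is a made-up stand-in for a reconstructed subcircuit distribution, not anything from the paper.

```python
import numpy as np

def mcmc_bitstrings(logp, n_bits, steps, rng):
    # Metropolis sampling over bit strings: propose a single-bit flip,
    # accept with probability min(1, p(proposal) / p(current)).
    x = rng.integers(0, 2, n_bits)
    lp = logp(x)
    samples = []
    for _ in range(steps):
        y = x.copy()
        y[rng.integers(n_bits)] ^= 1  # flip one random bit
        lq = logp(y)
        if np.log(rng.random()) < lq - lp:
            x, lp = y, lq
        samples.append(tuple(x))
    return samples
```

Because the chain concentrates where the probability mass lies, the expensive reconstruction arithmetic is spent only on the bit strings that matter, rather than on the exponentially large full distribution.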
Business Classification Method and Device Based on Machine Learning
2022A patent describing a business classification method and device based on machine learning, applying ML algorithms for automated business categorization and analysis.
Image Generation Method and Apparatus Based on Artificial Intelligence
2022A patent describing a method and apparatus for AI-based image generation, leveraging artificial intelligence techniques for automated visual content creation.
Irrelevant Pixels are Everywhere: Find and Exclude Them for More Efficient Computer Vision
2022CNNs are compute-intensive because they indiscriminately compute features on all pixels of an input image, yet many pixels are irrelevant to the vision task at hand. This paper demonstrates through analysis of three popular computer vision datasets that approximately 48% of pixels are irrelevant, and proposes the focused convolution—a drop-in CNN replacement that operates only on relevant pixels identified by an area of interest mask. On an embedded device, the approach achieves no loss in accuracy while reducing inference latency, energy consumption, and multiply-add count by approximately 45%.
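A minimal sketch of the idea, restricting convolution to the region an area-of-interest mask can influence and leaving every other output zero, is below. The naive `conv2d_valid` helper and the bounding-box policy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def conv2d_valid(x, k):
    # naive 2-D valid cross-correlation, for illustration only
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def focused_conv2d(x, k, mask):
    # compute outputs only where the receptive field can touch the
    # area-of-interest mask; every other output position stays zero
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    if not mask.any():
        return out
    ys, xs = np.nonzero(mask)
    i0, i1 = max(0, ys.min() - kh + 1), min(H - kh, ys.max())
    j0, j1 = max(0, xs.min() - kw + 1), min(W - kw, xs.max())
    out[i0:i1 + 1, j0:j1 + 1] = conv2d_valid(x[i0:i1 + kh, j0:j1 + kw], k)
    return out
```

Inside the mask's bounding box the result matches a full convolution exactly, while the multiply-adds for irrelevant pixels are skipped entirely, which is where the latency and energy savings come from.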
MARS: Malleable Actor-Critic Reinforcement Learning Scheduler
2022Scheduling jobs in HPC environments requires handling dynamic, time-varying workloads that challenge static scheduling policies. MARS (Malleable Actor-Critic Reinforcement Learning Scheduler) uses malleable actor-critic RL to adaptively schedule computing jobs, dynamically resizing allocations in response to changing workloads to optimize throughput and resource utilization. Evaluated against standard scheduling baselines, MARS demonstrates consistent improvements in job completion time and cluster utilization across varied workload scenarios.
Method and Device for Determining and Evaluating Business Data Categories
2022A patent describing a method and device for determining and evaluating business data categories, applying machine learning for automated business intelligence and data classification.
Pinpointing the System Reliability Degradation in NISQ Machines
2022Noise in quantum hardware causes significant reliability degradation in NISQ machines, but the systematic patterns of this degradation are not well understood. This paper investigates the sources and temporal patterns of reliability degradation in NISQ machines, identifying when and where noise causes significant performance drops in quantum circuits. The analysis provides guidance for developing error mitigation strategies targeted at the most impactful reliability degradation patterns in near-term quantum hardware.
Practical Implications of Dequantization on Machine Learning Algorithms
2022Quantum computing algorithms offer theoretical speedups for certain machine learning tasks, but dequantization results show that classical algorithms can sometimes achieve comparable performance. This paper examines the practical implications of dequantization on machine learning algorithms, providing a systematic analysis of when quantum approaches offer genuine advantages versus when classical alternatives are sufficient. The work offers guidance for practitioners on determining which ML tasks are promising candidates for quantum speedup versus those where dequantization renders quantum approaches redundant.
Quantum Noise in the Flow of Time: A Temporal Study of the Noise in Quantum Computers
2022Quantum noise in quantum computers is not static but evolves over time, yet most error characterization treats noise as temporally fixed. This paper conducts a temporal study of noise characteristics in quantum computers, revealing how quantum noise patterns change over time and analyzing the implications for circuit fidelity and error mitigation strategies. The findings provide insights for developing more effective time-aware calibration and error mitigation approaches for near-term quantum hardware.
System-Auditing, Data Analysis and Characteristics of Cyber Attacks for Big Data System
2022Big data distributed computing systems such as Apache Hadoop process massive amounts of data to support business and research applications, so it is critical to ensure the cyber security of such systems. To better defend against advanced cyber attacks that pose threats to even well-protected enterprises, system-auditing-based techniques have been adopted for monitoring system activities and assisting attack investigation. In this demo, we build a system that collects system auditing logs from a big data system and performs data analysis to understand how system auditing can be used more effectively to assist attack investigation on big data systems. We also built a demo application that detects unexpected file deletions and presents the root causes of each deletion.
The Art and Practice of Data Science Pipelines: A Comprehensive Study of Data Science Pipelines In Theory, In-The-Small, and In-The-Large
2022An increasingly large number of software systems today include data science components for descriptive, predictive, and prescriptive analytics. The collection of data science stages, from acquisition, to cleaning/curation, to modeling, and so on, is referred to as a data science pipeline. To facilitate research and practice on data science pipelines, it is essential to understand their nature. What are the typical stages of a data science pipeline? How are they connected? Do pipelines differ between their theoretical representations and those found in practice? Today we do not fully understand these architectural characteristics of data science pipelines. In this work, we present a three-pronged comprehensive study covering the state-of-the-art, data science in-the-small, and data science in-the-large. Our study analyzes three datasets: a collection of 71 proposals for data science pipelines and related concepts in theory; a collection of over 105 implementations of curated data science pipelines from Kaggle competitions, to understand data science in-the-small; and a collection of 21 mature data science projects from GitHub, to understand data science in-the-large. Our study has led to three representations of data science pipelines that capture the essence of our subjects in theory, in-the-small, and in-the-large.
2021
A Predictive Analytics Framework for Multi-Horizon Financial Crises Forecasting using Macro-Economic Data
2021Predictive analytics framework for multi-horizon financial crisis forecasting using macroeconomic data and ML to provide early warning signals.
Fair Preprocessing: Towards Understanding Compositional Fairness of Data Transformers in Machine Learning Pipeline
2021In recent years, many incidents have been reported in which machine learning models exhibited discrimination among people based on race, sex, age, etc. Research has been conducted to measure and mitigate unfairness in machine learning models. For a machine learning task, it is common practice to build a pipeline that includes an ordered set of data preprocessing stages followed by a classifier. However, most of the research on fairness has considered single-classifier prediction tasks. What are the fairness impacts of the preprocessing stages in a machine learning pipeline? Furthermore, studies have shown that the root cause of unfairness is often ingrained in the data itself rather than in the model, yet no research had been conducted to measure the unfairness caused by a specific transformation made in the data preprocessing stage. In this paper, we introduce a causal method of fairness to reason about the fairness impact of data preprocessing stages in ML pipelines. We leverage existing metrics to define the fairness measures of the stages. We then conduct a detailed fairness evaluation of the preprocessing stages in 37 pipelines collected from three different sources. Our results show that certain data transformers cause models to exhibit unfairness. We identify a number of fairness patterns in several categories of data transformers. Finally, we show how the local fairness of a preprocessing stage composes into the global fairness of the pipeline, and we use this fairness composition to choose appropriate downstream transformers that mitigate unfairness in the machine learning pipeline.
TQEA: Temporal Quantum Error Analysis
2021Quantum errors in NISQ hardware vary temporally, but most error analysis tools treat noise as time-invariant. TQEA (Temporal Quantum Error Analysis) characterizes how quantum errors evolve over time by systematically measuring and modeling the temporal dynamics of noise in quantum computers. The framework provides insights for improving error mitigation strategies that account for drift and time-varying noise characteristics, supporting progress toward more reliable quantum computing.
2020
A Predictive Analytics Framework for Insider Trading Events
2020Detecting and forecasting insider trading events using traditional methods is limited by their reliance on predefined rules and inability to capture subtle market signals. This paper presents a predictive analytics framework for insider trading events using machine learning applied to financial transaction data and market signals, demonstrating the ability to identify patterns predictive of insider trading. The approach provides early warning capabilities that can complement traditional regulatory surveillance methods.
Do the Machine Learning Models on a Crowd Sourced Platform Exhibit Bias? An Empirical Study on Model Fairness
2020Machine learning models are increasingly being used in important decision-making software such as approving bank loans, recommending criminal sentences, and hiring employees. It is important to ensure the fairness of these models so that no discrimination is made based on protected attributes (e.g., race, sex, age) during decision making. Algorithms have been developed to measure unfairness and mitigate it to a certain extent. In this paper, we focus on the empirical evaluation of fairness and mitigation in real-world machine learning models. We created a benchmark of 40 top-rated models from Kaggle used for 5 different tasks and, using a comprehensive set of fairness metrics, evaluated their fairness. We then applied 7 mitigation techniques to these models and analyzed the resulting fairness, the mitigation outcomes, and the impacts on performance. We found that some model optimization techniques induce unfairness in the models. On the other hand, although machine learning libraries offer some fairness control mechanisms, they are not documented. The mitigation algorithms also exhibit common patterns: mitigation in the post-processing stage is often costly (in terms of performance), and mitigation in the pre-processing stage is preferred in most cases. We also present different trade-off choices for fairness mitigation decisions. Our study suggests future research directions to reduce the gap between theoretical fairness-aware algorithms and the software engineering methods needed to leverage them in practice.
Towards Performant Workflows, Monitoring and Measuring
2020Scientific HPC workflows require robust monitoring and measurement infrastructure to understand performance characteristics and enable optimization. This paper presents approaches for building performant scientific workflows with integrated monitoring and measurement, enabling better characterization and optimization of HPC workflow performance across distributed computing environments. The work provides practical methodologies for workflow developers to identify bottlenecks and improve end-to-end throughput.
With or Without Knee Total Knee Arthroplasty? Deep Learning-powered Strategy to detect TKA in plain radiographs
2020Deep learning approach for automatically detecting TKA implants in plain radiographs, enabling efficient large-scale retrospective analysis of orthopedic registries.