Case Western Reserve University
Jierui Peng

PhD Student

Education

  • B.S., Brandeis University, 2021
  • M.S., New York University, 2023
  • Ph.D., Case Western Reserve University, current

Current Projects

NEBULA: Diagnostic and Robust Evaluation for Vision-Language-Action Systems

This project introduces NEBULA, a unified ecosystem for evaluating Vision-Language-Action (VLA) agents through a dual-axis framework that disentangles capability and robustness. It addresses the limitations of traditional end-task success metrics by proposing fine-grained capability tests with controlled variable isolation and systematic stress tests for reliability assessment. In addition, NEBULA provides a standardized data format, unified API, and large-scale aggregated dataset to enable reproducible cross-dataset training and benchmarking, revealing critical failure modes in modern embodied agents.

CLAIRE: Causally Explainable AI for EKG-based Risk Prediction

This project presents CLAIRE, a causally explainable AI framework for predicting mortality and major adverse cardiovascular events (MACE) from structured EKG data. The system integrates large language models with structured clinical features to enable both high predictive performance and interpretable reasoning. A two-stage pipeline combines end-to-end prediction with feature attribution and causal graph generation, linking EKG abnormalities to physiological mechanisms. The framework achieves strong accuracy while providing clinically validated explanations, bridging the gap between black-box prediction and mechanistic understanding in medical AI.

RT-LTP: Real-Time Latent Trajectory Prediction with Efficient Online Adaptation

This project proposes RT-LTP, an efficient trajectory prediction framework designed for real-time online learning under distribution shift. The method reformulates trajectory forecasting as a latent-space alignment problem, predicting future motion in a compact, semantically consistent latent space. It incorporates a lightweight low-rank adaptation module to enable fast test-time learning without full model updates, significantly reducing optimization latency. The approach improves both prediction accuracy and computational efficiency, enabling robust deployment in high-speed dynamic environments such as autonomous driving.

Related Publications:

NEBULA: Do We Evaluate Vision-Language-Action Agents Correctly?

2026

arXiv preprint arXiv:2510.16263

This paper introduces NEBULA, a unified ecosystem for evaluating Vision-Language-Action (VLA) agents beyond coarse end-task success metrics. It proposes a novel dual-axis evaluation framework that combines fine-grained capability tests for skill-specific diagnosis with systematic stress tests to measure robustness under real-world perturbations. In addition, NEBULA standardizes fragmented embodied AI datasets through a unified data format and API, enabling reproducible cross-dataset training and benchmarking. Experimental results reveal that state-of-the-art VLA models exhibit significant hidden weaknesses in critical capabilities such as spatial reasoning and dynamic adaptation, highlighting the need for more interpretable and reliability-aware evaluation.

Artificial Intelligence