Yiren Lu

PhD Student

Yiren Lu is a Ph.D. student at the Computer and Data Sciences Department, Case Western Reserve University (CWRU). His research lies at the intersection of 3D computer vision and embodied AI, focusing on neural scene representations, 3D Gaussian Splatting, and geometry-aware perception. His research aims to develop interpretable spatial intelligence systems that bridge reconstruction and downstream tasks such as reasoning and long-horizon decision-making.

Education

M.S. in Computer Science & Engineering, University at Buffalo, 2024
B.Eng. in Computer Science, ShanghaiTech University, 2021

Awards and Honors

Kevin J. Kranzusch Fellowship, 2025
Outstanding Graduate Research Award, 2024

Before joining Case, Yiren received his M.S. in Computer Science and Engineering from University at Buffalo in 2024 under the supervision of Prof. Chen Wang and his B.Eng. degree in Computer Science from ShanghaiTech University in 2021 under the supervision of Prof. Sören Schwertfeger. He has also interned at Bosch Research North America as a Research Scientist Intern and at Tencent as an Applied Scientist Intern.

My current research focuses on developing robust 3D visual representations that bridge geometric reconstruction, semantic understanding, and downstream decision-making. Traditionally, 3D reconstruction has been viewed primarily as a graphics or geometry problem, while reasoning and navigation have been studied separately in robotics. My work connects these areas by exploring how 3D scene representations can move beyond passive reconstruction to become active, useful computational structures for perception, memory, and reasoning in complex real-world environments.

Current Projects

3D Gaussian Splatting as Memory for Downstream Reasoning and Decision Making

A detailed description of the project goals and methods.

Related Publications:

Robust 3D & 4D Reconstruction for Complex Real-world Scenarios

A detailed description of the project goals and methods.

Related Publications:

GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning

2026

Yiren Lu , Yi Du , Disheng Liu , Yunlai Zhou , Chen Wang , Yu Yin

arXiv

Effective embodied exploration requires agents to accumulate and retain spatial knowledge over time. However, existing scene representations, such as discrete scene graphs or static view-based snapshots, lack post-hoc re-observability. If an initial observation misses a target, the resulting memory omission is often irrecoverable. To bridge this gap, we propose GSMem, a zero-shot embodied exploration and reasoning framework built upon 3D Gaussian Splatting (3DGS). By explicitly parameterizing continuous geometry and dense appearance, 3DGS serves as a persistent spatial memory that endows the agent with Spatial Recollection: the ability to render photorealistic novel views from optimal, previously unoccupied viewpoints. To operationalize this, GSMem employs a retrieval mechanism that simultaneously leverages parallel object-level scene graphs and semantic-level language fields. This complementary design robustly localizes target regions, enabling the agent to “hallucinate” optimal views for high-fidelity Vision-Language Model (VLM) reasoning. Furthermore, we introduce a hybrid exploration strategy that combines VLM-driven semantic scoring with a 3DGS-based coverage objective, balancing task-aware exploration with geometric coverage. Extensive experiments on embodied question answering and lifelong navigation demonstrate the robustness and effectiveness of our framework.

Computer Vision Embodied AI

arXiv View

Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

2026

Yiren Lu , Xin Ye , Burhaneddin Yaman , Jingru Luo , Zhexiao Xiong , Liu Ren , Yu Yin

arXiv

Bird's-Eye-View (BEV) perception serves as a cornerstone for autonomous driving, offering a unified spatial representation that fuses surrounding-view images to enable reasoning for various downstream tasks, such as semantic segmentation, 3D object detection, and motion prediction. However, most existing BEV perception frameworks adopt an end-to-end training paradigm, where image features are directly transformed into the BEV space and optimized solely through downstream task supervision. This formulation treats the entire perception process as a black box, often lacking explicit 3D geometric understanding and interpretability, leading to suboptimal performance. In this paper, we claim that an explicit 3D representation matters for accurate BEV perception, and we propose Splat2BEV, a Gaussian Splatting-assisted framework for BEV tasks. Splat2BEV aims to learn BEV feature representations that are both semantically rich and geometrically precise. We first pre-train a Gaussian generator that explicitly reconstructs 3D scenes from multi-view inputs, enabling the generation of geometry-aligned feature representations. These representations are then projected into the BEV space to serve as inputs for downstream tasks. Extensive experiments on nuScenes and argoverse dataset demonstrate that Splat2BEV achieves state-of-the-art performance and validate the effectiveness of incorporating explicit 3D reconstruction into BEV perception.

Computer Vision Autonomous Driving

arXiv View

Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting

2025

Yiren Lu , Yunlai Zhou , Yiran Qiao , Chaoda Song , Tuo Liang , Jing Ma , Huan Wang , Yu Yin

NeurIPS

Open-vocabulary querying in 3D space is crucial for enabling more intelligent perception in applications such as robotics, autonomous systems, and augmented reality. However, most existing methods rely on 2D pixel-level parsing, leading to multi-view inconsistencies and poor 3D object retrieval. Moreover, they are limited to static scenes and struggle with dynamic scenes due to the complexities of motion modeling. In this paper, we propose Segment-then-Splat, a 3D-aware open vocabulary segmentation approach for both static and dynamic scenes based on Gaussian Splatting. Segment-then-Splat reverses the long established approach of segmentation after reconstruction by dividing Gaussians into distinct object sets before reconstruction. Once the reconstruction is complete, the scene is naturally segmented into individual objects, achieving true 3D segmentation. This approach not only eliminates Gaussian-object misalignment issues in dynamic scenes but also accelerates the optimization process, as it eliminates the need for learning a separate language field. After optimization, a CLIP embedding is assigned to each object to enable open-vocabulary querying. Extensive experiments on various datasets demonstrate the effectiveness of our proposed method in both static and dynamic scenarios.

Computer Vision 3D Reconstruction 3D Scene Understanding

arXiv View

Fix False Transparency by Noise Guided Splatting

2025

Aly El Hakie* , Yiren Lu* , Yu Yin , Michael Jenkins , Yehe Liu

NeurIPS

Opaque objects reconstructed by 3D Gaussian Splatting (3DGS) often exhibit a falsely transparent surface, leading to inconsistent background and internal patterns under camera motion in interactive viewing. This issue stems from the ill-posed optimization in 3DGS. During training, background and foreground Gaussians are blended via α-compositing and optimized solely against the input RGB images using a photometric loss. As this process lacks an explicit constraint on surface opacity, the optimization may incorrectly assign transparency to opaque regions, resulting in view-inconsistent and falsely transparent output. This issue is difficult to detect in standard evaluation settings (i.e., rendering static images) but becomes particularly evident in object-centric reconstructions under interactive viewing. Although other causes of view-inconsistency (e.g., popping artifacts) have been explored recently, false transparency has not been explicitly identified. To the best of our knowledge, we are the first to quantify, characterize, and develop solutions for this "false transparency" artifact, an underreported artifact in 3DGS. Our strategy, Noise Guided Splatting (NGS), encourages surface Gaussians to adopt higher opacity by injecting opaque noise Gaussians in the object volume during training, requiring only minimal modifications to the existing splatting process. To quantitatively evaluate false transparency in static renderings, we propose a transmittance-based metric that measures the severity of this artifact. In addition, we introduce a customized, high-quality object-centric scan dataset exhibiting pronounced transparency issues, and we augment popular existing datasets (e.g., DTU) with complementary infill noise specifically designed to assess the robustness of 3D reconstruction methods to false transparency. Experiments across multiple datasets show that NGS substantially reduces false transparency while maintaining competitive performance on standard rendering metrics (e.g., PSNR), demonstrating its overall effectiveness.

Computer Vision 3D Reconstruction

arXiv View

BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

2025

Yiren Lu , Yunlai Zhou , Disheng Liu , Tuo Liang , Yu Yin

CVPR

3D Gaussian Splatting (3DGS) has shown remarkable potential for static scene reconstruction, and recent advancements have extended its application to dynamic scenes. However, the quality of reconstructions depends heavily on high-quality input images and precise camera poses, which is not that trivial to fulfill in the real-world scenarios. Capturing dynamic scenes with handheld monocular cameras, for instance, typically involves simultaneous movement of both the camera and objects within a single exposure. This combined motion frequently results in image blur that existing methods cannot adequately handle. To address these challenges, we introduce BARD-GS, a novel approach for robust dynamic scene reconstruction that effectively handles blurry inputs and imprecise camera poses. BARD-GS comprises two main components: 1) camera motion deblurring and 2) object motion deblurring. By explicitly decomposing motion blur into camera motion blur and object motion blur and modeling them separately, we achieve significantly improved rendering results in dynamic regions. In addition, we collect a real-world motion blur dataset of dynamic scenes to evaluate our approach. Extensive experiments demonstrate that BARD-GS effectively reconstructs high-quality dynamic scenes under realistic conditions, significantly outperforming existing methods.

Computer Vision 3D Reconstruction

arXiv View

View-Consistent Object Removal in Radiance Fields

2024

Yiren Lu , Jing Ma , Yu Yin

ACM Multimedia (MM)

Radiance Fields (RFs) have emerged as a crucial technology for 3D scene representation, enabling the synthesis of novel views with remarkable realism. However, as RFs become more widely used, the need for effective editing techniques that maintain coherence across different perspectives becomes evident. Current methods primarily depend on per-frame 2D image inpainting, which often fails to maintain consistency across views, thus compromising the realism of edited RF scenes. In this work, we introduce a novel RF editing pipeline that significantly enhances consistency by requiring the inpainting of only a single reference image. This image is then projected across multiple views using a depth-based approach, effectively reducing the inconsistencies observed with per-frame inpainting. However, projections typically assume photometric consistency across views, which is often impractical in real-world settings. To accommodate realistic variations in lighting and viewpoint, our pipeline adjusts the appearance of the projected views by generating multiple directional variants of the inpainted image, thereby adapting to different photometric conditions. Additionally, we present an effective and robust multi-view object segmentation approach as a valuable byproduct of our pipeline. Extensive experiments demonstrate that our method significantly surpasses existing frameworks in maintaining content consistency across views and enhancing visual quality.

Computer Vision 3D Reconstruction 3D Editing

arXiv View

Mentors

Yu Yin

Assistant Professor, Department of Computer and Data Sciences, Case School of Engineering

Yiren Lu

Education

Awards and Honors

Current Projects

3D Gaussian Splatting as Memory for Downstream Reasoning and Decision Making

Robust 3D & 4D Reconstruction for Complex Real-world Scenarios

GSMem: 3D Gaussian Splatting as Persistent Spatial Memory for Zero-Shot Embodied Exploration and Reasoning

Reconstruction Matters: Learning Geometry-Aligned BEV Representation through 3D Gaussian Splatting

Segment then Splat: Unified 3D Open-Vocabulary Segmentation via Gaussian Splatting

Fix False Transparency by Noise Guided Splatting

BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting

View-Consistent Object Removal in Radiance Fields

Mentors

Yu Yin

Collaborators