Case Western Reserve University
Disheng Liu

PhD Student

Current Projects

Spatial Intelligence in VLM

This project investigates spatial intelligence in vision-language models (VLMs), with a focus on how these models perceive, represent, and reason about 3D space from visual inputs. The project aims to systematically study core capabilities such as depth understanding, relative position reasoning, viewpoint consistency, object interaction, and embodied spatial decision-making. Methodologically, it combines literature review, taxonomy building, benchmark analysis, and empirical evaluation of representative VLMs on spatial reasoning tasks. The project also examines common failure modes, dataset biases, and architectural design choices that affect spatial understanding, with the goal of identifying promising directions for improving VLMs toward more robust and generalizable spatial reasoning.

Related Publications:

Spatial Intelligence in Vision-Language Models: A Comprehensive Survey

2025

Vision-Language Models (VLMs) have achieved great success but still lack spatial intelligence. This survey provides the first unified overview of recent advances, taxonomies, and evaluations toward building spatially intelligent AI.

Artificial Intelligence, Spatial Intelligence, Vision Language Model

Balancing Fidelity and Diversity: Synthetic data could stand on the shoulder of the real in visual recognition

2025

With the rapid progress of generative models, synthetic data has become a common solution to data scarcity in AI. However, is using it directly, without curation, ideal for visual recognition? We systematically study how data fidelity and diversity affect recognition performance and show that balancing these two factors with a training-free curation pipeline significantly improves results.

Artificial Intelligence, Computer Vision, Synthetic Data

CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data

2025

True intelligence relies on understanding hidden causal relations, yet current AI and vision models lack benchmarks to assess this ability. We introduce Causal3D, a comprehensive 19-dataset benchmark linking structured and visual data to evaluate causal reasoning, revealing that performance drops sharply as causal complexity increases.

Artificial Intelligence, Causality, Trustworthy AI, Computer Vision