Disheng Liu

PhD Student

I am a Ph.D. candidate in Computer Science at Case Western Reserve University (CWRU), where I am advised by Prof. Yu Yin. My research interests broadly span computer vision and vision-language models, with a particular focus on advancing spatial intelligence in next-generation AI systems.

Before joining CWRU, I was a visiting student at ShanghaiTech University, supervised by Prof. Dinggang Shen. I received my M.S. in Information Science from the University of Pittsburgh, where I was supervised by Prof. Yu-Ru Lin, and my B.S. in Computing and Information Science from Guangdong University of Technology, where I was supervised by Prof. Weihua He.

My current research explores how AI systems can better understand spatial structure, reason about the physical world, and integrate visual and language information for more robust multimodal intelligence.

Current Projects

Spatial Intelligence in VLMs

This project investigates spatial intelligence in vision-language models (VLMs), with a focus on how these models perceive, represent, and reason about 3D space from visual inputs. The project aims to systematically study core capabilities such as depth understanding, relative position reasoning, viewpoint consistency, object interaction, and embodied spatial decision-making. Methodologically, it combines literature review, taxonomy building, benchmark analysis, and empirical evaluation of representative VLMs on spatial reasoning tasks. The project also examines common failure modes, dataset biases, and architectural design choices that affect spatial understanding, with the goal of identifying promising directions for improving VLMs toward more robust and generalizable spatial reasoning.

Related Publications:

Spatial Intelligence in Vision-Language Models: A Comprehensive Survey

2025

Disheng Liu, Tuo Liang, Zhe Hu, Jierui Peng, Yiren Lu, Yi Xu, Yun Fu, Yu Yin

Vision-Language Models (VLMs) have achieved great success but still lack spatial intelligence. This survey provides the first unified overview of recent advances, taxonomies, and evaluations toward building spatially intelligent AI.

Artificial Intelligence, Spatial Intelligence, Vision Language Model

Balancing Fidelity and Diversity: Synthetic data could stand on the shoulder of the real in visual recognition

2025

Disheng Liu, Tuo Liang, Yu Yin

With the rapid progress of generative models, synthetic data has become a common solution to data scarcity in AI. But is using it directly, without curation, ideal for visual recognition? We systematically study how data fidelity and diversity affect recognition performance and show, through a training-free curation pipeline, that balancing these two factors significantly improves results.

Artificial Intelligence, Computer Vision, Synthetic Data

CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data

2025

Disheng Liu, Yiran Qiao, Wuche Liu, Yiren Lu, Yunlai Zhou, Tuo Liang, Yu Yin, Jing Ma

True intelligence relies on understanding hidden causal relations, yet current AI and vision models lack benchmarks to assess this ability. We introduce Causal3D, a comprehensive 19-dataset benchmark linking structured and visual data to evaluate causal reasoning, revealing that performance drops sharply as causal complexity increases.

Artificial Intelligence, Causality, Trustworthy AI, Computer Vision

Mentors

Yu Yin, PhD

Assistant Professor, Department of Computer and Data Sciences, Case School of Engineering

Collaborators

Yiren Lu

PhD Student

Yunlai Zhou

PhD Student