Case Western Reserve University
Wang (Van) Yang

PhD Student

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

2025

Conference on Language Modeling (COLM), October 7-10, 2025, Montreal, Canada

Recent advances in post-training improve model reasoning but require costly training pipelines and produce inefficient, overly long outputs. This paper introduces Speculative Thinking, a training-free framework in which a large reasoning model guides a smaller one during inference at the reasoning level, in contrast to token-level speculative decoding. The framework identifies structural cues, such as paragraph breaks followed by reflective phrases, that mark steps where small models tend to struggle, and delegates those steps to the larger model. The method significantly improves the smaller model's reasoning accuracy while shortening its outputs, yielding an efficient inference-time paradigm that preserves the small model's compute efficiency.
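A minimal sketch of the reasoning-level delegation loop described above, under stated assumptions: the cue phrase list, the stop marker, and the small_step/large_step callables are illustrative placeholders, not the paper's exact protocol.

```python
import re
from typing import Callable

# Hypothetical cue pattern: a paragraph break followed by a reflective phrase,
# the kind of structural signal used to flag steps where small models
# struggle. The specific phrase list is an assumption.
REFLECTIVE_CUE = re.compile(r"\n\n(?:Wait|Hmm|Alternatively|Let me re-check)\b")

def speculative_thinking(
    prompt: str,
    small_step: Callable[[str], str],  # small model: drafts one reasoning segment
    large_step: Callable[[str], str],  # large model: regenerates a flagged segment
    max_segments: int = 16,
) -> str:
    """Reasoning-level delegation: the small model drafts each segment;
    when a draft contains a reflective cue, that step is handed to the
    large model instead, and generation resumes with the small model."""
    trace = prompt
    for _ in range(max_segments):
        segment = small_step(trace)
        if REFLECTIVE_CUE.search(segment):
            segment = large_step(trace)  # delegate the difficult step
        trace += segment
        if "FINAL ANSWER:" in segment:  # hypothetical stop marker
            break
    return trace
```

Because the large model is invoked only on flagged segments, most tokens are still produced by the small model, which is how the framework keeps the small model's compute profile.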

Artificial Intelligence

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

2025

39th Conference on Neural Information Processing Systems (NeurIPS 2025), December 2025

Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. This paper hypothesizes that current reasoning limitations stem partly from insufficient long-context capacity, motivated by two observations: models with longer context windows tend to reason better, and failed reasoning cases resemble failed long-context cases. Controlled experiments comparing architecturally identical models with varying long-context capacities confirm that enhancing long-context ability before supervised fine-tuning improves reasoning, and the paper accordingly advocates treating long-context capacity as a first-class design objective.
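A sketch of the controlled-comparison setup, under assumed configurations: the recipe names and window sizes below are illustrative, not the paper's settings; only the stage ordering (context extension before supervised fine-tuning) reflects the stated finding.

```python
from dataclasses import dataclass

@dataclass
class Recipe:
    name: str
    context_window: int        # long-context capacity acquired before SFT
    extend_context_first: bool

# Architecturally identical models differing only in long-context capacity
# acquired before supervised fine-tuning. Window sizes are illustrative.
RECIPES = [
    Recipe("sft-only", context_window=4_096, extend_context_first=False),
    Recipe("long-context-then-sft", context_window=32_768, extend_context_first=True),
]

def training_stages(recipe: Recipe) -> list[str]:
    """Order matters: context extension precedes supervised fine-tuning."""
    stages = ["pretrained checkpoint"]
    if recipe.extend_context_first:
        stages.append(f"context extension to {recipe.context_window:,} tokens")
    stages.append("supervised fine-tuning on reasoning data")
    return stages
```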

Artificial Intelligence

100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?

2025

63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), July 27-August 1, 2025, Vienna, Austria

Existing long-context evaluation benchmarks fail to separate long-context performance from a model's baseline ability, making cross-model comparisons unclear. They are also typically constructed with fixed input lengths, which limits their applicability to models with different context windows. This paper introduces 100-LongBench, a length-controllable long-context benchmark with a novel metric that disentangles baseline knowledge from true long-context capability across multiple task categories. Experiments demonstrate that existing benchmarks significantly conflate baseline model strength with genuine long-context ability, revealing a widespread evaluation gap.
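One way such a disentangling metric could work, as a hedged sketch (the formula below is an assumption, not necessarily the paper's exact definition): normalize a model's long-context accuracy by its short-context accuracy on the same task, so that baseline strength divides out and only length-induced degradation remains.

```python
def length_controlled_score(short_ctx_acc: float, long_ctx_acc: float) -> float:
    """Illustrative disentangling metric: relative performance retention
    when the same task is presented at a long input length.

    1.0 means no degradation from added context; values near 0.0 mean the
    model's baseline ability collapses at length. Dividing by the
    short-context accuracy removes baseline strength from the comparison.
    """
    if short_ctx_acc <= 0.0:
        raise ValueError("baseline accuracy must be positive to normalize")
    return long_ctx_acc / short_ctx_acc


# Example: a strong model (0.9 short-context) dropping to 0.6 at length
# scores lower than a weaker model (0.5 short-context) holding 0.4.
assert round(length_controlled_score(0.9, 0.6), 2) == 0.67
assert round(length_controlled_score(0.5, 0.4), 2) == 0.80
```

Under this kind of normalization, a raw-accuracy benchmark would rank the stronger baseline model first, which is exactly the conflation the paper argues existing benchmarks exhibit.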

Artificial Intelligence