Wenzhen Zheng

About

I am currently working on foundation model pretraining at StepFun. My research interests include large language model pretraining, scaling laws, training stability, maximal-update parametrization (muP), and the Muon optimizer.

I received my M.S. from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and my B.S. in Mathematics from Shandong University.

News

  • 2026 · ICLR 2026 Oral 🎉 "Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?" accepted.
  • 2026 · ICLR 2026 🎉 "How Many Code and Test Cases Are Enough?" accepted.
  • 2026 · Release 🎉 Step-3.5 Flash released. Blog · GitHub · Tech report.
  • 2025 · NeurIPS 2025 Spotlight 🎉 "Farseer: A Refined Scaling Law in Large Language Models" accepted.
  • 2025 · EMNLP 2025 🎉 "Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate" accepted.
  • 2025.06 · Edu 🎉 Obtained M.S. from the University of Chinese Academy of Sciences.
  • 2025 · AAAI 2025 Oral 🎉 "Beyond Detection: Exploring Evidence-based Multi-Agent Debate …" accepted.
  • 2024.12 · Work 🎉 Joined StepFun (foundation model pretraining).
  • 2024 · EMNLP 2024 🎉 "Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale" accepted.

Publications

* equal contribution

  1. Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale
    EMNLP 2024
    W Zheng*, W Pan*, X Xu*, L Qin, L Yue, M Zhou
  2. StepLaw — Optimal Hyperparameter Scaling Law in Large Language Model Pretraining
    ★ Representative · Open-source
    Houyi Li*, Wenzhen Zheng*, Qiufeng Wang*, Hanshan Zhang, Zili Wang, Shijie Xuyang, Yuantao Fan, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
  3. Farseer: A Refined Scaling Law in Large Language Models
    NeurIPS 2025 Spotlight
    Houyi Li*, Wenzhen Zheng*, Qiufeng Wang, Zhenyu Ding, Haoying Wang, Zili Wang, Shijie Xuyang, Ning Ding, Shuigeng Zhou, Xiangyu Zhang, Daxin Jiang
  4. Scaling Laws for Code: A More Data-Hungry Regime
    Preprint
    Xianzhen Luo*, Wenzhen Zheng*, Qingfu Zhu, Rongyi Zhang, Houyi Li, Siming Huang, YuanTao Fan, Wanxiang Che
  5. Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models
    EMNLP 2025
    C Han*, W Zheng*, X Tang
  6. Beyond Detection: Exploring Evidence-based Multi-Agent Debate for Misinformation Intervention & Persuasion
    AAAI 2025 Oral
    C Han, Y Ma, J Tan, W Zheng, X Tang
  7. How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective
    ICLR 2026
    X Luo, J Huang, W Zheng, Q Zhu, M Xu, Y Xu, Y Fan, L Qin, W Che
  8. Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?
    ICLR 2026 Oral
    H Li, KM Lo, Z Wang, Z Wang, W Zheng, S Zhou, X Zhang, D Jiang
  9. Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding
    Model report · Open-source
    StepFun Team
  10. Step 3.5 Flash: Fast, Sharp & Reliable Agentic Intelligence
    ★ Representative · Model report · Open-source
    StepFun Team
  11. Simulating social network with LLM agents: An analysis of information propagation and echo chambers
    KSS 2024 Oral
    W Zheng, X Tang

Experience

2024.12 – Present
StepFun (阶跃星辰) · Foundation Model Pretraining
Advised by Xiangyu Zhang. Contributed to Step-3 and Step-3.5 Flash.
2024.05 – 2024.10
Meituan (美团) · LLM Pretraining Intern
2023.07 – 2024.03
Langboat Technology (澜舟科技) · Foundation Model Pretraining Intern

Education

University of Chinese Academy of Sciences · M.S.
2022.09 – 2025.06
Academy of Mathematics and Systems Science, Chinese Academy of Sciences.
Shandong University · B.S. in Mathematics
2018.09 – 2022.06

Awards & Honors

  • MCM/ICM F Award (Finalist, top 1%)
  • National College Math Competition (Math Major Category A) Second Prize
  • Hua Luogeng Scholarship, CAS Institute of Mathematics
  • Shandong Province College Physics Competition First Prize
  • National High School Math League (Anhui Province) First Prize