About
I am currently working on foundation model pretraining at StepFun. My research interests include large language model pretraining, scaling laws, training stability, muP, and Muon.
I received my M.S. from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, and my B.S. in Mathematics from Shandong University.
News
- 2026 🎉 "Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources?" accepted to ICLR 2026 as an oral.
- 2026 🎉 "How Many Code and Test Cases Are Enough?" accepted to ICLR 2026.
- 2026 🎉 Step-3.5 Flash released. Blog · GitHub · Tech report.
- 2025 🎉 "Farseer: A Refined Scaling Law in Large Language Models" accepted to NeurIPS 2025 as a spotlight.
- 2025 🎉 "Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate" accepted to EMNLP 2025.
- 2025.06 🎉 Obtained M.S. from University of Chinese Academy of Sciences.
- 2025 🎉 "Beyond Detection: Exploring Evidence-based Multi-Agent Debate …" accepted to AAAI 2025 as an oral.
- 2024.12 🎉 Joined StepFun (foundation model pretraining).
- 2024 🎉 "Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale" accepted to EMNLP 2024.
Publications
* equal contribution
- Breaking Language Barriers: Cross-Lingual Continual Pre-Training at Scale · EMNLP 2024
- StepLaw — Optimal Hyperparameter Scaling Law in Large Language Model Pretraining · ★ Representative · Open-source
- Farseer: A Refined Scaling Law in Large Language Models · NeurIPS 2025 Spotlight
- Scaling Laws for Code: A More Data-Hungry Regime · Preprint
- Debate-to-Detect: Reformulating Misinformation Detection as a Real-World Debate with Large Language Models · EMNLP 2025
- Beyond Detection: Exploring Evidence-based Multi-Agent Debate for Misinformation Intervention & Persuasion · AAAI 2025 Oral
- How Many Code and Test Cases Are Enough? Evaluating Test Cases Generation from a Binary-Matrix Perspective · ICLR 2026
- Can Mixture-of-Experts Surpass Dense LLMs Under Strictly Equal Resources? · ICLR 2026 Oral
- Step-3 is Large yet Affordable: Model-system Co-design for Cost-effective Decoding · Model report · Open-source
- Step 3.5 Flash: Fast, Sharp & Reliable Agentic Intelligence · ★ Representative · Model report · Open-source
- Simulating social network with LLM agents: An analysis of information propagation and echo chambers · KSS 2024 Oral
Experience
2024.12 – Present
StepFun (阶跃星辰) · Foundation Model Pretraining
2024.05 – 2024.10
Meituan (美团) · LLM Pretraining Intern
2023.07 – 2024.03
Langboat Technology (澜舟科技) · Foundation Model Pretraining Intern
Education
University of Chinese Academy of Sciences · M.S.
Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Shandong University · B.S. in Mathematics
Awards & Honors
- MCM/ICM F Award (Finalist, top 1%)
- National College Math Competition (Math Major Category A) Second Prize
- Hua Luogeng Scholarship, CAS Institute of Mathematics
- Shandong Province College Physics Competition First Prize
- National High School Math League (Anhui Province) First Prize