Senior ML Research Scientist · Skywork AI · Beijing, China
Wei Shen
I build frontier reasoning systems, scalable RL post-training pipelines, and production multimodal agents. My work focuses on long-horizon agentic reinforcement learning, reward modeling, anti-reward-gaming, curriculum learning, and reliable multimodal reasoning.
This homepage reflects my resume as of April 20, 2026.
About
Welcome to my academic homepage. I am a Senior ML Research Scientist at Skywork AI, where I work on frontier reasoning systems, scalable RL post-training, and multimodal agents. Before joining Skywork, I worked at Baichuan on post-training and reasoning for the Baichuan series, and at ByteDance AI Lab on robust RLHF and reward design.
I received my B.S. from Huazhong University of Science and Technology and my M.S. in Computer Science from Fudan University, where I worked in the Fudan NLP Lab advised by Prof. Xuanjing Huang, Prof. Xipeng Qiu, and Prof. Tao Gui. My research centers on alignment, reward modeling, reinforcement learning, and multimodal reasoning.
Research Interests
- RLHF and scalable post-training for reasoning models
- Reward modeling, GRPO-style optimization, and anti-reward-gaming
- Long-horizon agentic reinforcement learning
- Multimodal reasoning and VLM agents
- Reliable alignment under noisy or shifted feedback
Background
Senior ML Research Scientist
Leading long-horizon agentic RL and multimodal post-training for frontier reasoning systems and production agents.
Research Scientist
Worked on reasoning-oriented post-training, medical reasoning, coding RL, and reward modeling for the Baichuan model family.
Research Intern
Studied robust RLHF, noisy-reward dynamics, and PPO variants for more stable and generalizable alignment.
Early Research Experience
Applied graph neural networks and reinforcement learning to routing and placement optimization in EDA workflows.
Recent News
- Jul 2025: Contributed to the release of Skywork-R1V3 and its technical report.
- Mar 2025: Joined Skywork AI as a Senior ML Research Scientist.
- May 2024: Joined Baichuan as a Research Scientist working on RL and reasoning.
- Jan 2024: Group Invariant Learning was accepted as an ICLR 2024 Spotlight paper.
- Dec 2023: "Secrets of RLHF in Large Language Models Part I: PPO" received the Best Paper award at the NeurIPS 2023 Instruction Workshop.
Current Focus
Reasoning and Agentic RL
Developing long-horizon RL training pipelines for general, coding, and multimodal agent tasks.
Multimodal Systems
Building multimodal reasoning models and more reliable production browser and search agents.
Reliable Alignment
Studying reward design, post-training stability, and robustness under noisy, sparse, or gameable feedback.
Selected Projects
Skywork-R1V3-38B
Led the July 2025 release, which raised MMMU accuracy from 64.3% to 76.0% using GRPO and curriculum learning.
Skywork-OR1
Contributed to scalable reasoning models that reached 82.2% on AIME24 and 73.3% on AIME25.
MOSS-RLHF
Helped build one of the earliest open-source RLHF frameworks in China, which later served as a foundation for reasoning post-training stacks.
Selected Publications
Skywork-R1V3 Technical Report
Technical report, July 2025 · Lead contributor
Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards
arXiv preprint, January 2024 · Co-first author
Loose Lips Sink Ships: Mitigating Length Bias in RLHF
EMNLP 2024
Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
ICLR 2024 Spotlight
Secrets of RLHF in Large Language Models Part I: PPO
Best Paper, NeurIPS 2023 Instruction Workshop
Education and Highlights
Fudan University
M.S. in Computer Science, September 2021 – January 2024
Fudan NLP Lab. Advisors: Prof. Xuanjing Huang, Prof. Xipeng Qiu, and Prof. Tao Gui.
Huazhong University of Science and Technology
B.S. in Computer Science, September 2016 – May 2020
Changjun High School
High School, 2013 – 2016, Changsha
Recognition
Best Paper at the NeurIPS 2023 Instruction Workshop, an ICLR 2024 Spotlight, and sustained open-source contributions across RLHF and reasoning systems.
