Post-training · Reasoning · Multimodal Agents

Wei Shen

沈蔚

Senior ML Research Scientist at Skywork AI, working on post-training, reasoning systems, and multimodal agents for frontier foundation models.

Skywork AI · ex-Baichuan · ex-ByteDance Seed / AI Lab · Fudan NLP · Beijing

Reasoning Models · RLHF · Multimodal Agents

My work spans RLHF, reward modeling, long-horizon agentic reinforcement learning, and multimodal reasoning. It draws on experience at leading industry labs and has earned academic recognition and open-source impact.

Recognition: NeurIPS Best Paper, ICLR Spotlight, and widely cited RLHF research
Selected impact: Core contributor or lead across reasoning, post-training, and multimodal open releases
1917 Google Scholar citations
20 h-index
3 leading AI teams across research and industry
2 major distinctions: NeurIPS Best Paper and ICLR Spotlight

Research Profile

I work on post-training for frontier foundation models, with a focus on reasoning systems, RLHF, and multimodal agents. Most of my recent work sits at the boundary between research ideas and production-facing model development: reward design, scalable reinforcement learning, curriculum construction, and evaluation for reliable reasoning behavior.

I am especially interested in training pipelines that remain robust under noisy, sparse, or gameable feedback, and in systems that transfer across general reasoning, coding, and multimodal tasks. The goal is not only stronger benchmarks, but models that are harder to exploit and more dependable in real use.

Research Interests

  • RLHF and scalable post-training for reasoning models
  • Reward modeling, GRPO-style optimization, and anti-reward-gaming
  • Long-horizon agentic reinforcement learning
  • Multimodal reasoning and VLM agents
  • Reliable alignment under noisy or shifted feedback

Career Path

Skywork AI · 2025–Present

Senior ML Research Scientist

Leading long-horizon agentic RL and multimodal post-training for frontier reasoning systems and production agents.

Baichuan Inc. · 2024–2025

Research Scientist

Worked in the RL team led by Dong Yan on reasoning-oriented post-training, medical reasoning, coding RL, and reward modeling for the Baichuan family.

ByteDance Seed / AI Lab · 2023–2024

Research Intern

Worked with Yang Liu and Hang Li on robust RLHF, noisy-reward dynamics, and PPO variants for more stable and generalizable alignment.

Fudan NLP Lab · 2021–2024

Early Research Training

Started formal research training in the Fudan NLP community, building the foundation for later work on language understanding, alignment, and reasoning systems.

Selected Impact

  • Led or contributed to major reasoning and multimodal releases including Skywork-R1V3, Skywork-OR1, and MOSS-RLHF.
  • Published work on RLHF, alignment, and post-training recognized with a NeurIPS Best Paper and an ICLR Spotlight.
  • Built end-to-end experience from reward design and evaluation to training pipelines and public releases.
  • Collaborated across top industry labs and Fudan NLP on alignment, reasoning, and multimodal learning.

Current Focus

Reasoning and Agentic RL

Developing long-horizon RL training pipelines for general, coding, and multimodal agent tasks.

Multimodal Systems

Building multimodal reasoning models and production browser and search agents, with an emphasis on reliability.

Reliable Alignment

Studying reward design, post-training stability, and robustness under noisy, sparse, or gameable feedback.

Selected Projects

Tech Lead · 2025

Skywork-R1V3-38B

Led the July 2025 release, which raised MMMU accuracy from 64.3% to 76.0% using GRPO and curriculum learning.

Core Contributor · 2025

Skywork-OR1

Contributed to scalable reasoning models that reached 82.2 on AIME24 and 73.3 on AIME25.

Core Contributor · 2023

MOSS-RLHF

Helped build one of the earliest open-source RLHF frameworks in China and a foundation for later reasoning post-training stacks.

Education and Highlights

Fudan University

M.S. in Computer Science, September 2021 – January 2024

Fudan NLP Lab. Advisors: Prof. Xuanjing Huang, Prof. Xipeng Qiu, and Prof. Tao Gui.

Huazhong University of Science and Technology

B.S. in Computer Science, September 2016 – May 2020

Recognition

NeurIPS Best Paper, ICLR Spotlight, and sustained open-source impact across RLHF and reasoning systems.