Senior ML Research Scientist · Skywork AI · Beijing, China

Wei Shen

I build frontier reasoning systems, scalable RL post-training pipelines, and production multimodal agents. My work focuses on long-horizon agentic reinforcement learning, reward modeling, anti-reward-gaming, curriculum learning, and reliable multimodal reasoning.

This homepage reflects my resume as last updated on April 20, 2026.

1,900+ Google Scholar citations
20 h-index
76.0 MMMU on Skywork-R1V3-38B
3.2k GitHub stars on Skywork-R1V3

About

Welcome to my academic homepage. I am a Senior ML Research Scientist at Skywork AI, where I work on frontier reasoning systems, scalable RL post-training, and multimodal agents. Before joining Skywork, I worked at Baichuan on post-training and reasoning for the Baichuan series, and at ByteDance AI Lab on robust RLHF and reward design.

I completed my M.S. in Computer Science in the NLP Lab at Fudan University, advised by Prof. Xuanjing Huang, Prof. Xipeng Qiu, and Prof. Tao Gui, after receiving my B.S. from Huazhong University of Science and Technology. My research centers on alignment, reward modeling, reinforcement learning, and multimodal reasoning.

Research Interests

  • RLHF and scalable post-training for reasoning models
  • Reward modeling, GRPO-style optimization, and anti-reward-gaming
  • Long-horizon agentic reinforcement learning
  • Multimodal reasoning and VLM agents
  • Reliable alignment under noisy or shifted feedback

Background

Skywork AI · 2025–Present

Senior ML Research Scientist

Leading long-horizon agentic RL and multimodal post-training for frontier reasoning systems and production agents.

Baichuan Inc. · 2024–2025

Research Scientist

Worked on reasoning-oriented post-training, medical reasoning, coding RL, and reward modeling for the Baichuan family.

ByteDance AI Lab · 2023–2024

Research Intern

Studied robust RLHF, noisy-reward dynamics, and PPO variants for more stable and generalizable alignment.

Fudan CISL Lab · 2021

Early Research Experience

Applied graph neural networks and reinforcement learning to routing and placement optimization in EDA workflows.

Recent News

  • Jul 2025: Contributed to the release of Skywork-R1V3 and its technical report.
  • Mar 2025: Joined Skywork AI as a Senior ML Research Scientist.
  • May 2024: Joined Baichuan as a Research Scientist working on RL and reasoning.
  • Jan 2024: Group Invariant Learning was accepted as an ICLR 2024 Spotlight paper.
  • Dec 2023: Our PPO work received the Best Paper award at the NeurIPS 2023 Instruction Workshop.

Current Focus

Reasoning and Agentic RL

Developing long-horizon RL training pipelines for general, coding, and multimodal agent tasks.

Multimodal Systems

Building multimodal reasoning models and production browser and search agents with stronger reliability.

Reliable Alignment

Studying reward design, post-training stability, and robustness under noisy, sparse, or gameable feedback.

Selected Projects

Tech Lead · 2025

Skywork-R1V3-38B

Led the July 2025 release that improved MMMU from 64.3% to 76.0% through GRPO and curriculum learning.

Core Contributor · 2025

Skywork-OR1

Contributed to scalable reasoning models that reached 82.2 on AIME24 and 73.3 on AIME25.

Core Contributor · 2023

MOSS-RLHF

Helped build one of the earliest open-source RLHF frameworks in China and a foundation for later reasoning post-training stacks.

Selected Publications

Skywork-R1V3 Technical Report

Technical report, July 2025 · Lead contributor

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards

arXiv preprint, January 2024 · Co-first author

Education and Highlights

Fudan University

M.S. in Computer Science, September 2021 – January 2024

Fudan NLP Lab. Advisors: Prof. Xuanjing Huang, Prof. Xipeng Qiu, and Prof. Tao Gui.

Huazhong University of Science and Technology

B.S. in Computer Science, September 2016 – May 2020

Changjun High School

High School, 2013 – 2016, Changsha

Recognition

Best Paper award at the NeurIPS 2023 Instruction Workshop, ICLR 2024 Spotlight, and sustained open-source impact across RLHF and reasoning systems.