“In the middle of difficulty lies opportunity.” –Albert Einstein

About me

Welcome to my academic page! My name is Wei Shen (沈蔚). I am excited to share my journey and research endeavors in the field of artificial intelligence. I am currently a Research Scientist at Baichuan Inc., where I work with Dong Yan on improving the capabilities of the Baichuan-series LLMs.

I completed my bachelor’s degree at Huazhong University of Science and Technology (HUST) in 2020, and my master’s degree at Fudan University, where I worked in the NLP Lab under the guidance of Xuanjing Huang as my advisor, with Qi Zhang as my co-advisor.

I am passionate about leveraging language models to enhance various aspects of AI and contribute to the development of advanced and responsible AI systems.

Research Interests

My research interests primarily revolve around Natural Language Processing (NLP), with a specific focus on LLM alignment, including reward modeling and reinforcement learning (RL). I am also eager to explore the realm of LLM agents and contribute to advancements in that area.

Education

  • M.Eng., Fudan University, NLP Lab, 2021-2024, Shanghai
  • B.Eng., Huazhong University of Science and Technology, 2016-2020, Wuhan
  • High School, Changjun High School, 2013-2016, Changsha

Experience

  • 2024.5 - present: Research Scientist
    • Baichuan Inc.
  • 2023.8 - 2024.4: Research Internship
    • ByteDance AI Lab, Responsible AI team
    • Supervisors: Yang Liu, Xiaoying Zhang
  • 2021.9 - 2022.3: Research Internship

News

[2024.5.22] I am proud to announce that I have joined Baichuan Inc. as a Research Scientist!

[2024.1.16] Our paper “Improving Generalization of Alignment with Human Preferences through Group Invariant Learning” has been accepted as a Spotlight (top 5%) at ICLR 2024!

[2023.12.15] Our paper “Delve into PPO: Implementation Matters for Stable RLHF” has won the Best Paper Award at the Instruction Workshop @ NeurIPS 2023!

Projects

MOSS-RLHF (an open-source RLHF project that aims to help LLMs achieve alignment easily)

Publications

2024

Linear Alignment: A Closed-form Solution for Aligning Human Preferences without Tuning and Feedback (ICML 2024)

  • Songyang Gao★, Qiming Ge★, Wei Shen, Shihan Dou, Junjie Ye, Xiao Wang, Rui Zheng, Yicheng Zou, Zhi Chen, Hang Yan, Qi Zhang, Dahua Lin

Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning (ICML 2024)

  • Zhiheng Xi★, Wenxiang Chen★, Boyang Hong★, Senjie Jin★, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang

LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment (ACL 2024)

  • Shihan Dou★, Enyu Zhou★, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang

Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards (Preprint)

  • Wei Shen★, Xiaoying Zhang★, Yuanshun Yao, Rui Zheng, Hongyi Guo, Yang Liu

Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation (Preprint)

  • Xiaoying Zhang★, Jean-Francois Ton★, Wei Shen, Hongning Wang, Yang Liu

Secrets of RLHF in Large Language Models Part II: Reward Modeling (Preprint)

  • Binghai Wang★, Rui Zheng★, Lu Chen★, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

Human-Instruction-Free LLM Self-Alignment with Limited Samples (Preprint)

  • Hongyi Guo★, Yuanshun Yao★, Wei Shen, Jiaheng Wei, Xiaoying Zhang, Zhaoran Wang, Yang Liu

2023

Mitigating Length Bias in Reinforcement Learning from Human Feedback (EMNLP 2023 Findings)

  • Wei Shen★, Rui Zheng★, Wenyu Zhan, Jun Zhao, Shihan Dou, Tao Gui, Qi Zhang, Xuanjing Huang

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning (ICLR 2024 Spotlight)

  • Rui Zheng★, Wei Shen★, Yuan Hua, Wenbin Lai, Shihan Dou, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang

Delve into PPO: Implementation Matters for Stable RLHF Training (Instruction Workshop @ NeurIPS 2023, Best Paper) (a.k.a. Secrets of RLHF in Large Language Models Part I: PPO)

  • Rui Zheng★, Shihan Dou★, Songyang Gao★, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Yuan Hua, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang