Publications

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning

Preprint, 2023

Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback

EMNLP 2023 Findings, 2023

Secrets of RLHF in Large Language Models Part I: PPO

Instruction Workshop @ NeurIPS 2023, 2023