Improving Generalization of Alignment with Human Preferences through Group Invariant Learning preprint, 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback EMNLP 2023 Findings, 2023