Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison (Apple Machine Learning Research)
Aligning language models to human preferences requires data that reveal those preferences. Ideally, time and money would be spent carefully collecting and tailoring bespoke preference data for each downstream application. In practice, however, a select few publicly available preference datasets are often…