ICLR 2026 - Submissions


Submissions

Summary Statistics

Quantity AI Content    Count       Avg Rating
0-10%                  0 (0%)      N/A
10-30%                 0 (0%)      N/A
30-50%                 1 (100%)    2.50
50-70%                 0 (0%)      N/A
70-90%                 0 (0%)      N/A
90-100%                0 (0%)      N/A
Total                  1 (100%)    2.50
Title: Stable Preference Optimization: Learning preference is more important than imitation
Abstract: Direct Preference Optimization (DPO; Rafailov et al., 2023) is a widely used method for aligning large language models (LLMs) with human feedback. However, its objective often leads to reward hacking...
Avg Rating: 2.50
Quantity AI Content: 50%
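For context, a minimal sketch of the standard DPO objective from Rafailov et al. (2023) that the abstract critiques; here $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ the frozen reference model, $\beta$ the KL-penalty strength, and $(x, y_w, y_l)$ a prompt paired with preferred and dispreferred responses:

\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]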