ICLR 2026 - Submissions


Submissions

Summary Statistics

Quantity AI Content    Count       Avg Rating
0-10%                  0 (0%)      N/A
10-30%                 0 (0%)      N/A
30-50%                 1 (100%)    2.50
50-70%                 0 (0%)      N/A
70-90%                 0 (0%)      N/A
90-100%                0 (0%)      N/A
Total                  1 (100%)    2.50
Title: Stable Preference Optimization: Learning preference is more important than imitation
Abstract: Direct Preference Optimization (DPO; Rafailov et al., 2023) is a widely used method for aligning large language models (LLMs) with human feedback. However, its objective often leads to reward hacking...
Avg Rating: 2.50
Quantity AI Content: 50%
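For context, a minimal sketch of the standard DPO objective from Rafailov et al. (2023) that the abstract critiques; here $\pi_\theta$ is the policy being trained, $\pi_{\text{ref}}$ the frozen reference model, $\beta$ the KL-penalty strength, and $(x, y_w, y_l)$ a prompt paired with preferred and dispreferred responses:

\mathcal{L}_{\text{DPO}}(\pi_\theta; \pi_{\text{ref}}) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}} \left[ \log \sigma\!\left( \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)} \right) \right]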