ICLR 2026 - Submissions
Summary Statistics
| AI Content | Count | Avg Rating |
|---|---|---|
| 0-10% | 0 (0%) | N/A |
| 10-30% | 0 (0%) | N/A |
| 30-50% | 1 (100%) | 5.00 |
| 50-70% | 0 (0%) | N/A |
| 70-90% | 0 (0%) | N/A |
| 90-100% | 0 (0%) | N/A |
| Total | 1 (100%) | 5.00 |

| Title | Abstract | Avg Rating | AI Content | Reviews | Pangram Dashboard |
|---|---|---|---|---|---|
| Principled Policy Optimization for LLMs via Self-Normalized Importance Sampling | Reinforcement Learning from Human Feedback (RLHF) is a key technique for aligning Large Language Models (LLMs) with human preferences. While Proximal Policy Optimization (PPO) is the standard algorith... | 5.00 | 39% | See Reviews | View AI Dashboard |