Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The paper proposes ProtAlign, a multi-objective preference-alignment framework for protein inverse folding that optimizes developability properties without compromising designability. The method uses a semi-online DPO loop: generate rollouts at higher temperature, score them with property predictors, construct pairwise preferences per property, then train offline with an adaptive preference margin to reconcile conflicts among objectives. Instantiated on ProteinMPNN as MoMPNN, the approach is evaluated on CATH 4.3, de novo backbones from RFDiffusion, and realistic de novo binders; results show developability gains while maintaining or improving structural consistency relative to strong baselines.
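For concreteness, the rollout-and-scoring phase of the semi-online loop described above might look roughly like the following (a minimal sketch; `semi_online_round`, `sample_fn`, and `score_fns` are hypothetical names, not the paper's API):

```python
def semi_online_round(sample_fn, score_fns, backbones, n_rollouts, temperature):
    """One rollout-and-scoring phase of the semi-online loop (hypothetical API).

    sample_fn(backbone, temperature) -> sequence
    score_fns: dict mapping property name -> predictor(sequence) -> float
    Returns (backbone, sequence, {property: score}) records that the offline
    DPO step would later turn into per-property preference pairs.
    """
    records = []
    for bb in backbones:
        for _ in range(n_rollouts):
            # sample at an elevated temperature for exploration
            seq = sample_fn(bb, temperature)
            # score every rollout with each in silico property predictor
            records.append((bb, seq, {p: f(seq) for p, f in score_fns.items()}))
    return records
```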
- Method is simple and general: multi-objective DPO with an adaptive preference margin to mitigate conflicts across properties; the training pipeline evenly samples pairwise entries across properties and alternates rollout and training for efficiency.
- Practical semi-online training decouples rollout/evaluation from optimization, enabling batch computation and easier deployment while retaining online exploration benefits.
- Evaluations are broad and application-relevant: crystal redesign, de novo backbones, and realistic binder design; the study systematically integrates developability metrics into inverse-folding evaluation beyond amino acid recovery.
- The presentation is good, with clear figures and an easy-to-follow narrative.
- Limited ablations on multi-objective weights and margin settings. It would be helpful to quantify how the weights, sampling temperature, and margin thresholds shape the Pareto front, and to provide transferable default configurations, since the method relies heavily on these hyperparameters.
- The adaptive preference margin m(yw,yl) is precomputed from auxiliary property deltas and then kept fixed during training. This is simple and fast, but it cannot react if the policy distribution drifts, predictors recalibrate, or property trade-offs evolve; the “right” margin may change as the frontier moves.
- Pair construction may over-represent “easy wins” and under-sample ambiguous regions. Preference pairs are formed by sorting rollouts and pairing top-half vs. bottom-half, with a delta threshold to drop uncertain pairs. While this stabilizes supervision, it can bias learning away from the decision boundary where the frontier is decided. Active pair mining (hard-negative selection) or uncertainty-aware sampling could help learn more from the ambiguous region and reduce label imbalance across properties.
- Can the weights across properties and the adaptive margin be tuned online using objective-improvement rates to more reliably approach a Pareto front across backbones and lengths?
- What is the effect of the number of rollouts and sampling temperature on the stability of training and final metrics in the semi-online loop, given that the paper uses a higher temperature for exploration but evaluates at a lower temperature for ProteinMPNN-family models?
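To make the pair-construction concern above concrete, here is a minimal sketch of the sorting-based scheme as the review understands it (names and exact tie-breaking are assumptions, not the paper's implementation):

```python
def make_pairs(rollouts, scores, delta):
    """Sort rollouts by one property score, pair top half against bottom half,
    and drop pairs whose score gap is below `delta` (too ambiguous).
    Returns (winner, loser) pairs of rollouts."""
    order = sorted(range(len(rollouts)), key=lambda i: scores[i], reverse=True)
    half = len(order) // 2
    pairs = []
    for w, l in zip(order[:half], order[half:]):
        if scores[w] - scores[l] >= delta:
            pairs.append((rollouts[w], rollouts[l]))
    return pairs
```

Note that every pair straddles the median, so samples near the decision boundary (small score gaps) are systematically filtered out by `delta`; a hard-negative variant would instead preferentially keep pairs just above the threshold.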
Moderately AI-edited
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper applies multi-objective preference optimization to protein inverse folding, using semi-online DPO with adaptive margins to balance structural accuracy against properties such as solubility and thermostability. The resulting model, MoMPNN, beats existing baselines across several benchmarks. The approach is solid but not particularly novel: it essentially transplants techniques from LLM alignment into protein design. That said, the execution is strong: the experiments are thorough and comprehensive, the amino acid distribution analysis shows the model learns sensible patterns, and the framework appears general enough to extend to other properties.
See summary
See summary
No questions.
Moderately AI-edited
Property-Driven Protein Inverse Folding with Multi-Objective Preference Alignment
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
The paper addresses the challenge that protein inverse folding models must balance designability (recovering sequences consistent with a target backbone) with developability properties (e.g., solubility, thermostability). The authors propose ProtAlign, a multi-objective preference alignment framework that fine-tunes pretrained models using a semi-online Direct Preference Optimization (DPO) strategy. The method uses a flexible preference margin to mitigate conflicts between competing objectives and constructs preference pairs using in silico property predictors. Applying this to ProteinMPNN yields MoMPNN. Experiments on CATH 4.3 crystal structures, de novo backbones, and binder design scenarios show that MoMPNN enhances developability properties without compromising structural fidelity compared to baselines.
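For reference, the flexible preference margin mentioned above presumably enters a standard DPO objective as an additive term, along the lines of the following illustrative sketch (not the paper's exact formulation; `beta` and the margin values are placeholders):

```python
import math

def dpo_margin_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, margin, beta=0.1):
    """Per-pair DPO loss with an additive preference margin:
    L = -log sigmoid(beta * delta - margin).

    logp_* are sequence log-probs under the policy; ref_logp_* are under the
    frozen reference model. A larger `margin` demands a wider likelihood gap
    between winner and loser before the loss saturates.
    """
    advantage = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-(advantage - margin))))
```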
This method improves developability metrics through a preference alignment framework that does not require additional curated datasets of experimentally validated proteins.
The authors evaluate MoMPNN on a strong set of tasks beyond standard sequence recovery. This includes redesigning CATH 4.3 crystal structures, designing sequences for de novo generated backbones, and a practical de novo binder design scenario. This rigorous evaluation demonstrates the method's utility in realistic design workflows where other baselines show performance degradation.
It would be better to also report these metrics on the ground-truth sequences, since the metrics rely on prediction models as approximations.
Full names of the abbreviations used in the tables are missing from the captions.
The temperatures used at inference for the different baselines are not identical, which may make the comparison unfair. A fair comparison would use either greedy decoding (no temperature sampling) or the best point on each method's temperature-performance curve; at minimum, results under one identical temperature should be reported.
Is there a typo in Eq. 4? The index $k$ appears in the formula for $m$, yet $m$ seems unrelated to $k$.
An explanation of the relationship between $\mathcal{L}$ and $\mathcal{L}_{MO}$ is needed.
Why is the AAR of ProteinMPNN on the CATH 4.3 test set 0.39? This seems lower than most reproductions of this model (e.g., 0.44 on CATH 4.3 was reported in ProteinInvBench). If this AAR is incorrect, does the corrected baseline indicate a significant compromise of AAR?
RL-based preference methods for inverse folding, such as ProteinDPO, are discussed in the related work section. Why are they not compared as baselines? They would seem to be the most closely related ones.
Regarding the semi-online training strategy, is the preference dataset $\mathcal{D}_k$ at iteration $t$ cumulative (containing all rollouts from iterations $1 \dots t$), or is it replaced entirely by the new rollouts?
The paper provides a compelling comparison against a "Weighted-score DPO" baseline in Appendix A.2, showing MoMPNN is more stable. Can the authors provide more intuition on why the flexible margin (Eq. 4) achieves better and more stable multi-objective optimization compared to directly optimizing a weighted sum of preference scores?
The model is trained on protein monomers but evaluated on a de novo binder design task, which involves protein complexes. Did the authors observe any specific failure modes or performance issues at the binder-target interface, given that the model was not explicitly trained for complex-specific properties?
Fully human-written