Circuits, Features, and Heuristics in Molecular Transformers
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper conducts a mechanistic interpretability study of autoregressive molecular transformers. It identifies attention head circuits responsible for SMILES syntactic correctness, discovers a linear residual feature encoding valence capacity that influences bond order prediction, and uses sparse autoencoders to extract chemically meaningful features. The authors further validate these insights through downstream molecular property prediction tasks, showing that SAE-derived features provide competitive or complementary predictive performance.
1. The work focuses on internal mechanisms related to SMILES syntax handling (e.g., ring closure and branch matching), which is directly relevant to the problem of generating syntactically valid molecules. This aligns with concerns in molecular generation research, where validity constraints are important.
2. The identification of a linearly readable representation related to valence capacity in the residual stream is interesting and may offer conceptual hints for improving structural consistency in generative models.
3. The use of sparse autoencoders to decompose intermediate representations into more interpretable latent features is an exploratory direction that could be valuable for connecting molecular language models to chemically meaningful concepts.
1. Limited connection to practical improvements in molecular generation
Although the paper reveals mechanisms for syntax and valence representation, it is not yet clear how these findings could be used to improve molecular generative performance (e.g., validity, novelty, or property-aware design). The interpretability results currently feel more diagnostic than actionable.
2. Interpretability of SAE-derived features varies considerably
While some features appear to align with recognizable chemical motifs (e.g., urea groups), a substantial portion remains difficult to interpret or validate. The paper also acknowledges this, which makes it unclear how reliably these features can guide model understanding or downstream applications.
3. Downstream evaluation suggests modest and inconsistent performance improvements
On some TDC tasks, SAE features are competitive with or complementary to ECFP, but the overall improvements are relatively small and task-dependent. It is difficult to determine whether the observed performance differences justify the additional complexity introduced by the SAE analysis.
4. Comparisons do not include more expressive molecular representation methods
The experimental evaluation compares primarily against ECFP and the raw LM embeddings. However, recent advances in graph neural networks and 3D-aware models have shown strong performance in molecular property prediction. Without comparison to these stronger baselines, it is hard to contextualize the practical value of the proposed interpretability findings.
5. Causality interpretation remains preliminary
The valence budgeting direction in the residual space is evaluated via linear probing and steering, which provides suggestive but not conclusive causal evidence. The paper itself is careful about this, but it still means the main chemical reasoning claims remain partly correlational.
- Do SAE features provide measurable benefit beyond ECFP or learned embeddings when used in larger-scale or more challenging prediction settings?
It would help to better understand whether the value of SAE features extends beyond interpretability.
- Can the authors provide any quantitative measure or estimate of how many SAE features are consistently interpretable across runs?
This would help clarify how robust and generalizable the feature discovery pipeline is.
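One possible quantitative measure, sketched here under the assumption that each SAE run exposes its decoder directions as vectors, is the fraction of features in one run whose best cosine match in another run (for example, a different random seed) exceeds a threshold. The function names, the 0.7 threshold, and the toy vectors are hypothetical illustrations, not the submission's pipeline.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cross_run_overlap(run_a: List[Sequence[float]],
                      run_b: List[Sequence[float]],
                      threshold: float = 0.7) -> float:
    """Fraction of decoder directions in run_a whose best cosine match in
    run_b reaches `threshold` (a hypothetical default of 0.7)."""
    if not run_a or not run_b:
        return 0.0
    matched = 0
    for feature in run_a:
        best = max(cosine(feature, other) for other in run_b)
        if best >= threshold:
            matched += 1
    return matched / len(run_a)


# Toy 3-dimensional decoder directions standing in for two SAE seeds.
run_1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
run_2 = [[0.9, 0.1, 0.0], [0.0, 0.0, -1.0], [0.1, 0.95, 0.0]]
print(cross_run_overlap(run_1, run_2))  # 2 of 3 features have a close match
```

Averaging this fraction over several seed pairs, and restricting it to the features that were judged interpretable, would give the kind of robustness estimate asked for above. Sign matters in this sketch: with non-negative SAE activations, a flipped decoder direction is counted as a different feature.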
Fully AI-generated

Circuits, Features, and Heuristics in Molecular Transformers
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The work investigates the internal representations learned by Transformer models when applied to molecular data (SMILES representation). By analyzing attention patterns and feature extraction, the authors aim to uncover how these models capture chemical properties and relationships. The study utilizes sparse autoencoders to offer fresh insights into the interpretability of Transformer architectures in the context of cheminformatics.
I am in the borderline region for this paper. The interpretability perspective is fresh for small molecules, yet the discovered insights provide limited actionable information for the design of better architectures or for improving downstream performance.
1. A fresh perspective on chemical language models is presented using sparse autoencoders. Interpretability is always of interest in the molecular sciences.
2. A broad and in-depth analysis is conducted, and syntactic and semantic patterns are identified.
1. The analysis is interesting, but it is unclear how these insights can be used to design better architectures or improve downstream performance. The practical implications of the findings are limited.
2. Similar analyses of syntactic patterns were already conducted in an earlier molecular transformer paper (MolGPT; pubs.acs.org/doi/10.1021/acs.jcim.1c00600).
3. The patterns are extracted from a single transformer, trained for unconditional molecule design and used for property prediction. It would be interesting to study whether the same patterns hold across different models, and which semantic patterns emerge when generating task-specific molecules, e.g., in bioactive molecule design.
1. Can you use the discovered insights to edit transformers? For instance, can the learned circuits be modified to improve performance?
2. Can the authors elaborate on the implementation details of the trained transformer? What is the training data and model configuration? Is the model trained with or without SMILES augmentation?
3. How would SMILES augmentation affect the learned patterns?
Lightly AI-edited

Circuits, Features, and Heuristics in Molecular Transformers
Soundness: 2: fair
Presentation: 4: excellent
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This study explores how to interpret what Chemical Language Models (CLMs) trained on SMILES are actually learning. Then, using Sparse AutoEncoders (SAEs), the authors extract higher-level features that are useful for several predictive tasks.
1. The paper provides a novel perspective on the mechanistic interpretability of CLMs, focusing on syntactic rules such as ring opening and closing, or valence budgeting.
2. Although the use of SAEs for interpretability in LMs is not new, the approach proposed here, which combines SAEs with SMARTS patterns, is novel to the best of my knowledge.
3. The analysis of the SAE features is fair and the limitations are properly acknowledged. It is interesting that human expert selection is still needed, despite the role of the automatic pipeline in identifying promising candidate features.
4. The experimental validation on the TDC ADMET prediction tasks provides important context for the benefits, or more accurately, the informativeness, of the SAE features. The results with SAE features alone are convincing when compared to the LM or LM-PCA approaches.
1. Statistical robustness: the analysis of causal impact in Sections 3.1 and 3.2 lacks an appropriate statistical analysis. I'm not entirely sure which statistical test would be the most appropriate, but considering the number of implicit hypotheses being tested, I think it is important to ensure that the results are not spurious. Similarly, Table 1 and Figure 2 should report dispersion metrics (or true confidence intervals) computed across different samples to show the uncertainty of the results.
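To make this request concrete, one possible approach (a sketch, not the authors' analysis) is to report percentile-bootstrap intervals for the scores in Table 1 and Figure 2, and to control the family-wise error rate across the implicit hypotheses with the Holm-Bonferroni step-down procedure. The function names, the 2,000 resamples, and the example scores and p-values below are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_ci(values: Sequence[float], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap (1 - alpha) confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample with replacement and record the mean of the resample.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    k = int(round(alpha / 2 * n_boot))  # resamples cut from each tail
    return means[k], means[n_boot - k - 1]


def holm_bonferroni(p_values: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down procedure; returns True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection; larger p-values are kept
    return reject


# Illustrative numbers only: per-seed scores for one task, and p-values
# from several hypothetical intervention tests.
scores = [0.80, 0.82, 0.79, 0.85, 0.81]
print(bootstrap_ci(scores))
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))  # [True, False, False, True]
```

Resampling could be done over test molecules or over training seeds; which unit is appropriate depends on the source of variance the authors want to report.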
None
Fully human-written

Circuits, Features, and Heuristics in Molecular Transformers
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
The focus of the paper is a mechanistic analysis of an autoregressive transformer trained on drug-like molecules that elucidates the computational structure behind chemical inference.
The paper addresses an important task: understanding how deep learning architectures mechanistically solve inference tasks in a way that is congruent with the scientific structure of the domain and can be assessed by subject matter experts.
Applying sparse autoencoders clearly helps with feature engineering for pharmacological tasks.
The authors are unreasonably generous with the term "chemical reasoning". Neither the model they chose nor the analysis they performed rises to the level of reasoning; what is shown is only good old correlation.
It is not obvious what exactly one gains from the mechanistic analysis performed on SMILES. SMILES have a peculiar syntax, and any model that handles SMILES has to be able to deal with it. We know that transformers can deal with SMILES; what has changed in our understanding once we learned which head tracks the positions of opening and closing parentheses? It would be interesting to see this type of analysis applied to non-trivial abstractions, but parsing the syntax of the primary representation, such as SMILES, is not enlightening.
If "chemical reasoning" comes down to capturing SMILES syntax, it is not a meaningful bar to cross for an ICLR paper in 2025.
Please unpack the phrase: "Understanding how these models encode chemical knowledge enables more targeted interventions, better failure diagnosis, and principled approaches to model improvement". None of the "enabled" items immediately follows from the study.
Fully human-written