ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 1 (25%) | 6.00 | 4.00 | 3660 |
| Fully human-written | 3 (75%) | 4.67 | 3.67 | 4249 |
| Total | 4 (100%) | 5.00 | 3.75 | 4102 |
**What is Missing? Explaining Neurons Activated by Absent Concepts**

Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

**Summary:** The paper argues that the common XAI approach assumes that neurons activate for present concepts and largely ignores neurons whose activation increases when a concept is absent. The authors claim to formalize this so-called encoded absence causally, explain how and why commonly used attribution and feature visualization methods miss it, and propose simple extensions, e.g., non-target attribution and so-called feature visualization by minimization, to reveal these relationships (a minimal sketch of the minimization idea appears after this review). They include experimental results on a couple of datasets, including ImageNet. In this reviewer's opinion, the draft motivates both methods and explicitly positions them as responses to a gap in current XAI evaluation protocols. In the authors' opinion, currently used protocols assess explanations only with respect to the given input, thus failing to evaluate concept absences.

**Strengths:**

1. In this reviewer's opinion, the clear, causal definition of “encoded absence” and the framing of absences as first-class explanatory objects, not merely negative saliency, represent a contribution, as they reframe minimization and multi-class saliency as tools for surfacing inhibitory evidence rather than merely as technical variants.
2. I should also state that I found the "mechanistic demonstrations" to be useful rather than cosmetic. For example, in Figure 3 a Hassenstein–Reichardt-style toy is implemented, where the first output unit encodes the absence of right-to-left motion; established target attributions and max-patch visualization fail there, while the proposed non-target attribution and minimization succeed. The green-pixel toy then shows a learned unit with positive weights to red/blue and negative weights to green, cleanly matching the “encoded absence” construction.
3. The draft acknowledges its own scope limits, such as the axis-aligned neuron assumption, the CNN focus, the scalability of non-target attribution, and the deferred LLM/ViT generality. I appreciate the authors' candour and their choice not to oversell the contribution.

**Weaknesses:**

1. In this reviewer's opinion, there is a "mental" leap: the connection between negative evidence and encoded absence is not fully causal. Non-target negative attributions on out-of-class inputs conflate many forms of anti-evidence. The draft sketches a causal definition (see Definition 2.1, referenced around the non-target discussion); however, the experiments rarely isolate just the hypothesized concept while holding all else constant. The authors may want to explore the issue further. Controlled counterfactuals may be useful in that regard. This is just one suggestion; other options may be more useful.
2. The ImageNet “least-activating patch insertion” result risks a distribution shift. Corners and copy-paste borders can depress activations for the wrong reasons. In my view, copying and pasting “least-activating” patches into corners could introduce artifacts, such as edges and texture breaks. The strong suppression may partly come from low-level off-manifold cues rather than from true encoding of concept absence; the current draft does not disentangle these effects.
3. Unless I am missing something, the paper notes that common protocols only evaluate with respect to pixels present in x. However, the experiment section does not introduce a general absence-aware metric beyond patch suppression.

**Questions:** See commentary on weaknesses.

**EditLens Prediction:** Fully human-written
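To make the reviewed idea of “feature visualization by minimization” concrete, here is a minimal sketch, assuming the method amounts to plain gradient descent on the input with the sign of the usual activation-maximization objective flipped. The torchvision VGG-19 backbone, layer index, channel, and hyperparameters are illustrative choices, not the paper's exact setup.

```python
# Minimal sketch of feature visualization by *minimization*: gradient-descend the
# input to lower a chosen channel's activation, i.e. activation maximization with
# the objective's sign flipped. Layer/channel indices and hyperparameters are
# illustrative only.
import torch
from torchvision import models

cnn = models.vgg19(weights="IMAGENET1K_V1").features.eval()
layer_idx, channel = 20, 5                      # hypothetical target unit
x = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)

def channel_activation(img):
    h = img
    for i, layer in enumerate(cnn):             # run only up to the probed layer
        h = layer(h)
        if i == layer_idx:
            return h[0, channel].mean()

for _ in range(200):
    opt.zero_grad()
    act = channel_activation(x)
    act.backward()                              # descending the raw activation,
    opt.step()                                  # so the optimizer minimizes it
    with torch.no_grad():
        x.clamp_(0, 1)                          # keep the synthetic image in range
```

A practical implementation would normally add the regularizers used in standard feature-visualization toolkits (jitter, total variation, frequency priors); the sketch only illustrates the flipped objective.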
**What is Missing? Explaining Neurons Activated by Absent Concepts**

Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

**Summary:** The paper investigates the concept of encoded absences in neural networks within the context of explainable AI (xAI). It introduces adaptations of existing xAI methods to visualize image structures that suppress specific classes or features. In the experiments, the authors demonstrate the approach using two toy examples and evaluate it on an image classification task with VGG-19 and ResNet-50 models. They further illustrate its applicability in a debiasing context.

**Strengths:**

- The paper is well written and presents the concept of **encoded absences** in a logical and structured manner, which helps the reader understand its significance.
- The experimental design is well thought out: the authors begin with conceptual and toy examples, then progress to image classification and finally to a debiasing task. This gradual development effectively builds understanding.
- The **debiasing task** is particularly valuable, and the authors propose a **promising** approach to addressing this important problem.

**Weaknesses:**

- Similarity to other xAI methods: While the paper introduces a novel analytical perspective, some of its ideas resemble concepts explored in existing explainable AI methods. For example, the notion of **negative features**, or features contributing against a prediction, has been addressed in approaches such as **Shapley values** [1]. Furthermore, the idea of leveraging the **entire dataset** to obtain a global understanding of both positive and negative influences aligns with the concepts of **prototypes and criticisms** introduced by Kim et al. [2] (2016). I do see the value in incorporating information from **other classes** to explain the **absence of features** in the target class; however, I feel this idea could potentially be **explored using existing techniques**.
- Experiment's motivation: The purpose of the experiment in **Section 5.3**, particularly the use of a **minimization patch**, is not entirely clear. It seems expected that such a patch would decrease the activation, as you specifically chose to include a patch that minimizes activation. Was this measured at the **last layer**? Additionally, it would be interesting to test the opposite case: what happens if you insert a highly activating patch (e.g., replacing the Border Collie’s snout with that of a Leonberger)? (A sketch of such a patch-insertion probe appears after this review.)
- Experiment's methodology: The examples in **Figure 6** appear to have been selected as a monosemantic subset of patches based on **visual similarity**, which is inherently qualitative. Was there any **systematic human evaluation** to support this categorization? Moreover, it could be insightful to **classify these patches individually** to examine whether they carry recognizable semantic meaning for the network; this could make the conclusions more robust.
- Discussion: Regarding **Figure 6**, were the channels shown important only for this specific class? If so, how does the **polymorphism of channels** (i.e., channels being important for multiple classes) affect the analysis? Clarifying this would strengthen the interpretation.

[1] Lundberg, S. M., & Lee, S.-I. (2017). *A Unified Approach to Interpreting Model Predictions*. Advances in Neural Information Processing Systems (NeurIPS).
[2] Kim, B., Khanna, R., & Koyejo, O. (2016). *Examples are not Enough, Learn to Criticize! Criticism for Interpretability*. Advances in Neural Information Processing Systems (NeurIPS).

**Questions:**

- The discussion on the limitations of existing explanation methods appears throughout the paper. To improve readability and structure, it may be more effective to consolidate these points into the state-of-the-art (SOTA) section instead.
- I include some questions above.

**EditLens Prediction:** Lightly AI-edited
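The following is a minimal sketch of the kind of patch-insertion probe discussed above, covering the reviewer's suggested “opposite” test by simply pasting a highly activating patch instead of a least-activating one. The ResNet-50 hook point, channel index, and random stand-in tensors are assumptions for illustration, not the paper's protocol.

```python
# Sketch of a patch-insertion probe: paste a patch into an image and compare a
# unit's activation before and after. Swapping in a highly activating patch gives
# the "opposite" test suggested in the review. Model, hook point, channel, and
# inputs are illustrative stand-ins.
import torch
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
acts = {}
model.layer4.register_forward_hook(lambda m, i, o: acts.update(out=o))

def unit_activation(img, channel=100):
    with torch.no_grad():
        model(img)
    return acts["out"][0, channel].mean().item()

def paste(img, patch, top=0, left=0):           # corner insertion by default
    out = img.clone()
    _, _, ph, pw = patch.shape
    out[:, :, top:top + ph, left:left + pw] = patch
    return out

img = torch.rand(1, 3, 224, 224)                # stand-in for a preprocessed test image
patch = torch.rand(1, 3, 56, 56)                # stand-in for a least-/most-activating patch

delta = unit_activation(paste(img, patch)) - unit_activation(img)
print(f"activation change after insertion: {delta:+.4f}")
```

Reporting this delta for patches pasted at several positions, not only corners, would also help separate border artifacts from genuine absence encoding, which connects to the distribution-shift concern raised in the first review.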
**What is Missing? Explaining Neurons Activated by Absent Concepts**

Soundness: 1: poor
Presentation: 2: fair
Contribution: 1: poor
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

**Summary:** This paper studies the phenomenon of encoded absences, which occurs when deep neural networks detect or use the absence of a given concept during inference. The paper first highlights some interpretability frameworks and explains why those frameworks do not or cannot capture encoded absence in their explanations. Then, it proposes two extensions, for feature attribution and feature visualization, aiming to capture this phenomenon. The feature visualization extension consists of an adaptation of the activation maximization method, where minimization is used in place of maximization. For feature attributions, the authors propose a “non-target” attribution, where the attribution of the original sample is compared against the attribution of one (or multiple) samples of a different class, but with respect to the target class. The paper presents applications in image classification and an extension to the debiasing case, where including absence information in the loss function improves model performance.

**Strengths:**

- With some exceptions about missing details (see below), the paper is well written.
- The application of encoded absence to debiasing techniques seems a promising and interesting future direction.

**Weaknesses:**

- **Novelty, Significance, and Contribution**: Both the concept of encoded absence and the extensions proposed by the authors have already been studied in the literature. For the extensions, as the authors noted, the algorithms are unchanged. Therefore, there is not enough contribution in the paper to support publication. More details are provided below:
  - At the conceptual level, the concept of **encoded absence in activations and neurons has already been extensively studied**. The authors attempt to distinguish their approach from counterfactuals by stating that *“our approach is applied on a neuron level allowing for a mechanistic understanding”*. However, encoded absence has already been examined in the context of neuron explanations. For example, previous work (Mu et al. [1]) explicitly includes the NOT operator in neuron explanations, and compositional explanations reflect the absence of certain concepts. The class of compositional explanations has been widely explored and extended in the literature. The same is true for the property of *“whose absence causes high activations, or vice versa, whose presence strongly suppresses the activation of a specific internal”*. These properties have been demonstrated in several works (e.g., [2, 3]), so Sections 2.1 and 2.2 do not represent novelty at the conceptual level. Note that the lack of acknowledgment of this area of work led the authors to **several overclaims** throughout the text about proofs related to this phenomenon.
  - Both of the **proposed extensions have also been previously introduced and widely studied in the literature**. The minimization objective for feature visualization is a well-known method, **as even acknowledged by the authors**. The same applies to feature attribution, where similar mechanisms have been used and proposed before. The authors argue that *”while these methods share the same underlying algorithms, their intent and interpretive framing differ fundamentally”*. From the reviewer’s perspective, this is not enough to substantiate a contribution. In fact, the proposed “interpretative framing” is tied to the concept of absence, which is itself not novel, as previously described.
  - To the best of my knowledge, the only novel application appears in the debiasing experiments (but I could be wrong). However, the paper’s narrative revolves around neurons and the concept of absence in general, which is not novel. The debiasing section is not presented as a main contribution and comprises only a small part of the experiments.

  Based on this analysis, this is the paper’s main issue, which cannot be resolved without a complete change of narrative. Other concerns are related to the following areas:
- **Missing details and structure**: Several details are missing in the main text (and the appendix does not clarify them). For example, how are non-target attributions computed at the mathematical level? Is it just a visual comparison of the same sample but with attributions to different classes (as seems to be the case from the code), or is there a mathematical process to select specific samples from other classes for the explanations? How are these samples selected? Is it just a visual comparison over the full dataset? (One plausible reading is sketched, purely as an assumption, after this review.)
- **Overgeneralized statements**: There are several broad statements such as *“standard XAI methods fail to explain encoded absence”* that are vague and inaccurate. The authors discuss a few methods in Section 4, highlighting their weaknesses with respect to encoded absence. However, the list is not exhaustive, and from a scientific perspective, listing all the methods that do not address a phenomenon is not informative. It would have been more useful to discuss methods similar to (or approaching) the proposed approaches and the differences among them. The current list omits several relevant works and includes some odd citations. For example, self-explainable DNNs are included but are not relevant in later sections or experiments, and some relevant references (e.g., [4], which addresses reasoning like “this does not look like that” and supports encoded absence) are excluded. This also applies to the work on neuronal explanations mentioned earlier. Listing just one or two examples per class is not sufficiently informative and cannot justify claims such as “standard XAI methods”.

[1] Mu, Jesse, and Jacob Andreas. "Compositional explanations of neurons." Advances in Neural Information Processing Systems 33 (2020): 17153-17163.
[2] La Rosa, Biagio, Leilani Gilpin, and Roberto Capobianco. "Towards a fuller understanding of neurons with clustered compositional explanations." Advances in Neural Information Processing Systems 36 (2023): 70333-70354.
[3] Oikarinen, Tuomas, and Tsui-Wei Weng. "Linear explanations for individual neurons." Proceedings of the 41st International Conference on Machine Learning, 2024.
[4] G. Singh and K.-C. Yow, "These do not Look Like Those: An Interpretable Deep Learning Model for Image Recognition," IEEE Access, vol. 9, pp. 41482-41493, 2021, doi: 10.1109/ACCESS.2021.3064838.

**Questions:** See weaknesses.

**EditLens Prediction:** Fully human-written
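Since this review notes that the computation of non-target attributions is not specified in the paper, the sketch below shows one plausible reading only, offered as an assumption rather than the authors' definition: a standard gradient × input saliency of the target class score, evaluated on a sample from a different class, to be compared with the same attribution on an in-class sample. The class index and random stand-in inputs are illustrative.

```python
# One *assumed* reading of "non-target attribution": compute an ordinary
# gradient x input saliency for the target class, but on a sample that belongs
# to a different class, then compare it with the attribution on an in-class
# sample. Class index and the random stand-in inputs are illustrative.
import torch
from torchvision import models

model = models.vgg19(weights="IMAGENET1K_V1").eval()

def grad_x_input(img, target_class):
    img = img.clone().requires_grad_(True)
    score = model(img)[0, target_class]          # target-class logit
    score.backward()
    return (img.grad * img).sum(dim=1)[0]        # collapse color channels -> H x W map

target_class = 232                               # illustrative ImageNet class index
x_in_class = torch.rand(1, 3, 224, 224)          # stand-in: image of the target class
x_other = torch.rand(1, 3, 224, 224)             # stand-in: image of a different class

attr_target = grad_x_input(x_in_class, target_class)    # usual target attribution
attr_non_target = grad_x_input(x_other, target_class)   # attribution w.r.t. the target
                                                         # class on an out-of-class input
```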
**What is Missing? Explaining Neurons Activated by Absent Concepts**

Soundness: 3: good
Presentation: 4: excellent
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

**Summary:** The authors propose modest modifications to existing interpretability and visualization methods in order to enable explainability from encoded absences. Specifically, they propose non-target attributions (extending from target attributions) and feature visualization through minimization (extending from maximization) via patches. They provide an illustrative toy example for each extension as well as a more realistic performance comparison (where previous XAI methods can’t capture encoded absences) on a real image dataset. They also show how this enhanced interpretability, which includes encoded absences, enables improved debiasing of image classification models, with another realistic example.

**Strengths:** Kudos to the authors for a well-written, compact, and conceptually clean paper that executes on a simple, but good, idea. The paper is easy to follow, which is important for adoption of the methodological extensions in various applications. It is about as thorough in explaining and demonstrating the utility of the proposed extensions as I can expect a 9-page paper to be, from providing carefully designed, illustrative toy examples to showing performance on more realistic image datasets. The debiasing example on real images is compelling. The Appendix helps the work feel complete, particularly the documentation of the experiments. The figures greatly help the reader understand the core contribution of the paper. Figures 4 and 5: I appreciated the notes to “zoom in”.

**Weaknesses:** Since the (two) updates to existing methods feel somewhat modest, my biggest concern is whether the scope of contribution is large enough for a main paper. However, the simplicity is also appealing – it’s a good idea! The Intro overpromises application in other fields (you name-drop “biological neural networks”), but the paper doesn’t illustrate a general use case beyond images. Furthermore, Appendix section C (Broader Impacts) doesn’t address which domains you would imagine your proposed methods to be useful in. I think you can be more honest about the scope illustrated in the paper while also suggesting specific problems and use cases for your methods in other domains.

**Questions:** Do you expect your proposed methods to have a use beyond images? Can you make this clearer in the Intro/Discussion/Appendix C (Broader Impacts) by specifically stating fields and problems where the absence of features can be important? Having a recommendation for where to apply these methodological extensions next would be great.

**Minor comments (not affecting score, suggestions to the authors):** In the introduction, so many key words are italicized that it’s hard to know which are most important; I would reconsider. Italicizing the same heavily used word (e.g., presence/present) over and over does not add value or clarity; just once, the first time, is sufficient. The rest of the paper seems to italicize appropriately. Additionally, italicizing the word “biological” felt misleading, since the paper didn’t contain a good biological example other than the older, cited Drosophila anecdote. Lines 299-300: missing an “or”?

**EditLens Prediction:** Fully human-written