ICLR 2026 - Reviews


Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 2 (50%) | 5.00 | 2.50 | 2224 |
| Moderately AI-edited | 1 (25%) | 4.00 | 3.00 | 3153 |
| Lightly AI-edited | 1 (25%) | 2.00 | 4.00 | 4575 |
| Fully human-written | 0 (0%) | N/A | N/A | N/A |
| Total | 4 (100%) | 4.00 | 3.00 | 3044 |
Reviews
### Review 1

**Title:** Usage-Aware Sentiment Representations in Large Language Models

**Soundness:** 1: poor · **Presentation:** 1: poor · **Contribution:** 2: fair · **Rating:** 2: reject
**Confidence:** 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

**Summary:**

In this paper, the authors investigate how sentiment is represented in Large Language Models (LLMs). They argue that a single, universal sentiment "direction" is insufficient and propose a "usage-aware" framework that incorporates linguistic factors like tone, topic, context, and genre. They train linear probes on both a pooled dataset and usage-specific datasets, showing that combining these probes leads to marginally improved sentiment classification performance. They also analyze sentiment at the neuron level, identifying "usage-invariant" and "usage-sensitive" neurons. Finally, they demonstrate that these derived axes can be used for activation steering, and that the component of a usage axis orthogonal to the main sentiment axis can modulate stylistic elements without altering the core sentiment.

**Strengths:** See below.

**Weaknesses:**

## Overall Rating

I recommend rejecting this paper. I found the paper very difficult to follow, with key methodological details unclear. Based on my current understanding, the core results are either marginal performance gains that are not particularly surprising, or novel claims that lack rigorous evidence. I am open to the possibility that I have misunderstood key aspects of the work and would welcome clarification from the authors, but in its current state, I cannot recommend acceptance.

## Major Comments

*The following are things that, if adequately addressed, may change my score.*

1. **Methodological Clarity:** The paper is difficult to parse due to a lack of clarity on crucial methodological points, making it harder to interpret the results.
    * **"Usage-Specific" Datasets:** The concept of a "usage-specific" dataset is core to the paper but is not clearly defined until the appendices (line 675). The main text should precisely explain how these datasets are constructed and how they make a particular usage factor salient for a sentiment classification task. My initial assumption was that a "tone" probe would be trained on data of a single tone, or to predict tone itself. Instead, it appears a single probe is trained on a collection of texts with varied tones, with the hope that tone is the most salient differentiating factor.
    * **"Main+Sub" Combination:** The paper's central results rely on a "Main+Sub" axis combination (e.g., Table 1, line 217), but this operation is never defined. Assuming it is vector addition, this is not a standard or principled method for combining probe directions, and its meaning is unclear (see the sketch after this review). The paper should define this operation and justify its use before any conclusions can be drawn from it.
2. **Limited Significance of Classification Results:** The main quantitative result—that "Main+Sub" axes slightly outperform the "Main" axis alone—is not particularly surprising. Supervised methods like linear probing generally perform better when the training data is more similar to the evaluation data. By creating more specialized probes and combining them, it is expected that they would better cover the nuances of diverse evaluation sets, leading to marginal gains.
3. **Confounded Geometric Analysis:** The authors claim that because the "main axis lies within the usage subspace" (lines 413-417), sentiment is "largely shaped by usage-conditioned variation." This conclusion is built on a confounded experimental design. The "main" probe was trained on the union of the datasets used for the usage-specific probes. It is therefore tautological that the resulting "main" vector would be well approximated by the span of the usage vectors. Even if it were constructed from separate datasets, this observation would not rule out the simpler hypothesis that there is a single primary sentiment direction and the usage-specific probes are merely noisy measurements of it.
4. **Qualitative Steering Evidence Is Unconvincing:** The idea that steering with the orthogonal component of a usage axis can alter style without changing sentiment (Table 5, line 388) is potentially interesting. However, the qualitative examples provided are not compelling enough to serve as strong evidence. The effect seems subtle, and it is difficult to judge its consistency and magnitude from a few examples. The authors do not state whether these examples were cherry-picked or selected randomly. To be compelling, this claim requires a rigorous, large-scale evaluation, for example using an LLM judge to rate sentiment and stylistic attributes across a randomized set of prompts, to demonstrate a statistically significant and meaningful effect.

**Questions:** See above.

**EditLens Prediction:** Lightly AI-edited
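For concreteness, the two vector operations this review questions can be written out in a few lines. The following is a minimal NumPy sketch, not the paper's actual method: it assumes "Main+Sub" means unit-normalized vector addition of probe directions (one plausible reading of the undefined operation), and all names, dimensions, and values are hypothetical random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # hypothetical hidden size

def unit(v):
    """Scale a vector to unit length."""
    return v / np.linalg.norm(v)

# Random stand-ins for probe-derived directions.
main_axis = rng.normal(size=d)   # probe trained on the pooled dataset
usage_axis = rng.normal(size=d)  # probe trained on one usage-specific dataset

# One plausible reading of "Main+Sub": unit-normalized vector addition.
combined_axis = unit(unit(main_axis) + unit(usage_axis))

# Orthogonal component of the usage axis (the quantity in point 4):
# subtract the projection of usage_axis onto the main sentiment axis.
m = unit(main_axis)
usage_orth = usage_axis - (usage_axis @ m) * m

# Steering adds a multiple of usage_orth to a hidden state h. By
# construction this cannot move h along the main sentiment axis.
h = rng.normal(size=d)
h_steered = h + 4.0 * unit(usage_orth)
assert np.isclose(h @ m, h_steered @ m)  # sentiment coordinate unchanged
```

Note that the final assertion only guarantees that the sentiment *coordinate* is preserved geometrically; whether generated text actually preserves sentiment under such steering is the behavioral claim that point 4 argues needs large-scale evaluation.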
### Review 2

**Title:** Usage-Aware Sentiment Representations in Large Language Models

**Soundness:** 2: fair · **Presentation:** 2: fair · **Contribution:** 2: fair · **Rating:** 6: marginally above the acceptance threshold
**Confidence:** 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

**Summary:**

This paper introduces a linguistically grounded framework for modeling sentiment representations in LLMs through usage-aware axes. By decomposing sentiment into usage-specific factors and analyzing both axis- and neuron-level behaviors, the study enhances interpretability and reliability.

**Strengths:**

1. The core idea grounds sentiment variability in explicit linguistic usage factors, offering a highly interpretable framework for LLM sentiment analysis, superior to distributional methods that sacrifice linguistic meaning.
2. The usage-aware axes achieved a substantial average improvement in cross-domain sentiment classification accuracy, demonstrating superior robustness and transferability.

**Weaknesses:**

1. The core dataset is synthetically generated by ChatGPT-4.0, which risks introducing model-induced biases and may fail to capture the subtle linguistic nuances of natural human-generated text.
2. The analysis is restricted to four predefined usage factors (tone, topic, context, genre), neglecting other key influences on sentiment variability, such as sarcasm or cultural context.

**Questions:** Please see the Weaknesses.

**EditLens Prediction:** Heavily AI-edited
### Review 3

**Title:** Usage-Aware Sentiment Representations in Large Language Models

**Soundness:** 3: good · **Presentation:** 2: fair · **Contribution:** 2: fair · **Rating:** 4: marginally below the acceptance threshold
**Confidence:** 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

# Summary

This paper proposes a Usage-Aware affective representation framework to address the instability of probed sentiment directions across datasets, which harms downstream reliability. The authors attribute this variability to linguistic usage factors—such as tone, topic, context, and genre—and introduce two complementary analyses:

- Axis Level: A usage-invariant axis is derived as the intersection of sentiment axes from aggregated and usage-specific data, capturing core affective signals.
- Neuron Level: Neurons are classified as usage-invariant or usage-sensitive for fine-grained interpretation.

# Strengths

1. The paper explicitly identifies the reliability issue of probe-derived sentiment axes in cross-dataset applications and innovatively links this problem to linguistic "usage" factors, offering a theoretically grounded explanation for variability in representation learning.
2. The framework conducts complementary analyses at both the axis and neuron levels. The intersection operation at the axis level is the core contribution addressing cross-dataset consistency, while the distinction at the neuron level provides a fine-grained understanding of the model's internal mechanisms.

# Weaknesses

1. This is the biggest barrier to the practical deployment of the framework. The core of the method—computing the intersection between usage-specific axes and the invariant axis—relies heavily on fine-grained, high-quality labels of usage factors (e.g., tone, genre, topic) in the training data. In real-world scenarios, acquiring such detailed, multi-dimensional annotations is extremely costly, difficult to label consistently, and prone to low inter-annotator agreement. This severely limits the method's potential for real-world adoption and scalability.
2. The paper introduces four usage dimensions—genre, tone, context, and topic—but does not sufficiently justify their mutual exclusivity or independence. For example, tone and genre may be highly correlated (e.g., tweets often inherently carry an informal tone). Without theoretical or empirical evidence demonstrating that these dimensions can be treated as orthogonal, decomposing sentiment representations along them risks conflating interdependent factors, potentially inflating the perceived benefit of usage-specific axes.
3. The authors should analyze the distribution of usage factors across the benchmark datasets. It is possible that the observed performance gains stem not from the proposed method itself, but from alignment between the usage distributions in the training data and those in the evaluation benchmarks.
4. Although probes are trained at every layer, the paper only reports aggregated results and does not investigate which layers benefit most from usage-specific axes. Understanding how usage-aware sentiment signals evolve across model depth would significantly strengthen the interpretability of the representations and provide practical guidance for deployment, e.g., which layer to use for feature extraction (a minimal layer-wise sweep is sketched after this review).
5. Typos: page 16, line 831: the sentence "We use instruction-tuned versions of three popular decoder-only LLMs." is repeated.

**Questions:** Refer to the Weaknesses.

**EditLens Prediction:** Moderately AI-edited
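The layer-wise analysis requested in weakness 4 is cheap to run once per-layer activations are cached. The following is a minimal sketch under stated assumptions: synthetic random activations and hypothetical sizes (`n_layers`, `d`) stand in for the model's actual hidden states, and a logistic-regression probe per layer traces how the sentiment signal evolves with depth.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_layers, d = 1000, 12, 256  # hypothetical sizes

# Stand-in for cached hidden states: (examples, layers, hidden dim).
acts = rng.normal(size=(n_samples, n_layers, d))
labels = rng.integers(0, 2, size=n_samples)  # binary sentiment labels

# One linear probe per layer; the per-layer accuracies show which depths
# carry the most linearly decodable sentiment information.
for layer in range(n_layers):
    X_tr, X_te, y_tr, y_te = train_test_split(
        acts[:, layer, :], labels, test_size=0.2, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print(f"layer {layer:2d}: held-out accuracy = {probe.score(X_te, y_te):.3f}")
```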
### Review 4

**Title:** Usage-Aware Sentiment Representations in Large Language Models

**Soundness:** 3: good · **Presentation:** 3: good · **Contribution:** 2: fair · **Rating:** 4: marginally below the acceptance threshold
**Confidence:** 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

**Summary:**

This paper proposed a usage-aware sentiment representation framework for LLMs that grounds sentiment variability in linguistic usage factors such as tone, topic, context, and genre. The approach was motivated by two limitations of existing methods: the instability of sentiment directions extracted via linear probes, and the lack of interpretability in distributional representations such as Gaussian subspaces. The authors introduced a two-level analysis: (1) at the axis level, constructing both pooled and usage-specific sentiment directions to examine how linguistic usage influences representational reliability; and (2) at the neuron level, differentiating usage-invariant from usage-sensitive neurons to reveal finer-grained encoding patterns. Empirical results showed that usage-aware sentiment representations improve both classification accuracy and controllability of sentiment steering.

**Strengths:**

The work addresses a genuine limitation of current sentiment representation extraction methods: linear probes lack reliability, while more complex methods lack interpretability. The authors showed that the natural variability and interpretability of sentiment lie in linguistic usage factors and designed a two-level framework to capture higher-quality sentiment representations, through sentiment-guided axis construction and neuron-level analysis. The writing is clear and logically structured, and the experimental design is coherent, complete, and solid. The application of fine-grained representation analysis to the sentiment domain appears original and contributes meaningfully to decoding factors that underlie sentiment-based interpretability in LLMs.

**Weaknesses:**

It is not particularly surprising to me that extracting separate linear probes for different usage-specific axes yields more specialized representations that perform better at classification and steering. This essentially demonstrates that representations separated by linguistic patterns possess greater predictive power for sentiment. This raises some concerns about conceptual novelty. While the framework provides a linguistically motivated decomposition of sentiment axes, it mainly repackages an established idea—linear disentanglement by conditioning on auxiliary attributes—into a sentiment-specific context. Prior studies on domain adaptation and representation disentanglement (e.g., Blitzer et al., 2007 (https://aclanthology.org/P07-1056.pdf); Hewitt & Manning, 2019 (https://aclanthology.org/N19-1419.pdf)) have already shown that task-specific subspaces can improve generalization. To strengthen the contribution, the authors may provide a clearer theoretical justification of what "usage-awareness" adds beyond such conditional probing and what the results imply.

**Questions:**

1. What do your results imply about the role of linguistic usage factors in sentiment interpretability? Do the axis- and neuron-level analyses jointly suggest that usage factors causally structure sentiment representations in LLMs, or mainly correlate with them? Adding such discussion could be beneficial.
2. Can you provide causal evidence that "usage-invariant" and "usage-sensitive" neurons drive predictions rather than merely correlate with them? Further analyses using targeted neuron ablations, activation patching, or causal mediation methods could strengthen the causal claims (see the ablation sketch after this review).

**EditLens Prediction:** Heavily AI-edited
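As an illustration of the interventions suggested in question 2, a minimal neuron-ablation sketch with a PyTorch forward hook might look like the following. The module and neuron indices are hypothetical stand-ins; a real mediation study would ablate the paper's "usage-sensitive" versus "usage-invariant" neuron sets inside the actual LLM and compare shifts in sentiment predictions across many inputs.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for one transformer MLP block; in practice this would be a
# submodule of the real model (e.g., one of its per-layer MLPs).
mlp = nn.Sequential(nn.Linear(64, 256), nn.GELU(), nn.Linear(256, 64))
neurons_to_ablate = [3, 17, 42]  # hypothetical "usage-sensitive" units

def ablate(module, inputs, output):
    """Zero the selected hidden units; the returned tensor replaces the output."""
    output = output.clone()
    output[..., neurons_to_ablate] = 0.0
    return output

x = torch.randn(8, 64)
clean = mlp(x)

handle = mlp[1].register_forward_hook(ablate)  # hook the intermediate activation
ablated = mlp(x)
handle.remove()

# A causal analysis would compare downstream predictions, not just activations.
print("mean |change| under ablation:", (clean - ablated).abs().mean().item())
```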