ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 0 (0%) | N/A | N/A | N/A |
| Fully human-written | 4 (100%) | 5.50 | 2.75 | 1947 |
| Total | 4 (100%) | 5.50 | 2.75 | 1947 |
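
The Avg Rating and Avg Confidence figures can be re-derived from the four per-review scores that appear in the review rows on this page (ratings 6, 8, 4, 4; confidences 2, 4, 2, 3). A minimal Python check, in which the variable and function names are illustrative only:

```python
# Per-review scores as they appear in the four review rows on this page.
ratings = [6, 8, 4, 4]
confidences = [2, 4, 2, 3]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


avg_rating = mean(ratings)
avg_confidence = mean(confidences)

print(f"Avg Rating: {avg_rating:.2f}")
print(f"Avg Confidence: {avg_confidence:.2f}")
```

Both values agree with the summary table (5.50 and 2.75). The average length cannot be checked the same way, because the page does not list per-review character counts.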
Title Ratings Review Text EditLens Prediction
Title: Counterfactual Structural Causal Bandits

Ratings:
Soundness: 3: good
Presentation: 2: fair
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Review Text:

The paper extends the structural causal bandit framework by introducing **Counterfactual Structural Causal Bandits (CTF-SCB)**, wherein actions correspond to realizable counterfactual regimes. It defines minimal counterfactual action sets (CTF-MIS), and further refines them by identifying those that are *possibly-optimal* (CTF-POMIS). Building on this, the authors present an enumeration algorithm with complexity $\mathcal{O}(n^2\cdot 2^{|E|})$ that systematically constructs a representative CTF-POMIS set suitable for standard bandit solvers. They prove that restricting exploration to this set preserves optimality and can reduce regret. Across synthetic tasks using Thompson Sampling and KL-UCB, the method consistently achieves lower cumulative regret than baselines that explore either larger counterfactual spaces (CTF-MIS) or purely lower-level action spaces (POMIS). In Markovian graphs, the procedure collapses to intervening on the parents of the outcome node, yielding no additional benefit from counterfactuals.

- Addresses an interesting problem and introduces a novel framework for realizable counterfactual interventions within causal bandits.
- Provides a clear and coherent motivation, positioning the extension of counterfactual reasoning to bandit settings as a natural and meaningful conceptual advance.
- Offers a potentially valuable theoretical foundation for subsequent research that may benefit from richer intervention classes in sequential decision-making.

- The exposition presupposes substantial familiarity with the CTF-calculus (Correa & Bareinboim, 2025) and related work (e.g., Correa et al., 2021). Consequently, several statements would benefit from further elaboration.
- The paper is quite dense, and the notation is not intuitive, which makes it hard to read.
- Despite the substantial improvement over the super-exponential naive verification, the proposed algorithm remains exponential in the number of edges, raising questions about applicability.
- At present, the manuscript does not address finite-time regret guarantees; a short discussion would be beneficial.

1. Could you please clarify the difference between $\boldsymbol{Pa}_V$ and $\boldsymbol{pa}_V$? If you are using the convention capital letter -> variable and lowercase -> realization, where does the randomness come from in the $Pa$ operator, given a variable $V$?
2. What does $X_{\boldsymbol{w}}$ (lines 102-103) refer to? You have not previously defined a random variable $X$ with subscript $\boldsymbol{w}$.
3. Could you please clarify how $\mathbb{E} N_T(\boldsymbol{X}_*)$ arises in eq. 1? Is an expectation missing?
4. Does $An(Y_x, X_*)$ mean $An(\{Y_x, X_*\})=An(Y_x) \cup An(X_*)$? Could you please clarify it in the paper?
5. It would be interesting if the authors could formally discuss (or, better, derive) finite-time regret guarantees that quantify the benefit of pruning to the representative CTF-POMIS set.

Suggestions:
- Move the sentence "We use kinship notation for variable relationships..." (line 105) above the first mention of Pa (lines 94-95). The font used there is also different.
- End of lines 100-101: $\boldsymbol{X}_{1[\boldsymbol{w}_1]}$ should not be bold.
- Typo on line 282: "... are cannot..".

EditLens Prediction: Fully human-written
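
The review above summarizes the claim that restricting exploration to a smaller arm set which still contains the optimal arm can reduce regret. The sketch below illustrates that general effect on a plain Bernoulli bandit with Thompson Sampling. It is not the paper's CTF-POMIS construction or its experimental setup: the arm means, horizon, number of runs, and function names are all hypothetical, chosen only so that the pruned set keeps the best arm while the full set adds twenty extra suboptimal arms.

```python
import random


def thompson_sampling_regret(means, horizon, seed=0):
    """Run Bernoulli Thompson Sampling and return cumulative pseudo-regret."""
    rng = random.Random(seed)
    n_arms = len(means)
    best = max(means)
    successes = [1] * n_arms  # Beta(1, 1) prior: alpha parameters
    failures = [1] * n_arms   # Beta(1, 1) prior: beta parameters
    regret = 0.0
    for _ in range(horizon):
        # Draw one posterior sample per arm and play the arm with the largest draw.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - means[arm]
    return regret


def average_regret(means, horizon, runs):
    """Average cumulative pseudo-regret over several seeded runs."""
    total = sum(thompson_sampling_regret(means, horizon, seed=s) for s in range(runs))
    return total / runs


# Hypothetical arm means: the pruned set keeps the optimal arm (0.7),
# the full set adds twenty extra suboptimal arms.
pruned_means = [0.7, 0.5, 0.4]
full_means = pruned_means + [0.45] * 20

pruned_regret = average_regret(pruned_means, horizon=1000, runs=10)
full_regret = average_regret(full_means, horizon=1000, runs=10)
print(f"pruned arm set: {pruned_regret:.1f}")
print(f"full arm set:   {full_regret:.1f}")
```

With these made-up means, the pruned run should accumulate noticeably less regret, since every extra suboptimal arm has to be sampled several times before its posterior rules it out. The sketch says nothing about how the pruned set is found, which is the part the paper's enumeration algorithm addresses.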
Title: Counterfactual Structural Causal Bandits

Ratings:
Soundness: 4: excellent
Presentation: 4: excellent
Contribution: 4: excellent
Rating: 8: accept, good paper
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Review Text:

Summary
- SCB but also taking into account counterfactuals from L3
- Logical and crucial extension of theoretical base
- Well executed

My review is short since I am familiar with the previous SCB work and can see this is an obvious and clear next step of extensions. It is written and presented very clearly and thoroughly, yet accessibly, with the right amount of empirical evidence.

- Very thorough, clear and precise presentation of
  - CTF MIS, CTF POMIS
  - Their algorithms
  - Their theorems

The paper is very well written and makes a clear and strong contribution, hence I only have one point on writing style:
- This reads a bit clunky with a rather long subclause: “When X⋆∗ lies in L≤2 (i.e., ∆L≤2 = 0)—a special case and does not undermine our theoretical results, since the deployed agent can never be certain prior to interaction whether the optimal arm lies in in L≤2—the smaller action space allows POMIS to converge faster than the others.”
- Maybe rephrase as: “a special case that does not undermine”

- In Fig 8, Task 3, left: is POMIS about to cross over CTFMIS TS at 100k trials?

EditLens Prediction: Fully human-written
Title: Counterfactual Structural Causal Bandits

Ratings:
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Review Text:

This paper introduces a variant of the causal bandit setting in which the agent has more power: they can also perform certain "counterfactual" interventions in which a variable $X$ is set to value $x$ as seen by one child, but to $x'$ as seen by another.

The topic is theoretically interesting. I have the impression that the theory is sound.

- I am not convinced of the significance of this contribution. My impression is that counterfactual actions as used here are only possible in practice under special circumstances. See also my first question below.
- The paper is very dense in technical material. Additionally, it builds closely on very recent work; familiarity with that work is necessary to build intuition about the present work. This makes the paper hard to review in a reasonable amount of time.

- To what extent can counterfactual actions be modelled by defining a new graph which explicitly adds counterfactual mediators, and performing ordinary interventions on it?
- lines 78-79 (3rd contribution): what does it mean that suboptimal interventions are "clearly" removed?
- lines 100/101: I initially didn't understand what you meant by "when the variables are indexed". Now I think you mean: when the main variable already has a subscript, the counterfactual subscript is put between brackets for visual distinction. Could you confirm?
- In Proposition 1, what does it mean for a counterfactual to "consist of" an action space?

##### Comments
- The limitations section is in the supplement and is not referenced from the main paper.
- line 105: "correlated" should be "dependent" (only the same for Gaussians)

##### Textual
- line 31: "were" -> "was" ("were" is subjunctive mood, but this is factual)
- line 125: "behave**s**"
- Definition 3: "no ~~an~~other"
- line 282: "are cannot be"
- several places: "interventional bo~~a~~rder"

EditLens Prediction: Fully human-written
Title: Counterfactual Structural Causal Bandits

Ratings:
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Review Text:

This paper extends the structural causal bandit framework to a counterfactual scenario, where interventions are mixed counterfactuals. In this setup, the action space is defined to satisfy ancestral consistency. Leveraging the existing result on the possibly-optimal minimal intervention set (POMIS), the paper develops a method to search for the POMISs in this counterfactual setup.

- The problem formulation is novel and relevant to the field.
- I skimmed through the theoretical results and found them sound.
- The experimental section effectively demonstrates the merits of the algorithm. In particular, it is helpful to see comparisons when the optimal action lies in $\mathcal{L}_{\leq 2}$.

- The challenge of the problem is unclear to me, as the method for finding POMISs is already available.

- Could you explain the challenge in algorithm design? Judging from Figure 7, Algorithm 1 seems to be an application of Lee and Bareinboim (2018) (algorithm to find POMISs).
- The counterfactual framework in the paper is somewhat confusing. Standard counterfactual inference typically requires observed data to constrain the exogenous variables, and then uses these constraints to reason about what would happen under a hypothetical intervention. Simply replacing the hard interventions in Lee and Bareinboim (2018) with counterfactual distributions does not appear to constitute a substantial contribution. Moreover, the mixed counterfactuals considered here could, in principle, be handled by embedding $W$ into a multi-world SCM.
- On page 2, in "(e.g. $X_{1,[w_1]}$)", the X should not be boldface.

EditLens Prediction: Fully human-written