ICLR 2026 - Reviews


Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 1 (25%) | 6.00 | 2.00 | 2149 |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 2 (50%) | 5.00 | 3.00 | 1764 |
| Fully human-written | 1 (25%) | 6.00 | 4.00 | 4571 |
| Total | 4 (100%) | 5.50 | 3.00 | 2562 |
Reviews
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Soundness: 2 (fair) · Presentation: 4 (excellent) · Contribution: 3 (good) · Rating: 6 (marginally above the acceptance threshold) · Confidence: 4 (confident, but not absolutely certain)

Summary: The paper proposes and studies methods for regularizing causal meta-learners so that the strength of the regularization explicitly adapts to the degree of overlap. More precisely, the regularization is stronger in regions of the covariate space with limited overlap (i.e., where the propensity score is close to zero or one). Some existing regularization methods, such as dropout, have been shown to implicitly adapt to overlap in certain settings (Wager, Wang, and Liang, 2013); however, the present work appears to be the first, in the context of causal meta-learners, to propose explicitly tying the degree of regularization to the degree of overlap differentially across the covariate space. The authors consider existing regularization methods, namely dropout, noising of the regressors, and kernel ridge regularization, and propose overlap-adaptive versions of each. To regularize adaptively, the degree of overlap, as measured by the conditional variance of the treatment, must be estimated in a first stage. This conditional variance function constitutes a high-dimensional nuisance parameter, and the authors suggest a first-order bias correction so that the corrected objective is Neyman-orthogonal to this new nuisance parameter in addition to the other nuisance parameters. The authors examine the performance of their methods relative to benchmarks on a range of synthetic and semi-synthetic datasets.
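The summary above describes tying the per-sample regularization strength to the conditional variance of the treatment. A minimal sketch of that idea, assuming a binary treatment and a hypothetical inverse-overlap schedule (the function name and scaling are illustrative, not the paper's exact $\lambda$):

```python
import numpy as np

def overlap_adaptive_strength(pi, lam0=1.0, eps=1e-3):
    """Per-sample regularization strength that grows as overlap shrinks.

    Illustrative sketch only: `pi` is an estimated propensity score in (0, 1);
    nu = pi * (1 - pi) is the conditional variance of the (binary) treatment,
    the overlap measure described in the review. The inverse-overlap scaling
    is a hypothetical choice, not necessarily the paper's lambda function.
    """
    pi = np.clip(pi, eps, 1 - eps)   # guard against nu = 0
    nu = pi * (1 - pi)               # overlap measure: largest at pi = 1/2
    return lam0 * (0.25 / nu)        # equals lam0 where overlap is maximal

pi_hat = np.array([0.5, 0.9, 0.99])
strengths = overlap_adaptive_strength(pi_hat)
# regularization is weakest at pi = 0.5 and grows toward the boundary
```

The point of the sketch is only the shape of the schedule: constant where overlap is perfect, increasing without bound as the propensity approaches zero or one.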
The adaptive approach leads to significant improvements on some of these benchmarks; the gains generally depend on (a) the degree of regularization, (b) which regularization method is used, and (c) which meta-learning objective is regularized. The most consistent improvement appears on the HC-MNIST data. At least in my reading, no single variant of the approach (dropout vs. noising, choice of base meta-learner, debiased vs. not debiased) dominates the others.

Strengths: I found the paper clear and well-explained. The proposed methods are intuitively very reasonable, and the debiasing of the adaptively regularized objective makes the problem quite non-trivial. The discussion of the interpretation of the regularization methods in linear models was quite helpful. The empirical analysis of the methods' performance is thorough.

Weaknesses: While the method leads to significant improvements over baselines in some cases, the results are somewhat mixed. This is likely inevitable because, as the authors note, adaptive regularization is only likely to be effective when the CATE is smooth around regions with limited overlap. Nonetheless, given the mixed results, empirical practitioners may find it difficult to choose whether to regularize adaptively and, additionally, which adaptive method to use. I wonder if there could be more guidance on this front? Figure 2 suggests that performance improves monotonically with the degree of regularization almost uniformly across methods; the same is true of the results in Table 3. For the synthetic data this is not surprising, given that the true CATE is constant (equal to zero), so ideally one would regularize as much as possible.
To make the comparison meaningful, the authors compare methods at a given average regularization strength, but I do not see why this is the correct comparison: in practice, practitioners would presumably choose the regularization parameters by a data-driven method like cross-validation and might obtain very different parameters for the different methods. Personally, I think the constant CATE in the simulated data makes it unsuitable for comparing the methods. For the HC-MNIST data, it might be helpful to see how performance compares when the regularization parameters are chosen empirically, or to compare performance under the best choice of regularization parameter for each method (which would require a comparison at small enough regularization parameters that performance is no longer monotonically improving).

More a suggestion than a question, but I wonder whether there may be some way to get the best of both worlds, i.e., to allow for a compromise between constant and adaptive regularization. For example, suppose we let $\lambda(v) := (4v)^{-\gamma} - \gamma$, where $\gamma \geq 0$ is an additional regularization parameter to be selected by the researcher. Then in the limiting case $\gamma = 0$ we recover constant regularization, and with $\gamma = 1$ we recover $\lambda_n(v)$.

EditLens prediction: Fully human-written
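The suggested interpolating family can be sketched numerically; the formula is taken verbatim from the suggestion above, with $v = \pi(1-\pi) \leq 1/4$, so $4v \leq 1$:

```python
import numpy as np

def lam(v, gamma):
    """Reviewer-suggested family lambda(v) = (4 v)^(-gamma) - gamma.

    v is the overlap measure nu = pi * (1 - pi), so 4 v <= 1.
    gamma = 0 gives a constant penalty of 1; gamma = 1 gives
    (4 v)^(-1) - 1, which vanishes at v = 1/4 (perfect overlap)
    and blows up as v -> 0 (vanishing overlap).
    """
    return (4.0 * v) ** (-gamma) - gamma

v = np.array([0.25, 0.09, 0.01])
print(lam(v, 0.0))  # constant regularization: [1. 1. 1.]
print(lam(v, 1.0))  # adaptive: zero at perfect overlap, growing as v shrinks
```

Intermediate values of $\gamma$ then trade off between the two regimes, which could itself be selected by cross-validation.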
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Soundness: 2 (fair) · Presentation: 2 (fair) · Contribution: 2 (fair) · Rating: 6 (marginally above the acceptance threshold) · Confidence: 2 (willing to defend, but quite likely misunderstood central parts; math/other details not carefully checked)

Summary: This paper addresses the challenge of conditional average treatment effect (CATE) estimation under low overlap. The authors propose Overlap-Adaptive Regularization (OAR), which adjusts the regularization strength according to estimated overlap weights. The paper also introduces a debiased version that preserves Neyman-orthogonality.

Strengths:
1. The paper addresses a well-known and practically important issue in CATE estimation: poor performance under low overlap.
2. Overlap-adaptive regularization is a straightforward and intuitive idea that appears novel in the context of CATE estimation.
3. The proposed OAR framework can be applied to various existing meta-learners (DR-, R-, and IVW-learners) and accommodates both parametric and non-parametric models.
4. The inclusion of a debiased version that preserves Neyman-orthogonality enhances theoretical soundness and robustness to nuisance estimation errors.

Weaknesses:
1. The method's performance is sensitive to the accuracy of the estimated overlap weights, which may be unstable in practice.
2. The theoretical analysis focuses mainly on orthogonality preservation; further insight into convergence or generalization behavior under varying overlap levels would add depth.
3. Adaptive regularization requires computing overlap-dependent weights for each sample, which may increase overhead in large-scale or high-dimensional settings.
4. A structural issue: Section 3 contains material that could be moved to the appendix, while several important technical details and proofs are placed in the appendix instead of the main text, which may hinder readability and comprehension.

Questions:
1. How sensitive is OAR/dOAR to the choice of the regularization function? Are there guidelines for choosing one over the others in practice?
2. Since the method relies on estimated overlap weights, how robust is the approach to misspecification or noise in the propensity score? Does the debiased version fully mitigate this sensitivity?
3. How does per-instance adaptive regularization scale with large sample sizes or high-dimensional covariates, especially in neural network settings?

EditLens prediction: Fully AI-generated
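Question 2 above (robustness to noisy propensity estimates) can be probed with a small simulation; everything here is a hypothetical illustration, not an experiment from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Probe: how much do overlap weights nu = pi (1 - pi) move when the
# propensity estimate carries additive noise? (Illustrative setup only.)
pi_true = rng.uniform(0.02, 0.98, size=10_000)
pi_noisy = np.clip(pi_true + rng.normal(0.0, 0.05, size=pi_true.size),
                   1e-3, 1 - 1e-3)

nu_true = pi_true * (1 - pi_true)
nu_noisy = pi_noisy * (1 - pi_noisy)

# Relative error in the inverse weights 1/nu, which is what an
# inverse-overlap penalty would scale by (a hypothetical choice):
# |1/nu_noisy - 1/nu_true| / (1/nu_true).
rel_err = np.abs(1.0 / nu_noisy - 1.0 / nu_true) * nu_true
# Errors concentrate where pi is near 0 or 1, i.e. exactly in the
# low-overlap regions the method targets.
```

Near $\pi = 1/2$ the map $\pi \mapsto \pi(1-\pi)$ is flat, so noise barely moves the weight; near the boundary the weight is both small and steep, so the same noise produces large relative errors in $1/\nu$.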
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Soundness: 3 (good) · Presentation: 2 (fair) · Contribution: 2 (fair) · Rating: 4 (marginally below the acceptance threshold) · Confidence: 3 (fairly confident; math/other details not carefully checked)

Summary: This paper introduces an overlap-adaptive regularization approach for estimating conditional average treatment effects (CATE). The key idea is to adjust the regularization strength based on the degree of overlap: regions with strong overlap receive lower regularization, while regions with weak overlap are penalized more heavily to mitigate instability. The authors also propose a debiased version of the estimator to address bias arising from estimation errors in the propensity scores.

Strengths: Overlap-adaptive regularization appears to be a reasonable strategy for CATE estimation. The authors introduce two practical implementation techniques, noise injection and dropout, to ease application in real-world settings. They provide theoretical analysis under a linear model framework, and numerical experiments illustrate the performance of the proposed estimator.

Weaknesses:
1. The theoretical analysis appears inadequate. The results for the linear model are preliminary: the simplicity of the linear setting limits the relevance of regularization, making it less critical in practice. It would be more valuable to establish theoretical guarantees for nonparametric models, which are more representative of real-world applications. Furthermore, the numerical experiments primarily involve neural networks and kernel-based methods; while Appendix C presents theoretical results for RKHS, these should ideally be included in the main paper to highlight their importance. Moreover, Appendix C primarily derives the closed-form solution for the RKHS estimator without discussing the statistical properties of the final estimator. This omission limits the theoretical contribution and leaves readers uncertain about the estimator's theoretical improvement over the traditional CR RKHS estimator.
2. The simulation results are based on only 10 runs, which raises concerns. Moreover, the improvement of the proposed overlap-adaptive regularization (OAR) over constant regularization (CR) does not appear statistically significant once the reported standard errors are taken into account; the same holds for the synthetic data example.

Questions: I am curious about the different performance of the noise-injection and dropout methods shown in Figure 2. The two regularization strategies behave quite differently under constant regularization: for noise injection, CR performs relatively well, whereas for dropout, CR performs poorly.

EditLens prediction: Lightly AI-edited
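For context on the RKHS closed form discussed above: one plain way to realize per-sample, overlap-dependent regularization in an RKHS is weighted kernel ridge regression, whose solution is also available in closed form. This is a generic sketch under that assumption, not necessarily the Appendix C estimator:

```python
import numpy as np

def weighted_kernel_ridge(K, y, w, lam=1e-1):
    """Closed-form weighted kernel ridge: min_a (y-Ka)' W (y-Ka) + lam a'Ka.

    Setting the first-order condition to zero gives a = (K + lam W^{-1})^{-1} y,
    so down-weighting a sample's loss (small w_i) acts like a larger effective
    ridge penalty at that point -- one simple route to per-sample,
    overlap-dependent regularization. Generic sketch, not the paper's estimator.
    """
    n = K.shape[0]
    W_inv = np.diag(1.0 / w)
    alpha = np.linalg.solve(K + lam * W_inv, y)
    return alpha

# Toy usage with an RBF kernel and overlap weights w_i = pi_i (1 - pi_i).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=50)
pi = 0.5 + 0.49 * X[:, 0]            # overlap degrades toward x = 1
w = pi * (1 - pi)
K = np.exp(-0.5 * (X - X.T) ** 2)    # (50, 50) Gram matrix via broadcasting
alpha = weighted_kernel_ridge(K, y, w)
f_hat = K @ alpha                    # fitted values, shrunk harder where w is small
```

Statistical properties of such weighted estimators (as opposed to the closed form itself) are exactly what the weakness above asks the authors to supply.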
Overlap-Adaptive Regularization for Conditional Average Treatment Effect Estimation

Soundness: 3 (good) · Presentation: 2 (fair) · Contribution: 3 (good) · Rating: 6 (marginally above the acceptance threshold) · Confidence: 3 (fairly confident; math/other details not carefully checked)

Summary: This paper proposes overlap-adaptive regularizers for the second stage of meta-learners, gives parametric instantiations plus an RKHS variant, and adds a debiased version.

Strengths:
- The concept is interesting and addresses a gap concerning the overlap issue in causal inference.
- Definitions and the family of $\lambda$ functions are precise.
- The parametric part comes with solid theory. The RKHS part is useful but assumes strong overlap.

Weaknesses:
- The paper says the implicit (noise/dropout) and explicit forms are "equivalent," but the exact equivalence is proved only for linear target classes and generally holds only as a first-order approximation for neural networks. This is stated too broadly.
- Proposition 6 requires $P(\epsilon < \pi(X) < 1-\epsilon) = 1$, which makes the explicit solution neat but excludes the core setting the paper aims to help with (regions with very low overlap).
- Several influence functions scale like $1/\nu(x)^k$ and can blow up as $\nu(x)$ approaches 0. Please discuss integrability/robustness.

Questions: See Weaknesses.

EditLens prediction: Lightly AI-edited