Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper studies differentially private instrumental variables (IV) regression and proposes **DP-2S-GD**, a two-stage gradient-descent algorithm that enforces zero-concentrated differential privacy (zCDP) through per-sample clipping and Gaussian noise injection (in the usual DP-GD style, but with two separate sets of hyperparameters for the two stages). The method provides end-to-end privacy guarantees for the classical two-stage least squares (2SLS) formulation of IV regression (IVaR). The authors prove a non-asymptotic convergence bound that decomposes into optimization error, sampling error, and privacy-induced noise. They also characterize a privacy–iteration trade-off and show how privacy budgets affect convergence behavior. Experiments on synthetic data and the Angrist–Evans dataset support the theoretical predictions, demonstrating that the estimator remains stable and competitive at meaningful privacy levels.
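To fix ideas, here is a rough sketch of the kind of two-stage noisy update I understand the algorithm to perform; the gradient forms, the noise calibration (`sigma1`, `sigma2`), and the function name are my own illustrative assumptions rather than the authors' exact procedure:

```python
import numpy as np

def dp_2s_gd_sketch(Z, X, Y, T, eta, alpha, clip1, clip2, sigma1, sigma2, seed=0):
    """Illustrative two-stage noisy GD for IV regression (not the paper's exact algorithm).

    Stage 1 fits Theta (regressing X on Z); stage 2 fits beta on the fitted values
    Z @ Theta. Per-sample gradients are clipped and Gaussian noise is added, in the
    usual DP-GD style. The noise scales sigma1, sigma2 are assumed to be calibrated
    to the clipping norms and the zCDP budgets (rho1, rho2); that calibration is omitted.
    """
    rng = np.random.default_rng(seed)
    n, q = Z.shape
    p = X.shape[1]
    Theta, beta = np.zeros((q, p)), np.zeros(p)
    for _ in range(T):
        # Stage 1: per-sample gradients of 0.5 * ||x_i - Theta^T z_i||^2, clipped.
        R1 = X - Z @ Theta                                   # residuals, shape (n, p)
        G1 = -np.einsum('iq,ip->iqp', Z, R1)                 # per-sample gradients, (n, q, p)
        norms1 = np.linalg.norm(G1.reshape(n, -1), axis=1)
        G1 *= np.minimum(1.0, clip1 / (norms1 + 1e-12)).reshape(n, 1, 1)
        Theta -= eta * (G1.mean(axis=0) + sigma1 * rng.standard_normal((q, p)) / n)
        # Stage 2: per-sample gradients of 0.5 * (y_i - beta^T Theta^T z_i)^2, clipped.
        Xhat = Z @ Theta
        R2 = Y - Xhat @ beta
        G2 = -Xhat * R2[:, None]                             # per-sample gradients, (n, p)
        norms2 = np.linalg.norm(G2, axis=1, keepdims=True)
        G2 *= np.minimum(1.0, clip2 / (norms2 + 1e-12))
        beta -= alpha * (G2.mean(axis=0) + sigma2 * rng.standard_normal(p) / n)
    return Theta, beta
```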
The paper fills a clear gap by proposing a differentially private version of instrumental variable regression. The formulation is simple and technically consistent, combining a two-stage gradient procedure with zCDP-based privacy guarantees. The analysis provides non-asymptotic convergence bounds for $\beta$ (the regression parameter of interest) that decompose into statistical, optimization, and privacy errors, making the results partially interpretable. The privacy proof is correct and applies zCDP effectively to handle the iterative updates. The experiments, though limited, support the theoretical claims and clearly show the predicted privacy–accuracy trade-off and the overfitting behavior visible in Figure 2. The writing is clear, the notation consistent, and the argument proceeds logically from setup to results.
**Untuned step-sizes and incomplete convergence analysis (major weakness)**
The paper provides only upper bounds for the step sizes $\eta$ and $\alpha$. These conditions ensure convergence in principle but leave the question of optimal step sizes open. As a result, the convergence theorem lacks interpretability: it involves too many auxiliary definitions, and not all of them depend only on problem-dependent parameters. This gives the impression that the analysis has not been fully worked out. A sharper result would express convergence explicitly in terms of well-defined problem constants and privacy noise scales. Since this is the main contribution of the paper, and the algorithm itself is not very surprising, I am inclined to reject the paper in its current form. I would like to see a fully worked out, non-asymptotic convergence guarantee before I can recommend acceptance. If it is hard to tune all the parameters for a high-probability guarantee, can the authors at least provide one for an in-expectation result?
**Limited baselines**
The experiments lack comparisons with simple private 2SLS variants, such as perturbing the sufficient statistics or applying output perturbation (both of which are indeed discussed in the paper's survey of related methods). Including such baselines would help quantify the benefit of the proposed algorithm beyond intuitive reasoning; a rough sketch of a sufficient-statistics baseline is given below.
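As a concrete (and entirely illustrative) example of what I mean, Gaussian perturbation of the cross-moment matrices followed by the closed-form 2SLS solve could look as follows; the noise scale `sigma` and the symmetrization step are my own assumptions, to be calibrated to row-norm bounds and the privacy budget:

```python
import numpy as np

def dp_2sls_suffstats(Z, X, Y, sigma, seed=0):
    """Illustrative sufficient-statistics baseline: perturb Z^T Z, Z^T X, Z^T Y
    with Gaussian noise and solve 2SLS in closed form. The scale sigma is assumed
    to be calibrated to bounds on the row norms and to the zCDP budget."""
    rng = np.random.default_rng(seed)
    q, p = Z.shape[1], X.shape[1]
    sym = lambda A: (A + A.T) / 2
    ZZ = Z.T @ Z + sigma * sym(rng.standard_normal((q, q)))
    ZX = Z.T @ X + sigma * rng.standard_normal((q, p))
    ZY = Z.T @ Y + sigma * rng.standard_normal(q)
    Theta = np.linalg.solve(ZZ, ZX)       # noisy first-stage coefficients
    XPX = Theta.T @ ZX                    # approximates X^T P_Z X
    XPY = Theta.T @ ZY                    # approximates X^T P_Z y
    return np.linalg.solve(XPX, XPY)      # beta estimate
```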
**Non-private performance gap**
Even without privacy, the estimator converges more slowly than classical 2SLS, introducing an additional factor into the rate. The paper acknowledges this but does not explain what causes it.
1. Can the authors report the privacy levels used in all experiments in terms of standard $(\epsilon, \delta)$-DP, in addition to $\rho$-zCDP, so that readers can directly compare privacy strengths with prior work? (The standard conversion is recalled after these questions.)
2. Can the step sizes be tuned or derived in a way that yields an explicit non-asymptotic convergence guarantee depending only on problem parameters (e.g., Lipschitz constants or curvature), rather than through broad stability bounds?
3. Would it be possible to consider a stochastic variant of the algorithm, where each optimization step uses a fresh sample so that $T$ and $n$ are coupled? This might lead to a simpler and more interpretable result, since the trade-offs between optimization and generalization would coincide. It could also remove the “overfitting” behavior seen in Figure 2, where the error decreases and then rises as $T$ grows. This is not a major issue, but it could bring other aspects, such as the utility–privacy trade-off, to center stage.
4. The motivation for the IVaR setup remains somewhat unclear. Can the authors give concrete examples of real situations where one can actually observe or construct the full dataset $(z, x, y)$ required for the two-stage regression? In settings where the true causal graph is unknown, how can one justify the validity of Assumptions 1 and 2? Some discussion or examples of practical data sources would help clarify this premise.
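Regarding question 1, for reference, the standard conversion (Bun and Steinke, 2016) states that any $\rho$-zCDP algorithm satisfies $(\epsilon, \delta)$-DP for every $\delta > 0$ with
$$\epsilon = \rho + 2\sqrt{\rho \log(1/\delta)},$$
so reporting $\epsilon$ at, e.g., $\delta = 10^{-5}$ alongside $\rho$ would make the comparison with prior work straightforward.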
Lightly AI-edited

Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
The paper proposes a differentially private estimator for (a specific formulation of) instrumental-variable regression. The paper suggests noisy gradient descent, in which the key parameters $\Theta$ (a matrix) and $\beta$ (a vector) are updated at each round using a noisy gradient measurement. The paper gives natural conditions under which this algorithm provably converges (and also provides bounds on its error in estimating the vector $\beta$).
The algorithm's convergence is also evaluated empirically on synthetic and real data.
* The paper addresses a widely-used statistical method
* The paper appears technically sound. Though I did not check the proofs carefully, the authors present and discuss their results clearly, and the results and proofs seem plausible.
* The final bounds were hard to interpret, partly because they necessarily involve quite a few different parameters. It would be helpful to provide some baselines, both in the form of non-private methods and in the form of alternative approaches to designing DP algorithms. For example, a few of the questions I couldn't answer as a reader were:
* What level of noise is acceptable? That is, how are the results of DP IVaR supposed to be used, and when does the noisy version meet the needs of the application? Even if the answer is context-dependent, providing one context in which the overall usefulness of the answer can be evaluated would be helpful.
* How does the proposed method compare to a more straightforward perturbation of the sufficient statistics for the model (presumably the covariance matrix of the concatenation $(z,x,y)$).
* When is DP IVaR more useful than DP OLS on its own? As far as I know, IVaR and OLS are normally used in tandem. However, presumably the noise levels have to be low enough for the two to be meaningfully compared. When is that possible? In what parameter ranges do the results of the paper imply the validity of such a comparison?
* It would be good to compare the bounds in this paper to those implied by other work on DP nonconvex optimization.
* Privacy notions: Remark 3.8 states that, when $\Theta^{(t)}$ is not released, "setting $\rho=\infty$ implies that no noise $\Xi^{(t)}$ needs to be injected in the first stage, and we can simply return $\{\beta^{(t)}\}$ under privacy budget $\rho_2$." I don't see why that's true---the hidden $\Theta^{(t)}$ values encode information about the sensitive data and propagate it throughout the computation. At the very least, the statement deserves a more careful formulation and explanation. (Also, why would $\beta$ be useful without $\Theta$? The latter seems necessary to interpret the former as a predictor of $y$.)
* What are the relevant facts about the standard 2SLS estimator for this paper? It is mentioned and compared to the 2S-GD estimator in Remark 3.7, but the algorithm and its convergence rate are not precisely described, which makes the comparison hard to follow.
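For concreteness, the closed form I have in mind for the non-private estimator is
$$\hat\beta_{2SLS} = \big(X^\top P_Z X\big)^{-1} X^\top P_Z y, \qquad P_Z = Z\big(Z^\top Z\big)^{-1} Z^\top;$$
even a brief statement of this form, together with the error rate the authors attribute to it, would make the comparison in Remark 3.7 easier to follow.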
Fully human-written

Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This submission presents a differentially private algorithm for instrumental variable regression, a central tool in causal inference. The standard non-private algorithm for this task involves solving two least-squares problems. The algorithm presented here is a two-stage version of DP gradient descent. They give a theoretical analysis of the algorithm's convergence (under the assumption that the data arise from a well-specified model with subgaussian noise). For appropriate hyperparameter settings, as we get more samples the error from privacy becomes lower than the error from sampling.
They also present experiments on synthetic and real data, and the method seems quite practical.
Instrumental variable regression is an important task and extremely worthy of study under privacy. While there is a lot of work on private ordinary least squares, I'm surprised there's none on this topic.
The method presented here is a clear contribution, both theoretically and practically. I was unable to verify the proofs, but the results are very plausible.
The main convergence proof is quite long and, to my taste, comes with insufficient exposition. It was hard for me to understand which parts of the analysis are new.
There are no approaches that explicitly tackle DP IVaR, but there are well-known techniques that directly yield results. In particular, the "subsample and aggregate" framework can usually make use of any non-private estimator with meaningful concentration guarantees. For the task here, combining Algorithm 5.1 of "FriendlyCore" [1] with the submission's Lemma D.7 yields, I think, a zCDP estimator with $\ell_2$ error roughly $\sqrt{pq/n}$. This matches the non-private error and, in some regimes, might be better than Theorem 3.1's $\sqrt{p}q^{3/2}/n$ term.
[1] https://arxiv.org/abs/2110.10132
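As a rough illustration of the structure of such a baseline (not the exact FriendlyCore aggregator), a subsample-and-aggregate sketch could look as follows; the clipped-mean aggregation, the non-private median centering, and the noise scale are placeholders of my own:

```python
import numpy as np

def subsample_aggregate_2sls(Z, X, Y, k, clip, sigma, seed=0):
    """Illustrative subsample-and-aggregate baseline: split the data into k disjoint
    blocks, run non-private 2SLS on each block, then release a noisy clipped mean of
    the block estimates. The centering step below is a non-private shortcut used only
    for illustration; FriendlyCore performs this selection privately. The scale sigma
    is assumed to be calibrated to clip, k, and the zCDP budget."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    blocks = np.array_split(rng.permutation(n), k)
    betas = []
    for b in blocks:
        Theta = np.linalg.lstsq(Z[b], X[b], rcond=None)[0]          # first stage
        Xhat = Z[b] @ Theta
        betas.append(np.linalg.lstsq(Xhat, Y[b], rcond=None)[0])    # second stage
    B = np.stack(betas)
    center = np.median(B, axis=0)                                   # non-private placeholder
    D = B - center
    norms = np.linalg.norm(D, axis=1, keepdims=True)
    D *= np.minimum(1.0, clip / (norms + 1e-12))                    # clip block deviations
    return center + D.mean(axis=0) + sigma * rng.standard_normal(p) / k
```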
Can you formally combine Lemma D.7 with the FriendlyCore approach and compare the resulting theorem with your Theorem 3.1?
Can you briefly describe what (if any) new technical challenges your main convergence proof must overcome?
Can you fix the following minor issues with your background discussion?
- $(\varepsilon,\delta)$-DP algorithms satisfy basic composition, where the privacy parameters accumulate linearly, but also *advanced* composition [see 2]; the statement is recalled after the references below. Here the asymptotics match those of zCDP, but the latter is usually more practical.
- You mention that gradient-perturbation methods can avoid spectrum-based blow-ups, but (i) private first-order methods still suffer on ill-conditioned data [3,4] and (ii) there are techniques for privately estimating $X^T X$ that avoid these blow-ups [5,6], although when used within a sufficient-statistics approach for regression they may incur higher sample complexity than necessary [see 7 for discussion].
[2] https://dpcourse.github.io/2025-spring/lecnotes-web/DP-S25-notes-lec-10-adv-composition.pdf
[3] https://arxiv.org/abs/2207.04686
[4] https://arxiv.org/abs/2301.13273
[5] https://arxiv.org/abs/1805.00216
[6] https://arxiv.org/abs/2301.12250
[7] https://arxiv.org/abs/2404.15409
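For reference (cf. the first bullet above), advanced composition states that the $k$-fold adaptive composition of $(\varepsilon,\delta)$-DP mechanisms satisfies $(\varepsilon', k\delta + \delta')$-DP for every $\delta' > 0$, with
$$\varepsilon' = \sqrt{2k\ln(1/\delta')}\,\varepsilon + k\varepsilon\big(e^{\varepsilon}-1\big),$$
which gives the same square-root scaling in the number of iterations as zCDP accounting.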
Fully human-written

Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper addresses the problem of instrumental variable (IV) regression under differential privacy constraints. The authors propose a noisy two-stage gradient-descent algorithm that injects calibrated Gaussian noise and uses per-sample gradient clipping to obtain $\rho$-zero-concentrated differential privacy (zCDP). They also provide a formal privacy accountant and derive finite-sample convergence bounds that explicitly characterize the trade-offs among privacy budgets $(\rho_1,\rho_2)$, sample size $n$, problem dimensions $p,q$, and the number of iterations $T$. The authors validate the theory with synthetic simulations and a real-data application.
1. This paper addresses a compelling problem that is likely to interest the statistics and econometrics communities, particularly researchers focused on IV regression and privacy.
2. This paper gives a clean privacy accountant and a detailed non-asymptotic utility bound.
3. This paper is well written and clearly structured.
1. The method and proofs are limited to the linear IV model. It may be more interesting to consider nonlinear IV or machine learning-based IV methods.
2. Even in the non-private setting, the error rate has an additional $\sqrt{p}$ factor compared to that of the 2SLS estimator. This worsened dimension dependence could be limiting in moderate-to-high-dimensional problems.
3. The necessity of privacy protection for the IV method should be clarified further; in its present form, the paper reads more like a straightforward combination of two existing methods.
1. Do you see a path to extend your framework and privacy analysis to nonlinear IV methods? What are the main technical obstacles?
2. Can you provide more intuition about the source of the extra $\sqrt{p}$ factor compared to 2SLS, and whether an improved analysis or algorithmic modification could remove or reduce this gap?
3. In the utility analysis, it seems that the authors do not explicitly incorporate gradient clipping into the iterative update equations; the clipping thresholds only appear in the gradient sensitivity, which determines the noise scale. Could the authors explicitly include the gradient clipping step in the iteration (e.g., in the form displayed after these questions) and provide a corresponding convergence analysis? Furthermore, how should practitioners choose the clipping thresholds in practice, and how sensitive is the empirical performance to these choices?
4. How should the step sizes be selected in practice? Have the authors conducted any sensitivity analysis to evaluate how the choice of step sizes affects the empirical performance and convergence?
5. The utility analysis requires a condition on $T$ to keep noise scales manageable. In practice, does early stopping based on a private validation objective suffice? Could you provide an algorithmic prescription to choose $T$ that balances convergence and privacy noise?
6. In the real data analysis, both the endogenous dimension $p$ and the instrumental dimension $q$ are equal to 1. Are there any real datasets with higher-dimensional instruments or regressors where the proposed method could be demonstrated? It would be interesting to see how the algorithm performs in such multi-dimensional settings.
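To illustrate what question 3 asks for, an explicitly clipped second-stage update could be written (in my notation, not the authors') as
$$\beta^{(t+1)} = \beta^{(t)} - \alpha\left(\frac{1}{n}\sum_{i=1}^{n} \Pi_{C}\big(g_i(\beta^{(t)}, \Theta^{(t)})\big) + w^{(t)}\right), \qquad \Pi_{C}(g) = g\cdot\min\Big(1, \tfrac{C}{\|g\|_2}\Big), \quad w^{(t)} \sim \mathcal{N}\big(0, \sigma_2^2 I_p\big),$$
so that the clipping operator $\Pi_{C}$ appears in the recursion tracked by the convergence analysis, rather than only in the sensitivity bound that sets the noise scale.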
Fully human-written