Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
Soundness: 2: fair
Presentation: 1: poor
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This work proposes improved algorithms for both cut and spectral sparsification of hypergraphs in the insertion-only data stream model. In particular, the authors remove the $\log m$ factors, which can grow up to $n$ for hypergraphs of arbitrarily high rank. They achieve this by proposing an online weight assignment following the strategy of Jambulapati et al. (2023), and they improve the computational cost of the sampling process by leveraging the online row sampling idea of Cohen et al. (2020). They also employ their techniques to obtain better bounds (in terms of the number of edges stored, or the overall space complexity) in other settings, such as the online and sliding-window models and robust hypergraph sparsification. Finally, they demonstrate the performance of their algorithm through empirical evaluations.
The paper improves the bound for the insertion-only streaming model by a multiplicative factor that can grow up to $n$, the number of vertices.
They also propose better bounds for several other models, including resolving an open question of Soma et al. (2024) in the online hypergraph case.
The paper is poorly structured and difficult to read. The main results are not presented clearly. The comparison with other works (Appendix A, Fig. 3) should have been presented in the main paper.
The absence of a technical overview or a clear presentation of the main technical challenges and proposed solutions makes it difficult to comprehend the core ideas of the paper.
The comparison with Khanna et al. (2025a) seems a bit unfair, given that their algorithm is for the dynamic setting, although this has been addressed in the section on concurrent and independent work.
The improvements demonstrated in the experimental section, even in the limited-resources setting, do not seem significant, although this is not a major issue given the theoretical nature of the work.
1. What are the technical challenges and novel approaches involved in the work? Possibly provide a technical overview.
2. The sliding window model result (Theorem 1.6) seems to have higher space complexity than the streaming model, which is a weaker model. Am I missing something here?
Remarks/Typos:
3. Fix the “additive $\log n$ factor” to “additional (multiplicative) $\log n$ factor”
4. It seems the reference in Line 119 should be Khanna et al. (2025a).
Fully human-written
Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
The paper considers the problem of graph and hypergraph sparsification in a streaming setting. The input is a (hyper)graph whose edges arrive online in a stream, and the goal of the algorithm is to maintain a spectral sparsifier of the graph with few edges. For graph sparsification, they obtain an algorithm that, with high probability in $n$, gives a $(1+\varepsilon)$ sparsifier storing only $n/\varepsilon^2\cdot \mathrm{poly}(\log\log n)$ edges with $\mathrm{poly}(n)$ update time (Theorem 1.1). This improves by several logarithmic factors over what one can obtain by combining algorithms for offline spectral sparsification with the merge-and-reduce framework, and it is in fact optimal up to $\log\log n$ factors. Additionally, they provide an algorithm for streaming min-cut that stores $n/\varepsilon\cdot \mathrm{poly}(\log n,1/\varepsilon)$ edges and returns a $(1+\varepsilon)$ approximation to the min-cut, improving on previous work by a $\log n$ factor. I don't know how impressive this improvement is, since it is not quite clear to me from page 2 how large a polynomial dependence on $\log n$ past work (Ding et al., 2024) incurred.
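For context, the merge-and-reduce framework mentioned above can be sketched generically. The skeleton below is purely illustrative (the `reduce_fn` placeholder stands in for any offline sparsification routine, and all names are hypothetical), not the paper's algorithm: stream items are buffered in blocks, and summaries at the same level are merged and re-reduced, so only $O(\log(\text{stream length}))$ summaries are live at any time.

```python
def merge_and_reduce(stream, block_size, reduce_fn):
    """Generic merge-and-reduce skeleton (illustrative).

    `reduce_fn` maps a list of items to a smaller summary; in
    sparsification it would be an offline sparsifier applied to
    the union of two partial sparsifiers.
    """
    levels = {}  # level -> summary (a list of items)
    buf = []

    def push(summary, level):
        # Like a binary counter: merge equal-level summaries upward.
        while level in levels:
            summary = reduce_fn(levels.pop(level) + summary)
            level += 1
        levels[level] = summary

    for x in stream:
        buf.append(x)
        if len(buf) == block_size:
            push(reduce_fn(buf), 0)
            buf = []
    if buf:  # flush a final partial block
        push(reduce_fn(buf), 0)

    out = []
    for s in levels.values():
        out.extend(s)
    return out
```

With a summary size of $k$ per level, the total space is $O(k \log(\text{stream length}/k))$, which is the source of the extra logarithmic factors that the paper works to remove.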
Next, the paper considers algorithms for hypergraph sparsification. The number of edges that past work needed to store had factors of either $\log m$, where $m$ is the number of edges, or $r$, where $r$ is the rank of the hypergraph. Both of these are unpleasant, since $m$ could be exponential in $n$ and $r$ could be as large as $n$. In contrast, the present paper provides a streaming algorithm that stores only $n/\varepsilon^2\cdot \mathrm{poly}(\log n,\log r,\log\log m)$ hyperedges, removing these unpleasant dependencies.
The paper provides a related algorithm in the online setting, where each edge must be irrevocably included or discarded upon arrival.
Finally, the paper provides related results in the setting where the edge arrivals are adversarial and can depend on past decisions of the algorithm, as well as in the well-motivated sliding window model, where the algorithm only cares about maintaining a sparsifier of the graph defined by the edge arrivals in a recent window of time, and all updates that occurred before the window should be disregarded.
The techniques seem to be quite sophisticated modifications of past work that aim to sample edges according to effective resistances in the final graph (which turns out to be the right thing to do when constructing spectral sparsifiers). I am quite unfamiliar with these techniques and got quite lost in the matrix algebra, so while the paper seems to be mathematically precisely written, I can't vouch for correctness.
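As a point of reference for the sampling primitive the paper builds on, here is a minimal offline sketch of effective-resistance sampling in the classic Spielman–Srivastava style. It is an illustration under simplifying assumptions (a small, unweighted, connected graph; a dense Laplacian pseudoinverse), not the paper's streaming algorithm, and all names are hypothetical.

```python
import numpy as np

def effective_resistance_sampling(n, edges, num_samples, rng):
    """Sample edges proportionally to effective resistance and
    reweight by 1/(expected count), so the sampled Laplacian is an
    unbiased estimator of the original Laplacian."""
    # Unweighted Laplacian L = sum_e b_e b_e^T.
    L = np.zeros((n, n))
    for u, v in edges:
        L[u, u] += 1; L[v, v] += 1
        L[u, v] -= 1; L[v, u] -= 1
    Lp = np.linalg.pinv(L)  # dense pseudoinverse; fine for small n
    # Effective resistance R_e = b_e^T L^+ b_e.
    R = np.array([Lp[u, u] + Lp[v, v] - 2 * Lp[u, v] for u, v in edges])
    p = R / R.sum()  # for a connected unweighted graph, R sums to n-1
    # Sample with replacement; importance weights make E[Ls] = L.
    idx = rng.choice(len(edges), size=num_samples, p=p)
    Ls = np.zeros((n, n))
    for i in idx:
        u, v = edges[i]
        w = 1.0 / (num_samples * p[i])
        Ls[u, u] += w; Ls[v, v] += w
        Ls[u, v] -= w; Ls[v, u] -= w
    return L, Ls
```

Note the $1/(k p_e)$ reweighting: sampling edges without reweighting would bias the Laplacian estimate, so sparsifiers built this way are always reweighted subgraphs.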
Spectral sparsification of graphs and hypergraphs are important problems from both a theoretical and a practical perspective, and studying them in streaming and online settings seems quite natural. It is impressive that the paper obtains improved algorithms for so many of these problems. Especially Theorem 1.1 (nearly tight bounds) and Theorem 1.3 (removing dependencies on $\log m$ and $r$) seem nice. The paper provides experiments demonstrating that a version of their algorithm performs quite favorably on real and synthetic data sets.
The paper might be difficult to read, as the techniques are quite technical. It was somewhat difficult for me to assess the novelty of the ideas due to this technical opacity. It is, however, quite possible that readers stronger in linear algebra and spectral methods will find it more understandable.
Another weakness is that the polynomial update time in $n$ might be a little disappointing, but I don't think this is a big issue.
What is the update time of your algorithm? I'm aware that it's $poly(n)$, but do you have a bound on the degree of the polynomial? Do previous works also have polynomial update time?
Is it always the case that you can construct a spectral sparsifier by sampling edges, or do you also need to reweight them? I was a little unsure about this, and it might be useful to the reader to discuss this.
Fully human-written
Nearly Space-Optimal Graph and Hypergraph Sparsification in Insertion-Only Data Streams |
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The paper shows that in insertion-only streams one can build $(1\pm\varepsilon)$ spectral sparsifiers at essentially offline sample complexity: for graphs it stores $\tilde O(n/\varepsilon^{2})$ edges with $\mathrm{poly}(n)$ update time, and for hypergraphs it stores $O\big(n/\varepsilon^{2}\cdot \log n\cdot \mathrm{poly}(\log r,\log\log m)\big)$ hyperedges; it also gives a $(1\pm\varepsilon)$ min-cut estimator using $O\big(n/\varepsilon\cdot \mathrm{polylog}(n,1/\varepsilon)\big)$ space. The key idea is an online, for-all sparsification scheme combined with merge-and-reduce coresets, and the framework extends to sliding-window and adversarially robust settings.
The problem is important, and the paper delivers near-space-optimal $(1\pm\varepsilon)$ sparsification in insertion-only streams for graphs and hypergraphs, shaving prior $\log n$ factors and extending to min-cut, adversarially robust, and sliding-window settings, which is useful breadth for streaming theory and practice.
The problem is important, but the results feel incremental: they mainly tighten log factors and integrate known techniques, with the novelty lying more in careful refinement and synthesis than in new methods. The experiments are light: few/small datasets, no hypergraph or min-cut evaluations, no sliding-window/adversarial tests, limited comparisons, and few metrics (no throughput or memory-per-update reporting).
See weaknesses.
Heavily AI-edited