On Universality of Deep Equivariant Networks
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
The authors prove a separation-constrained universality theorem for invariant networks and extend the result to equivariant networks by introducing the notion of entry-wise separability.
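For concreteness, my reading of the separation-constrained setting is roughly as follows (the symbols $\mathcal{F}$, $\sim_{\mathcal{F}}$, $K$ below are my own shorthand and may not match the paper's notation): a family of invariant networks $\mathcal{F}$ induces the relation
$$x \sim_{\mathcal{F}} y \;:\Longleftrightarrow\; f(x) = f(y) \ \text{for all } f \in \mathcal{F},$$
and separation-constrained universality asks whether the networks (after adding a readout) are dense, on a compact set $K$, in
$$\{\, h \in C(K) \;:\; x \sim_{\mathcal{F}} y \ \Rightarrow\ h(x) = h(y) \,\},$$
i.e., in every continuous function that separates no more points than the networks themselves do.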
The authors theoretically investigate separation-constrained universality in the invariant and equivariant settings. They introduce the notion of entry-wise separability and establish a connection between network depth and universality. The results are solid and the topic is interesting.
The notation should be consistent. For example, $h$ denotes a natural number in some places, a function in others, and an element of $G$ in yet others. Since this paper focuses on theory, consistent notation would improve readability.
- In Theorem 1, how large can $k$ become? For standard neural networks, an arbitrarily good approximation can be obtained by increasing the number of hidden layers. How does $k$ depend on the input dimension and on $|G|$?
- Similarly, in Theorem 2, can we determine the optimal value of $d$? Since $M$ depends on the functions $\phi^1,\ldots,\phi^k$, is there any relationship between $d$ and $k$?
Fully human-written

On Universality of Deep Equivariant Networks
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper addresses the theoretical foundations of universality in equivariant neural networks, a topic for which rigorous results remain limited. Existing universality theorems often rely on highly restrictive assumptions—such as regular or higher-order tensor representations that induce prohibitively large hidden dimensions, or architectures that are restricted to the invariant case. The authors develop a more general and unified theoretical framework that encompasses both invariant and equivariant networks. For invariant architectures, they prove a universality theorem under separation constraints, showing that the inclusion of a fully connected readout layer enables universal approximation within the class of separation-constrained continuous functions. For equivariant networks, where existing results are even rarer, the paper identifies a key limitation of standard separability assumptions and introduces the concept of entry-wise separability, a stronger and more suitable criterion. The authors prove that equivariant networks achieve universality within this regime, provided they have sufficient depth or appropriate readout layers. Together with earlier results showing the failure of universality for shallow models, this work highlights depth and readout structure as fundamental mechanisms for universality. The framework also unifies and extends prior specialized results, offering a clearer theoretical understanding of how equivariant architectures can approximate complex functions while maintaining symmetry constraints.
This paper tackles a compelling and important direction—building theory for symmetry-aware (equivariant/invariant) neural networks—and does so with clarity and breadth. The authors provide a unified, architecture-agnostic framework that cleanly separates the roles of separation, depth, and readout layers, resolving long-standing confusion about why models with identical separation power can differ in approximation power. The introduction of entry-wise separability is a crisp and genuinely new lens that both diagnoses the failure of standard separability to characterize equivariant universality and supplies the right notion under which strong theorems can be proved. The results are conceptually satisfying and practically meaningful: for invariants, a fully connected readout restores universality within the separation-constrained class; for equivariants, universality emerges either after finite depth stabilization or via specific output layers that act as equivariant analogues of readouts. These statements not only subsume and extend prior specialized results but also give a reassuring message for practice—maximal expressivity is reached at finite depth, avoiding the specter of unbounded “depth chasing.” The presentation is mathematically careful (clear definitions, clean statements, explicit assumptions) and the work advances the field by turning an intuitively appealing idea—that symmetry should guide learning—into a rigorous set of principles that explain when and how symmetry-respecting networks are universal.
While the paper offers a clean and unified treatment, its contribution risks being perceived as incremental given a substantial body of prior work on universality for symmetry-aware models (invariant and equivariant networks, WL-based expressivity, PointNet/graph settings, and regular-representation proofs). The central message—that depth and readout layers restore universality under separation constraints—resonates with, and in places refines, known intuitions; however, the practical novelty over existing universality theorems may feel limited without sharper separations or new, unexpected consequences. Methodologically, the results hinge on point-wise activations and permutation representations under finite groups, leaving unclear how far the theory extends to continuous/compact Lie groups, non-pointwise nonlinearities, or mixed representation stacks that are common in modern E(3)/SO(3) models. The proposed entry-wise separability is mathematically convenient but may be difficult to verify or estimate in practice, and its relationship to measurable statistics used by practitioners (e.g., tensor orders, message-passing depth, spectral gaps) is not yet operationalized. The universality guarantees are asymptotic and do not provide approximation rates, sample complexity, or robustness to model misspecification—key levers for assessing when added depth/readouts actually help. Moreover, the theory presumes or requires stabilization of separation with depth but offers limited guidance for identifying the stabilization threshold for concrete architectures. Finally, the paper is purely theoretical: without constructive recipes, empirical probes, or optimization analysis, it remains uncertain how the results translate into trainable, parameter-efficient designs under real-world constraints.
1. Generalization error and sample complexity. Your results are asymptotic in function approximation. Can the framework yield generalization bounds—e.g., via Rademacher or covering-number control of the neural spaces—expressed in terms of the entry-wise separation profile, the depth at which separation stabilizes, and the size/structure of the readout? In particular, can you derive sample complexity or margin-based bounds that certify when added depth/readout improves test error rather than only representation capacity?
2. Function spaces beyond continuity. Universality is stated over continuous functions. Can you extend the results—or obtain rates—for other spaces such as L^p, Hölder/Sobolev/Besov, BV, or Barron-type spaces? Is entry-wise separability compatible with norms/seminorms in these spaces so that one can state approximation theorems with explicit error decay under smoothness or spectral assumptions?
3. Connections to integral representations of shallow nets. Classical approximation theory links shallow networks to integral representations (ridgelet transform, Barron spaces, Radon-type decompositions). Is there a formal relationship between your entry-wise separability (or separation stabilization with depth) and the spectral measures that appear in ridgelet/Barron analyses? For instance, can depth plus an appropriate readout be interpreted as enforcing or approximating certain ridgelet spectra or moment constraints, and do your results recover known rate results in Barron regimes or suggest new ones for equivariant settings?
4. Operational criteria. Practically, how can one estimate or test entry-wise separability (and its stabilization depth) from finite data or from an architecture’s design (e.g., filter classes, tensor orders)? Are there computable proxies (e.g., spectral norms of intertwiners, orbit counts, WL-width analogs) that permit a-priori generalization guarantees or architecture selection?
5. Robustness and misspecification. Can your theory quantify how approximation/generalization degrades under model misspecification (e.g., when the true target violates the assumed separation relations slightly), or under noisy labels/inputs? Are there stability bounds for the entry-wise notion analogous to Lipschitz-type robustness?
6. From existence to construction. Your theorems are existential. Do they imply constructive schemes (e.g., explicit readout designs or depth schedules) that achieve provable approximation rates and generalization bounds in the above function spaces, perhaps by leveraging ridgelet-inspired initialization or kernel/NTK limits adapted to entry-wise separability?
Fully AI-generated

On Universality of Deep Equivariant Networks
Soundness: 2: fair
Presentation: 2: fair
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper develops a general theory for universality in invariant and equivariant neural networks.
For invariant networks, the authors show that adding a fully connected readout layer ensures approximation of all continuous functions consistent with the network’s separation relation.
For equivariant networks, they argue that the usual notion of separation is too weak, introducing the new concept of *entry-wise separation*, which examines separability for each output coordinate.
They then prove two theorems: universality holds (i) once entry-wise separation stabilizes with depth, or (ii) when the network includes a width-1 convolutional readout.
Overall, the paper identifies **depth** and **readout layers** as the key factors controlling universality and provides a unified theoretical framework that connects and extends prior architecture-specific results.
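To fix ideas, my reading of the entry-wise notion is the following (my own notation; the paper's formal definition should be checked against it). For a family $\mathcal{F}$ of equivariant maps with outputs in $\mathbb{R}^n$, one keeps track of one relation per output coordinate,
$$x \sim_{\mathcal{F}}^{(i)} y \;:\Longleftrightarrow\; f(x)_i = f(y)_i \ \text{for all } f \in \mathcal{F}, \qquad i = 1, \dots, n,$$
rather than only the joint relation obtained from $f(x) = f(y)$. Two families can agree on the joint relation while inducing different coordinate-wise relations, which seems to be exactly what Example 3 exploits and why standard separation is too coarse in the equivariant case.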
- **Originality:** Entry-wise separation is a simple but fresh idea that fixes a real gap in how we reason about equivariant universality. The unifying view across invariant and equivariant settings is valuable.
- **Quality:** The Example 3 counterexample is instructive and motivates the new definition well.
- **Clarity:** The big picture is clear: depth stabilizes separation, and readouts enable universal approximation within the correct class. The relation to prior work is well positioned.
- **Significance:** Results inform model design (when to use a readout layer and how deep to go). They also explain why some shallow models fail even if they have the same separation power.
- **Missing assumption:** Theorem 1 (and likewise the other theorems) fails if, for example, $\sigma = \mathrm{id}$; see the sketch after this list.
- **Sparse definitions:** The permutation representation and Eq. (3) are only informally presented. Adding a clear algebraic and matrix-form definition would remove ambiguity.
- **Scope:** The analysis is limited to permutation representations and point-wise activations; potential extensions to other representations are only mentioned briefly.
- **No quantitative results:** The paper proves existence of universality but gives no approximation rates or bounds on the required depth threshold.
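To make the first weakness concrete (my own toy argument, not taken from the paper): if $\sigma = \mathrm{id}$, every network in the class collapses to a linear map,
$$x \;\mapsto\; W_L\,\sigma\big(W_{L-1}\cdots \sigma(W_1 x)\big) \;=\; W_L W_{L-1}\cdots W_1\, x,$$
so only linear functions can be approximated. The separation-constrained class, however, contains $h = \varphi \circ f_0$ for any network $f_0$ in the class and any continuous $\varphi : \mathbb{R} \to \mathbb{R}$ (such an $h$ never separates more points than $f_0$ does), and for, say, $\varphi(t) = t^2$ and a nonzero $f_0$ this $h$ is not linear, hence not uniformly approximable by linear maps on any compact set with nonempty interior. Some explicit nonlinearity assumption on $\sigma$ therefore seems unavoidable, and stating it would help.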
- Could the authors explicitly assume a non-linear activation in the readout MLP and restate Theorem 1 accordingly? Would Theorems 2 and 3 need similar clarification?
- Are there examples showing the exact depth at which entry-wise separation stabilizes for common groups (e.g., $S_n$)?
- Would entry-wise separation remain the correct notion if the representation were not a permutation one (e.g., vector or tensor field representations)?
Moderately AI-edited

On Universality of Deep Equivariant Networks
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper develops a unified theory of separation-constrained universality for deep invariant and equivariant neural networks built from permutation representations and point-wise activations.
* For invariant networks, adding a fully connected readout guarantees universality within the class of continuous functions that respect the model’s separation relation (Theorem 1).
* For equivariant networks, the authors show that standard separation is too coarse and introduce entry-wise separation; universality holds either once entry-wise separation stabilizes with depth (Theorem 2) or by appending a width-1 convolutional readout (Theorem 3). These results recover and generalize several prior architecture-specific theorems and identify depth and readout layers as the decisive mechanisms for universality.
* Originality: Introduces entry-wise separation and proves that it exactly characterizes universality classes for broad equivariant architectures, cleanly explaining why standard separation can fail.
* Quality: Precise definitions (layer spaces, neural spaces, universality classes), clear statements of Theorems 1–3, and complete proofs; careful reduction from equivariant to invariant projections via stabilizers.
* Clarity: Running examples (PointNet-style \(P\), convolution \(C\), invariant \(I\)) make abstract statements tangible; Example 3 vividly shows the need for entry-wise separation.
* Significance: Unifies and extends prior results (Maron et al., Ravanbakhsh, Segol & Lipman, Chen et al., Joshi et al.) and translates them into a common separation-constrained viewpoint with practical design levers (depth and readout).
* Scope limitations: Results are restricted to finite groups and permutation representations with point-wise activations; many modern models rely on continuous groups (e.g., \(E(n)\), \(SO(3)\)) and richer nonlinearities (gates, attention). The authors acknowledge this as future work, but a discussion of which parts of the proofs break (e.g., reconstruction via stabilizers, existence of enough invariant functionals) would sharpen the limits.
* Non-quantitative nature: The theorems are existential; no rates, width bounds, or explicit depth thresholds are provided (Theorem 2 uses a stabilization depth from prior work). Concrete bounds for common groups (e.g., \(S_n\), dihedral groups) and standard blocks would increase practical impact.
* Readout specificity: The readouts that guarantee universality are either fully connected (invariant) or width-1 convolution (equivariant). While natural, it would help to clarify how far this extends to other linear equivariant heads used in practice (e.g., steerable bases, attention-style heads), and whether the “identity-containing” condition on \(M\) can be weakened.
* Examples breadth: Example 3 focuses on width-1 convolutions; complementary examples for graph and manifold settings (beyond the WL context) could better demonstrate the generality of entry-wise separation.
- Can the authors outline how their proofs might be adapted to yield approximation rates or width/depth trade-offs, even under simplifying assumptions (e.g., Lipschitz target functions on compact domains)?
- Given a working equivariant model, how should one diagnose whether entry-wise separation has stabilized in practice (e.g., via synthetic distinguishability tests), and how does one choose a minimal readout ensuring universality while controlling parameter count?
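To illustrate the kind of synthetic distinguishability test meant in the last question, here is a minimal brute-force sketch on a toy permutation-equivariant stack (everything below, including the DeepSets-style layer, is my own purely illustrative construction and not the paper's architecture): fix a probe set closed under the group action and, for each depth and each output coordinate, count the probe pairs the truncated network fails to distinguish; entry-wise separation has empirically stabilized once these pair sets stop changing with depth.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 4  # number of coordinates; the group is S_n permuting them

def make_layer():
    # Toy permutation-equivariant layer (DeepSets-style): each coordinate is
    # updated from its own value and the mean over all coordinates.
    a, b, c = rng.normal(size=3)
    return lambda h: np.tanh(a * h + b * h.mean() + c)

layers = [make_layer() for _ in range(6)]

def forward(x, depth):
    """Per-coordinate features after `depth` layers."""
    h = np.asarray(x, dtype=float)
    for layer in layers[:depth]:
        h = layer(h)
    return h

# Probe set closed under the group action: all permutations of a few base
# points, so that nontrivial coincidences between coordinates actually occur.
bases = [rng.normal(size=n) for _ in range(2)]
probes = [np.array([b[j] for j in perm])
          for b in bases
          for perm in itertools.permutations(range(n))]

def unseparated_pairs(depth, i, tol=1e-8):
    """Pairs of probe indices whose i-th output coordinate coincides at this depth."""
    pairs = set()
    for (p, xp), (q, xq) in itertools.combinations(enumerate(probes), 2):
        if abs(forward(xp, depth)[i] - forward(xq, depth)[i]) < tol:
            pairs.add((p, q))
    return pairs

# Entry-wise separation has (empirically) stabilized once these counts,
# and ideally the pair sets themselves, stop changing as depth grows.
for depth in range(1, len(layers) + 1):
    counts = [len(unseparated_pairs(depth, i)) for i in range(n)]
    print(f"depth {depth}: unseparated pairs per output coordinate = {counts}")
```

In this toy example the counts are constant from depth 1, because each coordinate only ever sees its own value and permutation-invariant summaries; for richer message-passing layers one would expect the counts to shrink over the first few layers before stabilizing, and that stabilization depth is what the question asks how to estimate.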
Fully AI-generated |