ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction  | Count    | Avg Rating | Avg Confidence | Avg Length (chars) |
|----------------------|----------|------------|----------------|--------------------|
| Fully AI-generated   | 3 (75%)  | 3.33       | 3.00           | 3781               |
| Heavily AI-edited    | 0 (0%)   | N/A        | N/A            | N/A                |
| Moderately AI-edited | 1 (25%)  | 2.00       | 4.00           | 1266               |
| Lightly AI-edited    | 0 (0%)   | N/A        | N/A            | N/A                |
| Fully human-written  | 0 (0%)   | N/A        | N/A            | N/A                |
| Total                | 4 (100%) | 3.00       | 3.25           | 3152               |
Submission: Automated Architecture Synthesis for Arbitrarily Structured Neural Networks
Review 1
Soundness: 2: fair
Presentation: 1: poor
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
EditLens Prediction: Fully AI-generated

Summary:
This paper introduces a novel framework for designing directionless neural network architectures, inspired by the complex connectivity of biological neural systems. The authors propose a method to automatically learn arbitrary graph structures during training, organized into "Neural Modules" (NMs) to facilitate unrestricted communication between nodes. While the approach showcases out-of-the-box thinking and a refreshing departure from traditional tree-like or DAG structures, I have significant concerns about its theoretical grounding, computational feasibility, and experimental rigor. The paper's theoretical claims (e.g., Theorems 3.1 and 3.2) feel oversold, and the backward gradient computation in non-DAG structures is unclear and potentially problematic. The computational overhead of the proposed approach is prohibitive, scaling poorly with large graphs due to its dependency on the number of edges. Additionally, the experimental evaluation is limited, ignoring state-of-the-art baselines such as Graph Neural Networks (GNNs) and Transformers, which are standard for graph-structured data. The performance gains over traditional approaches appear negligible, and the paper is poorly written, lacking clarity in critical components. While the idea is innovative, the execution and validation fall short of expectations for a rigorous contribution.

Strengths:
- The authors propose a truly novel perspective on neural network architectures, moving beyond traditional DAG structures to embrace directionless, biologically inspired graphs. This represents a bold and creative departure from conventional designs.
- The source code is available, which is commendable for reproducibility and further research.

Weaknesses:
- The theoretical foundation is shaky. Theorems 3.1 and 3.2 are oversold: one is a trivial application of the universal approximation theorem, and the other is a basic SGD formula. The authors need to rephrase or remove these claims to avoid misleading readers.
- The backward gradient computation is unclear and problematic. The role of the operator $H_j$ is not well defined and seems arbitrarily introduced. Traditional backpropagation relies on a notion of order (e.g., layer-wise gradients), which is lost in arbitrary or cyclic graphs. The authors do not address how gradients are computed in such structures, raising doubts about the soundness of the approach.
- The proposed approach has a prohibitive computational complexity of $O(N + E + s \cdot NM^2)$, which scales poorly with large graphs due to the dependency on the number of edges $E$. This makes it impractical for modern, large-scale NNs with billions of parameters and edges.
- The experimental evaluation is limited and incomplete. The authors ignore state-of-the-art baselines such as GNNs and Transformers, which are standard for graph-structured tasks (e.g., the Facebook datasets). The performance gains over traditional approaches are negligible, undermining the claimed advantages. The evaluation lacks diversity in datasets and tasks, making it difficult to assess the method's generalizability.
- The paper is poorly written, with missing details and confusing explanations. Key components, such as the synchronization method and Neural Module formation, are difficult to follow.
- The authors use Tarjan's algorithm to identify strongly connected components, but they do not justify this choice or explore alternatives. It is unclear how different algorithms might affect performance or scalability.

Questions:
- Theorems 3.1 and 3.2 seem oversimplified and misleading. Could the authors rephrase or remove these claims to better reflect their actual contributions?
- How does the backward gradient computation work in a non-DAG structure? Without a clear notion of order, how are gradients propagated, and how is convergence guaranteed?
- In cyclic or fully connected graphs, establishing an order for backpropagation is non-trivial. How do the authors handle this, and what guarantees can they provide for gradient stability and correctness?
- The complexity $O(N + E + s \cdot NM^2)$ suggests the approach does not scale to large graphs. How do the authors envision deploying this method in real-world, large-scale NNs?
- Why did the authors choose Tarjan's algorithm for identifying strongly connected components? Were alternative algorithms considered, and how might they impact performance or scalability? (A minimal illustration follows this review.)
- The authors ignored state-of-the-art baselines like GNNs and Transformers. Why were these omitted, and how does the proposed method compare to them on graph-structured tasks (e.g., the Facebook datasets)?
- Could the proposed framework be integrated with existing architectures (e.g., GNNs, Transformers) to leverage their strengths while addressing the limitations of traditional DAG structures?
- Are there specific applications or domains where this approach might be particularly effective, despite its current limitations?
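Since this review questions the choice of Tarjan's algorithm, a minimal sketch may help ground the point: every linear-time SCC routine produces the same partition of nodes, so the algorithm choice affects runtime constants and implementation convenience rather than which candidate Neural Modules are found. The toy graph and the use of networkx below are illustrative assumptions, not the submission's actual code.

```python
# Illustrative sketch (not the paper's implementation): grouping the nodes of a
# pruned, directed connectivity graph into strongly connected components, which
# is roughly what "Neural Module" formation via Tarjan's algorithm amounts to.
import networkx as nx

# Toy graph: one 3-node cycle plus a feed-forward tail.
G = nx.DiGraph([(0, 1), (1, 2), (2, 0),   # cycle -> one non-trivial SCC
                (2, 3), (3, 4)])          # acyclic part -> singleton SCCs

# networkx's default routine is a non-recursive variant of Tarjan's algorithm.
tarjan = sorted(sorted(c) for c in nx.strongly_connected_components(G))

# Kosaraju's algorithm is an alternative with the same O(N + E) complexity.
kosaraju = sorted(sorted(c) for c in nx.kosaraju_strongly_connected_components(G))

# Both return the same partition; only constants and memory behaviour differ.
assert tarjan == kosaraju
print(tarjan)   # [[0, 1, 2], [3], [4]]
```

If the scalability concern is about the component search itself, both algorithms run in time linear in N + E; the quadratic costs criticised elsewhere in the reviews come from the dense solves, not from the SCC step.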
Review 2
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
EditLens Prediction: Fully AI-generated

Summary:
The paper proposes a new approach that departs from layered, DAG-based networks. It starts from a complete directed graph whose edges are pruned on the fly, and solves the resulting cyclic graph as a system of non-linear equations during both the forward and backward passes. The authors propose "Neural Modules" (strongly connected components) together with a repulsion-based regulariser that keeps the modules small and balanced. Inference is therefore the numerical solution of a fixed-point system; training updates the coefficients of that system. The authors claim that this removes the structural bias of trees/DAGs, allows arbitrary connectivity, and can be parallelised module-wise. Experiments on four medium-scale tabular/graph tasks show lower error rates than DEQ, OptNet, DARTS, or standard MLPs of comparable node budget, and a ~10× speed-up when modules are processed in parallel on GPU.

Strengths:
1) The authors present a fully differentiable, cyclic-graph neural model whose forward pass is literally a Newton solver and whose topology is pruned online by a repulsion prior that encourages strongly connected components to fragment into GPU-friendly micro-clusters.
2) The mathematical framing is good and clean: forward propagation = root-finding on a nonlinear system, backward propagation = solution of the dual linear system, training = gradient updates on the coefficients of that system. (A toy sketch of this framing follows the review.)

Weaknesses:
1) I have a concern about memory usage: the model stores the dense adjacency matrix and the dense Newton correction explicitly, an O(p²) memory footprint that already exhausts 40 GB of VRAM at p ≈ 70K, i.e. three orders of magnitude smaller than a single transformer layer. Is this an efficient model compared to existing ones?
2) The proposed regulariser is a repulsion term on the raw edge weights; there is no spectral penalty, no curvature constraint, and no mechanism to prevent the graph from becoming a single strongly connected component, at which point the promised parallel decomposition collapses and you are back to solving a monolithic 70K×70K linear system every mini-batch.
3) I do not deny that the idea is novel, but given the scale of the experiments, the submission remains an elegant thought experiment (with some initial attempts on limited datasets) rather than a credible path to the next generation of foundation models.

Questions:
1) DEQ was designed for weight-tied infinite-depth models, whereas the proposed work uses unique edge weights; a fair comparison would be a finite-depth DEQ with learned layer-wise weights, which the authors never run. Do the authors agree?
2) The experimental canvas is restricted to four small tabular datasets whose input dimensionality is three orders of magnitude below the visual or linguistic entropy that modern architectures are expected to model; on such low-entropy data any sufficiently expressive inductive bias (including a cyclic graph) can look superior to a vanilla MLP, but the results may not hold up against the curse of dimensionality that accompanies pixels or sub-words.
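To make the "forward = root-finding, backward = dual linear system" framing concrete, here is a minimal DEQ-style sketch under stated assumptions: a small contraction map with illustrative parameters W, U, b; plain fixed-point iteration in place of the paper's Newton solver; and an explicitly formed Jacobian that is only feasible at toy scale. It is not the submission's actual method.

```python
# Minimal sketch, assuming a toy cyclic system z = tanh(W z + U x + b); the
# parameterisation and solver below are illustrative, not the paper's method.
import torch

torch.manual_seed(0)
p, d = 8, 4                               # 8 cyclically coupled units, 4 inputs
W = 0.5 * torch.randn(p, p) / p ** 0.5    # scaled so the map contracts in practice
U = torch.randn(p, d)
b = torch.zeros(p)
x = torch.randn(d)

def f(z):
    return torch.tanh(W @ z + U @ x + b)

# Forward pass: root-finding on z = f(z) by plain fixed-point iteration
# (a Newton solver would converge in far fewer steps).
z = torch.zeros(p)
for _ in range(100):
    z = f(z)

# Backward pass: instead of unrolling the loop, solve the adjoint (dual) linear
# system v^T (I - df/dz) = dL/dz* at the fixed point; v^T df/dtheta then gives
# the parameter gradients. The Jacobian is formed explicitly only because p is tiny.
z_star = z.detach().requires_grad_(True)
J = torch.autograd.functional.jacobian(f, z_star)        # df/dz at z*
dL_dz = torch.ones(p)                                     # placeholder loss gradient
v = torch.linalg.solve((torch.eye(p) - J).T, dL_dz)       # adjoint solve
print(v)
```

For scale, and assuming float32 and fully dense storage, a single 70,000 × 70,000 matrix occupies roughly 70,000² × 4 bytes ≈ 19.6 GB, so holding the adjacency matrix and a Newton correction of that size at once is consistent with the ~40 GB VRAM figure cited in the weaknesses above.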
Review 3
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
EditLens Prediction: Fully AI-generated

Summary:
This paper proposes a novel framework for automatically constructing neural network architectures with arbitrary graph structures. Instead of relying on predefined Directed Acyclic Graph (DAG) or tree-like topologies, the model begins with a complete graph and adaptively learns and optimizes the connectivity among nodes during training. The authors introduce a synchronous computation mechanism for both forward and backward propagation, as well as a Neural Module (NM) Regularization technique that organizes nodes into balanced subgraphs to enhance efficiency and generalization.

Strengths:
1. The paper challenges the traditional DAG-based view of neural architectures by proposing a general graph-based structure that can theoretically encompass existing designs as special cases. This idea has clear conceptual novelty and could inspire new directions in architectural design.
2. The inclusion of algorithmic pseudocode, complexity analysis, and proof sketches contributes to the overall completeness of the work. The NM regularization method is interesting, as it allows parallel processing.
3. Experiments across four datasets show performance improvements over traditional NN baselines and comparable methods. The framework also appears to enable more efficient computation via modular parallelization.

Weaknesses:
1. It remains unclear whether Neural Modules (NMs) are individual components within a larger structure or whether they represent the entire architecture. A clear, high-level topology diagram of the complete system is missing. Furthermore, it is unclear how the model handles different data modalities (e.g., images, graphs, sequences).
2. The experimental setup lacks sufficient detail. Baselines such as NN, DEQ, DAG, and OPTNET are referenced but not fully specified; key implementation parameters, dataset preprocessing steps, and hyperparameter settings are omitted. The evaluation protocol and metric definitions need clarification to ensure reproducibility.
3. Table 1 contains inconsistent naming (e.g., "NMs", "NMsL2", "NMsNM", "NMs&L1") without clear explanations. Figures (especially Figure 2) are visually cluttered, making it difficult to interpret results. These issues reduce the paper's readability and empirical credibility.
5. While the authors claim efficiency improvements via NM regularization and parallelization, concrete runtime profiling and scalability analyses (e.g., GPU utilization, training time comparisons) are not provided.
6. The writing style sometimes overstates claims (e.g., "unlock the full potential of neural networks"), which can reduce perceived objectivity. Mathematical equations, while thorough, are dense and not always supported by intuitive explanations or visualization.

Questions:
1. Please clarify the structural role of Neural Modules. Are they subcomponents of a larger graph or equivalent to the entire network topology? How does this framework adapt to various input modalities (e.g., graphs, sequences)?
2. Please provide full implementation details for the baseline methods, dataset splits, and evaluation metrics.
3. Explain the meaning of result notations such as "NMsL2", "NMsNM", "NMs&L1", etc. (One plausible reading is sketched after this review.)
4. Justify the choice of NN, DEQ, DAG, and OPTNET as baselines. Are these the state-of-the-art for the target tasks?
5. Include visual diagrams of the overall network topology and module formation during training for clarity.
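Since the meaning of the ablation labels is not explained in the paper (per weakness 3 and question 3 above), the following sketch only illustrates one plausible reading: the standard distinction between L2 and L1 penalties on a learnable edge-weight matrix, where the L1 term drives weights to exact zeros and thereby prunes edges. The parameterisation, threshold, and placeholder loss below are assumptions for illustration, not the submission's definitions.

```python
# Illustrative sketch only: generic L1/L2 edge-weight penalties on a learnable
# adjacency matrix, one plausible reading of labels such as "NMs&L1" / "NMsL2".
import torch

p = 6
A = torch.randn(p, p, requires_grad=True)   # learnable edge weights (full graph)

task_loss = torch.tensor(0.0)               # placeholder for the real task loss
l1_penalty = A.abs().sum()                  # drives weights to exact zeros (pruning)
l2_penalty = (A ** 2).sum()                 # shrinks weights without zeroing them

lam = 1e-3
loss_l1 = task_loss + lam * l1_penalty      # an "...&L1"-style variant, presumably
loss_l2 = task_loss + lam * l2_penalty      # an "...L2"-style variant, presumably
loss_l1.backward()

# After training, low-magnitude edges could be dropped; the SCCs of the surviving
# sparse directed graph would then be natural candidates for Neural Modules.
kept = (A.detach().abs() > 0.1).sum().item()
print(f"{kept} of {p * p} edges kept")
```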
Review 4
Soundness: 3: good
Presentation: 1: poor
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
EditLens Prediction: Moderately AI-edited

Summary:
The paper proposes a novel framework that departs from traditional tree-like or DAG-based artificial neural network architectures, introducing a dynamic graph-based system inspired by biological neural networks. The authors enable neural units to form flexible, arbitrary connections during training, organizing them into "Neural Modules" that allow synchronous communication and parallel computation.

Strengths:
**Novel Architecture Design Beyond DAG Constraints** The paper introduces a biologically inspired framework that allows neural networks to autonomously learn arbitrary graph structures during training, overcoming the inherent limitations of traditional DAG-based architectures. This enables more flexible communication between neural units and enhances the model's representational capacity.

Weaknesses:
See questions.

Questions:
1. The writing of this paper needs further improvement before it can be considered for acceptance. For example, there is a punctuation mistake in the abstract, and the reference format is not unified.
2. The paper lacks theoretical justification and rigorous comparative analysis. Why is the proposed method more effective than current DAG baselines? A theoretical justification would be appreciated.
3. The authors study the method on toy datasets. How will it scale to larger ones such as ImageNet?