|
Pick Your Channel: Ultra-Sparse Readouts for Recovering Functional Cell Types |
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
This paper introduces ultra-sparse readouts for neural encoding models to uncover functional cell types in neural populations by enforcing sparsity in the mapping from shared visual representations to individual neuron responses. The ultra-sparse readout is implemented with three strategies: a Gumbel-Softmax readout, a 3D Grid readout, and a REINFORCE readout. Using mouse retinal ganglion cell (RGC) and primary visual cortex (V1) datasets, the authors show that the Gumbel-Softmax readout achieves nearly the same predictive performance as unconstrained models while naturally grouping neurons into consistent functional types. In the retina, the model recovers canonical cell classes with high internal consistency. In V1, however, performance drops, suggesting that cortical neurons have a more continuous functional organization rather than falling into discrete clusters.
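To make the channel-selection idea in this summary concrete, here is a minimal sketch of a Gumbel-Softmax sparse readout, assuming each neuron learns logits over the shared core's feature channels and reads out a near one-hot mixture; all names and shapes here are hypothetical illustrations, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    # Add Gumbel noise and relax the discrete argmax into a softmax;
    # lower tau pushes the weights toward a one-hot channel selection.
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

# Hypothetical sizes: 4 neurons reading from 8 core feature channels.
n_neurons, n_channels = 4, 8
logits = rng.normal(size=(n_neurons, n_channels))  # learnable per-neuron channel logits
features = rng.normal(size=(n_channels,))          # core output at one spatial location

weights = gumbel_softmax(logits, tau=0.1)  # rows are near one-hot distributions
responses = weights @ features             # each neuron reads (mostly) one channel
```

At low temperature the selection weights concentrate on a single channel per neuron, which is what lets the learned channel assignment double as a functional-type label.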
1. This paper presents a method to improve the interpretability of vision-based neural encoding models. The approach is mathematically grounded and shows good empirical performance based on the reported results.
2. The analyses are detailed and comprehensive, and support the authors’ claims.
3. Improving interpretability in neural encoding models is a meaningful contribution to the field.
1. Although the proposed method improves interpretability, there is a notable performance drop in V1. This raises concerns about its general applicability. Ultimately, we want models that not only provide interpretability but also accurately predict neural responses to visual stimuli, as predictive performance is a prerequisite for studying how visual information is represented in the retina and the brain.
2. Although the method successfully identifies functionally consistent neuron groups, it does not directly uncover the underlying receptive field computations associated with each channel. Developing methods to recover these computations would increase the scientific value of this interpretability approach.
1. The paper uses a fixed CNN core as a proof of concept, but does the proposed method remain model-agnostic and compatible with other modern architectures, such as Vision Transformers, which are likely to be used in future large-scale studies?
2. The single-trial correlation metrics in Fig. 2 are quite low. Could the authors clarify the reason for this? For example, is it due to the use of naturalistic video stimuli rather than repeated trials? Also, why was correlation chosen as the primary evaluation metric, given that it only captures linear relationships and does not account for nonlinear dependencies or variance structure in the data? |
Fully human-written |
|
Pick Your Channel: Ultra-Sparse Readouts for Recovering Functional Cell Types |
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 2: You are willing to defend your assessment, but it is quite likely that you did not understand the central parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked. |
The paper presents a novel approach to learning ultra-sparse readouts in deep neural network models for predicting neuronal responses to arbitrary visual stimuli. This sparsity constraint minimally degrades predictive performance relative to unconstrained models, while inherently revealing functional cell types consistent with known retinal ganglion cell categories in mice. When applied to V1 neurons, the model performs worse, supporting the hypothesis of a more continuous and less discrete functional organization in V1.
The paper is scientifically motivated and proposes a conceptually novel method that bridges interpretability and performance in neural response modeling.
A key missing element is a clear justification of why identifying functional cell types from readout channels is important. The paper would benefit from clarifying whether the discovered functional cell types differ meaningfully from those obtained by clustering readout vectors in unconstrained models. It remains ambiguous whether the primary contribution lies in biological interpretability, computational efficiency, or predictive insight. A more explicit articulation of this contribution, together with comparisons to simpler clustering-based baselines, would strengthen the paper.
- In Figure 5, additional details are needed: What do the solid lines and shaded regions represent? Are these population averages and confidence intervals?
- Are the responses of known functional cell types illustrated in these plots?
- Did the method identify any novel or previously unreported cell types?
- As noted above, a more direct comparison between the sparse readout–based classification and clustering results from unconstrained models would clarify the added value of sparsity. |
Moderately AI-edited |
|
Pick Your Channel: Ultra-Sparse Readouts for Recovering Functional Cell Types |
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. |
This paper investigates how adding sparse constraints to neural networks trained on neural data can improve the interpretability and selectivity of modeled neural responses. Specifically, it explores whether such constraints allow the network to maintain comparable performance to unconstrained models while aligning more closely with biological upstream channels. The authors analyze model performance across different visual processing stages (retina and V1), attempting to explain the differences in prediction accuracy through response continuity and channel specialization. The overall goal is to assess whether the model can learn neuron-like selectivity properties found in biological visual systems.
* The idea of incorporating sparse regularization to simulate selective neural representations provides a biologically inspired direction for neural modeling.
* The model retains comparable predictive performance despite the addition of sparse constraints, suggesting robustness and flexibility.
* The attempt to interpret model behavior in relation to retinal and V1 responses connects the network's representations to biological visual processing and adds neuroscientific relevance.
* While the sparse constraint helps maintain similar performance, the work doesn’t convincingly demonstrate why this matters or what new insights it provides beyond showing robustness.
* The link between learned representations and specific neuronal functions (e.g., orientation selectivity, spatial frequency tuning) is vague, reducing the biological interpretability of results.
* The drop in predictive accuracy for V1 neurons indicates that the model may fail to capture hierarchical processing or contextual integration occurring in cortex.
* The authors attribute V1–retina performance differences to “response continuity,” but this reasoning feels weak and insufficiently validated.
* Both retina and V1 were modeled using the same CNN architecture, but the paper doesn’t discuss whether this choice limits representational specialization.
* The experiments do not comprehensively test how different constraints or model components affect performance and feature representation.
I have several questions, listed below:
Does the sparse constraint truly promote the emergence of biologically meaningful features (e.g., orientation, direction selectivity)? Can these be visualized or quantified?
Why was the same CNN architecture used for both retina and V1? Could architectural differences (e.g., receptive field size, nonlinearity) better reflect biological distinctions?
How sensitive are the results to the degree or form of sparsity applied?
Could the authors provide more evidence supporting their “response continuity” explanation for the retina–V1 performance gap?
Would more targeted ablations or neuron-type-specific analyses reveal whether the model captures functionally distinct neural subpopulations? |
Fully AI-generated |