BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The paper introduces BrainPro, an EEG foundation model that incorporates retrieval-based spatial learning and brain state decoupling to handle variable electrode layouts and diverse brain states. The method shows competitive performance across nine public BCI datasets.
The paper is well-organized and clearly presented. The proposed method meaningfully integrates both brain-region and channel-level encoding, which aligns well with the neurophysiological structure of EEG signals. The evaluation is comprehensive, and the authors provide ample additional experiments and analyses in the appendix to support their claims.
1. While the paper claims to use parallel encoders to disentangle brain-state-specific representations, in practice only two specialized encoders are implemented (for affect and motor tasks), with all other brain states collapsed into a generic “other” encoder. Given that affect and motor represent only a small subset of possible brain states, this coarse categorization significantly limits the generality and practical impact of the proposed disentanglement strategy.
2. The paper lacks clarity on how encoder aggregation is performed during downstream fine-tuning. Although the authors argue that BrainPro can flexibly activate task-relevant encoders, Table 13 shows that using all encoders yields the best performance on affect tasks. Since the total model size is modest (7.69M parameters) and activating all encoders incurs minimal computational overhead, it is unclear why a selective activation mechanism is necessary—raising questions about the actual utility of the proposed modular design.
3. The method section is difficult to follow due to an excessive amount of notation, some of which is either undefined (e.g., Sp on line 210) or appears to contain errors (e.g., “KTT” on line 197). Additionally, the formulation in Equation 9 is ambiguous—specifically, it is unclear whether the positional embedding p is added before or after the flattening operation (line 213). The authors should thoroughly revise and reorganize this section to improve clarity and readability.
4. While the paper states that BrainPro supports heterogeneous electrode layouts through its retrieval-based spatial learning block, the current implementation appears to rely on a fixed universal template of only 60 channels. This may limit its applicability to datasets or clinical protocols that use denser montages (e.g., 62, 128, or even 256 channels), where fine-grained spatial resolution is critical—particularly for tasks like source localization or high-fidelity cognitive decoding. The authors should clarify whether the framework can scale to higher-density setups, and if so, how the retrieval mechanism generalizes beyond the 60-channel prior. If not, this constitutes a practical limitation that should be acknowledged.
5. The pretraining corpus appears to be dominated by datasets such as TUH EEG Seizure Corpus (TUSZ) and TUH Abnormal EEG Corpus (TUEP). According to the authors’ own categorization, these datasets would fall under the “other” brain state, meaning that during pretraining—per Equation 13—only the shared encoder and the “other”-specific encoder receive gradients, while the affect- and motor-specific encoders remain largely unused. This leads to highly imbalanced parameter updates across encoders during pretraining. I suspect this issue stems primarily from the coarse brain-state taxonomy (affect / motor / other), which oversimplifies the rich diversity of neural processes in real-world EEG data. A more fine-grained and neuroscientifically grounded state partitioning could substantially strengthen the model’s design and the paper’s overall contribution.
The experiments in Table 13 are central to validating the paper’s core claim — that BrainPro enables flexible, task-adaptive encoder activation. However, the current ablation only shows a subset of combinations. To better support this contribution, the authors should provide a complete set of comparisons, including:
$\mathcal{E}$(S), $\mathcal{E}$(S+A), $\mathcal{E}$(S+O), $\mathcal{E}$(S+M), and $\mathcal{E}$(S+A+O+M).
This would clarify whether performance gains stem from specific state encoders or simply from increased model capacity, and help assess the true value of modular design.
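To make the capacity-versus-specialization confound concrete, a minimal sketch (with hypothetical embedding and class sizes, not the paper's) shows how a linear head's parameter count grows with every additional concatenated encoder, which is why the full ablation grid should report parameter counts alongside accuracy:

```python
# Sketch of the requested ablation grid. Assumes (hypothetically) that each
# encoder emits a D-dimensional embedding and that concatenated features feed
# a linear classification head; all sizes are illustrative, not the paper's.

from itertools import combinations

D = 128          # hypothetical per-encoder embedding size
N_CLASSES = 9    # hypothetical number of downstream classes
STATE_ENCODERS = ["A", "O", "M"]  # affect / other / motor

def head_params(active_states):
    """Linear-head parameter count when the shared encoder S is
    concatenated with the given state-specific encoders."""
    n_encoders = 1 + len(active_states)      # shared + selected
    in_dim = n_encoders * D                  # concatenated feature size
    return in_dim * N_CLASSES + N_CLASSES    # weights + bias

# Enumerate every subset from E(S) alone up to E(S+A+O+M).
grid = {}
for r in range(len(STATE_ENCODERS) + 1):
    for combo in combinations(STATE_ENCODERS, r):
        name = "S" + "".join("+" + c for c in combo)
        grid[name] = head_params(combo)

for name, p in sorted(grid.items(), key=lambda kv: kv[1]):
    print(f"E({name}): {p} head parameters")
```

Reporting these counts next to each accuracy row would let readers separate the contribution of a specific state encoder from that of sheer added capacity.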
Heavily AI-edited
BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
The authors identify three fundamental limitations in existing EEG foundation models (EFMs). First, current EFMs underutilize spatial interactions among electrodes and brain regions. Second, they pretrain a single encoder without explicitly disentangling brain state–related representations. Third, the use of a single shared encoder limits the flexibility of downstream adaptation. To address these limitations, the authors propose **BrainPro**, a novel framework designed to enhance spatial representation and brain state modeling. Specifically, BrainPro introduces a *retrieval-based spatial learning* mechanism to overcome the first limitation, employs *parallel encoders* with *decoupling* and *region-aware reconstruction objectives* to address the second, and integrates a *shared encoder* with one or more *brain-state-specific encoders* to resolve the third.
The topic of EEG foundation models (EFMs) is important and highly relevant to the advancement of brain–computer interfaces (BCIs).
1. The *Methods* section lacks a coherent logical flow. While the authors describe the individual components of the proposed framework in detail, they provide insufficient explanation of how these components are organized or why such an organization is effective.
2. The three claimed innovations, which should form the core of this paper, are not supported by adequate theoretical justification or experimental validation.
**These two issues form the primary basis for my rejection decision.**
3. The motivation presented in the *Introduction* is also confusing. The first paragraph lists numerous challenges, many of which are not directly related to the objectives of this work. Removing or refining these unrelated points would help clarify and strengthen the paper’s motivation.
4. The design of multiple brain-state-specific encoders appears to involve three categories (affect, motor, and other). However, the rationale for this categorization is not explained, nor are details provided regarding how the encoders specifically capture affective and motor-related representations.
5. In Section 2.5, the authors state that “for a downstream task, any subset $S$ of encoders can be activated and concatenated.” However, the method for determining or selecting the subset $S$ is not described, leaving a key implementation detail unclear.
1. Selective updates are applied during encoder training, which appears conceptually similar to the sparse Mixture-of-Experts (MoE) mechanism. Does this approach introduce a potential *winner-takes-all* problem?
2. In the *Introduction*, the authors claim that BrainPro possesses strong scalability, interpretability, and generalization capabilities. However, no theoretical analysis or experimental evidence is provided to substantiate these claims. The paper would benefit from including relevant justification or empirical validation.
3. The authors also state that existing models often fail to explicitly capture channel-to-channel and region-to-region interactions. It would be valuable to provide theoretical reasoning or empirical results supporting this assertion.
4. Similarly, the claim that current models rarely learn state-aware representations during self-supervised pre-training lacks supporting evidence. The authors are encouraged to include either theoretical discussion or experimental analysis to validate this point.
Lightly AI-edited
BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
The paper presents BrainPro, a large-scale EEG foundation model that incorporates a retrieval-based spatial learning block to handle heterogeneous electrode montages and a brain state-decoupling block with parallel encoders to learn disentangled representations for affect, motor, and other brain processes. Pre-trained on approximately 2,180 hours of EEG data, BrainPro is evaluated on nine BCI datasets across six task types, reporting state-of-the-art performance in most cases.
The paper presents a large-scale EEG foundation model, BrainPro, pre-trained on over 2,000 hours of data and evaluated across nine diverse BCI datasets, offering a broad empirical scope.
It introduces a retrieval-based spatial learning mechanism that accommodates heterogeneous electrode montages, which addresses a practical challenge in cross-dataset EEG modeling.
The method reports competitive performance on several standard benchmarks, particularly in emotion recognition and mental stress detection tasks.
The paper's motivation is somewhat unclear and partially redundant. Specifically, the "Second" and "Third" limitations highlighted in the Introduction largely overlap (both concern the inflexibility of a single shared encoder in handling diverse or overlapping brain processes), yet they are presented as distinct issues. Moreover, the claim that a single shared encoder inherently limits downstream adaptability contradicts a core premise of foundation models, which is precisely that a well-pretrained shared representation can generalize across tasks with appropriate fine-tuning. The paper provides little theoretical or empirical justification for why this principle fails in the EEG context.
The proposed retrieval-based spatial learning block essentially combines channel-level and region-level feature aggregation. While practically useful, this approach builds on well-established ideas in EEG modeling (e.g., region-of-interest analysis, local-global graph representations) and does not introduce a fundamentally novel spatial modeling mechanism.
The brain state taxonomy (limited to "affect", "motor", and "others") is overly coarse. Real-world EEG data often reflect mixed or nuanced cognitive states (e.g., attention, working memory, fatigue), and such a simplistic categorization may hinder the model's ability to capture fine-grained or overlapping neural processes, limiting its applicability to more complex BCI scenarios.
The core claim is that parallel encoders enable better disentanglement than a single shared encoder. However, the paper does not compare against a strong baseline: a single encoder with the same total parameters as BrainPro's combined encoders, but conditioned on brain state (e.g., via input token or Feature-wise Linear Modulation). Would such a model achieve comparable performance? Without this comparison, it is unclear whether the gains stem from architectural novelty or simply increased capacity.
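The suggested baseline can be made concrete with a minimal sketch, assuming FiLM-style conditioning in which each brain state supplies a (gamma, beta) pair that modulates a single shared encoder's features; all dimensions and values below are illustrative, not the paper's architecture:

```python
# Hedged sketch of a state-conditioned single encoder via FiLM-style
# feature-wise modulation. The per-state (gamma, beta) pairs stand in for
# learned parameters; dims and states are hypothetical.

STATES = ["affect", "motor", "other"]
D = 4  # hypothetical feature dimension

film = {
    "affect": ([1.0] * D, [0.5] * D),
    "motor":  ([2.0] * D, [0.0] * D),
    "other":  ([1.0] * D, [0.0] * D),
}

def encode(features, state):
    """Apply state-conditioned modulation: h_i = gamma_i * x_i + beta_i."""
    gamma, beta = film[state]
    return [g * x + b for g, x, b in zip(gamma, features, beta)]

x = [0.1, 0.2, 0.3, 0.4]
print(encode(x, "motor"))  # same backbone, state-specific modulation
```

Matching such a conditioned single encoder to BrainPro's total parameter budget would isolate whether the reported gains come from architectural disentanglement or from capacity alone.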
1. The spatial filter visualizations in Appendix M (Figure 9) are presented without specifying the pre-training or downstream dataset they are derived from. On which dataset (or aggregated across which datasets) were these filters learned? Clarifying this is essential to assess whether the observed neuroanatomical patterns (e.g., frontal emphasis for affect) are consistent or merely dataset-specific.
2. The paper advocates for *flexible downstream adaptation* by selecting subsets of encoders. However, Table 13 shows that using *all* encoders ($E_{\mathrm{A}}$ + $E_{\mathrm{M}}$ + $E_{\mathrm{S}}$) yields the best performance on both FACED and BCI-IV-2A. If the full combination is consistently optimal, what is the practical benefit of flexibility? Does this imply that the “flexible adaptation” is unnecessary in practice, and simply fusing all available encoders is sufficient?
3. The downstream fine-tuning protocol for the parallel encoders is ambiguous. For a given task (e.g., emotion recognition), are all encoders fine-tuned, or only the relevant state-specific encoder (e.g., the affect encoder) together with the shared encoder?
Fully AI-generated
BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning
Soundness: 1: poor
Presentation: 1: poor
Contribution: 2: fair
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
To address the variability of EEG signals arising from factors such as emotional states, uncontrolled movements, and other task-irrelevant influences, the authors propose incorporating additional encoders to capture these sources of variability. A flexible selection mechanism is introduced during fine-tuning to adaptively choose the most relevant encoders for different downstream tasks. Furthermore, a retrieval-based module is designed for all encoders, which integrates spatial filters corresponding to predefined electrode montages and brain regions, thereby preserving spatial information across datasets with diverse electrode configurations. The proposed methods are validated through classification tasks across multiple BCI applications, demonstrating competitive performance against existing EEG foundation models.
1. This paper highlights that subjects may experience varying internal states (e.g., emotional state, level of concentration, or uncontrolled muscle activity such as eye movements) while performing tasks during trials, and accordingly designs the model architecture to account for such variations. Such a design is novel in the literature on EEG foundation models. This perspective encourages the community to recognize and address uncontrolled experimental variabilities that influence the stability of EEG signals during decoding.
2. The proposed retrieval-based spatial learner considers both electrode positions and their corresponding coarse brain regions, enabling flexible adaptation to unseen channel configurations by retrieving the most spatially similar electrodes. This design allows the model to be effectively applied to new datasets with electrode layouts not present in the pre-training data.
3. The authors conducted ablation studies by systematically removing each proposed or employed module and comparing the resulting reduced models with the full model. This analysis is crucial for evaluating the contribution of individual components and validating the effectiveness of the proposed designs in addressing the three major limitations identified in existing foundation models.
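The retrieval idea described in point 2 can be sketched minimally, under the assumption that channels of an unseen montage are matched to a fixed template by nearest 3D electrode position; the template and coordinates below are hypothetical, not the paper's actual channel prior:

```python
# Minimal sketch: map each channel of an unseen montage to its nearest
# electrode in a fixed template using Euclidean distance in 3D.
# Template and coordinates are illustrative only.

import math

TEMPLATE = {               # hypothetical template electrodes (x, y, z)
    "Fz": (0.0, 0.7, 0.7),
    "Cz": (0.0, 0.0, 1.0),
    "Pz": (0.0, -0.7, 0.7),
}

def retrieve(pos):
    """Return the template electrode closest to `pos`."""
    return min(TEMPLATE, key=lambda name: math.dist(TEMPLATE[name], pos))

# An unseen montage is mapped channel-by-channel onto the template,
# so a layout absent from pre-training can still reuse learned filters.
unseen = {"FCz": (0.0, 0.35, 0.9), "POz": (0.0, -0.8, 0.6)}
mapping = {ch: retrieve(p) for ch, p in unseen.items()}
print(mapping)
```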
1. Overall, the logical flow and clarity of presentation are weak, making it difficult for readers to grasp the objectives and novel contributions of the paper until they reach the detailed methodology section.
2. The limitations of existing EEG foundation models described in Lines 54–67 are expressed in rather vague terms (for example, “diverse brain states in different brain processes”), making it difficult for readers to fully grasp their specific implications. Moreover, the paper does not provide sufficient theoretical or empirical justification—or comparative analysis—to substantiate the claim that these limitations have a tangible impact on model performance. Including clearer biophysical or mathematical definitions of the terms and supporting evidence would strengthen the arguments.
3. Sections 2.2–2.4 contain an overwhelming amount of mathematical detail, including extensive descriptions of existing modules adopted from prior models. This makes it difficult for readers to clearly identify and focus on the novel components proposed in this work.
4. The experimental results in Section 5 present high-level comparisons of decoding accuracy but lack targeted experiments that directly validate the claimed advantages of BrainPro in adapting to different electrode montages or varying brain states. For instance, it remains unclear to what extent the affect and motor encoders effectively capture and differentiate the subjects’ corresponding emotional and motor states.
There are some imprecise parts:
1. The citation (Abiri et al., 2019) in Line 60 does not include the term “brain states” and contains no dedicated discussion of the non-stationary nature of EEG signals. The reference therefore appears misaligned with the context in which it is cited.
2. The citation (Mane et al., 2020) in Line 65 focuses on the relationships between motor, cognitive, and emotional functions in rehabilitation contexts, rather than their co-occurrence in motor imagery tasks.
3. The notation for the shared encoder changes between $E_{\mathrm{shared}}$ and $E_{\mathrm{S}}$ in Section 2.3 and Section 2.4.
4. A square bracket is not closed in Equation 16.
5. The use of cos to represent cosine similarity in Equation (19) could be slightly misleading to readers, as cos is usually reserved for the cosine function.
1. Could you please clarify the meaning of “Spatial interactions between electrodes and brain regions” in Line 54 and “explicit and flexible modelling of channel- and region-level dependencies” in Line 58? How are these dependencies/interactions defined? Other terms that require definition are: “diverse brain states” in Line 59 and “overlapping or interacting processes” in Line 64.
2. How is the state-specific importance vector defined for the “other” category? Which brain regions are considered important for this state?
3. Could you elaborate on the rationale behind the selection of both foundation and non-foundation model baselines? In particular, how did you ensure that these baselines adequately represent the major approaches in the literature that address the three limitations of existing EEG foundation models discussed in the paper?
4. Could you please clarify what is meant by “reliable and generalizable representations,” which are claimed as advantages of BrainPro in Line 452 of Section 5? Additionally, how do the reported classification performances substantiate these two advantages?
5. Figure 5 displays noticeable fluctuations in the loss during pre-training, while Line 914 of Section C.4 states that BrainPro “converges quickly and maintains stable optimization.” Could you clarify the possible sources of these fluctuations and elaborate on how stability was assessed? Specifically, how is the optimization process considered stable despite the observed oscillations in the loss curve?
6. Does the number of parameters in the classification head increase with the addition of more state encoders? If so, how can the performance improvement demonstrated in Table 13 be attributed specifically to the effectiveness of the state encoders rather than to the increased parameter count in the classification head?
Fully human-written |