MTS-UNMixers: Multivariate Time Series Forecasting via Channel-Time Dual Unmixing
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary:
The paper introduces MTS-UNMixers, a new framework for multivariate time-series forecasting based on dual-path unmixing mechanisms.
The model decouples shared and variable-specific temporal dependencies through channel-wise and temporal unmixing blocks, further enhanced with a Mamba-based state-space backbone for efficient long-range modeling.
This design aims to mitigate signal aliasing and redundancy problems commonly seen in Transformer-style architectures.
Experiments on seven public datasets (including ETT, Electricity, Weather, and Traffic) demonstrate strong performance against nine recent baselines, such as PatchTST, FEDformer, and TimeMixer.
The results show consistent improvements in accuracy and effective reconstruction.
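For readers new to the unmixing idea, here is a minimal runnable sketch of the shared-bases-plus-coefficients pattern described above. It is a simplification under stated assumptions, not the authors' architecture: all names are invented, and a plain linear head stands in for the paper's Mamba blocks.

```python
# Hypothetical sketch: shared temporal bases capture dynamics common to all
# channels; per-channel simplex coefficients capture variable-specific behavior.
import torch
import torch.nn as nn

class DualUnmixSketch(nn.Module):
    def __init__(self, hist_len: int, horizon: int, n_bases: int):
        super().__init__()
        self.horizon = horizon
        # Shared temporal bases spanning the history and future windows.
        self.bases = nn.Parameter(torch.randn(n_bases, hist_len + horizon))
        # Maps each channel's history to mixing weights over the bases
        # (the variable-specific part; replaces the paper's Mamba blocks).
        self.coef_head = nn.Linear(hist_len, n_bases)

    def forward(self, x: torch.Tensor):
        # x: (batch, n_channels, hist_len)
        coef = torch.softmax(self.coef_head(x), dim=-1)  # simplex coefficients
        full = coef @ self.bases         # (batch, n_channels, hist_len + horizon)
        recon = full[..., : -self.horizon]  # reconstruct the observed history
        pred = full[..., -self.horizon:]    # explicit mapping to the future
        return recon, pred
```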
Strengths:
1. **Clear and practical architectural motivation.**
The unmixing mechanism offers an intuitive way to separate shared temporal dynamics from variable-specific behaviors, which is valuable for multivariate forecasting tasks.
2. **Solid empirical results.**
The model achieves competitive or better results than strong baselines including TimeMixer and FEDformer across multiple datasets.
3. **Integration with Mamba blocks.**
Leveraging a state-space backbone improves both computational efficiency and modeling of long-term dependencies.
4. **Well-written and structured paper.**
The methodology is clearly described with detailed illustrations and quantitative validation.
5. **Interpretability through reconstruction visualization.**
The figures effectively demonstrate how dual unmixing captures complementary channel and temporal information.
Weaknesses:
1. **Missing comparisons with several recent benchmarks.**
While PatchTST, TimeMixer, and FEDformer are included, it would strengthen the paper to add comparisons with **PathFormer**, **iTransformer**, and **CARD**, three strong baselines known for path-level, channel-inverted, and channel-aligned temporal reasoning, respectively.
These would provide more context on how MTS-UNMixers performs against newer architectures targeting similar goals.
2. **Limited analysis on parameter sensitivity.**
The ablation study primarily focuses on architectural settings; however, analyzing the sensitivity of Mamba block size or unmixing depth would help understand robustness.
3. **Theoretical insight is relatively shallow.**
The paper’s arguments remain empirical; a more formal treatment of the unmixing process (e.g., via subspace decomposition theory) would be valuable.
4. **Dataset diversity.**
While mid-scale datasets are well covered, evaluations on very large or irregularly sampled datasets (e.g., weather radar or energy trading) would highlight scalability and adaptability.
Questions:
1. Could you include **PathFormer**, **iTransformer**, and **CARD** as additional baselines in your comparison table to provide a broader empirical view?
2. How sensitive is the model to the number of Mamba layers and unmixing depth?
3. Can MTS-UNMixers generalize to longer forecasting horizons beyond 720 steps without retraining?
4. Is there any benefit in combining MTS-UNMixers with contrastive or self-supervised pretraining?
5. How does the dual-path mechanism behave when one modality (e.g., channel group) is noisy or partially missing?
Fully AI-generated

---
MTS-UNMixers: Multivariate Time Series Forecasting via Channel-Time Dual Unmixing
Soundness: 2: fair
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
Summary:
This paper introduces MTS-UNMixers, a novel model for multivariate time series forecasting (MTSF) designed to address the challenge of highly mixed temporal and channel features in high-dimensional data. Its core mechanism is "channel-time dual unmixing", which establishes an explicit mapping from historical to future sequences by decomposing the series into key "bases" and "coefficients". Architecturally, the model employs a standard Mamba network to capture temporal causality and estimate channel coefficients, while using a bidirectional Mamba to process non-causal bidirectional channel interactions and extract the shared time bases. Experimental results demonstrate that MTS-UNMixers significantly outperforms existing methods across multiple benchmark datasets.
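To fix notation for what follows, here is one illustrative reading of the factorization, reconstructed from the symbols used in this review ($S_c$, $A_t$); the paper's exact equations may differ:

$$
X_{\text{hist}} \approx S_c\, A_t^{\text{hist}}, \qquad \hat{X}_{\text{fut}} \approx S_c\, A_t^{\text{fut}},
$$

where $X_{\text{hist}} \in R^{N \times T}$, the channel coefficients $S_c \in R^{N \times k}$ are shared between the reconstruction and prediction objectives, and $A_t \in R^{k \times (T+H)}$ stacks the shared temporal bases over the history window of length $T$ and the forecast horizon $H$. Under this reading, the causal Mamba estimates $S_c$, the bidirectional Mamba extracts $A_t$, and the history-to-future mapping is explicit because both windows reuse the same factors.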
Strengths:
1. The paper is generally well-organized and clearly structured.
2. The proposed method is novel and presents a unified approach to addressing challenges in both the channel and time dimensions within the MTSF domain, achieving strong experimental results.
3. The paper offers a novel perspective by reformulating the MTSF problem as a matrix decomposition task across the channel and time dimensions. This mechanism establishes an explicit mapping between historical and future sequences, a notable advancement in model interpretability over existing black-box approaches.
Weaknesses:
1. A primary motivation for the unmixing model is its physical interpretability (highlighted in the Abstract and Introduction). However, the paper provides no qualitative results to support this claim.
2. The method lacks strong theoretical justification. While temporal unmixing is a common practice, the paper’s direct extension of this formulation to the channel dimension (as in Eq. (6)) is presented by analogy and lacks a rigorous theoretical foundation for its validity in modeling non-causal channel correlations.
3. The proposed model relies heavily on Mamba blocks. However, the experiments omit comparisons with recent Mamba-based baseline models for MTSF, a significant gap in the experimental evaluation.
4. The notation for the dimensions of the input $\mathrm{X}$ is inconsistent and confusing throughout the paper. Section 2.1 defines $\mathrm{X} \in R^{T \times N}$, but the formulation for temporal mixing (e.g., Eq. (2)) implies $\mathrm{X} \in R^{N \times T}$.
Questions:
1. The paper's claim of interpretability needs to be substantiated. Could the authors provide visualizations or a qualitative analysis of the learned bases and coefficients to support this?
2. Sharing the temporal basis $A_t$ (e.g., seasonality) is reasonable, but sharing the channel coefficients $S_c$ (interpreted as static channel correlations) is a very strong assumption. It implies that inter-variable relationships remain constant over time, which seems unlikely for dynamic domains such as financial or traffic forecasting. How do the authors justify this static assumption, and wouldn’t it limit the model’s generalization capability?
3. Why did the authors choose to share $A_t$ and $S_c$, instead of the alternative (e.g., sharing $A_c$ and $S_t$)? The paper does not provide a justification, making this design choice appear arbitrary.
4. The experimental comparisons should be updated to include more recent and relevant baselines.
Lightly AI-edited

---
MTS-UNMixers: Multivariate Time Series Forecasting via Channel-Time Dual Unmixing
Soundness: 2: fair
Presentation: 3: good
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary:
This paper addresses the challenge of establishing an interpretable and explicit mapping for multivariate time series forecasting. It proposes a channel-time dual unmixing network built upon the Mamba framework, which decomposes the time series into bases and coefficients across both temporal and channel dimensions. Extensive experiments on seven benchmark datasets demonstrate that the proposed model consistently outperforms ten state-of-the-art baselines.
Strengths:
- The paper presents a novel dual-unmixing mechanism that effectively mitigates noise and redundancy in multivariate time series data.
- A component-sharing strategy is adopted to enhance the model's conciseness and simplicity.
- Comprehensive experimental evaluations demonstrate the superior performance of the proposed method.
Weaknesses:
- It remains unclear what specific advantages the proposed method provides in contrast to decomposition-based approaches (e.g., Autoformer, FEDformer, DLinear) and state-space models (e.g., Mamba). This weakens the overall novelty of the paper. The authors should provide a detailed comparative analysis in the Introduction between the proposed model and these representative methods to better highlight its unique contributions.
- Several key arguments require further clarification:
  - The rationale for performing decomposition along the channel dimension is unclear. There are multiple existing strategies for modeling channel correlations, including the channel-independent strategy [1], the channel-dependent strategy [2], and several trade-off strategies [3, 4]. The authors should compare the proposed channel unmixing strategy against these established methods to justify their design choice.
  - The definition of $S$ in lines 165 and 170 appears inconsistent, and the meaning of $t$ is ambiguous.
  - In Eq. 7, it remains unclear why the authors share the coefficient matrix along the channel dimension while sharing the base matrix along the temporal dimension. A detailed explanation or ablation study should clarify why this asymmetric sharing is preferred over alternatives, such as sharing either the coefficient or the base matrix across both dimensions.
  - It is unclear why the authors employ bidirectional Mamba blocks, rather than graph neural networks or Transformer-based architectures, for modeling channel dependencies.
  - The "original order in the channel dimension" mentioned in line 268 is not well explained and requires further clarification.
- In line 128, the authors claim that the proposed model enhances physical interpretability and prediction reliability. However, the paper does not provide any visual analysis of the decomposed components along either the temporal or channel dimensions.
- The authors should include state-space models, such as Mamba, as baseline methods to provide a more comprehensive performance comparison.
[1] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, "A time series is worth 64 words: Long-term forecasting with transformers," in International Conference on Learning Representations (ICLR), 2023.
[2] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, "itransformer: Inverted transformers are effective for time series forecasting," in International Conference on Learning Representations (ICLR), 2024.
[3] L. Han, H.-J. Ye, and D.-C. Zhan, "The capacity and robustness trade-off: Revisiting the channel independent strategy for multivariate time series forecasting," IEEE Transactions on Knowledge and Data Engineering, 2024.
[4] J. Chen, J. E. Lenssen, A. Feng, W. Hu, M. Fey, L. Tassiulas, J. Leskovec, and R. Ying, "From similarity to superiority: Channel clustering for time series forecasting," in Advances in Neural Information Processing Systems, 2024.
- In line 105, 'Ummixing' appears to be a typo.
Lightly AI-edited

---
MTS-UNMixers: Multivariate Time Series Forecasting via Channel-Time Dual Unmixing
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
Summary:
The paper proposes MTS-UNMixers, a dual-path architecture that unmixes multivariate time-series along time and channel dimensions to obtain shared, interpretable bases and coefficients used for both reconstruction and forecasting.
Strengths:
1. The formulation (mixing models, simplex constraints, explicit mapping) is well-motivated and easy to follow.
2. Results span seven standard datasets and four horizons, with tabulated averages showing **top-2** performance overall and many first-place results; ablations show large performance drops when removing time unmixing or the Mamba backbone, supporting the design.
Weaknesses:
1. The shared $A_t$ and $S_c$ are learned jointly from reconstruction and prediction objectives; while the design is appealing, the paper should clarify how the training pipeline prevents future information leakage into shared components (e.g., batch construction, masking) and how sensitive the method is to distribution shift between history/future.
2. The dual factorization with simplex constraints (sum-to-one, non-negativity) helps, but the paper does not fully analyze **identifiability** of these parameters under the joint objective (two factorization views plus shared components). Without additional regularization or priors, multiple decompositions may fit equally well.
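To make the degeneracy concrete, a standard argument (not specific to this paper): for any factorization $X \approx S A$ and any invertible $P$,

$$
X \approx S A = (S P)\,(P^{-1} A),
$$

so the factors are identifiable at best up to transformations $P$ that respect the constraints. Permutations always qualify, and any invertible row-stochastic $P$ preserves non-negativity and sum-to-one constraints on the rows of $S$, leaving a continuum of equally good decompositions unless additional regularization breaks the tie.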
Questions:
1. Under what conditions are $S_c, A_t$ uniquely recoverable? Have you explored sparsity, orthogonality, or diversity regularizers on bases to reduce degeneracy?
2. If future patterns diverge (e.g., new seasonalities), does a shared $A_t$ hinder adaptation? Would allowing a low-rank delta to $A_t$ for the future help?
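For concreteness, the low-rank delta suggested in question 2 could take the following form; this is an illustration of the suggestion, not a mechanism from the paper:

$$
A_t^{\text{fut}} = \bar{A}_t^{\text{fut}} + U V^{\top}, \qquad U \in R^{k \times r},\; V \in R^{H \times r},\; r \ll \min(k, H),
$$

where $\bar{A}_t^{\text{fut}}$ denotes the future columns of the shared basis and the rank-$r$ correction absorbs patterns that emerge only in the forecast horizon, while keeping most of the shared structure intact.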
Fully AI-generated |