Leveraging Label Dependencies for Calibration in Multi-Label Classification through Proper Scoring Rule
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
This paper considers the calibration of deep neural networks for multi-label classification (MLC). It introduces the Correlated Multi-Label Loss (CMLL), a novel loss function designed to improve calibration in MLC tasks by explicitly capturing label interdependencies. CMLL is proven to be a strictly proper loss and to be Fisher consistent. The loss incorporates dependency information by minimizing the absolute difference between the empirical correlation of the predicted scores for each label pair and the correlation of the corresponding ground truths. Extensive experiments on three benchmark datasets (PASCAL VOC, MS-COCO, and WIDER-A) demonstrate that CMLL reduces calibration error while maintaining classification accuracy relative to several other popular loss functions.
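For concreteness, here is a minimal PyTorch sketch of a loss with this shape; the function name cmll_style_loss, the batch-level Pearson correlation, and the averaging over the strict upper triangle are illustrative assumptions on my part, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def cmll_style_loss(probs, targets, lam=1.0, eps=1e-8):
    """BCE plus a penalty on the gap between predicted and ground-truth
    pairwise label correlations, estimated over the batch.
    probs, targets: float tensors of shape (batch_size, n_labels),
    probs in [0, 1]; assumes batch_size > 1."""
    bce = F.binary_cross_entropy(probs, targets)

    def label_corr(m):
        # Pearson correlation between label columns -> (n_labels, n_labels)
        centered = m - m.mean(dim=0, keepdim=True)
        cov = centered.T @ centered / (m.shape[0] - 1)
        std = cov.diagonal().sqrt().clamp_min(eps)
        return cov / (std[:, None] * std[None, :])

    gap = (label_corr(probs) - label_corr(targets)).abs()
    # average the gap over distinct label pairs (strict upper triangle)
    q = probs.shape[1]
    iu = torch.triu_indices(q, q, offset=1)
    return bce + lam * gap[iu[0], iu[1]].mean()
```

Note that the correlation statistics in this sketch are estimated over the batch, so the per-sample loss depends on the batch composition.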
1. The work proposes the Correlated Multi-Label Loss (CMLL) and establishes a generalization bound for it.
2. The paper empirically evaluates CMLL in terms of both accuracy and calibration on multiple real-world multi-label datasets.
More experiments are necessary.
-- a. Only one metric (Hamming loss) is used to evaluate the accuracy of multi-label classification. In the modern multi-label learning literature, additional metrics such as mAP, OF1, and CF1 are widely employed (a sketch of how these are typically computed appears after this list).
-- b. Lack of comparison against state-of-the-art baselines. A comparison to SOTA multi-label losses, such as Ridnik et al. (2021) and Cheng & Vasconcelos (2024), is necessary to validate the superiority of the proposed CMLL loss over losses that do not take label dependency into account.
There should be a space between the text and an opening parenthesis "(".
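For reference, a minimal sketch of how these metrics are commonly computed from a binary label matrix and predicted scores; the function name, the 0.5 threshold, and the scikit-learn calls are illustrative choices on my part:

```python
import numpy as np
from sklearn.metrics import average_precision_score, f1_score

def multilabel_accuracy_metrics(y_true, y_score, threshold=0.5):
    """mAP, overall F1 (OF1), and per-class F1 (CF1) for multi-label predictions.
    y_true:  (n_samples, n_labels) binary matrix
    y_score: (n_samples, n_labels) predicted probabilities"""
    y_pred = (y_score >= threshold).astype(int)
    # mAP: mean over labels of average precision (ranking-based, threshold-free)
    mean_ap = np.mean([average_precision_score(y_true[:, j], y_score[:, j])
                       for j in range(y_true.shape[1])])
    of1 = f1_score(y_true, y_pred, average="micro")  # pooled over all pairs
    cf1 = f1_score(y_true, y_pred, average="macro")  # per class, then averaged
    return mean_ap, of1, cf1
```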
Fully human-written
Leveraging Label Dependencies for Calibration in Multi-Label Classification through Proper Scoring Rule
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper tackles the problem of poor confidence calibration in modern deep neural networks for multi-label classification. This is a crucial issue, as miscalibrated models are unreliable in safety-critical applications, which often involve multiple labels per instance. The authors identify a key gap in current methods: existing proper scoring rule (PSR) losses such as BCE ignore label dependencies, while losses that do model dependencies, such as focal loss, are not PSRs. To close this gap, the paper introduces a new loss function, the Correlated Multi-Label Loss (CMLL), which adds a regularization term penalizing the difference between the model's predicted label correlations and the ground-truth label correlations. The authors provide a key theoretical proof that the combined CMLL loss is still a PSR.
1. The paper is very well-motivated: calibration is a known, hard problem in MLC, and the work is well-positioned, directly addressing the limitations of recent key papers.
2. The main claim is supported by formal analysis rather than intuition alone.
3. The experimental results are good.
The paper's entire theoretical foundation rests on the claim that CMLL is a PSR. A PSR must be uniquely minimized when the prediction $\hat{\rho}$ equals the true probability $\rho$. However, CMLL is a weighted trade-off between the BCE loss (a PSR) and a new correlation term. It is possible that a model will sacrifice perfect calibration (increasing the BCE loss) to better match the in-batch label correlations (decreasing the new term); the minimum of the CMLL loss is therefore no longer guaranteed to lie at the point of perfect calibration. Moreover, since the proposed correlation term is computed in-batch, the loss for any single sample depends on the other samples in its batch. This formulation contradicts the standard definition of a PSR, which is based on the expectation $E_{x,y}[L(h(x), y)]$, and it also makes the training gradient highly sensitive to batch composition and sampling noise.
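To spell out the concern as a worked equation (my reconstruction of the reviewer's argument, in the review's notation, writing the correlation term abstractly as $R$): strict propriety requires that

$$\arg\min_{\hat{\rho}} \; \mathbb{E}_{y \sim \rho}\!\left[\,\mathrm{BCE}(\hat{\rho}, y) + \lambda\, R(\hat{\rho}, y)\,\right] \;=\; \rho \quad \text{uniquely, for every } \rho.$$

Because BCE is itself strictly proper and smooth, its expected gradient vanishes at $\hat{\rho} = \rho$, so the gradient of the full objective there reduces to $\lambda\, \nabla_{\hat{\rho}}\, \mathbb{E}[R]$; unless this also vanishes at the calibrated prediction, the minimizer is pulled away from $\rho$ for every $\lambda > 0$.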
The hyperparameter $\lambda$ is the most important part of the proposed method, as it controls the trade-off between calibration and the regularization term. Yet the paper simply states that $\lambda = 1$ is used for all experiments, with no justification, ablation study, or sensitivity analysis.
Please refer to the weaknesses above.
Fully human-written |
Leveraging Label Dependencies for Calibration in Multi-Label Classification through Proper Scoring Rule
Soundness: 2: fair
Presentation: 1: poor
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper proposes Correlated Multi-Label Loss (CMLL), a novel loss function designed to improve the calibration of deep neural networks in multi-label classification (MLC) tasks. Unlike conventional losses such as Binary Cross-Entropy, which assume label independence, CMLL explicitly models pairwise label dependencies while maintaining the property of being strictly proper, ensuring reliable posterior probability estimates. The authors provide theoretical guarantees, proving that CMLL is both Fisher consistent and $\ell_2$-Lipschitz continuous, and they derive a generalization bound that scales linearly with the number of labels. Extensive experiments on benchmark datasets (PASCAL VOC 2012, MS-COCO, and WIDER-A) demonstrate that CMLL significantly reduces calibration errors without compromising classification accuracy, establishing it as an effective and theoretically grounded approach for trustworthy multi-label learning.
1. The proposed Correlated Multi-Label Loss (CMLL) innovatively combines pairwise label dependency modeling with the property of strict properness, bridging a clear gap between calibration theory and multi-label learning.
2. The paper provides formal proofs showing that CMLL is strictly proper, Fisher consistent, and $\ell_2$-Lipschitz continuous, and it derives a generalization bound with interpretable dependence on the number of labels.
1. The notation in this paper could be made clearer. For example, when describing $\boldsymbol{h}(\mathcal{X})$ and $Y$, it would be helpful to explicitly clarify what their rows and columns represent.
2. Although Lemma 1 seems intended to express the difference between two labels, based on my understanding of the notation, the computation of $\tau$ appears to measure the discrepancy between two instances rather than between a pair of labels (see the sketch after this list).
3. In Assumption 1, it seems that the loss function $L$ corresponds to the proposed CMLL loss. If this is the case, it would be helpful to specify the valid ranges of $M$ and $B$. In particular, under certain extreme cases the $\log$ term in Equation (3) might lead to an unbounded $M$ (for instance, a cross-entropy-style term $-\log \hat{h}$ diverges as $\hat{h} \to 0^{+}$), which could invalidate Assumption 1 and consequently affect the soundness of Theorem 2.
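To illustrate the axis concern in point 2 concretely (a purely hypothetical NumPy example; the paper's actual definition of $\tau$ may of course differ):

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.integers(0, 2, size=(100, 5))  # rows = instances, columns = labels

# Correlating columns gives a (5, 5) matrix: dependence between label pairs.
label_corr = np.corrcoef(Y, rowvar=False)
# Correlating rows gives a (100, 100) matrix: similarity between instances.
instance_corr = np.corrcoef(Y, rowvar=True)
```

If $\tau$ aggregates along the wrong axis, it quantifies instance similarity rather than label dependence, which is exactly the discrepancy the reviewer suspects.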
Please carefully check the Weaknesses.
Minor comment:
1. In Lemma 1, could the authors clarify whether the dataset $\mathcal{D}$ is defined as $\mathcal{D} = \{(\varepsilon_i, Y_i)\}_{i=1}^{n}$?
Moderately AI-edited |
Leveraging Label Dependencies for Calibration in Multi-Label Classification through Proper Scoring Rule
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
This paper introduces a new loss function, Correlated Multi-Label Loss (CMLL), for improving confidence calibration in multi-label classification. The authors argue that existing methods either assume label independence or lack theoretical guarantees such as strict propriety. CMLL is designed to explicitly model pairwise label correlations while maintaining the property of being a Strictly Proper Scoring Rule (PSR). The paper also provides theoretical justification (Fisher consistency and generalization analysis) and experimental validation on three standard datasets (PASCAL VOC, MS-COCO, and WIDER-A), showing improvements in calibration metrics while maintaining comparable accuracy.
1. The work tackles the important and underexplored problem of multi-label confidence calibration, which is highly relevant for real-world applications where label dependencies are common (e.g., medical imaging, scene recognition).
2. Experiments across multiple datasets and architectures (ResNet-50 and ViT-B/32) demonstrate consistent improvement in calibration metrics such as ACE and MCE.
1. The proposed method is somewhat incremental compared to recent works such as Chen et al. (TIP 2024) and Peng et al. (CVPR 2024; TPAMI 2025), which also explore correlation-based or dependency-aware regularization for calibration. The current submission does not clearly articulate how CMLL is fundamentally different from, or superior to, these methods in modeling dependencies beyond reformulating correlation alignment as a proper scoring rule.
2. The experiments do not include a comparison with [Chen et al., TIP 2024], which introduced both a multi-label calibration method and comprehensive evaluation metrics for multi-label confidence calibration.
3. Only simple baselines (BCE, Focal Loss, TWL, LDACE-CCL) are used. The absence of comparisons with state-of-the-art multi-label calibration methods significantly weakens the empirical validation.
4. The paper evaluates only on relatively standard architectures (ResNet-50 and ViT-B/32) and basic multi-label baselines. Recent multi-label recognition backbones (e.g., ASL [Ridnik et al., ICCV 2021], ML-Decoder, or transformer-based decoders) are not included, making it difficult to assess general applicability.
[Chen et al., TIP 2024] Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration.
[Peng et al., CVPR 2024; TPAMI 2025] Perception/Semantic Aware Regularization for Sequential Confidence Calibration.
See Weaknesses.
Fully AI-generated |