Layer-Based 3D Gaussian Splatting for Sparse-View CT Reconstruction
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 6: marginally above the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
This paper proposes a hierarchical approach to 3D Gaussian splatting for sparse-view CT reconstruction, first introducing large-scale Gaussians and then refining them in later steps. Refinement choices are based on a reconstruction of the residual error.
- The paper addresses an important challenge in CT, and the proposed method seems reasonable for the challenge.
- Experimental results are included for both synthetic and real-world datasets.
- The main contributions of the paper are not clearly described. Both Gaussian splatting and hierarchical reconstruction approaches are quite well established in the CT community (as the authors state), so it is important to state exactly what the paper contributes beyond existing knowledge.
- It is not clear to me why a negative reconstruction error indicates areas where overfitting or redundancy is present. For example, if a sample has a small hole somewhere and the algorithm represents that part with a single large Gaussian, a negative error appears in the error map even though the model is actually underfitting (a toy illustration follows this list). The specific choice of interpreting negative error as overfitting and positive error as underfitting should be clearly motivated in the paper, and ideally evidence should be given that this is indeed a valid choice.
- It is not clear how specific hyperparameter settings were chosen by the authors, or how results are affected by different choices. This is also true for the comparison methods: how were their hyperparameters chosen?
- The terminology is unclear, especially the use of 'layer'. 'Layer' has a very specific meaning in deep learning, so using it for a completely different concept is confusing; I suggest using a different term.
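To make the concern about negative errors concrete, here is a toy 1D illustration (my own construction; the sign convention error = truth − prediction and all numbers are assumptions, not taken from the paper):

```python
import numpy as np

# Toy illustration of the concern above (hypothetical sign convention:
# error = ground truth - prediction, as the refinement rule seems to assume).
x = np.linspace(-1.0, 1.0, 201)
gt = np.ones_like(x)
gt[np.abs(x) < 0.05] = 0.0          # sample with a small hole in a solid region
pred = np.exp(-x**2 / 2.0)          # one large Gaussian covering the whole region
error = gt - pred                   # residual error map
print(error[np.abs(x) < 0.05].mean())  # ~ -0.999: strongly negative in the hole,
# yet the model is *under*-fitting the fine structure there, not over-fitting.
```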
- What are the main contributions of the paper?
- Why does a negative error indicate overfitting?
- How were the hyperparameters chosen, and how do they affect the results?
Fully human-written
Layer-Based 3D Gaussian Splatting for Sparse-View CT Reconstruction
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
This work introduces a hierarchical, layer-based framework for 3D Gaussian Splatting (3DGS) tailored to sparse-view CT reconstruction. The core contribution is a 3D error-guided refinement strategy in which 2D projection residuals are back-projected with a tomographic solver to form a 3D volumetric error map. This map directly drives a coarse-to-fine process, acting as an adaptive importance-sampling mechanism: new, smaller Gaussians are added (densification) in under-represented regions (positive error), and existing Gaussians are merged (sparsification) in over-represented regions (negative error), effectively regularizing the model against overfitting.
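For concreteness, the described density control might look like the following sketch (the thresholds `tau_d`/`tau_s`, the masking, and the sampling scheme are my illustrative assumptions, not the paper's exact procedure):

```python
import numpy as np

def split_error_map(err3d: np.ndarray, tau_d: float, tau_s: float):
    """Partition a 3D error map into densification/sparsification regions."""
    densify_mask = err3d > tau_d     # under-represented: add smaller Gaussians
    sparsify_mask = err3d < -tau_s   # over-represented: merge existing Gaussians
    return densify_mask, sparsify_mask

def sample_new_centers(err3d: np.ndarray, densify_mask: np.ndarray,
                       n_new: int, seed: int = 0) -> np.ndarray:
    """Importance-sample new Gaussian centers proportional to positive error."""
    rng = np.random.default_rng(seed)
    weights = np.where(densify_mask, err3d, 0.0).ravel()
    idx = rng.choice(weights.size, size=n_new, replace=False,
                     p=weights / weights.sum())
    return np.stack(np.unravel_index(idx, err3d.shape), axis=-1)  # voxel coords
```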
1. The method directly addresses a key failure mode of standard 3DGS in sparse-view settings: overfitting to 2D projections. Guiding densification and sparsification with a 3D error map (from back-projected 2D residuals) is a novel and more geometrically sound approach than relying on 2D gradient-based density control alone.
2. The proposed layered, coarse-to-fine refinement strategy is well-motivated and empirically effective. Ablation studies (Table 3) clearly show that this layered approach outperforms a single-stage, dense initialization, while often converging to a more compact model (fewer Gaussians) and reducing training time.
3. The paper demonstrates state-of-the-art results, consistently outperforming strong baselines (including classical methods, implicit fields like NAF, and 3DGS methods like R2-Gaussian) on both synthetic and real-world datasets, especially in highly sparse (5-15 view) scenarios.
1. The quality of the 3D error map, which is central to the method, is dependent on the CGLS solver and the quality of the 2D residuals. As acknowledged by the authors, this map can become noisy in extremely sparse settings, potentially leading to error amplification where noise is densified. While denoising is applied, the robustness of this feedback loop could be analyzed further.
2. The normalization term for initializing new Gaussian density, $\alpha_{i}^{(l)}=C_{\alpha}\frac{e_{i}^{(l)}}{\sqrt[3]{N^{(l-1)}}}$ (Equation 4), is heuristic. It is "motivated by the physical process" but relies on a "quasi-uniform distribution" assumption (one possible reading is sketched after this list). A more principled derivation or analysis of this scaling factor would strengthen the method's technical foundation.
3. The layer-building process seems to be on a fixed schedule (2500 Gaussians every 500 iterations for 20 layers). An adaptive strategy, where the number of new Gaussians and the timing of new layers are determined by the 3D error map's magnitude or distribution, would be a more elegant and efficient extension (as noted in the discussion).
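One way to unpack the cube-root factor in Equation 4 under the stated quasi-uniform assumption (my own reading, not a derivation given in the paper): with $N^{(l-1)}$ Gaussians quasi-uniformly filling a unit volume,

$$
V_{\text{per Gaussian}} \approx \frac{1}{N^{(l-1)}}
\;\Rightarrow\;
\sigma^{(l)} \propto \left(N^{(l-1)}\right)^{-1/3}
\;\Rightarrow\;
\alpha_i^{(l)} = C_{\alpha}\frac{e_i^{(l)}}{\sqrt[3]{N^{(l-1)}}} \propto e_i^{(l)}\,\sigma^{(l)},
$$

i.e., the residual is damped in proportion to the layer's characteristic Gaussian size, which plausibly keeps many overlapping small corrections from collectively overshooting the target. Making such an argument explicit in the paper (or replacing it with a derivation) would help.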
Please refer to the weaknesses section.
Fully AI-generated
Layer-Based 3D Gaussian Splatting for Sparse-View CT Reconstruction
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper proposes a hierarchical, layer-based 3D Gaussian Splatting (3DGS) framework for sparse-view CT. Instead of a one-shot dense initialization, the method adds new layers of smaller Gaussians in a coarse-to-fine manner, where placement is guided by a 3D volumetric error map reconstructed via back-projecting 2D residuals with CGLS. Positive-error regions trigger densification (adding Gaussians); negative-error regions trigger sparsification (fusing Gaussians). The system starts from a classical reconstruction (SART-TV) to derive a soft Otsu mask and to seed the first layer. Experiments on synthetic and real datasets show improved 3D PSNR/SSIM over traditional solvers and prior explicit/implicit baselines (notably R$^2$-Gaussian), especially at very sparse views (5–15).
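For reference, the residual back-projection can be realized with a few CGLS iterations on the least-squares problem $\min_e \|Ae - r\|_2$; a minimal dense-matrix sketch follows (the operator `A`, its dense form, and the iteration count are illustrative assumptions — the paper presumably uses a matrix-free cone-beam projector):

```python
import numpy as np

def cgls(A: np.ndarray, resid_2d: np.ndarray, n_iters: int = 20) -> np.ndarray:
    """Minimal CGLS solving min_e ||A e - r||^2: back-projects stacked 2D
    projection residuals r into a volumetric error map e."""
    e = np.zeros(A.shape[1])
    r = resid_2d.ravel().astype(float).copy()  # projection-domain residual
    s = A.T @ r                                # volume-domain gradient
    p = s.copy()
    gamma = s @ s
    for _ in range(n_iters):
        q = A @ p
        alpha = gamma / (q @ q)
        e += alpha * p                         # refine volumetric error estimate
        r -= alpha * q
        s = A.T @ r
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return e                                   # flattened 3D error map
```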
1. The layerwise residual-fitting idea is well-motivated and implemented end-to-end in CT with explicit 3D error maps driving where capacity is allocated.
2. The experiments in Table 1 show improvements over strong baselines at 5–15 views on both real and synthetic sets (e.g., Real/10 views: PSNR 33.04 vs. 31.90 for R$^2$-Gaussian). Qualitative figures support crisper geometry with fewer view artifacts.
3. The paper varies the number of layers, the sparsification radius/centers, masking, and layer-optimization strategies; the 20-layer configuration emerges as the best trade-off and uses fewer final Gaussians at competitive time.
1. R$^2$-Gaussian is misspelled as R2-Gaussian throughout the paper, which is unprofessional.
2. The core idea of hierarchically allocating Gaussian capacity closely parallels existing hierarchical/level-of-detail 3DGS schemes and explicit voxel/atom allocation strategies. The paper does not clearly articulate a substantive technical advance beyond adapting these known capacity-allocation ideas to CT, nor does it provide head-to-head analyses that convincingly demarcate what is genuinely new.
3. The method critically depends on the CGLS-reconstructed 3D error map. The discussion admits noise/streaks in highly sparse regimes and uses mask+Gaussian blur to denoise, but quantitative sensitivity to solver iterations, regularization, and noise level is limited.
4. The paper regularizes R$^2$-Gaussian for sparse views (higher TV, minimum scale, densification threshold). While this avoids needle artifacts, it may underplay R$^2$-Gaussian’s potential at moderate views and shifts the comparison space.
Please refer to the weaknesses above, especially weaknesses 2 and 3. I would like to read the rebuttal and will raise my rating if my concerns are adequately addressed.
Fully AI-generated
Layer-Based 3D Gaussian Splatting for Sparse-View CT Reconstruction
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
This paper introduces a hierarchical layer-based 3D Gaussian Splatting (3DGS) framework for computed tomography reconstruction. The reconstructed object is iteratively refined by correcting the volumetric errors of previous layers. The core technical contribution is a 3D error-driven strategy to guide densification and sparsification: a volumetric error map is estimated from back-projected 2D residuals, providing direct structural guidance for adding Gaussians in under-represented regions and fusing them in over-represented regions.
(i) The idea of hierarchical layer-based 3D Gaussian Splatting is novel, interesting, and insightful: the basic shape of the scanned object can be reconstructed easily, while the fine-grained details are hard to capture. Existing 3DGS-based methods usually neglect this, and this work fills that research gap. The designed technique, 3D error-guided importance sampling, is also very reasonable: Gaussians are added in the under-represented regions estimated from the positive error map and fused in the over-represented regions estimated from the negative error map.
(ii) The performance on 3D CT reconstruction is solid. This work builds on the NeurIPS 2024 work R2-Gaussian. By applying the new densification and sparsification strategy to the baseline method, performance is improved by large margins on both real and synthetic datasets, as shown in Table 1. These results suggest the effectiveness of the proposed method. The visual comparisons also show that the proposed method reconstructs clearer structural details.
(iii) The overall writing is clear, especially the method part from line 187 to line 288. The presentation is also polished. The workflow of the pipeline is clearly shown in Figure 2; I like that almost all the technical contributions are reflected in the figure.
(iv) The ablation study is comprehensive. All the modifications are investigated, including the layered densification, layer sparsification, masking, layer selection, and so on. The results in Tables 2, 3, 4, 5, and 6 clearly demonstrate the effectiveness of the proposed technical modifications.
(v) Code has been submitted, so reproducibility can be checked.
(i) The motivation is not very clear. As described in lines 038–042, it is not well discussed why regular 3DGS provides only indirect and incomplete information about the true 3D structure. From my point of view, this paper mainly modifies the densification of the Gaussian point clouds, and the initialization has not been improved, so why mention the one-time initialization here? It seems out of place.
(ii) The differences between the proposed densification and sparsification strategies and the standard ones should be highlighted and comprehensively compared. Currently the authors just plainly describe their method; I suggest adding figures and discussing the differences in the method section. Likewise, the teaser figure only shows how the Gaussian point clouds of the proposed method evolve. What about the regular approach? Without that comparison, the advantage of the proposed method is not demonstrated.
(iii) The main results are not fully convincing. The authors claim their method beats the state of the art, but they do not evaluate on the public benchmark X3D and do not compare with SAX-NeRF (CVPR 2024), the strongest recent neural radiance field (NeRF) method. Instead, they compare with NAF, an older method from MICCAI 2022.
(iv) The main visual results also look odd because of a reddish tint, which differs significantly from the visual results shown in previous works such as NAF, SAX-NeRF, X-Gaussian, and R2-Gaussian. There is no explanation for this.
(i) Could you please explain why the TV loss in Eq. (6) is used? Did you perform an ablation study on this loss function?
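For context, a common anisotropic 3D TV regularizer takes the following form (a generic sketch; whether Eq. (6) uses exactly this form is for the authors to clarify):

```python
import numpy as np

def tv_3d(vol: np.ndarray) -> float:
    """Anisotropic 3D total variation: mean |finite difference| along each axis."""
    return sum(np.abs(np.diff(vol, axis=ax)).mean() for ax in range(3))
```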
Fully human-written |