NVE-Adaptor: Novel View Editing Adaptor for Unseen View Consistent 3D Editing
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
This paper proposes a novel view editing adaptor (NVE-Adaptor) for the 3D editing task. The main idea is to explore more novel views in 3D space and use a diffusion model to edit those novel views. The novel views and the original reference views are then combined to optimise the 3D scene representation, such as NeRF or 3D Gaussian Splatting.
The main contribution is the 3D Gaussian Probabilistic View Sampling (3DG-PVS) module, which explores novel views in a more effective way for 3D editing.
Experiments are conducted on the Instruct-NeRF2NeRF (IN2N) and Mip-NeRF 360 datasets. The proposed NVE-Adaptor is added to several 3D editing methods and improves their performance with respect to several metrics.
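For concreteness, the pipeline as I understand it (a minimal pseudocode sketch; every identifier below is my own placeholder, not the authors' API from source_code.py):

```python
# Sketch of the NVE-Adaptor loop as I read the paper; all function and
# method names here are my own placeholders, not the authors' code.

def nve_adaptor_edit(scene, ref_views, ref_images, prompt, n_novel):
    # 1. 3DG-PVS: sample extra viewpoints that the references under-cover.
    novel_views = sample_novel_views(ref_views, n_novel)

    # 2. Render the current (unedited) scene from the new viewpoints.
    novel_renders = [scene.render(v) for v in novel_views]

    # 3. Jointly edit reference images and novel renders with a
    #    multi-view consistent diffusion model ("high quality" appended).
    edited = diffusion_edit(ref_images + novel_renders,
                            prompt + ", high quality")

    # 4. Optimise the NeRF / 3DGS scene against the enlarged edited set.
    scene.optimise(views=ref_views + novel_views, targets=edited)
    return scene
```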
## Strengths
- Illustration of the vulnerability of unseen-view rendering results in current 3D editing systems. The paper explores this phenomenon from two perspectives: angle ($\phi>25$) and distance ($r<0.5$). The visualization and the analysis over a continuous range of angles/distances show severe degradation of the editing effect on novel views far from the reference views.
- The Novel View Editing Adaptor (NVE-Adaptor): at its core, it proposes a 3D Gaussian Probabilistic View Sampling strategy for more effective sampling of new views for 3D editing.
- After sampling more views and editing them with the consistent diffusion model, the proposed method can be combined with various methods and improves their performance.
## Weaknesses
- Even though the vulnerability of unseen views is comprehensively explored and visualized, this behaviour is well known [R1-R3]. Simply pointing out this phenomenon is not a strong contribution.
- Overall, the proposed method is very simple: it selects more views for optimising the 3D scene (shown in Figure 2) by trivially modifying Eq. 4 to include novel views. The newly added novel views go through the same procedure as the original ones, i.e., diffusion-model editing; the only difference is the additional prompt "high quality", which is quite simple.
- 3D Gaussian Probabilistic View Sampling (3DG-PVS) could be a minor contribution; it samples more views by iteratively selecting according to inverse probability. However, no comparison strategy other than the default 'Regular' strategy is evaluated (a sketch of the kind of sampler in question follows the references below).
- The performance improves when the adaptor is combined with several methods. However, this requires many more views to edit the 3D scene, which makes the comparison unfair: in general, optimising with more views should yield better performance, and even the Regular strategy might improve performance when combined with the baseline methods.
- Are there other novel view selection/sampling methods? These should be surveyed and discussed in the related work and compared in the experiments.
- How is human performance evaluated? Why do the numbers differ so much (e.g., 0.22 vs. 0.78) while the distinctions in the other metrics are not as significant?
- In Table 2, T*=40 is better than T*=30, so the wrong number is highlighted.
[R1] Remondino, F., Karami, A., Yan, Z., Mazzacca, G., Rigon, S., & Qin, R. (2023). A critical analysis of NeRF-based 3D reconstruction. Remote Sensing, 15(14), 3585.
[R2] Zhang, J., Zhang, Y., Fu, H., Zhou, X., Cai, B., Huang, J., ... & Tang, X. (2022). Ray priors through reprojection: Improving neural radiance fields for novel view extrapolation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18376-18386).
[R3] Hull, M., Yang, H., Mehta, P., Phute, M., Cho, A., Wang, H., ... & Chau, P. (2025). 3D Gaussian Splat Vulnerabilities. arXiv preprint arXiv:2506.00280.
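To make the missing-baselines point concrete, here is a minimal sketch (my own code, not the authors' implementation) of an inverse-probability sampler in the spirit of 3DG-PVS; the isotropic-Gaussian coverage model, the bounding-box candidate pool, and all names are my assumptions:

```python
import numpy as np

def inverse_probability_view_sampling(ref_centers, n_new, sigma=0.4,
                                      n_candidates=2000, seed=0):
    """Hypothetical 3DG-PVS-style sampler: model view coverage with an
    isotropic Gaussian on each existing camera center and iteratively
    draw candidates with probability inverse to that coverage."""
    rng = np.random.default_rng(seed)
    ref = np.asarray(ref_centers, dtype=float)        # (M, 3) camera centers
    lo, hi = ref.min(0) - 1.0, ref.max(0) + 1.0       # crude scene bounds
    cand = rng.uniform(lo, hi, size=(n_candidates, 3))
    chosen = []
    for _ in range(n_new):
        anchors = np.vstack([ref] + [c[None, :] for c in chosen])
        d2 = ((cand[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)
        coverage = np.exp(-d2 / (2.0 * sigma ** 2)).sum(1)  # density per candidate
        p = 1.0 / (coverage + 1e-8)
        idx = rng.choice(len(cand), p=p / p.sum())    # inverse-probability pick
        chosen.append(cand[idx])
        cand = np.delete(cand, idx, axis=0)           # sample without replacement
    return np.stack(chosen)
```

A 'Regular' grid, random, or farthest-point sampler could be swapped in and compared under an identical view budget.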
## Questions
- Are there other novel view selection/sampling methods? These should be surveyed and discussed in the related work and compared in the experiments.
- How is human performance evaluated? Why do the numbers differ so much (e.g., 0.22 vs. 0.78) while the distinctions in the other metrics are not as significant?
Fully human-written |
---
NVE-Adaptor: Novel View Editing Adaptor for Unseen View Consistent 3D Editing
Soundness: 3: good
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.
- This paper proposes NVE-Adaptor, a plug-and-play module that improves multi-view consistent 3D editing, especially for unseen viewpoints that were not part of the original reference views.
- The method explores novel camera viewpoints, renders images from those new views, refines them with diffusion-based editing, and uses them as extra supervision to improve consistency in 3D editing results.
- The method works in a model-agnostic manner, complementing existing 3D editing pipelines (e.g., NeRF / Gaussian Splatting–based systems), and shows consistent improvements on seen and unseen views across multiple benchmarks and real-world scenarios.
## Strengths
- This paper clearly identifies a critical limitation of existing multi-view 3D editing systems (degradation and inconsistency when rendering from unseen viewpoints) and provides a quantitative sensitivity analysis to motivate the problem.
- This paper proposes a simple, model-agnostic adaptor that can be plugged into existing 3D editing pipelines without architectural changes, making it widely applicable to NeRF-based and Gaussian Splatting-based systems.
- This paper provides strong empirical evidence across multiple datasets, demonstrating consistent quality improvements in both seen and unseen viewpoints, including real-world data, validating the practicality and robustness of the method.
## Weaknesses
- In demo.mp4, around timestamp 00:58, most of the input's 3D characteristics disappear during the editing process. This phenomenon appears in multiple examples, yet it is not addressed in the limitations section, which is disappointing.
- The illustration in Figure 3 describing the functionality of the Novel View Editing Adaptor (NVE-Adaptor) could be made more intuitive. Instead of using red and blue dots, the figure could be revised so that readers can clearly distinguish reference views from novel views without referring back to a glossary.
- Including the code snippet (source_code.py) in the supplementary material is appreciated, but the implementation details are too brief. To ensure reproducibility, more detailed explanations should be provided, and the relevant code components should be explicitly linked or referenced in the appendix.
- For experimental settings such as Table 8, where specific parameters are used, it would be helpful to explain why those parameter choices were made. Providing this rationale would improve the clarity and interpretability of the paper.
## Questions
Mentioned in the weaknesses.
Lightly AI-edited |
---
NVE-Adaptor: Novel View Editing Adaptor for Unseen View Consistent 3D Editing
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.
This paper brings up an interesting challenge of limited view coverage in current 3D editing, where the edited 3D model exhibits visual artifacts in views that are distant from the edited reference images. To address this issue, the authors propose a novel view sampling strategy to expand the reference views. The newly sampled views are then forwarded into a multi-view diffusion model along with the original reference views for editing. Experiments show the expansion achieves a significant improvement in visual quality on unseen views.
## Strengths
1. This paper addresses a novel and practical view-coverage issue in the current 3D editing task. Starting from this challenge, the authors propose a novel view sampling strategy that can be plugged into current 3D editing methods.
2. Both the qualitative and quantitative experiments in this paper are comprehensive and reasonably designed. The presented visual quality of the rendered video results is promising.
3. The paper is clearly presented and easy to follow.
## Weaknesses
My concern mainly lies in the experimental setting. In my understanding, the main contribution of this paper is the view sampling strategy; in other words, after optimally selecting the novel views, the remaining processing simply feeds the views to existing 3D editing models. The current experiments focus on comparing results with and without view expansion to emphasise the effect of expansion. In my opinion, the idea of view expansion itself is not sufficiently novel, as it is common practice in reconstruction. Instead, I would like to see the improvement of the proposed view sampling strategy over baseline random or uniform sampling, showing how and why the proposed strategy outperforms a vanilla strategy (a sketch of such baselines is given below). I temporarily give a borderline-reject rating and will raise my score once this contribution is well illustrated.
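To be explicit about the vanilla strategies I have in mind, a minimal sketch (my own code; the spherical-shell camera parameterization is an assumption, not from the paper):

```python
import numpy as np

def random_views(n, radius, seed=0):
    """Random baseline: i.i.d. directions on a sphere around the scene."""
    v = np.random.default_rng(seed).normal(size=(n, 3))
    return radius * v / np.linalg.norm(v, axis=1, keepdims=True)

def uniform_views(n, radius):
    """Uniform baseline: a Fibonacci lattice gives near-evenly spaced views."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i           # golden-angle azimuth
    z = 1.0 - 2.0 * (i + 0.5) / n                    # even latitude spacing
    r = np.sqrt(1.0 - z ** 2)
    return radius * np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)
```

Comparing 3DG-PVS against these two under the same view budget would isolate the contribution of the sampling strategy itself.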
## Questions
See weaknesses. I would suggest the authors respond to this concern.
Fully human-written |
---
NVE-Adaptor: Novel View Editing Adaptor for Unseen View Consistent 3D Editing
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.
The paper targets the unseen-view inconsistency problem in multi-view 3D editing: edits look good near trained reference views but degrade at novel viewpoints. It proposes NVE-Adaptor, which (i) samples novel viewpoints via a probabilistic strategy (3DG-PVS), (ii) renders those views, (iii) enhances them with diffusion-based multi-image editing (with prompts like “high quality”), and (iv) uses the edited renders as extra supervision during 3D editing, improving view consistency on Mip-NeRF 360 and IN2N, plus a real-world set.
## Strengths
- The proposed NVE-Adaptor only augments supervision with edited novel-view images and can attach to various NeRF / 3DGS editing pipelines.
- Across several baselines (IN2N, GaussCtrl, VcEdit, DGE), the adaptor improves FID/SSIM and CLIP-based consistency on both seen and unseen views.
## Weaknesses
- The method improves novel-view renders by editing them with a text-to-image model (multi-image consistency + prompts like “high quality”) before using them as supervision. This can hallucinate textures/geometry, drifting from the true scene and biasing the 3D optimization. It would therefore be good to quantify drift with image-space metrics against ground-truth photos where available (the authors already capture real images + COLMAP extrinsics for the unseen set) and to report an edit-intensity control or mask-based constraints to limit semantic changes (see the sketch after this list).
- Tables report average scores, but variances / confidence intervals are missing, and several gains (e.g., CLIP/SSIM deltas) are modest. Please add std/CI over seeds (the paper mentions 10 or 30 seeds in places) to support the significance claims, especially for HumanFID and directional CLIP, where variance can be high (see the sketch after this list).
- 3DG-PVS uses an isotropic Σ with σ=0.4 and a cube-based outlier-removal heuristic. The sensitivity of performance to these hyperparameters is unclear. It would be good to add a comprehensive hyperparameter analysis.
- The authors provide per-scene training-time overheads (e.g., IN2N: 56→68 min; DGE: 15→18 min), but not the inference-time cost or total GPU hours for the full training + novel-view editing pipeline. Please add wall-clock and GPU memory profiles for typical scenes, and break down where time is spent (rendering vs. diffusion editing vs. 3D optimization). This would help clarify the compute-quality trade-offs.
- Table 8 shows that when reference views are very sparse (N≤40), the gains shrink or vanish; at N=20 performance can even degrade (Base vs. Base+NVE is nearly equal or worse). Please discuss the failure modes (e.g., the diffusion editor inventing content because the views are under-constrained) and consider confidence-based filtering to discard low-reliability edited novel views in those regimes.
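To make the first two requests above concrete, a minimal sketch of the evaluation I would expect (my own code, assuming pose-aligned real captures are available; metrics from scikit-image and SciPy):

```python
import numpy as np
from scipy import stats
from skimage.metrics import peak_signal_noise_ratio as psnr
from skimage.metrics import structural_similarity as ssim

def drift_to_ground_truth(edited_renders, gt_photos):
    """Image-space drift of edited novel-view renders from real captures.
    Inputs: lists of pose-aligned HxWx3 float images in [0, 1]."""
    p = [psnr(g, e, data_range=1.0) for g, e in zip(gt_photos, edited_renders)]
    s = [ssim(g, e, data_range=1.0, channel_axis=-1)
         for g, e in zip(gt_photos, edited_renders)]
    return float(np.mean(p)), float(np.mean(s))

def mean_with_ci(per_seed_scores, confidence=0.95):
    """Mean and t-distribution confidence half-width over random seeds,
    for reporting each metric as mean +/- CI."""
    x = np.asarray(per_seed_scores, dtype=float)
    half = stats.t.ppf((1 + confidence) / 2, df=len(x) - 1) * stats.sem(x)
    return x.mean(), half
```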
## Questions
Please refer to the weaknesses section.
Moderately AI-edited |