ICLR 2026 - Reviews


Reviews

Summary Statistics

| EditLens Prediction | Count | Avg Rating | Avg Confidence | Avg Length (chars) |
|---|---|---|---|---|
| Fully AI-generated | 0 (0%) | N/A | N/A | N/A |
| Heavily AI-edited | 0 (0%) | N/A | N/A | N/A |
| Moderately AI-edited | 0 (0%) | N/A | N/A | N/A |
| Lightly AI-edited | 2 (67%) | 5.00 | 3.50 | 1908 |
| Fully human-written | 1 (33%) | 4.00 | 5.00 | 1373 |
| Total | 3 (100%) | 4.67 | 4.00 | 1730 |
Title: TeFlow: Enabling Multi-frame Supervision for Feed-forward Scene Flow Estimation
Soundness: 3: good
Presentation: 3: good
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary:
The paper presents a feed-forward network that learns to solve scene flow using a temporal ensembling strategy. The results are strong, and on part of the datasets examined the method shows significant improvement over the state of the art (SOTA). The technique involves adding temporal data and a joint cost function over points and blocks.

Strengths:
The primary strength of the TeFlow method is its introduction of a cluster loss that enables balanced multi-frame supervision. While prior feed-forward methods rely on two-frame correspondence losses, TeFlow first aggregates a highly stable and temporally consistent motion target for each dynamic object cluster through a temporal ensembling strategy. This cluster-level averaging prevents the loss from being dominated by larger objects with more points, ensuring that smaller dynamic objects, such as pedestrians, receive fair and effective supervision (see the sketch after this review).

Weaknesses:
The ideas presented in this paper are not new, but their combination yields a strong outcome. Specifically, clustering for object-level loss enforcement was already published (and is cited by the authors), as were temporal constraints spanning more than two frames. Hence, while the provided solution is worthwhile and achieves SOTA in some cases, it is an incremental improvement over known methods.

Questions:
Please elaborate on the contribution of each component already used and known in the literature relative to the provided solution.

EditLens Prediction: Fully human-written
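The cluster-level averaging this review highlights can be made concrete with a minimal sketch. This is an illustration of the general technique, not TeFlow's actual code; the function name, tensor shapes, and the use of EPE as the per-point error are assumptions.

```python
import torch

def cluster_balanced_loss(pred_flow, target_flow, cluster_ids):
    """Average the per-point flow error within each cluster first, then
    across clusters, so a pedestrian with 20 points weighs as much as a
    truck with 2,000 points. Shapes (assumed): pred_flow/target_flow
    are (N, 3), cluster_ids is (N,) with integer cluster labels."""
    point_err = torch.linalg.norm(pred_flow - target_flow, dim=-1)  # (N,)
    cluster_means = [point_err[cluster_ids == c].mean()  # per-cluster mean
                     for c in cluster_ids.unique()]
    return torch.stack(cluster_means).mean()  # uniform weight per cluster
```

Averaging within each cluster before averaging across clusters is what gives every object equal weight in the loss, regardless of how many points it contains.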
Title: TeFlow: Enabling Multi-frame Supervision for Feed-forward Scene Flow Estimation
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 4: marginally below the acceptance threshold
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
This paper investigates self-supervised scene flow estimation from multi-frame point clouds. It introduces a self-supervised framework that segments the scene into static and dynamic regions, and leverages temporal ensembling and voting to obtain supervision signals for the dynamic parts. Experimental results on the Argoverse 2 and nuScenes datasets show that the proposed approach achieves competitive performance at low computational cost.

Strengths:
- The proposed approach demonstrates competitive performance compared to other feed-forward methods on the Argoverse 2 and nuScenes datasets.
- The experimental evaluation is comprehensive.

Weaknesses:
1. The writing should be improved.
   - Figure 2 needs improvement. As the key figure illustrating the overall framework, Figure 2 does not effectively help readers understand the temporal ensembling and voting algorithms. In particular, the meanings of the different colors and arrows in the motion candidate pool are not explained, making the figure difficult to interpret.
   - The writing of Section 4.1 should be improved. In Line 213, the paper states that "we establish correspondences by finding, for each point p_i, its nearest neighbor in P," whereas in Eq. (3) the nearest-neighbor search is performed between p_k and P. This makes the motion candidate generation process hard to follow.
2. The rationale of Motion Candidate Generation needs further clarification. When generating the supervisory signal from previous frames, the method directly finds the nearest neighbor of the current points in the previous frame, without warping the current points according to the (predicted) motion between the two frames. By ignoring the inter-frame motion, a nearest-neighbor search without such warping or motion compensation becomes inappropriate for establishing accurate correspondences. It is worth noting that in self-supervised scene flow estimation, almost all self-supervised loss functions (e.g., Chamfer loss) warp the source points toward the target frame to find correspondences and thereby generate the supervision signal (see the sketch after this review).
3. In Eq. (5), the authors use the motion direction (i.e., cosine similarity) to measure the consistency between two flow candidates. It would be helpful to explain why the end-point error (EPE) is not used. Since cosine similarity accounts only for the direction of motion while ignoring its magnitude, the consistency evaluation may be incomplete or potentially misleading.

Questions:
1. Please explain the detailed process of Motion Candidate Generation, especially Line 213 and Eq. (3).
2. Please clarify the rationale behind the design of Motion Candidate Generation.
3. Why is cosine similarity used instead of EPE for measuring consistency? Is there any experimental evidence supporting this design choice?

EditLens Prediction: Lightly AI-edited
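The two technical points in this review can be illustrated with a minimal sketch: a Chamfer-style correspondence step that warps the source points by the predicted flow before the nearest-neighbor search, and the two consistency measures contrasted in weakness 3. This is not the paper's code; the function names and tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def warped_nn_correspondence(src, tgt, pred_flow):
    """Chamfer-style matching: warp src by the predicted flow BEFORE the
    nearest-neighbor search, so correspondences compensate for inter-frame
    motion (the step the review argues is missing in Motion Candidate
    Generation). Shapes (assumed): src (N, 3), tgt (M, 3), pred_flow (N, 3)."""
    warped = src + pred_flow
    dists = torch.cdist(warped, tgt)     # (N, M) pairwise distances
    nn_dist, nn_idx = dists.min(dim=1)   # nearest target per warped point
    return nn_idx, nn_dist.mean()        # correspondences and a Chamfer term

def candidate_consistency(flow_a, flow_b):
    """The two consistency measures the review contrasts: cosine similarity
    scores direction only, while EPE also captures the magnitude difference
    between two flow candidates."""
    cos = F.cosine_similarity(flow_a, flow_b, dim=-1)  # in [-1, 1]
    epe = torch.linalg.norm(flow_a - flow_b, dim=-1)   # magnitude-aware
    return cos, epe
```

Two candidates moving in the same direction at very different speeds score a perfect cosine similarity of 1 but a large EPE, which is exactly the gap the review's third weakness points at.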
Title: TeFlow: Enabling Multi-frame Supervision for Feed-forward Scene Flow Estimation
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
This work introduces a self-supervised multi-frame scene flow prediction framework. To address issues such as occlusion and multi-frame temporal expansion, the proposed TeFlow presents an effective temporal aggregation strategy that, according to the authors, yields significant speed and performance advantages.

Strengths:
1. Good presentation and clear writing, which makes the paper easy to read.
2. Effective method design and good performance.
3. Comprehensive experimental validation. The study includes rigorous evaluations on two large-scale autonomous driving datasets, with detailed ablation studies on input frame count, loss components, and hyperparameters.

Weaknesses:
1. Although the method leads on many metrics, TeFlow still has room for improvement on some indicators.
2. Line 464: inconsistent capitalization.
3. Has the reported speed been averaged over multiple measurements? If so, how many? (See the timing sketch after this review.)

Questions:
Please refer to the weaknesses.

EditLens Prediction: Lightly AI-edited
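On the runtime question, here is a minimal sketch of the kind of averaged timing protocol the review is asking about, assuming a PyTorch model running on GPU. The `model` and `batch` objects and the warm-up/run counts are hypothetical placeholders, not values from the paper.

```python
import time
import torch

def avg_inference_ms(model, batch, n_warmup=10, n_runs=100):
    """Benchmark protocol sketch: discard warm-up iterations (CUDA kernel
    compilation, cache effects), then average wall-clock time over n_runs
    forward passes. Returns milliseconds per inference."""
    with torch.no_grad():
        for _ in range(n_warmup):
            model(batch)
        torch.cuda.synchronize()          # flush pending GPU work
        t0 = time.perf_counter()
        for _ in range(n_runs):
            model(batch)
        torch.cuda.synchronize()          # wait for all runs to finish
    return (time.perf_counter() - t0) / n_runs * 1e3
```

Reporting the run count (and ideally the variance) alongside the mean is what makes such speed comparisons reproducible, which appears to be the point of the reviewer's question.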