ICLR 2026 - Reviews



Summary Statistics

EditLens Prediction   | Count    | Avg Rating | Avg Confidence | Avg Length (chars)
Fully AI-generated    | 1 (25%)  | 4.00       | 3.00           | 1677
Heavily AI-edited     | 0 (0%)   | N/A        | N/A            | N/A
Moderately AI-edited  | 0 (0%)   | N/A        | N/A            | N/A
Lightly AI-edited     | 3 (75%)  | 5.33       | 4.00           | 1917
Fully human-written   | 0 (0%)   | N/A        | N/A            | N/A
Total                 | 4 (100%) | 5.00       | 3.75           | 1857
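As a quick consistency check (not part of the original page), the Total row can be reproduced as the count-weighted average of the two non-empty categories; the snippet below treats the reported 5.33 as 16/3 before rounding.

```python
# Count-weighted averages over the non-empty EditLens categories above.
rows = [
    # (count, avg_rating, avg_confidence, avg_length_chars)
    (1, 4.00, 3.00, 1677),    # Fully AI-generated
    (3, 16 / 3, 4.00, 1917),  # Lightly AI-edited (5.33 in the table is 16/3 rounded)
]

n = sum(c for c, _, _, _ in rows)
rating = sum(c * r for c, r, _, _ in rows) / n
confidence = sum(c * conf for c, _, conf, _ in rows) / n
length = sum(c * l for c, _, _, l in rows) / n
print(n, round(rating, 2), round(confidence, 2), round(length))  # 4 5.0 3.75 1857
```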
Individual Reviews
Review 1
Title: 3DPhysVideo: 3D Scene Reconstruction and Physical Animation Leveraging a Video Generation Model via Consistency-Guided Flow SDE
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 6: marginally above the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
This paper proposes a training-free pipeline that generates physically realistic videos from a single image. It repurposes an off-the-shelf image-to-video flow model for two stages: reconstructing full 3D scene geometry using rendered point clouds, and synthesizing final videos guided by Material Point Method physics simulations. The authors also propose a Consistency-Guided Flow SDE that decomposes predicted flow velocities to enable effective 3D reconstruction and simulation-guided video generation.

Strengths:
- Research on physically realistic video generation is both practical and meaningful.
- The proposed pipeline is well-designed, feasible, and reasonable.
- Experimental results demonstrate performance improvements across multiple scenarios.

Weaknesses / Questions:
- From the appendix video examples, some cases appear worse than other methods. For instance, in the Apple sample, the back video shows no water splashing when the apple falls. What could be the possible reason for this?
- What is the speed of generating a video sequence, and how does it compare to other methods?
- The paper lacks a discussion of limitations and corresponding analysis. Could the authors provide intermediate visual results showing MPM-simulated outputs under different types of interactions, such as solid–fluid collisions, fluid–fluid interactions, and so on?

EditLens Prediction: Lightly AI-edited
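For readers skimming these reviews, the two-stage pipeline that the reviewers summarize can be sketched as follows. This is a minimal skeleton reconstructed only from the review texts, not from the paper itself; every class and callable name is hypothetical, and each stage is injected as a callable precisely because none of them corresponds to a released API.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Hypothetical skeleton of the two-stage, training-free pipeline as described in the
# reviews above. The stage names only mirror the steps the reviewers mention.

@dataclass
class PhysVideoPipeline:
    synthesize_novel_views: Callable[[Any], list]          # Stage 1: I2V model repurposed as view synthesizer
    reconstruct_point_cloud: Callable[[list], Any]         # lift synthesized views to 360-degree geometry
    estimate_physical_properties: Callable[[Any], Any]     # reportedly inferred automatically (e.g., via an LLM)
    run_mpm_simulation: Callable[[Any, Any], Any]          # MPM simulation producing point trajectories
    render_trajectories: Callable[[Any], list]             # render trajectories into guidance frames
    sample_i2v_with_guidance: Callable[[Any, list], Any]   # Stage 2: same I2V model, guided video synthesis

    def __call__(self, image: Any) -> Any:
        views = self.synthesize_novel_views(image)
        scene = self.reconstruct_point_cloud(views)
        materials = self.estimate_physical_properties(scene)
        trajectories = self.run_mpm_simulation(scene, materials)
        guidance = self.render_trajectories(trajectories)
        return self.sample_i2v_with_guidance(image, guidance)
```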
Review 2
Title: 3DPhysVideo: 3D Scene Reconstruction and Physical Animation Leveraging a Video Generation Model via Consistency-Guided Flow SDE
Soundness: 2: fair
Presentation: 2: fair
Contribution: 2: fair
Rating: 2: reject
Confidence: 4: You are confident in your assessment, but not absolutely certain. It is unlikely, but not impossible, that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work.

Summary:
The paper presents 3DPhysVideo, a training-free pipeline designed to generate physically realistic videos from a single image input. It reuses a pre-trained Image-to-Video (I2V) flow model across two stages. In Stage 1 (Single Image to 3D), the I2V model functions as a view synthesizer to reconstruct 360-degree 3D scene geometry. In Stage 2 (Simulation to Video), Material Point Method (MPM) physics simulation is applied to the geometry. The resulting simulated point trajectories, which support complex dynamics like fluids and viscous substances, then guide the same I2V model to synthesize the final photorealistic video. The core mechanism, Consistency-Guided Flow SDE, adapts the I2V model for both 3D reconstruction and simulation-guided rendering.

Strengths:
The 3DPhysVideo pipeline generates physically realistic videos from a single image using a training-free approach. It repurposes an off-the-shelf Image-to-Video (I2V) model in two stages:
1. 3D Reconstruction: the I2V model first acts as a novel view synthesizer to reconstruct 360-degree 3D scene geometry.
2. Physics Generation: the geometry undergoes Material Point Method (MPM) physics simulation, and the resulting simulated dynamics then guide the same I2V model to synthesize the final photorealistic video.
This dual functionality is enabled by the Consistency-Guided Flow SDE, which adapts the pre-trained model for both geometry and dynamics synthesis. The method achieves good physical realism compared to baselines, especially in multi-object and fluid interaction scenarios, while offering user control over physical properties.

Weaknesses:
1. The proposed method appears incremental, with limited distinction from prior work.
2. Experiments are limited in scope; key baselines and datasets are missing.
3. Core assumptions lack rigorous justification or mathematical support.
4. Result interpretation is shallow; there is no discussion of failure cases or parameter sensitivity.
5. Figures and explanations are sometimes unclear, reducing readability and impact.

Questions:
1. Could the authors elaborate on the empirical or theoretical rationale for entirely eliminating the denoising bias?
2. What is the measured reliability or accuracy of the automatically inferred physical parameters compared to manually specified inputs?
3. Since the current SDE is heavily reliant on visual consistency, how would the core consistency metric and the model's latent inputs need to be adapted or redefined to effectively enforce a non-visual inductive bias, such as alignment with a detailed text prompt, without requiring additional model training?

EditLens Prediction: Lightly AI-edited
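Several of the questions above (the denoising bias, the reliance on visual consistency) concern how the guided sampling step works. The sketch below is a generic, illustrative guided Euler step for a flow-matching model, meant only to make the terminology concrete; it is not the paper's actual Consistency-Guided Flow SDE, whose velocity decomposition may differ substantially, and all names and conventions are assumptions.

```python
def guided_flow_step(velocity_model, x_t, t, dt, guidance_latent, weight=0.5):
    """One illustrative, guided Euler step for a flow-matching sampler.

    Purely a sketch under assumed conventions; not the paper's method.
    """
    v_model = velocity_model(x_t, t)                               # velocity from the pre-trained I2V flow model
    v_consistency = (guidance_latent - x_t) / max(1.0 - t, 1e-3)   # pull toward a latent of the rendered guidance
    v = (1.0 - weight) * v_model + weight * v_consistency          # blend the two velocity components
    return x_t + dt * v                                            # deterministic update; an SDE sampler adds noise here

# Toy usage with scalar "latents" and a trivial velocity model:
# guided_flow_step(lambda x, t: -x, x_t=0.2, t=0.5, dt=0.05, guidance_latent=1.0)
```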
Review 3
Title: 3DPhysVideo: 3D Scene Reconstruction and Physical Animation Leveraging a Video Generation Model via Consistency-Guided Flow SDE
Soundness: 4: excellent
Presentation: 3: good
Contribution: 3: good
Rating: 8: accept, good paper
Confidence: 5: You are absolutely certain about your assessment. You are very familiar with the related work and checked the math/other details carefully.

Summary:
This paper introduces 3DPhysVideo, a novel, training-free pipeline that generates physically realistic videos from a single input image. Instead of training a new model, it cleverly repurposes a single, pre-trained image-to-video (I2V) model for two distinct stages: 3D Scene Reconstruction and Physics-Guided Video Generation.

Strengths:
1. The entire pipeline requires no additional training. It runs on a single consumer GPU, making it highly accessible and efficient compared to methods that require training large, specialized models.
2. By grounding the animation in an explicit physics engine (MPM), the final video exhibits a high degree of physical plausibility, especially in complex scenarios like fluid dynamics and multi-object interactions, where purely data-driven models (e.g., Sora, Gen-3) often fail.
3. The paper is well written and organized.

Weaknesses:
1. As a multi-stage pipeline, errors from any stage can cause the final result to fail. In particular, the 3D reconstruction and physical property estimation (using an LLM) parts are prone to errors. For example, the apple in the demo appears elastic (it should actually behave closer to a rigid body). It would be better if the accuracy of these two parts could be assessed and the potential limitations analyzed.
2. While it can run on a consumer GPU, the method predictably increases inference time significantly due to the introduction of 3D reconstruction, physical property estimation, and MPM simulation. It would be better to report a comparison of inference times.
3. Were the liquids in the scene also reconstructed in 3D? How is the physical realism of the fluid dynamics ensured?
4. The paper states that PhysGen3D cannot maintain the relative positions of objects. However, PhysGen3D does perform pose estimation, so is this statement somewhat unreasonable?

Questions:
Please see Weaknesses.

EditLens Prediction: Lightly AI-edited
Review 4
Title: 3DPhysVideo: 3D Scene Reconstruction and Physical Animation Leveraging a Video Generation Model via Consistency-Guided Flow SDE
Soundness: 3: good
Presentation: 3: good
Contribution: 3: good
Rating: 4: marginally below the acceptance threshold
Confidence: 3: You are fairly confident in your assessment. It is possible that you did not understand some parts of the submission or that you are unfamiliar with some pieces of related work. Math/other details were not carefully checked.

Summary:
The paper introduces 3DPhysVideo, a training-free pipeline for generating physically realistic and photorealistic videos from a single input image. It addresses a fundamental limitation of traditional video generative models, which often fail to adhere to real-world physical dynamics. The pipeline operates in two main stages: Novel View Synthesis and Simulation-to-Video Generation.

Strengths:
The core contribution is the training-free pipeline that repurposes an off-the-shelf image-to-video diffusion model for two entirely different tasks: 3D scene reconstruction and physics-guided video synthesis. The authors conducted extensive experiments to validate the effectiveness of their proposed method. The results demonstrate that 3DPhysVideo outperforms state-of-the-art methods in terms of physical realism and semantic consistency while maintaining competitive photorealism.

Weaknesses:
The Material Point Method (MPM) is computationally expensive, especially for high-resolution simulations and complex scenes with numerous interaction points, and the overall pipeline's speed is likely bottlenecked by the MPM step. The authors should clearly report the runtime breakdown for the three main stages (3D reconstruction, MPM simulation, and I2V synthesis) to substantiate the practical efficiency of the "training-free" claim. In addition, the demonstrations mostly focus on relatively contained scenes with specific, localized physical events (e.g., ball drops, liquid pouring); it is unclear how well the pipeline scales to large-scale, non-local physical phenomena like wind effects, cloth dynamics, or complex collisions involving many small particles.

Questions:
Could you address the problems listed in the weaknesses?

EditLens Prediction: Fully AI-generated
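The stage-wise runtime breakdown requested above could be reported by timing each stage independently. Below is a minimal, hypothetical instrumentation sketch; the stage callables are placeholders for the three stages named in the review (3D reconstruction, MPM simulation, I2V synthesis), not a real API.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(name, results):
    # Record the wall-clock time of the enclosed block under `name`.
    start = time.perf_counter()
    yield
    results[name] = time.perf_counter() - start

def run_with_breakdown(stages, image):
    """stages: ordered dict of {stage_name: callable}; returns (output, per-stage timings)."""
    timings, x = {}, image
    for name, stage_fn in stages.items():
        with timed(name, timings):
            x = stage_fn(x)
    return x, timings

# Usage sketch (stage callables are hypothetical placeholders):
# out, timings = run_with_breakdown(
#     {"3D reconstruction": reconstruct, "MPM simulation": simulate, "I2V synthesis": synthesize},
#     input_image)
```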