ISSUE 02 | WEDNESDAY, JUNE 3, 2026 | PRINT 06.2026

GEOMDIGEST

THE INSIDER PUBLICATION FOR COMPUTATIONAL GEOMETRY & DESIGN


CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

2025 / ACM Transactions on Graphics / DOI 10.1145/3730841

Recovering high-quality 3D scenes from a single RGB image is a challenging task in computer graphics. Current methods often struggle with domain-specific limitations or low-quality object generation. To address these limitations, we propose CAST (Component-Aligned 3D Scene Reconstruction from a Single RGB Image), a novel method for 3D scene reconstruction. CAST first extracts object-level 2D segmentation and relative depth information from the input image, then uses a GPT-based model to analyze inter-object spatial relations. This captures how objects relate to one another within the scene and supports a more coherent reconstruction. CAST then employs an occlusion-aware large-scale 3D generation model to generate each object's full geometry independently, using Masked Autoencoder (MAE) and point cloud conditioning to mitigate the effects of occlusion and partial object information, so that each object matches the geometry and texture of the source image. To place each object in the scene, an alignment generation model computes the necessary transformations, allowing the generated meshes to be accurately positioned and integrated into the scene's point cloud.

Finally, CAST applies a physics-aware correction mechanism that derives a constraint graph from a fine-grained relation graph. This graph guides the optimization of object poses toward physical consistency and spatial coherence. Using Signed Distance Fields (SDF), the model addresses issues such as occlusions, object penetration, and floating objects, so that the generated scene reflects real-world physical interactions. Experimental results demonstrate that CAST significantly improves the quality of single-image 3D scene reconstruction, offering enhanced realism and accuracy in scene understanding and reconstruction tasks.
CAST has practical applications in virtual content creation, such as immersive game environments and film production, where real-world setups can be seamlessly integrated into virtual landscapes. Additionally, CAST can be leveraged in robotics, enabling efficient real-to-simulation workflows and providing realistic, scalable simulation environments for robotic systems.
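The pipeline begins by turning a depth estimate into geometry. As a minimal sketch, and not the authors' code, the function below back-projects the pixels covered by one object's segmentation mask into a camera-space partial point cloud of the kind a point-cloud-conditioned generator could consume. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` (hypothetical inputs; the abstract does not say how CAST obtains intrinsics) and a depth map stored as nested lists. Because monocular relative depth is only defined up to an unknown scale (and often a shift), the resulting cloud inherits that ambiguity.

```python
# Hypothetical sketch: back-project masked depth pixels into a camera-space point cloud.
# Assumes a pinhole camera; fx, fy, cx, cy are illustrative inputs, not values from CAST.
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]


def backproject_masked_depth(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[List[List[bool]]] = None,
) -> List[Point]:
    """Return (x, y, z) camera-space points for valid pixels inside the mask."""
    points: List[Point] = []
    for v, row in enumerate(depth):      # v indexes image rows
        for u, z in enumerate(row):      # u indexes image columns
            if z <= 0.0:
                continue                 # skip missing or invalid depth
            if mask is not None and not mask[v][u]:
                continue                 # keep only this object's pixels
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    depth = [[2.0, 2.0], [2.0, 2.0]]
    mask = [[True, False], [False, True]]
    print(backproject_masked_depth(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5, mask=mask))
    # [(-1.0, -1.0, 2.0), (1.0, 1.0, 2.0)]
```

Running one object mask at a time keeps each partial cloud separate, which matches the abstract's per-object generation step.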
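CAST places each generated mesh with a learned alignment generation model. As a classical point of comparison only, the sketch below solves a closed-form least-squares fit of a uniform scale, a rotation about the vertical y axis, and a translation that maps points sampled from a generated object onto corresponding points in the scene point cloud. Restricting rotation to yaw, assuming correspondences are already known, and the function names `rotate_y` and `fit_scale_yaw_translation` are all illustrative assumptions, not details from the paper.

```python
# Classical stand-in (not CAST's learned alignment model): least-squares fit of
# q ~ s * R_y(theta) p + t for known correspondences p -> q.
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate_y(p: Point, theta: float) -> Point:
    """Rotate a point about the vertical y axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)


def fit_scale_yaw_translation(src: List[Point], dst: List[Point]) -> Tuple[float, float, Point]:
    """Return (scale, theta, translation) minimizing sum ||dst_i - (s R p_i + t)||^2."""
    n = len(src)
    if n == 0 or n != len(dst):
        raise ValueError("need the same non-zero number of source and target points")
    ps = [sum(p[k] for p in src) / n for k in range(3)]  # source centroid
    qs = [sum(q[k] for q in dst) / n for k in range(3)]  # target centroid
    a = b = vert = norm_p = 0.0
    for p, q in zip(src, dst):
        px, py, pz = (p[k] - ps[k] for k in range(3))
        qx, qy, qz = (q[k] - qs[k] for k in range(3))
        a += qx * px + qz * pz    # horizontal terms multiplying cos(theta)
        b += qx * pz - qz * px    # horizontal terms multiplying sin(theta)
        vert += qy * py           # vertical terms, unaffected by yaw
        norm_p += px * px + py * py + pz * pz
    if norm_p == 0.0:
        raise ValueError("source points are degenerate (all identical)")
    theta = math.atan2(b, a)      # yaw that maximizes sum q . R p
    scale = (math.hypot(a, b) + vert) / norm_p
    rc = rotate_y((ps[0], ps[1], ps[2]), theta)
    t = (qs[0] - scale * rc[0], qs[1] - scale * rc[1], qs[2] - scale * rc[2])
    return scale, theta, t
```

Centering both point sets removes the translation from the problem; the optimal yaw then follows from a single `atan2`, and the scale and translation follow in closed form. A learned model like CAST's can also handle cases this fit cannot, such as unknown correspondences or occluded, partial observations.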
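The abstract describes deriving a constraint graph from a fine-grained relation graph. The sketch below shows one hypothetical way such a derivation could look: spatial relations are (subject, predicate, object) triples, each known predicate maps to a physical constraint kind, and every object pair without an explicit relation still receives a default non-penetration constraint. The predicate vocabulary (`on_top_of`, `leaning_on`, `next_to`), the constraint names, and `build_constraint_graph` are invented for illustration and are not CAST's.

```python
# Hypothetical sketch: derive pairwise physical constraints from a relation graph.
# The relation vocabulary and constraint names are illustrative, not CAST's.
from typing import Dict, List, Set, Tuple

Relation = Tuple[str, str, str]  # (subject, predicate, object)
Edge = Tuple[str, str, str]      # (object_a, object_b, constraint_kind)

# Which physical constraint each (hypothetical) spatial predicate implies.
PREDICATE_TO_CONSTRAINT: Dict[str, str] = {
    "on_top_of": "contact",     # subject should touch its support: no gap, no penetration
    "leaning_on": "contact",
    "next_to": "separation",    # subject only needs to avoid penetrating the object
}


def build_constraint_graph(objects: List[str], relations: List[Relation]) -> List[Edge]:
    """Return constraint edges: explicit relations first, then default separations."""
    edges: List[Edge] = []
    constrained: Set[frozenset] = set()
    for subj, pred, obj in relations:
        kind = PREDICATE_TO_CONSTRAINT.get(pred)
        if kind is None:
            continue            # unknown predicates contribute no constraint
        edges.append((subj, obj, kind))
        constrained.add(frozenset((subj, obj)))
    # Every remaining pair of distinct objects must still not interpenetrate.
    for i, a in enumerate(objects):
        for b in objects[i + 1:]:
            if frozenset((a, b)) not in constrained:
                edges.append((a, b, "separation"))
    return edges
```

Keeping contact edges directed (subject, support) preserves which object rests on which, which a pose optimizer would need when deciding what to move.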
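The physics-aware correction uses Signed Distance Fields to detect penetration and floating. As a toy sketch under stated assumptions, a supporting object is approximated by an axis-aligned box with an analytic SDF (negative inside, positive outside), and a supported object by a list of surface sample points. Penetration is the summed depth of points inside the support, floating is the positive gap between the support and the object's closest point, and a hypothetical `snap_onto_support` helper closes the gap by translating the object along y. CAST optimizes full object poses under its constraint graph; this one-axis correction only illustrates the kind of SDF terms such an optimization can use, and it assumes the support lies below the object.

```python
# Toy sketch of SDF-based contact checks; the box proxy and helper names are hypothetical.
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]
SDF = Callable[[Point], float]


def box_sdf(center: Point, half: Point) -> SDF:
    """Signed distance to an axis-aligned box: negative inside, positive outside."""
    def sdf(p: Point) -> float:
        q = [abs(p[i] - center[i]) - half[i] for i in range(3)]
        outside = math.sqrt(sum(max(v, 0.0) ** 2 for v in q))
        inside = min(max(q), 0.0)
        return outside + inside
    return sdf


def penetration_penalty(points: List[Point], sdf: SDF) -> float:
    """Sum of penetration depths of sample points that lie inside the other object."""
    return sum(max(0.0, -sdf(p)) for p in points)


def floating_gap(points: List[Point], sdf: SDF) -> float:
    """Positive gap between the object and its support; 0 when touching or penetrating."""
    return max(0.0, min(sdf(p) for p in points))


def snap_onto_support(points: List[Point], sdf: SDF, iters: int = 20, tol: float = 1e-9):
    """Translate points along y until their closest point lies on the support surface."""
    offset = 0.0
    pts = list(points)
    for _ in range(iters):
        d = min(sdf(p) for p in pts)   # signed distance of the closest sample point
        if abs(d) < tol:
            break
        pts = [(x, y - d, z) for x, y, z in pts]  # move down if floating, up if penetrating
        offset -= d
    return offset, pts
```

For example, with a table modeled as a box whose top face sits at y = 1 and a cup whose lowest samples sit at y = 1.2, `floating_gap` reports a gap of about 0.2 and `snap_onto_support` moves the cup down by that amount. A real pose optimizer would combine such terms across every edge of the constraint graph rather than correcting one axis per object.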

Citations: 3 | References: 63 | Implementations: 0 | Repro status: No evidence

Reproducibility Dossier

No evidence | Confidence: automated / checked Apr 2026

GEOMDIGEST treats reproducibility as an evidence trail: public artifacts, documentation, data, packaging, archival stability, and verification checks. Numeric scores are only exposed for audited records; public pages prioritize the evidence itself.

Evidence: 0 | Verified: 0 | Code: not yet | Data: not yet | Docs: not yet | Build checks: not yet
No public reproducibility evidence has been attached yet. Editors can add code, data, documentation, package, demo, benchmark, archive, or supplement links.

Implementation Index

No implementations indexed yet

This paper is in the knowledge graph, but we have not attached a runnable artifact yet.