Automatic scene inference for 3D object compositing

Report
Kevin Karsch (UIUC), Kalyan Sunkavalli, Sunil Hadap, Nathan Carr,
Hailin Jin, Rafael Fonte, Michael Sittig, David Forsyth
SIGGRAPH 2014
What is this system?
• Image editing system
• Drag-and-drop object insertion
• Place objects in 3D and relight them
• Fully automatic recovery of a comprehensive 3D scene model: geometry, illumination, diffuse albedo, and camera parameters
• From a single low dynamic range (LDR) image
Existing problems
• Traditionally it is the artist’s job to create photorealistic effects by reasoning about the physical space
• Lighting, shadows, perspective
• Needed: camera parameters, scene geometry, surface materials, and sources of illumination
State-of-the-art
• http://www.popularmechanics.com/technology/digital/visual-effects/4218826
• http://en.wikipedia.org/wiki/The_Adventures_of_Seinfeld_%26_Superman
What can this system not handle?
• Works best when scene lighting is diffuse; therefore generally works better indoors than outdoors
• Errors in either geometry, illumination, or
materials may be prominent
• Does not handle object insertion behind
existing scene elements
Contribution
• Illumination inference: recovers a full lighting
model including light sources not directly
visible in the photograph
• Depth estimation: combines data-driven
depth transfer with geometric reasoning
about the scene layout
How does it work?
• Needed: geometry, illumination, surface reflectance
• Even though the estimates are coarse, the composites still look realistic, because even large changes in lighting are often not perceivable
Workflow
Indoor/outdoor scene classification
• K-nearest-neighbor matching of GIST features
• Indoor dataset: NYUv2
• Outdoor dataset: Make3D
• Different training images and classifiers are chosen depending on whether the scene is indoor or outdoor
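The indoor/outdoor decision can be sketched as a k-nearest-neighbor vote over GIST descriptors. A minimal sketch, assuming the descriptors are already extracted (GIST computation is not shown, and `k` is an illustrative choice, not the paper's setting):

```python
import numpy as np

def classify_scene(query_gist, train_gists, train_labels, k=5):
    """Vote among the k nearest training GIST descriptors (L2 distance).

    train_gists: (n, d) array of precomputed GIST descriptors.
    train_labels: length-n array of strings, e.g. "indoor" / "outdoor".
    """
    dists = np.linalg.norm(train_gists - query_gist, axis=1)
    nearest = train_labels[np.argsort(dists)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]  # majority label among neighbors
```

The returned label then selects which training set (NYUv2 or Make3D) and which classifiers the rest of the pipeline uses.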
Single image reconstruction
• Camera parameters, geometry
– Focal length f, camera center (cx, cy), and extrinsic parameters are computed from three orthogonal vanishing points detected in the scene
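For two finite vanishing points of orthogonal scene directions, the constraint (v1 − p)·(v2 − p) + f² = 0 (with p the principal point) yields the focal length directly. A minimal sketch of that one relation; the function name is mine, and this omits the third vanishing point and any robust estimation the paper performs:

```python
import numpy as np

def focal_from_vps(v1, v2, principal_point):
    """Focal length from two vanishing points of orthogonal directions.

    Orthogonality implies (v1 - p) . (v2 - p) + f^2 = 0, so
    f = sqrt(-(v1 - p) . (v2 - p)); valid only when the dot product
    is negative (i.e. the points straddle the principal point).
    """
    p = np.asarray(principal_point, dtype=float)
    d = np.dot(np.asarray(v1, float) - p, np.asarray(v2, float) - p)
    if d >= 0:
        raise ValueError("vanishing points inconsistent with orthogonality")
    return np.sqrt(-d)
```

For example, with f = 500 and p = (0, 0), the directions (1, 0, 1) and (−1, 0, 1) project to vanishing points (500, 0) and (−500, 0), from which the formula recovers f = 500.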
Surface materials
• Per-pixel diffuse albedo and shading are estimated with a Color Retinex method
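The Retinex idea, in brief: intensity gradients accompanied by a chromaticity change are attributed to albedo, while the rest are attributed to shading. A toy 1-D illustration of that split (a sketch of the principle, not the paper's solver; `thresh` is an assumed tuning parameter):

```python
import numpy as np

def retinex_1d(log_intensity, log_chroma, thresh=0.1):
    """Toy 1-D Color-Retinex-style decomposition.

    Gradients whose chromaticity also changes are classified as albedo
    edges; reflectance is rebuilt by integrating only those gradients,
    and shading is the remainder (all in log space).
    """
    li = np.asarray(log_intensity, float)
    lc = np.asarray(log_chroma, float)
    g = np.diff(li)                                   # intensity gradients
    albedo_grad = np.where(np.abs(np.diff(lc)) > thresh, g, 0.0)
    log_reflectance = li[0] + np.concatenate([[0.0], np.cumsum(albedo_grad)])
    return log_reflectance, li - log_reflectance      # (reflectance, shading)
```

A sharp step that appears in both intensity and chroma ends up in the reflectance layer; a smooth intensity ramp with constant chroma ends up in the shading layer.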
Data-driven depth estimation
• Database: RGB-D images
• Appearance cues for correspondences: multiscale SIFT features
• Incorporate geometric information
Data-driven depth estimation
Objective combines four energy terms:
• Et: depth transfer
• Em: Manhattan world
• Eo: surface orientation
• E3s: spatial smoothness in 3D
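These four energies are combined into a single weighted objective over candidate depth maps. A hypothetical sketch of that combination (the weights and term implementations here are placeholders, not the paper's values):

```python
def depth_objective(D, terms, weights):
    """Weighted sum of the four depth energies listed above.

    terms, weights: dicts keyed 'Et', 'Em', 'Eo', 'E3s'; each term is a
    callable scoring a candidate depth map D. Weight values are
    hyperparameters (illustrative here; the paper tunes its own).
    """
    return sum(weights[k] * terms[k](D) for k in terms)
```

Minimizing this sum trades off fidelity to the transferred depth (Et) against the geometric priors (Em, Eo, E3s).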
Scene illumination
Visible sources
• Segment the image into superpixels;
• Then compute features for each superpixel;
– Location in image
– The 340 features used in Make3D
• Train a binary classifier with annotated data to
predict whether or not a superpixel is
emitting/reflecting a significant amount of light.
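The slide does not specify which binary classifier is used; as a stand-in, a small logistic regression over superpixel feature vectors shows the shape of this step (feature extraction is assumed already done; the learning rate and iteration count are illustrative):

```python
import numpy as np

def train_light_classifier(X, y, lr=0.5, iters=1000):
    """Logistic-regression stand-in for the binary light classifier.

    X: (n, d) superpixel features (location + Make3D-style features),
       with a bias column included by the caller.
    y: (n,) 0/1 labels for "emits/reflects significant light".
    """
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)   # gradient of the log-loss
    return w

def predict_light(X, w, thresh=0.5):
    """Label superpixels whose predicted probability exceeds thresh."""
    return (1.0 / (1.0 + np.exp(-X @ w)) > thresh).astype(int)
```

Superpixels labeled positive become in-view light sources in the recovered illumination model.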
Out-of-view sources
• Data-driven: annotated SUN360 panorama
dataset;
• Assumption: if photographs are similar, then
the illumination environment beyond the
photographed region will be similar as well.
Out-of-view sources
• Use features: geometric context, orientation maps,
spatial pyramids, HSV histograms, output of the light
classifier;
• Measure: histogram intersection score, per-pixel inner
product;
• Similarity metric of IBLs: how similar the rendered
canonical objects are;
• Ranking function: trained with a 1-slack, linear SVM ranking optimization.
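Of these measures, histogram intersection is easy to make concrete: normalize both histograms and sum the bin-wise minima (a standard definition; this particular sketch is mine):

```python
import numpy as np

def histogram_intersection(h1, h2):
    """Histogram intersection similarity: sum of bin-wise minima.

    Histograms are normalized to sum to 1 first, so the score lies in
    [0, 1], with 1 meaning identical distributions.
    """
    h1 = np.asarray(h1, float) / np.sum(h1)
    h2 = np.asarray(h2, float) / np.sum(h2)
    return float(np.minimum(h1, h2).sum())
```

Identical histograms score 1.0 and disjoint ones score 0.0, which makes the measure convenient to feed into the trained ranking function alongside the other features.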
Relative intensities of the light sources
• Intensity estimation through rendering: adjusting until
a rendered version of the scene matches the original
image;
• Humans cannot distinguish between a range of
illumination configurations, suggesting that there is a
family of lighting conditions that produce the same
perceptual response.
• So simply choose the lighting configuration that can be rendered fastest.
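Because a rendered image is linear in each source's intensity, matching the original image reduces to a nonnegative least-squares fit over per-source "unit" renders. A sketch using projected gradient descent (the slide does not name the actual optimizer; the function name and step size are mine):

```python
import numpy as np

def estimate_intensities(basis, target, iters=5000, lr=0.01):
    """Solve min_w ||basis @ w - target||^2 subject to w >= 0.

    basis: (pixels, sources) matrix whose columns are renders of the
           scene with each light source at unit intensity.
    target: the original image, flattened to (pixels,).
    Projection onto w >= 0 keeps intensities physically valid.
    """
    w = np.zeros(basis.shape[1])
    for _ in range(iters):
        grad = basis.T @ (basis @ w - target)  # least-squares gradient
        w = np.maximum(w - lr * grad, 0.0)     # project onto nonnegatives
    return w
```

With a 3-pixel, 2-source toy basis the fit recovers the intensities that generated the target exactly, which is the behavior the rendering-based adjustment relies on.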
Physically grounded image editing
• Drag-and-drop insertion
• Lighting adjustment
• Synthetic depth-of-field
User study
• Real object in a real scene vs. inserted object in a real scene
• Synthetic object in a synthetic scene vs. inserted object in a synthetic scene
• Produces perceptually convincing results
