FOCUS: Clustering Crowdsourced Videos by Line-of-Sight

Puneet Jain, Justin Manweiler,
Arup Acharya, and Kirk Beaty
Clustered by shared subject
CHALLENGES
CAN IMAGE PROCESSING SOLVE THIS PROBLEM?
[Figure: four cameras filming the same scene from different angles]
LOGICAL similarity does not imply VISUAL similarity
VISUAL similarity does not imply LOGICAL similarity
CAN SMARTPHONE SENSING SOLVE THIS PROBLEM?
Why not triangulate?
Sensors are noisy, and it is hard to distinguish subjects…
GPS-COMPASS Line-of-Sight
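A GPS-compass line-of-sight is just a point plus a bearing. A minimal sketch of that geometry (the equirectangular flat-earth approximation and the function name are illustrative assumptions, not from the talk):

```python
import math

def los_ray(lat, lon, bearing_deg, ref_lat, ref_lon):
    """Convert a GPS fix + compass bearing into a 2D line-of-sight ray.

    Returns (origin_xy, direction_xy) in meters, east-north coordinates
    relative to a reference point (equirectangular approximation).
    """
    m_per_deg = 111_320.0  # meters per degree of latitude (approx.)
    x = (lon - ref_lon) * m_per_deg * math.cos(math.radians(ref_lat))
    y = (lat - ref_lat) * m_per_deg
    b = math.radians(bearing_deg)  # compass convention: 0 = north, 90 = east
    return (x, y), (math.sin(b), math.cos(b))
```

Two such rays from different spectators would intersect near the shared subject; this is exactly the triangulation that the deck argues is too noisy to do with sensing alone.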
INSIGHT
Simplifying Insight 1: The subject is hard to identify visually, but the background is easy. Don’t need to visually identify the actual SUBJECT; the background can serve as a PROXY.
Simplifying Insight 2: The same basic structure persists. Don’t need to directly match videos against each other; compare all of them to a predefined visual MODEL.
Simplifying Insight 3: Line-of-sight (triangulation) is almost enough, just not via sensing alone.
FOCUS
Fast Optical Clustering of live User Streams
Vision + Sensing + Cloud

[Architecture: video streams (Android, iOS, etc.) → FOCUS Cloud → clustered videos]
FOCUS Cloud:
• Video Extraction
• Video Analytics (image processing, computer vision)
• Hadoop/HDFS (failover, elasticity)
Users Select & Watch Organized Streams
[App UI: watching live, score “home: 2 away: 1”, controls to Change Angle and Change Focus]
Video Analytics aligns each incoming stream against a pre-defined reference “model”.
keypoint extraction → multi-view reconstruction
Estimates camera POSE and content in the field-of-view.
Multi-view Stereo Reconstruction: model construction technique based on Photo Tourism: Exploring Photo Collections in 3D, Snavely et al., SIGGRAPH 2006.
Visualizing Camera Pose
keypoint extraction → multi-view reconstruction → frame-by-frame video-to-model alignment → integration of sensory inputs
• Given a pre-defined 3D model, align incoming video frames to the model
• Also known as camera pose estimation
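Camera pose estimation can be framed as finding the pose that minimizes the reprojection error of matched model points. A toy sketch of the scoring step, assuming a pinhole camera restricted to yaw-only rotation (real systems recover full 6-DoF pose, typically with RANSAC + PnP; the intrinsics here are placeholders):

```python
import math

def project(point3d, cam_pos, yaw, f=1000.0, cx=640.0, cy=360.0):
    """Project a 3D model point through a pinhole camera.

    Toy setup: the camera rotates only about the vertical (yaw) axis;
    x is right, y is down, z is forward in camera coordinates.
    """
    dx = point3d[0] - cam_pos[0]
    dy = point3d[1] - cam_pos[1]
    dz = point3d[2] - cam_pos[2]
    c, s = math.cos(yaw), math.sin(yaw)
    xc = c * dx - s * dz               # rotate world offset into camera frame
    zc = s * dx + c * dz
    if zc <= 0:
        return None                    # point is behind the camera
    return (f * xc / zc + cx, f * dy / zc + cy)

def reprojection_error(pose, matches):
    """Mean pixel error of a candidate pose against 2D-3D keypoint matches."""
    cam_pos, yaw = pose
    errs = []
    for p3d, p2d in matches:
        proj = project(p3d, cam_pos, yaw)
        if proj is None:
            return float("inf")
        errs.append(math.hypot(proj[0] - p2d[0], proj[1] - p2d[1]))
    return sum(errs) / len(errs)
```

A pose solver would search over `(cam_pos, yaw)` for the minimum of this score; the true pose scores (near) zero on noise-free matches.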
keypoint extraction → multi-view reconstruction → integration of sensory inputs
The gyroscope provides a “diff” from the vision-derived initial position.
Filesize ≈ 1/Blur (blurry frames compress to smaller files, so encoded size is a cheap sharpness proxy)
[Figure: gyroscope trace over sampled frames 0–4 at times t-1, t-2]
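The “Filesize ≈ 1/Blur” heuristic suggests a cheap frame selector: among recently sampled frames, prefer ones captured while the gyroscope was quiet and whose encoded size is large. A sketch under those assumptions (field names and the threshold are illustrative):

```python
def pick_sharp_frame(frames, max_gyro=0.2):
    """Pick the likely-sharpest frame from a sampled window.

    Each frame is a dict with:
      'size' - encoded size in bytes (sharpness proxy: Filesize ≈ 1/Blur)
      'gyro' - angular speed (rad/s) around capture time
    Frames shot during fast rotation are skipped when possible.
    """
    steady = [f for f in frames if f["gyro"] <= max_gyro]
    candidates = steady or frames      # fall back if every frame was moving
    return max(candidates, key=lambda f: f["size"])
```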
keypoint extraction → multi-view reconstruction → pairwise model image analysis
Field-of-view: using the camera POSE + the model POINT CLOUD, FOCUS geometrically identifies the set of model points in the background of each view.
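In 2D, “model points in the field of view” reduces to a frustum test: a point is visible when its bearing from the camera is within half the FOV of the heading. A sketch (the function name and the 60° default are illustrative):

```python
import math

def points_in_fov(cam_xy, heading_rad, cloud, fov_deg=60.0):
    """Return indices of 2D model points inside the camera's field of view."""
    half = math.radians(fov_deg) / 2.0
    visible = set()
    for i, (px, py) in enumerate(cloud):
        dx, dy = px - cam_xy[0], py - cam_xy[1]
        if dx == 0 and dy == 0:
            continue                           # point is at the camera itself
        angle = math.atan2(dy, dx) - heading_rad
        angle = (angle + math.pi) % (2 * math.pi) - math.pi  # wrap to (-pi, pi]
        if abs(angle) <= half:
            visible.add(i)
    return visible
```

Each video thus maps to a set of model-point ids, which is the representation the next step compares.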
keypoint extraction → multi-view reconstruction → pairwise model image analysis
Similarity across videos is computed as the size of the point-cloud set intersection (e.g., similarity between images 1 & 2 = 18; between images 1 & 3 = 13).
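With each video reduced to the set of model points in its background, the pairwise similarity is just the intersection size. A sketch:

```python
from itertools import combinations

def pairwise_similarity(video_points):
    """Similarity between two videos = |intersection of their model-point sets|.

    video_points maps a video id to the set of model-point ids that were
    found in that video's background.
    """
    return {
        (a, b): len(video_points[a] & video_points[b])
        for a, b in combinations(sorted(video_points), 2)
    }
```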
Clustering “similar” videos
Videos form a graph with similarity scores as edge weights; clusters are found by applying Modularity Maximization.
High modularity implies:
• high correlation among the members of a cluster
• minor correlation with the members of other clusters
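Modularity scores a partition of the weighted similarity graph: for each community it compares the internal edge weight to what a random graph with the same node strengths would contain. A sketch computing Newman's Q for a given partition (the maximization itself, e.g. via the Louvain method, would search over partitions):

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of a weighted, undirected graph.

    edges: dict {(u, v): weight}, u != v, each undirected edge listed once
    communities: dict {node: community label}
    """
    two_m = 2.0 * sum(edges.values())
    strength = {}                  # weighted degree of each node
    internal = {}                  # total edge weight inside each community
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
        if communities[u] == communities[v]:
            c = communities[u]
            internal[c] = internal.get(c, 0.0) + w
    tot = {}                       # total strength attached to each community
    for node, c in communities.items():
        tot[c] = tot.get(c, 0.0) + strength.get(node, 0.0)
    return sum(
        2.0 * internal.get(c, 0.0) / two_m - (tot[c] / two_m) ** 2
        for c in set(communities.values())
    )
```

For two unit-weight triangles joined by one bridge edge, splitting along the bridge gives Q = 5/14, while lumping everything into one cluster gives Q = 0, matching the intuition above.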
RESULTS
Collegiate Football Stadium
• Stadium: 33K seats, 56K maximum attendance
• Model: 190K points, 412 images (2896 × 1944 resolution)
• Android app on Samsung Galaxy Nexus and S3
• 325 videos captured, 15–30 seconds each
Line-of-Sight Accuracy
In >80% of the cases, visual line-of-sight estimation is off by <40 meters; GPS/compass LOS estimation is off by <260 meters for the same percentage.
FOCUS Performance
75% true positives; the remaining cases trigger GPS/compass failover techniques.
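The failover can be read as a simple decision rule: trust the vision-derived pose when the alignment is strong, otherwise fall back to the GPS/compass line-of-sight. A sketch (the inlier-count threshold and dict fields are assumptions for illustration):

```python
def choose_los(visual_pose, gps_fix, min_inliers=30):
    """Pick the line-of-sight estimate to cluster a stream on.

    visual_pose: dict with 'inliers' (match count) and 'los', or None
                 when model alignment failed outright
    gps_fix:     dict with 'los' from GPS + compass
    """
    if visual_pose is not None and visual_pose["inliers"] >= min_inliers:
        return ("visual", visual_pose["los"])
    return ("gps_compass", gps_fix["los"])
```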
Natural Questions
• What if a 3D model is not available?
– Generate the model online from the first few uploads
• Don’t stadiums look very different on game day?
– Rigid structures in the background persist
• Where won’t it work?
– Natural or dynamic environments are hard
Conclusion
• Computer vision and image processing are often computation-hungry, restricting real-time deployment
• Mobile sensing provides powerful metadata that can often reduce the computation burden
• Computer vision + mobile sensing + geometry, along with the right set of Big Data tools, can enable many real-time applications
• FOCUS demonstrates one such fusion, a ripe area for further research
Thank You
http://cs.duke.edu/~puneet
