Automated Analysis of Interactional
Synchrony using Robust Facial
Tracking and Expression Recognition
Xiang Yu¹, Shaoting Zhang¹, Yang Yu¹, Norah
Dunbar², Matthew Jensen², Judee K. Burgoon³,
Dimitris N. Metaxas¹
¹CBIM, Rutgers Univ., NJ
²Oklahoma University, OK
³Univ. of Arizona, AZ
Introduction
Our goal
• Predict whether an interactant is truthful or
deceptive by analyzing interactional synchrony
• Propose a computational framework to do so
Introduction
What is interactional synchrony?
• Interpersonal communication is contingent upon
some form of mutual adaptation such as:
– Accommodation
– Interpersonal coordination
– Matching
– Mirroring
– Compensation
– Divergence
– Complementarity
Introduction
Why investigate interactional synchrony?
• Practitioners have suggested using interactional
synchrony for detecting deception:
– with terrorists (Turvey, 2008)
– in FBI interviews (Navarro, 2003)
– and in police investigations (Kassin et al., 2007)
• Assumption: interviews with deceivers are less
synchronous than interviews with truth tellers.
• However, few systematic studies of coordination,
synchrony or reciprocity have examined the
effects of synchrony on deception.
Introduction
Relevant techniques
• Face Tracking
- Active Shape Model (Cootes et al. CVIU 1995)
- Active Appearance Model (Cootes et al. ECCV 1998)
- Constrained Local Model (Cristinacce and Cootes, BMVC 2006)
• Gesture and Expression recognition
- Head nodding and shaking, HMM (Kapoor and Picard, PUI 2001)
- Gesture recognition, CRF (Morency et al. CVPR 2007)
- Image based expression recognition (Pantic et al. PAMI 2000)
- Video based (Black and Yacoob, IJCV 1997; Cohen et al., CVIU 2003)
Proposed Method
System framework
Module 1: Robust Facial Tracking
• Robust facial tracking is the foundation of
interactional synchrony analysis
• Challenges:
– Partial occlusion
– Multiple poses
– Poor lighting conditions
Active Shape Models
• Active Shape Model
Shape vector: $s = (x_1, y_1, x_2, y_2, \cdots, x_n, y_n)^T$
Shape matrix: $S = [s_1, s_2, \cdots, s_m]$
PCA: Covariance matrix: $\Sigma = P \Lambda P^T$
Shape representation: $s = \bar{s} + P b$
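A minimal NumPy sketch of the PCA shape model above, assuming the training shapes are already aligned and stacked one per row. The function names (`fit_shape_model`, `project`, `reconstruct`) and the `n_modes` parameter are illustrative choices, not taken from the slides.

```python
import numpy as np

def fit_shape_model(shapes, n_modes):
    """Fit a PCA shape model.

    shapes: (m, 2n) array, one aligned shape vector (x1, y1, ..., xn, yn) per row.
    Returns the mean shape, the top n_modes eigenvectors (as columns), and
    their eigenvalues in descending order.
    """
    shapes = np.asarray(shapes, dtype=float)
    mean_shape = shapes.mean(axis=0)
    centered = shapes - mean_shape
    cov = centered.T @ centered / (shapes.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_modes]   # keep the largest modes
    return mean_shape, eigvecs[:, order], eigvals[order]

def project(shape, mean_shape, P):
    """Shape parameters b for a given shape: b = P^T (s - mean)."""
    return P.T @ (np.asarray(shape, dtype=float) - mean_shape)

def reconstruct(b, mean_shape, P):
    """Shape representation: s = mean + P b."""
    return mean_shape + P @ b
```

A shape that lies in the span of the retained modes is reproduced exactly by `reconstruct(project(...))`; anything outside that span is discarded, which is what constrains the tracker to plausible faces.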

Active Shape Models
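• Use a KLT tracker to estimate the shape $\hat{s}$ locally.
• Optimize the shape $s$ so that it is not only similar to
$\hat{s}$ but also follows the shape distribution:
$\arg\min_{b} \; b^T \Lambda^{-1} b + \| \hat{s} - (\bar{s} + P b) \|_2^2$
• Alternately optimize $\hat{s}$ and $s$ until they
converge.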
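The shape-regularized fit has a closed-form solution when the PCA modes are orthonormal. The sketch below assumes an observed (locally tracked) shape, a mean shape, a mode matrix `P` with orthonormal columns, and the PCA eigenvalues. The trade-off weight `lam` is an assumption added for illustration; `lam=1.0` gives the unweighted sum of the two terms.

```python
import numpy as np

def regularized_shape_fit(observed, mean_shape, P, eigvals, lam=1.0):
    """Minimize  b^T diag(eigvals)^-1 b + lam * ||observed - (mean + P b)||^2.

    Assumes P has orthonormal columns (P^T P = I), as PCA modes do.
    Setting the gradient to zero gives, per mode i,
        b_i = lam * eigvals_i / (1 + lam * eigvals_i) * (P^T (observed - mean))_i
    so low-variance modes are shrunk more strongly toward zero.
    """
    observed = np.asarray(observed, dtype=float)
    proj = P.T @ (observed - mean_shape)
    shrink = lam * eigvals / (1.0 + lam * eigvals)
    b = shrink * proj
    return b, mean_shape + P @ b
```

In a tracker, this step would alternate with the local landmark search until the shape stops changing.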
Limitations
• Partial occlusion:
– Bayesian inference [Y. Zhou CVPR’03]
– Pictorial structures [P. Felzenszwalb IJCV’05]
– Sparse outliers [F. Yang FG’11]
• Multiple poses:
– Mixture of Gaussian distribution [T. Cootes IVC’99]
– Kernel PCA [S. Romdhani BMVC’07]
– Hierarchical multi-state [Y. Tong PR’07]
Face Tracking: Handle Occlusions
• We explicitly model the large errors as a
sparse vector $e$:
$\arg\min_{b, e} \; b^T \Lambda^{-1} b + \| \hat{s} - (\bar{s} + P b + e) \|_2^2$
s.t. $\| e \|_0 \le k_1$, where $k_1$ is the sparsity number of $e$
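One common way to handle an L0-constrained error term is block coordinate descent with hard thresholding. The sketch below is such a heuristic, not necessarily the solver used by the authors. It assumes orthonormal PCA modes `P`; the names `robust_shape_fit` and `keep_largest`, the weight `lam`, and the iteration count are illustrative.

```python
import numpy as np

def keep_largest(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
    return out

def robust_shape_fit(observed, mean_shape, P, eigvals, k, lam=1.0, n_iter=50):
    """Alternate between the shape parameters b and a k-sparse error e.

    Objective: b^T diag(eigvals)^-1 b + lam * ||observed - (mean + P b + e)||^2
    subject to ||e||_0 <= k.  Assumes P has orthonormal columns.
    """
    observed = np.asarray(observed, dtype=float)
    e = np.zeros_like(observed)
    shrink = lam * eigvals / (1.0 + lam * eigvals)
    for _ in range(n_iter):
        # b-step: closed-form regularized fit to the outlier-corrected shape
        b = shrink * (P.T @ (observed - mean_shape - e))
        # e-step: the best k-sparse fit to the residual keeps its k largest entries
        e = keep_largest(observed - mean_shape - P @ b, k)
    return b, e
```

Each step minimizes the objective over one block with the other held fixed, so the objective never increases; the result is a local solution that depends on the starting point.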
Face Tracking: Handle Multi-pose
• Instead of a low dimensional subspace, we
model the shape as a sparse linear
combination of training shapes:
$\arg\min_{c, e} \; \| \hat{s} - (S c + e) \|_2^2$
s.t. $\| c \|_0 \le k_1$ (sparsity number of $c$), $\| e \|_0 \le k_2$ (sparsity number of $e$)
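A sparse combination of training shapes with a sparse error term can be approximated greedily. The sketch below alternates orthogonal matching pursuit for the coefficients with hard thresholding for the error; this is one plausible heuristic under the stated constraints, not a statement of the authors' optimizer. The training shapes are assumed to be the columns of `S`, and all function names are illustrative.

```python
import numpy as np

def keep_largest(v, k):
    """Keep the k largest-magnitude entries of v and zero the rest."""
    out = np.zeros_like(v)
    if k > 0:
        idx = np.argsort(np.abs(v))[-k:]
        out[idx] = v[idx]
    return out

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of D."""
    norms = np.linalg.norm(D, axis=0)
    coef = np.zeros(D.shape[1])
    support = []
    residual = y.copy()
    for _ in range(k):
        scores = np.abs(D.T @ residual) / norms
        if support:
            scores[support] = -np.inf          # never pick a column twice
        support.append(int(np.argmax(scores)))
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    if support:
        coef[support] = sol
    return coef

def sparse_shape_fit(observed, S, k1, k2, n_iter=30):
    """Approximate observed ~ S c + e with ||c||_0 <= k1 and ||e||_0 <= k2."""
    observed = np.asarray(observed, dtype=float)
    S = np.asarray(S, dtype=float)
    e = np.zeros_like(observed)
    for _ in range(n_iter):
        c = omp(S, observed - e, k1)               # coefficients given the error
        e = keep_largest(observed - S @ c, k2)     # error given the coefficients
    return c, e
```

Because the shape is built from a few actual training shapes rather than a single low-dimensional subspace, side poses can be represented as long as similar poses appear in the training set.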
Synthetic data
• Shape optimization from side pose with outliers.
Blue line: detection results. Red line: fitting result.
Panels: E-ASM; Sparse shape registration (handles gross error);
Our method (handles gross error and multi-pose)
Quantitative Comparison
• Point error in pixels, compared per condition:
– Occlusion by hat: Extended ASM, Sparse Shape Registration, Our method
– Occlusion by scarf: Extended ASM, Sparse Shape Registration, Our method
– Multiple pose: Sparse Shape Registration, Our method
– Multiple pose with occlusions: Sparse Shape Registration, Our method
Face Tracking Results
• Tracking result from CMC dataset
Module 2 and 3:
Expression Detection and Head Pose
Demo of the successful detection of events
(both are smiling and with the same head pose)
Module 2: Expression Recognition
• Use the relative intensity order of facial
expressions to learn a ranking model (RankBoost)
for recognition and intensity estimation
• 6 universal facial expressions (fear, anger,
sadness, disgust, happiness, surprise)
• The trained ranking model gives a score for the
probability of smiling in real time.
Expression Recognition(1)
• Facial feature representation.
Expression Recognition(2)
• Ordinal pair-wise data organization.
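As a rough illustration of ordinal pair-wise organization, the sketch below turns per-frame intensity levels into ordered index pairs suitable for a ranking learner. It assumes an intensity label (or a proxy such as frame position from neutral to apex) is available per frame; the function name `ordinal_pairs` and the tie-skipping rule are assumptions, not details from the slides.

```python
from itertools import combinations

def ordinal_pairs(intensity):
    """Return (lower, higher) index pairs for all frames with different intensity.

    intensity: sequence of comparable intensity levels, one per frame.
    Pairs with equal intensity carry no ordering information and are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(intensity)), 2):
        if intensity[i] < intensity[j]:
            pairs.append((i, j))
        elif intensity[i] > intensity[j]:
            pairs.append((j, i))
    return pairs
```

A ranking model is then trained so that the second frame of each pair scores higher than the first.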
Expression Recognition(3)
• Ranking Model.
- Sparse RankBoost (weak classifier)
loss function:
- AdaBoost strong classifier
Smile Detection
Illustration of synchrony for a smiling event (both the subject and
the interviewer are smiling at about the same time instant)
Smile Synchrony
Left: smiling scores curves for both the subject (in blue) and
interviewer (in red). Here we see that there is synchrony.
Right: Snapshots from the actual tracked footage illustrating
smiling synchrony.
Module 3: Head Pose Module: Nodding
• A 3D head nodding sequence.
The green lines show head
pitch angle. The black lines
show head yaw angle.
• Head pose pitch curve along
time axis. The pattern of the
blue plot is characteristic of
head nodding
Head Pose Module: Shaking
• A 3D head shaking
sequence. The green lines
show head pitch angle.
The black lines show head
yaw angle.
• Head pose yaw curve. The
pattern of the red plot is
characteristic of a head
shake.
Head Pitch Synchrony (nodding)
Head pitch curves for the subject (in blue) and the interviewer (in
red). The patterns are characteristic of head nodding. Here we see
that there is synchrony (both plots show nodding pattern).
Head Yaw Dissynchrony (shaking)
Head yaw curves for the subject (in blue) and the interviewer (in
red). The distinct pattern of the red plot is characteristic of a head
shake. This head shaking event is NOT in synchrony.
Module 4: Synchrony Feature Extraction
Cross correlation
• Generate higher-level synchrony features from
lower-level features
• Correlation based strategy
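A correlation-based synchrony feature can be sketched as the peak Pearson correlation between two per-frame signals (for example smile scores or head pitch) over a range of time lags. The function name `lagged_correlation`, the `max_lag` parameter, and the choice of Pearson correlation are assumptions made for illustration.

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Return (best_lag, best_r) for two equal-length signals.

    A positive lag means y follows x by that many frames (y[t + lag] ~ x[t]);
    a negative lag means x follows y.  Lags whose overlap is constant or too
    short are skipped.  If every lag is skipped, best_r stays at -inf.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if len(x) != len(y):
        raise ValueError("signals must have the same length")
    best_lag, best_r = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[:len(x) - lag], y[lag:]
        else:
            a, b = x[-lag:], y[:len(y) + lag]
        if len(a) < 2 or a.std() == 0 or b.std() == 0:
            continue
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

The peak correlation and the lag at which it occurs could each serve as a higher-level feature per segment.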
Synchrony Feature Extraction
hit-miss rate
• Synchrony definition.
– If event A is detected for subject X at time T and for
subject Y at time T ± w, we say we have synchrony
• Some attributes can be extracted
from detected synchrony:
– 1. Smile_Hit_Subject_Rate_S(x): Rate of
smiling synchrony (led by subject and
followed by interviewer) in segment (x)
– 2. Smile_Hit_Interviewer_Rate_S(x):
Rate of smiling synchrony (led by
interviewer and followed by subject)
[Figure: Smile_Hit_Subject and Smile_Hit_Interviewer timelines; smile (Yes/No) over time for subject and interviewer]
Synchrony Feature Extraction
hit-miss rate
• Some attributes can be extracted
from detected synchrony:
– 3. Smile_Miss_Subject_Rate_S(x):
The rate of no smiling synchrony (led
by subject and not followed by
interviewer) in segment (x)
– 4. Smile_Miss_Interviewer_Rate_S(x):
The rate of no smiling synchrony
(led by interviewer and not followed by
subject) in segment (x)
[Figure: Smile_Miss_Subject and Smile_Miss_Interviewer timelines; smile (Yes/No) over time for subject and interviewer]
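The hit and miss attributes can be sketched as follows, given event onset times for a leader and a follower within one segment. The slides do not define how the rate is normalized; this sketch assumes it is the fraction of the leader's events, and it counts a hit when the follower's next event starts within `w` time units after the leader's. The function name `lead_follow_rates` is illustrative.

```python
from bisect import bisect_right

def lead_follow_rates(leader_times, follower_times, w):
    """Return (hit_rate, miss_rate) for events led by one party.

    A leader event at time t is a hit if the follower has an event in (t, t + w].
    Both rates are fractions of the leader's events; (0.0, 0.0) if there are none.
    """
    follower = sorted(follower_times)
    hits = 0
    for t in leader_times:
        i = bisect_right(follower, t)      # first follower event strictly after t
        if i < len(follower) and follower[i] <= t + w:
            hits += 1
    n = len(leader_times)
    if n == 0:
        return 0.0, 0.0
    return hits / n, (n - hits) / n
```

Calling it with the subject as leader gives the Smile_Hit_Subject and Smile_Miss_Subject rates for a segment; swapping the arguments gives the interviewer-led rates.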
Module 5: Feature Selection and
Classification
• Feature Selection, Genetic Algorithm.
• Classification
Two-class problem:
Truthful vs. Deceptive
Three-class problem:
Truthful, Sanctioned and Unsanctioned cheating
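A genetic algorithm for feature selection can be sketched with a pluggable fitness function, for instance the cross-validated accuracy of whichever classifier is used on the selected features. The slides do not give the GA settings or the classifier, so the population size, mutation rate, single-point crossover, and the name `ga_select` are all assumptions.

```python
import random

def ga_select(n_features, fitness, pop_size=20, n_gen=30, p_mut=0.05, seed=0):
    """Search for a boolean feature mask that maximizes fitness(mask).

    Requires n_features >= 2 and pop_size >= 4.  The better half of each
    generation survives unchanged (elitism), so the best fitness never drops.
    Returns the best mask and the best fitness recorded per generation.
    """
    rng = random.Random(seed)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)                 # pick two parents
            cut = rng.randrange(1, n_features)          # single-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history
```

In practice the fitness function dominates the cost, since each evaluation means training and validating a classifier on the candidate feature subset.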
Experimental Databases
• Computer-Mediated
Communication
Dataset (CMC)
• Face to Face
Communication
Dataset (FtF)
Evaluation of Synchrony feature
Evaluation of Two-class Classification
• Confusion Matrices
• Performance
Evaluation of Three-class Classification
• Confusion Matrices
• Performance
Conclusions
• We investigated how the degree of synchrony
affects the result of deception detection.
• Automatic methods provide an important
alternative to manual coding for evaluating
synchrony
• Some observations:
– Different lower-level features contribute
differently to deception detection.
– Modality has a subtle influence on detecting
deception (CMC vs. FtF).
Thanks!
