Vocalic Markers of Deception and Cognitive Dissonance for Automated Emotion Detection Systems
Dr. Aaron C. Elkins
The University of Arizona

Emotional Voice

Can computers perceive vocal emotion?
Yes… but:
• The science of the emotional voice is young
• Communication is complex and dynamic
• Moods and emotions switch with context
• Emotion is computationally ill-defined
• Measuring emotion may inform theory

Emotional Dimensions
• DISGUST?

Four Components of Speech
• Voiced vs. unvoiced sounds: [v] vs. [f]
• Airstream through mouth or nose: [m] vs. [o]

Speech Sounds
• (1) pitch, (2) loudness, and (3) quality
• Sound is small variations in air pressure that occur in rapid succession
• The vocal folds superimpose their vibration on the outgoing air of voiced sounds
• The vocal folds vibrate periodically (100–250 Hz)
• We measure these features digitally

Recording Father – Digital Audio
• Waveform measures pulses of the vocal folds
• Based on air pressure disturbance (dB)
• Voiced vs. unvoiced (low pressure)
• Each peak occurs every 100th of a second (100 Hz)

Vowel Articulation
• Source-Filter Theory (Müller, 1848)
• Vocal folds vibrate at the same speed (pitch)
• Resonance changes in the vocal tract filter frequencies (formants)

Vocalics
• Vocalic analysis examines how it was said: amplitude, pitch (frequency), response latency, tempo
• Linguistics examines what was said

Sound Production is Complex
• When we tense our muscles, such as during stress, our larynx tenses, which raises pitch
• The process is complex, and emotions affect its normal operation
• Deception takes away cognitive resources and is stressful
• More mistakes, lower quality, increased average and variation in pitch
• Sympathetic nervous system response
• Increased auditory acuity
• Heightened arousal

Standard Vocal Measures
• Calculated with Praat and custom signal processing software

Nemesysco LVA 6.50
• Commercial vocalic software evaluated

Five Vocalic Studies Summarized
• Study One (Deception Experiment)
• Study Two (Cognitive Dissonance)
• Study Three (Embodied Conversational Agent and Trust)
• Study Four (Embodied Conversational Agent Security Screening - Bomber)
• Study Five (Embodied Conversational Agent Security Screening - Imposter)

Vocal Deception (Study 1) – Experimental Design
• N = 96
• $10 reward for appearing credible to a professional interviewer
• Two sequences:
  First sequence: DT DDTT TD TTDD T
  Second sequence: DT TTDD TD DDTT T

Short-Answer Questions
• Only 8 had variation both within and between subjects
• Two types of questions: charged and neutral

Results
• Built-in classification performed at chance level
• Vocal measures independent of the system discriminated deception: FMain, AVJ, and SOS
• Possible latent variables measuring conflicting thoughts, cognitive effort, and emotional fear
• Logistic regression performed best on charged questions
• Higher pitch, cognitive effort, and hesitations are predictive of deception in more stressful interactions
• The claim that the vocal analysis software measures stress, cognitive effort, or emotion cannot be completely dismissed
• Deception and stress can be predicted by acoustic measures of voice quality and pitch when controlling for speaker characteristics

Vocal Dissonance (Study 2) – Experimental Design
• Modified induced-compliance paradigm
• Participants (N = 52) made two vocal counter-attitudinal arguments for cutting funding for services for the disabled
• Choice is manipulated: High vs. Low (IV); High N = 24, Low N = 28
• Participants report attitude towards the argument issue (DV)

Arousal (Vocal Pitch)
• High choice had a 10 Hz higher pitch, F(1,50) = 4.43, p = .04
• All participants reduced their pitch over time, F(1,50) = 4.90, p = .03

Cognitive Difficulty
• High choice had nearly 2x the response latency on argument two, F(1,50) = 4.53, p = .04
• Arousal moderation

Cognitive Difficulty
• Participants spoke with 33% more nonfluencies on the second argument, F(1,50) = 4.03, p = .05

The Importance of Language (Imagery as Abstract Language)

Vocal Dissonance Model
• χ²(1, N = 51), p = .49
• SRMR = .02
• R²: Attitude Change = .17, Imagery = .11

From the lab to the AVATAR

First Kiosk

Kiosk from Last Year

Third-Generation Kiosk

Gender and Demeanor

Vocal Trust (Study 3) – Experimental Design
• Participants completed presurvey
• Packed bag before ECA screening interview
• Completed security screening
• All responses to ECA recorded for vocal analysis

ECA Demeanor and Gender
• N = 88 participants (53 males, 35 females)
• Question Block 1, Question Block 2, Question Block 3, Question Block 4
• Repeated-measures Latin square design
• All participants interacted with all demeanor and gender ECA combinations
• 4 questions per block, 16 total questions

Trust and Time Main Effects
• Multilevel growth model specified with Trust as the DV (N = 218) and Subject as random effect (N = 60)
• Initial trust = 4.09
• Trust rate of change: .04 per second increase, p < .01
• Duration: .05 decrease in trust for every second spent answering the ECA over the 7.6 second average, p < .001

Vocal Pitch, Time, and Trust
• Main effect of pitch: for every 1 Hz increase in pitch over 156 Hz, trust drops by .01, p = .03
• Interaction of pitch and time: Pitch x Time b = 9.3e05, p = .03
• Over time, pitch predicts trust less and less

Results
• Human perceptions of trust transfer to the ECA
• Time plays an important role in the interaction
• All participants trusted the ECA more over time, particularly when it smiled
• 48 increase in trust when ECA smiles
• Vocal measures of pitch predicted trust, but only early on
• For every 1 Hz increase in pitch over 156 Hz, trust drops by .01
• Over time, pitch predicts trust less and less

Vocalics of a Bomber (Study 4) – Experimental Design
• 29 EU border guards were randomly assigned to build a bomb (N = 16) or to a control condition (N = 13), then pack a bag
• Identical to Study 3, but no breaks in the interview
• Only the male, neutral-demeanor ECA interviewed participants
• Bomb makers were instructed to successfully smuggle the bomb past the ECA

Vocal Analysis
• Recorded responses to the question: “Has anyone given you a prohibited substance to transport through this checkpoint?”
• Average response 2.68 sec (SD = 1.66)
• Responses such as “No” or “Of course not”
• Vocal measures of pitch and pitch variation

Results of Vocal Pitch
• Voice quality, gender, and intensity included as covariates
• No difference in mean vocal pitch, F(1,22) = 0.38, p = .54
• Main effect of pitch variation: bomb makers had 25.34% more variation, F(1,22) = 4.79, p = .04

Pitch Contours

Eye Gaze: Guilty

Eye Gaze: Innocent

Vocalics of an Imposter (Study 5) – Experimental Design
• 38 EU border guards
• All required to present visa and passport through multiphase screening: E-gate, manual processing, AVATAR screening interview
• Four randomly assigned imposters carried false documents with hostile intentions through screening

AVATAR Interaction Example

iPad Output for Screener

Voice Quality Change from Baseline Question (What is your full name?)

Vocalic Classification Model

Vocalic Resulting Classification
• 7 innocents falsely classified as terrorists
• 27 correctly classified as innocent
• All “guilty” referred to secondary
• Overall accuracy = 81%
• TPR = 100%, TNR = 79%, FPR = 20%, FNR = 0%

Eye Fixations on Visa

Date of Birth Results – Correct?
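
The rates reported for the vocalic classification follow directly from its counts. A minimal sketch, assuming the four imposters are the only "guilty" cases (so 4 true positives and 0 false negatives, since all "guilty" were referred to secondary). The helper name `classification_rates` is hypothetical. The reported percentages appear to be truncated rather than rounded, so the printed values can differ in the last digit.

```python
def classification_rates(tp, fn, fp, tn):
    """Return accuracy, TPR, TNR, FPR, and FNR as fractions of 1."""
    positives = tp + fn   # actual imposters
    negatives = fp + tn   # actual innocents
    total = positives + negatives
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / positives,   # imposters correctly flagged
        "tnr": tn / negatives,   # innocents correctly cleared
        "fpr": fp / negatives,   # innocents wrongly flagged
        "fnr": fn / positives,   # imposters missed
    }


# Counts from the vocalic classification: 7 false positives, 27 true negatives.
# tp=4 and fn=0 are assumptions based on "four imposters, all referred".
rates = classification_rates(tp=4, fn=0, fp=7, tn=27)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

The same function can be applied to any other set of counts, for example after a later screening stage, as long as the true and false positives are counted against the same ground truth.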
Final Decision Model

Vocalic Resulting Classification
• 3 innocents falsely classified as terrorists
• One of these three was actually lying (actually a true positive)
• 31 correctly classified as innocent
• All “guilty” referred to secondary
• Overall accuracy = 94.47%
• TPR = 100%, TNR = 88.24%, FPR = 5.8% (reduced by 3/4), FNR = 0%

Questions?
Isn’t the voice amazing?
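
The pitch measure used throughout the studies rests on the periodicity described in the speech-sound slides: a voiced waveform whose peaks recur every 100th of a second has a pitch of 100 Hz. A minimal sketch of that relationship, assuming a synthetic sine tone in place of a real recording and a plain autocorrelation peak search. The function name `estimate_pitch_hz` and its `fmin`/`fmax` search bounds are illustrative choices, and this is not the algorithm used by Praat or by the custom software mentioned in the slides.

```python
import math


def estimate_pitch_hz(samples, sample_rate, fmin=75.0, fmax=500.0):
    """Estimate pitch as sample_rate / lag, where lag maximizes autocorrelation."""
    min_lag = int(sample_rate / fmax)   # shortest period considered
    max_lag = int(sample_rate / fmin)   # longest period considered
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]

    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(
            centered[i] * centered[i + lag]
            for i in range(len(centered) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


# Synthetic stand-in for a voiced sound: a 100 Hz tone, 0.1 s at 8 kHz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(800)]
pitch = estimate_pitch_hz(tone, sample_rate)
print(f"estimated pitch: {pitch:.1f} Hz")
```

Real speech is noisier than a pure tone, so a practical pitch tracker would also need voicing detection and windowing. The sketch only illustrates how a repeating period maps to a frequency in Hz.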