Single-stroke Language-Agnostic Keylogging
using Stereo-Microphones and
Domain Specific Machine Learning
Sashank Narain, Amirali Sanatinia, Guevara Noubir
College of Computer and Information Science
Northeastern University
• Side channel attacks escape the security model
– Academically pioneered by Paul Kocher (1996)
– Timing, power analysis, sound
• Global proliferation of mobile smartphones
– Estimated 1.75 billion smartphones in 2014
• Used for many day-to-day and business operations
• Trusted for sensitive information
– Personally Identifiable Information (PII)
– Credit Card numbers, Passwords, Location information
• Easy target of direct and indirect privacy breaches
• Problem & General Attack Scenario
• Android Sensors for Keystroke Inference
• Related Attacks
• Challenges in Keystroke Inference
• Our Approach
– Using Signal Processing, Designing a Meta-Algorithm
• Evaluation Results
• Mitigation Techniques
The Problem
• Sensors on smartphones bypass security mechanisms
– Accelerometer, Compass, Gyroscope
• Not sandboxed
• Do not require explicit permissions
• Indirectly leak sensitive information
– GPS, Camera & Microphones
• Require coarse explicit permissions but contain
generic descriptions
• Users may ignore permissions
• Directly leak sensitive information
• Can be accessed at any time
Attack Scenario
• Adversary lures victim to install Trojan app
– e.g., ‘To-do’ app that supports speech recognition
• App records sensor data when user types in Trojan app
– Builds training models from collected data
• On the phone / On a central server
• App invokes service that waits for sensitive activity to start
– e.g., Your Favorite Bank Login Page
• App records sensor data during the sensitive activity
– Generates predictions from sensitive data using training models
Motion Sensors in Android
• Easy to build apps using these APIs
• Java methods in Sensor class of Android SDK
– C++ functions in sensor.h header of Android NDK
• Fixed three dimensional co-ordinate system
– Relative to device
Android Co-ordinate System
Accelerometer
• Sensitive to minute motion such as keystrokes
• Measures Linear Acceleration + Gravity
• Or obtain sensor fusion data measuring Linear Acceleration
• Extremely sensitive to motion and very noisy
– High-pass filter removes gravity
– Low-pass filter removes noise
• Used for initial experiments, discarded later on
– Gyroscope more stable for Keystroke Inference
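The gravity-removal step can be sketched as follows. This is a minimal illustration rather than the authors' code: it uses an exponential moving average as the low-pass filter that tracks gravity and subtracts that estimate to get a high-pass output. The function name `remove_gravity`, the smoothing factor `alpha = 0.8`, and the sample readings are all assumptions.

```python
def remove_gravity(samples, alpha=0.8):
    """Return linear acceleration for a list of (x, y, z) readings in m/s^2.

    alpha is an assumed smoothing factor; values closer to 1 make the
    gravity estimate change more slowly.
    """
    gravity = list(samples[0])  # seed the gravity estimate with the first reading
    linear = []
    for reading in samples:
        # Low-pass step: the moving average follows the slowly varying gravity.
        gravity = [alpha * g + (1 - alpha) * r for g, r in zip(gravity, reading)]
        # High-pass output: whatever is left after subtracting gravity.
        linear.append(tuple(r - g for r, g in zip(reading, gravity)))
    return linear


# Hypothetical data: a phone lying flat, then one sample with a small jolt.
resting = [(0.0, 0.0, 9.81)] * 50
linear = remove_gravity(resting + [(0.0, 0.0, 12.0)])
rest_z = linear[49][2]   # close to zero while the phone is at rest
tap_z = linear[50][2]    # the jolt survives the gravity removal
```

A separate low-pass pass over the result would then be needed to suppress the high-frequency noise the slide mentions.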
Gyroscope
• Measures rate of rotation in radians / sec
• Good for inference
– Sensitive to motion but not very noisy
– Similar pattern for same keys and different for other keys on x/y axes
Similarity between two taps of Character ‘Q’ and two taps of Character ‘V’
Gyroscope (cont.)
• To compute rotation:
Incremental Angle of Rotation ≈ Rate of Rotation * Sampling Time (dT)
• Challenge:
Gyroscope Bias & Bias Drift requires correction
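A minimal sketch of the integration above, with a simple constant-bias correction. The bias is estimated here as the mean reading while the device is assumed stationary; this is one common approach and not necessarily the correction used in this work. The function names and numeric values are illustrative assumptions.

```python
def estimate_bias(stationary_rates):
    """Mean gyroscope reading (rad/s) while the device is assumed to be at rest."""
    return sum(stationary_rates) / len(stationary_rates)


def integrate_gyro(rates, dt, bias=0.0):
    """Accumulate the rotation angle (radians) from rate samples (rad/s).

    Each step adds (rate - bias) * dt, i.e. the incremental angle of rotation.
    """
    angle = 0.0
    angles = []
    for rate in rates:
        angle += (rate - bias) * dt
        angles.append(angle)
    return angles


# Hypothetical readings: a 0.02 rad/s bias on top of a true 1.0 rad/s rotation.
bias = estimate_bias([0.02] * 100)
raw = integrate_gyro([1.02] * 10, dt=0.01)
corrected = integrate_gyro([1.02] * 10, dt=0.01, bias=bias)
```

Without the correction the bias is integrated along with the signal, so the error grows with time; bias drift would additionally require re-estimating the bias periodically.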
Stereo-Microphones
• Microphone arrays commonplace in modern smartphones
– Used for audio enhancements, e.g., noise suppression
– HTC One series supports stereo recording
• Ideal for inference
– Keystrokes on a soft keyboard can be recorded by microphones
– Different amplitudes and time delay for unique keystrokes
– Fixed time delay at two microphones for same keys (8 samples for
‘Q’, 15 for ‘V’)
Sound waves for Character ‘Q’ and ‘V’ taps
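One way to measure the per-key delay between the two channels is brute-force cross-correlation, sketched below. This is an illustrative assumption about how such a delay could be estimated, not a description of the authors' pipeline; the function name `estimate_delay` and the synthetic pulse are hypothetical.

```python
def estimate_delay(left, right, max_lag):
    """Return the lag (in samples) at which `right` best matches `left`.

    A positive lag means the tap reached the right channel later.
    """
    best_lag, best_score = 0, float("-inf")
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]  # correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic example: the same pulse arrives 8 samples later on the right channel.
pulse = [0.0, 1.0, 3.0, 1.0, 0.0]
left = [0.0] * 64
right = [0.0] * 64
left[20:25] = pulse
right[28:33] = pulse
lag = estimate_delay(left, right, max_lag=19)
```

The search window of 19 samples matches the maximum delay quoted for the HTC One at 48 kHz.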
Stereo-Microphones (cont.)
• Delay in tap detection between two microphones (M1, M2)
Number of Samples =
(Distance(T, M1) – Distance(T, M2)) * Sampling Rate / Speed of Sound
• For the HTC One
Distance between microphones: 0.134 m
Maximum supported sampling rate: 48 kHz
Speed of sound in air: 340 m / s
Difference of +19 samples to -19 samples
• For future devices with higher sampling rate
– Example sampling rate: 192 kHz
– Difference of up to ±75 samples for a tap close to one microphone
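The delay formula can be checked numerically with the figures quoted on the slide. The sketch below assumes the extreme case where the tap lands right next to one microphone, so the path difference equals the full microphone spacing; the function name is hypothetical.

```python
SPEED_OF_SOUND = 340.0  # m/s, value used on the slide


def delay_in_samples(dist_to_m1, dist_to_m2, sampling_rate, speed_of_sound=SPEED_OF_SOUND):
    """Arrival-time difference between the two microphones, expressed in samples."""
    return (dist_to_m1 - dist_to_m2) * sampling_rate / speed_of_sound


mic_spacing = 0.134  # metres, HTC One figure from the slide

# Tap adjacent to M2: the distance to M1 is the full spacing, to M2 roughly zero.
max_48k = delay_in_samples(mic_spacing, 0.0, 48_000)
max_192k = delay_in_samples(mic_spacing, 0.0, 192_000)
```

Here `max_48k` comes out just under 19 samples and `max_192k` just under 76, which is the basis for the ±19 and roughly ±75 sample ranges. A higher sampling rate gives a finer delay resolution per key.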
Related Work (Attacks)
• First work by Cai & Chen 2011
Demonstrated feasibility of inference using the Orientation sensor
Developed Android application called ‘TouchLogger’
Accuracy tested on Number only keypad in Landscape mode
Successful inference accuracy of 70% on 3 data-sets
Related Work (cont.)
• Owusu et al. 2012
QWERTY in Landscape mode, Area Inference
Developed Android app called ‘ACCessory’
Data-sets on HTC ADR 6300 phone from 4 users
Successfully inferred 6 character passwords
• 6 passwords out of 99 in 4.5 trials
• Estimated 59 passwords out of 99 in 215 trials
• Xu, Bai & Zhu 2012
– Lock screen password and numbers during call
• E.g., Credit Card and PIN numbers
– Used two sensors, Accelerometer for tap detection &
Orientation for inference
– Developed Android app called ‘TapLogger’
– Data-sets on HTC Aria and Google Nexus (One)
phones from 3 users
– Achieved: 50% for 1 guess and high accuracy for top 3
Related Work (cont.)
• Aviv et al. 2012
• PIN numbers and pattern passwords inference
• Used the Accelerometer sensor for inference
• Data-sets on Nexus One, G2, Nexus S and Droid Incredible from 24
users in two settings
• Controlled (Seated) and Uncontrolled (Walking)
• Accuracy of 43% and 73% on PIN and pattern passwords respectively,
within 5 attempts
Related Work (cont.)
• Miluzzo et al. 2012
– QWERTY in Landscape mode and Icon in Portrait mode inference
– Used Accelerometer and Gyroscope sensor combined with
Ensemble learning
– Presented a framework called ‘TapPrints’
– Datasets on Google Nexus S, Samsung Galaxy Tab 10.1, iPhone 4
– Icon locations inferred with 79% and 65% accuracy for the iPhone
and Google Nexus S, resp.
– Characters inferred with 65% accuracy
– Some icons or characters inferred with accuracy
of up to 90% and 80%, respectively
Challenges in Keystroke Inference
• Gyroscope
– Noise
• Typing with trembling hands
• Typing in different environments e.g., inside a car
– Soft Touch
• User taps too soft to induce vibrations
– Gyroscope Drift and Bias
• Stereo-Microphones
– Noise
• Typing in an environment with a lot of background noise
• Typing in different environments with different noise levels
– Soft Touch
• User tap sounds don’t reach microphones
Our Approach
• Use a combination of sensors
– Accelerometer (initially) + Gyroscope + Stereo-Microphones
• Use signal processing and richer data instead of features
– Complementary filter combining Accelerometer and Gyroscope
and bandpass filter to remove Gyroscope drift and noise
– Bandpass filter [1.5 - 3.5 kHz] to reduce audio noise
Gyroscope Filtering
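A complementary filter blends the integrated gyroscope angle (accurate over short intervals, but drifting) with an accelerometer-derived angle (noisy, but drift-free). The sketch below is a generic single-axis version. The blend weight `alpha = 0.98`, the axis convention, and the test data are assumptions, not parameters from this work.

```python
import math


def complementary_filter(gyro_rates, accel_readings, dt, alpha=0.98):
    """Estimate rotation about the x axis in radians.

    gyro_rates: rate of rotation about x (rad/s), one value per sample.
    accel_readings: (ay, az) pairs in m/s^2 for the same samples.
    alpha: assumed weight given to the gyroscope path.
    """
    ay0, az0 = accel_readings[0]
    angle = math.atan2(ay0, az0)  # start from the accelerometer's view
    estimates = []
    for rate, (ay, az) in zip(gyro_rates, accel_readings):
        gyro_angle = angle + rate * dt   # short-term: integrate the gyroscope
        accel_angle = math.atan2(ay, az)  # long-term: tilt from gravity
        angle = alpha * gyro_angle + (1 - alpha) * accel_angle
        estimates.append(angle)
    return estimates


# Hypothetical stationary phone whose gyroscope reports a constant 0.05 rad/s bias.
n, dt = 1000, 0.01
estimates = complementary_filter([0.05] * n, [(0.0, 9.81)] * n, dt)
naive_drift = 0.05 * n * dt  # what plain integration would accumulate
```

In this example plain integration drifts to 0.5 rad, while the filtered estimate settles near 0.0245 rad, because the accelerometer term keeps pulling the angle back.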
Microphones Filtering
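A band-pass stage for the 1.5 - 3.5 kHz range can be sketched with a single second-order (biquad) section. The filter design, its order, and the 48 kHz sampling rate are assumptions made for illustration; the slides do not say which filter design was used.

```python
import math


def bandpass_coefficients(low_hz, high_hz, sample_rate_hz):
    """Normalised biquad band-pass coefficients with unity gain at the band centre."""
    center = math.sqrt(low_hz * high_hz)   # geometric centre of the band
    q = center / (high_hz - low_hz)        # quality factor from the bandwidth
    w0 = 2 * math.pi * center / sample_rate_hz
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2 * math.cos(w0) / a0, (1 - alpha) / a0)
    return b, a


def bandpass(signal, low_hz=1500.0, high_hz=3500.0, sample_rate_hz=48000.0):
    """Filter a list of samples with the biquad difference equation."""
    (b0, b1, b2), (a1, a2) = bandpass_coefficients(low_hz, high_hz, sample_rate_hz)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone(freq_hz, n=4800, sample_rate_hz=48000.0):
    """Unit-amplitude sine wave used as a test signal."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


# Compare a tone near the band centre with low-frequency hum (second half only,
# so the start-up transient is excluded).
in_band = rms(bandpass(tone(2291.0))[2400:])
hum = rms(bandpass(tone(100.0))[2400:])
```

The in-band tone passes almost unchanged while the 100 Hz hum is strongly attenuated. A steeper roll-off would need a higher-order design.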
Our Approach (cont.)
• Use a specialized multi-level Meta-Algorithm
Use several machine learning algorithms and combine results
Create training models for individual characters
Create training models for specific keyboard areas
Make predictions on areas, then on individual keys in area
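The two-level idea (predict the keyboard area first, then the key within that area, combining several classifiers by vote) can be sketched as below. The two tiny classifiers are stand-ins chosen to keep the example self-contained; they are not the Decision Tree, Naïve Bayes, and k-NN models named in the slides. The feature vectors, the area mapping, and all function names are hypothetical.

```python
from collections import Counter


def nearest_neighbor(train, sample):
    """1-NN: label of the closest training vector (squared Euclidean distance)."""
    closest = min(train, key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], sample)))
    return closest[1]


def nearest_centroid(train, sample):
    """Label whose mean training vector lies closest to the sample."""
    groups = {}
    for vec, label in train:
        groups.setdefault(label, []).append(vec)
    centroids = [([sum(col) / len(vecs) for col in zip(*vecs)], label)
                 for label, vecs in groups.items()]
    return nearest_neighbor(centroids, sample)


CLASSIFIERS = [nearest_neighbor, nearest_centroid]  # stand-in elementary algorithms


def vote(train, sample):
    """Majority vote across classifiers; ties go to the first label seen."""
    ballots = Counter(clf(train, sample) for clf in CLASSIFIERS)
    return ballots.most_common(1)[0][0]


def predict_key(train, area_of, sample):
    """Level 1 picks a keyboard area; level 2 picks a key inside that area."""
    area_train = [(vec, area_of[key]) for vec, key in train]
    area = vote(area_train, sample)
    key_train = [(vec, key) for vec, key in train if area_of[key] == area]
    return vote(key_train, sample)


# Toy features (made-up values): (microphone delay in samples, peak rotation).
train = [((8.0, 0.9), "q"), ((7.0, 0.8), "q"), ((6.0, 0.7), "w"), ((5.5, 0.75), "w"),
         ((-7.0, -0.8), "o"), ((-6.5, -0.7), "o"), ((-8.5, -0.9), "p"), ((-8.0, -0.85), "p")]
area_of = {"q": "left", "w": "left", "o": "right", "p": "right"}

first = predict_key(train, area_of, (7.6, 0.85))
second = predict_key(train, area_of, (-8.2, -0.9))
```

Restricting the second level to keys inside the predicted area reduces the number of classes each model has to separate.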
Area Division
Elementary Algorithms
• Machine learning algorithms
– Supervised classification
– Selected: Decision Trees (DT), Naïve Bayes (NB),
k-Nearest Neighbor (k-NN)
– Not selected: Hidden Markov Models, Support
Vector Machines, Random Forest, Neural Networks
The Meta-Algorithm
[Diagram labels: Area Selection, Individual Models, Area Models, Voting Models]
Comparison to Previous Work
• Use stereo-microphones for keystroke inference
• Combine sensor and acoustics for keystroke inference
• Use of richer processed sensor and audio data instead of
extracting features
• Use a multi-layer multi-algorithm approach based on the
specifics of Android keyboard
• Addresses smaller keyboard dimensions e.g., standard
QWERTY keyboard exceeding 90% prediction accuracy
• Demonstrating end to end attack feasibility
Evaluation System
• Hardware
– HTC One (Android 4.4), Samsung S2 & Tab 8
(Android 4.1)
– No modifications to OS
• Evaluation Application
– Collects datasets for training and evaluation
– Custom keyboard for training with same layout
as standard keyboard
– QWERTY & Numerical; Portrait & Landscape
• Datasets
– 7 participants
– 5 in office; 2 in restaurant (-2 unusable)
Evaluation Metrics
• Performance of Meta-Algorithm
– Of different sensors for different areas
– As compared to elementary use of algorithms
• End-to-end Attack
– For sensor data collected by Trojan app from sensitive apps
• Gyroscope results location dependent
– Areas further from gyroscope result in more rotation – Easy to Infer
• Microphones results typically location independent
– Infer mostly based on speed of sound
• The two could be combined to boost inference accuracy
– When both data are not noisy
Evaluation (cont.)
• Substantial increase in accuracy in comparison to
elementary use of algorithms
[Figures: accuracy of samples using elementary algorithms; accuracy of samples using the Meta-Algorithm]
Evaluation (cont.)
• Possible to achieve > 90%
for QWERTY keyboard
• Possible to achieve > 95%
for Number keyboard
• Some sample sets between
– Noise > 70dB
– Gyroscope Drift
• Soft Touch sets < 20%
Evaluation (End-to-End Attack)
• Collected on banking app with fake numbers
– Every UI page is known as an activity
– Trojan queries for the foreground activity every 5s
• 100 four-digit PINs
– 376 out of 400 digits predicted correctly (94%)
– 84 PINs predicted completely correctly
• 100 sixteen-digit credit card numbers
– 1467 out of 1600 digits predicted correctly (91.5%)
– 52 numbers predicted completely correctly
Mitigation Techniques
• Sensors bypass Android security model (Sandboxing and Permissions)
– Gyroscope sensor
• Is not sandboxed
• Does not require explicit permissions
– Microphones
• Require explicit permissions, but the permissions carry generic descriptions
– No dynamic control
• One Technique: Blocking
– Obtain lock on mutually exclusive sensors and hardware
– Invoke the Microphones or Camera to deny access to other apps
– FlaskDroid [Bugiel et al. 2013]
Mitigation Techniques (cont.)
• Alternative Technique: Limiting Access
– Blocking ineffective against Gyroscope sensor
• Gyroscope access is not mutually exclusive
– Observation: sampling rate affects Inference capability
– Solution: Reduce the sampling rate for background apps
to a low but acceptable level
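The rate-limiting idea can be sketched as a throttle that forwards only a fraction of the sensor samples to background apps. This is an illustration of the concept only: it is not an Android API, and the function name and rates are assumptions.

```python
def throttle(samples, source_rate_hz, max_rate_hz):
    """Keep at most max_rate_hz samples per second from a source_rate_hz stream."""
    if max_rate_hz >= source_rate_hz:
        return list(samples)  # nothing to drop
    step = source_rate_hz / max_rate_hz  # source samples per delivered sample
    kept, next_index = [], 0.0
    for i, sample in enumerate(samples):
        if i >= next_index:
            kept.append(sample)
            next_index += step
    return kept


# Hypothetical: one second of 100 Hz gyroscope data delivered at 10 Hz to a background app.
delivered = throttle(list(range(100)), source_rate_hz=100, max_rate_hz=10)
```

With fewer samples per keystroke, the brief rotation caused by a tap is harder to resolve, while slower movements remain visible to legitimate apps.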
• Stereo-microphones + gyroscope keylogging
predictions can exceed 90% accuracy
• Implications of mobile phone sensors on privacy
still not well understood
– Need for better privacy models in devices loaded with
side channels
• Mitigations at all layers of the stack
