slides - dimva 2013

Malevolution:
The Evolution of Evasive Malware
Giovanni Vigna
Department of Computer Science
University of California Santa Barbara
http://www.cs.ucsb.edu/~vigna
Lastline, Inc.
http://www.lastline.com
Well, I had it all planned out….
Until this guy came out with his story!
Malware can take many forms…
Who Is He?
• One of the top security researchers in Europe
– Hire him!
• Came to Berlin’s airport
• Guy told him he was in the right taxi line
• ‘Hey you don’t have a display with the money’
– Do not worry: The German government is creating a taxi-tracking program based on GPS so that no taxi driver needs a billing device: awesome!!!
– Nick: GPS?!? Tracking!?! No money!?! Awesome!!!!
• Scam cost Nick 200 Eur (normal charge would be 30)
The Taxi
Cyberattack (R)Evolution
[Figure: $$ damage (hundreds, thousands, hundreds of thousands, millions, billions) plotted against time, rising from cybervandalism to cybercrime to targeted attacks and cyberwarfare.]
Cyberattack (R)Evolution
Nobody Is Safe…
Targeted attacks are mainstream news.
Every week, new breaches are reported.
In the last few months alone …
Drive-by-download Attack
[Figure: drive-by-download attack. Hosts: www.bank.com, www.semilegit.com, www.grayhat.com, www.badware.com, www.evilbastard.com. Injected markup: <iframe src="http://semilegit.com" height="0" width="0"></iframe>. Injection request: POST /update?id=5','<iframe>..')-. Stolen: personal data, docs.]
Arms Race(s)
[Figure: two parallel arms races.
Binaries: malicious binary, then signature-based anti-virus, then obfuscated/polymorphic malicious binary, then behavior-based anti-malware (sandbox), then evasive malicious binary.
Web: malicious JavaScript, then signature-based web gateways, then obfuscated/polymorphic malicious JavaScript, then behavior-based anti-malware (honeyclient), then evasive malicious JavaScript.]
An Evasion Framework
[Figure: evasion framework. Components: producer, analysis system (labels/blocks), consumer (activates), target system (executes/displays). Data: artifact and provenance; known malicious artifacts and provenance; known benign artifacts and provenance.]
An Evasion Framework
                      Analysis System   Target System   Consumer
SPAM                  X                 N/A             N/A
Phishing              X                 N/A             X
Social Engineering    N/A               N/A             X
Malware Installs      N/A (*)           N/A             X
Malicious Documents   X                 X               X
Malicious Web Pages   X                 X               N/A
Malicious Binaries    X                 N/A             N/A
(*) First downloader
PBKAC: Make the user smarter
• Evasion of the user's good judgment
• (SPAM: please don’t go!)
• PHISHING: educate about provenance
• MALWARE INSTALLS: educate about Fake AV, codecs
– The “Can I haz kittens?” problem
• MALICIOUS DOC: don’t open (good luck with that)
– Anything with “budget”, “salary”, etc. WILL BE OPENED
Harden The Target
• Evasion of the mechanisms to limit/control execution
• Windows 2023 Ultimate Edition will be able to
identify things that just should not be executed
• MS Office Professional 56.2 will actually prevent
documents from executing arbitrary code
• Internet Explorer 23 will detect memory corruption
attacks
Analysis Systems
• Evasion of detection/labeling
• Determine if an artifact is malicious based on
previous history
• Leverage both static and dynamic analysis
• Additional information can be leveraged if other
components need to be evaded as well
Evading Static Analysis
• Static analysis techniques can be evaded by making
the (relevant) code unavailable
– Packing
– Delayed inclusion of code
• Static analysis techniques can be evaded by
exploiting differences in the parsing capabilities of
the target system vs. analysis system
– Parsing the executable (target is OS)
– Parsing the document (target is office application)
Evading Static Analysis
Source: Binary-Code Obfuscations in Prevalent Packer Tools, Tech Report, University of Wisconsin, 2012
Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by fingerprinting the environment (and then not executing)
– Detection of modified environment (instrumented libs)
– Detection of specific HW/SW configurations
• Devices
• Users
• File names
Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by
exploiting differences in the execution capabilities of
the target system vs. analysis system
– Semantics (virtualization/emulation introduces
differences)
– Speed (dynamic systems are usually slower)
– Available resources (analysis has a finite, limited time)
• Sleeping
• Stalling loops
– User activity monitoring
Evading Dynamic Analysis
• Dynamic evasion – stalling loops
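One way an analysis system might notice a stalling loop is to look for long stretches of executed instructions with no system call in between. The sketch below is only an illustration of that idea, not the technique used in the talk: the trace format (events with a `type` of "insn" or "syscall"), the function names, and the threshold are all assumptions.

```javascript
// Hypothetical trace format: [{ type: "insn" }, { type: "syscall", name: "..." }, ...]
// Returns the length of the longest run of instructions with no intervening system call.
function longestSilentRun(trace) {
  let longest = 0;
  let current = 0;
  for (const event of trace) {
    if (event.type === "syscall") {
      current = 0; // a system call ends the silent run
    } else {
      current += 1;
      if (current > longest) longest = current;
    }
  }
  return longest;
}

// Flags a trace whose longest silent run reaches the (arbitrary) threshold.
function looksLikeStalling(trace, threshold) {
  return longestSilentRun(trace) >= threshold;
}
```

A real system would have to pick the threshold empirically, since benign compute-heavy code also runs for a long time without system calls.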
Combating Evasion
• Static analysis
– Use availability and parsing failures as a signal for
detection
• Benign software is packed
• Benign software is obfuscated
• Artifacts are often generated in a benign, wrong way
– Modify the sample to make it harmless
• Normalize
• Remove functionality that cannot be analyzed
• Might break functionality
Combating Evasion
• Dynamic analysis
– Reduce differences between analysis and target
environment
• Run on bare metal
• Exploit hardware-supported virtualization
• Use out-of-the-VM instrumentation
– Detect environment checks
• Identify conditional execution based on triggers
• Return non-static information about the environment
– Modify the sample to make it run
• Multipath execution
Combating Evasion
• Exploit the characteristics of multiple evasions
– Phishing pages need to evade detection by the analysis system AND by the user
• If the page does not look like the impersonated organization, the attack will fail
– Malicious documents need to evade detection by the analysis system, the target platform, AND the user
• If the attachment does not look interesting, it will not be activated
Why Do I Care?
[Figure: analysis pipeline. Components: EvilSeed, crawler, Prophiler, feature extractor, terms extractor, honeyclients, Anubis, Wepawet, public portal, cloud, threat intel, block. Page flows: malicious pages, possibly malicious pages, benign pages. Site labels: exploit site, C&C site. Example URLs: http://www.easymoney.com, http://cheapfarma.ru, http://rateyourcar.com, http://nudecelebrities.it]
A Few Stats
• ANUBIS
– Number of unique IPs that submitted to Anubis: 433,290
– Number of files analyzed by Anubis: 59,199,463 (unique files: 45,730,419)
– Registered users: 25,404
• WEPAWET
– Number of unique IPs that submitted to Wepawet: 141,463
– Number of pages visited and analyzed by Wepawet: 67,424,459
– Number of pages identified as malicious: 2,239,335
An Example:
Detecting Split Personalities
• Detect when a malware sample exhibits multiple
personalities
• Signature-based techniques are impractical
• Behavior-based techniques seem more promising...
– Different behaviors are reliable indicators for split
personalities
The Idea
• Definition:
Two systems are execution-equivalent if all programs start with the same initial state and receive exactly the same inputs
– “Initial state” means the same OS components, with memory and registers initialized to the same values
– “Same inputs” means that accesses to disk, network, registry, time, and IPC return the same values
• Hypothesis:
When a program is executed on two execution-equivalent systems, it should exhibit the same behavior
– “Same behavior” means the same output and sequence of system calls
Split Personalities
• A program that has different behavior on two
execution-equivalent systems implies that:
– Some instruction yielded some observable effects
– The program used (intentionally or not) these effects to follow a different execution path
– This is likely the consequence of an attack based on CPU
semantics or timing
• The hard part is providing exactly the same inputs…
– Efficient Detection of Split Personalities in Malware
• Davide Balzarotti, Marco Cova, Christoph Karlberger, Christopher Kruegel, Engin Kirda, Giovanni Vigna, in Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2010.
The Approach: Log and Replay
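[Figure: the (malware) sample runs on a Windows reference system with a log driver, which records a syscall log; the same sample runs on a Windows analysis system with a replay driver fed from that log. Diverging behavior indicates a split personality.]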
Some Caveats
• Not everything can be replayed
– Some operations have results that must be consistent with
the internal state of the operating system
• Memory allocation
– Some operations use handles that were created by pass-through system calls
• The definition of “same behavior” needs to be
relaxed to tolerate small, temporary deviations
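The relaxed notion of “same behavior” can be illustrated by comparing the two system call sequences with an edit distance and tolerating a small normalized difference. This is a minimal sketch under stated assumptions: the plain Levenshtein distance, the `tolerance` parameter, and the function names are stand-ins chosen for illustration, not the comparison used in the NDSS paper.

```javascript
// Levenshtein edit distance between two arrays of system call names.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Reports a split personality when the normalized distance between the
// reference log and the analysis log exceeds the (assumed) tolerance.
function hasSplitPersonality(referenceLog, analysisLog, tolerance) {
  const longest = Math.max(referenceLog.length, analysisLog.length);
  if (longest === 0) return false;
  return editDistance(referenceLog, analysisLog) / longest > tolerance;
}
```

For example, `hasSplitPersonality(["open", "read", "close"], ["open", "read", "close"], 0.1)` is false, while a sample that exits early in the analysis system produces a much shorter log and is flagged.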
Results
An Example:
Wepawet and Revolver
• State-of-the-art in honeyclients
– High-interaction honeyclients visit web pages and record
modifications to the underlying system (file system,
registry, processes)
– Unexpected changes are attributed to attacks
• Limitations
– Defenders need to know in advance the components that
will be targeted by attacks
– Configuration can be complex and incomplete
• Some of the vulnerable components are incompatible with each
other
– Limited explanatory power
Wepawet
• Characterizes the behavior of the browser as it visits web
pages
– Monitors events that occur during visit
– Characterizes properties of these events with features
– Uses statistical models to determine if feature values are normal
or anomalous
• In the training phase, learns the characteristics of benign
pages
• In the detection phase, flags as suspicious pages that
result in anomalous behavior
– Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code
Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the World Wide Web Conference (WWW), Raleigh, NC, April 2010
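The train-on-benign, flag-the-anomalous idea can be sketched with a very simple statistical model. The code below is an assumption-laden illustration, not Wepawet's actual models: it learns a mean and standard deviation per feature and flags values that lie too many deviations away. The feature names (`bytesAllocated`, `dynamicEvals`) and the `maxDeviations` parameter are hypothetical.

```javascript
// Training phase: learn per-feature mean and standard deviation from benign pages.
function train(benignSamples) {
  const model = {};
  for (const name of Object.keys(benignSamples[0])) {
    const values = benignSamples.map((s) => s[name]);
    const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
    const variance =
      values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
    model[name] = { mean, std: Math.sqrt(variance) };
  }
  return model;
}

// Detection phase: return the features whose value lies more than
// maxDeviations standard deviations from the benign mean.
function anomalousFeatures(model, sample, maxDeviations) {
  return Object.keys(model).filter((name) => {
    const { mean, std } = model[name];
    const spread = std === 0 ? 1 : std; // constant features: fall back to unit spread
    return Math.abs(sample[name] - mean) / spread > maxDeviations;
  });
}
```

A page would be flagged as suspicious when `anomalousFeatures` returns a non-empty list, for instance because a heap spray allocates far more bytes than any benign page in the training set.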
Wepawet Features
• Exploit preparation
– Number of bytes allocated (heap spraying)
– Number of likely shellcode strings
• Exploit attempt
– Number of instantiated plugins and ActiveX controls
– Values of attributes and parameters in method calls
– Sequences of method calls
• Redirections and cloaking
– Number and target of redirections
– Browser personality- and history-based differences
• Obfuscation
– String definitions/uses
– Number of dynamic code executions
– Length of dynamically executed code
Wepawet Extensions
• PDF analyzer
– Analyzes the JavaScript within PDF documents
• Flash component analyzer
– Uses execution tracing to identify both malicious behavior
and other network endpoints
• Java Applet analyzer
– Uses execution tracing to identify known exploits
• Shellcode analyzer
– Uses emulation to extract URLs pointing to additional
malware
0-day Detection
• “Aurora” attack
• 0-day exploit against IE6
• Use-after-free vulnerability
• Successfully compromised Google and other companies
• Posted to Wepawet before having been made public
• Soon after incorporated into Metasploit
Practical Impact
• Routinely used for takedown requests and
further analysis
• Used to generate
blacklist of malicious
sites
Impact on Attackers
Revolver: Detecting Evasions
in Web-based Malware
• Providing an oracle available to the public has
drawbacks
– Malware can be tested before deployment
• Exploitation of discrepancies leads to failed detection
– Revolver: An Automated Approach to the Detection of Evasive Web-based Malware
A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna, in Proceedings of the USENIX Security Symposium, Washington, D.C., August 2013
Evasion: Scope Handling
// Original version
function foo() {
  ... // W6Kh6V5E4 is filled with non-alphanumeric data
  Bm2v5BSJE="";
  W6Kh6V5E4 = W6Kh6V5E4.replace(/\W/g,Bm2v5BSJE);
  ... // W6Kh6V5E4 now contains valid JavaScript
}
// Evasive version: the replace call runs through an alias of eval
// and refers to the local variable enryA
function foo(){
  ...
  var enryA = mxNEN+F7B07;
  F7B07 = eval;
  {}
  enryA = F7B07('enryA.rep' + 'lace(/\\W/g,CxFHg)');
  ...
}
Evasion: Interpreter Idioms
// Original version
OlhG='evil_code'
wTGB4=eval
wTGB4(OlhG)
// Evasive version
OlhG='evil_code'
wTGB4="this"["eval"] // Only works in Adobe’s JS
wTGB4(OlhG)
Evasion: Exception Paths
// Original version
function deobfuscate(){
  ... // Define variable xorkey
      // and compute its value
  for(...) {
    ... // XOR decryption with xorkey
  }
  eval(deobfuscated_string);
}
try {
  eval('deobfuscate();')
}
catch (e){
  alert('err');
}
// Evasive version
function deobfuscate(){
  try {
    ... // is variable xorkey defined?
  }
  catch(e){ xorkey=0; }
  ... // Compute value of xorkey
  VhplKO8 += 1; // throws exception first time
  for(...) {
    ... // XOR decryption with xorkey
  }
  eval(deobfuscated_string);
}
try { eval('deobfuscate();') } // 1st call
catch (e){
  // Variable VhplKO8 is not defined
  try {
    VhplKO8 = 0; // define variable
    eval('deobfuscate();'); // 2nd call
  }
  catch (e){ alert('err'); }
}
Evasion: Liberal Configuration
// Original version
var nop="%uyt9yt2yt9yt2";
var nop=(nop.replace(/yt/g,""));
var sc0="%ud5db%uc9c9%u87cd...";
var sc1="%"+"yutianu"+"ByutianD"+ ...;
var sc1=(sc1.replace(/yutian/g,""));
var sc2="%"+"u"+"54"+"FF"+
"%u"+"BE"+...+"A"+"8"+"E"+"E";
var sc2=(sc2.replace(/yutian/g,""));
var sc=unescape(nop+sc0+sc1+sc2);
// Evasive version: the payload is built only if creating the
// ActiveX control "yutian" throws an exception
try {
  new ActiveXObject("yutian");
} catch (e) {
  var nop="%uyt9yt2yt9yt2";
  var nop=(nop.replace(/yt/g,""));
  var sc0="%ud5db%uc9c9%u87cd...";
  var sc1="%"+"yutianu"+"ByutianD"+ ...;
  var sc1=(sc1.replace(/yutian/g,""));
  var sc2="%"+"u"+"54"+"FF"+
  "%u"+"BE"+...+"A"+"8"+"E"+"E";
  var sc2=(sc2.replace(/yutian/g,""));
  var sc=unescape(nop+sc0+sc1+sc2);
}
Detecting Evasion: Challenges
• Code is obfuscated
• Code is generated on-the-fly
• Code might probe for arcane versions of a browser
• Not all code changes are relevant
Revolver
[Figure: Revolver pipeline. Web pages are labeled by the oracle and turned into ASTs (e.g., IF … VAR <= NUM); similarity computation yields candidate pairs {b_i, m_j}, which are classified as malicious evolution, data-dependency, JavaScript infections, or evasions.]
Optimizations
• The comparison step requires determining the edit
distance between n benign scripts and m malicious
scripts (which is usually infeasible)
• We eliminate duplicate ASTs
• We compute sequence summaries, which are vectors
with the frequencies of the possible 88 operations
• We extract the k nearest neighbors sequence
summaries and we apply the similarity over the
associated ASTs
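The sequence-summary and nearest-neighbor steps can be sketched as follows. This is a minimal illustration, not Revolver's implementation: the six-entry `NODE_TYPES` vocabulary stands in for the 88 operations mentioned above, the AST shape (`{ type, children }`) is assumed, and Euclidean distance is an arbitrary choice of metric.

```javascript
// Hypothetical, tiny vocabulary; the talk mentions 88 possible operations.
const NODE_TYPES = ["IF", "VAR", "CALL", "ASSIGN", "TRY", "NUM"];

// Flatten an AST ({ type, children }) into a normalized frequency vector.
function sequenceSummary(ast) {
  const counts = new Array(NODE_TYPES.length).fill(0);
  let total = 0;
  const stack = [ast];
  while (stack.length > 0) {
    const node = stack.pop();
    const index = NODE_TYPES.indexOf(node.type);
    if (index >= 0) counts[index] += 1;
    total += 1;
    for (const child of node.children || []) stack.push(child);
  }
  return counts.map((c) => c / total);
}

// Euclidean distance between two summaries of equal length.
function distance(u, v) {
  return Math.sqrt(u.reduce((sum, x, i) => sum + (x - v[i]) ** 2, 0));
}

// Indices of the k summaries closest to the query summary.
function nearestNeighbors(query, summaries, k) {
  return summaries
    .map((summary, index) => ({ index, d: distance(query, summary) }))
    .sort((a, b) => a.d - b.d)
    .slice(0, k)
    .map((entry) => entry.index);
}
```

Only the ASTs behind the returned indices would then go through the expensive similarity computation, which is what makes the comparison tractable.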
Classification
• Data-dependency: categorizes script differences that are
associated with transforming data into code
– The same packer usually produces different code: if the generating code is the same and the generated code is very different, do not flag as evasion
• Injection: categorizes script differences that are due to
addition of code to a previously-benign script
– Site gets compromised and attacker adds code to well-known
JavaScript libraries (e.g., jQuery)
• Evasion: categorizes script differences that are mostly composed of control-flow nodes added to the previously malicious script
– Control-flow decisions are made to avoid executing the
malicious functionality
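The evasion rule above can be illustrated with a small check on the nodes that were added between the two scripts of a candidate pair. This sketch covers only that one rule and rests on assumptions: the `CONTROL_FLOW` set, the `minFraction` parameter, and the input format (a flat list of added node types) are hypothetical, not taken from Revolver.

```javascript
// Hypothetical set of control-flow node types.
const CONTROL_FLOW = new Set(["IF", "TRY", "CATCH", "WHILE", "FOR", "SWITCH", "RETURN"]);

// Fraction of the added nodes that are control-flow nodes.
function controlFlowFraction(addedNodeTypes) {
  if (addedNodeTypes.length === 0) return 0;
  const hits = addedNodeTypes.filter((t) => CONTROL_FLOW.has(t)).length;
  return hits / addedNodeTypes.length;
}

// A difference is treated as a likely evasion when it is mostly control flow.
function isLikelyEvasion(addedNodeTypes, minFraction) {
  return controlFlowFraction(addedNodeTypes) >= minFraction;
}
```

With a `minFraction` of 0.5, adding `["TRY", "CATCH", "CALL"]` around an existing payload would count as a likely evasion, while adding mostly calls and assignments would not.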
Evaluation: Evasion
• Collected 6,468,623 pages, of which 265,692 malicious
• Extracted 20,732,766 benign scripts, and 186,032
malicious scripts
• Derived 705,472 unique ASTs and 55,701 malicious ASTs
• For each benign AST, found ~70 malicious neighbors
• Computed 208K candidate pairs
– 6,996 Injections (701 classes)
– 101,039 Data dependencies (475 classes)
– 4,147 Evasions (155 classes)
– 2,490 Evolutions (273 classes)
Limitations
• If we only see the evasive version of the code, we
cannot detect it (and identify the evasion)
• This approach can only operate on client-side
evasion
• If an evasion is performed before unpacking/eval-ing of code, similarity to other malicious code cannot be computed
– However, the attacker has to “expose” their evasion
technique, instead of hiding it in the malicious code
http://revolver.cs.ucsb.edu
• Revolver is a service accessible to the public
– You need to be vetted to access the service
• We would like to make the evasion of the anti-evasion system
harder
• Please sign up and let us know what you think!
Conclusions
• Malicious code is in continuous evolution
• Evasion of dynamic analysis-based detection has
become prevalent
– Humans cannot keep up
• Next steps in the arms race:
– Automatic detection of evasion attempts in binaries
• Possibly without re-execution
– Automatic detection of evasion attempts in web-malware
• See revolver.cs.ucsb.edu
– Automated evasion remediation
Questions?
EvilSeed
• Challenge: Find the needle in the haystack
• Approach: Search the web in a smart way
• The goal of EvilSeed is to generate a URL input stream
with “high toxicity”
• EvilSeed starts with a set of malicious web pages and uses
“gadgets” to find likely additional malicious web pages
– Links gadget
– Content dork gadget
– Popular terms gadget
– SEO gadget
– DNS queries gadget
• Some level of random crawling is still necessary to find
completely new malicious web pages
Prophiler
• Quick identification of possible drive-by-download
web pages
– Each web page is deemed benign or possibly malicious
– Detection models derived through supervised machine learning
• System as filter between a crawler and a more costly
(and more precise) dynamic analysis system
– The filter can allow high FP rates, as they are later
discarded by the dynamic analysis system
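The filter arrangement can be sketched as a two-stage pipeline in which the cheap check runs on every page and only flagged pages reach the costly analysis. Everything here is an assumption made for illustration: `cheapFilter` and `costlyAnalyzer` are caller-supplied stand-ins for the fast static filter and the dynamic analysis system, not real APIs.

```javascript
// Run the cheap static check on every page; only pages it flags
// are handed to the costly (and more precise) analyzer.
function analyzeWithPrefilter(pages, cheapFilter, costlyAnalyzer) {
  const confirmed = [];
  let costlyRuns = 0;
  for (const page of pages) {
    if (!cheapFilter(page)) continue; // deemed benign, dropped early
    costlyRuns += 1;
    if (costlyAnalyzer(page)) confirmed.push(page); // false positives are discarded here
  }
  return { confirmed, costlyRuns };
}
```

False positives from the first stage only cost extra runs of the second stage; false negatives are lost for good, which is why the filter can afford a high FP rate.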
Learning Approach
• 77 static features are extracted from each URL and web
page
– HTML (19): web page content
– JavaScript (25): web page code
– URL and host-based (33): URL and URLs included in the content,
taking into account host characteristics (WHOIS, DNS)
• Supervised machine learning
– Learning: the system is fed with a labeled dataset
• Both known malicious and benign samples
– A model is generated by the system
– 10-fold cross validation is used to evaluate the effectiveness of
each model
– The models can then be used for detection
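The 10-fold cross-validation step can be sketched generically. The code below is an illustration under assumptions: `evaluate` is a caller-supplied function that trains on one index set and returns a score on the other, and samples are assigned to folds round-robin without shuffling, which a real evaluation would normally add.

```javascript
// Split sample indices 0..n-1 into k folds; each fold serves once as the test set.
function kFoldSplits(n, k) {
  const splits = [];
  for (let fold = 0; fold < k; fold++) {
    const trainIdx = [];
    const testIdx = [];
    for (let i = 0; i < n; i++) {
      (i % k === fold ? testIdx : trainIdx).push(i);
    }
    splits.push({ train: trainIdx, test: testIdx });
  }
  return splits;
}

// Average a caller-supplied evaluate(train, test) score over all k folds.
function crossValidate(n, k, evaluate) {
  const splits = kFoldSplits(n, k);
  const total = splits.reduce((sum, s) => sum + evaluate(s.train, s.test), 0);
  return total / k;
}
```

With `k` set to 10, every labeled sample is used for testing exactly once and for training nine times.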
Anubis and Wepawet
• Web pages and binary components need to be
analyzed
– To identify their nature (malicious, benign)
– To identify their relationships with other components (e.g.,
C&C sites, distribution sites, malware components)
• Anubis: Binary program analyzer
– Available at http://anubis.cs.ucsb.edu
• Wepawet: Web page analyzer
– Available at http://wepawet.cs.ucsb.edu
