slides - Steven Gianvecchio

Report
Zhenyu Wu, Steven Gianvecchio, Mengjun Xie
Advisor: Dr. Haining Wang
The College of
WILLIAM k MARY

Internet & Ubiquitous Computing
◦ Billions of networked computers
◦ Playground for malware

Suppression Techniques
◦ Static analysis
 Low latency, high throughput
 Widely used, IDS deployable
◦ Dynamic analysis
The College of
WILLIAM k MARY
2

Un-obfuscated

◦ Binary in plain

Oligomorphism
◦ Segments of the binary

◦ Simple transformation (XOR)

Polymorphism
Metamorphism

State of the Art
◦ Control-flow encryption
◦ Byte frequency manipulation
Statistical analysis
◦ Anomalies in code body

◦ Meta transformation (P-code)

Algorithmic detection
◦ Build in transformations
◦ Compression and encryption

Unique substring
Advanced pattern matching
◦ N-gram signatures

Semantic analysis
◦ Persist high-level fingerprints
The College of
WILLIAM k MARY
3
WANTED
$5,000,000
The College of
WILLIAM k MARY
4
The College of
WILLIAM k MARY
5

Polymorphism
◦ Compression & Encryption
Nobody looks like a small dark box!
The College of
WILLIAM k MARY
6

Metamorphism
◦ Reordering Components
Wanted
$5,000,000
Cannot evade feature detections
The College of
WILLIAM k MARY
7

Control Flow Encryption
◦ Prevent feature analysis
Increases suspicion
The College of
WILLIAM k MARY
8

The Real Player
◦ Assume other people’s identity (Mimicry)
The College of
WILLIAM k MARY
9

Lessons Learned:
◦ Evasion without obfuscating features
◦ Evasion by refusing inspection
◦ Evasion by mimicking
 Obfuscating original features
 Open to inspection, but disguises detection
The College of
WILLIAM k MARY
10

Mimimorphism:
◦ Reversible transformation of an executable that produces
output statically resembles other benign programs
◦ Characteristics:
 Completely erases features from the original binary
 High order statistics matches benign executables
 Transformed payload consists of “meaningful” control flows,
highly resemble those from benign executables
The College of
WILLIAM k MARY
11

Text Stenography Technique
◦ Transforms the input data and produces mimicry output
copies that assume statistical and grammatical (structural)
properties of another type of data
◦ Originally proposed by Peter Wayner as means to
transport sensitive data under harsh surveillance
 Novel use of Huffman coding
The College of
WILLIAM k MARY
12

Huffman Coding
0
◦ Digesting
 Builds a Huffman tree according to
the symbol frequency
◦ Encoding
 Removes redundancies of the input
data using a given Huffman tree
◦ Decoding
 Recovers the original data from the
“condensed” data by emitting
symbols according to the original
Huffman tree
0
m
1
s
1
a
Huffman Tree
01  s
00  m
01  a
mass  000111
(32 bits)
(6 bits)
The College of
WILLIAM k MARY
13

What if we decode a piece of random data?
◦ Produces “meaningless” data, but
 The output exhibits similar symbol frequency to the digest
- and  Input data can be recovered by Huffman encode

Regular Mimic Function
◦ Learn: Build a Huffman tree from sample text
◦ Mimicry: Huffman decode on input (randomized)
◦ Recover: Huffman encode
The College of
WILLIAM k MARY
14

Insufficiencies
chi
◦ Produces illegible, garbled text
◦ Frequency distributions follow 2n
distribution

High-order Mimic Function
0
0
l
1
c
1
0
n
◦ Captures interdependencies
0
p
1
t
ins
 Build multiple Huffman trees
 One for each unique symbol prefix
◦ Produces “sensible” text with
much more “natural” symbol
frequency distributions
rou
0
n
1
g
1
t
Huffman “Forest”
The College of
WILLIAM k MARY
15

Mimicry of Peter Wayner’s paper
◦ Produced by 6th order mimic function
Each of these historical reason, I don’t recommend using gA(t) to
choose the safe. These one-to-one encoded with n leaves and
punctuation. The starting every intended to find the same order
mimic files. A Method is to break the trees by constructing the
mimics the path down the most even though, offer no way that is,
in this paper. Figure will not overflow memory. These produced
by truncating letter. This need to handle n-th ordered
compartment of nonsense words cannot bear any resemblance
to B because this task is a Huffman showed in [1], [2], [3] among
others.
The College of
WILLIAM k MARY
16

The Challenge: Machine Language Mimicking
◦ Consists of instructions and control flows
 Each instruction has a strict format to follow
 Machines never make “typo”, or use wrong “tense”!
◦ Mimic function has no knowledge of instructions
 Often makes mistakes generating instructions
 Have a low success rate of creating mimicry control flows

Our Solution
◦ Integrate a custom assembler / disassembler
◦ Help the mimic function understand the language
The College of
WILLIAM k MARY
17

Digesting
Mimicry
Target
Exec.
Binaries
XOR
Disassemble
PUSH
High Order
Instruction
Mimic Function
DEC
MOV
Control
Flows
Instruction
Huffman Forest
Mimicry Digest
The College of
WILLIAM k MARY
18

Digesting
Instruction
Prefix
MOV
XOR
PUSH
DEC
XOR
Exec.
PUSH
Binary
DEC
MOV
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
0
1
ModR/M
(Mod / Reg. / R/M)
SIB
(Scale / Idx. / Base)
0
MOV
1
INC
PUSH
Displacement
Instruction Huffman Tree
COMMON_INST Structure
The College of
WILLIAM k MARY
19

Digesting
MOV
Instruction Encoding
Instruction
TemplatePrefix
MOV
XOR
PUSH
DEC
XOR
PUSH
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
0
DEC
MOV
1
ModR/M
(Mod / Reg. / R/M)
SIB
(Scale / Idx. / Base)
0
MOV
1
INC
PUSH
Displacement
Instruction Huffman Tree
COMMON_INST Structure
The College of
WILLIAM k MARY
20

Digesting
Instruction Encoding
Template
MOV
MOV
0
1
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
16bit
ModR/M
(Mod / Reg. / R/M)
Inst. Prefix
SIB
(Scale / Idx. / Base)
Displacement
REP
0
1
EAX
0
ECX
1
EDX
ModR/M
0
1
2x8+16
SIB
3x4+0
……
Displacement
The College of
WILLIAM k MARY
21

Digesting
Instruction Encoding
Instruction
TemplatePrefix
MOV
MOV
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
0
16bit
XOR
PUSH
0 1
DEC
1
REP
EAX
0
ModR/M
(Mod / Reg. / R/M)
SIB
(Scale / Idx. / Base)
Inst. Prefix
ECX
0
0
1
1
EDX
INC
ModR/M
1
1
MOV
Displacement
0
2x8+16
SIB
PUSH
3x4+0
……
Instruction Huffman Tree
Displacement
The College of
WILLIAM k MARY
22

Digesting
MOV
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
Instruction
Prefix
Instruction
Prefix
XOR
PUSH
DEC
XOR
PUSH
DEC
ModR/M
(Mod / Reg. / R/M)
SIB
(Scale / Idx. / Base)
0
MOV
0
MOV
1
1
INC
PUSH
Displacement
Instruction Huffman Tree
The College of
WILLIAM k MARY
23

Digesting
Instruction
Prefix
DEC
PUSH
DEC
MOV
0
CMP
XCHG
0MOV
1
0
DEC
1
0
MOV
PUSH
POP
XOR
PUSH
DEC
JMP
1
CALL
MOV
0
MOV
1
1
INC
PUSH
Mimimorphic Digest
The College of
WILLIAM k MARY
24

Encoding
PRNG
Binary
Data
High Order
Instruction
Mimic Function
Mimicry
Digest
Assemble
Mimicry
Binaries
The College of
WILLIAM k MARY
25

Encoding
XOR
PUSH
DEC
01001001
Binary
10010101
Data
00010100
10001001
0
0
XOR
MOV
PUSH
1
Mimicry
Digest
1
INC
PUSH
DEC
Instruction Huffman Tree
Instruction
Prefix
The College of
WILLIAM k MARY
26

Encoding
01001001
Binary
10010101
Data
00010100
10001001
Instruction Encoding
Template
MOV
XOR
PUSH
0 1
DEC
16bit
0
REP
1
Inst. Prefix
0
INC
1
0
MOV
2x8+16
0
1
EAX
0
ECX
1
EDX
ModR/M
1
PUSH
3x4+0
……
Instruction Huffman Tree
SIB
Displacement
The College of
WILLIAM k MARY
27

Encoding
Instruction Encoding
Template
MOV
0
01001001
10010101
00010100
10001001
16bit
1
REP
Inst. Prefix
0
1
EAX
0
ECX
1
EDX
ModR/M
0
1
2x8+16
SIB
3x4+0
……
Displacement
The College of
WILLIAM k MARY
28

Encoding
MOV
Instruction Encoding
Template
MOV
01001001
10010101
00010100
10001001
0 1
Inst. Prefixes
(Atomic op., repeat,
operand
size, REP
etc.)
16bit
ModR/M
(Mod /Inst.
Reg.Prefix
/ R/M)
SIB
0 / 1Base)
(Scale / Idx.
Displacement
3x4+0
2x8+16
SIBStructure
COMMON_INST
0
1
EAX
0
ECX
1
EDX
ModR/M
……
Displacement
The College of
WILLIAM k MARY
29

Encoding
MOV
01001001
10010101
00010100
10001001
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
XOR
PUSH
ModR/M
(Mod / Reg. / R/M)
SIB
(Scale / Idx. / Base)
DEC
MOV
?
Displacement
COMMON_INST Structure
The College of
WILLIAM k MARY
30

Encoding
Instruction
Prefix
01001001
10010101
00010100
10001001
XOR
XOR
PUSH
DEC
PUSH
MOV
DEC
MOV
The College of
WILLIAM k MARY
31

Decoding
Mimicry
Binaries
Disassemble
High Order
Instruction
Mimic Function
Mimicry
Digest
Binary
Data
PRNG
The College of
WILLIAM k MARY
32

Training
◦ Select 100 Windows XP system files as mimicry target
 They represent typical legitimate binaries
◦ Trained using 7th and 8th order mimimorphic engines
 Most control flow basic blocks have 7-8 instructions

Evaluations
◦ Statistical Anomaly Tests
 Kolmogorov-Smirnov Test & Entropy Test
◦ Semantic Detection Test
 Control Flow Fingerprinting
The College of
WILLIAM k MARY
33

Statistical Tests
0.09
◦ Kolmogorov-Smirnov Test
 Maximum byte frequency
distribution differences
 Legitimate: 0.074±0.045;
Mimimorphic: 0.093±0.006
◦ Entropy Test
0.074
0.516
 Measurement of predictability
(or randomness) of data
 Legitimate: 6.353±0.258;
Mimimorphic: 6.528±0.021
6.353
The College of
WILLIAM k MARY
34

Semantic Tests
◦ Control Flow Fingerprinting
 Statically analyze executables (with a special disassembler)
and extract control flow patterns
 Detecting malwares by matching their characteristic control
flow patterns (i.e., shared fingerprints)
◦ Between original binary and Mimimorphic instances
 Shared fingerprints: the lower the better
 Only 1 out of 100 instances share a single fingerprint (out of
hundreds of thousands fingerprints)
The College of
WILLIAM k MARY
35

Semantic Tests
◦ Between mimimorphic and legitimate binaries
 Shared fingerprints: the higher the better
 7th order mimimorphic instances:
 Average 1856.46±372.5 (72.93 benign files)
 Minimum 1057 (44 files); Maximum 3321 (92 files)
 8th order mimimorphic instances:
 Average 11407.99±912.42 (81.37 benign files)
 Minimum 9606 (70 files); Maximum 14216 (91 files)
The College of
WILLIAM k MARY
36

Semantic Tests
◦ A sample mimicry control
flow pattern
 Reproduced by a 7th order
mimimorphic instance
The College of
WILLIAM k MARY
37

Application Constraint
◦ Memory consumption: 600MB for 7th order and 1.2GB for
8th order mimimorphic transformation
 Disk-based on-demand digest storage
◦ Size increase: 20x inflation for 7th order and 30x for 8th
order mimimorphic transformation
 Typical malware are less than 100KB
 Mimimorphism results in 2~3MB files
The College of
WILLIAM k MARY
38

We propose mimimorphism as a novel binary
obfuscation technique
◦ Enhanced high order mimic functions with custom
assembler / disassembler
◦ Achieves evasion by disguising, not refusing detection
◦ Effective against both statistical anomaly detection as
well as semantic fingerprinting tests
The College of
WILLIAM k MARY
39

Robustness against other approaches
◦ Automatic n-gram detections
 Typical x86 instruction length: 2.1~2.8
 8th order mimimorphism can approach 16-gram mimicry
 Existing n-gram detection algorithms can hardly scale up to
◦ Static semantic analysis
 Mimimorphism does not target specific detection techniques
 Focuses on reproducing features from benign programs
 Immune to lower order signature detections
The College of
WILLIAM k MARY
40

Robustness against other approaches
◦ Deep syntactic analysis
 Fails to exactly reproduce high level syntactic features:
 45% “functions” do not have matching prologue and epilogue
 Many jump instructions go across function boundaries
 Detectable program-level anomalies
 Not all programs follow conventions
 Could lead to false positives
The College of
WILLIAM k MARY
41
The College of
WILLIAM k MARY

The Problem of the Unpacker
◦ Mimimorphic transformation does not provide solution for
hiding the unpacker
◦ However, we believe unpackers do benefit from using
mimimorphism
 Unpacker is the weakness of polymorphism because it is
easy to be “spotted” – all other payload is not executable!
 All mimimorphic payload is “executable”, separating
unpacker code from the payload becomes non-trivial
The College of
WILLIAM k MARY
43

Decoding
Mimicry
Binaries
Disassemble
High Order
Instruction
Mimic Function
Mimicry
Digest
Binary
Data
PRNG
The College of
WILLIAM k MARY
44

Decoding
Instruction
Prefix
MOV
XOR
PUSH
DEC
XOR
00Mimicry
PUSH
Binary
DEC
Decoded
MOV Bits
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
0
1
ModR/M
(Mod / Reg. / R/M)
SIB
(Scale / Idx. / Base)
0
MOV
1
INC
PUSH
Displacement
Instruction Huffman Tree
COMMON_INST Structure
The College of
WILLIAM k MARY
45

Decoding
Decoded Bits
Instruction
Prefix
MOV
MOV
00
Inst. Prefixes
0
(Atomic op., repeat,
operand size, etc.)
16bit
XOR
PUSH
0DEC
1
1
REP
EAX 0 1
0
1
ModR/M
(Mod / Reg. / R/M) Inst. Prefix
Decoded Bits
SIB
(Scale / Idx. / Base)
ECX
0
0
EDX
INC
ModR/M
1
1
Displacement
2x8+16
SIB
COMMON_INST Structure
MOV
PUSH
3x4+0
……
Instruction Huffman Tree
Displacement
The College of
WILLIAM k MARY
46

Decoding
0101
Decoded Bits
MOV
MOV
Inst. Prefixes
(Atomic op., repeat,
operand size, etc.)
ModR/M
(Mod / Reg. / R/M)
SIB
(Scale / Idx. / Base)
Displacement
0
16bit
1
REP
Inst. Prefix
0
1
EAX
0
ECX
1
EDX
ModR/M
0
1
2x8+16
SIB
3x4+0
……
Displacement
The College of
WILLIAM k MARY
47

Decoding
0101
Decoded Bits
MOV
MOV
Inst. Prefixes
(Atomic op., repeat,
00 operand size, etc.)
ModR/M
(Mod / Reg. / R/M)
0
16bit
XOR
PUSH
0DEC
1
1
REP
EAX 0 1
0
1
Inst. Prefix
ECX
0
Decoded BitsSIB
(Scale / Idx. / Base)
Displacement
0
EDX
INC
ModR/M
1
1
MOV
2x8+16
SIB
PUSH
3x4+0
……
Instruction Huffman Tree
Displacement
The College of
WILLIAM k MARY
48

Decoding
Instruction
Prefix
Instruction
Prefix
XOR
PUSH
DEC
XOR
01001001
00
0101
10010101
PUSH
DEC
0
MOV
Decoded Bits
0
MOV
1
1
INC
PUSH
Instruction Huffman Tree
The College of
WILLIAM k MARY
49

similar documents