ppt - DEEPNESS Lab

Report
Automated Signature Extraction for High
Volume Attacks
Yehuda Afek
Anat Bremler-Barr
Shir Landau Feibish
This work is part of the Kabarnit–Cyber Consortium (2012-2014) under Magnet program, funded by the chief
scientist in the Israeli ministry of Industry, Trade and Labor.
This research was also partly supported by European Research Council (ERC) Starting Grant no. 259085.
Current DDoS Attack
Zombies
on
innocent
computers
Infrastructurelevel DDoS attacks
Bandwidth-level
DDoS attacks
2
Server-level DDoS
attacks
High volume attacks - Current Defense
Many different types of attackers:
Defense
Line1
Remaining attacks:
 Botnets (millions of computers)
 Hard to identify behaviorally,
under the radar screen
 Zero-day – no known signatures
Defense
Line 2
access
SYN cookies,
control list Challengefiltering
response
3
Defense
Line 3
behavioral
analysis
…
Defense
Line n
Call for
HELP!!
Signature based DDoS Attack Detection
Unknown (zero-day) attacks:


Some hope: Attack tools usually leave some unique footprint
(repeating pattern)

Example in packet:
Connection: KEEP-ALIVE

Today: Find signatures manually (human eye)

Our goal: Find it automatically
Signatures used by anti-DDoS devices and firewalls to stop
attack


4
Mitigation in minutes, good enough for these types of attacks
Signatures also used in

NIDS/IPS (Snort, Bro, etc.)

Worm detection (automated extraction)

Previous work:




Worm behavior (address dispersion, suspicious code, etc.)
Fixed-length signatures
Non-scalable
Notable works:





5
Kephart et al ‘94
Honeycomb [Kreibich et al ’04]
Earlybird [Singh et al ‘04]
Autograph[Kim et al ’04]
Hancock[Griffin et al ’09]
System Overview
Peace time traffic sample
Attack time traffic sample
Signature
Extraction
Attack signatures
e.g. Connection: KEEP-ALIVE
Our Challenge: Automatically find signatures that appear
frequently only during attack
Where:
Input collection:


6
In mitigation box (DDoS Guard/firewall/anti-DDoS etc.)
In the cloud – collect data from several collectors.
Signature Extraction - High Level
Signature Extraction
Peace time
traffic sample
Attack time
traffic sample
7
Find frequent
strings in peace
time traffic
Find frequent
strings in attack
time traffic
Take only
strings
found in
attack and
not in
peace
Attack signatures
e.g. Connection: KEEP-ALIVE
Our Goal
Automatically find signatures that appear frequently only
during attack
Requirements:
Find minimal set of signatures
1.

Allow signatures of varying lengths
Don’t include signatures found in legitimate traffic
2.
3.

Minimum false positives
Minimize space and time usage
4.


8
Some filtering devices have limited capacity
Large amounts of data
Quick response
Finding Frequent Strings in Traffic


Input: Sequence of packets
Output: Strings that appear frequently in packets
Peace time
traffic sample
Attack time
traffic sample

9
Find frequent
strings in attack
time traffic
Take only
strings
found in
attack and
not in
peace
Attack signatures
e.g. Connection: KEEP-ALIVE
Common Stringology solution: use suffix trees/arrays


Find frequent
strings in peace
time traffic
too much space
Our solution uses heavy hitters
Heavy Hitters (Frequent Items)

Input: N values, integer v
Output: v values each appearing at least N/v times

Approximate solution:




Uses O(v) space!
One pass over input!
Known counter based HH Algorithms:



10
Misra & Gries 1982
Lossy Counting – Monku and Motwani 2002
Space saving - Metwally et al 2005 – currently using
Space saving Heavy Hitters
[Metwally et al 2005]

Algorithm:

Maintain v values, and their counters.
Input
10
22
value
counter
30
10
1
10
22
1
35
30
1
50
11
Space saving Heavy Hitters
[Metwally et al 2005]

Algorithm:


Maintain v values, and their counters.
If next value x is one of the v, increment its counter.
Input
10
22
value
counter
30
10
2
10
22
1
35
30
1
50
12
Space saving Heavy Hitters
[Metwally et al 2005]

Algorithm:



Maintain v values, and their counters.
If next value x is one of the v, increment its counter.
Else take item with minimal counter c:


Replace value with x
New counter is c+1
Input
10

Error rate: N/v
22
value
counter
30
10
2
10
35
2
35
30
1
50
13
Our Solution



Heavy hitters usually done on numbers… how do we use it for
text?
k-grams: strings of length exactly k
Trivial idea: For each packet:



Take all k-grams (sliding window)
Do Heavy hitters on them
b1=abca
b2 = bcab
k-grams
b3 = cabc
Fixed length not good enough
Either too short: cuts up longer signatures



14
abcabcadefgfsdghjghnfdghfgsdhfjs
Substring pollution - Too many heavy hitters for one signature
Or too long : noisy signatures
Our Solution: Double Heavy Hitters

Double Heavy Hitters algorithm: two separate instances of
heavy hitters


Heavy Hitters 1: Find heavy hitters of k-grams
Heavy Hitters 2: Find heavy hitters of varying-length strings created
during run of Heavy Hitters 1
Input to Heavy
Hitters 1: k-grams
….
k
k
k
Heavy Hitters
1
k
k
15
k
k
Input to Heavy
Hitters 2: strings
string
string
Heavy Hitters
2
string
string
k
string
Output is
output of
Heavy
Hitters 2
Double Heavy Hitters Algorithm


While processing k-grams in Heavy Hitters1
Find max run of k-grams:




Already in Heavy Hitters 1
Counters of consecutive k-grams maintain predefined ratio
Create string
Insert into Heavy Hitters 2
k-grams:
abca
Is already
in Heavy
Hitters 1?
N
bcab
cabc
abca
bcab
cabc
abcd
N
N
Y
Y
Y
N
abcabc
Check ratio
16
bcda
N
cdab
dabc
abca
N
N
Y
abca
Double Heavy Hitters Algorithm

Example:
bi
K-gram
b1
abca
b2
bcab
Heavy Hitters 1
Heavy Hitters 2
b3
cabc
K-gram
counter
string
counter
b4
abca
abca
1
NULL
0
b5
bcab
bcab
1
NULL
0
b6
cabc
cabc
1
NULL
0
b7
abcd
17
Input:
abcabcabcd
Double Heavy Hitters Algorithm

Example:
bi
K-gram
b1
abca
b2
bcab
Heavy Hitters 1
Heavy Hitters 2
b3
cabc
K-gram
counter
string
counter
b4
abca
abca
2
NULL
0
b5
bcab
bcab
1
NULL
0
b6
cabc
cabc
1
NULL
0
b7
abcd
18
Input:
abcabcabcd
String =
abca
Double Heavy Hitters Algorithm

Example:
bi
K-gram
b1
abca
b2
bcab
Heavy Hitters 1
Heavy Hitters 2
b3
cabc
K-gram
counter
string
counter
b4
abca
abca
2
NULL
0
b5
bcab
bcab
2
NULL
0
b6
cabc
cabc
1
NULL
0
b7
abcd
19
Input:
abcabcabcd
String =
abcab
Double Heavy Hitters Algorithm

Example:
bi
K-gram
b1
abca
b2
bcab
Heavy Hitters 1
Heavy Hitters 2
b3
cabc
K-gram
counter
string
counter
b4
abca
abca
2
NULL
0
b5
bcab
bcab
2
NULL
0
b6
cabc
cabc
2
NULL
0
b7
abcd
20
Input:
abcabcabcd
String =
abcabc
Double Heavy Hitters Algorithm

Example:
bi
K-gram
b1
abca
b2
bcab
Heavy Hitters 1
Heavy Hitters 2
b3
cabc
K-gram
counter
string
b4
abca
abcd
3
abcabc 1
b5
bcab
bcab
2
NULL
0
b6
cabc
cabc
2
NULL
0
b7
abcd
21
Input:
abcabcabcd
String =
abcabc
counter
Heavy Hitters on text – improving the
estimation



Problem: substrings in heavy hitters
Only longest run is in input to HH2
Correct the count:


After run of algorithm
For all strings s in Heavy Hitters 2:

Heavy Hitters 2
string
counter
wonder
200
woman
300
wonderwoman
100
Find other strings which contain s and add their counters to s’s counter
Heavy Hitters 2
22
string
counter
Real
counter
wonder
200
300
woman
300
400
wonderwoman
100
100
Double Heavy Hitters Algorithm Analysis

Input:



Error bounds:



For HH1 with v items: N/v
For HH2 with v items: C/v
We Prove:


23
Input to HH1: N k-grams
Input to HH2: C consecutive grams
C ≤ N/(k + 1)
Overall: Error bound of the Double Heavy Hitters algorithm
Signature Extraction - High Level
Signature Extraction
Peace time
traffic sample
Attack time
traffic sample
Find frequent
strings in peace
time traffic
Find frequent
strings in attack
time traffic
Take only
strings
found in
attack and
not in
peace
Attack signatures
e.g. Connection: keep-ALIVE
Formalize with
thresholds
24
Chose Signatures

Create signatures that never appear in legitimate traffic
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
Strings in attack
with frequency
> Attack-High
25
Chose Signatures

Create signatures that never appear in legitimate traffic
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
Strings in attack
with frequency
> Attack-High
Signatures
26
Strings in peace
time
False
positives
Chose Signatures

Create signatures that rarely appear in legitimate traffic
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
Strings in attack
with frequency
> Attack-High
Signatures
27
Strings in peace
with frequency
> Peace-Low
False
positives
Chose Signatures

Create signatures that may appear in legitimate traffic, but appear in attack
traffic much more
Thresholds:
Attack-high
Peace-low
Peace-high
Delta
Strings in attack
with frequency
> Attack-High
frequency >
Peace-high
frequency >
Peace-Low
Signatures
Signatures only if attack
frequency at least delta
more than peace frequency
28
False
positives
Use peace traffic to create filters

Use our Double Heavy Hitters algorithm on peace time traffic:
100%
Peace time traffic packets payload:
White list
abcabcadefgfsdghjghnfdghfg......
b1=abca
b2 = bcab
b3 = cabc
Double
Heavy
Hitters
Algorithm
frequency >
Peace-high
……
Peace-high
Output
values
Maybe white
list
50%
frequency >
Peace-high
frequency >
Peace-Low
Peace-low
Not white list
0%
29
Extracting Attack Signatures

Now use Double Heavy Hitters algorithm on attack time traffic
with filters
Attack traffic packets payload:
hagdhdadjashdklahdjkasfjasbfjabfhfgahfvhsbdfjkasnkiaywtqyeffcgfacsdxasdbas
b1=hagd
b2 = agdh
Heavy
Hitters
1
string
Modified DHH
30
b3 = gdhd ……
White list:
discard if
contained in
whitelist string
Heavy
Hitters
2
Output
values
Signatures
Maybe white list:
frequency
> AttackHigh
Evaluations


Overall eleven tests:
 Ten real attack captures
 5 captures of peacetime traffic
 5 synthetic peacetime captures
 One Synthetic attack in real peace
time traffic
Compare to human expert
31
Sample Signatures




Could not
be identified
manually
Extra newline between header fields
Use of upper-case characters, where usually lower
Use of a rarely used HTTP field
Use of rare user agent.
32
Results – Accuracy of Double Heavy Hitters
estimation
Algorithm (DHH)
Actual Count (frequency)
100
90
Percent
80
70
60
50
40
30
20
10
0
1

9
13
17
21
Signatures
25
29
33
37
Graph of frequency of signatures


33
5
RED – Actual count (frequency) in attack traffic
BLUE – Algorithm (DHH) estimation of frequency of signatures
Results - Attack Rate Estimation
Tests with real peace time traffic
Tests with synthetic peace time traffic
100
90
Attack rate
80
70
60
50
40
30
20
10
0
1
2
3
4
5
6
Test Number
34
7
8
9
10
11
Human Expert
Our Algorithm
Results – Recall and Precision Estimation
Precision: relevant packets from all identified
Tests with real
peace time traffic
Tests with synthetic
peace time traffic
100
Recall: identified packets from
all relevant
90
80
Average: 99.96
Worst case: 99.8
Percent
70
60
50
40
30
20
10
0
1
2
3
4
5
6
Test Number
35
7
8
9
10
11
Peacetime based
Attack based
Future Work

Identify signatures always found in same packets

Good synthetic peace-time traffic, global white-list

Support regular expression signatures
36
37

similar documents