Entropy and Malware Detection
ITEC808 – Final Project Presentation
Vithushan Sivalingam
Student No: 42413753
Supervisors:
Prof. Vijay Varadharajan & Dr Udaya Tupakula
11th November 2011
Contents
• Introduction
• Project Aims
• Shannon’s Entropy Review
• Malware
• Entropy techniques with malware
• Analysis of the schemes
• Discussion
• Conclusion
• Future Works

Introduction

Entropy quantifies the uncertainty involved in predicting
the value of a random variable.
• The outcome of a fair coin flip (two equally likely outcomes) provides less information (lower
entropy) than the outcome of a roll of a die (six equally likely outcomes).

In the real world, most collections of data fall somewhere in
between, so detection results include some false information.
• False positive: the software is identified as malicious, but it turns out not to be.
• False negative: the software is not identified as malicious, but it is malicious and was
missed.

Malware detection plays a significant role in protecting
against attacks launched in the communication world.

Still, malware detection tools cannot fully protect against
encrypted and packed malware.

This project explores improving malware detection through
entropy techniques.
Project Aims

The main goal of this project was to investigate the
development of suitable entropy techniques to detect
malware.

The ITEC808 literature review components are:
• Reviewing Shannon’s entropy method.
• Identifying malware attributes and functionality.
• Gaining a detailed understanding of entropy techniques and malware
detection.
• Studying entropy-based malware detection schemes.
• Analysing and reasoning about the efficiency of the proposed
schemes.
Problems and Significance

Understanding the entropy theorem.

Malware growth, and identifying malware attributes and
functionality.

Understanding the statistical variation in malware
executables.

Investigate the development of suitable entropy
techniques to detect malware.

This could help security analysts to identify
malware samples (packed or encrypted)
more efficiently.
Shannon’s Entropy Review

Point-to-point communication.
• Given two random variables, what can we say about one
when we know the other? This is the central problem in
information theory.
• Keywords: choice, uncertainty and entropy

The entropy of a random variable X is
defined by

$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}$$

• X: information source; p_i: probability of its i-th outcome

The entropy is non-negative. It is zero when
the value of the random variable is certain,
i.e. it can be predicted with no uncertainty.

Flip coin {0.5, 0.5}: fair distribution
◦ $H(X) = \frac{1}{2}\log_2 2 + \frac{1}{2}\log_2 2 = 1$ bit (receive 1 bit of information)

Double-headed coin {1}: known distribution
◦ $H(X) = 1 \cdot \log_2 \frac{1}{1} = 0$ bits

Unfair coin {0.75, 0.25}: unfair distribution
◦ $H(X) = \frac{3}{4}\log_2 \frac{4}{3} + \frac{1}{4}\log_2 4 \approx 0.811$ bits
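
The three coin examples can be checked numerically. The following is a minimal sketch, assuming base-2 logarithms so that the result is in bits; the function name `shannon_entropy` is illustrative and not from the original slides.

```python
import math

def shannon_entropy(probs):
    """Return H(X) in bits for a discrete distribution given as a list of probabilities."""
    # Outcomes with p = 0 are skipped: they contribute nothing to the sum.
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

fair = shannon_entropy([0.5, 0.5])        # fair coin
double_headed = shannon_entropy([1.0])    # known outcome
unfair = shannon_entropy([0.75, 0.25])    # biased coin

print(f"fair: {fair:.3f}, double-headed: {double_headed:.3f}, unfair: {unfair:.3f}")
```

The fair coin reaches the maximum for two outcomes, the known outcome gives zero, and the biased coin lies in between.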
[Figure: entropy H(X), in bits, plotted against probability]
• For a fair distribution, entropy reaches its highest level (1 bit).
• For a known distribution (P = 1 or 0), entropy is 0 bits of information.
• For an unfair (unbalanced) distribution, entropy is lower than the maximum.

Joint entropy
• For two random variables X and Y, the joint entropy is defined by
$$H(X,Y) = \sum_{x,y} p(x,y) \log_2 \frac{1}{p(x,y)}$$

Conditional entropy
• When two random variables X and Y are dependent, the conditional entropy
is the extra information X contains once Y is disclosed.
$$H(X|Y) = \sum_{x,y} p(x,y) \log_2 \frac{1}{p(x|y)}$$

These continue with the chain rules of entropy.

Shannon produced 22 information-related
theorems and 7 appendices with the
mathematical explanations.
[Figure: diagram relating entropy, joint entropy, conditional entropy and mutual information (information gain)]
◦ H(X) - H(X|Y) = H(Y) - H(Y|X) (mutual information)
◦ H(X,Y) = H(X) + H(Y) (independent)
◦ H(X,Y) < H(X) + H(Y) (dependent)
◦ H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)

These entropy techniques help to build the detection
models.
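
The identities above can be verified on a small example. This is a sketch only: the joint distribution `joint` is a made-up dependent distribution, and the helper names (`H`, `marginal`, `conditional_entropy`) are illustrative rather than taken from any of the reviewed schemes.

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy in bits of a dict mapping outcomes to probabilities."""
    return sum(p * math.log2(1.0 / p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal distribution of X (axis=0) or Y (axis=1) from p(x, y)."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

def conditional_entropy(joint, given_axis):
    """H(other | given) = sum over (x, y) of p(x,y) * log2(p(given) / p(x,y))."""
    p_given = marginal(joint, given_axis)
    return sum(p * math.log2(p_given[pair[given_axis]] / p)
               for pair, p in joint.items() if p > 0)

# Hypothetical joint distribution p(x, y) in which X and Y are dependent.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

hx, hy, hxy = H(marginal(joint, 0)), H(marginal(joint, 1)), H(joint)
hx_given_y = conditional_entropy(joint, 1)
hy_given_x = conditional_entropy(joint, 0)
mutual_information = hx - hx_given_y

print(f"H(X)={hx:.3f} H(Y)={hy:.3f} H(X,Y)={hxy:.3f} I(X;Y)={mutual_information:.3f}")
```

Because X and Y are dependent here, the joint entropy comes out below H(X) + H(Y), and the mutual information is positive.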
Malware

Malware is labelled by its attributes, behaviours and
attack patterns.

It has been reported that, among 20,000 malware samples, more than 80%
were packed, by packers from 150 different families.

Malware that is modified by runtime encryption or compression
is known as packed malware.

Packing compresses an executable file and modifies the file
to contain the code that decompresses it at runtime.

A packed executable is built from two main parts.
• First, the original executable is compressed and kept in the packed
executable as a file.
• Second, a decompression section is added to the packed
executable. (This section is used to restore the main executable.)
Entropy techniques with malware

Entropy of packed information is higher than that of the original information.
• Compression reduces the information to fewer bits, and the series of bits becomes more
unpredictable, which is equivalent to higher uncertainty.
◦ Packed information: uncertainty rises and the information is reduced (compressed), so entropy is higher.
◦ Original information: uncertainty is lower and the information is uncompressed, so entropy is lower.

False alarms play a big role.
• Legitimately compressed or encrypted files could trigger
false positives.
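
The effect of packing on entropy can be illustrated with ordinary compression. This is a minimal sketch under stated assumptions: `zlib` stands in for a real packer, the input is synthetic text built from a small made-up vocabulary rather than an actual executable, and `byte_entropy` is an illustrative helper that measures entropy over byte values (0 to 8 bits per byte).

```python
import math
import random
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    return sum((count / n) * math.log2(n / count) for count in Counter(data).values())

# Synthetic "original" content: random words from a small, made-up vocabulary.
rng = random.Random(0)  # fixed seed so repeated runs use the same input
words = [b"mov", b"push", b"call", b"ret", b"jmp", b"add", b"sub", b"xor"]
original = b" ".join(rng.choice(words) for _ in range(20000))

# zlib is used here only as a stand-in for a packer's compression step.
packed = zlib.compress(original, 9)

original_entropy = byte_entropy(original)
packed_entropy = byte_entropy(packed)
print(f"original: {len(original)} bytes, {original_entropy:.2f} bits/byte")
print(f"packed:   {len(packed)} bytes, {packed_entropy:.2f} bits/byte")
```

The same measurement applied to a legitimately compressed archive would also give a high value, which is why false positives are a concern.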

But we can use entropy to determine whether a sample is an
anomaly or not.
• Establish categories based on different entropies.
• If the entropy is over a threshold, the sample is categorised as malicious;
below that value, it is categorised as not malicious.
[Figure: entropy scale divided by a threshold into “Not Malicious” and “Malicious”]

That means we can use entropy as a measure to classify
software as malware.
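
A threshold rule of this kind could be sketched as follows. The cut-off `ENTROPY_THRESHOLD` of 7.0 bits per byte is a hypothetical value chosen for illustration, not one reported in the slides, and both sample inputs are synthetic.

```python
import math
from collections import Counter

# Hypothetical cut-off in bits per byte; a real threshold would have to be
# chosen from measured samples.
ENTROPY_THRESHOLD = 7.0

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value distribution, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return sum((count / n) * math.log2(n / count) for count in Counter(data).values())

def classify(data: bytes, threshold: float = ENTROPY_THRESHOLD) -> str:
    """Label a sample by comparing its entropy with the threshold."""
    return "Malicious" if byte_entropy(data) > threshold else "Not Malicious"

low_entropy_sample = b"A" * 4096               # one repeated byte: 0 bits per byte
high_entropy_sample = bytes(range(256)) * 16   # every byte value equally often: 8 bits per byte

print(classify(low_entropy_sample))
print(classify(high_entropy_sample))
```

A rule this simple labels any high-entropy input as malicious, including legitimate compressed or encrypted files, so in practice it indicates only that a sample deserves closer inspection.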
Analysis of the schemes

The scheme analysed is “Information-theoretic Measures for Anomaly
Detection”.

Objective
• Provide a theoretical foundation as well as useful tools that can
facilitate the intrusion detection system (IDS) development process and
improve the effectiveness of intrusion detection (ID) technologies.

Experiments on
• University of New Mexico (UNM) sendmail system call data
• MIT Lincoln Lab sendmail BSM data
• MIT Lincoln Lab tcpdump data

Approach:
• Entropy and conditional entropy: regularity
• Determine how to build a model.
• Joint (conditional) entropy: how the regularities
between training and test datasets relate
• Determine the performance of a model on test data.
• A classification approach:
• Given the first k system calls, predict the (k+1)th system call
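
The regularity measure in this approach can be sketched as the conditional entropy of the next system call given the calls before it in a sliding window. The code below is an illustration under assumptions: the trace is a made-up repeating sequence of call numbers, not UNM data, and `conditional_entropy` is an illustrative helper estimated from simple counts.

```python
import math
from collections import Counter

def conditional_entropy(trace, window):
    """Estimate H(last call | first window-1 calls) in bits over all sliding windows."""
    k = window - 1
    joint = Counter()    # counts of (prefix, next_call)
    prefix = Counter()   # counts of prefix alone
    for i in range(len(trace) - k):
        p = tuple(trace[i:i + k])
        joint[(p, trace[i + k])] += 1
        prefix[p] += 1
    total = sum(joint.values())
    # p(prefix, next) * log2(p(prefix) / p(prefix, next)), with the totals cancelling.
    return sum((c / total) * math.log2(prefix[p] / c) for (p, _), c in joint.items())

# Hypothetical trace of system call numbers with a repeating pattern.
trace = [4, 2, 66, 66, 4, 138] * 50

h1 = conditional_entropy(trace, 1)  # no context: plain entropy of the calls
h2 = conditional_entropy(trace, 2)  # one preceding call as context
h3 = conditional_entropy(trace, 3)  # two preceding calls as context

print(f"window 1: {h1:.3f}, window 2: {h2:.3f}, window 3: {h3:.3f}")
```

For this trace, two preceding calls fully determine the next one, so the conditional entropy falls to zero at window size 3.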
Conditional Entropy of Training Data (UNM)
[Figure: conditional entropy (y-axis, 0 to 0.6) against sliding window size (x-axis, 1 to 17) for bounce-1.int, bounce.int, queue.int, plus.int, sendmail.int, total and mean]
• The more information is included, the more regular the dataset.
Misclassification Rate: Training Data
[Figure: misclassification rate (y-axis, 0 to 50) against sliding window size (x-axis, 1 to 17) for bounce-1.int, bounce.int, queue.int, plus.int, sendmail.int, total and mean]
• Misclassification means that the classification process classifies an item to be in
class A while the actual class is B.
• The misclassification rate is used to measure anomaly detection performance.
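
A misclassification rate of this kind can be sketched with a very simple predictor. Note the assumptions: a frequency table (predict the most common next call for each k-call prefix) is swapped in here for whatever classifier the analysed scheme actually used, and all traces are made-up sequences of call numbers.

```python
from collections import Counter, defaultdict

def train(trace, k):
    """Map each k-call prefix to the most frequent next call seen in the training trace."""
    counts = defaultdict(Counter)
    for i in range(len(trace) - k):
        counts[tuple(trace[i:i + k])][trace[i + k]] += 1
    return {prefix: nxt.most_common(1)[0][0] for prefix, nxt in counts.items()}

def misclassification_rate(model, trace, k):
    """Fraction of positions where the predicted next call differs from the actual one."""
    total = len(trace) - k
    if total <= 0:
        return 0.0
    # A prefix never seen in training yields None, which counts as a misclassification.
    wrong = sum(1 for i in range(total)
                if model.get(tuple(trace[i:i + k])) != trace[i + k])
    return wrong / total

k = 2
normal_training = [4, 2, 66, 66, 4, 138] * 50      # hypothetical normal behaviour
normal_test = [4, 2, 66, 66, 4, 138] * 10          # unseen trace with the same behaviour
abnormal_test = [4, 2, 66, 5, 5, 138, 4, 4] * 10   # hypothetical intrusion trace

model = train(normal_training, k)
normal_rate = misclassification_rate(model, normal_test, k)
abnormal_rate = misclassification_rate(model, abnormal_test, k)
print(f"normal: {normal_rate:.2f}, abnormal: {abnormal_rate:.2f}")
```

The trace that departs from the trained behaviour produces a clearly higher rate, which is the property that makes the rate usable for detection.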
Conditional Entropy vs. Misclassification Rate
[Figure: conditional entropy and misclassification rate (y-axis, 0 to 1.2) against sliding window size (x-axis, 1 to 17); series: total-CondEnt, total-MisClass, mean-CondEnt, mean-MisClass]
• The movement of misclassification rate coincides with the movement of
conditional entropy.
• The estimated movement of the misclassification rate can be used to select
a sequence length for the detection model.
• E.g. Length 6 is better than 4, and 14 is better than 6.
Misclassification Rate of Testing Data and Intrusion Data
[Figure: misclassification rate (y-axis, 0 to 50) against sliding window size (x-axis, 1 to 17) for bounce-1.int, bounce.int, queue.int, plus.int, sendmail.int, total, sm-10763.int, syslog-local-1.int and fwd-loops-1.int to fwd-loops-5.int]
• The misclassification rate is used as an indicator to determine whether a
trace is abnormal or normal.
Other Schemes Objectives

“Unpacking using Entropy Analysis” analyses how to use entropy to quickly
and efficiently identify packed or encrypted malware executables, and offers
results from testing the methodology.
◦ bintropy technique

“Entropy estimation for real-time encrypted traffic identification” describes
a novel approach to classifying network traffic into encrypted and
unencrypted traffic.
◦ Real-time encrypted traffic detector (RTETD)
◦ The classifier is able to operate in real time, as only the first packet of each flow is processed
◦ Uses encrypted Skype traffic
Discussion

Through studying the schemes and information theory, I
found the following.
• Entropy can be used to measure the regularity of
datasets containing a mixture of records.
• Conditional entropy can be used to measure the regularity of
sequential dependencies in datasets of structured
records.
• Relative entropy can be used to measure the relationship between
the regularity (consistency) measures of two datasets.
• Information gain can be used to categorise data items for classification.
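
As an illustration of the relative entropy point, the sketch below compares the record distributions of two datasets. The record values and dataset names are made up, and `relative_entropy` follows the standard definition D(p || q) = Σ p(x) log2(p(x) / q(x)), which is assumed here rather than quoted from the reviewed scheme.

```python
import math
from collections import Counter

def distribution(records):
    """Empirical probability of each distinct record in a dataset."""
    n = len(records)
    return {record: count / n for record, count in Counter(records).items()}

def relative_entropy(p, q):
    """D(p || q) in bits; infinite when q gives zero probability to an outcome of p."""
    total = 0.0
    for x, px in p.items():
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log2(px / qx)
    return total

# Hypothetical datasets of records.
training = distribution(["open", "read", "read", "close"] * 25)
similar_test = distribution(["open", "read", "read", "close"] * 10)
shifted_test = distribution(["open", "read", "close", "close"] * 10)

d_similar = relative_entropy(training, similar_test)
d_shifted = relative_entropy(training, shifted_test)
print(f"similar: {d_similar:.3f} bits, shifted: {d_shifted:.3f} bits")
```

A value of zero means the two datasets have the same record distribution; larger values indicate that a model built on one may fit the other poorly.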
Conclusion

Review and analysis of Shannon’s entropy, with
examples.

Research and identification of (packed) malware
functionality, characteristics and attributes.

Analysis of entropy-based schemes.

These findings will be followed up in future
work.
Future Works
• Investigate entropy analysis for selected
software samples.
o Use the entropy techniques to compute the entropy scores
of the selected malware executable samples.
• Identify the experimental tools.
o We plan to analyse the malware samples using commercial
experimental tools, e.g. PECompact Executable Compressor.
References
1. C. E. Shannon. A Mathematical Theory of Communication. Reprinted with corrections from The Bell System Technical Journal, Vol. 27, pp. 379–423, 623–656, July, October 1948.
2. M. Morgenstern and A. Marx. Runtime packer testing experiences. In Proceedings of the 2nd International CARO Workshop, 2008.
3. *W. Lee and D. Xiang. Information-theoretic Measures for Anomaly Detection. In IEEE Symp. on Security and Privacy, Oakland, CA, pp. 130–143, 2001.
4. M. Morgenstern and H. Pilz. Useful and useless statistics about viruses and anti-virus programs. AV-Test GmbH, Magdeburg, Germany. Presented at CARO 2010, Helsinki.
5. *R. Lyda and J. Hamrock. Using Entropy Analysis to Find Encrypted and Packed Malware. IEEE Security & Privacy, Volume 5, Issue 2, pp. 40–45, March-April 2007. Digital Object Identifier 10.1109/MSP.2007.48.
6. G. Jeong, E. Choo, J. Lee, M. Bat-Erdene and H. Lee. Generic Unpacking using Entropy Analysis. Div. of Computer & Communication Engineering, Korea University, Seoul, Republic of Korea, 2010.
7. *P. Dorfinger, G. Panholzer and W. John. Entropy estimation for real-time encrypted traffic identification. Salzburg Research, Salzburg, Austria, 2010.
Thank you.