Securing Networks using SDN and Machine Learning

DRAGOS COMANECI – IXIA
@DRCOMANECI
[email protected]
ABOUT ME
• Software Engineer/Security Researcher at Ixia in the ATI (Application Threat
Intelligence) team
• Reverse engineering & emulating application protocols and strikes
• Doing a PhD on Software-Enabled Adaptive Network Traffic Management
(short version: SDN + ML )
SHORT INTRODUCTION
Problem:
• Traditional signature-based IPS/IDS approaches do not scale as the network
grows more complex
Solution:
• Adaptive way of defending the network: SDN & Machine Learning
• Allows: Anomaly detection, botnet detection, honeypot rerouting
SYSTEM OVERVIEW
[Diagram components: Progressive Flow Classification, Supervised Learning, Unsupervised Learning, Flow Grouping, SDN Controller, Network Devices]
INTEGRATING FLOW CLASSIFICATION INTO AN
SDN CONTROLLER
• Modern SDN Controllers are basically event handlers
• Streams of events come into the controller from the network and
are transformed into forwarding rules
• Structure flow classification as events (e.g. flow match)
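A minimal sketch of the event-handler idea, assuming a toy controller rather than any real SDN framework. The `FlowClassified` event, the `Controller` class and the `prioritise_voip` policy are hypothetical names introduced only for illustration; the point is that a flow classification result can be delivered as just another event that handlers turn into forwarding rules.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowClassified:
    """Hypothetical event: the classifier matched a flow to a traffic class."""
    flow_id: str
    traffic_class: str


class Controller:
    """Toy event-driven controller: handlers turn events into forwarding rules."""

    def __init__(self):
        self.handlers = defaultdict(list)  # event type -> registered handlers
        self.rules = []                    # forwarding rules emitted so far

    def on(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def dispatch(self, event):
        for handler in self.handlers[type(event)]:
            rule = handler(event)
            if rule is not None:
                self.rules.append(rule)


def prioritise_voip(event):
    # Illustrative policy only: send VoIP flows to a high-priority queue.
    if event.traffic_class == "voip":
        return {"match": event.flow_id, "action": "enqueue:high"}
    return None


controller = Controller()
controller.on(FlowClassified, prioritise_voip)
controller.dispatch(FlowClassified("10.0.0.1:5060->10.0.0.2:5060/udp", "voip"))
controller.dispatch(FlowClassified("10.0.0.3:51000->10.0.0.4:80/tcp", "http"))
```

After both dispatches, `controller.rules` holds a single rule for the VoIP flow; the HTTP flow produces no rule because the handler returns `None` for it.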
NETWORK ANOMALY DETECTION
• Continually train & refine supervised models for the traffic flows in our
network
• When a new flow doesn’t match any model, flag it as suspicious and add it to the
queue for the clustering algorithm
• Run clustering with side information to see if there are other flows similar to it
• If it’s in a separate cluster => anomaly; if not, refine the model for the closest
match
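The decision steps above can be sketched as follows. This is a simplified stand-in, not the clustering with side information named on the slide: models are assumed to be centroids in feature space, and "similar" is assumed to mean a Euclidean distance under a fixed threshold. `MATCH_RADIUS`, `CLUSTER_RADIUS`, `triage` and `review` are hypothetical names and values.

```python
import math

MATCH_RADIUS = 1.0    # assumed: max distance at which a flow matches a model
CLUSTER_RADIUS = 2.0  # assumed: max distance for two flows to count as similar


def closest_model(features, models):
    """Return (name, distance) of the nearest model centroid."""
    return min(
        ((name, math.dist(features, centroid)) for name, centroid in models.items()),
        key=lambda pair: pair[1],
    )


def triage(features, models, queue):
    """Return the matching model name, or queue the flow as suspicious."""
    name, distance = closest_model(features, models)
    if distance <= MATCH_RADIUS:
        return name
    queue.append(features)
    return None


def review(features, known_flows, models):
    """Anomaly if no known flow is similar; otherwise name the model to refine."""
    if all(math.dist(features, other) > CLUSTER_RADIUS for other in known_flows):
        return "anomaly"
    return "refine:" + closest_model(features, models)[0]


models = {"web": (1.0, 1.0), "dns": (5.0, 5.0)}
known_flows = [(1.2, 0.9), (2.5, 1.5), (5.1, 4.8)]
queue = []

matched = triage((1.1, 1.0), models, queue)   # close to the "web" centroid
triage((2.8, 1.2), models, queue)             # no match, queued
triage((9.0, 0.0), models, queue)             # no match, queued
verdicts = [review(flow, known_flows, models) for flow in queue]
```

The first queued flow sits near an already seen flow, so the closest model ("web") is refined; the second has no neighbours and is reported as an anomaly.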
BOTNET DETECTION
• Groups of hosts communicate periodically with a C&C server and receive
commands from it that are executed (e.g. performing DDoS, scanning the
network, sending spam, etc.)
• Communication flow with the C&C server => anomaly
• Executing the command afterwards produces similar communication flows =>
group of related flows
• Anomaly + group of related flows originating from the same host afterwards
=> bot
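A rough sketch of the rule above, assuming each flow record already carries an anomaly flag and an optional flow group id produced by the earlier stages. The `Flow` record layout and the `MIN_GROUP_FLOWS` threshold are assumptions made for illustration, not values from the talk.

```python
from collections import Counter, namedtuple

# Hypothetical record: timestamp, source host, anomaly flag, flow group id.
Flow = namedtuple("Flow", "time src anomalous group")

MIN_GROUP_FLOWS = 2  # assumed: related flows needed after the anomaly


def find_bots(flows):
    """Hosts with an anomalous flow followed by a group of related flows."""
    bots = set()
    groups_after_anomaly = {}  # host -> Counter of group ids seen since anomaly
    for flow in sorted(flows, key=lambda f: f.time):
        if flow.src in groups_after_anomaly:
            if flow.group is not None:
                counts = groups_after_anomaly[flow.src]
                counts[flow.group] += 1
                if counts[flow.group] >= MIN_GROUP_FLOWS:
                    bots.add(flow.src)
        elif flow.anomalous:
            # First anomaly for this host: start watching its later flows.
            groups_after_anomaly[flow.src] = Counter()
    return bots


flows = [
    Flow(1, "10.0.0.5", True, None),    # suspected C&C contact
    Flow(2, "10.0.0.5", False, "g1"),
    Flow(3, "10.0.0.5", False, "g1"),   # related flows after the anomaly
    Flow(1, "10.0.0.6", False, "g1"),
    Flow(2, "10.0.0.6", False, "g1"),   # related flows, but no anomaly first
    Flow(4, "10.0.0.7", True, None),    # anomaly with no follow-up flows
]
bots = find_bots(flows)
```

Only 10.0.0.5 satisfies both conditions; the other two hosts each meet just one of them.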
HONEYPOT TRAFFIC REROUTING
• As before, if the flow doesn’t match any supervised model, mark the host
which initiated it as suspicious and store the flow 5-tuple
• The next time that host tries to communicate, reroute that flow to a
honeypot
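The rerouting logic can be sketched as a small policy object. The 5-tuple layout `(src_ip, dst_ip, src_port, dst_port, protocol)`, the `HoneypotPolicy` class and the `HONEYPOT_PORT` value are assumptions for illustration; a real controller would install the equivalent forwarding rule on the switch.

```python
HONEYPOT_PORT = 99  # hypothetical switch port leading to the honeypot


class HoneypotPolicy:
    """Remember hosts whose flows matched no model; divert their later flows."""

    def __init__(self):
        self.suspicious_hosts = set()
        self.stored_flows = []

    def record_unmatched(self, five_tuple):
        # five_tuple = (src_ip, dst_ip, src_port, dst_port, protocol)
        self.suspicious_hosts.add(five_tuple[0])
        self.stored_flows.append(five_tuple)

    def output_port(self, five_tuple, default_port):
        """Choose the honeypot port for flows initiated by suspicious hosts."""
        if five_tuple[0] in self.suspicious_hosts:
            return HONEYPOT_PORT
        return default_port


policy = HoneypotPolicy()
policy.record_unmatched(("10.0.0.9", "192.0.2.10", 40000, 6667, "tcp"))

rerouted = policy.output_port(("10.0.0.9", "192.0.2.20", 40001, 80, "tcp"), 3)
normal = policy.output_port(("10.0.0.8", "192.0.2.20", 40002, 80, "tcp"), 3)
```

Here `rerouted` is the honeypot port because the source host was flagged earlier, while `normal` keeps the default port.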
SYSTEM ARCHITECTURE
[Architecture diagram components: Hadoop Cluster (Traffic Flows & Computed Features; Classifier Models & Flow Groups); Network Controller (three Nettle Controller VMs, Common Distributed State Data Store); Network Forwarding Elements (three Network Elements, each with a Traffic Classifier); exchanged data: Flow Classification Events, Forwarding Rules & Classifier Models]
EXPERIMENTAL TESTBED
[Testbed diagram components: ML Enhanced SDN Controller; four virtualized OVS switches; Diffuse Classifier; Ixia BreakingPoint Application Traffic Emulator]
TESTING & RESULTS
• Used the Ixia BreakingPoint traffic emulator to simulate enterprise, small
business and ISP network traffic with three application profiles: Enterprise,
SOHO/Small Business, and Sandvine 2H 2013 North America Fixed
TESTING & RESULTS
• Along with the normal network traffic, we also emulated application attacks
(Critical Strikes strikelist – 607 strikes) as well as botnet traffic (1646
different botnets, the majority of them HTTP based)
EVALUATION & RESULTS
• For training data, we generated packet captures with 256 streams for each
flow type in the application profile
• We then trained a C4.5 classification model for Diffuse for each flow type
using the WEKA ML framework
• Classification Accuracy:
Application Profile                  | Without attack/botnet traffic | With attack/botnet traffic
Enterprise                           | 82%                           | 68%
SOHO/Small Business                  | 87%                           | 71%
Sandvine 2H 2013 North America Fixed | 79%                           | 63%
CLASSIFICATION TIME
• How many packets do we have to inspect before we can reach a conclusion
about the flow type? (cap at 20 packets)
• Flow features:
• Minimum, mean, maximum, standard deviation and sum of the packet sizes
• Sizes of the first 10 packets
• Sending endpoint (initiator/responder) of each of the first 10 packets
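A sketch of how the features listed above might be computed from the first packets of a flow. The slide does not say whether the standard deviation is the population or sample variant, nor how flows shorter than 10 packets are handled; population standard deviation and padding (0 for sizes, -1 for directions) are assumptions here, and `flow_features` is a hypothetical helper.

```python
from statistics import mean, pstdev

MAX_PACKETS = 20  # inspection cap mentioned on the slide
FIRST_N = 10      # number of leading packets described individually


def flow_features(packets):
    """packets: non-empty list of (size_bytes, from_initiator) in arrival order."""
    window = packets[:MAX_PACKETS]
    sizes = [size for size, _ in window]
    first_sizes = sizes[:FIRST_N]
    first_dirs = [1 if from_initiator else 0 for _, from_initiator in window[:FIRST_N]]
    pad = FIRST_N - len(first_sizes)  # assumed padding for short flows
    return {
        "min": min(sizes),
        "mean": mean(sizes),
        "max": max(sizes),
        "std": pstdev(sizes),  # assumed: population standard deviation
        "sum": sum(sizes),
        "first_sizes": first_sizes + [0] * pad,
        "first_dirs": first_dirs + [-1] * pad,
    }


packets = [(60, True), (60, False), (52, True), (500, True), (1500, False)]
features = flow_features(packets)
```

Keeping the feature vector at a fixed length regardless of how many packets have arrived is what lets a classifier be queried progressively, after each new packet, until the 20-packet cap.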
RESOURCE USAGE OVERHEAD
• Setup: 1 Mininet VM with Diffuse installed, simulating a topology with 4 switches;
learning-switch SDN controller running on the same machine
• CPU usage overhead when enabling Diffuse: 17%
• Memory usage overhead: 13%
CONCLUSIONS
• Machine learning flow classification & SDN can work together to
make the network adaptive
• We can extract & use three types of information from the network:
• Flow type classification
• New flow type classifiers
• Flow groups
• Anomaly detection, botnet detection & honeypot rerouting can be
done
• ML traffic classification overhead is manageable
