DATA SECURITY AND BIG DATA
Carole Murphy
November 20, 2013
Big Data Conferences
• Major conferences are opportunities to learn, meet colleagues, and see vendor demos
Executive Summary
Five Things You Need To Know About Big Data Security
1. Time-to-insight is even more important than cost savings as a business driver for Hadoop
2. Unless you take action, security is likely addressed later, and then applies the brakes to the Big Data project
3. Data security in the Hadoop ecosystem is about much more than authorization and authentication
4. Traditional data security solutions protect data at rest, but not in use or in motion. The best solutions retain data value even as they remove security and compliance obstacles to the project
5. Big Data presents an opportunity to address security and compliance across your IT environment. Look for adaptable and extensible security solutions
Big Data IS Now!
• Biggest growth drivers
  • Accelerating Enterprise adoption
  • Maturing software
  • Increasingly sophisticated professional services
  • Continued investment
• Transforming the Data Center
  • “By 2017, Big Data will be the norm for data management…”*
*Forrester, The Top Emerging Technologies To Watch: Now Through 2018, by Brian Hopkins and Frank E. Gillett, February 7, 2013
Background
Big Data – What’s Different?
• Data coming from many sources
• Doesn’t need a schema
  – Dump raw loads of data into Hadoop
• Hadoop processing is so fast
  – Compute in minutes what would take a night to batch process
• BI is real-time
  – Ask questions you didn’t know you needed to ask
• Elephant in the room
  – A data “lake” is many times cheaper than the DW path
[Diagram: traditional ETL → DW → BI pipeline vs. raw load → Hadoop → BI]
ETL Offload Use Case*
[Diagram: data flows into Hadoop (HDFS, MapReduce, Pig), which feeds BI]
* Presented by MapR at Hadoop Summit, San Jose, June 2013
Taming the Explosion in Data
Optimizing Time-to-Insight
[Chart: exabytes per month, 2000–2015 — parabolic growth in data created and consumed – Cisco]
“90% of the data in the world today has been created in the last two years alone” – IBM
• The explosion in data fuels growth and agility
• But time to data value is gated by risk and compliance
• Attacks on data are here to stay, and big data means a big target
• Balancing data access and data security is critical
Risk Increases as Data Moves to Cloud and Big Data Environments
[Diagram: risk increases along the spectrum from individual apps, mainframes, and OLTP systems, to the data warehouse (Oracle, Teradata, Netezza, etc.), to Hadoop, to the cloud]
• Hadoop
  • Not created for the enterprise
  • Security is just starting to be bolted on
• Cloud
  • Who has control of your data?
Extracting Value from Data
Big Data Includes Sensitive Data
• Marketing – analyze purchase patterns
• Social media – find best customer segments
• Financial systems – model trading data
• Banking and insurance – 360° customer view
• Security – identify credit card fraud
• Healthcare – advance disease prevention
How do you liberate the value in data – without increasing risk?
Why Projects Get Stopped
Hidden Risks in Big Data Adoption
Breach Risks
– Financial position
– Market position
– Corporate compliance risk
Data Concentration Risks
– Internal users
– External shares
– Backups, Hadoop stores, data feeds
Data Sharing Risks
– Compliance challenges with 3rd-party risk
– Cross-border, data residency
– Data in and out of the enterprise
Cloud Adoption Risks
– Sensitive data in untrusted systems
– Data in storage, in use, transmitted to cloud
Big Data
 Enables deeper data analysis
 More value from old data
 New risks if data is not protected
Take Advantage of Big Data Benefits
Identifying an Effective Data Security Strategy
• Integrate security, enable access
  • Protect sensitive data before entering Hadoop, in Hadoop, and on the way out
  • Enable accurate analytics on encrypted data
• Assure compliance
  • Address global compliance comprehensively
  • Reduce audit scope for PCI to cut costs
  • Provable, verified, published, peer-reviewed, NIST-recognized security techniques
• Optimize performance and extensibility
  • High performance
  • Adapt to the newest tools in the ecosystem
  • Fit into infrastructure; fast and easy to implement
Options for Security
Hadoop Community
• SSL
  • Disabled by default; doesn’t cover all paths, adds latency and CPU load
• Existing Hadoop access controls
  • Kerberos is still the primary way to secure a Hadoop cluster
  • Not fine-grained; can’t limit by data type or column
  • Inappropriate access post-analysis
• Sentry from Cloudera
  • Offers permission controls for data accessed through Hive
• Knox from Hortonworks
  • Gateway server provides a single point of authentication and access for Hadoop services in a cluster
• MapR native authentication and authorization
  • Transparent integration with Kerberos, or an option for native authentication
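The Knox gateway idea above is easy to sketch: Knox proxies Hadoop REST APIs such as WebHDFS under a single `/gateway/<topology>/…` URL, so clients authenticate once at the gateway rather than at every service. The gateway host and topology name below are hypothetical; the path shape follows Knox’s default WebHDFS routing.

```python
from urllib.parse import urlencode

def knox_webhdfs_url(gateway: str, topology: str, path: str, op: str, **params) -> str:
    """Build a WebHDFS request URL routed through an Apache Knox gateway.

    Knox exposes cluster REST APIs under /gateway/<topology>/, so a client
    only ever talks to (and authenticates against) the gateway host.
    """
    query = urlencode({"op": op, **params})
    return f"https://{gateway}/gateway/{topology}/webhdfs/v1{path}?{query}"

# Example: list a directory through a (hypothetical) gateway host
url = knox_webhdfs_url("knox.example.com:8443", "default", "/data/raw", "LISTSTATUS")
```

A client would then issue this request with its gateway credentials; the cluster’s individual service endpoints never need to be exposed.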
Options for Security
Commercial Data Security Products
• Container-based encryption
  • Data-at-rest security at the block or file level
  • Do you want different people/applications to have access to different data types?
• Traditional data masking
  • One-way only, which limits use cases (e.g. fraud analysis)
  • Technique doesn’t support production use cases
• Application level
  • Encryption and tokenization options
  • Consider standards-based approaches and key management
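The tokenization option above can be illustrated with a toy vault-based tokenizer: sensitive values are swapped for random same-length digit tokens, and the real values live only in the vault. This is a sketch for intuition only — commercial products use hardened vaults or stateless key-derived tokens, not an in-memory dict.

```python
import secrets

class TokenVault:
    """Toy vault-based tokenizer (illustration only, not production-grade)."""

    def __init__(self) -> None:
        self._value_to_token: dict[str, str] = {}
        self._token_to_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Deterministic: the same value always maps to the same token,
        # so joins and group-bys still work on tokenized data.
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = self._random_digits(len(value))
        while token in self._token_to_value:   # avoid token collisions
            token = self._random_digits(len(value))
        self._value_to_token[value] = token
        self._token_to_value[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._token_to_value[token]

    @staticmethod
    def _random_digits(n: int) -> str:
        return "".join(secrets.choice("0123456789") for _ in range(n))
```

Because the token has no mathematical relationship to the original value, a stolen token reveals nothing without the vault.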
Goals
• All sensitive data must be stored on disk in protected form (encrypted or tokenized)
  • Compliance requirements (PCI, HIPAA)
  • Disks are often removed from the data center for servicing
• There are many ways that data can flow into HDFS
  • Such as unstructured data being copied in directly
• Sensitive data should also be protected during analysis
  • Because Hadoop has insufficient access controls
• Provide access controls to data based on data type and project (data set)
Solutions for Handling Structured and Unstructured Data
• Disk volume-level (whole file) encryption
  • Enables compliance
  • Covers unstructured data, from all sources
  • Provides protection against drive loss
  • Good, but may not be sufficient
    • Doesn’t reduce audit scope for PCI DSS
    • Access controls in Hadoop can’t control user access at the field level, so access to the cluster may need to be restricted to pass a PCI or HIPAA audit
• Field-level tokenization and/or encryption
  • Enables wider use of the cluster by multiple teams
  • Data sharing with certain fields remaining protected
  • Protects against failures at multiple layers
  • Required for regulatory compliance in many cases
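The field-level approach can be sketched as a small transform applied to each record before it lands in HDFS: only the named sensitive fields are replaced with keyed pseudonyms, and everything else stays clear for analytics. The field names and key below are hypothetical; a real deployment would use FPE or tokenization with managed keys rather than a truncated HMAC.

```python
import hashlib
import hmac

SENSITIVE_FIELDS = {"card_number", "ssn"}   # hypothetical schema
KEY = b"demo-key"                           # placeholder; use managed keys in practice

def protect_record(record: dict) -> dict:
    """Replace sensitive fields with deterministic keyed pseudonyms."""
    out = dict(record)
    for field in record.keys() & SENSITIVE_FIELDS:
        digest = hmac.new(KEY, record[field].encode(), hashlib.sha256).hexdigest()
        out[field] = digest[: len(record[field])]   # keep the original width
    return out
```

Because the pseudonym is deterministic, equality joins and group-bys on the protected field still work, while non-sensitive fields pass through untouched.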
All Hadoop Integration Options
[Diagram: data sources and the data warehouse feed a landing zone and HDFS via ETL, batch loads, Sqoop, and Flume; MapReduce and Hive (+ more) process the data; storage encryption protects HDFS; key management, tokenization, and policy control span the pipeline out to BI applications]
Protecting Data Inbound to Hadoop
[Same integration diagram, highlighting protection applied before ingestion]
Protecting Data Inbound to Hadoop
[Same integration diagram, highlighting protection applied during ingestion]
Protecting Data Inbound to Hadoop
[Same integration diagram, highlighting protection applied after ingestion]
Retrieving Clear Data from Hadoop
[Same integration diagram, highlighting access to clear data before export/query]
Retrieving Clear Data from Hadoop
[Same integration diagram, highlighting access to clear data during export/query]
Retrieving Clear Data from Hadoop
[Same integration diagram, highlighting access to clear data after export/query]
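Retrieving clear data is where policy control earns its keep: de-tokenization should be gated by who is asking and for which data type. A minimal sketch, with hypothetical roles and a plain dict standing in for the key-management/tokenization service:

```python
# Hypothetical role-to-field policy; a real deployment would pull this
# from a central policy server, not a module-level dict.
ROLE_POLICY = {
    "fraud_analyst": {"card_number"},
    "marketing": set(),                 # marketing never sees clear card numbers
}

def detokenize(vault: dict, role: str, field: str, token: str) -> str:
    """Return the clear value only if the caller's role may see this field."""
    if field not in ROLE_POLICY.get(role, set()):
        raise PermissionError(f"role {role!r} may not detokenize {field!r}")
    return vault[token]
```

The key design point is that the clear value is only reconstructed at the moment of an authorized request; everything stored and processed in the cluster stays protected.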
PCI Data – Keep Hadoop and Data Warehouse out of Audit Scope
[Integration diagram: key management, tokenization, and policy control protect cardholder data end to end, keeping HDFS and the data warehouse out of PCI audit scope]
PHI Data – Encrypted in Hadoop for HIPAA; Minimized Application Changes
[Integration diagram: PHI encrypted within HDFS; key management, tokenization, and policy control serve BI applications with minimal application changes]
Private Application Data – Critical Part of Compliance – 100% Transparent
[Integration diagram: the same pipeline, with protection transparent to applications]
Use Case: Healthcare Company
• Challenge
  • Big Data team tasked with securing a large multi-node Hadoop cluster for HIPAA and HITECH
  • Challenging time frames
• Solution
  • Data de-identified in the ETL move before entering Hadoop
  • Ability to decrypt analytic results when needed, through multiple tools
• Benefits
  • Ability to leverage medical data to develop more targeted marketing strategies and services for key demographics
Use Case: Multi-national Bank
• Challenge
  • PCI compliance is the #1 driver
  • ETL offload use case with Hadoop alongside a traditional data warehouse
• Solution
  • Integrate with Sqoop on ingestion and Hive on the applications/query side to protect dozens of data types
  • Fraud analysts work with tokenized credit card numbers
• Benefits
  • Enable fraud analytics directly on protected data in Hadoop
  • Fraud analysts have the ability to de-tokenize as needed, with strict controls
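The bank’s pattern — analytics directly on protected data — works because deterministic tokens preserve equality: the same card always yields the same token, so velocity checks need no clear card numbers. A sketch with made-up transaction data:

```python
from collections import defaultdict

def flag_high_velocity(transactions: list[tuple[str, str]], threshold: int = 3) -> set[str]:
    """Flag card tokens seen at `threshold` or more distinct merchants.

    `transactions` is a list of (card_token, merchant) pairs; because
    tokenization is deterministic, grouping by token is equivalent to
    grouping by the clear card number -- without ever exposing it.
    """
    merchants_per_token = defaultdict(set)
    for token, merchant in transactions:
        merchants_per_token[token].add(merchant)
    return {t for t, m in merchants_per_token.items() if len(m) >= threshold}
```

Only when a token is flagged would an authorized analyst de-tokenize it, under the strict controls the slide describes.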
Use Case: U.S. Military Organization
• Challenge
  • US Surgeon General directive – share healthcare data with medical research institutes
  • Maintain HIPAA/HITECH compliance
• Solution
  • De-identified a 100+ TB dataset at the field level before release
  • Format-preserving encryption enables distributed analytics in Hadoop
  • Usable data values for accurate analytics
• Benefits
  • Secure re-identification by the agency as needed
  • Improved healthcare with compliance
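Format-preserving encryption, which made the military dataset analyzable, keeps ciphertext in the same shape as the plaintext (16 digits in, 16 digits out), so schemas and analytics tools need no changes. A toy Feistel construction over digit strings shows the idea — this is not NIST FF1 and must never be used for real data; production systems use vetted FPE modes:

```python
import hashlib
import hmac

def _round_value(key: bytes, rnd: int, half: str, width: int) -> int:
    # Keyed round function: hash one half, reduce into the other half's range.
    mac = hmac.new(key, f"{rnd}:{half}".encode(), hashlib.sha256).hexdigest()
    return int(mac, 16) % 10 ** width

def fpe_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    """Toy Feistel cipher over decimal strings (educational only, not FF1)."""
    mid = len(digits) // 2
    left, right = digits[:mid], digits[mid:]
    for rnd in range(rounds):
        f = _round_value(key, rnd, right, len(left))
        left, right = right, str((int(left) + f) % 10 ** len(left)).zfill(len(left))
    return left + right

def fpe_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    mid = len(digits) // 2      # an even round count restores the original split
    left, right = digits[:mid], digits[mid:]
    for rnd in reversed(range(rounds)):
        f = _round_value(key, rnd, left, len(right))
        left, right = str((int(right) - f) % 10 ** len(right)).zfill(len(right)), left
    return left + right
```

Each round mixes one half into the other with a keyed, modular addition over digits, so the output stays a digit string of the same length and decryption simply runs the rounds in reverse.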
Key Considerations
• Most Big Data projects are associated with data warehouse projects…
• What is your data warehouse strategy (e.g. expansion, ETL offload to Hadoop, integrating new data sources…)?
• What are your use cases? What does the business need?
• If you use de-identified data in Hadoop, would you ever need to get back to the original data?
• Will you have sensitive data going into Hadoop (PII, PCI, PHI)?
• What compliance or privacy regulations are you concerned about addressing?
• Do you need data protection across disparate systems (open systems to mainframe)?
Security Checklist to Make Big Data Safe
• Solves complex global compliance issues
• Ensures data stays protected wherever it goes
• Enables accurate analytics on encrypted data
• Optimizes performance
• Flexibly adapts to the fast-growing Hadoop ecosystem
• Reduces PCI audit scope where applicable
About Voltage Security
• Origins: DARPA-funded research at Stanford University
• Patented innovations: 27
  • Unstructured data: Identity-Based Encryption (IBE)
  • Structured data: Format-Preserving Encryption (FPE), tokenization, data masking, Stateless Key Management
• Leader in large-scale data-centric security solutions
• Customers: 1200+ enterprise customers and government agencies
• Analyst recognition: Gartner, Forrester, Burton IT1, Mercator
• Contact Voltage Security: www.voltage.com
Copyright 2013 Voltage Security
THANK YOU
