Unintended Consequences of Data Sharing Laws and Rules

Sam Weber
Software Engineering Institute, CMU
Laws and regulations concerning data sharing
and privacy often have unintended consequences
Problem Space
• Three dimensions of issue:
– What is ethical?
– What is legal?
• Usually, laws are an attempt to codify ethical rules.
– Utility
• What are people trying to accomplish?
• Often dimensions conflict:
“No useful database can ever be perfectly anonymous,
and as the utility of data increases, the privacy decreases.”
• Paul Ohm, “Broken Promises of Privacy: Responding to The
Surprising Failure of Anonymization”
Common Issues
• Sharing of cybersecurity data
– Can defenders collect/share personal information about people “for good”?
• Health care data
– Who has what rights over a patient’s medical data?
• Patient, doctors and insurance companies all have different interests
– What rules should protect people whose data is used for medical research?
• General
– What is personally identifiable information?
• What is privacy?
– What about data that has already “gone out”?
– What about data that can be inferred from already revealed information?
– How to deal with changed laws/technology?
• When laws/technology changes, what happens to existing databases?
• What happens in cases of international data sharing?
Unintended Consequences
• Three real-world situations:
1. Medical research and skull-stripping
2. Botnets and intrusion data
3. Burglar/Girls Around Me applications
Medical Research
• Biomedical research strongly dependent
on access to health data
– ex: brain scans for Alzheimer’s research
• Want to protect privacy of people whose data is used
– Two general approaches:
• Privacy/USA: generally industry-specific (HIPAA, Driver’s
Privacy Protection Act,…)
– HIPAA identifies Personally Identifying Information
• Privacy/EU: global, Data Protection Directive
– PII “anything that can be used to identify you”
– First is often ineffective, second is unstable
Skull Stripping and Biomedical Research
• HIPAA Privacy Rule regulations permit
use/disclosure, without authorization, of data from
which patient identifiers have been removed
– Informed consent otherwise difficult to obtain
• Problem: from MRI of head, can reconstruct face
• Solution: “skull stripping”/“defacing” algorithms
(ex: Bischoff-Grethe et al., “A technique for the deidentification of structural brain
MR images”)
• Notice: NO real threat!
– Probably entirely wasted effort
Botnets and Intrusion Data
• Situation: Researcher discovers botnet C&C on
university’s machine
– Allows botnet to continue to run, but observes it
– Discovers how botnet works and finds ways to defeat it
• First question: Is this ethical?
– Con: Researcher-controlled machine is knowingly
attacking innocent people
– Pro: Researcher isn’t making existing situation worse,
and is in the long run making people more secure
• Current botnet strategy
– Take over victim machine, then cause it to do illegal
action. Only do “real” activity if illegal activity took place
– Effectively disables defenders who are bound not to
allow illegal activity
• Strategy if defenders aren’t allowed to violate
victim’s privacy:
– Bind all command-and-control activity to PII of victims
– Inhibit all data sharing!
Burglary App
• Researchers asked IRB for permission to write app that
– Collected public photos from Flickr, Facebook and other places
– Automatically located people who
• Had address of home discoverable
• Lived close to researchers
• Were currently on vacation more than 1000 miles away
• IRB granted complete permission (to researchers’ surprise)
– Information is already publicly available
– Okay even to publicly release application
• “Girls Around Me” app
– Used information from Foursquare, Facebook, etc. to:
• Locate girls/boys currently physically close to user
• Display photos and bios of said girls/boys
• Is there any difference between the two applications?
– We know that privacy attacks are currently being carried out but go unpublished. If
research is prohibited, then defenders are hampered
• Need to consider implications of policies
– Threat models useful:
• What threats are you intending to counter?
• How will attackers respond to policy?
– Need to consider utility/social good
Dead Sea Scrolls
• Scrolls found in 1940s
– access controlled by owners, majority still
unreleased by 1990
• Concordance prepared in 1950s:
– Used for linguistic analysis
– Alphabetical listing of words in document, along
with words immediately before and after
• 1991: Wacholder and Abegg reconstructed
scrolls from concordance
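A minimal sketch of why a concordance of this kind leaks the text it indexes. It assumes a toy format in which each entry holds a word plus the single word before and after it; the function names and the sample sentence are hypothetical, and the actual 1991 reconstruction worked from a far richer concordance. Chaining entries whose neighbours match is enough to rebuild the word order, provided each (previous word, word) pair occurs only once.

```python
def build_concordance(words):
    """Alphabetical listing of each word with its immediate neighbours."""
    entries = []
    for i, word in enumerate(words):
        before = words[i - 1] if i > 0 else None
        after = words[i + 1] if i + 1 < len(words) else None
        entries.append((word, before, after))
    # Sort alphabetically; None neighbours sort as empty strings.
    return sorted(entries, key=lambda e: (e[0], e[1] or "", e[2] or ""))


def reconstruct(concordance):
    """Rebuild the original word order by chaining matching neighbours.

    Toy assumption: every (before, word) pair is unique in the text.
    """
    next_word = {}
    first = None
    for word, before, after in concordance:
        key = (before, word)
        if key in next_word:
            raise ValueError(f"ambiguous context: {key}")
        next_word[key] = after
        if before is None:
            first = word
    if first is None:
        raise ValueError("no entry marks the start of the text")

    text = [first]
    before, word = None, first
    after = next_word[(before, word)]
    while after is not None:
        text.append(after)
        before, word = word, after
        after = next_word[(before, word)]
    return text


# Hypothetical sample text; "the" appears three times with different neighbours.
original = "in the beginning the scribe copied the scroll".split()
concordance = build_concordance(original)
rebuilt = reconstruct(concordance)
```

The alphabetical ordering hides the sequence only superficially: the neighbour columns carry the same information as the text itself.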
Legal/Ethical Issues
• Laws aren’t logical rules: they conflict, are ambiguous, and change over time
• Laws
– Privacy/USA: generally industry-specific (HIPAA, Driver’s Privacy Protection Act,…)
• HIPAA identifies Personally Identifying Information
– Privacy/EU: global, Data Protection Directive
• PII “anything that can be used to identify you”
– Network data: variety of wiretapping laws, etc.
• Even if action is legal, may not be ethical
– Laws often lag technology
• Constrain big data solutions
– Given data from multiple sources, what are the applicable laws? What
happens when laws change? What exact purposes can the data be used for?
What are the restrictions upon the analyses that can be performed?
– Real issue: certain experiments can be conducted at some US universities but
not others, because of different IRB rulings.
Data Creation/Storage
• What data do you store, and how?
– Store anonymized data? Create synthetic data?
– What about data gathered with different
anonymization requirements?
• When laws/attacks change, how do you
– What meta-data is needed about origin of data?
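One way to approach the meta-data question is to attach a provenance record to every stored dataset, so that usability can be re-evaluated when rules change. The sketch below is illustrative only: the `Provenance` class, its field names, and the `usable_for` check are all assumptions, not a schema proposed by the slides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provenance:
    """Hypothetical origin meta-data stored alongside a dataset."""
    source: str               # who supplied the data
    collected_on: date        # when it was gathered
    jurisdiction: str         # whose laws applied at collection time
    legal_basis: str          # e.g. consent, de-identified release
    anonymization: str        # method applied before storage
    permitted_uses: frozenset  # purposes the data may be used for


def usable_for(record, purpose, accepted_anonymization):
    """Check a dataset against the purpose and the currently accepted methods."""
    return (purpose in record.permitted_uses
            and record.anonymization in accepted_anonymization)


# Hypothetical dataset description.
scan = Provenance(
    source="hospital-A MRI archive",
    collected_on=date(2012, 3, 1),
    jurisdiction="US",
    legal_basis="HIPAA de-identified release",
    anonymization="defaced",
    permitted_uses=frozenset({"alzheimers-research"}),
)
```

When a law or a known attack changes, only the set of accepted anonymization methods needs updating; rerunning `usable_for` over the stored records then shows which datasets are affected.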
• Long history of de-anonymization work
• Inherent tradeoff between usability and
– Attacker models often unclear
• Potential solution: keep track of information
already disclosed
– Result: self-destructing databases
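The slides do not say how disclosed information would be tracked, so the following is only one simplified interpretation. It assumes a hypothetical per-record limit on how many distinct fields may ever be released; the class name, the limit, and the sample record are invented for illustration. Once every record reaches its limit, the database can answer nothing new, which is the "self-destructing" effect.

```python
class DisclosureTrackingDB:
    """Toy database that remembers what it has already disclosed."""

    def __init__(self, records, max_fields_per_record):
        self._records = records            # record id -> {field: value}
        self._limit = max_fields_per_record
        self._disclosed = {rid: set() for rid in records}

    def query(self, record_id, field_name):
        seen = self._disclosed[record_id]
        # Re-asking for an already disclosed field reveals nothing new.
        if field_name not in seen and len(seen) >= self._limit:
            raise PermissionError(
                f"disclosure limit reached for record {record_id!r}")
        value = self._records[record_id][field_name]
        seen.add(field_name)
        return value

    def exhausted(self):
        """True once no record can disclose any further field."""
        return all(len(s) >= self._limit for s in self._disclosed.values())


# Hypothetical record with three quasi-identifying fields.
db = DisclosureTrackingDB(
    {"p1": {"zip": "15213", "birth_year": 1970, "sex": "F"}},
    max_fields_per_record=2,
)
```

A fixed field count is a crude stand-in for a real attacker model; it only illustrates that tracking past disclosures makes the database progressively less able to answer queries.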
