Report
Four years a SPY - Lessons learned in the interdisciplinary project SPION
(Security and Privacy in Online Social Networks)
Bettina Berendt
Department of Computer Science,
KU Leuven, Belgium
www.berendt.de, www.spion.me
Thanks to (in more or less chronological order)
• Sarah Spiekermann
• Seda Gürses
• Sören Preibusch
• Bo Gao
• Ralf De Wolf
• Brendan Van Alsenoy
• Rula Sayaf
• Thomas Peetz
• Ellen Vanderhoven
• my other SPION colleagues
• and many other co-authors and collaborators!
[All references for these slides are at the end of the slide set.]
Overview
1. What can data mining do for privacy?
2. Beyond privacy: discrimination/fairness,
democracy
3. Towards sustainable solutions
1. What can data mining do for privacy?
The Siren (AD 2000)
1. DM can detect privacy phenomena
2. DM can cause privacy violations
3. DM can be modified to avoid privacy violations
Is that sufficient?
... because: What is privacy?
• Privacy is not only hiding information:
▫ “dynamic boundary regulation processes […] a selective control of access
to the self or to one's group“ (Altman/Petronio)
▫ Different research traditions relevant to CS
• AND: Privacy vis-à-vis whom? Social privacy, institutional privacy, freedom from surveillance
... because: What is privacy? ... and what is data mining?
13
Goal (AD ~ 2008): From the simple view ...
towards a more comprehensive view
4. DM can affect our perception of reality –
also enhancing awareness & reflection?!
Privacy feedback and awareness tools
[Diagram: encrypted content & unobservable communication; selectivity by access control; offline communities (social identities, social requirements); identification of information flows; profiling; feedback & awareness tools; educational materials and communication design; cognitive biases and nudging interventions; legal aspects]
Complementary technical
approaches in SPION
• DTAI is 1 of the technical partners (with COSIC and DistriNet)
• Developing a software tool for Privacy Feedback and Awareness
• Collaborating with other partners (general interdisciplinary questions, requirements, evaluation)
• What is Privacy Feedback and Awareness? Examples ...
Only these ^^^ friends should see it ^^^
Nobody else should even know I communicated with them
Who are (groups of) recipients in this network anyway?
What happens with my data? What can I do about this?
1. What can data mining do for privacy?
Case study FreeBu: a tool that uses
community-detection algorithms
for helping users perform audience
management on Facebook
An F&A tool for audience management
FreeBu (1): circle
FreeBu (2): circle
FreeBu (3): map
FreeBu (4): column
FreeBu (5): rank
FreeBu is interactive, but does it give a good starting
point? Testing against 3 ground-truth groupings and finding
“the best“ community-detection algorithm
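The modularity-maximisation idea behind such community-detection algorithms can be illustrated with a short sketch. The snippet below (illustrative only; not FreeBu's actual code, and the toy ego network and groupings are made up) computes Newman modularity Q, the quantity that modularity-maximisation algorithms try to maximise, for two candidate friend groupings:

```python
def modularity(edges, communities):
    """Newman modularity Q of a node partition of an undirected graph:
    Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j)."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    comm = {node: i for i, nodes in enumerate(communities) for node in nodes}
    edge_set = {frozenset(e) for e in edges}
    q = 0.0
    nodes = list(degree)
    for u in nodes:
        for v in nodes:
            if comm[u] != comm[v]:
                continue  # delta term: only same-community pairs contribute
            a = 1.0 if u != v and frozenset((u, v)) in edge_set else 0.0
            q += a - degree[u] * degree[v] / (2.0 * m)
    return q / (2.0 * m)

# Toy ego network: two dense friend triangles joined by one edge
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
good = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
bad = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])
print(good, bad)  # the two-group partition scores higher
```

A maximisation algorithm (e.g. greedy agglomeration) would search over partitions for the highest Q and propose that grouping as the starting point for audience management.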
FreeBu: better than Facebook Smart Lists for access control
• User experiment, n=16
• 2 groups, same interface (circle); algorithm: hierarchical modularity-maximisation vs. Facebook Smart Lists
• Task: think of 3 posts that you wouldn‘t want everybody to see; select from the given groups those who should see them
• Result: [chart]
FreeBu: What do users think?
• Two user studies with 12 and 147 participants, respectively
• Method: exploratory, mixed methods (interview, questionnaire, log
analysis)
• Results:
▫ Affordances: grouping for access control, reflection/overview,
(unfriending)
▫ Visual effects on attention – examples: “map“ & “rank“ visualisations
More observations
• No relationship of tool appreciation with privacy
concerns
• “don‘t tell my friends I am using your tool to spy on
them“
• “don‘t give these data to your colleague“
• “how can you show these photos [in an internal
presentation] without getting your friends‘ consent
first?“
• Trust in Facebook > trust in researchers & colleagues?
• Or: machines / abstract people vs. concrete people?
• Recognition of privacy interdependencies? (cf. discussion of „choice“ earlier today)
• Feedback tools are themselves spying tools ...
Lessons learned
• Social privacy trumps institutional privacy
• Change in attitudes or behaviour takes time
• No graceful degradation w.r.t. usability:
▫ Tools that are <100% usable are NOT used AT ALL.
• What is GOOD? What is BETTER?
2. Beyond privacy:
discrimination/fairness
“Privacy is not the problem“
• Privacy, social justice, and democracy
• View 1: Privacy is a problem (partly) because its
violation may lead to discrimination.
“Data mining IS discrimination“
32
“Privacy is not the problem“
• View 2: Privacy is one of a set of social issues.
Discrimination-aware data mining
(Pedreschi, Ruggieri, & Turini, 2008,
+ many since then)
• PD and PND items: potentially (not) discriminatory
– goal: want to detect & block mined rules such as
purpose=new_car & gender=female → credit=no
– measures of discriminatory power of a rule include
elift (B&A → C) = conf (B&A → C) / conf (B → C),
where A is a PD item and B a PND item
Note: 2 uses/tasks of data mining here:
• Descriptive: “In the past, women who got a loan for a new car often defaulted on it.“
• Prescriptive: (Therefore) “Women who want a new car should not get a loan.“
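The elift measure can be sketched in a few lines of Python. This is a minimal illustration of the definition above, not the DCUBE/DADM implementation; the toy records and attribute names are hypothetical:

```python
def conf(records, antecedent, consequent):
    """Confidence of rule antecedent -> consequent over a list of dicts."""
    covered = [r for r in records
               if all(r.get(k) == v for k, v in antecedent.items())]
    if not covered:
        return 0.0
    hits = [r for r in covered
            if all(r.get(k) == v for k, v in consequent.items())]
    return len(hits) / len(covered)

def elift(records, pd_item, pnd_items, consequent):
    """elift(B & A -> C) = conf(B & A -> C) / conf(B -> C),
    with A a potentially discriminatory (PD) item and B the PND context."""
    both = {**pnd_items, **pd_item}
    base = conf(records, pnd_items, consequent)
    return conf(records, both, consequent) / base if base else float("inf")

# Hypothetical toy credit data (purely illustrative values)
records = [
    {"purpose": "new_car", "gender": "female", "credit": "no"},
    {"purpose": "new_car", "gender": "female", "credit": "no"},
    {"purpose": "new_car", "gender": "female", "credit": "yes"},
    {"purpose": "new_car", "gender": "male", "credit": "yes"},
    {"purpose": "new_car", "gender": "male", "credit": "no"},
    {"purpose": "new_car", "gender": "male", "credit": "yes"},
]
e = elift(records, {"gender": "female"}, {"purpose": "new_car"}, {"credit": "no"})
print(e)  # > 1 means the PD item raises the refusal rate above the context baseline
```

A rule whose elift exceeds a chosen threshold would be flagged (detection) or blocked (prevention).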
Limitations of classical DADM

Detection
• Constraint-oriented DADM: can only detect discrimination by pre-defined features / constraints. Ex.: PD(female), PND(haschildren), but discrimination of mothers.
• Exploratory DADM: exploratory data analysis supports feature construction, new feature analyses.

Avoidance of creation
• Constraint-oriented DADM: fully automatic decision making cannot implement the legal concept of „treat equal things equally and different things differently“ (AI-hard).
• Exploratory DADM: semi-automated decision support. Sanitized rules → sanitized minds? Salience, awareness, reflection → better decisions?
Exploratory DADM: DCUBE-GUI
Left: rule count (size) vs. PD/non-PD (colour)
Right: rule count (size) vs. AD-measure (rainbow-colours scale)
Evaluation: Comparing c & eDADM

Detection
• Constraint-oriented DADM: can only detect discrimination by pre-defined features / constraints. Ex.: PD(female), PND(haschildren), but discrimination of mothers.
• Exploratory DADM: exploratory data analysis supports feature construction, new feature analyses.

Avoidance of creation
• Constraint-oriented DADM: fully automatic decision making (“hiding bad patterns“, black box) cannot implement the legal concept of „treat equal things equally and different things differently“ (AI-hard).
• Exploratory DADM: semi-automated decision support (“highlighting bad patterns“, white box). Sanitized rules → sanitized minds? Salience, awareness, reflection → better decisions?
Online experiment with 215 US mTurkers
• Framing: prevention (bank) vs. detection (agency)
• Tasks: 3 exercise tasks, 6 assessed tasks; $6.00 show-up fee, $0.25 performance bonus per assessed task
• Questionnaire: demographics, quant/bank job, experience with discrimination
Dabiku is a Kenyan national. She is single and has no children. She has been
employed as a manager for the past 10 years. She now asks for a loan of $10,000
for 24 months to set up her own business. She has $100 in her checking account
and no other debts. There have been some delays in paying back past loans.
Decision-making scenario

Task structure
• Vignette, describing applicant and application
• Rules: positive/negative risks, flagged
• Decision and motivation, optional comment

Required competencies
• Discard discrimination-indexed rules
• Aggregate rule certainties
• Justify decision by categorising risk factors
Rule visualisation by treatment
• Constrained DADM: hide bad features (prevention scenario)
• Exploratory DADM: flag bad features (detection scenario)
• (not discrimination-aware) DM: neither flagged nor hidden
Results: Actionability and decision quality

Decisions and motivations
• DA versus DADM: more correct decisions in DADM, more correct motivations in DADM, no performance impact

Biases
• Discrimination persistent in cDADM
• ‘‘I dropped the -.67 number a little bit because it included her being a female as a reason.’’

Relative merits
• Constrained DADM better for prevention; exploratory DADM better for detection

Berendt & Preibusch. Better decision support through exploratory discrimination-aware data mining. In: Artificial Intelligence and Law, 2014.
“Privacy is not the problem“
• View 3: Heightened privacy concerns are just a symptom of something more general being wrong. (E.g. discrimination – underlying definition of fairness – who gets to decide?)
Discrimination-aware data mining (Pedreschi, Ruggieri, & Turini, 2008, + many since then)
Goal: detect the descriptive use (“In the past, women who got a loan for a new car often defaulted on it.“) AND/OR block the prescriptive use (“Women who want a new car should not get a loan.“) – i.e., push it below a threshold.
What we did
• an interactive tool, DCUBE-GUI
• a conceptual analysis of
▫ (anti-)discrimination as modelled in data mining (“DADM“)
▫ unlawful discrimination as modelled in law
• a framework: constraint-oriented vs. exploratory DADM
• two user studies (n=20, 215) with DADM as decision support that showed
▫ DADM can help make better decisions & motivations
▫ cDADM / eDADM better for different settings
▫ sanitized patterns are not sufficient to make sanitized minds
“Privacy is not the problem“
• Privacy, social justice, and democracy
• View 1: Privacy is a problem (partly) because its
violation may lead to discrimination.
• View 2: Privacy is one of a set of social issues.
• View 3: Heightened privacy concerns are just a
symptom of something more general being
wrong. (e.g. Discrimination – underlying
definition of fairness – who gets to decide?)
Lessons learned
• Privacy by design?!
• A systems approach is needed
[Diagram: “Multi-stakeholder information systems“ – diverse stakeholders (experts, users); information systems: interactive systems (e.g. exploratory analysis), value-sensitive software design, algorithms, development; “no people; solutionism“; disciplines involved: HCI, AI / data mining, IS science, sociology, politics, law, education]
3. Towards sustainable solutions
Effectiveness of “ethical apps“?
Hudson et al. (2013):
• What makes people buy a fair-trade product?
• Informational film shown before buying decision?
▫ NO
• Having to make the decision in public?
▫ NO
• Some prior familiarity with the goals and activities
of fair-trade campaigns as well as broader
understanding of national and global political issues
that are only peripherally related to fair trade?
▫ YES
Rather: long-term educational campaigns
• “[W]hile latest technologies allow us to do plenty of easy
things on the cheap, those easy things are not necessarily the
ones that matter. Perhaps it's not even technology that is at
fault here.
• Rather, it's a choice between stand-alone apps that seek to
change our behavior on the fly and sophisticated, content-rich
apps—integrated into a broader educational strategy—that
might deepen our knowledge about a given subject in the long
term.
• And while there are plenty of news apps, having citizens
actually engage with the long-form content that those apps
provide—let alone understand the causes of the greenhouse
effect or the intricacies of world trade—is a task that might
require a different, app-free strategy.”
Morozov (2013)
Where to get a captive audience for that?
• Schools, (universities)
• Schools: lots of materials, little knowledge about effects
• Where there was evaluation, no big effects
▫ (notable exception: SPION Privacy Manual, in Dutch)
• Mostly short-term interventions
• With limited scope and often unclear concepts
→ We developed our own lesson series spanning 10 double hours (and carried it out)
Informatics
• Trackers; profile and behavioural data
• Basic structure of data mining models (correlations in “Big Data“ instead of causality)
• Application of descriptive models for prediction → TIDAP (total intransparency of data analysis and processing)
• Ex. 1: Association rule learning with Apriori
• Ex. 2: Regression analysis for prediction

Economics
• Use of data by Facebook for third parties (business models and customer loyalty) → advertising
• Customer segmentation and „weblining“ (use of data mining by third parties) → access to loans, insurance, ...
• Usage contexts of other third parties → access to education, work, ...?

Society and politics
• The fundamental right of informational self-determination and threats to it: chilling effects created by panoptism and TIDAP
• Plurality of opinions as a characteristic of democracy and threats to it: “Weblining“ via TIDAP
• Freedom of contract vs. other fundamental rights of participation that the state has to protect actively
• Cf. View 3: Heightened privacy concerns are just a symptom of something more general being wrong (e.g. notions of fairness, control, freedom of speech)
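The Apriori example from the lesson series can be sketched compactly. The following pure-Python level-wise search and the "tracker co-occurrence" transactions are illustrative assumptions, not the actual classroom material:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Frequent itemsets via Apriori's level-wise search with subset pruning."""
    n = len(transactions)
    level = {frozenset([i]) for t in transactions for i in t}
    level = {c for c in level
             if sum(c <= t for t in transactions) / n >= min_support}
    frequent, k = {}, 1
    while level:
        for c in level:
            frequent[c] = sum(c <= t for t in transactions) / n
        # join k-itemsets into (k+1)-itemset candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # prune candidates with any infrequent k-subset, then check support
        level = {c for c in candidates
                 if all(frozenset(s) in frequent for s in combinations(c, k))
                 and sum(c <= t for t in transactions) / n >= min_support}
        k += 1
    return frequent

# Hypothetical transactions: which trackers co-occur on visited pages
transactions = [frozenset(t) for t in
                [{"facebook", "doubleclick"},
                 {"facebook", "doubleclick", "ga"},
                 {"doubleclick", "ga"},
                 {"facebook", "ga"},
                 {"facebook", "doubleclick"}]]
freq = apriori(transactions, min_support=0.6)
print(len(freq), freq[frozenset({"facebook", "doubleclick"})])
```

From the frequent itemsets, association rules (and their confidence) are read off in a second pass, which is where measures like elift plug in for the discrimination-aware variants discussed earlier.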
Plan for schools (utopian?!):
Course & curriculum overview (ENISA Report 2014)
• Goals: Knowledge, reflection/attitudes, action orientation
• Module 1: Different notions of “security”: safety, e-safety, security,
cybersecurity, security and privacy, IT security and national security,
“good and bad hackers”, …
• Modules 2-6: “Security“ in the sense of ...
2. protection against inappropriate content and undesired audiences
& contacts
3. protection of personal data and privacy
4. IT Security
5. protection of fundamental rights and democracy
6. protection against procrastination
• Duration proposal: 1-2 days – 1 year
• Stakeholders: ECDL Foundation, ISC, SANS, European Schoolnet /
etwinning, national and regional teacher (training) associations
Plan for university teaching (concrete):
Interlinking two courses
Knowledge and the Web
• Data interoperability and
semantics
– …
• Data heterogeneity and
combining data
– … <some topics mandatory only for 6p>
• From data to knowledge
– …
• Data in context
Privacy and Big Data
• Legal and ethical issues
of Big Data
– ...
• Data and database
security
– ...
• Privacy techniques
– …
– Data publishing/mining and privacy
– Data publishing/mining and discrimination
Consultancy on privacy issues in their projects
Lessons learned
• How to measure that privacy (privacy awareness, knowledge, behaviour, outcome ...) has become BETTER?
• In doing that, how can we avoid another iteration of undue “reification of data“ (Kitchin)?
• We need to enlist the computer scientists / take part as CSers in addressing “Big Data“'s problems – but:
• The really hard part is to ask CSers to depart from their favourite basic assumption, which comes in different flavours:
▫ If there is a problem, it‘s because someone has too little information.
▫ Problems can be fixed.
▫ There is a right and a wrong.
Summary
1. What can data mining do for privacy?
2. Beyond privacy: fairness, democracy
3. Towards sustainable solutions
Many thanks!
Banksy, Marble Arch, London, 2005
References
pp. 5-6: Berendt, B., Günther, O., & Spiekermann, S. (2005). Privacy in E-Commerce: Stated preferences vs. actual behavior. Communications of the ACM,
48(4), 101-106. http://warhol.wiwi.hu-berlin.de/~berendt/Papers/p101-berendt.pdf
p. 10:
• Altman, I. (1976). Privacy: A conceptual analysis. Environment and Behaviour, 8(1), 7-29.
• Petronio, S. (2002). Boundaries of Privacy: Dialectics of Disclosure. Albany, NY, USA: SUNY.
• Gürses, S.F. & Berendt, B. (2010). The Social Web and Privacy. In E. Ferrari & F. Bonchi (Eds.), Privacy-Aware Knowledge Discovery: Novel Applications and New Techniques. Boca Raton, FL: Chapman & Hall/CRC Press, Data Mining and Knowledge Discovery Series. http://www.cosic.esat.kuleuven.be/publications/article-1304.pdf
p. 13: Berendt, B. (2012). More than modelling and hiding: Towards a comprehensive view of Web mining and privacy. Data Mining and Knowledge
Discovery, 24 (3), 697-737. http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_2012_DAMI.pdf
p. 15: Berendt, B. (2012). Data mining for information literacy. In D.E. Holmes and L.C. Jain. (Eds.), Data Mining: Foundations and Intelligent Paradigms.
Springer. http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_2012_DM4IL.pdf
pp. 19ff.: Gao, Bo; Berendt, Bettina. Circles, posts and privacy in egocentric social networks: An exploratory visualization approach, ASONAM, Niagara Falls,
Canada, 25-28 August 2013, 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pages 792-796, IEEE.
https://lirias.kuleuven.be/bitstream/123456789/424074/1/gao_berendt_2013.pdf
p. 25: Berendt, B.; Gao, B. Friends and Circles — A Design Study for Contact Management in Egocentric Online Social Networks. In Online Social Media
Analysis and Visualization, Springer, 2014.
pp. 36ff.: Berendt, B. & Preibusch, S. (2014). Better decision support through exploratory discrimination-aware data mining: foundations and empirical
evidence. Artificial Intelligence and Law, 22 (2), 175-209. http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_preibusch_2014.pdf
p. 37: Gao, B. & Berendt, B. (2011). Visual Data Mining for Higher-level Patterns: Discrimination-Aware Data Mining and Beyond. In Benelearn 2011.
Proceedings of the Twentieth Belgian Dutch Conference on Machine Learning The Hague, May 20 2011 (pp. 45-52).
http://www.liacs.nl/~putten/benelearn2011/Benelearn2011_Proceedings.pdf
p. 50: Hudson, M., Hudson, I., & Edgerton, J.D. (2013). Political Consumerism in Context: An Experiment on Status and Information in Ethical Consumption
Decisions. American Journal of Economics, 72 (4), 1009-1037. http://dx.doi.org/10.1111/ajes.12033
p. 51: Morozov, E. (2013). Hey, Big Fair-Trade Spender. Apps promote ethical purchases, but do they inspire deeper learning? Slate.
http://www.slate.com/articles/technology/future_tense/2013/09/goodguide_fairphone_ethical_shopping_apps_miss_the_point.html
pp. 52f: Berendt, B., Dettmar, G., Demir, C., & Peetz, T. (2014). Kostenlos ist nicht kostenfrei. LOG IN 178/179, 41-56. Links to teaching materials and English
summary at http://people.cs.kuleuven.be/~bettina.berendt/Privacy-education/
p. 53: Berendt, B., De Paoli, S., Laing, C., Fischer-Hübner, S., Catalui, D., & Tirtea, R. (in press). Roadmap for NIS education programmes in Europe. ENISA.
p. 54: http://people.cs.kuleuven.be/~bettina.berendt/teaching/2014-15-1stsemester/kaw/
Backup

Transparency

A legal view of „knowledge is power“
• Privacy: opacity of the individual as a data subject
• Data protection: transparency and accountability obligations of data controllers
Berendt: Advanced databases, 2012, http://www.cs.kuleuven.be/~berendt/teaching
Article 8 of the European Convention on Human Rights
- a protected sphere in which one is „let alone“ (mostly)

Article 8 – Right to respect for private and family life
1. Everyone has the right to respect for his private and family life, his home and his correspondence.
2. There shall be no interference by a public authority with the exercise of this right except such as is in accordance with the law and is necessary in a democratic society in the interests of national security, public safety or the economic well-being of the country, for the prevention of disorder or crime, for the protection of health or morals, or for the protection of the rights and freedoms of others.
Privacy as control – Data protection (1)
OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data (aka Fair Information Practices)
- Similarly encoded in the EU Directives relating to privacy
• Collection limitation: Data collectors should only collect information that is necessary, and should do so by lawful and fair means, i.e., with the knowledge or consent of the data subject.
• Data quality: The collected data should be kept up-to-date and stored only as long as it is relevant.
• Purpose specification: The purpose for which data is collected should be specified (and announced) ahead of the data collection.
• Use limitation: Personal data should only be used for the stated purpose, except with the data subject’s consent or as required by law.
Privacy as control – Data protection (2)
• Security safeguards: Reasonable security safeguards should protect collected data from unauthorised access, use, modification, or disclosure.
• Openness: It should be possible for data subjects to learn about the data controller’s identity, and how to get in touch with him.
• Individual participation: A data subject should be able to obtain from a data controller confirmation of whether or not the controller has data relating to him, to obtain such data, to challenge data relating to him and, if the challenge is successful, to have the data erased, rectified, completed or amended.
• Accountability: Data controllers should be accountable for complying with these principles.
Contract freedom?
Informatics / Economics / Society and politics – lesson materials (translated from Dutch):
• Text (written for the SPION Privacy Manual) + software tools for protection against data collection
• Text (website for a broad audience)
• Text (quality newspaper)
• Text (for a seminar; Facebook‘s Data Use Policy)
• Text (quality newspaper)
• (Text: see left)
• Role play
• Web API (Facebook) + data mining algorithm
• Data mining online tool (Preference Tool: “Predicting personality from Faceb. Likes“)
• Documentation about the tool, scientific article (psychology)
• Texts (court ruling; scientific article, law)
• Text (scientific article, law)
• Role play