Experience-Based Access Management (EBAM)

Modeling and Detecting Anomalous Topic Access
Siddharth Gupta1, Casey Hanson2, Carl A. Gunter3,
Mario Frank4, David Liebovitz5, Bradley Malin6
1,2,3,4Department of Computer Science, 3,5Department of Medicine,
6Department of Biomedical Informatics
1,2,3University of Illinois at Urbana-Champaign, 4University of California, Berkeley,
5Northwestern University, 6Vanderbilt University
Outline of the talk
• Motivation and Challenges
• Our Contributions
• Dataset Description
• Random Topic Access (RTA) Model
• Random Topic Access Detection (RTAD) Model
• Evaluation and Results
EMR Access Breach
Reported in April 2013
• The University of Florida: 2 offenders illegitimately accessed the records of
15,000 patients over 3 years (March 2009 to October 2012).
• Personal information, including names, addresses, dates of birth, medical
record numbers and Social Security numbers, was compromised for the
purposes of billing fraud.
• One of the offenders was an insider in the hospital without prior.
• How can we efficiently model and detect these types of attacks in the
healthcare system?
Motivation
• Two broad classes of threats:
• Inside Threats: behaviors of hospital users (staff) that adversely affect the
healthcare institution, such as financial fraud, medical identity theft and
curiosity accesses to EMRs.
• Outside Threats: an outside entity hires an insider to commit fraud, a visitor
accesses records on open computers in some scenarios, or an untrustworthy
patient seeks information from other patients' records.
• Ramifications: Irreversible violation of patient privacy and subsequent high
cost for hospitals.
• Deterrent: The current legal deterrent is a set of regulations, such as
HIPAA and HITECH, which impose specific privacy rules for patient data and
financial penalties for violating them.
Classical Detection Methodologies
• Build a classifier on labeled data to differentiate
anomalous users from legitimate users.
• Real healthcare data is not labeled.
• Current methods use injection of synthetic
anomalous users and evaluate on them.
Random Object Access
• In healthcare information systems, the primary
mechanism for generating anomalous users is to
associate users with random patients in the dataset.
• We call this mechanism random object access (ROA).
• The resulting user does not appear to be a plausible
attacker in a real hospital setting.
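As a minimal sketch of the ROA idea above, a synthetic user can be built by drawing patients uniformly at random. The function name, the sampling with replacement, and the patient ID range are illustrative assumptions, not details from the slides.

```python
import random


def sample_roa_user(patient_ids, n_accesses, rng):
    """Return a synthetic access list of n_accesses patients.

    Assumption: patients are drawn uniformly at random with replacement,
    so the accesses share no common theme.
    """
    return [rng.choice(patient_ids) for _ in range(n_accesses)]


# Hypothetical patient IDs 1..1000 and a fixed seed for repeatability.
rng = random.Random(0)
roa_accesses = sample_roa_user(list(range(1, 1001)), n_accesses=20, rng=rng)
```

Because every patient is equally likely, such a user has no semantic focus, which is why it is easy to tell apart from real staff.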
Our Contributions
• Random Topic Access (RTA): we introduce and study a random topic
access (RTA) model aimed at users whose access may be illegitimate
but is not fully random, because it is focused on common semantic
themes.
• User Simulation: we use the latent topic framework to simulate
illegitimate users, modeling them as samples from a Dirichlet
distribution over topic multinomials.
• Anomaly Detection Framework: we use RTA to detect and evaluate
users with suspicious access patterns.
Data Set
Fig a) Summary Statistics for Audit Logs
Fig b) Summary Statistics for Patient Records
Random Topic Access (RTA) Model
• Random Topic Access (RTA) Model: a mechanism that uses latent
topic structures to represent real users in the population and to
allow the synthetic generation of semantically relevant
anomalous users.
• Topic modeling can provide a concise description of how a user
behaves in the context of their peers and the meaning of that
behavior.
• Model users as samples from a Dirichlet distribution over topic
multinomials.
Latent Dirichlet Allocation (LDA)
Diagnosis Raw Feature
Patient   1   2   3   ...   4500
1         0   1   0   ...   1

LDA

Diagnosis Topic Feature
Patient   Topic 1   Topic 2   Topic 3
1         0.2       0.1       0.70
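For reference, the standard LDA generative model can be written for this setting by treating each patient as a document and each diagnosis code as a word. The symbols below are notational assumptions rather than notation from the slides: K topics, P patients, topic-diagnosis distributions \phi_k, and patient-topic distributions \theta_p. The prior \gamma is named so as not to be confused with the \alpha used later to simulate users.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dir}(\beta), & k &= 1,\dots,K \\
\theta_p &\sim \mathrm{Dir}(\gamma), & p &= 1,\dots,P \\
z_{p,i} \mid \theta_p &\sim \mathrm{Mult}(\theta_p) \\
w_{p,i} \mid z_{p,i}, \phi &\sim \mathrm{Mult}(\phi_{z_{p,i}})
\end{align*}
```

Here w_{p,i} is the i-th diagnosis recorded for patient p and z_{p,i} is its latent topic. The fitted \theta_p corresponds to a row of the Diagnosis Topic Feature table, for example (0.2, 0.1, 0.70) for patient 1.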
Topic Distributions
Fig: Diagnosis topics, with panels for the Neoplasm Topic, Obstetric Topic and Kidney Topic.
Characterizing Users
Fig: User and accessed patient topic distributions, plotting P(Topic) and number of
accesses against Topic ID (Topics 1 to 3) for a user who accessed Patient 1
100 times and Patient 2 30 times.
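One plausible reading of this slide is that a user's topic distribution is the access-weighted average of the topic distributions of the patients they accessed. The sketch below implements that reading only; the function name and the topic vector for Patient 2 are hypothetical, and the slides do not confirm this exact formula.

```python
def user_topic_distribution(access_counts, patient_topics):
    """Average patient topic vectors, weighted by number of accesses.

    access_counts: {patient_id: number of accesses by this user}
    patient_topics: {patient_id: list of topic probabilities}
    Assumption: weights are proportional to raw access counts.
    """
    total = sum(access_counts.values())
    if total <= 0:
        raise ValueError("user has no accesses")
    n_topics = len(next(iter(patient_topics.values())))
    mix = [0.0] * n_topics
    for patient_id, count in access_counts.items():
        for topic, prob in enumerate(patient_topics[patient_id]):
            mix[topic] += count * prob / total
    return mix


# Patient 1's vector is taken from the LDA slide; Patient 2's is made up.
user = user_topic_distribution(
    {1: 100, 2: 30},
    {1: [0.2, 0.1, 0.7], 2: [0.6, 0.3, 0.1]},
)
```

Under this reading the user sits closer to Patient 1 in topic space, because Patient 1 accounts for 100 of the 130 accesses.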
Multidimensional Scaling: Patient Diagnosis
RTA: Simulating Users
• r ~ Dir(α) with n dimensions, where n is the number of topics.
a.) Directed or Masquerading User (α<1): an anomalous user of some specialty gains sole access
to the terminal of another user in the hospital.
b.) Purely Random User (α=1): the user is characterized by completely random behavior, with little
semantic congruence to the hospital setting.
c.) Indirect User (α>1): the user type resembles an even blend of the topics of many specialized users.
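The three user types above can be illustrated by sampling from a symmetric Dirichlet with different concentration values. This is a minimal sketch, assuming a symmetric Dir(α) and using the standard construction of normalizing independent Gamma(α, 1) draws; the function name and the choice of 5 topics are illustrative.

```python
import random


def sample_rta_user(n_topics, alpha, rng):
    """Draw one topic distribution r ~ Dir(alpha, ..., alpha).

    Uses the Gamma construction: normalize n_topics independent
    Gamma(alpha, 1) draws. Redraws in the rare case that every draw
    underflows to zero, which can happen for very small alpha.
    """
    while True:
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(n_topics)]
        total = sum(draws)
        if total > 0.0:
            return [d / total for d in draws]


rng = random.Random(0)
directed = sample_rta_user(5, 0.01, rng)   # mass concentrated on few topics
random_user = sample_rta_user(5, 1.0, rng)  # uniform over the simplex
indirect = sample_rta_user(5, 100.0, rng)  # close to an even blend
```

Small α pushes almost all probability onto one topic, α = 1 gives a uniform draw over all topic mixtures, and large α yields nearly equal weights.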
Population Distribution
A. Directed Users: α = 0.01 and α = 0.1
B. Purely Random Users: α = 1
C. Indirect Users: α = 100
Role Distribution
Fig: Real users and anomalous users (masquerading, purely random and indirect) for the
role NMH Resident Fellow CPOE.
Random Topic Access Detection (RTAD)
• Random Topic Access Detection (RTAD): an anomaly detection framework that
generates synthetic users using RTA and applies a standard spatial outlier
detection scheme, k-nearest neighbor (k-NN), for classification.
• Methodology
1. LDA: define patient topics, and user typing to represent users in the topic
space.
2. RTA user injection: generate three types of anomalous users and insert into
each role at a 5% mix rate.
3. Detection (k-NN): if the ratio of the avg. distance from a user to its k nearest
spatial neighbors to the avg. pairwise distance among those neighbors is
greater than a threshold, call the user anomalous.
4. Evaluation Metric: best Area Under the Curve (AUC) for each (α, role)
combination.
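The detection rule in step 3 can be sketched as follows. This is an illustration under stated assumptions: Euclidean distance in topic space (the slides do not name the metric), and hypothetical function names, example points, and threshold.

```python
import math
from itertools import combinations


def knn_ratio_score(user, others, k):
    """Ratio of the average distance from `user` to its k nearest
    neighbours to the average pairwise distance among those neighbours.

    Assumption: Euclidean distance between topic vectors.
    """
    if k < 2 or len(others) < k:
        raise ValueError("need k >= 2 and at least k other users")
    neighbours = sorted(others, key=lambda p: math.dist(user, p))[:k]
    to_user = sum(math.dist(user, p) for p in neighbours) / k
    pairs = list(combinations(neighbours, 2))
    among = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    if among == 0.0:
        # Neighbours coincide: any separation from them is maximally unusual.
        return math.inf if to_user > 0.0 else 0.0
    return to_user / among


def is_anomalous(user, others, k, threshold):
    """Flag the user when the k-NN ratio exceeds the threshold."""
    return knn_ratio_score(user, others, k) > threshold


# Hypothetical role: four real users focused on topic 1, one injected user.
real_users = [(0.80, 0.10, 0.10), (0.75, 0.15, 0.10),
              (0.85, 0.05, 0.10), (0.78, 0.12, 0.10)]
injected = (0.10, 0.10, 0.80)
flagged = is_anomalous(injected, real_users, k=3, threshold=2.0)
```

A user far from a tight cluster of peers gets a large ratio, while a user inside the cluster gets a ratio near or below 1. Sweeping the threshold produces the ROC curve whose area is the AUC in step 4.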
Results - I
The best AUC across all evaluated dimensions is plotted for each role that performs
poorly for α > 1.
Results - II
The best AUC across all evaluated dimensions is plotted for each role that performs
well or near average for α > 1.
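For readers unfamiliar with the metric, AUC can be computed directly from anomaly scores and labels as the probability that a randomly chosen anomalous user scores higher than a randomly chosen real user. The sketch below uses that rank-based definition; the function name and the example scores are illustrative and are not results from the talk.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: anomaly scores, higher meaning more suspicious
    labels: truthy for injected (anomalous) users, falsy for real users
    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one real user")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up scores: two injected users (label 1) and two real users (label 0).
example_auc = auc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])  # 3 of 4 pairs ranked correctly
```

An AUC of 1.0 means every injected user outranks every real user, and 0.5 corresponds to chance.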
Thank You!
Contact: [email protected]
