SHARP High-Throughput Phenotyping Jyoti Pathak, Ph.D.

Report
Strategic Health IT Advanced Research
Projects (SHARP)
Area 4: Secondary Use of EHR Data
Project 3: High-Throughput Phenotyping
Project Lead: Jyotishman Pathak, PhD
PI: Christopher G. Chute, MD, DrPH
June 12, 2012
Electronic health records (EHRs) driven phenotyping
• Overarching goal
  • To develop high-throughput automated techniques and algorithms that operate on normalized EHR data to identify cohorts of potentially eligible subjects on the basis of disease, symptoms, or related findings
SHARPn High-Throughput Phenotyping
©2012 MFMER | slide-2
Current HTP project themes
• Standardization of phenotype definitions
• Library of phenotyping algorithms
• Phenotyping workbench
• Machine learning techniques for phenotyping
• Just-in-time phenotyping
Algorithm Development Process - Modified
• Phenotype Algorithm: standardized and structured representation of phenotype definition criteria
  • Use the NQF Quality Data Model (QDM)
• Rules: conversion of structured phenotype criteria into executable queries
  • Use JBoss® Drools (DRLs)
• Data (NLP, SQL): standardized representation of clinical data
  • Create new and re-use existing clinical element models (CEMs)
• Other diagram labels: Transform, Mappings, Semi-Automatic Execution, Evaluation, Visualization
[Welch et al. 2012]
[Thompson et al., submitted 2012]
[Li et al., submitted 2012]
NQF Quality Data Model (QDM)
• Standard of the National Quality Forum (NQF)
• A structure and grammar to represent quality measures in a standardized format
• Groups of codes in a code set (ICD-9, etc.)
  • "Diagnosis, Active: steroid induced diabetes" using "steroid induced diabetes Value Set GROUPING (2.16.840.1.113883.3.464.0001.113)"
• Supports temporality & sequences
  • AND: "Procedure, Performed: eye exam" > 1 year(s) starts before or during "Measurement end date"
• Implemented as set of XML schemas
• Links to standardized terminologies (ICD-9, ICD-10, SNOMED-CT, CPT-4, LOINC, RxNorm etc.)
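The two QDM ideas above (value sets as groups of codes, and temporal constraints relative to a reference date) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names, the helper functions, and the codes inside the value set are hypothetical, and the window check is a simplified stand-in for QDM temporal operators, not their formal semantics. Only the value set name and OID come from the slide.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ValueSet:
    oid: str
    name: str
    codes: frozenset  # hypothetical placeholder codes, not the real value set content


@dataclass(frozen=True)
class Event:
    category: str  # QDM-style category, e.g. "Diagnosis, Active"
    code: str
    start: date


def matches(event, category, value_set):
    """True if the event has the given category and its code is in the value set."""
    return event.category == category and event.code in value_set.codes


def starts_within_years_before(event, reference, years):
    """True if the event starts on or before `reference` and at most `years` years earlier."""
    if event.start > reference:
        return False
    try:
        earliest = reference.replace(year=reference.year - years)
    except ValueError:
        # reference is Feb 29 and the target year is not a leap year
        earliest = reference.replace(year=reference.year - years, day=28)
    return event.start >= earliest


steroid_dm = ValueSet(
    oid="2.16.840.1.113883.3.464.0001.113",
    name="steroid induced diabetes Value Set GROUPING",
    codes=frozenset({"PLACEHOLDER-1", "PLACEHOLDER-2"}),
)
diagnosis = Event("Diagnosis, Active", "PLACEHOLDER-1", date(2011, 3, 1))
print(matches(diagnosis, "Diagnosis, Active", steroid_dm))
```

Keeping the value set as data, separate from the matching logic, mirrors the way a QDM criterion refers to a value set by OID rather than listing codes inline.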
116 Meaningful Use Phase I Quality Measures
Example: Diabetes & Lipid Mgmt. - I
Human readable HTML
Example: Diabetes & Lipid Mgmt. - II
Computable XML
Drools-based Phenotyping Architecture
• Clinical Element Database
• Data Access Layer
• Business Logic
• Transformation Layer
  • Transform physical representation → Normalized logical representation (Fact Model)
• Inference Engine (Drools)
• Service for Creating Output (File, Database, etc.)
• Example output: List of Diabetic Patients
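The flow through this architecture (physical rows, then a normalized fact model, then rule evaluation, then an output list) can be sketched as follows. This is a minimal sketch, not the SHARPn implementation: plain Python functions stand in for the Drools inference engine, and the fact class, function names, row layout, and codes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PatientFact:
    """Normalized logical representation of one patient (hypothetical fact model)."""
    patient_id: str
    diagnosis_codes: set = field(default_factory=set)


def to_facts(rows):
    """Transformation layer: group physical (patient_id, code) rows into one fact per patient."""
    facts = {}
    for patient_id, code in rows:
        fact = facts.setdefault(patient_id, PatientFact(patient_id))
        fact.diagnosis_codes.add(code)
    return list(facts.values())


def diabetes_rule(fact, diabetes_codes):
    """Stand-in for an inference-engine rule: fires if any diabetes code is present."""
    return bool(fact.diagnosis_codes & diabetes_codes)


def list_diabetic_patients(rows, diabetes_codes):
    """Output service: sorted list of patient ids for which the rule fired."""
    return sorted(
        fact.patient_id
        for fact in to_facts(rows)
        if diabetes_rule(fact, diabetes_codes)
    )


# Hypothetical rows as they might come back from a data access layer.
rows = [("p1", "D1"), ("p1", "H1"), ("p2", "H1"), ("p3", "D2")]
print(list_diabetic_patients(rows, {"D1", "D2"}))
```

Separating the transformation step from the rule keeps the rule independent of how the clinical element database physically stores its data, which is the point of the fact model in the diagram.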
Automatic translation from NQF QDM criteria to Drools
[Li et al., submitted 2012]
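To give a feel for what translating a structured criterion into a rule might involve, the sketch below renders a single "code is in value set" criterion as rule text in the general shape of a Drools DRL rule (rule / when / then / end). It is not the translator described by Li et al.: the function name, the fact type, the field name, and the `Match` class in the rule consequence are assumptions, and the generated text has not been run against a Drools engine.

```python
def criterion_to_drl(rule_name, fact_type, field, codes):
    """Render one value-set membership criterion as DRL-style rule text.

    `fact_type`, `field`, and the `Match` class used in the consequence are
    hypothetical names chosen for illustration.
    """
    code_list = ", ".join(f'"{code}"' for code in sorted(codes))
    return "\n".join([
        f'rule "{rule_name}"',
        "when",
        f"    $f : {fact_type}( {field} in ( {code_list} ) )",
        "then",
        "    insert( new Match( $f.getPatientId() ) );",
        "end",
    ])


print(criterion_to_drl("Diabetes diagnosis", "DiagnosisFact", "code", {"B", "A"}))
```

Sorting the codes makes the generated text deterministic, which helps when rule files are compared or versioned.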
The “executable” Drools flow
Phenotype library and workbench - I
http://phenotypeportal.org
1. Converts QDM to Drools
2. Rule execution by querying the CEM database
3. Generate summary reports
Phenotype library and workbench - II
http://phenotypeportal.org
Phenotype library and workbench - III
Machine learning and HTP - I
• Machine learning and association rule mining
• Manual creation of algorithms takes time
• Let computers do the “hard work”
• Validate against expert-developed ones
[Carroll et al. 2011]
Machine learning and HTP - II
• Origins from sales data
• Items (columns): co-morbid conditions
• Transactions (rows): patients
• Itemsets: sets of co-morbid conditions
• Goal: find all itemsets (sets of conditions) that frequently co-occur in patients.
  • One of those conditions should be DM.
• Support: # of transactions the itemset I appeared in
  • Support({TB, DLM, ND})=3
• Frequent: an itemset I is frequent, if support(I)>minsup
[Table: patients 001 to 005 (rows) by conditions TB, DLM, ND, …, IEC (columns), with Y marking a recorded condition; cell positions not recoverable]
[Figure: itemset lattice over items A, B, C, D, including AB, AC, AD, BC, BD, CD, ABD, ACD; X marks infrequent itemsets]
[Simon et al. 2012]
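The support and frequency definitions above can be made concrete with a small sketch. It uses brute-force enumeration of itemsets rather than a pruned search such as Apriori, so it is only suitable for a handful of items. The patient data are hypothetical toy values and are not the table from the slide; the function names and the `must_contain` parameter are likewise assumptions.

```python
from itertools import combinations

# Hypothetical toy data: each patient (transaction) maps to the set of
# conditions (items) recorded for that patient. Illustrative values only.
patients = {
    "001": {"DM", "TB", "DLM", "ND"},
    "002": {"DM", "TB", "DLM", "ND"},
    "003": {"DM", "TB"},
    "004": {"ND"},
    "005": {"TB", "DLM", "ND"},
}


def support(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for recorded in transactions.values() if items <= recorded)


def frequent_itemsets(transactions, minsup, must_contain=None):
    """All itemsets with support > minsup, optionally restricted to those
    containing `must_contain` (for example DM). Brute force, not Apriori."""
    all_items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(1, len(all_items) + 1):
        for combo in combinations(all_items, size):
            if must_contain is not None and must_contain not in combo:
                continue
            count = support(combo, transactions)
            if count > minsup:
                result[frozenset(combo)] = count
    return result


print(support({"TB", "DLM", "ND"}, patients))  # 3 in this toy data
print(frequent_itemsets(patients, minsup=2, must_contain="DM"))
```

Restricting the search to itemsets that contain DM reflects the bullet that one of the co-occurring conditions should be DM; a practical implementation would also prune supersets of infrequent itemsets instead of enumerating every combination.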
Just-in-Time phenotyping - I
Transfusion-related Acute Lung Injury (TRALI)
Transfusion-associated Circulatory Overload (TACO)
Electronic Health Records and Phenomics
Just-in-Time phenotyping - II
TRALI/TACO “sniffer”
Active Surveillance for TRALI and TACO
Of the 88 TRALI cases correctly identified by the CART algorithm, only 11 (12.5%) of these were reported to the blood bank by the clinical service.
Of the 45 TACO cases correctly identified by the CART algorithm, only 5 (11.1%) were reported to the blood bank by the clinical service.
Publications to date (conservative)
[Bar chart: counts of papers, abstracts, and manuscripts under review for Year 1 (2011), Year 2 (2012), and Year 3 (2013); y-axis from 0 to 14; individual bar values not recoverable]
2011 Milestones
✓ Standardized definitions for phenotype criteria
✓ Rules-based environment for phenotype algorithm execution
✓ National library for standardized phenotype definitions (collaboration with eMERGE)
✓ Machine learning techniques for algorithm definitions
✓ Online, real-time phenotype execution
✓ Phenotyping algorithm authoring environment
2012 Milestones
• Machine learning techniques for algorithm definitions
• Online, real-time phenotype execution
• Collaboration with NQF, Query Health and i2b2 infrastructures
• Use cases and demonstrations
  • MU quality metrics (w/ NQF, Query Health)
  • Cohort identification (w/ eMERGE, PGRN)
  • Value analysis (w/ Mayo CSHCD, REP)
  • Clinical trial alerting (w/ Mayo Cancer Ctr./CTSA)
Project 3: Collaborators & Acknowledgments
• CDISC (Clinical Data Interchange Standards Consortium)
  • Rebecca Kush, Landen Bain
• Centerphase Solutions
  • Gary Lubin, Jeff Tarlowe
• Group Health Seattle
  • David Carrell
• Harvard University/MIT
  • Guergana Savova, Peter Szolovits
• Intermountain Healthcare/University of Utah
  • Susan Welch, Herman Post, Darin Wilcox, Peter Haug
• Mayo Clinic
  • Cory Endle, Rick Kiefer, Sahana Murthy, Gopu Shrestha, Dingcheng Li, Gyorgy Simon, Matt Durski, Craig Stancl, Kevin Peterson, Cui Tao, Lacey Hart, Erin Martin, Kent Bailey, Scott Tabor, Chris Chute
