White River Computing / UNC Chapel Hill

Report
SCIENCE-DRIVEN INFORMATICS
FOR PCORI PPRN
Kristen Anton
UNC Chapel Hill/ White River Computing
Dan Crichton
White River Computing
February 3, 2014
From Information Design, Nathan Shedroff
Architecture – what is it?
Architecture:
• The fundamental organization of a system embodied in its
components, their relationships to each other and to the environment,
and the principles guiding its design and evolution (ANSI/IEEE Std.
1471-2000)
Architecture is decomposed into four core pieces:
• Process Architecture – describes the core processes for the system
• Data Architecture – describes the information models and data
standards for the system
• Application Architecture – Portals, tools, etc.
• Technology Architecture – Infrastructure elements
Architecture Development Approach
• Identify the drivers and requirements
• Create an architectural description of the system –
identified stakeholders, concerns and associated models
• Identify core architectural principles
• Separate the architecture into key viewpoints
• Create a decomposition of the system identifying the
elements and mapping to the requirements
• Identify the high-level flows and analyze them from the
process, information and application/technology
perspectives
• Generate the architectural models
Communicating an Architecture
• One of the major challenges is communicating an architecture
• Determine a useful view of the system for the stakeholder
• Projects have suffered because a useful view wasn’t provided
• Who are the PCORI stakeholders that care about the architecture?
• How do we communicate their care-abouts?
[Figure: the view is what you see; the viewpoint is where you look from (stakeholders)]
Software Development
• The organization, implementation and
deployment of the software should follow the
identification of an architecture which aligns
with the principles and needs of the
stakeholders
• The separation of the architecture into concerns
will let us determine what capabilities exist and
what capabilities need to be developed
• Ultimately this will help to ensure that a system
is deployed which will integrate
Recommended Software Development Approach
• Project Formulation (Jan 2014 – Mar 2014): Project Organization, Objectives, High Level Schedule and Project Plan
• System Formulation/Architecture (Feb 2014 – June 2014): High-Level Architecture for System and Data, Architecture, Data Flows, Initial Data Structure, etc.
• Site Development (June 2014 – June 2015): Development and deployment of the infrastructure and architecture; development of the core data model (consistent with PCORnet “universal” data model?)
Supporting science-driven research needs:
Case Study – Early Detection Research Network
(EDRN)
• Research network of collaborating scientists from
more than 40 institutions – international network of
networks
• Focus on identifying and validating biomarkers of
early-stage/preclinical cancer
Bioinformatics challenges in EDRN:
• Developing computing infrastructure that is “biomarker-centric”
• Improving research capability by enabling real-time access to a
variety of information that crosses institutional boundaries
Bioinformatics – Goals
Supporting science-driven research needs
• Coordinated discovery and validation of
biomarkers across cancer research centers to
increase accuracy of the results of studies
• Accommodating various data types
• Facilitation of analytics through data integration
and single-point access
• Support workflows associated with various types
of information
• Encouraging and supporting collaboration
Bioinformatics – Goals
Supporting science-driven research needs
• Linking highly diverse systems together to integrate and
present data for analytics
• Defining a comprehensive information model for describing
the problem space/ ontology
• Providing software interfaces for capture, discovery, and
access of data resources
• Providing a secure transfer and distribution infrastructure
• Enabling all data sources to be heterogeneous and
distributed
• Providing integrated portal for access to distributed data
• Providing bioinformatics tools/ pipelines for uniform data
processing
Bioinformatics
EDRN Knowledge Environment
Functional architecture: Services
• Data capture
• Data discovery
• Data access
• Data retrieval
• Data processing
• Data distribution
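The six services could be grouped behind one interface. A minimal sketch, assuming an in-memory store. The names `DataProduct` and `InMemoryKnowledgeService` and all method names are hypothetical, and the split between "access" (metadata) and "retrieval" (payload) is an assumption. None of this is taken from EDRN software or Apache OODT.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class DataProduct:
    product_id: str
    metadata: Dict[str, str]
    payload: bytes


class InMemoryKnowledgeService:
    """Hypothetical stand-in for the six services; not an EDRN or OODT API."""

    def __init__(self) -> None:
        self._store: Dict[str, DataProduct] = {}
        self.outbox: List[Tuple[str, str]] = []

    def capture(self, product: DataProduct) -> None:
        # Data capture: register a product together with its metadata.
        self._store[product.product_id] = product

    def discover(self, **criteria: str) -> List[str]:
        # Data discovery: ids of products whose metadata match every criterion.
        return sorted(
            pid for pid, p in self._store.items()
            if all(p.metadata.get(k) == v for k, v in criteria.items())
        )

    def access(self, product_id: str) -> Dict[str, str]:
        # Data access (assumed meaning): return a copy of the metadata record.
        return dict(self._store[product_id].metadata)

    def retrieve(self, product_id: str) -> bytes:
        # Data retrieval (assumed meaning): return the stored payload.
        return self._store[product_id].payload

    def process(self, product_id: str, transform: Callable[[bytes], bytes]) -> bytes:
        # Data processing: apply a caller-supplied transform to the payload.
        return transform(self.retrieve(product_id))

    def distribute(self, product_id: str, recipient: str) -> None:
        # Data distribution: queue the product for delivery to a recipient.
        self.outbox.append((recipient, product_id))
```

A caller would capture a product once and then reach it through the other five methods, which is one way to read "single-point access".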
Bioinformatics
EDRN Knowledge Environment
Information architecture: Data Model across EDRN projects
(“universal” data model)
• Representation of information associated with data
objects managed within the knowledge system
• Models for:
  • Biomarkers
  • Studies
  • Participants
  • Organs
  • Data generated from instruments (e.g. mass spec, arrays)
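The object models listed above might be sketched as plain record types. This is an illustration only: every class and field name below is hypothetical, and the real EDRN data model is not shown in these slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Organ:
    name: str


@dataclass
class Participant:
    participant_id: str
    protocol_id: str  # study (protocol) the participant belongs to


@dataclass
class Study:
    protocol_id: str
    title: str
    participants: List[Participant] = field(default_factory=list)


@dataclass
class InstrumentData:
    file_name: str
    instrument: str  # e.g. "mass spec" or "array"
    protocol_id: str


@dataclass
class Biomarker:
    name: str
    organs: List[Organ] = field(default_factory=list)
    protocol_ids: List[str] = field(default_factory=list)
    data_files: List[InstrumentData] = field(default_factory=list)

    def studied_in(self, study: Study) -> bool:
        # A biomarker is linked to a study through the shared protocol id.
        return study.protocol_id in self.protocol_ids
```

Linking objects by identifier rather than by nesting keeps each model independently storable, which suits data held at different institutions.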
Bioinformatics
EDRN Knowledge Environment
Information architecture: Data Model
• Relationships between and among objects
• Standard set of metadata elements that can be used for
annotating objects
• Multiple metadata schemata for machine usable
explanations of the metadata descriptions
• Metadata descriptions describe the inception and
composition of data
• Common language for describing data and associated
attributes: Common Data Elements (CDEs)
• Each CDE has a Uniform Resource Identifier (URI); its URL form
points to the CDE definition page and is used in XML standards
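One way a CDE URI could be carried in an XML record is as an attribute on the annotated element. A minimal sketch using Python's standard `xml.etree.ElementTree`. The `cde` attribute name, the element names, and the `example.org` URI are placeholders chosen for illustration, not the EDRN or NCI conventions.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class CommonDataElement:
    name: str
    uri: str          # URL form would resolve to the CDE definition page
    definition: str


def annotate(tag: str, value: str, cde: CommonDataElement) -> ET.Element:
    """Build an XML element whose meaning is pinned to a CDE by its URI."""
    elem = ET.Element(tag, attrib={"cde": cde.uri})
    elem.text = value
    return elem


# Hypothetical CDE; example.org is a placeholder, not a real repository URL.
ORGAN_SITE = CommonDataElement(
    name="organ_site",
    uri="http://example.org/cde/organ_site",
    definition="Anatomical site studied.",
)

record = ET.Element("specimen")
record.append(annotate("organSite", "lung", ORGAN_SITE))
xml_text = ET.tostring(record, encoding="unicode")
```

Because the URI travels with the value, two sites that name a field differently can still agree on what it means.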
EDRN Knowledge Environment
[Figure: component diagram. Recoverable labels:]
• Public Portal
• Bioinformatics Tools
• EDRN science data results (local, distributed and varying degrees of validation)
• eCAS (protocol_id, participant_id)
• Science Warehouse (protocol_id, participant_id)
• ERNE
• Distributed Specimen Databases
• Participant DB: participants and their characteristics
• Protocol DB: protocols and their descriptions
• Biomarker_DB: descriptions of biomarkers and their use (protocol_id)
• CDE Repository: data elements and their descriptions
• VSIMS: descriptions of EDRN studies (participants, specimen tracking, etc.)
• Link keys shown between components: (protocol_id, participant_id) and (protocol_id)
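The diagram labels several components with `protocol_id` and `participant_id`. Assuming these act as shared keys between the stores, a cross-component lookup might look like the sketch below. All rows, field names other than the two ids, and the function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical rows; ids and extra fields are illustrative only.
protocol_db: Dict[str, Dict[str, str]] = {
    "P-001": {"title": "Example validation study"},
}
participant_db: List[Dict[str, str]] = [
    {"protocol_id": "P-001", "participant_id": "A1", "age_group": "60-69"},
]
science_results: List[Dict[str, str]] = [
    {"protocol_id": "P-001", "participant_id": "A1", "file": "run01.dat"},
    {"protocol_id": "P-001", "participant_id": "A1", "file": "run02.dat"},
    {"protocol_id": "P-002", "participant_id": "B7", "file": "run03.dat"},
]


def results_for_protocol(protocol_id: str) -> List[Dict[str, str]]:
    """Join science results to protocol and participant records by shared keys."""
    protocol = protocol_db[protocol_id]
    participants = {
        (p["protocol_id"], p["participant_id"]): p for p in participant_db
    }
    joined = []
    for result in science_results:
        if result["protocol_id"] != protocol_id:
            continue
        key = (result["protocol_id"], result["participant_id"])
        who = participants.get(key, {})
        # Merge participant attributes, the result row, and the protocol title.
        joined.append({**who, **result, "protocol_title": protocol["title"]})
    return joined
```

Calling `results_for_protocol("P-001")` returns one merged row per result file, each carrying the participant attributes and the protocol title.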
EDRN Knowledge Environment
Success?
• Biomarker Database holds 850 curated biomarkers, including panels/
signatures of biomarkers
• Biomarker Database modeled to reflect the data model: activity in
multiple organs, protocols, data files – facilitate single-point data access
• eSIS contains 165 protocols
• eCAS holds 56 data sets, with many files in each set, and more added
daily – standard metadata around each set and each product
• Two bioinformatics tools implemented: Proteomics “pipeline” (generating
standardized biomarker identification files); REDCap (standardized data
definition and capture at the project level) – additional in progress
• Common Data Elements (CDEs) contributed to the NCI repository
• Each CDE has a Uniform Resource Identifier (URI); its URL form points to the CDE
definition page and is used in XML standards
• Portal facilitates authorized access to almost 200,000 specimens
• Publications and Resources
EDRN Knowledge Environment
Technology
• Iterative development
• Open Source philosophy and tools
• Apache OODT (Object Oriented Data
Technology)
Software components are developed independently
of any data model, so
EDRN’s computing infrastructure can be
replicated
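To illustrate what "independent of any data model" can mean in practice, the sketch below passes the data model to a generic component as configuration. The `Catalog` class and both field sets are hypothetical and are not drawn from Apache OODT.

```python
from typing import Dict, FrozenSet, List


class Catalog:
    """Generic catalog: the data model is supplied as configuration."""

    def __init__(self, required_fields: FrozenSet[str]) -> None:
        self.required_fields = required_fields
        self.records: List[Dict[str, str]] = []

    def ingest(self, record: Dict[str, str]) -> None:
        # Validate against whichever model this instance was configured with.
        missing = self.required_fields - set(record)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        self.records.append(record)


# The same component, replicated for two different (hypothetical) models.
biomarker_catalog = Catalog(frozenset({"protocol_id", "biomarker"}))
patient_catalog = Catalog(frozenset({"site_id", "participant_id"}))

biomarker_catalog.ingest({"protocol_id": "P-001", "biomarker": "marker-A"})
patient_catalog.ingest({"site_id": "S-01", "participant_id": "A1"})
```

Nothing in `Catalog` mentions biomarkers or patients, so the same code could serve a different network by changing only the configured field set.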
Bioinformatics – Goals
Supporting science-driven research needs: SHARE
Geisel School of Medicine at Dartmouth / UNC Chapel Hill
Supporting science-driven research needs: PCORI PPRN
Opportunity to offer our architecture to PCORnet?
Synergy in data model
Query across CCFA PPRN network …network of networks?