Tumor Profile Discovery and Tumor Bank Management with DORA
www.polyomx.org

Adrian Driga¹, Russ Greiner², Kathryn Graham¹,⁴, Sambasivarao Damaraju¹,⁴, David Wishart²,³, John Mackey¹,⁴, Carol Cass¹,⁴

¹Cross Cancer Institute, Alberta Cancer Board; ²Department of Computing Science, ³Faculty of Pharmacy and Pharmaceutical Sciences, ⁴Department of Oncology, University of Alberta, Edmonton

DORA (Database for Online Retrieval and Analysis) is a web-accessible medical and laboratory information management system (LIMS) through which clinical, microarray, SNP, and metabonomic information from PolyomX-consented patients is stored, retrieved, managed, and analyzed. DORA is designed for data warehousing and has a flexible relational database architecture that can be readily scaled up to accommodate clinical data from new cancer types or experimental data from new laboratory assays. PolyomX currently collects clinical and molecular data for four cancer types: breast, lung, ovarian, and gastric.

DORA facilitates the generation of a cancer knowledge base and will help individualize cancer treatment by allowing researchers to identify patient-specific characteristics of a cancer at the molecular level. DORA supports translational research by providing quick electronic access to patient information from the clinical and molecular domains. For example, class prediction analysis uses the signal intensities of the microarray spots together with the values of a clinical factor that partitions a group of patients into two classes. Database queries retrieve this information and present it to the statistical analysis software in the appropriate format.

Confidentiality of the information collected in DORA is strictly maintained. Access to DORA is password-protected, and confidential information is encoded before being stored in the database. Access to confidential information and modification of data are highly restricted. For security reasons, DORA is currently available only within the Cross Cancer Institute computer network.

Tumor banking information is managed through the Tumor Banking Database, a self-contained module of DORA. Please see the poster on the Tumor Banking Database for details.

PolyomX is supported by the Alberta Cancer Foundation and the Alberta Cancer Board.

DORA Modules, Schema, and Forms

Figure 1: Overview of Database Schema
[Entity-relationship diagram, M = Many. Entities shown: Cancer Disease, Treatment Protocol, Pathology, Stage/Progression, Treatment, Lab Work, Microarray, SNP, Metabonomics. Sample labels shown: tissue; blood, urine.]

Figure 2: Overview of Microarray (MA) Module Schema
[Entity-relationship diagram, M = Many. Entities and their contents as labeled in the figure:]
• MA Slide Group: tissue ID
• MA Slide Type: manufacturer, version
• Sequence Info: oligo/cDNA, gene of origin, IDs
• MA Slide Repeat: experiment details and parameters
• MA Slide Spot Sequence: slide map (e.g., GAL)
• MA Slide Spot: position, intensity value
• MA Normalized Slide Spot (generated from MA Slide Spot): position, normalized intensity value
• MA Gene Aggregate Value: tissue ID, sequence ID, and the aggregate of the normalized intensity values for the sequence across all spots from the repeats in the group

Patient Data Sharing Using DORA

Figure 5: Centralized Database Scenario
[Diagram: remote users connect over a wide area network to a single DORA server, which also serves local users.]

Users connect securely to a central instance of DORA and access molecular or clinical data according to their permissions. When a user creates a new patient record in the clinical, microarray, SNP, or metabonomics module, that user is marked as the owner of the record and is notified by e-mail. Data is accessible to all users as soon as it is added to the database. However, when the central server is not accessible (e.g., when it is down for a software upgrade), no data is available. The center hosting the DORA server is responsible for database administration and software development.

Main Features of DORA

• Integration of molecular and clinical information. Finding clinically relevant tumor profiles requires the analysis of genetic and clinical information for large cohorts of patients. For every patient, microarray, SNP, and metabonomic technologies can generate massive amounts of data. DORA speeds up the analysis process by seamlessly linking molecular data with relevant clinical data for every patient. Integrated views of the patient data are analyzed statistically and with machine learning techniques in order to discover molecular profiles for clinical factors. It has been shown that gene expression profiles can be reliable predictors of treatment response, relapse, and disease-free survival, and that certain combinations of SNPs can indicate predisposition to cancer.
• Data sharing and portability. DORA, both database and software, can be shared with or easily replicated at other research centers. Researchers can access a centralized DORA database remotely or manage their own copy of DORA. In the latter scenario, data can be shared via import/export software. Data sharing is particularly important for studies on rare tumor groups (e.g., brain, pancreas) because it allows researchers to accumulate a large enough number of patients from across the province or the country. Researchers can exchange molecular and clinical patient data while retaining ownership of the data that they have generated.
• Scalability. DORA is designed so that new modules can be quickly integrated with the existing ones. Currently, modules for microarray, SNP, and metabonomic data and clinical sections for breast, lung, gastric, and ovarian cancer are fully functional. A clinical section for brain/CNS cancer is in the design phase and will be implemented soon.
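The integration of molecular and clinical data for class prediction, as described above, can be illustrated with a minimal sketch. It is not DORA's actual code (DORA uses MySQL and Perl): Python's built-in sqlite3 module stands in for the database, and the table names, column names, the "relapse" factor, and all values are hypothetical, chosen only to show how a join can produce one row per tissue sample with a two-class label and per-sequence aggregate intensities.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two hypothetical DORA-like tables.

    All names and values here are illustrative assumptions, not DORA's schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ma_gene_aggregate_value (
            tissue_id INTEGER, sequence_id TEXT, aggregate_value REAL);
        CREATE TABLE clinical_factor (
            tissue_id INTEGER, factor_name TEXT, factor_value INTEGER);
    """)
    # Toy aggregate intensities: three tissue samples, two sequences each.
    conn.executemany(
        "INSERT INTO ma_gene_aggregate_value VALUES (?, ?, ?)",
        [(1, "seqA", 0.5), (1, "seqB", 1.5),
         (2, "seqA", 2.0), (2, "seqB", 0.25),
         (3, "seqA", 1.0), (3, "seqB", 1.0)],
    )
    # Toy two-class clinical factor; tissue 3 has no recorded value.
    conn.executemany(
        "INSERT INTO clinical_factor VALUES (?, ?, ?)",
        [(1, "relapse", 0), (2, "relapse", 1)],
    )
    return conn


def class_prediction_table(conn, factor_name):
    """Join molecular and clinical data into a samples-by-sequences table.

    Returns a list of rows: a header, then one row per tissue sample holding
    the tissue ID, the class label, and one aggregate value per sequence.
    Samples without a value for the clinical factor are dropped by the join.
    """
    rows = conn.execute(
        """
        SELECT g.tissue_id, c.factor_value, g.sequence_id, g.aggregate_value
        FROM ma_gene_aggregate_value AS g
        JOIN clinical_factor AS c ON c.tissue_id = g.tissue_id
        WHERE c.factor_name = ?
        ORDER BY g.tissue_id, g.sequence_id
        """,
        (factor_name,),
    ).fetchall()

    sequences = sorted({seq for _, _, seq, _ in rows})
    by_tissue = {}
    for tissue_id, label, seq, value in rows:
        entry = by_tissue.setdefault(tissue_id, {"label": label, "values": {}})
        entry["values"][seq] = value

    table = [["tissue_id", "class"] + sequences]
    for tissue_id in sorted(by_tissue):
        entry = by_tissue[tissue_id]
        table.append([tissue_id, entry["label"]]
                     + [entry["values"].get(seq) for seq in sequences])
    return table


if __name__ == "__main__":
    for row in class_prediction_table(build_demo_db(), "relapse"):
        print(row)
```

The resulting table could then be written out in whatever format the statistical analysis software expects. The inner join is a design choice for this sketch: it excludes samples that lack a value for the chosen clinical factor, since they cannot be assigned to either class.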
Figure 6: Distributed (Federated) Database Scenario
[Diagram: DORA servers S1, S2, and S3, each with its own data and local users, connected to one another and to remote users over a wide area network.]

Several DORA servers are available at different sites, and the databases have identical schemas. Users can connect to any of the servers, but add new records only to their local database. Integrated views of the data from several servers can be obtained via import/export tools for data exchange, or by running the same query against each database. The cost of administration and development is shared among the server hosts. With additional software, the schemas of the DORA databases would not need to be identical.

• DORA is implemented as a MySQL database and is made available to users via an Apache web server. The software that connects the web forms with the database is written in Perl and runs on the DORA server.
• The server on which DORA resides runs Red Hat Linux and is protected by a firewall.
• All the software needed to run a DORA server, i.e., Red Hat Linux, MySQL, and Perl, is freely available for noncommercial purposes.
• PolyomX has designed and implemented the software specific to DORA and can make this software available to ACB (Alberta Cancer Board) researchers.

Figure 3: Ovarian Cancer Pathology Form

Figure 4: Lung Cancer Stage/Progression Form

Acknowledgements

The authors thank Drs. Brent Zanke, Tony Reiman, Tim Winton, Bryan Dicken, Michael Sawyer, Helen Steed, Katia Tonkin, and David Omahen for their help in designing the clinical modules of DORA, and Jennifer Listgarten for her help in designing the microarray module.