DCC PowerPoint Presentation

Research Data Management
University of East London, 1st May 2013
Sarah Jones
Digital Curation Centre
[email protected]
Twitter: sjDCC
Why are you here?
• You’re managing data (your own or your group's)
• Or you think you maybe should be
• You’re not sure why it matters
• You’re not sure how best to do it
• You’d like to know whether you’re on the right track
Photo by Orijinal: http://www.flickr.com/photos/orijinal/3539418133
Why manage your data?
What if this was your desk?
http://www.computerweekly.com
What if this was your laptop?
Why YOU need a Data Management Plan
http://blogs.ch.cam.ac.uk/pmr/2011/08/01/why-you-need-a-data-management-plan
Good data management is about making informed decisions
http://xkcd.com/949
Why manage research data?
• To make your research easier!
• To stop yourself drowning in irrelevant stuff
• In case you need the data later
• To avoid accusations of fraud or bad science
• To share your data for others to use and learn from
• To get credit for producing it
• Because somebody else said to do so
Expectations of public access
“Publicly funded research data are a public good,
produced in the public interest, which should be
made openly available with as few restrictions as
possible in a timely and responsible manner that
does not harm intellectual property.”
RCUK Common Principles on Data Policy
http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx
…open data
http://www.bis.gov.uk/innovatingforgrowth
...personal data
Benefits of sharing data (1)
“It was unbelievable. It’s not science the way most of us have practiced in our careers. But we all realised that we would never get biomarkers unless all of us parked our egos and intellectual property noses outside the door and agreed that all of our data would be public immediately.”
Dr John Trojanowski, University of Pennsylvania
www.nytimes.com/2010/08/13/health/research/13alzheimer.html?pagewanted=all&_r=0
... scientific breakthroughs
Benefits of sharing data (2)
“It was a mistake in a spreadsheet that could have been easily overlooked: a few rows left out of an equation to average the values in a column. The spreadsheet was used to draw the conclusion of an influential 2010 economics paper: that public debt of more than 90% of GDP slows down growth. This conclusion was later cited by the International Monetary Fund and the UK Treasury to justify programmes of austerity that have arguably led to riots, poverty and lost jobs.”
... validation of results
www.guardian.co.uk/politics/2013/apr/18/uncovered-error-george-osborne-austerity
Benefits of sharing data (3)
“There is evidence that studies that make their data available do indeed receive more citations than similar studies that do not.”
Piwowar H. and Vision T.J. (2013) “Data reuse and the open data citation advantage” https://peerj.com/preprints/1.pdf
... more citations
9–30% increase
Things to think about...
Photo by @boetter: http://www.flickr.com/photos/jakecaptive/3205277810
What is data management?
“the active management and appraisal of data over
the lifecycle of scholarly and scientific interest”
Digital Curation Centre
Data management is just part of good research practice
What is involved in RDM?
• Data Management Planning
• Creating data
• Documenting data
• Accessing / using data
• Storage and backup
• Preserving data
• Sharing data
[Diagram: the data lifecycle of Create, Document, Use, Store, Preserve and Share]
If you plan to share your data...
• Have you got consent for sharing?
• Do any licences you’ve signed permit sharing?
• Is your data in suitable formats?
Decisions made early on affect what you can do later
File formats for long-term access
• Unencrypted
• Uncompressed
• Non-proprietary, not patent-encumbered
• Open, documented standard
• Standard representation (ASCII, Unicode)
Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF (PDF/A only if layout matters) | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS
Further examples: http://www.data-archive.ac.uk/create-manage/format/formats-table
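As a minimal illustration of the tabular row above, the Python sketch below writes records to an uncompressed, UTF-8 encoded CSV file using only the standard library. The function name `export_table_as_csv` and the record layout (a list of dictionaries that share the same keys) are assumptions for illustration; converting an existing Excel workbook would additionally need a third-party reader, which is not shown here.

```python
import csv
from pathlib import Path


def export_table_as_csv(rows: list[dict], out_path: str) -> Path:
    """Write records to an uncompressed, UTF-8 encoded CSV file with a header row.

    Hypothetical helper: column order is taken from the first record; a later
    record with extra keys makes csv.DictWriter raise ValueError (its default),
    and missing keys are written as empty fields.
    """
    if not rows:
        raise ValueError("no rows to export")
    path = Path(out_path)
    # newline="" lets the csv module control line endings, as the Python docs advise.
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return path
```

For example, `export_table_as_csv([{"site_id": "A1", "air_temp": 12.5}], "observations.csv")` (an illustrative file name) produces a plain-text file that any spreadsheet, statistics package or text editor can open.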
Documentation
What would someone unfamiliar with your
data need in order to find, evaluate,
understand, and reuse them?
Consider the differences between someone inside
your research group, someone outside your group
but in your field, and someone outside your field.
Two parts: metadata and methods
Metadata
• About the project
– Title, people, key dates, funders and grants
• About the data
– Title, key dates, creator(s), subjects, rights,
included files, format(s), versions, checksums
• Keep this with the data
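The checksums listed above can be generated with a few lines of code. A minimal sketch, assuming Python 3.9+ and a hypothetical data folder: it records a SHA-256 digest for every file and saves the manifest inside the folder so it stays with the data. The function names and the manifest file name are illustrative, not a DCC tool. Each line uses the same `digest  path` layout that GNU `sha256sum` prints, so the manifest could also be checked with `sha256sum -c` from inside the data folder.

```python
import hashlib
from pathlib import Path

MANIFEST_NAME = "manifest-sha256.txt"  # hypothetical file name


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: str) -> Path:
    """Record '<digest>  <relative path>' for every file under data_dir, inside data_dir."""
    root = Path(data_dir)
    manifest = root / MANIFEST_NAME
    files = sorted(p for p in root.rglob("*") if p.is_file() and p != manifest)
    lines = [f"{sha256_of(p)}  {p.relative_to(root).as_posix()}" for p in files]
    manifest.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return manifest


def verify_manifest(data_dir: str) -> list[str]:
    """Return relative paths that are missing or whose digest no longer matches the manifest."""
    root = Path(data_dir)
    problems = []
    for line in (root / MANIFEST_NAME).read_text(encoding="utf-8").splitlines():
        expected, rel_path = line.split("  ", 1)
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

Running `write_manifest` when a dataset is deposited and `verify_manifest` after each copy or transfer gives an early warning of silent corruption.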
Methods
• Reason #1 for not reusing someone else’s data: “I don’t know
enough about how it was gathered to trust it.”
• Document what you did. (A published article may not be enough.)
• Document any limitations of what you did.
• If you ran code on the data, document the code and keep it with
the data.
• Need a codebook? Or a data dictionary?
– If I can’t identify at a glance what each bit of your dataset means, yes, you do need a codebook or data dictionary.
– DO NOT FORGET THE UNITS!
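A codebook can also be kept in machine-readable form, so that undocumented columns or missing units are caught automatically. The sketch below assumes Python 3.9+; the variable names, descriptions and units are invented for illustration, and `undocumented_columns` is a hypothetical helper that flags any column in a CSV header that lacks a codebook entry or units.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Variable:
    """One codebook entry: what a column means and which units it is recorded in."""
    name: str
    description: str
    units: str  # use an explicit value such as "none" for unitless columns, never blank


# Hypothetical codebook for an illustrative environmental dataset.
CODEBOOK = {
    v.name: v
    for v in [
        Variable("site_id", "Sampling site identifier", "none"),
        Variable("air_temp", "Air temperature at 2 m above ground", "degC"),
        Variable("rainfall", "Daily rainfall total", "mm"),
    ]
}


def undocumented_columns(csv_text: str, codebook: dict[str, Variable]) -> list[str]:
    """Return header columns with no codebook entry, or with an empty units field."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [
        column
        for column in header
        if column not in codebook or not codebook[column].units.strip()
    ]


print(undocumented_columns("site_id,air_temp,wind_speed\nA1,12.5,3.2\n", CODEBOOK))
# ['wind_speed']
```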
Standards
• Why reinvent the wheel? If there’s a standard format
for your data or how to describe it, use that!
• The tricky part is finding the right standard.
Standards are like toothbrushes...
But using standards is good hygiene!
Your librarian can often help you find relevant standards.
Also check out the DCC catalogue of disciplinary metadata
http://www.dcc.ac.uk/resources/metadata-standards
Where to store your data?
• Your own drive (PC, server, flash drive, etc.)
– And if you lose it? Or it breaks?
• Somebody else’s drive
• Departmental drive
• “Cloud” drive
– Do they care as much about your data as you do?
How to back up?
• 3… 2… 1… back up!
– at least 3 copies of a file
– on at least 2 different media
– with at least 1 offsite
• Use managed services where possible e.g. University
filestores rather than local or external hard drives
• Ask central IT team for advice
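The 3-2-1 rule is simple enough to check mechanically. A minimal sketch, assuming Python 3.9+ and that you can list each copy of a file with its storage medium and whether it is held offsite; the `Copy` record and the example locations are hypothetical, and whether a particular store really counts as offsite is something to confirm with your IT team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a file: where it is held, on what kind of medium, and whether offsite."""
    location: str  # illustrative label, e.g. "laptop" or "university filestore"
    medium: str    # illustrative label, e.g. "internal SSD", "external HDD", "network storage"
    offsite: bool


def meets_3_2_1(copies: list[Copy]) -> bool:
    """True if there are at least 3 distinct copies, on at least 2 media, with at least 1 offsite."""
    distinct = set(copies)  # frozen dataclasses are hashable, so exact duplicates count once
    return (
        len(distinct) >= 3
        and len({c.medium for c in distinct}) >= 2
        and any(c.offsite for c in distinct)
    )


copies = [
    Copy("laptop", "internal SSD", offsite=False),
    Copy("desk drawer", "external HDD", offsite=False),
    Copy("university filestore", "network storage", offsite=True),  # offsite status assumed
]
print(meets_3_2_1(copies))      # True
print(meets_3_2_1(copies[:2]))  # False: only two copies, neither offsite
```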
What to keep?
It’s not possible to keep everything. Select based on:
– What has to be kept e.g. data underlying publications
– What can’t be recreated e.g. environmental recordings
– What is potentially useful to others
– What has scientific, cultural or historical value
– What legally must be destroyed
– ...
How to select and appraise research data:
www.dcc.ac.uk/resources/how-guides/appraise-select-research-data
How to share/preserve data?
• What is required?
– By your funder
– By your publisher
– By your uni
• What subject repositories, data centres and
structured databases are available?
http://databib.org
Putting the pieces together...
Photo by Dread Pirate Jeff: http://www.flickr.com/photos/justageek/2851643792
Data Management Plans
DMPs are often submitted with grant applications, but
are useful whenever you are creating data to:
• Make informed decisions to anticipate and avoid problems
• Avoid duplication, data loss and security breaches
• Develop procedures early on for consistency
• Ensure data are accurate, complete, reliable and secure
• Save time and effort – make your life easier!
Which funders require a DMP?
www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
What do research funders want?
• A brief plan submitted in grant applications, and in the
case of NERC, a more detailed plan once funded
• 1-3 sides of A4, as an attachment or a section in the Je-S form
• Typically a prose statement covering suggested themes
• An outline of data management and sharing plans,
justifying decisions and any limitations
Five common themes
1. Description of data to be collected / created
(i.e. content, type, format, volume...)
2. Standards / methodologies for data collection & management
3. Ethics and Intellectual Property
(highlight any restrictions on data sharing e.g. embargoes, confidentiality)
4. Plans for data sharing and access
(i.e. how, when, to whom)
5. Strategy for long-term preservation
A useful framework to get started
• Think about why the questions are being asked
• Look at examples to get an idea of what to include
www.icpsr.umich.edu/icpsrweb/content/datamanagement/dmp/framework.html
Help from the DCC
DMP Online: a web-based tool to help you write DMPs according to different requirements
https://dmponline.dcc.ac.uk
www.dcc.ac.uk/resources/how-guides/develop-data-plan
How DMP Online works
Create a plan based on relevant funder / institutional templates...
...and then answer the questions using the guidance provided
Example plans
• Technical plan submitted to AHRC by Bristol Uni
http://data.bris.ac.uk/files/2013/02/data.bris-AHRC-Technical-Plan-v21.pdf
• Rural Economy & Land Use (RELU) programme examples
http://relu.data-archive.ac.uk/data-sharing/planning/examples
• UCSD example DMPs (20+ scientific plans for NSF)
http://rci.ucsd.edu/dmp/examples.html
• My DMP – a satire (what not to write!)
http://ivory.idyll.org/blog/data-management.html
Tips on writing DMPs
• Keep it simple, short and specific
• Seek advice - consult and collaborate
• Base plans on available skills and support
• Make sure implementation is feasible
• Justify any resources or restrictions needed
http://www.youtube.com/watch?v=7OJtiA53-Fk
Acknowledgement
Thanks in particular to Dorothea Salo, Ryan Schryver and
colleagues for content from the “Escaping Datageddon”
presentation, available at:
http://www.slideshare.net/cavlec/escaping-datageddon
And to the Research360 project at the University of Bath for the
“Managing your research data” presentation, available at:
http://opus.bath.ac.uk/32296
Thanks – any questions?
DCC guidance, tools and case studies:
www.dcc.ac.uk/resources
Follow us on twitter:
@digitalcuration and #ukdcc
Exercise
• Writing a DMP
• Overcoming barriers to data sharing
Which suits best based on who has signed up?
