Presentation to ARGIS - Atlanta Region GIS User Group
October 30, 2013
Jennifer Doty | [email protected]
Data Management Specialist
Emory Center for Digital Scholarship
Best practices for managing geospatial data:
• File formats
• Naming conventions
• Folder structure
• Storage and backup
• Documentation
Trends in geospatial data archiving:
• Federal funding agencies’ requirements
• State initiatives for preservation
Best Practices: File Formats
Type of data: geospatial data (vector and raster data)
Acceptable formats for sharing, reuse and preservation:
• ESRI Shapefile (essential - .shp, .shx, .dbf,
optional - .prj, .sbx, .sbn)
• geo-referenced TIFF (.tif, …)
• CAD data (.dwg)
• tabular GIS attribute data
Other acceptable formats for data preservation:
• ESRI Geodatabase format (.mdb, .gdb)
• MapInfo Interchange Format (.mif) for vector data
• Keyhole Mark-up Language (KML) (.kml)
• Adobe Illustrator (.ai), CAD data (.dxf or .svg)
• binary formats of GIS and CAD packages
UK Data Archive File Formats guide
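A shapefile is only usable when its essential component files travel together. The sketch below is one possible way to check this before sharing or archiving, using the essential and optional extensions listed above. The function name check_shapefile and the report keys are hypothetical, and the check is case-sensitive on most file systems.

```python
from pathlib import Path

# Extensions as listed on the slide above.
REQUIRED_EXTS = (".shp", ".shx", ".dbf")
OPTIONAL_EXTS = (".prj", ".sbx", ".sbn")


def check_shapefile(shp_path):
    """Report which component files are missing next to a .shp file.

    Hypothetical helper: it only tests whether files exist and does not
    open or validate their contents.
    """
    shp_path = Path(shp_path)
    return {
        "missing_required": [
            ext for ext in REQUIRED_EXTS
            if not shp_path.with_suffix(ext).exists()
        ],
        "missing_optional": [
            ext for ext in OPTIONAL_EXTS
            if not shp_path.with_suffix(ext).exists()
        ],
    }
```

An empty "missing_required" list suggests the shapefile is complete enough to share. A missing .prj is worth a second look, since the coordinate system would then have to come from the documentation.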
Best Practices: File Formats
GeoMAPP Geospatial Data File Formats
Reference Guide:
• provides quick reference of common
geospatial raster and vector dataset types
• serves as tool to identify geospatial format
types based on file extensions
• also includes information on standards and
specifications for documenting geospatial data
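Identifying a format from its file extension can be sketched as a simple lookup. The table below is not taken from the GeoMAPP guide; it is an illustrative assumption built only from the extensions named earlier in this presentation, and the name identify_format is hypothetical. An extension is only a hint: a .mdb file, for example, is not necessarily a geodatabase, and a .tif is not necessarily geo-referenced.

```python
from pathlib import Path

# Illustrative lookup only; a real reference guide covers far more types.
FORMATS_BY_EXTENSION = {
    ".shp": "ESRI Shapefile",
    ".gdb": "ESRI Geodatabase",
    ".mdb": "ESRI Geodatabase",
    ".tif": "geo-referenced TIFF",
    ".dwg": "CAD data",
    ".dxf": "CAD data",
    ".mif": "MapInfo Interchange Format",
    ".kml": "Keyhole Mark-up Language (KML)",
    ".ai": "Adobe Illustrator",
}


def identify_format(filename):
    """Return a likely format name based on the file extension alone."""
    extension = Path(filename).suffix.lower()
    return FORMATS_BY_EXTENSION.get(extension, "unknown")
```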
Best Practices: Naming Conventions
• Create meaningful but brief naming
conventions for your project
• Use file names to classify broad types of files
• Avoid using spaces and special characters
• Begin names with letters, not numbers
e.g. Census2010_blockgroups_GA, not 2010Census…
• Avoid very long file names
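The rules above can be checked automatically. This is a minimal sketch under stated assumptions: the 40-character limit is an arbitrary illustration (the slide gives no number), hyphens and underscores are treated as acceptable characters, and the function name naming_problems is hypothetical.

```python
import re
from pathlib import Path

MAX_STEM_LENGTH = 40  # assumed limit for illustration; pick your own


def naming_problems(filename, max_length=MAX_STEM_LENGTH):
    """Return a list of naming-rule violations for one file name."""
    stem = Path(filename).stem  # name without the extension
    problems = []
    if re.search(r"\s", stem):
        problems.append("contains spaces")
    # Anything other than letters, digits, underscore, hyphen or
    # whitespace counts as a special character in this sketch.
    if re.search(r"[^A-Za-z0-9_\-\s]", stem):
        problems.append("contains special characters")
    if not re.match(r"[A-Za-z]", stem):
        problems.append("does not begin with a letter")
    if len(stem) > max_length:
        problems.append("is very long")
    return problems
```

For example, naming_problems("Census2010_blockgroups_GA.shp") returns an empty list, while a name such as "2010 Census.shp" is flagged for its space and its leading digit.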
Best Practices: Naming Conventions
Example: keyword_steward_extent_date.ext
• Keyword (essential)—be as descriptive of the contents of
the data as possible by using a word or short phrase
• Steward (essential)—either the creator of the dataset or
the last one to make a significant modification to a dataset
• Extent (optional)—may be included to indicate resolution
of the data (e.g. county, state, or international)
• Date (optional)—may be used to indicate the date of
creation or the age range of the content. Recommended
format is YYYYMMDD
Indiana Geographic Information Council
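The keyword_steward_extent_date.ext pattern can be assembled in code so that every file in a project follows it. This is a sketch, not the Indiana Geographic Information Council's own tooling: the function name build_name is hypothetical, and stripping every non-alphanumeric character from each part is an assumption made here to keep the underscore as the only separator.

```python
import re
from datetime import date


def _clean(part):
    """Drop everything except letters and digits from one name part."""
    return re.sub(r"[^A-Za-z0-9]+", "", part)


def build_name(keyword, steward, ext, extent=None, when=None):
    """Build keyword_steward_extent_date.ext; extent and date are optional."""
    parts = [_clean(keyword), _clean(steward)]
    if not all(parts):
        raise ValueError("keyword and steward are essential")
    if extent and _clean(extent):
        parts.append(_clean(extent))
    if when is not None:
        parts.append(when.strftime("%Y%m%d"))  # recommended YYYYMMDD
    return "_".join(parts) + "." + ext.lstrip(".")
```

With a made-up steward "ARC", build_name("blockgroups", "ARC", "shp", extent="GA", when=date(2013, 10, 30)) gives "blockgroups_ARC_GA_20131030.shp".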
Best Practices: Naming Conventions
Version numbering:
• useful to indicate file revisions or edits,
especially in collaborations
• can be through discrete or continuous
numbering, depending on whether changes are minor or major
– think of software versioning: ArcGIS 10 was a
significant change from 9.x, but ArcGIS 10.1 was a
(relatively) minor change to 10
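The major/minor idea can be sketched as follows. The "major.minor" string form and the "_v1_2" file name suffix are assumptions chosen for illustration, not a convention stated in the presentation; both function names are hypothetical.

```python
def bump_version(version, major_change=False):
    """Increment a 'major.minor' version string.

    A major change resets the minor number to 0, as in 9.x -> 10.0;
    a minor change only increments the last number, as in 10.0 -> 10.1.
    """
    major, minor = (int(number) for number in version.split("."))
    if major_change:
        return f"{major + 1}.0"
    return f"{major}.{minor + 1}"


def versioned_name(stem, version, ext):
    """Append the version to a file name stem.

    The dot is written as an underscore so the file extension stays
    unambiguous.
    """
    return f"{stem}_v{version.replace('.', '_')}.{ext}"
```

For example, bump_version("10.0") returns "10.1", and versioned_name("roads_jdoe", "1.2", "shp") returns "roads_jdoe_v1_2.shp".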
Best Practices: Folder Structure
• Separate directories for scratch workspace
and final data
• Hierarchy—is deep or shallow best for your
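Keeping scratch and final data apart is easy to set up at the start of a project. A minimal sketch, assuming the folder names "scratch" and "final" (the slide does not prescribe names) and a hypothetical function name:

```python
from pathlib import Path


def create_project_folders(root):
    """Create separate scratch and final directories under a project root.

    Existing folders are left untouched, so the function is safe to
    run more than once.
    """
    root = Path(root)
    folders = [root / "scratch", root / "final"]
    for folder in folders:
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Intermediate outputs then go to the scratch folder and can be deleted freely, while only reviewed data sets are moved to the final folder.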
Tape library, CERN, Geneva by Cory Doctorow / CC BY-SA 2.0
Best Practices: Storage & Backup
Storage Considerations:
• Accessibility
• Read/Write speed
• Size limits—overall vs. file size
• Local—PC drive, flash drive, external hard drive
• Server—department/organization server space
• Cloud—Dropbox, Google Drive, etc.
Best Practices: Storage & Backup
Backup Considerations:
• Accessibility (local, server, cloud)
• Redundancy (rule of thumb—here, near, far)
• Incremental/Snapshot
• Automated
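An incremental backup copies only what has changed since the last run. The sketch below is one simple way to do that, comparing SHA-256 checksums of each file; it is an illustration rather than a replacement for dedicated backup software, and the function names are hypothetical. It assumes the backup folder is not located inside the source folder.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 checksum of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def incremental_backup(source_dir, backup_dir):
    """Copy new or changed files; return their relative paths."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    copied = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = backup_dir / relative
        # Skip files whose backup copy is already identical.
        if target.exists() and file_digest(target) == file_digest(source):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)  # copy2 also keeps timestamps
        copied.append(relative.as_posix())
    return copied
```

Running the same function against a local drive, a server share, and a synced cloud folder is one way to approach the "here, near, far" rule of thumb; scheduling it with the operating system's task scheduler would cover the "automated" point.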
Metadata is a love note… by sarah0s / CC BY-NC-ND 2.0
Best Practices: Documentation
“When thoughtfully populated, geospatial
metadata can be a critical resource for
understanding and managing geospatial data for
current and future GIS practitioners and those
trying to preserve the data.”
-Utilizing Geospatial Metadata to Support Data Preservation
Practices, January 2011, GeoMAPP
Best Practices: Documentation
Metadata—represents the who, what, when,
where, why and how
• ISO 19115:2003 / 19139
• FGDC’s Content Standard for Digital Geospatial
Metadata (CSDGM)
Checklist: CSDGM Fields for
Identification Information - basic info about data set, including:
• party responsible: usually the creator
• publication date: date the data set is completed and ready for use
• title: “where,” “what,” “when”
• maintenance/update frequency: annually, as needed, based on
census, etc.
• bounding coordinates
• keywords (theme and place)
• access and use constraints: any restrictions, disclaimers, or
guidance on data set attribution
• contact details
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
Checklist: CSDGM Fields for
Data Quality Information - provides historical
lineage and source descriptions for the data
used in the creation of the data set, including:
• originator
• publisher, publication date & place
• “currentness” of source data
• process description
Checklist: CSDGM Fields for
Spatial Reference Information - description of
the reference frame for, and the means to
encode, coordinates in the data set, including:
• map projection name
• coordinate system name
• unit of measure
• geodetic model—datum, ellipsoid
Checklist: CSDGM Fields for
Entity and Attribute Information - details about
content of the data set—the entities, their
attributes, and domains from which attribute
values may be assigned, including:
• entity label
• attribute label and description
Checklist: CSDGM Fields for
Metadata Reference Information - information
on the party responsible for creating the
metadata and the currentness of the metadata:
• metadata standard name
• metadata standard version
GeoMAPP, Utilizing Geospatial Metadata to Support Data Preservation Practices
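The checklist fields can be tested against a CSDGM metadata record stored as XML. This is a hedged sketch: the element paths use the CSDGM short tag names as commonly seen in FGDC XML records, but the mapping from checklist items to paths is an assumption made here and should be verified against the standard. The function name missing_fields is hypothetical, and only a subset of the checklist is covered.

```python
import xml.etree.ElementTree as ET

# Checklist item -> assumed path below the <metadata> root element.
CHECKLIST = {
    "originator": "idinfo/citation/citeinfo/origin",
    "publication date": "idinfo/citation/citeinfo/pubdate",
    "title": "idinfo/citation/citeinfo/title",
    "update frequency": "idinfo/status/update",
    "bounding coordinates": "idinfo/spdom/bounding",
    "theme keywords": "idinfo/keywords/theme/themekey",
    "place keywords": "idinfo/keywords/place/placekey",
    "access constraints": "idinfo/accconst",
    "use constraints": "idinfo/useconst",
    "contact": "idinfo/ptcontac",
    "process description": "dataqual/lineage/procstep/procdesc",
    "spatial reference": "spref",
    "entity label": "eainfo/detailed/enttyp/enttypl",
    "attribute label": "eainfo/detailed/attr/attrlabl",
    "metadata standard name": "metainfo/metstdn",
    "metadata standard version": "metainfo/metstdv",
}


def missing_fields(xml_text):
    """Return checklist items that are absent or empty in a record."""
    root = ET.fromstring(xml_text)
    missing = []
    for label, path in CHECKLIST.items():
        element = root.find(path)
        if element is None:
            missing.append(label)
        # An element with no children and no text counts as empty.
        elif len(element) == 0 and not (element.text or "").strip():
            missing.append(label)
    return missing
```

A record that returns an empty list has at least something in every checked field; whether the content is thoughtfully populated still needs a human reviewer.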
Data Management Initiatives
Federal agency mandates for sponsored research:
• NSF & NIH requirements for DM plans
• GIS Inventory (Ramona) & Federal Grants data
sharing plans
Other related initiatives:
• USGS DM working group
• DM training for early career researchers
FGDC Geospatial Data Lifecycle Model
State & National Initiatives in
Geospatial Data Archiving
GeoMAPP - Geospatial Multistate Archive and
Preservation Partnership
• federally funded partnership between the Library of
Congress and state geospatial and archives staff from
North Carolina, Kentucky, Montana, and Utah
National Digital Stewardship Alliance (NDSA), Geospatial
Content Team
• report identifying appraisal and selection activities as
they affect decisions defining geospatial content of
enduring value for the nation
Open GeoPortal @ Emory
NASA Goddard Photo and Video / CC BY
Green Question Mark by mikecogh on Flickr / CC BY
Contact Information:
Jennifer Doty | [email protected]
Data Management Specialist
Michael Page | [email protected]
Geographer & Geospatial Data Librarian
Emory Center for Digital Scholarship

