OPeNDAP Present and Future
An Overview Encompassing Current
Projects & Potential New Directions
Dave Fulker and James Gallagher
Rough Outline
• Background
• OPULS (an OPeNDAP-Unidata collaboration)
– DAP4 (to supersede DAP2)
– Experimental extensions (Async access, UGRID subsets)
• Hyrax over Amazon/S3
• Elaboration on server functions
– Perhaps binning, masking, a functional language?
– Relationship to WPS & other Web services
• Hyrax (& WCS) in OWS-9
OPeNDAP, Inc.
Origins
• Scientists (ocean fluxes & temps) envisaged
use of http for remote data access (1993)
• Collaboration with the designer of the JGOFS
data system…
• Led to Distributed Ocean Data System (DODS)
• DODS later was renamed OPeNDAP
(to be explained momentarily…)
OPeNDAP Now Is:
• An acronym
– “Open-source Project for a Network Data Access Protocol”
– Often a synonym for “DAP”
• A not-for-profit corp. developing/supporting
– “DAPx” - a web-services protocol for data access
• Deployed by hundreds of data providers internationally
• Employed in many analysis packages (MATLAB, e.g.)
• Designated a “Community Standard” by NASA
– Server & client implementations* of DAP
*Note: there are other implementations
Available Software
• Free end-user applications that include DAP
support: Panoply, IDV, NCO, …
• Commercial: IDL, MATLAB, ArcGIS
• SDKs: The netCDF C and Java libraries; OC;
libdap; Java OPeNDAP, PyDAP
– Each of these provides its own API and they span
C, C++, Java and Python
• Data servers: PyDAP, Hyrax, TDS, …
Concept:
Clients Get Just the Data They Need, as
They Need Them
• Accessing data via URLs (i.e., URL = dataset)
– Appending query strings to subset or run server functions
• Getting responses of two (general) types:
– Metadata - dataset descriptions & catalogs (textual)
– Content - values and metadata (binary or textual)
• Using responses in diverse ways, e.g.
– MATLAB maps responses to its internal math types
– netCDF library allows apps to work as though
reading a local file
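A minimal sketch of the "URL = dataset" idea, assuming DAP2 conventions: a response suffix (`.dds`, `.das`, `.dods`, `.ascii`) is appended to the dataset URL, and a query string with `[start:stride:stop]` index ranges selects a subset. The host, dataset path, and variable name `sst` are hypothetical, and the helper names are illustrative rather than part of any OPeNDAP library.

```python
from urllib.parse import quote


def hyperslab(start, stop, stride=1):
    """Render one array index range as [start:stride:stop] (stop is inclusive)."""
    if start < 0 or stop < start or stride < 1:
        raise ValueError("expected 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stride}:{stop}]"


def dap2_url(dataset_url, response="dods", variable=None, slabs=()):
    """Build a request URL of the form <dataset>.<response>?<variable><slabs>."""
    if response not in {"dds", "das", "dods", "ascii"}:
        raise ValueError(f"unsupported response type: {response}")
    url = f"{dataset_url}.{response}"
    if variable is not None:
        projection = variable + "".join(hyperslab(*s) for s in slabs)
        # Keep the bracket/colon syntax readable; escape anything else unusual.
        url += "?" + quote(projection, safe="[]:,.")
    return url


# Hypothetical dataset URL, used only for illustration.
base = "http://example.org/opendap/sst.nc"

# Metadata response: a textual description of the dataset structure.
print(dap2_url(base, "dds"))

# Content response: first time step, every second latitude row from 10 to 20.
print(dap2_url(base, "dods", "sst", [(0, 0), (10, 20, 2)]))
```

The client asks for the metadata first, then requests only the index ranges it needs, which is the "just the data they need" pattern described above.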
NOAA grant for
OPeNDAP-Unidata Linked Servers (OPULS)
• Goal 1: conformance & linkage between OPeNDAP &
Unidata DAP-servers, with short-term outcomes:
– New data-model & protocol specs: DAP4
• Consistent behaviors of OPeNDAP & Unidata servers
• Data-type richness (NetCDF4, HDF5, RDBs)
– Extensions (i.e., new server behaviors):
• Irregular-mesh subsetting
• Asynchronous access
• Goal 2: common framework for OPeNDAP & Unidata
servers, aiming for an architecture that
– Underpins the unique strengths of both
– Reduces likelihood of redundant effort
OPULS Progress So Far
• Draft of DAP4 data model & protocol specs
– Sufficient for the full richness of NetCDF-4 and
HDF-5 files (including “Groups,” e.g.)
• Progress on rigorous conformance-testing
• Successful extensibility experiments
– Irregular-mesh (i.e., UGRID) subsetting
– Asynchronous access (as may be useful for
near-line data storage)
– Amazon cloud deployment (more later…)
Other technologies OPULS considered
• JSON responses as an alternative to XML
– Decided they added too much bulk to the
specification and too many requirements for
implementers
– Could be added in a future version
– Can be built using XSLT from DAP4 XML
• OpenSearch
– Not incorporated into DAP4 for many of the same
reasons
• The DAP4 metadata response specifically includes
support for these
OPULS and Feedback
• OPULS is ready for community feedback
• Design documents are online
– Web site: http://docs.opendap.org/
– The current draft specification is there as well
• Many features are already available in C++ and
C implementations
Hyrax over Amazon/S3
• Exploits a natural fit between DAP-based
services and cloud services
• Initial progress already achieved under the
OPULS grant
• Bears interesting similarities to the challenge
of asynchronous data access
• May yield a new community of
OPeNDAP users
More about clouds…
• Hyrax is trivial to run on the Amazon cloud
• We are looking at ways to work with data held
in S3
• S3 characteristics:
– Flat;
– Modest response times;
– Simple GET/PUT type API
Using S3
• Tried S3 file systems – found them wanting
– Not interoperable (hardly surprising, but limiting)
– Extra layer to software stack
• Now working with XML ‘catalogs’
– XML documents create a faux hierarchy
– XML + XSLT → HTML (i.e., a ‘free’ web interface)
– XML + Hyrax + caching → DAP access
– The XML is very similar to THREDDS catalogs
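The "faux hierarchy" idea can be sketched as follows: S3 keys are flat strings, but splitting them on `/` lets an XML document present them as nested folders. This is an illustration only. The element names `catalog`, `folder`, and `dataset`, the attribute names, and the sample keys are all assumptions, not the schema used by Hyrax or THREDDS.

```python
import xml.etree.ElementTree as ET


def build_catalog(keys):
    """Turn flat object keys such as 'sst/2012/jan.nc' into a nested XML tree."""
    root = ET.Element("catalog")
    for key in sorted(keys):
        if not key or key.endswith("/"):
            continue  # skip empty keys and bare 'directory' placeholders
        *folders, leaf = key.split("/")
        node = root
        for folder in folders:
            # Reuse an existing <folder> with this name, or create it.
            child = next(
                (c for c in node.findall("folder") if c.get("name") == folder),
                None,
            )
            if child is None:
                child = ET.SubElement(node, "folder", name=folder)
            node = child
        # Each leaf remembers the full flat key needed for a simple GET.
        ET.SubElement(node, "dataset", name=leaf, key=key)
    return root


# Hypothetical keys, as they might be listed from a flat bucket.
keys = ["sst/2012/jan.nc", "sst/2012/feb.nc", "wind/2012/jan.nc"]
print(ET.tostring(build_catalog(keys), encoding="unicode"))
```

A document of this shape could then be styled with XSLT for browsing, or walked by a server to locate the object behind a dataset name.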
Elaboration on Server Functions
• Proposition: the future of OPeNDAP may lie in
provision of data-proximate (i.e., server-side)
functions that:
– Deliver precisely defined subsets
– Reduce the number of off-target retrievals
• I.e., enable querying of complex dataset properties
– Remap/transform data to simplify data use,
especially multi-source data integration
• Effective caching will be required
Server Functions, DAP4
• DAP2 supports functions and functional
composition
• Currently, DAP4 treats ‘functions’ and a
‘functional language’ as an extension
• DAP4 provides more complete support for
functions, including metadata responses
(DAP2 does not provide this; a gap in the
DAP2 specification)
• Support for POST
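Functional composition can be pictured as nesting one function call inside another within the request's query string. The sketch below only renders such an expression as text. The function names `scale_fn` and `subset_fn` and the variable `sst` are hypothetical, and the exact syntax a given server accepts is an assumption here.

```python
def call(name, *args):
    """Render a function call; string arguments are variable names or nested calls."""
    rendered = []
    for arg in args:
        if isinstance(arg, str):
            rendered.append(arg)  # passed through verbatim
        elif isinstance(arg, (int, float)) and not isinstance(arg, bool):
            rendered.append(repr(arg))  # numeric literal
        else:
            raise TypeError(f"unsupported argument: {arg!r}")
    return f"{name}({','.join(rendered)})"


# Compose two hypothetical server functions: scale a variable, then subset it.
inner = call("scale_fn", "sst", 0.01)
expression = call("subset_fn", inner, 10, 20)
print(expression)
```

Because the inner call is just another argument, the server can evaluate the whole chain next to the data and return only the final result.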
Server Functions, experimentation
• UGRID: Unstructured Grid (irregular mesh)
subsetting
• We have implemented a clone of the GDS
server’s syntax for functions
• Enables current netCDF-based DAP clients
(e.g., ECMF) to use the UGRID function
• Other projects: Multi-instrument intercalibration
Some Server-Function Ideas
• Binning: returns a distribution (as a raster of
boolean values on a user-specified grid) of
data values satisfying some criteria
• Masking: accepts a raster of zero/nonzero
values as a query argument, perhaps as a
geospatial selection criterion, e.g.
• Perhaps some (limited?) form of functional
language for very rich capabilities
• WPS, et al.
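The binning and masking ideas above can be sketched in plain Python. This is one possible reading of the two bullets, not a proposed server API: the function names, the `(lat, lon, value)` sample layout, and the grid parameters are all assumptions made for illustration.

```python
def bin_to_raster(samples, lat0, lon0, cell, nrows, ncols, predicate):
    """Mark each grid cell True if any sample in it satisfies the predicate.

    samples: iterable of (lat, lon, value); the grid starts at (lat0, lon0)
    and has square cells of size `cell`.
    """
    raster = [[False] * ncols for _ in range(nrows)]
    for lat, lon, value in samples:
        row = int((lat - lat0) // cell)
        col = int((lon - lon0) // cell)
        inside = 0 <= row < nrows and 0 <= col < ncols
        if inside and predicate(value):
            raster[row][col] = True
    return raster


def apply_mask(grid, mask, fill=None):
    """Keep grid values where the mask is nonzero; replace the rest with fill."""
    if len(grid) != len(mask) or any(len(g) != len(m) for g, m in zip(grid, mask)):
        raise ValueError("grid and mask must have the same shape")
    return [
        [value if flag else fill for value, flag in zip(grid_row, mask_row)]
        for grid_row, mask_row in zip(grid, mask)
    ]


# Hypothetical samples: (lat, lon, temperature).
samples = [(0.5, 0.5, 30.0), (1.5, 0.5, 10.0), (1.5, 1.5, 28.0), (5.0, 5.0, 40.0)]
warm = bin_to_raster(samples, 0.0, 0.0, 1.0, 2, 2, lambda t: t > 25.0)
print(warm)

# The binned raster can be fed back as a mask for a later selection.
print(apply_mask([[1, 2], [3, 4]], warm))
```

The last two lines show why the ideas pair well: the boolean raster returned by one request is a valid zero/nonzero mask argument for the next.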
Summary
• DAP is based on a domain-neutral data model and an
expression-based constraint language
• While not ‘RESTful’ in the strictest sense, it is a REST design in
spirit (DAP predates the term by several years)
• OPULS is a collaborative project between OPeNDAP and
Unidata that intends to update DAP
• We are also running several experimental mini-projects within
its context:
– Asynchronous access, Unstructured Grid access, Cloud computing and
an expanded, function-based, server-side processing system
• DAP servers provide a good platform on which to build OGC
web services, as described in the following presentation.
