ILDG Middleware Status
Chip Watson
ILDG-6 Workshop
May 12, 2005

Outline

Status: small changes from Dec 2004
Quick review of architecture
Minimal implementation facts
Next steps

Status (quick look)

Only a small amount of middleware work has been done in the last 6 months
  – development of a new metadata catalog prototype at Adelaide, based on an XML database
  – modifications to the metadata catalog prototype at Fermilab to conform to the new interface
  – small amount of work on replica catalog prototypes at several sites (JLab, Adelaide, Fermilab)
Architecture remains unchanged

Architecture (review)

Web Services
  – Metadata Catalog (MDC) maps metadata to a global name
  – Replica Catalog (RC) maps a global name to one or more instances
  – Storage Resource Manager (SRM, optional) manages a disk, or disk + tape resource
Draft schemas (WSDL) for these services exist

Architecture (review)

File based directories contain...
  – Master directory of all collaborations’ MDC, RC and membership lists, stored as XML files
  – Distributed group membership lists (XML)
Initial versions of the schemas (XML) exist

Implementation View

Master Directory: http://www.lqcd.org/<tbd>.xml
contains for each collaboration: metadata catalog, replica catalog, group membership

[Diagram labels: MDC for UKQCD, MDC for USQCD, MDC for Japan; RC for UKQCD, RC for USQCD, RC for Japan; Japan group file, UKQCD group file, USQCD group file; subgroup A file, subgroup B file; file X]

MetaData Catalog

ILDG schema defines only a query interface
  – multiple query languages (syntax) allowed for now (no clear winner yet)
  – queries map from physics metadata values to a Global File Name (GFN)
  – proposed minor modification can also return the full physics metadata

Minimal Implementation

Master XML directory to be held at www.lqcd.org/<tbd>.xml
For each collaboration, need at least these:
  – MetaDataCatalog (e.g. running at www.usqcd.org/<tbd>)
  – trivial Replica Catalog (does 1:1 name mapping)
  – standard web or ftp server to serve files

Getting going... (or, what must a collaboration do?)
First: Deploy a metadata catalog
  1. choose an existing prototype & deploy it
  2. populate the catalog with qcdml v1.1 compliant documents, with ILDG compliant GFNs (global file names)
     Note: names must have the collaboration name as part of the string; this name matches the entry name in the master directory: gfn://collaboration/local-name
  3. request [email protected] to add your MDC to the master directory on www.lqcd.org

Getting going... (or, what must a collaboration do?)

Second: Deploy a replica catalog
  1. (option 1) write a simple function which maps your collaboration’s GFN naming convention into a static URL pointing to the file (i.e., no database, just string shuffling)
  OR
  2. (option 2) get / implement a true RC, with multiple instance tracking (a database)
  3. request [email protected] to add your RC to the master directory on www.lqcd.org
Third: Serve the files (http, ftp, srm, ...)

Nice things to also do...

Deploy a real RC, which can track another collaboration’s copies of your files
Populate a group membership file, to support group read/write access (otherwise your collaboration is relegated to “world” status)
Deploy an SRM (with protocol negotiation) and also at least one file server that supports parallel streams (gridftp, bbftp, ...) for higher performance file retrieval
Implement a web interface to your metadata catalog

Near Term Expectations

Adelaide will deploy an MDC and RC within the next few months
USQCD will also try to match this within the next 6 months, but is currently distracted with getting machines into production
Others have not committed yet

Australian ILDG Node
Paul Coddington
School of Computer Science, University of Adelaide
South Australian Partnership for Advanced Computing
[email protected]
May 2005

Overview

• A prototype ILDG node has been set up in Australia for data from the Centre for the Subatomic Structure of Matter (CSSM).
• We have developed a metadata catalog, replica catalog and web portal.
• Currently just allows searching, browsing and downloading of QCDML metadata
  – ability to download configuration files will be added later.
• Metadata for around 50 ensembles is currently available.

Metadata Catalog

• Ensemble and configuration QCDML metadata is generated as XML files which are loaded into Apache Xindice, an XML database.
• The metadata catalog web service was developed in Java using Xindice's implementation of the XML:DB API for XML databases.
  – So should work with other XML databases
• It (almost) conforms to the metadata catalog interface defined by the ILDG Middleware Working Group.
  – Added additional parameter to specify returning GFNs or XML
• XPath queries are passed directly to the XML database.

Other Components

• Replica catalog is a web service wrapper around the Replica Location Service for Globus Toolkit 3.
  – Plan to change this to GT4 RLS or something else.
• No mechanism for downloading files yet
  – Will initially generate wget script, like Japanese portal.
  – Then investigate using SRM.
• Web portal written using JSP.
  – http://www.sapac.edu.au/ildg/cssm/
• All software will be made freely available after code is cleaned up and documented.

Middleware Working Group Near Term Task List

Approve minor changes to MDC interface
Decide on the URL for, and deploy:
  – master directory file
  – master membership file
Collect official CA certificates from all collaborations and post at www.lqcd.org for all to easily retrieve (for configuring servers for strongly authenticated operations)

Most Significant Challenges

Get data into ILDG compliant format
  – create or automate creation of metadata compliant with qcdml1.1
  – write files in ILDG format (or write translation program for on-the-fly translation)
    will LQCD application developers do this?
    or will manpower need to be found for translation programs?
Get the MDC operational and populated
(other tasks are comparatively easy)

Other Challenges

Manpower to implement a nice user interface for browsing, and optionally retrieving files (once per collaboration, or shared, even hosted at www.lqcd.org ?)
Manpower to write some simple command line client tools to be used in workflow scripting

Goal of reaching an operational status by June 2006 is still feasible!
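
As an illustration of the "trivial Replica Catalog" from the Minimal Implementation slide (option 1 under "Second: Deploy a replica catalog"), the 1:1 mapping from a GFN of the form gfn://collaboration/local-name to a static URL could be sketched roughly as below. This is only a sketch under stated assumptions: the function name gfn_to_url, the BASE_URLS table and the example.org hosts are hypothetical, and a real deployment would expose the mapping through the ILDG replica catalog web service interface rather than as a bare function.

```python
from urllib.parse import quote

GFN_PREFIX = "gfn://"

# Hypothetical table: collaboration name (as listed in the master directory)
# -> base URL of that collaboration's file server. Both entries are made up.
BASE_URLS = {
    "usqcd": "http://files.example.org/usqcd/",
    "cssm": "ftp://ftp.example.org/pub/cssm/",
}


def gfn_to_url(gfn: str) -> str:
    """Map gfn://collaboration/local-name to one static URL (no database)."""
    if not gfn.startswith(GFN_PREFIX):
        raise ValueError(f"not a GFN: {gfn!r}")
    # Split off the collaboration name; the rest is the local name.
    collaboration, sep, local_name = gfn[len(GFN_PREFIX):].partition("/")
    if not (collaboration and sep and local_name):
        raise ValueError(f"expected gfn://collaboration/local-name, got {gfn!r}")
    if collaboration not in BASE_URLS:
        raise LookupError(f"no file server known for {collaboration!r}")
    # Percent-encode the local name but keep "/" so sub-paths survive.
    return BASE_URLS[collaboration] + quote(local_name)


if __name__ == "__main__":
    print(gfn_to_url("gfn://usqcd/ensembleA/cfg_0100.dat"))
```

Because the mapping is pure string manipulation, it returns exactly one instance per GFN; tracking copies held by other collaborations still needs a true RC backed by a database (option 2).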