Building an Open Grid
Ian Foster
Argonne National Lab
University of Chicago
Globus Alliance
www.mcs.anl.gov/~foster
Image Credit: Electronic Visualization Lab, UIC
Computing in Atmospheric Sciences, Annecy, September 10, 2003
Just to Avoid Confusion:
When I Say “Grid” I Mean …

[email protected]

2
ARGONNE  CHICAGO
Overview
• Why the Grid
• Technology trends and eScience
• Global knowledge communities
• The Grid and the atmospheric sciences
• Opportunities and examples
• Grid technologies
• OGSA and Globus Toolkit
• GriPhyN virtual data
• Grid infrastructure
• Summary
It’s Easy to Forget
How Different 2003 is From 1993
• Ubiquitous Internet: 100+ million hosts
• Collaboration & resource sharing the norm
• Ultra-high-speed networks: 10+ Gb/s
• Global optical networks
• Enormous quantities of data: Petabytes
• For an increasing number of communities,
gating step is not collection but analysis
• Huge quantities of computing: 100+ Top/s
• Ubiquitous computing via clusters
• Moore’s law everywhere: 1000x/decade
• Instruments, detectors, sensors, scanners
Foundation for e-Science
e-Science methodologies are transforming science,
engineering, medicine & business by enabling
whole-system approaches to complex problems
[Diagram: the Grid links colleagues, computers, software, sensor nets, instruments, and shared data archives]
A Three-way Alliance Enabling New Results
• Theory: models & simulations → shared data
(requires much engineering and innovation)
• Computing science: systems, notations & formal foundation → process & trust
• Experiment & advanced data collection → shared data
(changes culture, mores, and behaviours)
Demands new multi-national, multi-discipline, computer-enabled consortia
Consequence: The Emergence of
Global Knowledge Communities
• Teams organized around common goals
• Communities: “Virtual organizations”
• With diverse membership & capabilities
• Heterogeneity is a strength not a weakness
• And geographic and political distribution
• No location/organization possesses all
required skills and resources
• Must adapt as a function of the situation
• Adjust membership, reallocate
responsibilities, renegotiate resources
For Example:
High Energy Physics
Global Knowledge Communities
Often Driven by Data: E.g., Astronomy
No. & sizes of data sets as of mid-2002,
grouped by wavelength
• 12 waveband coverage of large
areas of the sky
• Total about 200 TB data
• Doubling every 12 months
• Largest catalogues near 1B objects
Data and images courtesy Alex Szalay, Johns Hopkins
New Opportunities
Demand New Technology
“Resource sharing & coordinated problem solving in
dynamic, multi-institutional virtual organizations”
“When the network is as fast as the computer's internal
links, the machine disintegrates across the net into a set
of special-purpose appliances” (George Gilder)
Overview
◊ Why the Grid
• Technology trends and eScience
• Global knowledge communities
• The Grid and the atmospheric sciences
• Opportunities and examples
• Grid technologies
• OGSA and Globus Toolkit
• GriPhyN virtual data
• Grid infrastructure
• Summary
The Grid and the Atmospheric
Sciences: Opportunities
• Inter-personal collaboration
• E.g., Access Grid, CHEF
• On-demand access to simulation models
• E.g., Espresso
• Access to, and integration of, data sources
• E.g., Earth System Grid
• Integration of distributed codes
• Multidisciplinary modeling
• Integration of all of the above
• Collaborative, computationally intensive
analysis of large quantities of online data
Espresso Modeling Interface
(Michael Dvorak, John Taylor)
“Meteorology on demand”
NASA: Aviation Safety
Wing Models
Wing Models
• Lift capabilities
• Drag capabilities
• Responsiveness
Stabilizer Models
Airframe Models
• Deflection capabilities
• Responsiveness
Crew Capabilities (Human Models)
• Accuracy
• Perception
• Stamina
• Reaction times
• SOPs
Engine Models
• Thrust performance
• Reverse thrust performance
• Responsiveness
• Fuel consumption
Landing Gear Models
• Braking performance
• Steering capabilities
• Traction
• Dampening capabilities
Earth System Grid (ESG)
Goal: address technical obstacles to the sharing & analysis
of high-volume data from advanced earth system models
CMS Event Simulation Production
• Production run on the integration testbed
• Simulate 1.5 million full CMS events for physics
studies: ~500 sec per event on 850 MHz processor
• 2 months continuous running across 5 testbed sites
• Managed by a single person at the US-CMS Tier 1
• EU DataGrid and LCG-1 operating at similar scales
NEESgrid Earthquake Engineering
Collaboratory
U.Nevada Reno
www.neesgrid.org
Overview
◊ Why the Grid
• Technology trends and eScience
• Global knowledge communities
• The Grid and the atmospheric sciences
• Opportunities and examples
• Grid technologies
• OGSA and Globus Toolkit
• GriPhyN virtual data
• Grid infrastructure
• Summary
Resource Integration
as a Fundamental Challenge
[Diagram: resources (R) and resource managers (RM) linked to registries, security services, and policy services via discovery and access paths]
• Many sources of data, services, computation
• Security & policy must underlie access & management decisions
• Registries organize services of interest to a community
• Resource management is needed to ensure progress & arbitrate competing demands
• Data integration activities may require access to, & exploration/analysis of, data at many locations
• Exploration & analysis may involve complex, multi-step workflows
Performance Requirements Demand
Whole-System Management
• Assume
• Remote data at 1 GB/s
• 10 local bytes per remote byte
• 100 operations per byte
[Diagram: remote data crosses a wide area link (end-to-end switched lambda?) at 1 GB/s into a local network, feeding parallel I/O at 10 GB/s and parallel computation at 1000 Gop/s]
• >1 GByte/s achievable today (FAST, 7 streams, LA to Geneva)
Grid Technologies Promise to
Address Key Requirements
• Infrastructure (“middleware”) for
establishing, managing, and evolving
multi-organizational federations
• Dynamic, autonomous, domain independent
• On-demand, ubiquitous access to
computing, data, and services
• Mechanisms for creating and managing
workflow within such federations
• New capabilities constructed dynamically
and transparently from distributed services
• Service-oriented, virtualization
Grids and Open Standards
[Figure: evolution over time, toward increased functionality and standardization]
• Custom solutions, built on de facto standards (X.509, LDAP, FTP, …)
• Globus Toolkit, built on de facto standards (GGF: GridFTP, GSI)
• Web services
• Open Grid Services Architecture (GGF: OGSI, …; + OASIS, W3C), with multiple implementations, including Globus Toolkit
• App-specific services layered on top
Open Grid Services Architecture
• Service-oriented architecture
• Key to virtualization, discovery,
composition, local-remote transparency
• Leverage industry standards
• Internet, Web services
• Distributed service management
• A “component model for Web services”
• A framework for the definition of
composable, interoperable services
“The Physiology of the Grid: An Open Grid Services Architecture for
Distributed Systems Integration”, Foster, Kesselman, Nick, Tuecke, 2002
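The "virtualization and local-remote transparency" idea, one service interface with interchangeable implementations behind it, can be sketched as follows (the interface and class names are hypothetical illustrations, not OGSA-defined types):

```python
# Sketch of service virtualization: clients program against one
# abstract interface; whether the work runs locally or on a remote
# resource is an implementation detail. Names are illustrative only.

from abc import ABC, abstractmethod

class JobService(ABC):
    """Abstract service interface: the 'virtualized' view."""
    @abstractmethod
    def submit(self, command: str) -> str: ...

class LocalJobService(JobService):
    def submit(self, command: str) -> str:
        return f"ran '{command}' on local host"

class RemoteJobService(JobService):
    def __init__(self, endpoint: str):
        self.endpoint = endpoint
    def submit(self, command: str) -> str:
        # A real implementation would invoke the remote service here.
        return f"dispatched '{command}' to {self.endpoint}"

def run_analysis(service: JobService) -> str:
    # Client code is identical regardless of where the service lives.
    return service.submit("analyze dataset-42")

print(run_analysis(LocalJobService()))
print(run_analysis(RemoteJobService("https://example.org/jobs")))
```

Discovery and composition then operate on the abstract interface, which is what makes services substitutable within a federation.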
Open Grid Services Architecture
[Layered architecture diagram, top to bottom]
• Users in Problem Domain X
• Applications in Problem Domain X
• Application & Integration Technology for Problem Domain X
• Generic Virtual Service Access and Integration Layer
• OGSA services: Job Submission, Brokering, Registry, Banking, Workflow, Authorisation, Data Transport, Resource Usage, Transformation, Structured Data Access, Structured Data Integration
• OGSI: Interface to Grid Infrastructure
• Web Services: Basic Functionality
• Compute, Data & Storage Resources; Distributed Structured Data (Relational, XML, Semi-structured)
The Globus Alliance & Toolkit
(Argonne, USC/ISI, Edinburgh, PDC)
• An international partnership dedicated to
creating & disseminating high-quality open
source Grid technology: the Globus Toolkit
• Design, engineering, support, governance
• Academic Affiliates make major contributions
• EU: CERN, Imperial, MPI, Poznan
• AP: AIST, TIT, Monash
• US: NCSA, SDSC, TACC, UCSB, UW, etc.
• Significant industrial contributions/adoption
• 1000s of users worldwide, many contribute
Other Relevant Efforts
• NSF Middleware Infrastructure initiative,
U.K. Open Middleware Initiative
• Archive, integration, testing, documentation
• EU DataGrid, GridLab, EGEE, Earth Science
Grid, NEESgrid, Unicore, etc.
• Produce/integrate software
• U.S. Virtual Data Toolkit (GriPhyN)
• GT + Condor in physics-friendly package,
and virtual data technologies
Grid Computing
and Industry
GriPhyN: PetaScale Virtual Data Grids
[Diagram]
• Users: Production Team, Individual Investigator, Workgroups
• Interactive User Tools
• Virtual Data Tools; Request Planning & Scheduling Tools; Request Execution & Management Tools
• Resource Management Services; Security and Policy Services; Other Grid Services
• Transforms; Raw data source
• Distributed resources (code, storage, CPUs, networks)
• Scale: ~1 Petaop/s, ~100 Petabytes
GriPhyN Virtual Data Technology
[Schema: Data created-by Transformation; Derivation execution-of Transformation; Data consumed-by/generated-by Derivation]
Motivating scenarios:
• “I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
• “I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
• “I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
• “I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
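The data/transformation/derivation records above suggest how the "which derived data must I recompute?" question is answered: follow consumed-by/generated-by edges forward from the suspect input. A toy sketch (the catalog structure and step names here are illustrative, not the actual GriPhyN/Chimera schema):

```python
# Toy virtual-data catalog: derivations record which data they consume
# and generate. Given a bad input (e.g. a miscalibrated instrument's
# output), walk the graph forward to find everything derived from it.
# Illustrative only; not the actual GriPhyN/Chimera schema.

derivations = [
    # (derivation name, consumed data, generated data)
    ("calibrate", ["raw-scan"], ["calibrated-scan"]),
    ("extract",   ["calibrated-scan"], ["object-catalog"]),
    ("cluster",   ["object-catalog"], ["cluster-catalog"]),
]

def downstream_of(bad_data: str) -> set[str]:
    """All data products transitively derived from bad_data."""
    tainted, changed = {bad_data}, True
    while changed:
        changed = False
        for _name, consumed, generated in derivations:
            if tainted & set(consumed):        # derivation used tainted input
                new = set(generated) - tainted
                if new:
                    tainted |= new
                    changed = True
    return tainted - {bad_data}

# Everything downstream of the raw scan must be recomputed.
print(downstream_of("raw-scan"))
```

The same catalog, read in the other direction, answers the provenance questions ("what corrections were applied?") and the reuse questions ("do these results already exist?").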
Example Application:
Sloan Galaxy Cluster Analysis DAG
[Chart: galaxy cluster size distribution from Sloan data; number of clusters (1 to 100,000, log scale) vs. number of galaxies per cluster (1 to 100, log scale)]
Jim Annis, Steve Kent, Vijay Sehkri, Fermilab; Michael Milligan, Yong Zhao, Chicago
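A DAG like this is executed by running each step only after the steps it depends on have finished; a minimal topological-order executor can be sketched as below (the essence of DAG-driven workflow tools such as Condor DAGMan; the step names are invented examples, not the actual Sloan pipeline):

```python
# Minimal DAG executor sketch: run each task once all of its
# dependencies have completed. Task names are invented examples.

from graphlib import TopologicalSorter

# task -> set of tasks it depends on (hypothetical analysis steps)
dag = {
    "fetch-field": set(),
    "find-galaxies": {"fetch-field"},
    "find-clusters": {"find-galaxies"},
    "plot-distribution": {"find-clusters"},
}

# static_order() yields tasks with every dependency ordered first
order = list(TopologicalSorter(dag).static_order())
for task in order:
    print(f"running {task}")   # a real executor would submit Grid jobs here

print(order)
```

A production planner additionally maps each ready task onto an available Grid resource and consults the virtual data catalog to skip tasks whose outputs already exist.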
Overview
◊ Why the Grid
• Technology trends and eScience
• Global knowledge communities
• The Grid and the atmospheric sciences
• Opportunities and examples
• Grid technologies
• OGSA and Globus Toolkit
• GriPhyN virtual data
• Grid infrastructure
• Summary
Grid Infrastructure
• Broadly deployed services in support of
fundamental collaborative activities
• Formation & operation of virtual organizations
• Authentication, authorization, discovery, …
• Services, software, and policies enabling on-demand access to critical resources
• Computers, databases, networks, storage,
software services,…
• Operational support for 24x7 availability
• Integration with campus and commercial
infrastructures
The Foundations are Being Laid
[Map: U.K. e-Science infrastructure, with centres at Edinburgh, Glasgow, Newcastle, Belfast, DL, Manchester, Cambridge, Oxford, Cardiff, RAL, Hinxton, London, and Soton; Tier0/1, Tier2, and Tier3 facilities linked at 10 Gbps, 2.5 Gbps, and 622 Mbps]
Current Focus of
U.S. Physics Grid Efforts: Grid3
A continuation of GriPhyN/PPDG/iVDGL testbed efforts,
focused on establishing a functional federated Grid
EGEE:
Enabling Grids for E-Science in Europe
[Diagram: an Operations Center coordinates Regional Support Centers (support for applications and local resources), which serve Resource Centers (processors, disks, Grid server nodes)]
Overview
◊ Why the Grid
• Technology trends and eScience
• Global knowledge communities
• The Grid and the atmospheric sciences
• Opportunities and examples
• Grid technologies
• OGSA and Globus Toolkit
• GriPhyN virtual data
• Grid infrastructure
• Summary
Current Capabilities
• A core set of Grid capabilities is available and
distributed in good-quality form, e.g.
• GT: security, discovery, access, data movement
• Condor, EDG: scheduling, workflow management
• Virtual Data Toolkit, NMI, etc.
• Deployed at moderate scales
• NEESgrid, EU DataGrid, LCG-1, …
• Usable with some hand holding, e.g.
• CMS event production: 5 sites, 2 months
• NEESgrid earthquake engineering experiment
Challenges
• Integration with site operational procedures
• Many difficult issues
• Scalability in multiple dimensions
• Number of sites, resources, users, tasks
• Higher-level services in multiple areas
• Virtual data, policy, collaboration
• Integration with end-user science tools
• Science desktops
• Coordination of international contributions
• Integration with commercial technologies
Summary
• “eScience”: computer-based, whole system
approaches to complex problems
• Grid technologies provide enabling services,
software, and infrastructure
• Significant opportunities within the
atmospheric sciences: remote access to, &
integration of, data & simulation capabilities
• Major implications for what it means to
provide infrastructure
For More Information
• The Globus Alliance®
• www.globus.org
• Global Grid Forum
• www.ggf.org
• Earth System Grid
• www.earthsystemgrid.org
• Background information
• www.mcs.anl.gov/~foster
• GlobusWORLD 2004
• www.globusworld.org
• Jan 20–23, San Francisco
