The DPM RoadMap

DPM Status & Roadmap
Ricardo Rocha
( on behalf of the DPM team )
EMI INFSO-RI-261611
DPM Overview
[Diagram: the CLIENT sends file metadata operations to the HEAD NODE and file access operations to the DISK NODE(s). Component labels in the diagram: DPNS, DPM, SRM, HTTP, NFS, GRIDFTP, RFIO, XROOT.]
DPM Core
1.8.2, Testing, Roadmap
DPM 1.8.2 – Highlights
• Improved scalability of all frontend daemons
– Especially with many concurrent clients
– By having a configurable number of threads
• Fast/Slow in case of the dpm daemon
• Faster DPM drain
– Disk server retirement, replacement, …
• Better balancing of data among disk nodes
– By assigning different weights to each filesystem
• Log to syslog
• GLUE2 support
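The per-filesystem weights mentioned above can be illustrated with a weighted random choice. This is a minimal sketch, not DPM's actual selection algorithm: the filesystem names, the weight values and the helpers `pick_filesystem` and `simulate` are hypothetical, and the only assumption is that a higher weight should attract proportionally more new files.

```python
import random
from collections import Counter


def pick_filesystem(weights, rng=random):
    """Pick one filesystem name with probability proportional to its weight.

    `weights` maps a (hypothetical) filesystem name to a positive number.
    """
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, n_files, seed=0):
    """Count how many of n_files land on each filesystem."""
    rng = random.Random(seed)
    return Counter(pick_filesystem(weights, rng) for _ in range(n_files))
```

Calling `simulate({"disk01:/fs1": 3, "disk02:/fs1": 1}, 10000)` should place roughly three times as many files on the first filesystem as on the second.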
DPM Core – Testing Activity
• Improved validation & testing
– Collaboration with ASGC for this purpose (thanks!)
– Hammercloud tests running regularly
– They started with a 400-core setup and we looked at the issues; now moving to 1000 cores to increase load
• Example run
– http://hammercloud.cern.ch/atlas/10006472/test/
• To be used extensively for stress testing
– Covering all components: DPM, RFIO, GRIDFTP, NFS, HTTP, …
• Results will benefit other sites too
DPM Core – Testing
[Plots: example HammerCloud (HC) runs using RFIO and using GridFTP, GridFTP vs RFIO. Preliminary results; thanks to ShuTing for the plots.]
DPM Core – Testing
• Big contribution from openlab student
– Martin Hellmich, University of Edinburgh
• Detailed analysis of DPM internals
– Detecting bottlenecks in specific transfer / access phases
[Plot: example… but we have a lot more results which we are now investigating]
DPM Core – Roadmap
• Package consolidation: EPEL compliance
• Fixes in multi-threaded clients
• Replace httpg with https on the SRM
• Improve dpm-replicate (dirs and FSs)
• GUIDs in DPM
• Synchronous GET requests
• Reports on usage information
• Quotas
• Accounting metrics
• HOT file replication
Planned across releases 1.8.3, 1.8.4, 1.8.5
Beta Components
HTTP/DAV, NFS, Nagios, Puppet, Perfsuite, Catalog Sync, Contrib Tools
Beta Components: Overview
• Faster releases
– Monthly releases since June
• Separate yum repository
• Already in use by several sites
– Including sites in the UK
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Components
Beta Components: PerfSuite
Overview
Performance Suite
• Set of tools to easily trigger bunches of tests
– With different configurations
• Common wrapper, many tests
• Existing suites
– POSIX Transfers: RFIO, NFS
– GET/PUT Transfers: HTTP, GSIFTP
– ROOT
– More coming…
• Used for most results presented later
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Performance#Perfsuite
Performance Suite: Sample Configuration
test_rfcp(c:5,s:{1M 2M 4M 8M 16M 32M 64M 128M 256M 512M 1G})x3
test_nfs(m:/mnt/nfs41,c:5,s:{1M 2M 4M 8M 16M 32M 64M 128M 256M 512M 1G})x3
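To make the structure of the sample configuration lines easier to see, here is a small sketch that splits one such line into its parts. It is not the Perfsuite parser: `parse_spec` is a hypothetical helper, and the slides do not state what the keys `c`, `s` and `m` or the trailing `x3` mean, so the code only separates them without interpreting them.

```python
import re

# name(args)xN, as in the sample lines on the slide
SPEC_RE = re.compile(r"^(?P<name>\w+)\((?P<args>.*)\)x(?P<repeat>\d+)$")


def parse_spec(spec):
    """Split 'name(k:v,k:{a b c})xN' into (name, params, N).

    Values written as '{a b c}' become lists; everything else stays a
    string. The meaning of each key is left to the caller.
    """
    match = SPEC_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised test spec: {spec!r}")
    params = {}
    # Split on commas that are not inside a {...} group.
    for part in re.split(r",(?![^{}]*\})", match.group("args")):
        key, _, value = part.partition(":")
        if value.startswith("{") and value.endswith("}"):
            value = value[1:-1].split()
        params[key] = value
    return match.group("name"), params, int(match.group("repeat"))
```

For the `test_nfs` line from the slide this yields the name `test_nfs`, the value `/mnt/nfs41` for `m`, `5` for `c`, a list of eleven size labels for `s`, and the integer 3 from the `x3` suffix.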
Beta Components: HTTP / DAV
Overview, Performance, Roadmap
HTTP / DAV: Overview
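[Diagram:
1. CLIENT sends GET to the LFC, which replies with a REDIRECT
2. CLIENT sends GET / PUT to the DPM HEAD, which replies with a REDIRECT
3. CLIENT sends GET / PUT to the DPM DISK, which transfers the DATA]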
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/WebDAV
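The redirect chain on this slide (LFC, then DPM head node, then DPM disk node) can be traced with a few lines of client-side logic. The sketch below is an illustration only: the host names, paths and the `demo_send` stand-in are hypothetical, the exact redirect status code used by the servers is an assumption, and a real client such as curl follows these redirects by itself.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url, send, max_hops=5):
    """Follow HTTP redirects by hand and record every URL visited.

    `send(url)` must return (status, headers, body); it stands in for a
    real HTTP client so the chain can be inspected without a network.
    """
    hops = [url]
    for _ in range(max_hops + 1):
        status, headers, body = send(url)
        location = headers.get("Location")
        if status in REDIRECT_CODES and location:
            url = urljoin(url, location)
            hops.append(url)
            continue
        return hops, status, body
    raise RuntimeError("too many redirects")


# Hypothetical three-step chain mirroring the slide: LFC -> head -> disk.
DEMO_RESPONSES = {
    "https://lfc.example.org/grid/file":
        (302, {"Location": "https://dpmhead.example.org/dpm/file"}, b""),
    "https://dpmhead.example.org/dpm/file":
        (302, {"Location": "https://dpmdisk01.example.org/data/file"}, b""),
    "https://dpmdisk01.example.org/data/file":
        (200, {}, b"DATA"),
}


def demo_send(url):
    """Return the canned response for a URL in the demo chain."""
    return DEMO_RESPONSES[url]
```

`follow_redirects("https://lfc.example.org/grid/file", demo_send)` visits the three URLs in order and returns the body served by the last one.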
HTTP: Client Support
            curl    browser
OS          Any     Any
GUI         NO      YES
CLI         YES     NO
X509        YES     YES
Proxies     YES     Only IE so far
Redirect    YES     YES
PUT         YES     NO
• Recommendation: browser/curl for GET, curl for PUT
• Chrome Issue 9056 submitted for proxy support
DAV: Client Support
          TrailMix     Cadaver  Davlib    Shared Folder  DavFS2  Nautilus  Dolphin
OS        Firefox < 4  *nix     Mac OS X  Windows        *nix    Gnome     KDE
GUI       YES          NO       YES       YES            N/A     YES       YES
CLI       NO           YES      NO        NO             N/A     NO        NO
X509      YES          YES      NO        YES            YES     NO        NO
Proxies   ?            NO       NO        YES            NO      NO        NO
Redirect  YES          NO       YES       Not PUT        NO      NO        YES
• Updated analysis based on initial one from dCache
• Recommendation: Cadaver for *nix, Windows Explorer (Shared Folder) for Windows
HTTP vs GridFTP: Multiple streams
• Not explicit in the HTTP protocol
• But needed for even higher performance
– Especially in the WAN
• So we added it, with some semantics
– Small wrapper around libcurl
– A PUT with 0 bytes and no Content-Range header marks the end of the write
• Submitted a patch to libcurl to allow SSL session reuse among parallel requests
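The multi-stream semantics above can be sketched as a plan of request headers: one PUT per byte range, followed by a final empty PUT that closes the write. This is an illustration under assumptions, not the DPM wrapper itself: `plan_parallel_put` is a hypothetical helper, the `bytes first-last/total` form is the standard HTTP Content-Range syntax, and how the real client splits a file is not described in the slides.

```python
def plan_parallel_put(total_size, streams):
    """Return the headers for each PUT of a multi-stream upload.

    The file is cut into at most `streams` contiguous byte ranges, each
    sent with its own Content-Range. The last entry is the zero-byte PUT
    without Content-Range that signals the end of the write.
    """
    if total_size <= 0 or streams <= 0:
        raise ValueError("total_size and streams must be positive")
    chunk = -(-total_size // streams)  # ceiling division
    requests = []
    for start in range(0, total_size, chunk):
        end = min(start + chunk, total_size) - 1
        requests.append({
            "Content-Range": f"bytes {start}-{end}/{total_size}",
            "Content-Length": str(end - start + 1),
        })
    requests.append({"Content-Length": "0"})  # end-of-write marker
    return requests
```

The range requests could be issued in parallel over separate connections; only the final empty PUT has to wait until all of them have completed.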
HTTP vs GridFTP: 3rd Party Copies
• Implemented using WEBDAV COPY
• Requires proxy certificate delegation
– Using gridsite delegation, with a small wrapper client
• Requires some common semantics to copy between SEs (to be agreed)
– Common delegation portType location and port
– No prefix in the URL ( just http://<server>/<sfn> )
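As an illustration of the URL convention above, the sketch below assembles a WebDAV COPY request between two storage elements. It is a sketch under assumptions: `build_copy_request` is a hypothetical helper, the host names and paths are placeholders, the request is assumed to go to the source with the standard WebDAV `Destination` header naming the target, and the proxy delegation step is not shown.

```python
def build_copy_request(src_server, src_sfn, dst_server, dst_sfn, scheme="http"):
    """Return (method, url, headers) for a WebDAV COPY between two SEs.

    URLs follow the prefix-free form from the slide:
    <scheme>://<server>/<sfn>
    """
    def make_url(server, sfn):
        return f"{scheme}://{server}/{sfn.lstrip('/')}"

    headers = {"Destination": make_url(dst_server, dst_sfn)}
    return "COPY", make_url(src_server, src_sfn), headers
```

The resulting method, URL and headers can be handed to any HTTP client that accepts arbitrary methods, for example `http.client.HTTPConnection.request` in the Python standard library.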
HTTP vs GridFTP: 3rd Party Copies
Example of FTS usage
HTTP / DAV: Performance
• Xeon 4 Cores 2.27GHz
• 12 GB RAM
• 1 Gbit/s links
• No difference detected in LAN with different number of streams
– But early results do show a big difference on the WAN
• lcg-cp configured to use gridftp
• File registration & transfer times considered in both cases
HTTP / DAV: Issues & Roadmap
• Towards a first production release
– Testing with large number of concurrent clients
– Finish up the WAN performance tests
• And after that
– Further testing of 3rd party copy with larger files
– Finish validation against other implementations
– Validate usage via ROOT
– Improved GET on the LFC
– PUT support on the LFC (?)
Beta Components: NFS 4.1 / pNFS
Overview, Performance, Roadmap
NFS 4.1/pNFS: Why?
• Industry standard (IBM, NetApp, EMC, …)
• Free clients (with free caching)
• Strong security (GSSAPI)
• Parallel data access
• Easier maintenance
• …
• But you know all this by now…
NFS 4.1/pNFS: Overview
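[Diagram:
1. OPEN (CLIENT to METADATA SERVER)
2. LAYOUTGET (CLIENT to METADATA SERVER)
3. GETDEVICEINFO (CLIENT to METADATA SERVER)
4. OPEN (CLIENT to DISK SERVER(s))
5. READ / WRITE (CLIENT to DISK SERVER(s))
6. CLOSE (CLIENT to DISK SERVER(s))
7. CLOSE (CLIENT to METADATA SERVER)]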
https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/NFS41
NFS4.1 / pNFS: Client
• pNFS support in the Linux kernel from 2.6.38 onwards
• nfs-utils >= 1.2.3
• Latest Fedora and Debian Sid have it
• We provide packages for EL5
– Enabled pNFS in the elrepo mainline kernel
– nfs-utils and AFS module we package ourselves
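The client requirements above boil down to two version comparisons. The following sketch shows one way to check them; `parse_version` and `meets_minimum` are hypothetical helpers, only the leading dotted numeric part of a version string is compared, and distribution kernels with backported features are not accounted for.

```python
import re

MIN_KERNEL = (2, 6, 38)     # minimum kernel version quoted on the slide
MIN_NFS_UTILS = (1, 2, 3)   # minimum nfs-utils version quoted on the slide


def parse_version(text):
    """Return the leading dotted numeric part of a version as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version, minimum):
    """True if `version` (a string) is at least `minimum` (a tuple)."""
    return parse_version(version) >= minimum
```

On a client the kernel string could come from `platform.release()`, for example `meets_minimum(platform.release(), MIN_KERNEL)`.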
NFS4.1 / pNFS: Performance
• IOZONE Results
• Server
– Xeon 4 Cores 2.27GHz
– 12 GB RAM
– 1 Gbit/s links
• Client
– Dual core
– 2 GB RAM
– 100 Mbit/s link
NFS4.1 / pNFS: Performance
• NFS vs RFIO
• Server
– Xeon 4 Cores 2.27GHz
– 12 GB RAM
– 1 Gbit/s links
• Client
– Dual core
– 2 GB RAM
– 100 Mbit/s link
• 8 KB block sizes
RFIO read misbehaving in this test… investigating
NFS4.1 / pNFS: Issues & Roadmap
• Towards a first production release
– Tests with a faster network link
– Testing with a larger number of concurrent clients
– WAN testing
– Enable bigger block sizes
• And after that
– X509 certificate support
• Still not figured out… needs a strong focus
– Further validation with other implementations
Beta Components: Even more…
Puppet, Nagios, Contrib, Catalog Sync
Even more components…
• Catalog Synchronization
– Check Fabrizio’s talk next Monday (EGI Forum Lyon)
• DPM Admin contrib package
– Contribution from GridPP
– Now packaged and distributed with the DPM components
– http://www.gridpp.ac.uk/wiki/DPM-admin-tools
• Nagios monitoring plugins for DPM
– Available now
– https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Monitoring
• Puppet templates
– Available now in beta
– https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Admin/Puppet
Conclusion
• 1.8.2 fixes many scalability and performance issues
– But we continue testing and improving
• Popular requests coming in next versions
– Accounting, quotas, easier replication
• Beta components getting to production state
– Standards compliant data access
– Simplified setup, configuration, maintenance
– Metadata consistency and synchronization
• And much more extensive testing
– Performance test suites, regular large scale tests
