Particle Physics Unix Overview

Oxford University Particle Physics
Pete Gronbech
Senior Systems Manager and GridPP Project Manager
17th October 2013
Graduate Lectures
Strategy
• Local Cluster Overview
• Connecting to it
• Grid Cluster
• Computer Rooms
• How to get help

Particle Physics Strategy
The Server / Desktop Divide
[Diagram: the server side comprises virtual machine hosts, a general purpose Unix server, Linux file servers, Linux worker nodes, group DAQ systems, and web, NIS and torque servers; the desktop side comprises Windows 7 PCs and Ubuntu PCs.]
Approx 200 desktop PCs with Exceed, PuTTY or ssh/X windows are used to access the PP Linux systems.

Particle Physics Linux
Unix Team (Room 661):
• Pete Gronbech – Senior Systems Manager and GridPP Project Manager
• Ewan MacMahon – Grid Systems Administrator
• Kashif Mohammad – Grid and Local Support
• Sean Brisbane – Local Server and User Support

• General purpose interactive Linux-based systems for code development, short tests and access to Linux-based office applications. These are accessed remotely.
• Batch queues are provided for longer and intensive jobs. They are provisioned to meet peak demand and give a fast turnaround for final analysis.
• Systems run Scientific Linux, which is a free Red Hat Enterprise Linux-based distribution.
• The Grid & CERN are just migrating to SL6. The local cluster is following and currently has one interactive node with a growing set of worker nodes, available from "pplxint8".
• Most cluster systems are still currently running SL5. These can be accessed from pplxint5 and 6 (see the login sketch below).
• We will be able to offer you the most help running your code on the newer SL6. Some experimental software frameworks still require SL5.
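A minimal login sketch from a Unix desktop follows; the fully qualified hostnames and the username "jbloggs" are illustrative assumptions, not taken from these slides.

    # log in to the SL6 interactive node
    ssh [email protected]

    # or, for software that still needs SL5
    ssh [email protected]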
Current Clusters
• Particle Physics Local Batch cluster
• Oxford's Tier 2 Grid cluster
PP Linux Batch Farm
Scientific Linux 5
Users log in to the interactive nodes pplxint5 and pplxint6. The home directories and all the data disks (/home area or /data/group) are shared across the cluster and visible on the interactive machines and all the batch system worker nodes.
Approximately 300 cores, each with 4GB of RAM.
[Diagram: SL5 worker nodes pplxwn9–pplxwn42 – a mixture of servers with 8 * Intel 5420, 16 * Intel 5650, 16 * AMD Opteron 6128 and 16 * Intel E5-2650 cores – plus the interactive login nodes pplxint5 and pplxint6.]
PP Linux Batch Farm
Scientific Linux 6
Migration to SL6 is ongoing. The new SL6 interactive node is pplxint8; use this by preference. Worker nodes will be migrated from the SL5 cluster to SL6 over the next month.
Currently there are four servers with 16 cores each and 4GB of RAM per core, i.e. 64 job slots, but more will arrive as required. (A minimal batch job sketch follows below.)
[Diagram: SL6 worker nodes pplxwn49, pplxwn50 and others, each with 16 * Intel E5-2650 cores, plus the interactive login node pplxint8.]
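The earlier server diagram names a torque server, so batch work from pplxint8 presumably goes through torque/PBS. The sketch below shows what a minimal test job might look like; the script name, job name and walltime request are illustrative assumptions, and the site's actual queues and defaults may differ.

    #!/bin/bash
    # myjob.sh - illustrative test job (hypothetical name)
    #PBS -N testjob               # job name
    #PBS -l walltime=00:10:00     # requested wall-clock time (assumption)
    cd $PBS_O_WORKDIR             # start in the directory the job was submitted from
    echo "Running on $(hostname)"

Submit and monitor it from the interactive node with:

    qsub myjob.sh      # submit to the batch system
    qstat -u $USER     # check the status of your jobs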
PP Linux Batch Farm
Data Storage
NFS is used to export data to the smaller experimental groups, where the partition size is less than the total size of a server.
[Diagram: NFS servers (pplxfsn) providing data areas of 9TB, 40TB and 30TB, plus a 19TB server for the home areas.]
The data areas are too big to be backed up. The servers have dual redundant PSUs, RAID 6 and are running on uninterruptible power supplies. This safeguards against hardware failures, but does not help if you delete files.
The home areas are backed up nightly by two different systems: the OUCS HFS service and a local backup system. If you delete a file, tell us as soon as you can when you deleted it and its full name.
The latest nightly backup of any lost or deleted files from your home directory is available at the read-only location "/data/homebackup/{username}".
The home areas are subject to quotas, but if you require more space, ask us.
Store your thesis on /home, NOT /data.
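As an illustration of restoring from the nightly home-area backup (the username "jbloggs" and the file path are hypothetical):

    # copy the most recent nightly copy of a deleted file back into your home area
    cp /data/homebackup/jbloggs/thesis/chapter1.tex ~/thesis/chapter1.tex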
Particle Physics Computing
The Lustre file system is used to group multiple file servers together to provide extremely large continuous file spaces. This is used for the ATLAS and LHCb groups.
[Diagram: Lustre MDS and Lustre OSS01–OSS04 object storage servers (18TB and 44TB storage targets), mounted by SL5 and SL6 client nodes.]

df -h /data/atlas
Filesystem     Size  Used  Avail  Use%  Mounted on
/lustre/atlas  244T  215T  18T    93%   /data/atlas

df -h /data/lhcb
Filesystem     Size  Used  Avail  Use%  Mounted on
/lustre/lhcb   95T   82T   8.5T   91%   /data/lhcb
Strong Passwords etc.
• Use a strong password, not one open to dictionary attack!
  fred123 – No good
  Uaspnotda!09 – Much better
• Better to use ssh with a passphrased key stored on your desktop.
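On a Linux (or Mac) desktop, a passphrased key can be created and installed roughly as follows; this is only a sketch, and the fully qualified hostname and the username "jbloggs" are assumptions:

    # generate a key pair; choose a strong passphrase when prompted
    ssh-keygen -t rsa -b 4096

    # install the public key in ~/.ssh/authorized_keys on the interactive node
    ssh-copy-id [email protected]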
Connecting with PuTTY
Question: how many of you are using Windows on the desktop, and how many Linux?
Demo
1. Plain ssh terminal connection
2. With key and Pageant
3. ssh with X windows tunnelled to passive Exceed
4. ssh, X windows tunnel, passive Exceed, KDE session
http://www2.physics.ox.ac.uk/it-services/ppunix/ppunix-cluster
http://www.howtoforge.com/ssh_key_based_logins_putty
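For those on a Linux or Mac desktop, the rough command-line equivalent of the PuTTY + Exceed setup is ssh with X forwarding; the fully qualified hostname and the username are assumptions:

    # -X tunnels X windows back to your local display
    ssh -X [email protected]

    # then test with any simple X application, e.g.
    xclock &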
PuTTYgen to create an ssh key on Windows
Enter a secure passphrase, then save the public and private parts of the key to a subdirectory of your H: drive.
Paste the public key into ~/.ssh/authorized_keys on pplxint.
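On the pplxint side, installing the PuTTYgen public key might look like the sketch below; "mykey.pub" is a hypothetical name for a file containing the single-line OpenSSH-format public key shown in the PuTTYgen window:

    # make sure the .ssh directory exists with safe permissions
    mkdir -p ~/.ssh
    chmod 700 ~/.ssh

    # append the public key and tighten the file permissions
    cat mykey.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys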
Pageant
• Run Pageant once after login to load your Windows ssh key.
SouthGrid Member Institutions
• Oxford
• RAL PPD
• Cambridge
• Birmingham
• Bristol
• Sussex
• JET at Culham
Current capacity
Compute Servers
• Twin and twin-squared nodes – 1300 CPU cores
Storage
• Total of ~700TB
• The servers have between 12 and 36 disks; the more recent ones are 3TB each. These use hardware RAID and UPS to provide resilience.
Get a Grid Certificate
You must remember to use the same PC to request and retrieve the Grid certificate.
The new UKCA page uses a Java-based certificate wizard.
Two Computer Rooms provide excellent infrastructure for the future
The new computer room built at Begbroke Science Park, jointly for the Oxford Supercomputer and the Physics Department, provides space for 55 (11kW) computer racks, 22 of which will be for Physics. Up to a third of these can be used for the Tier 2 centre. This £1.5M project was funded by SRIF with a contribution of ~£200K from Oxford Physics.
The room was ready in December 2007, and the Oxford Tier 2 Grid cluster was moved there during spring 2008. All new Physics high-performance clusters will be installed here.
Local Oxford DWB Physics Infrastructure Computer Room
Completely separate from the Begbroke Science Park, a computer room with 100kW of cooling and >200kW of power has been built with ~£150K of Oxford Physics money.
This local Physics Department infrastructure computer room was completed in September 2007. It allowed local computer rooms to be refurbished as offices again, and racks that were in unsuitable locations to be re-housed.
Cold aisle containment
The end for now…
Sean will give more details of the use of the clusters next week.
• Help Pages
  http://www.physics.ox.ac.uk/it/unix/default.htm
  http://www2.physics.ox.ac.uk/research/particlephysics/particle-physics-computer-support
• Email
  [email protected]
• Questions…
• Network Topology
