HP-UX Swap and Dump Unleashed
By Unix/Linux Apprentice with 26 Years of Experience
Dusan Baljevic
Sydney, Australia
Aug 2011
Why This Document? *
• Frequent “abuse” of good design principles.
• A “friend in need is a friend indeed” – why standard swap/dump design fails in real scenarios.
• Everyone has a different opinion – why not help system administrators and architects stop implementing bad practices?
• Especially important on large-RAM servers.
• Based on 26 years of practical experience in Unix/Linux.
This Document is Not:
• A replacement for HP’s official statements.
• A written manual to learn HP-UX and its design principles in detail.
• Glorified personal experience to prove that I “know best” (rather the opposite).
HP-UX Current Official Recommendations* - Part 1
Use the following guidelines when configuring swap logical volumes:
• Interleave device swap areas for better performance.
• Two swap areas on different disks perform better than one swap area with the equivalent amount of space. This configuration allows interleaved swapping, which means the swap areas are written to concurrently, thus enhancing performance.
• When using LVM, set up secondary swap areas within logical volumes that are on different disks using lvextend.
• If you have only one disk and must increase swap space, try to move the primary swap area to a larger contiguous region.
HP-UX Current Official Recommendations* - Part 2
• Similar-sized device swap areas work best. Device swap areas must have similar sizes for best performance. Otherwise, when all space in the smaller device swap area is used, only the larger swap area is available, making interleaving impossible.
• By default, primary swap is located on the same disk as the root file system. The kernel configuration file contains the configuration information for primary swap.
• If you are using logical volumes as secondary swap, allocate the secondary swap to reside on a disk other than the root disk for better performance.
• Disable mirror consistency checking for a mirrored primary swap device (there is no need to recover swap contents after a failure).
• Use Priority 0 device swap to bypass swap on root disk.
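The effect of unequal device swap sizes can be sketched with simple arithmetic. The function name interleaved_mb is hypothetical, and the model (interleaving lasts only until the smallest area is full) is a simplification of the guideline above, not a description of kernel internals.

```shell
# interleaved_mb SIZE_MB...
# Prints how many MB can be written interleaved across all the given
# swap areas: (number of areas) x (size of the smallest area).
# Hypothetical helper for illustration only.
interleaved_mb() {
    count=0
    smallest=0
    for size in "$@"; do
        if [ "$count" -eq 0 ] || [ "$size" -lt "$smallest" ]; then
            smallest=$size
        fi
        count=$((count + 1))
    done
    echo $((count * smallest))
}

interleaved_mb 8192 8192    # two equal areas: prints 16384
interleaved_mb 8192 2048    # the small area limits interleaving: prints 4096
```

With two 8192 MB areas, all 16384 MB can be interleaved; with 8192 MB and 2048 MB, only 4096 MB is interleaved and the remaining 6144 MB is served by a single device.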
HP-UX How Much Swap is Enough?
• Every admin and architect has a different opinion.
• Traditional views typically use the formula:
SWAP = 1 or 2 x RAM
• Some old designs and applications required even 3 x RAM (or more).
• Old HP-UX releases had a serious issue with the (now obsolete) kernel parameter swapmem_on (see the HP-UX Pseudoswap slide).
HP-UX How Much Dump is Enough?* – Part 1
The vast majority of problems are found in the kernel area. Only rarely do the program data areas need to be examined; even more rarely, the shared memory areas; and virtually never the buffer/file cache and shared libraries.
If a full crash dump is taken, the total space needed will be as high as RAM (and a bit more).
With a compressed dump, the overall time taken is reduced by about 1/3, and the disk space required should also be reduced by at least 1/3 for the default selection of page classes (the default page class selection usually covers around 20% of memory).
HP-UX How Much Dump is Enough? – Part 2
# crashconf -v
Crash dump configuration has been changed since boot.

CLASS     PAGES     INCLUDED IN DUMP  DESCRIPTION
--------  --------  ----------------  -------------------------
UNUSED    9572004   no,  by default   unused pages
USERPG    1341553   no,  by default   user process pages
BCACHE    1980      no,  by default   buffer cache pages
KCODE     9142      no,  by default   kernel code pages
USTACK    1567      yes, by default   user process stacks
FSDATA    12        yes, by default   file system metadata
KDDATA    1492949   yes, by default   kernel dynamic data
KSDATA    8816      yes, by default   kernel static data
SUPERPG   128677    no,  by default   unused kernel super pages

Total pages on system:          12556700
Total pages included in dump:   1503344

Dump compressed:  ON
Dump Parallel:    ON

DEVICE        OFFSET(kB)    SIZE (kB)     LOGICAL VOL.  NAME
------------  ------------  ------------  ------------  -------------------------
1:0x000005         2349920       4194304  64:0x000002   /dev/vg00/lvol2
                            ------------
                                 4194304

# getconf PAGESIZE
4096
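The page counts reported by crashconf can be turned into an uncompressed size estimate by multiplying by the page size. A minimal sketch follows; the function name dump_kb is hypothetical, and the estimate ignores dump headers and any compression savings.

```shell
# dump_kb PAGES PAGESIZE_BYTES
# Prints the uncompressed size, in kB, of the given number of pages.
# Hypothetical helper for illustration only.
dump_kb() {
    echo $(( $1 * $2 / 1024 ))
}

# "Total pages included in dump" and "getconf PAGESIZE" from the output above
dump_kb 1503344 4096    # prints 6013376
```

In this example the selected pages amount to 6013376 kB uncompressed, which is more than the 4194304 kB dump device configured, so saving a complete image here would presumably depend on compression reducing it.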
HP-UX Pseudoswap
Pseudoswap allows the kernel to treat a portion of physical memory as if it were swap space in order to satisfy the swap reservation policy. Pseudoswap is enabled by default in all current versions of HP-UX, and its kernel parameter (swapmem_on) was removed in 11i v3.

Question: I have 2 GB of swap and 8 GB of available memory. Can I start a 4 GB process on an idle server?

With Pseudoswap (swapmem_on=1): Yes!
  2 GB Device Swap
+ 6 GB Pseudo Swap (75% of 8 GB)
= 8 GB Reservable Swap

Without Pseudoswap (swapmem_on=0): No!
  2 GB Device Swap
+ 0 GB Pseudo Swap
= 2 GB Reservable Swap
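The reservation arithmetic from this example can be written as a small sketch. The function names are hypothetical, and the 75% factor is simply the figure quoted on this slide.

```shell
# reservable_gb DEVICE_SWAP_GB RAM_GB PSEUDOSWAP
# PSEUDOSWAP is 1 (enabled) or 0 (disabled). Prints reservable swap in GB,
# counting 75% of RAM as pseudoswap when it is enabled.
# Hypothetical helper for illustration only.
reservable_gb() {
    device=$1
    ram=$2
    pseudo=$3
    if [ "$pseudo" -eq 1 ]; then
        echo $(( device + ram * 75 / 100 ))
    else
        echo "$device"
    fi
}

# can_start PROCESS_GB RESERVABLE_GB: succeeds if the process can reserve its swap
can_start() {
    [ "$1" -le "$2" ]
}

reservable_gb 2 8 1    # prints 8
reservable_gb 2 8 0    # prints 2
```

A 4 GB process passes the check against 8 GB of reservable swap and fails it against 2 GB, matching the Yes/No answers above.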
Example of an Application Swap Requirements
• Please see SAP note 1112627 for a detailed explanation of swap sizing and pseudo-swap.
• In general, device swap configurations of 1.5 or 2 x RAM have proven appropriate for the majority of SAP installations. The recommendation is to set device swap to 2 x RAM (minimum 20 GB).
• Please refer to SAP note 153641 for a detailed explanation of swap requirements on a per SAP instance basis.
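The quoted SAP recommendation (2 x RAM with a 20 GB floor) reduces to a one-line rule. The function name below is hypothetical, and the sketch covers only the general rule, not the per-instance requirements in the SAP notes.

```shell
# sap_device_swap_gb RAM_GB
# Prints the recommended device swap in GB: 2 x RAM, but never below 20 GB.
# Hypothetical helper for illustration only.
sap_device_swap_gb() {
    swap=$(( $1 * 2 ))
    if [ "$swap" -lt 20 ]; then
        swap=20
    fi
    echo "$swap"
}

sap_device_swap_gb 8     # prints 20 (the minimum applies)
sap_device_swap_gb 32    # prints 64
```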
Basics of Crash Dumps
Bad Example of Swap Design
# /usr/sbin/swapinfo -tm
              Mb       Mb       Mb   PCT   START/       Mb
TYPE       AVAIL     USED     FREE  USED    LIMIT  RESERVE  PRI  NAME
dev        30464        0    30464    0%        0        -    1  /dev/vg00/lvol2
dev        30464        0    30464    0%        0        -    1  /dev/vg00/swap1
dev        30464        0    30464    0%        0        -    1  /dev/vg00/swap2
dev        30464        0    30464    0%        0        -    1  /dev/vg00/swap3
reserve        -    46202   -46202
memory     98292     2278    96014    2%
total     220148    48480   171668   22%        -        0    -
HP-UX Maximum Swap
• Swap space in the kernel is managed using 'chunks' of physical device space. These chunks contain one or more (usually more) pages of memory, but provide another layer of indexing (similar to inodes in file systems) to keep the global swap table relatively small, as opposed to a large table indexed by swap page.
• swchunk controls the size in physical disk blocks (which are defined as 1 KB) for each chunk.
Maximum Swap on HP-UX Before 11i V3
• The total bytes of swap space manageable by the system on older HP-UX 11i releases is:
swchunk x 1KB x 16384
where 16384 is the system maximum number of swap chunks in the swap table, as defined by kernel parameter maxswapchunks.
swchunk has allowed values between 2048 and 65536 blocks.
Maximum Swap on HP-UX 11i V3
• The total bytes of swap space manageable by the system on HP-UX 11i v3 is:
swchunk x 1KB x 2147483648
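The two maximum-swap formulas differ only in the number of chunks, so one sketch covers both. The function name max_swap_gb is hypothetical, and the sketch assumes a shell with 64-bit integer arithmetic (the 11i v3 product does not fit in 32 bits).

```shell
# max_swap_gb SWCHUNK MAX_CHUNKS
# SWCHUNK is in 1 KB blocks; MAX_CHUNKS is 16384 before 11i v3 and
# 2147483648 on 11i v3. Prints the manageable swap space in GB.
# Hypothetical helper for illustration only; needs 64-bit shell arithmetic.
max_swap_gb() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

max_swap_gb 2048 16384         # pre-11i v3, smallest swchunk: prints 32
max_swap_gb 65536 16384        # pre-11i v3, largest swchunk: prints 1024
max_swap_gb 2048 2147483648    # 11i v3, swchunk of 2048: prints 4194304
```

Before 11i v3, a swchunk of 2048 therefore caps manageable swap at 32 GB, and the largest allowed swchunk of 65536 raises the cap to 1024 GB.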
Dump Terms
• Dump unit: A thread of execution during dump. A dump unit requires its own set of CPUs, dump devices, and other resources, which are non-overlapping with other dump units.
• Reentrancy: Capability of a dump driver to issue multiple I/Os simultaneously, one I/O per HBA port, during dump.
• Concurrency: Capability of a dump driver to issue multiple I/Os simultaneously per HBA port, during dump. In HP-UX 11i v3 this capability means that the driver can issue I/Os simultaneously to multiple devices under a given HBA port, one I/O per device.
• Parallel Dump: HP-UX 11i v3 dump infrastructure which enables the parallelism features.
• Reentrant HBA port or device: An HBA port or device controlled by a reentrant driver.
• Concurrent HBA port or device: An HBA port or device controlled by a concurrent driver.
Dump Unit - Part 1 *
• A Dump Unit is an independent sequential unit of execution within the dump process.
• Each dump unit is assigned an exclusive subset of the system resources needed to perform the dump, including CPUs, a portion of the physical memory to be dumped, and a subset of the configured dump devices. The dump infrastructure in HP-UX 11i v3 automatically partitions system resources at dump time into dump units.
• Each dump unit operates sequentially.
• Parallelism is achieved by multiple dump units executing in parallel.
Dump Unit - Part 2 *
• A dump device cannot be shared across multiple dump units.
• Multiple “reentrant devices” can be accessed in parallel only if the devices are configured through separate HBA ports. Thus all “reentrant devices” on the same HBA port will be assigned to a single dump unit.
• Each “concurrent device” can be accessed in parallel. Each can therefore be assigned to a separate dump unit, even if configured through a single HBA port.
• Multiple dump volumes on a single physical volume will not allow for parallelism. Parallelism at dump time can only be achieved across multiple physical devices (LUNs).
• Logical volumes configured as dump devices: all logical volumes which reside on the same physical device (LUN) are assigned to the same dump unit.
Dump Options Overview
− Selective
  • Based on classes/uses of memory
− Compressed
  • >=5 CPUs per dump unit
  • Mixed compressed/non-compressed images
− Parallel (concurrent)
  • Faster dump with multiple “monarchs”
  • Influenced by memory availability and dump devices
  • HP Integrity Servers only
− Live dump
  • Crashdump a live system without forced shutdown or panic
  • System stays up, running & stable
  • Offline analysis of system
  • Memory image -> file
    − Extra load during this save
  • HP Integrity Servers only
Dump Parallelism
I/O support during dump is provided via dump drivers, and each configured dump driver reports its parallelism capabilities to the dump infrastructure:
  Legacy: new parallelism feature is not supported
  Reentrant: supports parallelism per HBA port
  Concurrent: supports parallelism per dump device
These requirements can be distilled into the following formulas for calculating the number of dump units that can be achieved:
• CPU Parallelism = (number of CPUs available at dump time) / (1 or 5, depending on whether or not compression is enabled)
• Device Parallelism = (number of reentrant dump HBA ports) + (number of concurrent dump devices) + (1 if there are any legacy dump devices)
• Number of Dump Units = Minimum (CPU Parallelism, Device Parallelism)
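The three formulas above can be combined into a short sketch. The function name dump_units and its argument order are hypothetical, and rounding the CPU division down to a whole number is an assumption; the slide does not say how fractions are handled.

```shell
# dump_units CPUS COMPRESSED REENTRANT_PORTS CONCURRENT_DEVS LEGACY_DEVS
# COMPRESSED is 1 if dump compression is enabled, otherwise 0.
# Prints the number of dump units achievable under the formulas above.
# Hypothetical helper for illustration only.
dump_units() {
    cpus=$1
    compressed=$2
    ports=$3
    concurrent=$4
    legacy=$5

    # CPU parallelism: 5 CPUs per unit with compression, 1 without
    if [ "$compressed" -eq 1 ]; then
        per_unit=5
    else
        per_unit=1
    fi
    cpu_par=$(( cpus / per_unit ))    # assumption: fractions round down

    # Device parallelism: all legacy devices together count once
    dev_par=$(( ports + concurrent ))
    if [ "$legacy" -gt 0 ]; then
        dev_par=$(( dev_par + 1 ))
    fi

    if [ "$cpu_par" -lt "$dev_par" ]; then
        echo "$cpu_par"
    else
        echo "$dev_par"
    fi
}

dump_units 8 1 2 2 0     # compressed, CPU-bound: prints 1
dump_units 8 0 2 2 0     # uncompressed, device-bound: prints 4
dump_units 16 1 0 2 3    # legacy devices count once: prints 3
```

The first two calls show why compression can reduce parallelism: the same 8 CPUs and 4 device paths give 4 dump units uncompressed but only 1 compressed.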
Dump Driver Parallelism Capability
Examples of HP-provided dump drivers on HP-UX 11.31:
  Concurrent: fcd
  Reentrant:  td, mpt, c8xx, ciss, sasd, fclp

# crashconf -l
DEVICE        LOGICAL VOL.     NAME                LUNPATH HANDLE *
------------  ---------------  ------------------  ----------------------
1:0x000002    64:0x000002      /dev/vg00/lvol2     40/0/2/0/0/0/0/4/0/0/0.0x247000c0ffdb3fb9.0x4001000000000000

# ioscan -fNk | grep "40/0/2/0/0/0/0/4/0/0/0 "
fc  0  40/0/2/0/0/0/0/4/0/0/0  fclp  CLAIMED  INTERFACE  HP AD22260001 PCIe Fibre Channel 2-port 4Gb FC/2-port 1000B-T Combo Adapter
Dump Driver Capability
# scsimgr get_attr -a capability -H 40/0/2/0/0/0/0/4/0/0/0
SCSI ATTRIBUTES FOR CONTROLLER : 40/0/2/0/0/0/0/4/0/0/0
name = capability
current = "Boot Dump"
default =
saved =
Uncompressed vs. Compressed Dump – One Dump Device *
Uncompressed vs. Compressed Dump – Three Dump Devices *
Uncompressed vs. Compressed Dump – Legacy Devices *
Uncompressed Dump – Reentrant Devices *
Uncompressed vs. Compressed Dump – Complex Example *
Compressed Dump Configuration
# crashconf -v
Crash dump configuration has been changed since boot.

CLASS     PAGES     INCLUDED IN DUMP  DESCRIPTION
--------  --------  ----------------  -------------------------
UNUSED    1514754   no,  by default   unused pages
USERPG    112614    no,  by default   user process pages
BCACHE    26235     no,  by default   buffer cache pages
KCODE     10389     yes, forced       kernel code pages
USTACK    1136      yes, by default   user process stacks
FSDATA    40        no,  forced       file system metadata
KDDATA    386358    yes, by default   kernel dynamic data
KSDATA    6933      yes, by default   kernel static data
SUPERPG   21546     no,  by default   unused kernel super pages

Total pages on system:          2080005
Total pages included in dump:   404816

Dump compressed:  ON
Dump Parallel:    ON

DEVICE        OFFSET(kB)    SIZE (kB)     LOGICAL VOL.  NAME
------------  ------------  ------------  ------------  -------------------------
3:0x000000         2349920       8388608  64:0x000002   /dev/vg00/lvol2
3:0x000000        30677856        114688  64:0x000009   /dev/vg00/v3Dump
                            ------------
                                 8503296

Dump device configuration mode is config_deprecated_mode.
Use crashconf -s option to change the mode.

# crashconf -c off     (to turn compression off until reboot)
# crashconf -c on      (to turn compression on until reboot)

# kctune dump_compress_on
Tunable           Value  Expression  Changes
dump_compress_on      1  Default     Immed

# crashconf -tc off    (to change tunable to 0)
# kctune dump_compress_on=0
# crashconf -tc on     (to set tunable to 1)
# kctune dump_compress_on=1
Compressed Dump Algorithm
• HP-UX uses one processor to do all disk writes and four
processors for compression.
• The algorithm for compression is Lempel–Ziv–Welch
(LZW).
• LZW is a universal lossless data compression algorithm,
simple to implement, and has the potential for very high
throughput in hardware implementations.
• Reasons for selecting LZW include:
  HP has a license to use it, and
  it achieves a pretty good compression ratio.
Concurrent Dump Configuration
# crashconf -v
Crash dump configuration has been changed since boot.

CLASS     PAGES     INCLUDED IN DUMP  DESCRIPTION
--------  --------  ----------------  -------------------------
UNUSED    1514754   no,  by default   unused pages
USERPG    112614    no,  by default   user process pages
BCACHE    26235     no,  by default   buffer cache pages
KCODE     10389     yes, forced       kernel code pages
USTACK    1136      yes, by default   user process stacks
FSDATA    40        no,  forced       file system metadata
KDDATA    386358    yes, by default   kernel dynamic data
KSDATA    6933      yes, by default   kernel static data
SUPERPG   21546     no,  by default   unused kernel super pages

Total pages on system:          2080005
Total pages included in dump:   404816

Dump compressed:  ON
Dump Parallel:    ON

DEVICE        OFFSET(kB)    SIZE (kB)     LOGICAL VOL.  NAME
------------  ------------  ------------  ------------  -------------------------
3:0x000000         2349920       8388608  64:0x000002   /dev/vg00/lvol2
3:0x000000        30677856        114688  64:0x000009   /dev/vg00/v3Dump
                            ------------
                                 8503296

Dump device configuration mode is config_deprecated_mode.
Use crashconf -s option to change the mode.

# crashconf -p off     (to turn concurrent dump off until reboot)
# crashconf -p on      (to turn concurrent dump on until reboot)

# kctune dump_concurrent_on
Tunable             Value  Expression  Changes
dump_concurrent_on      1  1           Immed

# crashconf -tp off    (to change tunable to 0)
# kctune dump_concurrent_on=0
# crashconf -tp on     (to set tunable to 1)
# kctune dump_concurrent_on=1
HP-UX Kernel Parameters alwaysdump and dontdump
On rare occasions, the system may panic before crashconf(1M) is run during the boot process. On those occasions, the configuration can be set using the alwaysdump and dontdump tunables.
# kctune -v -q alwaysdump
Tunable             alwaysdump
Description         Bitmap of memory page classes to include in a crash dump
Module              dump
Current Value       0 [Default]
Value at Next Boot  1024
Value at Last Boot  0
Default Value       0
Can Change          Immediately or at Next Boot
HP-UX Typical Crash Dump Configuration
# crashconf -v
Crash dump configuration has been changed since boot.

CLASS     PAGES     INCLUDED IN DUMP  DESCRIPTION
--------  --------  ----------------  -------------------------
UNUSED    9571877   no,  by default   unused pages
USERPG    1340875   no,  by default   user process pages
BCACHE    2309      no,  by default   buffer cache pages
KCODE     9142      no,  by default   kernel code pages
USTACK    1567      yes, by default   user process stacks
FSDATA    12        yes, by default   file system metadata
KDDATA    1493845   yes, by default   kernel dynamic data
KSDATA    8816      yes, by default   kernel static data
SUPERPG   128257    no,  by default   unused kernel super pages

Total pages on system:          12556700
Total pages included in dump:   1504240

Dump compressed:  ON
Dump Parallel:    ON

DEVICE        OFFSET(kB)    SIZE (kB)     LOGICAL VOL.  NAME
------------  ------------  ------------  ------------  -------------------------
1:0x000005         2349920       4194304  64:0x000002   /dev/vg00/lvol2
                            ------------
                                 4194304

Dump device configuration mode is config_deprecated_mode. Use crashconf -s option to change the mode.
HP-UX Savecrash Locking
Dump devices are often used as paging devices (primary swap is one
such example). If savecrash determines that a dump device is already
enabled for paging, and that paging activity has already taken place on
that device, a warning message will indicate that the dump may be
invalid.
If a dump device has not already been enabled for paging, savecrash
prevents paging from being enabled to the device by creating the file
/var/adm/crash/.savecrash.LCK.
swapon does not enable the device for paging if the device is locked in
/var/adm/crash/.savecrash.LCK. As savecrash finishes saving
the image from each dump device, it updates the
/var/adm/crash/.savecrash.LCK file and optionally executes
swapon to enable paging on the device.
HP-UX Dump Device in Non-Root VGs
• As of HP-UX 11.00 we have the possibility to configure additional dump devices online (without the need for a reboot). These dump LVs must not be configured using lvlnboot -d but with crashconf(1M).
• We are no longer restricted to choosing a dump LV from the root VG only. The configuration of such dump devices is similar to the configuration of secondary swap devices.
Example of Classical Swap/Dump Design on HP-UX

[Diagram: RAID-1 for Boot disk]
Primary PV:    /stand | /stand | Primary swap/dump | Other LVs
Alternate PV:  /stand | /stand | Primary swap/Dump mirror | Other LVs

32 GB RAM
Swap = 1 or 2 x RAM
Swap/dump shared

Potential Issues
• If there is a shortage of RAM, boot disks experience severe I/O performance problems due to swap usage.
• If more RAM is added, it is not easy to resize primary swap (contiguous blocks).
• Long reboot due to savecrash(1M) export to /var/adm/crash.
• More swap is added in other VGs, often different in size than the primary.
• Waste of a large amount of disk space for swap.
Example of Different Swap/Dump Design on HP-UX with Internal Boot Disks *

[Diagram]
Primary PV:             /stand | Primary swap | Other LVs
Alternate PV:           /stand | Primary swap mirror | Other LVs
SAN-based LUNs or LVs:  Secondary swap | Secondary swap | Dump area | Dump area

32 GB RAM
Primary Swap = 4-8 GB
Total Swap = 1 x RAM **
RAID-1 for Boot disks
Swap NOT shared with dump
Dump areas set up on different LUNs or PVs in non-root VGs (dump PVs are NEVER RAID-1)
Example of Different Swap/Dump Design on HP-UX with SAN Boot Disk *

[Diagram]
Boot PV:                /stand | Primary swap | Other LVs
SAN-based LUNs or LVs:  Secondary swap | Secondary swap | Dump area | Dump area

32 GB RAM
Primary Swap = 4-8 GB
Total Swap = 1 x RAM **
Swap NOT shared with dump
Dump areas set up on different LUNs or PVs in non-root VGs (dump PVs are NEVER RAID-1)
HP-UX Persistent Dump Devices – Part 1
• Persistent Dump Devices are those that are configured automatically after a reboot. Persistent dump device information is maintained in the kernel registry services (KRS, see krs(5)).
• To mark the dump devices as persistent, there are two configuration modes available.

config_crashconf_mode
In this mode crashconf(1M) and crashconf(2) are the only mechanisms available to mark dump devices as persistent. Logical volumes marked for dump using lvlnboot(1M) or vxvmboot(1M) and devices marked in /stand/system for dump will be ignored during boot-up. This is the preferred method for dump device configuration and will be used from this HP-UX release onwards. This mode can be enabled using the crashconf -s option. VxVM stores extent information of persistent dump logical volumes in lif(4). Up to ten VxVM logical volumes can be marked persistent. The logical volumes which are not part of the root volume group cannot be configured as persistent dump devices.
HP-UX Persistent Dump Devices – Part 2
config_deprecated_mode
The logical volumes marked for dump using lvlnboot(1M) or
vxvmboot(1M) and devices marked in /stand/system for dump
will be configured as dump devices during boot-up. Devices marked as
persistent, using crashconf -s, will be ignored during boot-up.
Marking devices using lvlnboot(1M), vxvmboot(1M), and
/stand/system will be obsolete in the next HP-UX release. This
mode is deprecated on HP-UX 11.31 and will be obsolete in the next
HP-UX release. This is the default mode for dump and can be enabled
using the crashconf -o option.
HP-UX Dump Devices and Bad Block Relocation
• From HP-UX 11.23 release onwards, the LVM bad block relocation feature is obsolete. However, for compatibility reasons the value is maintained as a logical volume attribute.
• If BBRA is not disabled when the dump device is created, HP-UX complains about “unsupported disk layout”.
• Hence, the correct procedure to create a dump device in LVM is:
# lvcreate -C y -r n -L 16000 -n dump2 /dev/vgdump
HP-UX Crashconf Fails with Unsupported Disk Layout Error - VxVM
The volume dumpvol was added to the /etc/fstab file and crashconf was issued to increase the total dump area, but crashconf failed with the message below:
/dev/vx/dsk/rootdg/dumpvol: error: unsupported disk layout
The crashconf error is due to the dump area not being contiguous:
# vxprint -g rootdg -ht
v  dumpvol        -           ENABLED     ACTIVE   204800  SELECT  -       swap
pl dumpvol-01     dumpvol     ENABLED     ACTIVE   204800  CONCAT  -       RW
sd rootdisk01-07  dumpvol-01  rootdisk01  1081344  102400  0       c1t4d0  ENA
sd rootdisk01-17  dumpvol-01  rootdisk01  5702418  102400  102400  c1t4d0  ENA
The dumpvol volume has two areas on c1t4d0. The first is rootdisk01-07, which starts at 1081344 and is 102400 KB in size, and the second is rootdisk01-17, which starts at 5702418 and is also 102400 KB in size. The volume dumpvol needs to be contiguous, so the last 102400 KB should be removed from dumpvol. To reduce dumpvol:
# vxassist shrinkby dumpvol 102400
HP-UX Crashconf Fails with Unsupported Disk Layout Error - LVM
/dev/vg01/lvswap: error: unsupported disk layout
# lvdisplay /dev/vg01/lvswap
....
Bad block                   on
Allocation                  strict
A dump device is required to be contiguous and to have bad block relocation turned off:
# lvchange -C y -r n /dev/vg01/lvswap
HP-UX VxVM Dump Device Creation* – Part 1
With Volume Manager 5.0 on HP-UX 11.31, the disk must be initialized with the vxdisksetup -ifB <disk> command; vxdiskadm is unable to initialize the disk correctly for use with crashconf. Please note that CDS diskgroups are not affected. Those can still be initialized via vxdiskadm.
# vxdisk list
DEVICE    TYPE         DISK        GROUP   STATUS
c2t0d0s2  auto:none    -           -       online invalid
c2t1d0s2  auto:hpdisk  rootdisk01  rootdg  online
# vxdisk -f init c2t0d0s2 format=hpdisk
# vxdg init dumpdg c2t0d0s2 cds=off
# vxassist -g dumpdg -U swap make dumpvol 3g
HP-UX VxVM Dump Device Creation – Part 2
# crashconf -s /dev/vx/dsk/dumpdg/dumpvol
# crashconf -v
Crash dump configuration has been changed since boot.

CLASS     PAGES     INCLUDED IN DUMP  DESCRIPTION
--------  --------  ----------------  -------------------------
UNUSED    10197     no,  by default   unused pages
USERPG    115131    no,  by default   user process pages
BCACHE    14359     no,  by default   buffer cache pages
KCODE     10819     no,  by default   kernel code pages
USTACK    890       yes, by default   user process stacks
FSDATA    26        yes, by default   file system metadata
KDDATA    100591    yes, by default   kernel dynamic data
KSDATA    7238      yes, by default   kernel static data
SUPERPG   1100      no,  by default   unused kernel super pages

Total pages on system:          260351
Total pages included in dump:   108745

Dump compressed:  ON
Dump Parallel:    ON

DEVICE        OFFSET(kB)    SIZE (kB)     LOGICAL VOL.  NAME
------------  ------------  ------------  ------------  -------------------------
3:0x000001         2350176       2097152  4:0x000001    /dev/vx/dsk/rootdg/swapvol
3:0x000000          544896       3145728  4:0x414ad8    /dev/vx/dsk/dumpdg/dumpvol
                            ------------
                                 5242880
HP-UX Better Swap and Dump Design – Part 1
• Set up primary swap between 4 and 8 GB ONLY, no matter how large the RAM is!
• Primary swap device should NOT be shared with dump.
• Initially, set up primary swap only. In pre-production testing, verify whether that is enough and avoid creating other swap areas unless absolutely necessary.
• Secondary swaps (if you need to have them!) are created as 4-8 GB LUNs (could be LVs in LVM or Plexes in VxVM) on SAN (if practicable). Ensure that secondary swaps match the size of primary swap. That way, if the server ever needs to use swap, the performance of swap devices will be excellent and boot disk I/O will never “suffer”.
• If primary swap is left at 4-8 GB, then allocate separate dump areas in other volume groups to match the size of physical memory if compression is disabled or not possible (due to lack of available CPUs), or less if compression is enabled and possible.
HP-UX Better Swap and Dump Design – Part 2
• Disable savecrash(1M) at boot (/etc/rc.config.d/savecrash):
SAVECRASH=0
If you do so, do not forget to run savecrash(1M) after the reboot.
• A dedicated dump device will not shorten the time required to write from memory to the dump volume during the crash, but it will shorten the reboot time. This is because the crash image is not at risk of being overwritten by paging or swap activity, and savecrash(1M) can run in the background to save the crash files into the crash dump directory.
• If the dump device is also configured as one of the swap devices, the device cannot be enabled for paging until savecrash(1M) has finished saving the image from the device to the crash dump directory. Therefore, the boot time will be longer if savecrash is run in the foreground. This extra time will be even greater if vPars are configured because multiple dump images may have to be saved.
HP-UX Better Swap and Dump Design – Part 3
• When dump and swap areas are separated, there is no need to save the crash images at boot time. Therefore, savecrash(1M) at (re)boot can be disabled!
• The reduction in reboot time achieved by configuring a separate dump device (close to 50% over classical design with savecrash running in foreground) is likely to provide a worthwhile return on investment when system availability is a priority.
• Using identical sizes and types of dump devices and HBAs in the dump configuration is one way to avoid inequalities in dump speeds or times across the dump units. This tends to produce more predictable results and better overall parallelism.
HP-UX Better Swap and Dump Design – Part 4
• It is recommended that shared swap and dump devices or volumes not be used with parallel dump. Using a shared swap/dump device can significantly increase the subsequent reboot time because such devices result in swap being disabled while saving the corresponding dump data (for example, in /var/adm/crash).
• Avoid file system swap altogether if possible.
• Set the priorities of SAN-based secondary swaps to a lower value than the primary swap (and use an identical value across all secondary swaps). That way, if there is a serious shortage of RAM, swap will perform as a “perfectly striped” volume.
HP-UX Better Swap and Dump Design – Part 5
• If compressed dumps are required, ensure that there are five CPUs for each dump unit.
• Set up multiple dump units on SAN (non-root volume groups), and enable parallel dumps. Note that, currently, the logical volumes which are not part of the root volume group in LVM cannot be configured as persistent dump devices. * However, a non-root disk group with VxVM can be used for persistent dump devices. **
HP-UX Better Swap and Dump Design – Part 6
• For a kernel dump, the usual requirement is:
  Kernel text/static data
  Kernel dynamic data in use
  User-space kernel thread stacks (UAREA)
Kernel dynamic memory that is free-and-cached (Super Page Pool) is only needed when there is a problem in the SPP itself (pretty rare).
User data is very rarely needed. In addition, most users do not want HP support reading their application private data for security reasons (classified data, customer sensitive, and so on). The default configuration for crashconf is good enough for most situations.
• If enough disk space is available and no other constraints are imposed, you might enable all page classes in dumps (check crashconf(1M)).
Guidelines for Selecting Device Swap
• Two swap areas on different disks are better than one single swap area
• Only configure one swap area per disk
• Device swap areas should be of similar size
• Consider the speed of the disks
[Figure: Swap LV layouts labelled “No!” and “Yes!”]
HP-UX Post-crash Manual Dump Export
• If the dump was not saved completely due to lack of space in the crash directory, you can save the dump again. The -r option (resave) needs to be included when this is not the first time that savecrash runs.
# savecrash -v [-r] <crash directory>
• It is also possible to save the dump directly to a tape:
# savecrash -v [-r] -t <tapedevice>
HP-UX Manual Dump Export from a
Specific Dump Device
To manually extract the dump, type either the persistent DSF or the legacy DSF of the whole disk along with the offset:

DEVICE        OFFSET(kB)    SIZE (kB)     LOGICAL VOL.  NAME
------------  ------------  ------------  ------------  -----------------
3:0x000000         2612064       8364032  64:0x000002   /dev/vg00/lvol2
3:0x000001        18168692         40956  64:0x02000b   /dev/vg01/dump_3
3:0x000001        18127732         40956  64:0x02000a   /dev/vg01/dump_2
3:0x000001        18086772         40956  64:0x020009   /dev/vg01/dump_1

# savecrash -D /dev/rdisk/disk4 -O 18086772 -r -v .
or
# savecrash -D /dev/rdsk/c2t1d0 -O 18086772 -r -v .
Swapoff
• Available with HP-UX 11.31.
• The swapoff(1M) command disables swapping on the specified swap device(s) for the current boot. The term swap refers to an obsolete implementation of virtual memory; HP-UX actually implements virtual memory by way of paging rather than swapping. This command and others retain names derived from swap for historical reasons.
• Does not remove the swap device from /etc/fstab.
• Will not succeed if the swap space is still needed, for example to cover the reserve space reported by swapinfo(1M).
• Example:
# /usr/sbin/swapoff /dev/vg00/lvol2
Swapoff – Real Life Example – Part 1
• Remove primary swap and move it into another volume group. To remove the primary swap, we need to ensure that the new swap device has at least as much space as “reserve” requires. Otherwise, the swapoff(1M) command will fail!
# lvcreate -C y -r n -L 8192 -n lvswap2 /dev/vgswap
# swapon -f /dev/vgswap/lvswap2
# swapinfo -tm
              Mb       Mb       Mb   PCT   START/       Mb
TYPE       AVAIL     USED     FREE  USED    LIMIT  RESERVE  PRI  NAME
dev         8192        0     8192    0%        0        -    1  /dev/vg00/lvol2
dev         8192        0     8192    0%        0        -    1  /dev/vgswap/lvswap2
reserve        -     1301    -1301
memory      3876      963     2913   25%
total      20260     2264    17996   11%        -        0    -
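The precondition described above can be expressed as a simple comparison before attempting swapoff(1M). The function name can_swapoff is hypothetical, and the check mirrors only the rule stated on this slide (remaining device swap must cover the reserve); the kernel's actual acceptance criteria may be stricter.

```shell
# can_swapoff REMAINING_DEVICE_SWAP_MB RESERVE_MB
# Succeeds if the device swap left after swapoff still covers the
# "reserve" value reported by swapinfo -tm.
# Hypothetical helper for illustration only.
can_swapoff() {
    [ "$1" -ge "$2" ]
}

# Values from the swapinfo output above: lvswap2 is 8192 MB, reserve is 1301 MB
if can_swapoff 8192 1301; then
    echo "swapoff of the old primary swap should be possible"
else
    echo "not enough remaining swap to cover the reserve"
fi
```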
Swapoff – Real Life Example – Part 2
• Remove primary swap on-line:
# swapoff /dev/vg00/lvol2
# lvrmboot -s vg00
# swapinfo -tm
              Mb       Mb       Mb   PCT   START/       Mb
TYPE       AVAIL     USED     FREE  USED    LIMIT  RESERVE  PRI  NAME
dev         8192        0     8192    0%        0        -    1  /dev/vgswap/lvswap2
reserve        -     1291    -1291
memory      3876      963     2913   25%
total      12068     2254     9814   19%        -        0    -
Swapoff – Real Life Example – Part 3
• Add a line to /etc/fstab for the new primary swap and reboot the server:
/dev/vg00/lvol3 / vxfs delaylog 0 1
/dev/vg00/lvol1 /stand vxfs tranflush 0 1
/dev/vg00/lvol4 /home vxfs delaylog 0 2
/dev/vg00/lvol5 /tmp vxfs delaylog 0 2
/dev/vg00/lvol6 /usr vxfs delaylog 0 2
/dev/vg00/lvol7 /var vxfs delaylog 0 2
/dev/vg00/lvol8 /var/tmp vxfs delaylog 0 2
#/dev/vg00/lvdump3 / dump defaults 0 0
/dev/vgswap/lvswap2 / swap defaults 0 0
Swapoff – Real Life Example – Part 4
• After the reboot, check swap status and confirm that the non-root volume is now the primary swap:
# swapinfo -tm
              Mb       Mb       Mb   PCT   START/       Mb
TYPE       AVAIL     USED     FREE  USED    LIMIT  RESERVE  PRI  NAME
dev         8192        0     8192    0%        0        -    1  /dev/vgswap/lvswap2
reserve        -     1283    -1283
memory      3876      950     2926   25%
total      12068     2233     9835   19%        -        0    -
Swapoff – Real Life Example – Part 5
• However, because we did not initialize the disk in vgswap with the “-B” option, it does not contain the Boot Area, and cannot be added with “lvlnboot -s /dev/vgswap/lvswap2”. As a result, this is reported:
# lvlnboot -v
Boot Definitions for Volume Group /dev/vg00:
Physical Volumes belonging in Root Volume Group:
        /dev/disk/disk6_p2 -- Boot Disk
Boot: lvol1     on:     /dev/disk/disk6_p2
Root: lvol3     on:     /dev/disk/disk6_p2
No Swap Logical Volume configured
No Dump Logical Volume configured
Swapoff – Real Life Example – Part 6
• We still have one persistent dump device, which is NOT listed in /etc/fstab*:
# crashconf -v
Crash dump configuration has been changed since boot.

CLASS     PAGES     INCLUDED IN DUMP  DESCRIPTION
--------  --------  ----------------  -------------------------
UNUSED    584997    no,  by default   unused pages
USERPG    171077    no,  by default   user process pages
BCACHE    7529      no,  by default   buffer cache pages
KCODE     11892     no,  by default   kernel code pages
USTACK    1128      yes, by default   user process stacks
FSDATA    16        yes, by default   file system metadata
KDDATA    238003    yes, by default   kernel dynamic data
KSDATA    10563     yes, by default   kernel static data
SUPERPG   18286     no,  by default   unused kernel super pages

Total pages on system:          1043491
Total pages included in dump:   249710

Dump compressed:  ON
Dump Parallel:    ON

DEVICE        OFFSET(kB)    SIZE (kB)     LOGICAL VOL.  NAME
------------  ------------  ------------  ------------  -------------------------
1:0x000002        57023328       4096000  64:0x000009   /dev/vg00/lvdump3
                            ------------
                                 4096000

Persistent dump device list:
/dev/vg00/lvdump3
Crash Dump – Two Dump Unit Example –
Part 1
# crashconf -v
Crash dump configuration has been changed since boot.
CLASS     PAGES       INCLUDED IN DUMP  DESCRIPTION
--------  ----------  ----------------  -------------------------------------
UNUSED       4264207  no,  by default   unused pages
USERPG        185052  no,  by default   user process pages
BCACHE         45250  no,  by default   buffer cache pages
KCODE          11859  no,  by default   kernel code pages
USTACK          1271  yes, by default   user process stacks
FSDATA            16  yes, by default   file system metadata
KDDATA        581797  yes, by default   kernel dynamic data
KSDATA         10569  yes, by default   kernel static data
SUPERPG       107834  no,  by default   unused kernel super pages
Total pages on system:            5207855
Total pages included in dump:      593653
Dump compressed:    OFF
Dump Parallel:      ON
DEVICE       OFFSET(kB)   SIZE (kB)    LOGICAL VOL. NAME
------------ ------------ ------------ ------------ -------------------------
1:0x000004        2612064      8388608 64:0x000002  /dev/vgroot/lvol2
1:0x000003           2496      1048576 64:0x010001  /dev/vgdump/dump2
1:0x000003       16386496      1048576 64:0x010002  /dev/vgdump/dump3
                          ------------
                              10485760
Persistent dump device list:
/dev/vgroot/lvol2
Crash Dump – Two Dump Unit Example –
Part 2
*** A system crash has occurred. (See the above messages for details.)
*** The system is now preparing to dump physical memory to disk, for use
*** in debugging the crash.
*** The dump will be a SELECTIVE dump with
compression OFF and concurrency ON: 2320 of 20344 megabytes.
Primary Dump Header Location :
Device details:
Major number: 0x1f Minor number: 0xb0000
Offset: 16386496.
*** Dumping: 100% complete (2320 of 2320 MB)
time: 84 seconds, Number of Dump units: 2
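The megabyte figures on the console can be related back to the page counts that crashconf -v reports in Part 1 of this example. The sketch below assumes a 4 kB base page size (an assumption; the base page size can be tuned to a larger value) and uses integer arithmetic, so results are rounded down. The function name pages_to_mb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert a page count from crashconf into megabytes.
# $1 = number of pages
# $2 = page size in kB (optional; 4 is an assumed default base page size)
pages_to_mb() {
    pages=$1
    page_kb=${2:-4}
    echo $(( pages * page_kb / 1024 ))
}

# Page counts taken from the crashconf -v listing in Part 1:
pages_to_mb 593653     # pages included in dump  -> 2318
pages_to_mb 5207855    # total pages on system   -> 20343
```

Under that assumption the results (2318 MB and 20343 MB) are close to the "2320 of 20344 megabytes" shown on the console; the small difference is presumably rounding.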
Crash Dump Without Primary Swap, No
Persistent Devices, and No Dump Devices
in /etc/fstab
Console logs at boot time after a crash:
No crash dump devices defined.
Persistent dump device list is empty.
All subsequent crashes will fail to collect data into dump volumes:
Swap device table: (start & size given in 512-byte blocks)
entry 0 - auto-configured on root device; ignored - no room
WARNING: No swap device configured, so dump cannot be defaulted to
primary swap.
WARNING: No dump devices are configured. Dump is disabled.
Message buffer contents after system crash:
These messages are the contents of msgbuf, which should have been saved
in the dump. They are output to the console, as the dump was not taken.
How to Set the Dump Order for Saving
System Crash – Part 1
•
The current dump configuration first saves the crash to dump2, dump1, then to lvol2:
# crashconf
Crash dump configuration is changed after boot:
CLASS     PAGES      INCLUDED IN DUMP  DESCRIPTION
--------  ---------  ----------------  -------------------------------------
UNUSED       570458  no,  by default   unused pages
USERPG       136677  no,  by default   user process pages
BCACHE        10426  no,  by default   buffer cache pages
KCODE          7764  yes, forced       kernel code pages
USTACK         1172  yes, by default   user process stacks
FSDATA            8  yes, by default   file system metadata
KDDATA       192353  yes, by default   kernel dynamic data
KSDATA         3641  yes, by default   kernel static data
SUPERPG      120995  no,  by default   unused kernel super pages
Total pages on system:            4173976
Total pages included in dump:     1253843
Dump compressed:    ON
DEVICE       OFFSET(kB)  SIZE (kB)   LOGICAL VOL.  NAME
------------ ----------  ----------  ------------  -------------------------
31:0x021000      924532     4194300  64:0x000002   /dev/vg00/lvol2
31:0x021000    27843444     2097150  64:0x00000a   /dev/vg00/dump1
31:0x021000    27859828     1048575  64:0x00000b   /dev/vg00/dump2
                         ----------
                            7340025
How to Set the Dump Order for Saving
System Crash – Part 2
SOLUTION:
•
/etc/fstab does not list vg00/lvol2, because it is the default dump volume.
/dev/vg00/dump1 ... dump defaults 0 0
/dev/vg00/dump2 ... dump defaults 0 0
•
Edit the /etc/fstab file to set the new order of the dump LVs. The dump LVs are used in the opposite order to their
placement in the file, so vg00/lvol2 needs to be listed last to be used as the first dump lvol.
New listing of dump areas in /etc/fstab
-------------------------------------------------------------
/dev/vg00/dump2 ... dump defaults 0 0    # last dump area used
/dev/vg00/dump1 ... dump defaults 0 0    # second dump area used
/dev/vg00/lvol2 ... dump defaults 0 0    # first dump area used
•
Edit /etc/rc.config.d/crashconf:
CRASHCONF_ENABLED=1
CRASHCONF_READ_FSTAB=1
CRASHCONF_REPLACE=1
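The reversal rule above (dump LVs are used in the opposite order to their placement in /etc/fstab) can be checked before the configuration is applied. The following is a minimal sketch that only reads fstab-style text and prints the dump entries in the order this document says they would be used; the function name dump_order is hypothetical, and the sketch assumes the file system type is the third field, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: reads fstab-style text on stdin and prints the dump
# logical volumes in the expected order of use, i.e. the reverse of their
# order in the file (per the rule described in this document).
# Assumptions: field 3 is the type ("dump"); commented-out lines start with "#".
dump_order() {
    awk '$1 !~ /^#/ && $3 == "dump" { lv[++n] = $1 }
         END { for (i = n; i >= 1; i--) print lv[i] }'
}
```

For example, dump_order < /etc/fstab run against the new listing above would print /dev/vg00/lvol2, then /dev/vg00/dump1, then /dev/vg00/dump2.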
How to Set the Dump Order for Saving
System Crash – Part 3
•
Put the new dump configuration in place (when a crash is saved, the first dump area is lvol2, followed by dump1, then by
dump2):
# /sbin/rc1.d/S080crashconf start
•
Check the new configuration:
# crashconf
CLASS     PAGES      INCLUDED IN DUMP  DESCRIPTION
--------  ---------  ----------------  -------------------------------------
UNUSED       169224  no,  by default   unused pages
USERPG       500811  no,  by default   user process pages
BCACHE        10412  no,  by default   buffer cache pages
KCODE          7764  yes, forced       kernel code pages
USTACK         1218  yes, by default   user process stacks
FSDATA           20  yes, by default   file system metadata
KDDATA       241200  yes, by default   kernel dynamic data
KSDATA         3641  yes, by default   kernel static data
SUPERPG      109204  no,  by default   unused kernel super pages
Total pages on system:            4173976
Total pages included in dump:     1253843
Dump compressed:    ON
DEVICE       OFFSET(kB)  SIZE (kB)   LOGICAL VOL.  NAME
------------ ----------  ----------  ------------  -------------------------
31:0x021000    27859828     1048575  64:0x00000b   /dev/vg00/dump2
31:0x021000    27843444     2097150  64:0x00000a   /dev/vg00/dump1
31:0x021000      924532     4194300  64:0x000002   /dev/vg00/lvol2
                         ----------
Example of Distributed Swap Design
# /usr/sbin/swapinfo -t
             Kb       Kb        Kb   PCT  START/      Kb
TYPE      AVAIL     USED      FREE  USED   LIMIT RESERVE  PRI  NAME
dev     4194304        0   4194304    0%       0       -    1  /dev/vgroot/lvol2 (4096MB)
dev     4194304        0   4194304    0%       0       -    0  /dev/vgswap1/swap1 (4096MB)
dev     4194304        0   4194304    0%       0       -    0  /dev/vgswap2/swap2 (4096MB)
reserve       - 13417244 -13417244
memory 25135192  5363876  19771316   21%
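The listing above follows the earlier recommendation that device swap areas should be similar in size and spread over different volume groups. A minimal sketch of how that could be checked from swapinfo output is shown below; it only parses text on stdin, assumes the column layout shown above (TYPE in field 1, AVAIL in field 2), and reports sizes in whatever unit swapinfo printed. The function name swap_summary is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads swapinfo output on stdin and summarises the
# device swap areas (lines whose first field is "dev").
# Assumption: field 2 is the AVAIL column, as in the listing above.
swap_summary() {
    awk '$1 == "dev" {
             n++
             total += $2
             if (n == 1 || $2 < min) min = $2
             if (n == 1 || $2 > max) max = $2
         }
         END {
             if (n == 0) { print "no device swap found"; exit }
             s = (min == max) ? "equal" : "differ"
             printf "areas=%d total=%d sizes=%s\n", n, total, s
         }'
}
```

Fed with the three dev lines above (for example via /usr/sbin/swapinfo -t | swap_summary), it would print areas=3 total=12582912 sizes=equal.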
I am also very passionate about naming volume groups and
logical volumes in a meaningful manner. *
Paginglist Command
# /usr/sam/lbin/paginglist
/dev/vg00/lvol2|dev|4194304|4.0 GB|0|0.0 KB|4194304|4.0 GB|0%|0|-|1|no|now|
reserve|reserve|0|0.0 KB|2019848|1.9 GB|-2019848|-2019848.`KB||0||0|no|now|
total|total|4194304|4.0 GB|2019848|1.9 GB|2174456|2.1 GB|48%|0|0|0|no|now|
Patch Servers Regularly
Some of HP-UX 11.31 dump patches:
PHKL_41977: HANG OTHER crashconf(1M) hangs when trying to configure
more than 32 dump devices. The patch allows a logical volume to be configured
as primary swap and provides support for FCD and FCLP NPIV (N_Port ID
Virtualization) enablement.
PHKL_41257: HANG During MCA handling, the system hangs in the process
of generating a crashdump.
PHKL_39740: OTHER System fails to dump memory into dump devices.
PHKL_38628: PANIC
PHKL_38414: ABORT If the base page size in the kernel is configured greater than
4k, the dump may be aborted prematurely, which affects debugging of the crash.
Add Timestamps to RC scripts – Part 1
•
If there are RC startup problems, /etc/rc.log is usually the first place to
check. The output from the RC scripts can be found there, but rc.log has no
timestamp for each RC script.
•
One way to give rc.log a timestamp for each RC script is to put a date command into
each RC script, but this is not a good choice because there are so many files to
update. A better option is to change /sbin/rc.utils. The rc.utils script intercepts the
output of RC scripts and logs it to /etc/rc.log, so it can be made to log timestamps as
well.
•
Back up /sbin/rc.utils before making changes, and ensure the permissions stay unchanged:
# cp -p /sbin/rc.utils /sbin/rc.utils.bak
•
Edit /sbin/rc.utils, find the two lines echo >> $LOGFILE (one is in the routine
do_screen_mode, the other in do_line_mode), and for each one insert a new line:
date >> $LOGFILE
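The edit can also be prepared on a copy rather than by hand. The sketch below is a filter that reads a script on stdin and writes it to stdout with date >> $LOGFILE added after every line consisting only of echo >> $LOGFILE. Placing the new line after the echo is an assumption, chosen because it matches timestamps appearing ahead of each script description in rc.log; the function name add_timestamps is hypothetical, and the result should be reviewed before it replaces /sbin/rc.utils.

```shell
#!/bin/sh
# Hypothetical filter: copies stdin to stdout and adds a timestamp command
# after each bare "echo >> $LOGFILE" line.
# Assumption: the new line belongs AFTER the echo; the inserted line is not
# indented, which the shell does not require.
add_timestamps() {
    awk '{ print }
         /^[ \t]*echo[ \t]*>>[ \t]*\$LOGFILE[ \t]*$/ { print "date >> $LOGFILE" }'
}
```

A possible use, with /tmp/rc.utils.new as a hypothetical scratch file: add_timestamps < /sbin/rc.utils.bak > /tmp/rc.utils.new, followed by a diff against the original.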
Add Timestamps to RC scripts – Part 2
/etc/rc.log reports:
Thu Aug 18 12:22:28 EST 2011
Configure system crash dumps
Output from "/sbin/rc1.d/S080crashconf start":
---------------------------
EXIT CODE: 0
...
Thu Aug 18 12:22:33 EST 2011
Save system crash dump if needed
Output from "/sbin/rc1.d/S440savecrash start":
---------------------------
savecrash directory not set; defaulting to: /var/adm/crash
*EXIT: parse_args
ENTER: open_source
ENTER: read_header
ENTER: get_hdr_loc
*EXIT: get_hdr_loc
savecrash: Finished Reading Header From: device : /dev/rdsk/c11t0d0
offset:16386496
Crash Dump Scenarios – Part 1
•
If no crash dump devices are configured, HP-UX by design
defaults to primary swap for saving crash dumps!
•
Persistent crash dump devices must be in root volume
group in LVM, but can be in any data group in VxVM
(Symantec documentation confirms it too).
•
If the crash dump devices are not persistent, and they are
not listed in /etc/fstab, and swap is not in root volume
group, HP-UX will happily use non-persistent dump devices
from other volume groups AS LONG as they are defined in
the currently running kernel configuration (check with
crashconf(1M) command).
Crash Dump Scenarios – Part 2
•
To enable non-persistent dump devices permanently, add
them to /etc/fstab and “switch them on” via the
crashconf(1M) command and/or
/etc/rc.config.d/crashconf BEFORE a crash
happens. Otherwise, if there is no fall-back to primary
swap, the crash dump will FAIL.
•
A dump can be saved to both non-persistent and persistent
dump devices*.
•
If there are persistent crash dump devices (they must be in
root volume group in LVM, but can be in any data group in
VxVM), they will be used for saving crash dumps even if
they are not listed in /etc/fstab.
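The scenarios in Parts 1 and 2 can be condensed into a small decision sketch. It is only a summary of the rules as stated in this document, not a reimplementation of the kernel's behaviour; the function name dump_target and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical summary of the scenarios above. Arguments are "yes" or "no":
#   $1 = persistent dump devices exist
#   $2 = non-persistent dump devices are defined in the running configuration
#   $3 = primary swap is configured
dump_target() {
    persistent=$1
    configured=$2
    primary_swap=$3
    if [ "$persistent" = yes ] || [ "$configured" = yes ]; then
        echo "configured dump devices"
    elif [ "$primary_swap" = yes ]; then
        echo "primary swap"
    else
        echo "dump disabled"
    fi
}
```

For example, dump_target no no no corresponds to the earlier console log where no dump devices and no primary swap were available, and the dump was disabled.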
Thank You!
Dusan Baljevic
Sydney, Australia
Aug 2011
