Best Practices for Domino Server and Application Tuning
Andy Pedisich
Technotics
© 2012 Wellesley Information Services. All rights reserved.
What We’ll Cover …
Tuning hardware and OS
Optimizing Domino server performance
Examining opportunities in on-disk structure (ODS)
Keeping applications under control
Mastering cluster replication
Dealing with database corruption
Resolving specific problems with databases
Wrap-up
Keep Up with Domino Fixpacks and Releases
Use this link to find out what’s new
 www-10.lotus.com/ldd/r5fixlist.nsf/WhatsNew
 In some cases, this will take you to a “Top 20 Fixes” for a
new release
 Granted, reading all this material can cure anyone’s
insomnia, but someone has to do it and it might as well
be you
Lots of Domino shops like to lag a bit when it comes to fixpacks
 Why do I need to keep up?
 “I didn’t see anything that might affect us”
Here’s a good example of why you might want to keep up with the
fixpacks, even if you didn’t see a
problem in your environment
Running Domino on Windows 2008 64-Bit
Windows 64-bit introduced a new problem with Domino
 Microsoft Windows 2008 64-bit servers sometimes have
significantly increased CPU usage and I/O degradation when
Lotus Domino opens or backs up large numbers of databases
 www-01.ibm.com/support/docview.wss?uid=swg21449825
I personally saw one case where we couldn’t seem to put enough
RAM into the system
 The server started running at 100% RAM utilization and stayed that way
 It wasn’t until we were in the 16GB range that the utilization
dropped down to 85%
 No users were on the Domino server at the time
 Not everyone would even see this problem
Virtual Address Space Becomes Exhausted
The Virtual Address Space cache may be completely used up
 Successive calls to the OS cache manager to get memory from the
OS system cache result in mapping/un-mapping of views from
the system cache
 These operations take a lot of CPU time and, as a result,
show as high OS CPU usage
 In addition, the large OS system cache may now reside on
the disk
 RAM is not large enough to hold the OS system cache
The result is significant I/O on the system
This occurs with Domino 8.5.2
You Might Need a Hotfix and a Domino Parameter
Domino opens databases with a RANDOM flag
FILE_FLAG_RANDOM_ACCESS
 In Windows 2008 64-bit, this flag causes file blocks that are
read to stay in the cache until the file is closed
 Domino keeps files open in the Database Cache (dbcache)
for performance reasons
 It takes quite a long time until the cache is released
Parameter Needed for Release 8.5.2 FP2 and a Hotfix
SPR #KBRN899NF6 and a hotfix provide a notes.ini variable to
disable the FILE_FLAG_RANDOM_ACCESS flag
 Once you have installed the hotfix, use this parameter
 Disable_Random_RW_File_ATTR=1
It is fixed in Domino 8.5.2 FP3 and 8.5.3
 It’s another great reason to keep up with fixpacks and new
releases
 But you’re still going to need a lot more memory running on
Windows 2008 (R2 also)
SPR# KBRN8AKKA9 – Fix to better improve performance when
opening files on Windows 64-bit platform
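Parameters like Disable_Random_RW_File_ATTR=1 are plain Name=Value lines in NOTES.INI. The sketch below shows one way to add or update such a line in a copy of the file's text. The helper name set_ini_param is hypothetical and the sample contents are invented for illustration; it assumes parameter names are matched case-insensitively and is not an IBM-supplied tool.

```python
def set_ini_param(ini_text: str, name: str, value: str) -> str:
    """Return ini_text with name=value set, replacing an existing entry if present."""
    lines = ini_text.splitlines()
    target = name.lower()
    replaced = False
    for i, line in enumerate(lines):
        key, sep, _ = line.partition("=")
        # Only lines of the form Name=Value are candidates for replacement.
        if sep and key.strip().lower() == target:
            lines[i] = f"{name}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "[Notes]\nKitType=2\n"  # made-up file contents
    print(set_ini_param(sample, "Disable_Random_RW_File_ATTR", "1"))
```

Work on a backup copy of the file first, and apply the parameter only after the hotfix is installed.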
Keep Disks Unfragmented
Many administrators falsely believe that Domino does not suffer
from fragmented files on disk
 Fun fact: Domino uses smaller allocations for new documents
 This can cause files to be spread out across the disk, which
can cause performance issues, especially during backups
 The system has to hunt for all the sectors spread
everywhere on the disk
Defragment once per week when the server is not busy
There are several Windows tools, such as:
 Contig V 1.6
 It’s a free tool from Microsoft
 http://technet.microsoft.com/en-us/sysinternals/bb897428
A Free Defrag Tool for Domino that Uses Contig 1.6
Domino Defrag 3.2 OpenNTF Project
 www.openntf.org/internal/home.nsf/project.xsp?action=openDocument&name=DominoDefrag
 An open source solution consisting of an R853+ C API Lotus Domino
server task (DominoDefrag.exe) and an R853+ Lotus Domino server
XPages database called the DominoDefrag Administrator
 DominoDefragAdmin.nsf – relies on http://extlib.openntf.org/
Server task uses “contig.exe” (v1.6) to defrag Domino databases
on all Windows server 2003-2008 versions (32-bit and 64-bit)
 And will also defrag a full-text index associated with a Notes
database and the Domino server’s transaction log and DAOS
files
A NOTES.INI Parameter Improves the Product
DominoDefrag_EnterpriseSupport=1 (on)
 Output is recorded to CSV files, and sent to the DominoDefrag
Administrator for processing attached to a summary email
 Has the added functionalities:
 Being able to compact a database prior to defragging
 Supports multi-processing (can load multiple times to run
concurrently) and use of an indirect file (.ind) for compact
batch functionality
Performance checks can also be tested using generated
document collections
 This will help to determine the “before and after” defrag
millisecond read performance of databases and their
associated full-text indexes
A General File System Recommendation for All OS
Keep at least 30% free space available on all drives
 This allows the file system to optimize where to write data
 Helps to reduce file fragmentation
Keep file systems below 1TB on all platforms
 This helps performance, and makes disaster recovery faster
and simpler
You might have to split your data up to fit the smaller volumes
 The payback will come from better performance for mail and
applications
 Admittedly, it is harder to have smaller volumes with mail
files than with applications
 We like keeping all mail in one folder, don’t we?
Working With the Server Availability Index (SAI)
Did you ever track an SAI and notice that a server never really
seemed to be available?
 Or maybe you never tracked an SAI before
 You can, with our special Statrep database
 TechnoticsR85Statrep.ntf
 Download free from www.andypedisich.com
The Stats Are There, Now You Can See Them
It has all the views that are on the original Statrep
 Plus over a dozen additional views to help you analyze the stats
your servers generate
The SAI Is Fixed in R8.5
It was broken for many years
 SAI calculation on fast servers still might not work for you
There is a routine called LOADMON that runs on Domino that
stores values in a LOADMON.NCF file on the server
 It compares access times using micro-seconds
 On a fast server, at off-peak times, transactions can take just
a few micro-seconds
 For normal servers, the SAI can sometimes look low
The Expansion Factor
Servers determine their workload based on the expansion factor
 This is calculated based on response times for recent requests
 Server compares recent response time to minimum response
time that the server has completed
 Example: Server currently averages 12ms for DBOpen
requests; minimum time was 4ms
 Expansion factor = 3 (current time/fastest time)
 This is averaged over different types of transactions
 Fastest time is stored in memory and in LOADMON.NCF
 LOADMON.NCF is read each time server starts
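The arithmetic in the DBOpen example can be sketched as follows. The function names are hypothetical, the second transaction type ("NoteOpen") is invented for illustration, and the plain unweighted mean is an assumption; Domino's actual averaging may differ.

```python
def expansion_factor(current_ms: float, fastest_ms: float) -> float:
    """Current response time divided by the fastest time ever recorded."""
    if fastest_ms <= 0:
        raise ValueError("fastest_ms must be positive")
    return current_ms / fastest_ms


def average_expansion_factor(samples: dict) -> float:
    """Unweighted mean over transaction types (an assumption).

    samples maps a transaction type to (current_ms, fastest_ms).
    """
    factors = [expansion_factor(cur, fast) for cur, fast in samples.values()]
    return sum(factors) / len(factors)


if __name__ == "__main__":
    # The slide's example: 12ms current, 4ms minimum for DBOpen.
    print(expansion_factor(12, 4))
    # "NoteOpen" and its timings are hypothetical.
    print(average_expansion_factor({"DBOpen": (12, 4), "NoteOpen": (10, 5)}))
```

This also shows why a stale, very small minimum in LOADMON.NCF inflates the factor: the divisor never grows back.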
Delete LOADMON.NCF When the Server Starts
Delete LOADMON.NCF while the server is down to clear the old
minimum values
 Do this with a scripted start under the Windows platform
 Delete LOADMON.NCF before Domino starts
You can still do it on the Linux platform for free
 Nash!Com has a start script for free
 www.nashcom.de/nshweb/pages/startscript.htm
 The link has a list of all changes
 Plus a link where you can request the script from Daniel


Daniel is one of the smartest Domino administrators I have met in
my entire career
Linux/Unix start script can delete LOADMON.NCF
automatically
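A scripted start could include a step like the sketch below, run while Domino is stopped and before the server is launched. The function name and the data directory path are assumptions; this is not the Nash!Com script, only an illustration of the delete step.

```python
from pathlib import Path


def delete_loadmon(data_dir: str) -> bool:
    """Delete LOADMON.NCF from data_dir; return True if a file was removed."""
    removed = False
    for entry in Path(data_dir).iterdir():
        # Match case-insensitively, since Linux file names are case-sensitive.
        if entry.is_file() and entry.name.lower() == "loadmon.ncf":
            entry.unlink()
            removed = True
    return removed


if __name__ == "__main__":
    # Hypothetical Domino data directory; adjust for your server.
    data_directory = r"D:\Lotus\Domino\Data"
    if Path(data_directory).is_dir():
        print("Removed:", delete_loadmon(data_directory))
```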
The Expansion Factor
But sometimes, Domino has a difficult time calculating the
expansion factor
 The result is that the Server_AvailabilityIndex is not a reliable
measure of how busy the server is
 This can happen with extremely high-performing servers
If you see a very low Server_AvailabilityIndex at a time you know
servers are supposed to be idle and you are trying to load
balance, there is something you can do to correct it
 And Domino can help!
Changing Expansion Factor Calculation
Use this parameter to change how the Expansion Factor is
calculated
 SERVER_TRANSINFO_RANGE=n
To determine the optimal value for this variable:
 After the server has experienced heavy usage, use this console
command:
 Show AI
 This means, show the availability index calculation

It has nothing to do with that 2001 Steven Spielberg movie, about
the robot that looks like a child and tries to become a real boy
An Easy Way to Find the Parameter Value
Show AI is a console command that has been around since
Domino Release 6
 It runs some computations on the server
 And suggests a SERVER_TRANSINFO_RANGE for you
Platform Disk Statistics
The disk specification will vary by server
Platform.LogicalDisk.1.AvgQueueLen
 AvgQueueLen: The average number of both read and write
requests that were queued for all logical disks on all physical
disks during the sample interval
 Should not consistently rise above 2
Platform.LogicalDisk.1.PctUtil
 PctUtil: Percent of time the drives are busy reading or writing
 Watch for disks constantly hitting above 80%
Track both of these statistics in Notes with the new Statrep
 Follow up with performance monitoring on the OS level
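The two thresholds above can be checked against a series of collected samples. This is a sketch only: the function names are hypothetical, and treating "consistently" as "at least half of the samples" is an assumption you should tune.

```python
from typing import List, Sequence


def fraction_above(samples: Sequence[float], limit: float) -> float:
    """Share of samples that exceed limit (0.0 for an empty series)."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if s > limit) / len(samples)


def disk_warnings(queue_lens: Sequence[float],
                  pct_utils: Sequence[float],
                  sustained: float = 0.5) -> List[str]:
    """Flag a disk whose samples exceed the slide's thresholds most of the time."""
    warnings = []
    if fraction_above(queue_lens, 2.0) >= sustained:
        warnings.append("AvgQueueLen consistently above 2")
    if fraction_above(pct_utils, 80.0) >= sustained:
        warnings.append("PctUtil consistently above 80%")
    return warnings


if __name__ == "__main__":
    # Made-up sample values for one logical disk.
    print(disk_warnings([0.5, 3, 4, 2.5], [40, 50, 85, 60]))
```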
Change the View Temp File Default Folder
By default, Domino generates temp files in the server’s temporary
folder when it rebuilds a view
 Directory used by update/updall tasks for rebuilding indexes
The default is usually somewhere on the system drive C: when
using Windows servers
 If the system doesn’t have a temp folder, Domino puts the temp
files in the Domino data folder
Because of the disk I/O and disk space required, you should
change the location to a different drive
 Not your Domino data drive, or your transaction log drive, or
your OS drive, or your DAOS file system
 For maximum performance, it should be on its own drive
Make Sure There Is Plenty of Space Available
Use this parameter:
 VIEW_REBUILD_DIR=(drive and folder location)
 Make sure you have plenty of space available
 The performance increase is worth the trouble
If Domino calculates that there isn’t enough space on the
temporary folder’s drive, it uses a slower method to rebuild
the view
 You’ll see the message below in the log and console
 It’s best to remedy this with more disk space, or performance
will actually drop
Warning: Unable to use optimized view rebuild for view due to
insufficient disk space at directory. Estimate may need x
million bytes for this view. Using standard rebuild instead.
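If you have seen that warning, you can compare its estimate against the free space on the rebuild drive before the next rebuild. The sketch below assumes "million bytes" means 1,000,000 bytes; the function name is hypothetical, and Domino's own space calculation is internal and may differ.

```python
import shutil


def can_use_optimized_rebuild(rebuild_dir: str, estimate_million_bytes: float) -> bool:
    """True if the drive holding rebuild_dir has at least the estimated space free."""
    needed_bytes = estimate_million_bytes * 1_000_000
    return shutil.disk_usage(rebuild_dir).free >= needed_bytes


if __name__ == "__main__":
    # "." stands in for the folder named in VIEW_REBUILD_DIR;
    # 500 is a made-up estimate taken from a hypothetical log message.
    print(can_use_optimized_rebuild(".", 500))
```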
Anti-Virus Software on Domino Servers
I hate running AV software as a Domino task
 Many shops have stopped using it because malicious software
is caught with perimeter software or desktop software
If you must run OS platform AV software, remember to exclude:
 Domino data directory
 Transaction log drive
 TMP directory
 DAOS drive
 View rebuild directory
Use Transaction Logging
Transaction logging can increase performance significantly
Enable transaction logging in the server document
 T-Logs might already be in use in Archive logging style if
servers are backed up incrementally
 Otherwise, use the Circular logging style so that transaction
logging reuses space
 But be careful where you put the logs
Choices to be Made by Administrators
You’ll need to decide whether to configure the transaction logs to
create more or fewer checkpoints
 To record a recovery checkpoint, Domino evaluates each active
logged database to determine how many transactions would be
necessary to recover each database after a system failure
 Then, it creates a recovery checkpoint record in the
transaction log that lists each open database and the starting
point transaction needed for recovery
Runtime/Restart Performance
Your choices are:
 Standard (default and recommended)
 To record checkpoints regularly
 Favor runtime
 To record fewer checkpoints
 Requires fewer system resources and improves server runtime performance, but causes more of the log to be applied
during restart
 Favor restart recovery time
 To record more checkpoints
 This option improves restart recovery time because fewer
transactions are required for recovery
Location of Transaction Logs
Transaction logs work best if placed on RAID 1 disks
 These are mirrored drives
 And should be local to the server
These logs should not be placed:
 On the Wintel system drive C:
 On the same drive as the Domino data
 On a SAN drive
Disconnect Idle Users
An idle user stays connected to a server for 4 hours
 This takes up valuable server resources
Use this parameter to drop idle users faster
 SERVER_SESSION_TIMEOUT=(number of minutes)
 Users will not have to re-enter a password if they become
active after the time limit
The minimum recommended setting is 30-45 minutes
 A lower setting may negatively impact server performance
 IBM/Lotus says it’s not needed in R8
 But I like to use the parameter regardless
 It gives you more realistic user concurrency stats
1,000 Users – Server_session_timeout=60
Comparison of memory usage on a Domino server
650 Users – Server_session_timeout=30
Domino server memory comparison with and without the
parameter set to 30
650 Users – Server_session_timeout=30 (cont.)
CPU Utilization comparing with and without the parameter
Disable HTTP Server Logging
We’ve found many instances where DOMLOG.NSF was well
over 2GB

And it was nearly impossible to wait for it to open
 Because it had never actually been opened before
If you don’t look at the logs, improve performance by disabling
the HTTP server logging

It’s in the HTTP section of the server document
 Under Enable Logging To, disable both Log files and Domlog.nsf
Don’t Maintain Read Marks on All Databases
Replication of unread marks was primarily designed for mail
databases
 If you don’t need them, don’t replicate them, because it can
significantly slow database performance
For example, keep them switched off in Help, LOG.NSF,
NAMES.NSF, and any reference application
 Work with your developers to develop standards for enabling or
disabling the feature
Plan on a Monthly Restart for Domino Servers
Consider regular monthly restarts of Domino servers
 Not just Wintel-based servers, all servers
Server memory allocation and shared memory fragmentation can
occur over time
 Plus, there could be undocumented memory leaks
Regular restarts will help ensure your Domino servers are running
as efficiently as possible
Keep as Few Documents in Inbox as Possible
We all know large mail files are a problem, right?
 This is true, if only from the perspective of disk space
 But the issue is bigger than just disk space
 And here’s the proof you can take back to your domain
IBM/Lotus did a study using Domino on the iSeries called:
 Sizing Large-Scale Domino Workloads on iSeries
They found that reducing the number of documents kept in the
inbox:
 Reduces overall CPU usage
 Improves response time
 And can dramatically improve startup/recovery performance
It’s Very Logical When You Think About It
In terms of performance, the Inbox is the most “expensive”
container in a mail file
 The Inbox folder contains all new messages a mail file receives
 It must be updated each time a user opens the file
 Or clicks Refresh to see new mail
The more documents kept in the Inbox folder, the more expensive
it is to refresh the view of it
 Reducing the number of documents in the folder reduces the
CPU and main storage required to update the view of it
What Can You Do About It?
Two things you can do about this problem
 First, when a user calls and says that Notes is slow, ask this
question:
 How many messages are in your inbox?
 This should be a standard part of your help desk response
 Urge them to keep no more than 90 days in the inbox
 Use NOTES.INI parameters on Notes client to demonstrate
how indexing the inbox is a major problem
 CLIENT_CLOCK=1
 Debug_Console=1
Use Release 8.x Inbox Manager
Second, control the number of messages in the inbox using
settings in the AdminP section of the server document
 AdminP can start an agent in the user’s mail file to remove
messages from the Inbox
 This can also be controlled from policies
The messages are not deleted
 They are still in the All Documents view
Users need to know where the messages can be found
Control User Polling for New Mail
Some users want to know if they have new mail
They configure a user preference to check for new mail every
couple of minutes
 If there are a lot of users on a server, a setting like this can
really hurt performance
Override the User Configuration for New Mail Polling
Add this parameter to mail server’s NOTES.INI to control how
often a client can check for new mail
 MinNewMailPoll= (number of minutes)
 Experiment with this number, but 15 is safe
This parameter overrides the user’s selection in the Mail Setup
dialog box
 This can prevent frequent polling from affecting server
performance
Parameters like this one should be in every server’s NOTES.INI
 That’s why they belong in a server configuration document
Port Compression
Enable network port compression!
This is especially good for server-to-server communication
 Must be enabled on server
 Client should be enabled using policies
Up to 60% compression of data
There Is a New On-Disk Structure for Domino 8
The term On-Disk Structure (ODS) describes the internal
architecture of Notes databases
 Each new release, except ND7, has included an update to the
ODS to accommodate new features and functions
Domino 8 includes a new On-Disk Structure, ODS48
Design Compression Saves Space
Design compression reduces the size of databases by
compressing design elements by up to 60%
 It will shrink the standard Notes 8 mail template MAIL8.NTF
from 25MB to 11MB
 The compression percentage achieved will vary from database
to database
 This is based on the compression ratio achieved for each
design element in each application
Enabling Design Note Compression
The design compression switch is available on the Advanced tab
of the properties of applications with ODS43 and ODS48
 You must be using the Notes 8 client to see the option
 However, the compression will not occur unless the
application is subsequently upgraded to ODS48
Once enabled, the Design Compression setting replicates to other
replicas of the application
 Keep in mind that the ODS itself does not replicate
Your ODS By Default Is 43
When a new application is created in a Lotus Notes 7, 8, or 8.5
client or on a Lotus Domino 7, 8, or 8.5 server, the on-disk
structure (ODS) remains at 43
The on-disk structure has been upgraded in Notes/Domino 8.5 to
the new ODS version of 51
 Add the following parameter to the NOTES.INI on the server or
client to use ODS 51:
 CREATE_R85_DATABASES=1
Use Compact –C to Upgrade to New ODS
Yes, it must be a compact –C, –B will not work
 Makes it easy to plan the ODS upgrade
 Low risk, no problems have been seen
Besides the “compress database design” option from ODS 48 in
advanced properties, it gives you options to turn on
 Compression of non-summary data
 Use Domino Attachment and Object Service (DAOS)
Making Applications Behave
You’re not a developer, you’re an administrator
 What can you do to help applications stay under control?
The biggest complaints about agents that run applications are:
 The agents run too long
 The agents consume vast amounts of memory
 The agents utilize too much CPU on the server
 And all of these complaints are usually made anecdotally
 They are in conversations heard in elevators or around
water coolers

Are there still water coolers for people to
stand around, gossiping?
Domino Domain Monitor Probes
One way to scientifically prove when agents consume
extraordinary resources is to use application probes in DDM
These are set up in the Monitoring Configuration Database
 That’s EVENTS4.NSF
 Note that you can track agents by how long they run, behind
schedule, by CPU utilization, and by memory usage
Long Running Agents
Every administrator knows that you can set a maximum agent
execution time in server documents
 You could just set it for 1,440 minutes and allow agents to run
all day long
How do you know how long agents really run?
 Just ask the developer!
 They are very honest, hardworking people, for the most part
Find the Truth
You can set up a probe to monitor agents and report back to DDM
if an agent ran longer than a time you think is reasonable
 For example, 4 hours or 240 minutes
 You can monitor agent manager or the HTTP process
 DDM will report to the DDM database when an agent runs
longer, and will report it as an event of Fatal severity
Or you can set up a probe that monitors memory utilization
De-Mystify the Situation
The probe will report back to the DDM database
You will have actual data rather than water cooler data
You can make an intelligent choice about agents and resources
Full-Text Indexing for Searches
Should all servers be able to update full-text indexes?
 NO!
 FTI uses disk resources – adds 25%-45% to DB size
 FTI requires CPU and memory resources
Only enable FTI where it is absolutely necessary
 Such as mail and application servers where users require it
Disable full-text index building on hubs, gateways, and any other
server that does not have the requirement
Use Notes.ini parameter Update_No_Fulltext
 Set to 1 to prevent FTI builds
 Set to 0 to allow FTI builds
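To put the 25%-45% figure in concrete terms, here is a small sizing sketch. The function name is hypothetical and the range is simply the one quoted above; actual index sizes depend on database content.

```python
def fti_size_range_mb(db_size_mb: float) -> tuple:
    """Estimated extra disk space (low, high) in MB for a full-text index."""
    low = db_size_mb * 25 / 100
    high = db_size_mb * 45 / 100
    return (low, high)


if __name__ == "__main__":
    # A 20GB mail file expressed in MB (20 * 1024).
    print(fti_size_range_mb(20 * 1024))
```

Multiplied across every mail file on a server, that overhead is the reason to leave Update_No_Fulltext=1 on servers that do not need searching.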
Simple Search Is Simply Awful Sometimes
Simple search is the type of processing used when a user
searches a non-full-text indexed application
 The simple search algorithm does the job, but is not very
efficient
 It can significantly impact performance on a Domino server
 For some applications, the ability to search documents may
not really be necessary
 However, the default functionality still allows users to do
simple searches on applications that are non-full-text
indexed
Preventing Simple Searches
Administrators can now prevent simple searches if an application
is not full-text indexed
 Enable this by selecting “Don’t allow simple search” on the
Advanced tab of Database Properties
Preventing Simple Searches (cont.)
If users attempt to simple search a database with this option
enabled, they will receive an error message as shown below
 This will probably generate a few help desk calls
 Be prepared by providing info about this feature, if you’re
going to deploy it
Property Doesn’t Replicate
Keep in mind that the “Don’t allow simple search” property does
not replicate for existing database replicas
 This lets you decide selectively whether each replica should
have the setting enabled
 The setting is carried over to new replicas and copies
Properties and How They Affect the Environment
Database properties that impact
performance and that should NOT
be set by the developer (these are
up to you)
They in no way impact the behavior
of the app, but they do impact the
behavior of the server or client
Database Settings for Optimal Performance
Property
Tab
To optimize
performance/size
Improves database
performance?
Reduces
database
size?
Set By
Administrator
or Developer
Document table
bitmap optimization
Advanced
Select option
Yes
No
Admin
Don't overwrite free
space
Advanced
Select option
Yes
No
Admin
Disable Transaction
Logging
Advanced
Depends on type of
Application
Maintain
LastAccessed
property
Advanced
Deselect option
Use LZ1
Compression for
Attachments
Advanced
Select the option only
if ALL elements of
environment are ND6
Admin
Yes
No
Admin
Yes
Admin
*Original Table from Domino Administrator Help – Modified by Technotics
Database Settings for Optimal Performance (cont.)
Property | Tab | To optimize performance/size | Improves database performance? | Reduces database size? | Set by Administrator or Developer
Allow use of stored forms in this database | Basics | Deselect option | Yes | Yes | Developer
Display images after loading | Basics | Select option | Yes | No | Developer
Don't maintain unread marks | Advanced | Select option | Yes | Yes | Developer
Don't support specialized response hierarchy | Advanced | Select the option | Yes | Slightly | Developer
Don't allow headline monitoring | Advanced | Select the option | Prevents performance degradation | No | Developer
*Original Table from Domino Administrator Help – Modified by Technotics
Design Elements That Adversely Impact Performance
@[email protected]
 Excessive numbers of these will degrade your server’s
performance, as well as that of the client
 This especially applies to applications that will be accessed
from a browser
WebQueryOpen/WebQuerySave Agents
 These are agents that are triggered anytime a form is opened or
saved from the Web
 They execute on the server and can crush your performance
 Make sure you do performance testing WITH LOAD before
deploying
Understanding Cluster Replication
Cluster replication is event driven
 It doesn’t run on a schedule
 The cluster replicator detects a change in a database
and immediately pushes the change to other replicas
in the cluster
If a server is down or there is significant network latency, the
cluster replicator stores changes in memory, so it can push them
out when it can
 If a change to the same application happens before a previous
change has been sent, the CLREPL gathers them and sends
them all together
Only One Cluster Replicator by Default
When a cluster is created, each server has only a single cluster
replicator instance
 If there have been a significant number of changes to many
applications, a single cluster replicator can fall behind
 Database synchronization won’t be up to date
If a server fails when database synch has fallen behind, users will
think their mail file or app is “missing data”
 They won’t understand why all the meetings they made this
morning are not there
 They think their information is gone forever!
 Users need their cluster insurance!
Condition Is Completely Manageable
Adding a cluster replicator will help fix this problem
You can load cluster replicators manually, using the following
console command:
 Load CLREPL
 Note that a manually loaded cluster replicator will not be
reloaded when the server is restarted
Add cluster replicators permanently to a server
 Use this parameter in the NOTES.INI:
 CLUSTER_REPLICATORS=#
I always use at least two cluster replicators
When to Add Cluster Replicators
But how do you tell if there’s a potential problem?
 Do you let it fail and then wait for the phone to ring?
 No!
You look at the cluster stats and get the data you need to make an
intelligent decision
 Adding too many will have a negative effect on server
performance
Here are some important statistics to watch
Key Stats for Vital Information About Cluster Replication
Statistic | What It Tells You | Acceptable Values
Replica.Cluster.SecondsOnQueue | Total seconds that last DB replicated spent on work queue | < 15 sec – light load; < 30 sec – heavy
Replica.Cluster.SecondsOnQueue.Avg | Average seconds a DB spent on work queue | Use for trending
Replica.Cluster.SecondsOnQueue.Max | Maximum seconds a DB spent on work queue | Use for trending
Replica.Cluster.WorkQueueDepth | Current number of databases awaiting cluster replication | Usually zero
Replica.Cluster.WorkQueueDepth.Avg | Average work queue depth since the server started | Use for trending
Replica.Cluster.WorkQueueDepth.Max | Maximum work queue depth since the server started | Use for trending
What to Do About Stats Over the Limit
Acceptable Replica.Cluster.SecondsOnQueue
 Queue is checked every 15 seconds, so under light load,
should be less than 15
 Under heavy load, if the number is larger than 30, another
cluster replicator should be added
If the above statistic is low and Replica.Cluster.WorkQueueDepth
is constantly higher than 10 …
 Perhaps your network bandwidth is too low
 Consider setting up a private LAN for cluster replication
traffic
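The decision rules above can be written down as a small check. This is a sketch under stated assumptions: the function name and return strings are hypothetical, "constantly" is read as "every collected sample", and "low" is read as "within the limit for the current load".

```python
from typing import Sequence


def cluster_advice(seconds_on_queue: float,
                   work_queue_depths: Sequence[float],
                   heavy_load: bool = False) -> str:
    """Apply the slide's thresholds to current cluster replication stats."""
    if heavy_load and seconds_on_queue > 30:
        return "add a cluster replicator"
    if not heavy_load and seconds_on_queue >= 15:
        return "above the light-load expectation; keep watching"
    # SecondsOnQueue is within limits, so look at the queue depth samples.
    if work_queue_depths and all(d > 10 for d in work_queue_depths):
        return "check network bandwidth; consider a private LAN"
    return "ok"


if __name__ == "__main__":
    # Made-up values for Replica.Cluster.SecondsOnQueue and WorkQueueDepth.
    print(cluster_advice(45, [0, 1], heavy_load=True))
    print(cluster_advice(5, [12, 15, 11]))
```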
Stats That Have Meaning but Have Gone Missing
There aren’t any views in the Lotus version of Statrep that let you see
these important statistics
 As a matter of fact, the Cluster view is pretty worthless
 It lacks the key cluster statistics you need to make decisions
Stats That Have Meaning but Have Gone Missing (cont.)
But there is a view like that in the Technotics R8.5 Statrep.NTF
 It shows the key stats you need
 To help track and adjust your clusters
 Download from my blog
 www.andypedisich.com
Use a Scheduled Connection Document, Also
Back up your clustered replication with a scheduled connection
document between servers
 Have it replicate at least once per hour
 You’ll always be assured to have your servers in sync, even
if one has been down for a few days
 And it replicates deletion stubs, too!
Don’t Forget About Silent Failover
Was a parameter you could set in R8.5.2
 FailoverSilent = 1
 Now available in a desktop policy settings document
Client will silently fail over to a different server if the current
server is no longer operational
 No confusing prompts
 Best practices = set to 1
What Causes Corruption?
Lots of changes to a database
 The more changes to a database, the greater your chance for
corruption
 Until 7.0.2, a database could deal with no more than 30
million Notes IDs in its lifetime – think a high-volume
mail.box
 Frequent view and Full-Text Index (FTI) refreshing or rebuilding
 Consider a 20 Gig mail file with FTI update frequency of
“immediate”
Insufficient hard drive space
Third-party apps improperly set up to “lock” open DBs
What Causes Corruption? (cont.)
Read-only databases or views
Partially-written transactions or changes
Agents running against non-existent views
Running defrag on the OS with Domino running
The servers/agents’ lack of access to design elements
And many more …
How Do I Find Corruption?
Domino log (Log.nsf) review
Server console
Domino Domain Monitoring Database (DDM.NSF)
Ad hoc – your phone rings/you get a ticket
 Most likely way to find corruption, if you don’t have proactive
monitoring setup for keywords, such as:
 “corrupt”
 “RRV-bucket”
 “b-tree”
 Corruption may prevent access to applications, cause phantom
data to “appear” in a view, or generate error messages for end
users
77
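The keyword monitoring mentioned above can be sketched in a few lines of Python. This is only an illustration, not a Domino feature: it assumes the console or log output has already been exported as plain text lines, and the function names and sample lines are hypothetical (the sample messages are borrowed from error texts quoted elsewhere in this deck).

```python
import re

# Keywords from the slide. The matching rules below are an assumption.
KEYWORDS = ("corrupt", "RRV-bucket", "b-tree")


def build_pattern(keyword):
    # Accept a hyphen, a space, or nothing between the parts of a keyword,
    # so "RRV-bucket" also matches "RRV Bucket". Matching ignores case.
    parts = [re.escape(part) for part in re.split(r"[-\s]+", keyword)]
    return re.compile(r"\b" + r"[-\s]?".join(parts), re.IGNORECASE)


PATTERNS = {keyword: build_pattern(keyword) for keyword in KEYWORDS}


def find_corruption_lines(lines):
    """Return (line_number, keyword, text) for each keyword found in each line."""
    hits = []
    for number, text in enumerate(lines, start=1):
        for keyword, pattern in PATTERNS.items():
            if pattern.search(text):
                hits.append((number, keyword, text.strip()))
    return hits


# Hypothetical exported log lines.
sample = [
    "Database.nsf is CORRUPT - Now Read-Only!",
    "Router: Transferred 12 messages",
    "RRV Bucket is corrupt",
    "B-tree structure is invalid",
]

for hit in find_corruption_lines(sample):
    print(hit)
```

A line can be reported more than once when it contains several keywords, which is deliberate here so that nothing is hidden from the administrator.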
The Three Commands That Fix Most Issues – Fixup
• Fixup does a great job, but be careful
 Resolves inconsistencies resulting from partially written operations, including improperly closed databases
 “This database cannot be opened because a consistency check of it is in progress”
 Is not needed when transaction logging is enabled
• Takes open databases offline for the duration of the task
• Is the only “destructive” maintenance task
 “Removes” corrupt data elements!
 Does not leave a deletion stub
 Requires a replica somewhere to replace removed corrupted documents
Fixup Options
• Fixup –L
 Logs every database that Fixup opens and checks
 Without this, only encountered problems are logged
• Fixup –N
 Prevents Fixup from “removing” corrupted documents
 Use this to salvage data if there are no other replicas
• Fixup –V
 Prevents Fixup from running on views
 Reduces Fixup runtime
• Fixup –C
 Verifies the integrity of the database and reports errors
 Does not purge corrupted documents
 For more on Fixup switches, see Administrator Help
The Three Commands That Fix Most Issues – Compact
• Compact
 Upgrades the On-Disk Structure (ODS) of a database
 Allows disk space to be reused after documents and attachments are deleted from a database
 Removes documents from a database if archiving to a server is set up via policies
 Comes in three styles
 In-place with space recovery
 In-place with space recovery and reduction in file size
 Copy-style compacting
The Three Compact Styles
• In-place with white space recovery but no file size reduction
 Retains the Database Instance ID (DBIID)
 Important for transaction logging
 The database can be accessed while this runs
 Default if no switch is used
 This is the same as compact –b (lowercase; compact switches are case-sensitive)
• In-place with space recovery and reduction in file size
 Assigns a new DBIID
 Only appropriate for transaction-logged servers if incremental differential back-up software is used
 Also known as compact –B
 More resource-intensive and slower than “compact”
The Three Compact Styles (cont.)
• Copy-style compacting
 Meaning compact –C
 Creates a copy and then deletes the original
 Requires sufficient disk space
 Assigns a new DBIID
 Does not allow access to the DB while this runs
 DB access can be granted by adding –L
 BUT if the DB changes, compact is cancelled
Compact Options
• Compact –S 15
 Compacts DBs with 15% or more unused space
• Compact –R
 Compacts without conversion to the current Domino release
 Uses copy style
• Compact –D
 Discards built view indexes and runs a copy-style compact
• Compact –a
 Archives and deletes documents, then compacts the DB
 Uppercase –A archives and deletes documents without compacting
 For more compact switches, see Administrator Help
The Three Commands That Fix Most Issues – Updall
• Updall
 Updates or rebuilds view indexes or Full-Text Indexes (FTIs)
 Including corrupt ones
 Purges deletion stubs from DBs and discards view indexes
 By default, view indexes remain for 45 days
 Use the Notes.ini setting Default_Index_Lifetime_Days to change when updall discards unused view indexes
 Is the “as needed” version of UPDATE
 Does not run continuously
 Included in the Notes.ini setting ServerTasksAt2
 More on why you may not want this setting later
Updall Options
• Updall database.nsf –T $Servers
 Updates a specific view
• Updall –X
 Rebuilds full-text indexes, but not views
 Use to fix FTI corruption
• Updall –R
 Rebuilds all used views
 Use to fix corruption
 For more updall switches, see Administrator Help
How Do I Prevent Corruption?
• Avoid conflicting/overlapping maintenance tasks
• Implement maintenance program docs
• Set up third-party apps appropriately
• Set database quotas
• Control attachment sizes
• Monitor disk space availability
• Don’t allow immediate full-text index updates
Avoid Conflicting/Overlapping Maintenance Tasks
• Running more than one maintenance task at the same time can cause corruption instead of solving it
• When running maintenance tasks manually via the server console, always wait until they’re done before starting a new one
• Remove ServerTasksAt2 from your Notes.ini
 Most admins don’t know it’s there and schedule compact or other conflicting server tasks at 2:00 am, causing corruption
 Avoid editing the Notes.ini directly via the operating system
 Direct edits are impossible to track and troubleshoot in case of issues
Avoid Conflicting/Overlapping Maintenance Tasks (cont.)
• Use these server console commands to view and edit the Notes.ini:
 Show config ServerTasksAt2
 To see the present setting
 Set config ServerTasksAt2=
 To change the setting
 With nothing after the equals sign, removes the setting entirely from the Notes.ini
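To see at a glance which scheduled-task settings launch maintenance tasks, a read-only check along these lines may help. It is a minimal sketch, assuming you work on an exported copy of the Notes.ini text rather than the live file; the function name and the sample settings are hypothetical, and the set of task names is limited to the three maintenance commands covered in this deck.

```python
# The three maintenance tasks discussed in this deck.
MAINTENANCE_TASKS = {"fixup", "compact", "updall"}


def scheduled_maintenance(ini_text):
    """Map each ServerTasksAt<hour> setting to the maintenance tasks it lists."""
    found = {}
    for raw in ini_text.splitlines():
        name, separator, value = raw.partition("=")
        name = name.strip()
        # Skip lines that are not ServerTasksAt<hour> settings
        # (plain ServerTasks= is not matched).
        if not separator or not name.lower().startswith("servertasksat"):
            continue
        tasks = [task.strip() for task in value.split(",") if task.strip()]
        # The first word of each entry is the task name; the rest are arguments.
        flagged = [t for t in tasks if t.split()[0].lower() in MAINTENANCE_TASKS]
        if flagged:
            found[name] = flagged
    return found


# Hypothetical excerpt from a copy of Notes.ini.
sample_ini = """\
ServerTasks=Replica,Router,Update,AMgr,AdminP
ServerTasksAt1=Catalog,Design
ServerTasksAt2=UpdAll
ServerTasksAt5=Statlog
"""

print(scheduled_maintenance(sample_ini))
```

The sketch only reports what it finds. Changes should still go through the server console, as described above, so they can be tracked.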
If You Are Dealing With …
• “Invalid or nonexistent document”
• “file.nsf is damaged, field length stored is incorrect”
• “Database.nsf is CORRUPT – Now Read-Only!”
• Cause
 A document or a view index has become corrupted, in some cases due to replication or save conflicts
• Solution
 Run “standard database maintenance”
 Transaction Logging
 Compact –B
 Updall –R –X
 Or create a new replica
 No Transaction Logging
 Fixup –F –L
 Updall –R –X
 Compact –B
 Or create a new replica
If You Are Dealing With … (cont.)
• “RRV Bucket is corrupt”
• Cause
 A Record Relocation Vector (RRV) table is an index mapping to the actual data’s location on the hard disk
 RRV buckets don’t replicate
 Improper Domino server shutdown can cause this
 Third-party app altered the physical location of the database on disk
 Disk defrag utility running with Domino server up
• Solution
 Run standard maintenance, but with compact –C
 Make a new replica or copy
If You Are Dealing With … (cont.)
• “Detected Storage Corruption”
• “Attempt to use an invalid database pointer”
• “B-tree structure is invalid”
 B-tree structure creates efficient lookups, fast access to data
• “This database cannot be read due to an invalid ODS”
 Cause
 ODS problem, incomplete or corrupt index, soft deletes turned on in pre-ND6 versions of Domino, or in-place compaction moving non-summary data to another location in the database
 Solution
 Run standard maintenance
If You Are Dealing With … (cont.)
• “corrupt desktop.ndk or corrupt local names.nsf”
 B-tree structure defines the way a view index is encoded for efficient lookups and fast access to data
• Cause
 ODS problem, incomplete or corrupt index
 View or full-text index corruption
 Can happen if the full-text index is set to “immediate” and the view index gets discarded while agents are accessing it
• Solution
 Run standard maintenance
If You Are Dealing With … (cont.)
• “Extendible Hash Index is Corrupt and Can’t be Used”
• Cause
 The Extendible Hash Index (EHI) is a list of design element names converted into unique values, and does not replicate
 Corruption occurs when the EHI gets too large or is partially overwritten
• Solution
 Do a copy-style compaction of the database to force a rebuild
 Fixup and updall will not repair the Extendible Hash Index
 Refresh the design in another replica, then replicate, forcing the corrupt EHI to rebuild
 Pull a new replica or database copy
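The message-to-remedy pairs on the preceding slides can be collected into a small lookup for a help desk script. This is a rough sketch only: the table paraphrases the slides, the names are hypothetical, and matching by case-insensitive substring is an assumption about how the messages appear in your logs.

```python
# (message fragment, remedy) pairs paraphrased from the slides.
# More specific fragments come first because the first match wins.
REMEDIES = [
    ("extendible hash index",
     "Copy-style compact (compact -C); fixup and updall will not repair it"),
    ("rrv bucket",
     "Standard maintenance, but with compact -C; or make a new replica or copy"),
    ("b-tree structure is invalid", "Standard maintenance"),
    ("invalid or nonexistent document",
     "Standard maintenance, or create a new replica"),
    ("field length stored is incorrect",
     "Standard maintenance, or create a new replica"),
]


def suggest_remedy(message):
    """Return the remedy for the first known fragment in message, else None."""
    text = message.lower()
    for fragment, remedy in REMEDIES:
        if fragment in text:
            return remedy
    return None


print(suggest_remedy("RRV Bucket is corrupt"))
print(suggest_remedy("Extendible Hash Index is Corrupt and Can't be Used"))
```

Returning None for unknown messages keeps the script from suggesting maintenance where none of the slides apply.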
If You Are Dealing With … (cont.)
• Corrupt mail.box
 Online maintenance does not usually work on mail.box
• Solution
 Rebuild the affected mail.box
 Stop the router and issue “dbcache flush”
 Rename the mail.box file from the OS
 Be sure to copy all valid mail out of the old mail.box
 Restart the router
 This procedure may not always work, and a Domino server shutdown may be required
 7.0.2 is capable of routing “around” corrupt mail.boxes
If You Are Dealing With … (cont.)
• Cannot Write to log file: Database is corrupt – Cannot allocate space – Now Read-Only!
• Cause
 Insufficient hard drive space
 Back-up or anti-virus software running on the server is locking open databases
• Solution
 Check hard disk space
1. Shut down the Domino server
2. Rename Log.nsf
3. Restart the Domino server
If You Are Dealing With … (cont.)
• Corrupt transaction logs
• Cause
 Domino is not reusing archive-style transaction logs after a severe server crash
 Hard disk problems and crashes
 Copying transaction logs (*.txn files) over the network
• Solution
 Make a full backup of the server
 Disable transaction logging in the Server doc
 Stop the Domino server
 Delete the transaction log directory on the OS
 Restart the Domino server
 Re-enable transaction logging
If You Are Dealing With … (cont.)
• B-tree or RRV bucket error messages on Names.nsf, and you have already tried online database maintenance
• Solution
1. Shut down your Domino server
2. Open a DOS prompt
3. Navigate to the Domino Data directory (D:\domino\data)
4. Enter the following commands:
 C:\Domino\nfixup.exe names.nsf -F
 If you are transaction logging, use fixup -F -J
 C:\Domino\ncompact.exe names.nsf -C
 C:\Domino\nupdall.exe names.nsf -R -X
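The offline sequence above can be sketched as a small helper that assembles the three command lines in order. It only builds strings and runs nothing. The function name and its parameters are hypothetical, the program directory defaults to the C:\Domino path shown on the slide, and switches are written with plain ASCII hyphens as they would be typed at the prompt.

```python
from pathlib import PureWindowsPath


def offline_maintenance_commands(database, program_dir=r"C:\Domino",
                                 transaction_logging=False):
    """Return the fixup, compact, and updall command lines in slide order."""
    bin_dir = PureWindowsPath(program_dir)
    # Per the slide, add -J to fixup when the database is transaction logged.
    fixup_flags = ["-F", "-J"] if transaction_logging else ["-F"]
    steps = [
        ("nfixup.exe", fixup_flags),
        ("ncompact.exe", ["-C"]),
        ("nupdall.exe", ["-R", "-X"]),
    ]
    return [" ".join([str(bin_dir / exe), database, *flags])
            for exe, flags in steps]


# Print the sequence for a transaction-logged names.nsf.
for line in offline_maintenance_commands("names.nsf", transaction_logging=True):
    print(line)
```

As on the slide, the commands are meant to be run from the Domino data directory with the server shut down, each one only after the previous one has finished.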
Additional Resources
• Domino Defrag 3.2 OpenNTF Project
 www.openntf.org/internal/home.nsf/project.xsp?action=openDocument&name=DominoDefrag
• Nash!Com’s free Linux start script that deletes LOADMON.NCF
 www.nashcom.de/nshweb/pages/startscript.htm
• Download new Technotics Monitoring Results STATREP.NSF template
 www.andypedisich.com/blogs/andysblog.nsf/dx/admin2011.htm
• How does the notes.ini file parameter “server_session_timeout” affect server performance
 www-01.ibm.com/support/docview.wss?uid=swg21293213
7 Key Points to Take Home
• Consider transaction logging, not only for incremental backups, but also for faster restarts
• Eliminate tasks you don’t need from the ServerTasksAt parameters, especially ones that interfere with program documents
• Turn off full-text indexing on hubs, gateways, and other servers that don’t absolutely require it
• Make it a habit to check cluster statistics to determine if you need more cluster replicators
7 Key Points to Take Home (cont.)
• Use DDM probes to ensure agents aren’t consuming unreasonable amounts of resources
• Implement silent cluster failover using policies, and make your users happier
• Prevent simple searches of databases that are not full-text indexed
Your Turn!
How to contact me:
Andy Pedisich
[email protected]
www.andypedisich.com