Tuning Informix Dynamic Server and Your System for Optimum Performance
by Art S. Kagel KAGEL@bloomberg.net
Tech Notes 1998, Volume 8, Issue 3
Overview
This article is a compilation and expansion of my notes from a talk I gave this past
January to the New York Informix Users Group. I have expanded on the information I
provided them in places where appropriate and condensed where I could.
The goal of the article is to provide DBAs and System Administrators with guidelines
for tuning both Informix Dynamic Server and the underlying operating system and hardware
platform for best performance. The information presented is the result of my experiences
with UNIX and Informix systems, as well as the experience of some of the denizens of the
Informix Users News Group, comp.databases.informix.
The platform-related information is based on a UNIX server, though much is applicable
to Windows NT platforms as well. Informix Dynamic Server tuning tips are mostly geared
toward Informix Dynamic Server 7.2x and later, though there is some applicability to
earlier versions as well. I have tried to make note of this, and to point out where one
would treat Informix Dynamic Server 7.30, INFORMIX-OnLine Dynamic Server 7.1x or
INFORMIX-OnLine 5.xx differently. For further information about tuning and
version-specific features, be sure to refer to the Informix Dynamic Server 7.30
Performance Guide-the most complete discussion of these topics to date.
What kind of system? What to tune for?
- Server Configuration: I am assuming throughout that the machine on which Informix
Dynamic Server is running is a dedicated server with no other applications, or only
database clients executing. Tuning for a mixed environment where other, non-database
client applications are also running is a different task which involves balancing resource
usage between Informix Dynamic Server and other applications. (If this is not your
environment, do not turn to the next article yet! There will be things here that you will
find useful.)
- Issues not covered:
- Concurrency issues among large numbers of users. This gets mainly involved in the design
of your database schema, locking levels, and application design, which are beyond the
scope of this article.
- Loading to and deleting from large tables/databases. This is a problem with many
solutions dealt with elsewhere.
- Stored procedures and triggers. Using stored procedures and triggers includes issues of
trade-offs and involves both high costs and potentially large gains. This is also a
separate article.
- Transaction logging issues. I will deal with this right here and be done with it: tell
your bosses: "We use logging, period." There is no more discussion needed on the
matter.
- Long transaction handling: This involves many areas, including application design,
optimistic locking versus pessimistic locking, and number and size of log files (which I
will cover). The solution to this problem involves one or more of those perennial
religious issues we in DP have to deal with from time to time.
What are we trying to accomplish? There are trade-offs.
- Speed versus Safety: Issues affecting this trade-off include:
- Log buffering. Buffered logging is faster as logs are only written to disk when a log
file is complete. Unbuffered logging puts more stress on the system's I/O subsystems, as
each log buffer is written to disk as it fills. For a busy, transaction-oriented system,
this can mean a nearly constant stream of I/Os to the current logging device. Add to this
speed-versus-impact debate the fact that buffered logging places more of your transaction
logging information at risk during a system crash, and you have a complex issue. I always
recommend unbuffered logging. Keep the logs on a separate disk farm from the data with
separate controllers-and since we have agreed to talk about dedicated servers only
(okay, I coerced the discussion) the issue of impact on other processes is moot.
- Checkpoint frequency.
- More frequent checkpoints insure that data is flushed to disk and that rollback
information is complete and consistent.
- Less frequent checkpoints use less resources and improve concurrency, buffer re-use
(i.e. cache percentages), reduce application pauses, lower impact on server/application
performance.
This is a tough one. And there are some non-obvious issues. There is a defect in
versions prior to 7.23 which will cause the engine to crash if the Physical Log overflows
(i.e. fills before the checkpoint begun when the log reached 75 percent full completes).
Normally, this condition will simply cause the engine to pause all activity until the
checkpoint completes. If the checkpoint interval is too short, and the checkpoint duration
is too long, you will not get much useful work done as any updates must pause during the
entire checkpoint. There is a design flaw, discussed later in detail, that will block some
SELECTs during the first half of the checkpoint under specific conditions. The answer? You
have to consider your own needs and transaction mix. We will return to the issue of
checkpoints when we discuss transaction mix and optimization goals.
- Disk configuration is the issue of what kind of disk drives and disk farm should we use:
Singleton disks versus various RAID configurations. We will return to this later. For now,
note that this is a speed and safety issue that may have fewer trade-offs than one might
imagine. Here we may have clear answers.
- Transaction Mix: Determination and corresponding goals.
I propose that every
relational database installation falls into one of the following categories: Decision
Support, Data Warehouse, On-line Transaction Processing, or Simple Query Server. (The last
is not generally talked about in the literature and I had to invent it to describe the
systems that I manage.) Let's spend a few words describing each, so you can classify your
own installation and describe the tuning goals required for each. I will deal with
decision support systems and Data Warehouse systems together, because their issues and
goals tend to coincide.
- DSS/Data Warehouse. These are systems identifiable by very large datasets and very
coomplex queries. It is typical to join five or more tables, for at least one of these
tables to be a Gerund, containing no intrinsic data of interest, and for the typical
user's question to require two or three queries to answer. Updates are rare and usually
performed in batch, often involving replacing entire tables with completely new data.
Modifications to the database schema are not unusual. The presence of indexes to support
all common queries is rare, because the set of common queries is so large and changing.
The possible goals for tuning a DSS server are three:
- Maximize total query throughput.
To maximize the throughput of the query engine limit PDQPRIORITY to less than 25 percent
and lean toward multiprocessor systems where the number of total processors is more
important than CPU speed.
- Minimize processing time for each query.
To minimize execution time for each query permit PDQPRIORITY to exceed 50 percent for
critical queries and favor multiprocessor systems with FAST CPUs.
- Balanced priority.
You can create a balanced environment where all queries receive critical resources by
setting PDQPRIORITY between 25 percent and 50 percent. This will prevent one critical job
from locking a second (or third) critical job out of resources. When choosing a system,
balance the number of CPUs with CPU speed and get the most total CYCLES/sec you can for
the dollars you have to spend. In all cases, large tables should be fragmented with
performance and parallelism in mind.
- On-Line Transaction Processing (OLTP). These systems are medium-to-large datasets, which
are updated throughout the processing day. Updates involve modification of only a few
columns of only a few rows in several tables. Most queries are simple, involving one to
three tables where useful data is fetched from all tables involved. Views are commonly
used to institutionalize these queries. Typically the ratio of queries to updates
approaches one. A significant percentage of queries involve a large percentage of the data
in the tables that they access. Again, there are three possible goals:
- Maximize update activity throughput.
Use buffered logging, maximize checkpoint interval (CKPTINTVL), PDQPRIORITY=0, minimize
checkpoint duration, increase physical log size and maximize write cache percent.
(Veterans are now shaking their heads because several of these goals are contradictory.
Stay tuned-I will tell you how to balance these disparate goals, and where the
trade-offs are.)
- Maximize query activity throughput.
Maximize BUFFERS, PDQPRIORITY=0 or 1, Maximize read cache percent.
- Maximize transaction safety.
Minimize CKPTINTVL, unbuffered logging, redundant disks and I/O paths, reduce physical log
size. In this case you may want to fragment large tables using some logical expression
scheme to ease maintenance and improve update speed.
- Simple Query Server. These servers share qualities of the other two. The datasets may be
from very small (a few MB) to extremely large (a few TB). There may be on-line
transactions, but the percent of total update operations is small (<25 percent). There
may frequently be batch updates including table reloads, but the database schema is fairly
stable. Queries typically involve one or two tables supported by indexes and access only
small subsets of the available data. Here we need to keep PDQPRIORITY=0, maximize BUFFERS,
fragment tables for performance, and eliminate the need for parallel query execution.
- Processor Limits. How many CPUs, how fast are each?
- The issue of number versus speed of CPUs depends on your transaction mix.
- More CPUs mean better support for more users, smaller databases and queries requiring
increased parallelization.
- Faster CPUs mean better support of complex queries and larger databases-with the
caveat that many DSS/Data Warehousing queries can also benefit from greater
parallelization and more CPUs.
- For large fragmented tables you will need enough CPUs to be able to parallelize
searching every fragment with a different CPU. So either you limit your fragments to the
number of CPUs or buy enough CPUs to handle the fragments you desire for maximum
parallelism.
- Cache and memory arrangement. Different systems provide different cache and memory
architectures of which you must be aware.
More cache is better and multiple level
caches can help process migration between processors in SMP systems if you know how to
take advantage of the system's cache design. For example, Data General's Aviion 10000
series uses a multilevel cache design where four CPUs share an L2 cache. Processes
migrating from one of these four CPUs to another suffer almost no penalty. Also, each
group of eight CPUs owns a portion of the system's main memory. While migrating out of the
four CPU group causes a cache flush, migrating out of the eight CPU group will cause the
process to need to access its main memory image from the original memory board at reduced
backplane access speed-causing an even greater penalty. By taking advantage of the
memory architecture of our Aviion 10000's we have doubled server throughput!
- Your system's backplane speed and the number of expansion components supported on the
CPU Bus versus on the backplane can be critical to constructing an optimal server system.
For example, on many Intel-based SMP systems, and all NUMA-machine architecture systems,
the backplane runs several times slower that the CPU Bus. In contrast, Sun E series
servers run the backplane at the same speed as the CPU bus (except for the newest fastest
SPARC CPUs, for which Sun will have to develop new backplane technology). The backplane
can limit the number of I/O channels you can add to your system or the additional
throughput gains of adding yet another pair of controllers.
- Current/Maximum Memory Available. How much memory, how much is unused, how much is
allowed to be shared? How much can be made resident? How much swap is available? What
percentage of memory is dedicated or reserved for OS system buffer cache? What is the
system maximum memory? How much does it cost? Keep in mind that Informix can use as much
memory as you can give it. Witness these server limits:
Buffer Cache |
- up to 768,000 pages (3GB with 4K page size systems) |
DSS Memory |
- up to 1GB |
Locks |
- up to 8million (~ 32MB-see the manual for calculations) |
Logical Log Buffer |
- 3 are created =LOGSIZE up to 2GB each, 6GB total |
Physical Log Buffer |
- 2 are created =PHYSFILE up to 2GB each, 4GB total |
Data Dictionary Cache |
- no documented limit |
Stored Procedure Cache |
- no documented limit |
Data Distribution Cache |
- no documented limit |
Given these limits, how much RAM should I expect to allocate to shared memory? That
depends on your transaction mix, and on how much RAM you can cram into your box:
Small Simple Query Server |
-> 200MB 400MB |
Large Simple Query Server |
-> 600MB 1GB |
Large OLTP Server |
-> 200MB 500MB |
DSS/Data Warehouse Server |
-> 500MB 9GB |
- Disk Limits. How many drives are available for database use? How many controllers? How
much can be added easily? How much can be added with reconfiguration? Is a logical volume
manager available? What RAID configuration options are available/supported?
- Current/maximum number of drives. How many drives does your system have available? How
many can it support? The points to remember are:
- More drives are better.
- More smaller drives are better than fewer larger drives.
- Newer drives tend to be faster, but they also tend to be larger!
- Speed of drives. Obviously we want faster, but in what way? PC pundits have always
touted the benefits of faster seek speeds over spindle speed and transfer rate. Is this
true for us also?
- Faster spindle speeds are more important than seek speed if you have a good buffer
cache. Informix's is the best in the industry and with version 7.30, it just got even
better.
- Mirroring, striping, and other RAID configurations and on-drive and on-controller
caching favor spindle speed and transfer rate over seek speed.
- For smaller servers with less memory available for BUFFERS, or for which large stripe
sets are not practical, increasing seek speed can help.
The good news is that the newer drives mostly have spindle speeds in excess of 7200RPM
(the fastest are at 10000RPM-plus) and have seek times under 11ms so this is less of an
issue than it used to be. However, if you have systems with 3600RPM drives with seek times
> 11ms, it is time to start upgrading. This may be the single best investment toward
improving performance that you can make.
- Speed of controllers/channels.
- More channels-spread your drives across multiple controllers. Yes, an Ultra Wide
controller can support up to 16 drives, but with newer, faster drives with on-drive caches
and intelligent on-drive read ahead, it is likely that more than eight drives are capable
of swamping the controller. You could increase throughput by spreading those 16 drives
across two or even four controllers!
- Wider channels-Ultra Wide SCSI, SCSI Fibre, SSA all support transfer rates in excess
of 40MB/second with promises of 80MB or even 120MB/sec on the way! Compare that to
original SCSII at 5MB/sec or IDE at 3MB/sec!
- Caching controllers can make a big difference. The controllers on our system have cache,
but at the beginning the caches kept becoming disabled. When that happened, we would
notice everything seemed to just drag along. Write-back caches are not important, except
for smaller servers with fewer Informix buffers. If you do get write-back caching, make
sure there is a battery back-up and cache flush on reboot after a crash.
- Look for more intelligent controllers, which are capable of command reordering and
elevator writes (a.k.a. scatter/gather). Although Informix tries to do this for you, the
disk array schemes I will discuss below reduce the effectiveness of the engine's efforts.
- High speed I/O subsystems. These include various memory disk schemes, including systems
that are basically very sophisticated high-speed, battery-backed, memory caches in front
of a traditional disk array. These can make tables with a high access frequency fly! They
are expensive, but definitely keep them in mind. Even if you are planning to upgrade to
Informix Dynamic Server 7.3x, you should consider these subsystems as an alternative to
resident tables, as resident tables can cause lower priority tables to thrash the buffer
cache.
- Task Limits. What other tasks are running on the server? When? We agreed that for
simplicity, we would not discuss shared systems-but practically, this is an issue for
most of us and you will need to consider it. Also, what are your requirements for the
following? They will each impact your specific tuning efforts:
- Task completion window.
What are your application response time requirements? Are they measured in days? Hours?
Minutes? Seconds? What must be run when? When must the server be available? When
can you run on your back-up system only? Primary only? When are both needed? Can you have
a back-up system? Hot? Cold? Can you use your back-up for load balancing? Peak load
relief?
- Scheduling downtime window.
When can scheduled maintenance be performed? Table reorganizations? Indexes added/dropped?
ON-check run?
- Statistics update window.
When can statistics be updated without invalidating prepared cursors? Can your
applications recover from a -710 error?
- Damage correction window.
How long can the engine be down to correct data related problems? Performance related
problems? When?
- Archive/Recovery window.
- Duration of Archives using ON-Tape/ON-Archive/ON-Bar? Can you afford a 23-hour,
level-zero archive using ON-Tape with one tape drive on your 200GB server? Can you
allocate additional tape units and take advantage of ON-Bar? Does your system support
multiple tape drives? Tape libraries? Tape arrays (RAID 5 tape sets comprised of multiple
drives seen as a single device)?
- Can you take advantage of Informix Dynamic Server 7.30 features that make it easier to
create multiple mirrors and use rotating mirror sets as back-ups? Cost?
- What is acceptable downtime for a major restore?
- What is acceptable data loss due to catastrophe? Can you tolerate:
- Loss since last archive? No log back-up required
- Loss since last log back-up? Can use event alarm program/script to back up logs to
disk-letting system archive-secure these to tape daily or even hourly.
- Loss to last completed transaction? Need continuous log back-up to tape or other
removable device.
- No tolerance?
- Can you re-run transactions?
- Sync with a sibling server?
- Use Informix replication? Keep log files small.
- OS/CPU limits and their effects on tuning.
- AIX kernel parameter limits number of shared memory segments per process. Configurable.
- HP/UX PA RISC has only four shared memory segment registers. If you have to access more
than four shared memory segments concurrently, your process will become very slow.
Informix needs access to at least: 1-Resident segment, 1-Virtual segment, and for each
process 1-Message segment. If your engine has to add more than one additional virtual
segment, the CPU VPs will become bogged down. If your application is already using its own
shared memory and the engine allocates even one additional virtual segment, your
application will bog down switching between the engine's shared memory and its own.
- Solaris only supports four partitions per drive-meaning, in part, that drives larger
than 8GB are only partially usable by Informix. There are third-party volume managers that
can get around this problem. Sun has recently begun repackaging the Veritas VM themselves
to alleviate this problem.
- Logical disk volume management lacking. You may want to implement one of the disk
schemes discussed below, but your system does not support it. There are third-party volume
managers, as well as RAID software and hardware vendors that may be able to help you.
- System memory disk cache allocation. On certain UNIX boxes, the percentage of memory
that is allocated to OS file system buffers may be hard-wired and not configurable. Even
if not some UNIX flavors use this as a guideline-freeing unused cache pages for system
use if the system is swapping. Others free cache only if it is not being used. Still
others treat cache pages as inviolate. A dedicated server system needs a very small UNIX
buffer cache.
- Informix-based processor affinity versus OS-based affinity control.
- AFF_SPROC, AFF_NPROCS - Affects only the CPU virtual processes (VPs) not AIO VPs, Net
VPs, and others. The Informix engine does not do any load balancing! The first VP of each
class assumes all of the load unless it is already occupied. If you do share the machine
with other applications, a CPU VP can become blocked out of its processor by another
process, since each VP has affinity to a single CPU.
- OS level affinity control can give you both broader and finer-grained control. You will
be able to assign affinity all VP classes. You may be able, depending on what your OS
supports, to assign a VP affinity to a group of two or more CPUs so that they can float
over a limited set of CPUs and avoid becoming blocked. Depending on your system memory
architecture there may be little or no penalty for doing this. The OS's SMP task manager
can often load balance the CPUs for you with little cost if limited floating is allowed.
Which
way? It depends. Here is a case study: Remember the Data General 10000 memory architecture
I presented earlier? It means that a process feels no penalty from migrating from one CPU
to another, as long as it stays on a four-CPU card sharing the same L2 cache, and little
cost as long as it stays on the eight-CPU board where it's memory image resides.
Originally, on our platform the Informix Dynamic Server 7.13 port did not support
either affinity or KAIO. We got a program and sample script from Data General advanced
support to permit us to create affinity for our VPs, as I described, to four processor
groups-further allowing floating within the eight-CPU hardware group if all four CPUs
with affinity are busy.
The scripts I developed from this created affinity for the CPU VPs to groups round
robin by increasing group number from one (group 0 is reserved for OS processes and our
own system management processes), and the AIO VPs round robin by decreasing group number
from seven (we have 32 processors). Since CPU VP number one and AIO VP number two do the
most work, they are guaranteed not to interfere with each other as they run on independent
processor groups.
When Informix Dynamic Server 7.21 was ported to our platform, both affinity and KAIO
were enabled. We jumped on the chance to try them, since the Data General porting group
had reported a 25 percent increase in throughput using affinity and KAIO versus affinity,
and AIO VP and affinity seemed to improve things even for AIO VP versus non-affined VPs.
Unfortunately, we found our own throughput down 50 percent and went back to our own
affinity and AIO VPs! Remember: YMMV (your mileage may vary). On other platforms the
penalty for floating may be too great or non-existent.
- NOAGE. On certain platforms, notably Solaris and HP/UX, which aggressively age
long-running processes, this can make a big difference. Again-be careful if you are
sharing the server with other applications. A NOAGEd engine can preempt other
applications, whose priority will continue to drop through aging.
- RESIDENT. This is a big gainer with relatively little cost if you have enough physical
memory.
- MULTIPLECPU. If you got them, use them. There are reports that on some systems it only
pays to use one CPU for Informix Dynamic Server if you only have two. Certainly if you
have three or more, configure to use all but one. If you are sharing the server, reduce
the number of CPUs that you allocate to Informix Dynamic Server accordingly. It is one way
to play nice.
- NETTYPE. If you are using CPU VPs to listen to shared memory connections, use all that
you have. In effect, if you have five CPU VPs, then your NETTYPE parameter for shared
memory should configure five listeners, one in each CPU VP. Otherwise, the tendency for
the first few CPU VPs do most of the work, will only become exacerbated.
Also, since
these first few VPs which do have the listener threads will be busy doing queries, new
client connections will not be served promptly. Do not configure shared memory NETTYPES to
use NET VPs if possible. NET VPs use polling for shared memory connections and that will
spin your CPUs, tying them up. Similarly avoid TCP NETTYPES using CPU VPs for similar
reasons. It increases the number of system calls per second and ties up OS services,
because the CPU VPs cannot sleep while waiting for new connections and requests.
(I do not understand the secrecy surrounding the tuneability of these features. That
other database vendor makes a big deal about the tuneability of just these three caches in
their engine as a selling point against Informix Dynamic Server.)
- Data dictionary cache. This caches your schema information at the table level, so that
the system catalog tables do not have to be queried constantly. (This is the reason there
is no need for concern over the extent sizing of your system catalog tables.) The default
data dictionary cache contains 310 entries. If you have more than about 250 active tables
(you need a few extras for catalog tables and to allow for unbalanced hash assignments)
this cache will thrash. (In versions prior to 7.23 there was a defect that would cause a
SEGV in an MT_NOTIFY thread if this condition persists too long causing the engine to
crash.) You can adjust the size of the table by including the following two ONCONFIG file
parameters:
DD_HASHSIZE |
- # of hash buckets, must be a prime number (default: 31) |
DD_HASHMAX |
- # of entries per bucket, no restrictions (default: 10) |
Use onstat -g dic to determine if your dictionary cache is nearing capacity. There
should not be any hash buckets that are full (if your system has the defect discussed
above you may actually see buckets that have more items than DD_HASHMAX).
- Stored procedure cache. The cache of stored procedure information is also configurable.
Unfortunately, I have not been able to acquire either the default values for these
parameters, nor any restrictions on their values or ranges. The controlling variables are:
PC_HASHPOOL and PC_HASHSIZE.
If you make extensive use of large numbers of stored
procedures, and are seeing performance degradation from your procedures as load increases,
you can make a case to Informix Support that you may need to increase the stored procedure
cache. You may track activity on the stored procedure data dictionary tables in your
databases with onstat -t or onstat -g ppf. There should be little or no activity once your
system reaches a steady state, or for more ad-hoc systems, after a day or two of normal
activity. You can monitor the stored procedure cache by running: onstat -g prc.
- Data distribution cache. This cache maintains the data distributions generated by UPDATE
STATISTICS MEDIUM/HIGH in memory for active tables. The configuration parameters are
DS_HASHPOOL and DS_HASHSIZE. Again, I do not know the ranges of these parameters, nor the
rules covering their use. You can monitor the data distribution cache by running onstat -g
dsc. I would say that if you find a need to increase the data dictionary, this table
probably needs to be expanded as well.
The following parameters affect I/O performance in various ways. I have grouped them
according to which ones are most effected by modifying the others:
- BUFFERS, LRUS, LRU_MAX_DIRTY, LRU_MIN_DIRTY, CLEANERS
- Buffers. The more buffers the better. I dedicate as much as 40 percent of total shared
memory for buffers. The trade-off is that more buffers can mean longer checkpoints.
- LRUS. More LRUS better support large numbers of users by reducing bufwaits as users
compete for buffers to write to. Due to a suspected defect in the LRU rehash code you
should avoid setting LRUS to any of the following values: 16, 64, or 96. There may be
other nodal values to the error function also (32, 127, and 128 are known to be okay) the
symptom is very high bufwaits (a large percentage of dskreads and bufwrites).
- LRU_MAX_DIRTY and LRU_MIN_DIRTY. Treat these as one pair of parameters. These are the
main way to control checkpoint duration and to trade database throughput for system-wide
throughput. Larger values leave more dirty buffers for checkpoint time, increasing the
duration of the checkpoint but allowing the engine to optimize dirty buffer flushing to
reduce the impact on the system. Smaller values cause small flushes of one or a few LRUS
at a time between checkpoints, which minimizes the duration of the checkpoints. However,
this can adversely affect other applications running on the system.
- CLEANERS. Anyone who follows the postings on the newsgroup comp.databases.informix will
know that I have always recommended that the value of CLEANERS be equal to the number of
LRUS for 7.xx engines (or ==max(#drives,LRUS) for 5.xx engines). However, my most recent
investigations indicate that if you have, on average, more than 500 dirty buffers at
checkpoint time that need flushing, you may be better off with only one- or two-page
cleaner threads if you have response time requirements measured in seconds. I will discuss
the problem further in a later section. For now keep this in mind.
- KAIOOFF, NUMAIOVPs
- The environment variable KAIOOFF in the server's environment (if it exists and has been
exported), will cause Kernel Asynchronous I/O to be disabled in the engine if your
platform supports KAIO. The reasons for disabling KAIO are system/platform dependent and
were presented above in the discussion of processor affinity. In addition, if you use only
cooked devices, KAIO is non-functional and can be disabled with little or no cost.
- NUMAIOVPs. If you have KAIO enabled and have only raw disks, then only a few AIO VPs are
needed. If you do not have KAIO enabled, the number of AIO VPs needed varies depending on
version (the AIO VPs have been steadily improved over the versions). In general, NUMAIOVPs
should be equal to, or greater than, the value of CLEANERS. In addition, for versions 7.14
and lower, use 2.5 * #chunks, for 7.2x versions use 1.5 * #chunks, for 7.3x use #chunks +
1*(# especially active chunks) as a lower bound. For cooked files, you can follow the
recommendations for non-KAIO installations, taking only cooked files into consideration.
- LOGBUFF, PHYSBUFF, PHYSFILE, PHYSDB, CKPTINTVL
- LOGBUFF and PHYSBUFF. Trade safety for speed. With bigger buffers, more data is at risk
in the event of a hard crash. These are limited by the size of the smallest Logical Log
file and of the Physical Log file, respectively.
- PHYSFILE. This should be large enough to permit the desired checkpoint interval during
normal and normal-peak loads. Using more than 75 percent of the physical log will force a
premature checkpoint. (There is a defect in versions prior to 7.23 that may cause the
engine to crash if outstanding transactions fill the remaining 25 percent of the physical
log while the checkpoint is waiting for these sessions to complete critical processing.)
- PHYSDB. Do not place the Physical Log in the rootdb space. After initialization,
create a dbspace just for the physical, and/or the logical logs. Shutdown, and modify the
ONCONFIG file so that when the engine comes back up it will place the physical log in the
new dbspace. On a high transaction server, this should be a separate dbspace from the
logical logs also. It is best to keep rootdb, plogdb, and llogdb separated onto different
drives with different controllers from each other and from the data drives.
- CKPTINTVL. Shorter intervals are safer and make for shorter faster recovery in the event
of a hard crash. Longer checkpoint intervals will minimize the effects of checkpoints on
server throughput. If the interval is too short, your system can spend more time waiting
for checkpoints than processing data (been there, done that).
- LOGSIZE and LOGFILES. Fewer larger log files are more convenient to manage. More smaller
log files are safer, as they will be flushed to tape by continuous log back-up more
frequently. This protects against a catastrophic crash.
- RA_PAGES, RA_THRESHOLD. Faster systems, especially systems with faster disk farms, need
less read ahead. These values, and their utility to us, is heavily dependent on I/O speed.
If you use discrete drives look to set RA_PAGES around 64 with a large threshold of
between 16 and 32. If you are using a large RAID10 configurations, with a stripe of many
pairs of mirrored 10000RPM drives, using a caching controller capable of its own read
ahead, you can run a very busy system with RA_PAGES=16 and RA_THRESHOLD=4. Onstat -p
reports on read ahead usage stats. The sum of idx-RA, idxda-RA, and da-RA should be very
close to the value of RA-pgsused. Increase these parameters until the ratio drops or the
read-cache% begins to drop. Excessive read-ahead can cause needed data buffers to be
flushed, causing the drop in read-cache%. This can happen even while the RA ratio is
improving, so be careful to balance these. If this happens, either back off on the RA
parameters until the read-cache% is again acceptable, or increase the number of buffers
until the read-cache% improves.
Memory parameters and their results are monitored using onstat -g seg and onstat -g
mgm. They affect allocation of shared memory from the OS, and partitioning that memory
among queries.
- SHMVIRTSIZE. This should encompass, at least, normal load memory requirements and is
best configured to include normal peak load needs. This is especially important on HP PA
RISC and IBM AIX hosts, as already discussed.
- SHMADD. This should be at least 10 percent of SHMVIRTSIZE. Here is another trade off:
smaller additional segments are faster to allocate and attach to, but fewer segments may
be more efficient to use (and on AIX and HPUX are almost a requirement). Processing will
halt while all of the oninit processes attach to the new segment, and smaller segments
will minimize the effect. Some operating systems do this faster than others, but if you
have many VPs it may still take more time than response-time requirements will allow for.
- SHMTOTAL. Unless you have a very small memory system, just set this to zero (0) and let
memory grow.
I am going to defer to the excellent discussion of the optimization parameters and
optimizer hints in the 7.3 Performance Guide, which is available on line at http://www.informix.com/answers. Suffice it to
say that 7.3x adds much power and control over the functioning of the optimizer.
We are all aware that updates must wait while a checkpoint completes, and that SELECTs
are not affected. This is not quite true! Read on.
If you have more than 500 dirty buffers at checkpoint time, the checkpoint thread will
not relinquish control of CPU VP#1 until all buffers have been handed off to CLEANER
threads-as long as there are page cleaner threads available that are not busy. Since
user threads currently executing in CPU VP#1 are not migrated to other VPs prior to
beginning the checkpoint, or you may only have one CPU VP, this means that even FETCHs and
OPENs will hang for approximately the first half of the checkpoint duration (longer if you
set LRU_MAXDIRTY and LRU_MINDIRTY high). If you have response-time requirements measured
in seconds, this can be deadly.
If no page cleaner thread is ready, then the checkpoint thread will relinquish the VP
and place itself on a wait queue until a cleaner thread frees up. With less than 500 dirty
buffers, a single cleaner is launched to handle all dirty buffers and the checkpoint
thread relinquishes the CPU VP to other threads after that launch. This sounds better, but
the result is the same.
Based on this, I am recommending that sites with large numbers of buffers and short
response time requirements configure with one or only a few page cleaners so that the
checkpoint will have to enter a wait state while the previous page cleaners complete their
last assignments. This should cause the checkpoint thread to release the VP and permit
queries to continue. This issue is being examined by Informix R&D and, by the time
this goes to press, I should have had an opportunity to test the latest theoretical
solution. Search the C.D.I. archives at http://www.iiug.org/
for my results.
These are Holy War issues-here are the options and issues, with a summary chart to
follow.
- Raw-versus-cooked files. True-for many years, Informix has been running cooked files
in O_SYNC mode, so that the safety issue is long moot. However, I have performed the
benchmarks. Raw disk performs 15->25 percent faster than a cooked file in a filesystem
on the same device! If you need performance, you must use raw disks.
- Singleton drives. Simple and cheap, and no recovery value. Nothing more to say.
- RAID 0 - mirrored singleton drives. This gives you maximum safety and recovery through
100 percent redundancy of data. Read performance can be up to twice as fast as singleton
drives under load, while write performance is at least as good as singleton drives.
Downside? You may need more speed. Recovery cost: close to 100 percent of the surviving
drive's resources will be dedicated to rebuilding the replacement drive. Some RAID 0
implementations permit mirroring three or more drives for maximum redundancy, which will
reduce the impact of recovery but multiply the acquisition cost.
- Informix Mirror-versus-OS Mirror-versus-Hardware/Firmware mirror. Hardware mirrors are
the fastest in benchmarks that Informix conducted several years ago. OS mirrors were a
distant second, followed closely by Informix mirroring. These were tests under
INFORMIX-OnLine 5.0x. I suspect that OnLine Dynamic Server 7.3 mirroring might just edge
out OS mirror now. Informix mirror is the easiest to maintain from a DBA standpoint, while
your system administrators will make the same argument for either hardware or OS mirror,
depending on what tools are available to them.
- RAID 1. Many drives striped together so that sequential logical disk blocks are read
from or written to the next drive in the set. This will give you speed by essentially
doing large amounts of read ahead concurrently. All RA blocks will likely be available
after one logical I/O. Downside? Complex to manage and no redundancy, no hope of recovery.
- RAID 3 - N data drives + one parity drive. Good redundancy. Like RAID 1, there is the
automated read ahead advantage. Parity is read and checked on every read, hardware support
in newer RAID subsystems does this with little or no performance cost. Downside? Complex.
Recovery cost: close to 100 percent of the surviving drives' bandwidth is occupied during
recovery. All drives must be read to rebuild each block on the replacement drive. This can
seriously impact production runs during a recovery.
- RAID 5 - N+1 drives. Fair redundancy. The parity block for a stripe is alternated
between the drives to minimize the impact of sequential writes. Normally blocks from the
stripe that are not needed to satisfy a request are not read, including the parity block.
This is to improve random reads, but will seriously impair sequential read performance and
leaves the array open to errors in the recording media of a single drive. Parity is not
checked at read time, and the written data is not verified. Parity is only used to rebuild
a completely failed drive. Recovery cost: same as for RAID 3 in addition to the items
already mentioned.
- RAID 10 (or 1+0). A RAID 1 stripe set made up from N pairs of RAID 0 mirrored drives.
This is fully redundant. Safe from any drive or media failure, you can recover with as
many as half of the drives lost. In addition, if each side of the mirror are kept on
separate controllers and power supplies, you are even protected against controller and
power supply failure. Write performance is at least as good as RAID 1, while random read
performance under heavy load shows much better scalability than any other configuration
(almost double). Recovery cost: replacement of a single drive only affects its mirror
which is only 1/Nth of the total array, so production queries are affected at most (100 *
1/N)% of the time. The down side? The safety costs you 100 percent over the cost of
singleton drives or RAID 1 or up to 66 percent higher than an equivalent RAID 3 or RAID 5
array.
- RAM/FlashRAM Pseudo-Disks. Very fast but volatile. Units with disk secondary storage
amount to VERY SMART cache systems. Very good for highly active tables like a current
transactions table. If you have a sufficiently large on-controller cache the advantages of
these pseudo-disks may not be significant. Only your own benchmarks will provide the
answer.
Read |
Write |
Reliability/
Availability |
Recovery |
Ramdisk |
Ramdisk |
RAID 10 |
RAID 10 |
RAID 10 |
RAID 10 |
RAID 10 |
RAID 10 |
RAID 1
(if big I/O) |
RAID 0
(if big I/O) |
RAID 5 |
RAID 5 |
RAID 0
(if big I/O) |
RAID 1
(if big I/O) |
RAID 3 |
RAID 3 |
RAID 3 |
RAID 3 |
Singleton |
Singleton |
Singleton |
Singleton |
RAID 1 |
RAID 1 |
Singleton |
RAID 5 |
Ramdisk |
Ramdisk |
Table 1: Comparison of Storage Configuration Options.
- Updating statistics for maximum effect.
- The best strategy I have found for generally updating statistics, for maximum benefit in
minimum time, is to follow the recommendations in the Informix Dynamic Server 7.2 release
notes I received. Unfortunately, I have also seen at least two other versions of these
notes, one with the 7.1 recommendations unchanged and one with no update statistics
recommendations at all. The full recommendation is now contained in the 7.3 Performance
Guide, and can also be found by searching the C.D.I. archives at http://www.iiug.org/, where I posted them several months
ago. You can also download the package I submitted to the IIUG Software Repository named
utils2_ak. which contains the latest version of a utility I wrote, dostats.ec, that
implements these recommendations. If you have INFORMIX-ESQL/C or INFORMIX-4GL 7.10, you
will be able to compile this.
- When to update stats? The flip answer is when queries stop working-and in all
seriousness, this is the main criterion most of us use. To be more proactive, you can do
this weekly/monthly, etc., but that can get to be expensive and may interfere with
production. Statistics need to be recreated when the nature of the data has changed. In
effect, when the ratio of one key value to another is no longer as it was described in the
current data distributions, it is time to recreate the distributions. You can peruse the
current distributions with dbschema -d database -hd table.
- Tables fragmented with large numbers of extents. Use oncheck -pT or use oncheck -pe to
check the number of extents a table has and how they are laid out on disk.
- When and how to compress a table to reduce the number of extents:
- Generally, a table with more than 33 extents is probably slower than it could be. (33
because the engine has already quadrupled the size of each extent at 32 extents and the
table is still adding extents.)
- You can compress the table several ways:
- Unload the table, drop and recreate with a larger extent/next size and reload the data.
- Create a clustered index or re-cluster an existing one with ALTER INDEX TO NOT CLUSTER;
ALTER INDEX TO CLUSTER;
- ALTER FRAGMENT ON TABLE INIT IN . This works even for a non-fragmented table and is the
fastest way to reorganize a table.
It is a good idea to adjust the value of NEXT SIZE for the table before reorganizing to
minimize future fragmentation.
- Avoiding large numbers of interleaved extents.
- Place the table in a private dbspace dedicated to its use. Tables built this way can
have at most one extent per chunk in the dbspace and cannot interfere with other tables'
I/O needs. Obviously, you can only do this for a limited number of high growth tables.
- Place high growth tables into dbspaces along with low growth tables or tables with
different growth patterns (i.e., a table that is appended to monthly shares a dbspace with
ones that are updated daily so it does not interfere).
- Table fragmentation schemes. Again, the 7.3 performance guide has a very thorough
discussion of this topic, so I will just touch the issues and refer you there.
- A table needs to be fragmented when it:
- Needs more than 16 million pages. (A tablespace is limited to 16MM pages.)
- Needs parallel search performance.
- The optimizer can perform fragment elimination on critical queries.
- An index needs to be detached when:
- Its table needs almost 16MM pages (detaching the index can permit the table to grow a
bit more before needing to be fragmented).
- The index key does not match the fragmentation expression. You may even gain by
fragmenting the index according to its own key.
- The table is fragmented ROUND ROBIN to speed searches.
- There is a need to spread I/O load during complex DSS type queries.
- Disk Location. Informix continues, even in the 7.3 Performance Guide, to recommend that
you locate the busiest tables nearest to the center of each disk. This is a good
recommendation if you use singleton drives. For more complex disk farms, this is both
impractical and of little benefit. Why? It is very difficult to place tables and calculate
required offsets in a large RAID set. The disk and controller caches and read ahead
capabilities, as well as the gain from multiple spindles in RAID 0 and RAID 10 sets,
reduces the impact of head movement. Also, Informix's own cache reduces the random I/O
nature of the database's I/O patterns.
- Indexing strategies. Things to keep in mind when designing indexes.
- Versions before 7.24 contain a defect (partially fixed in 7.24 and completed in 7.3)
that causes the optimizer to only use the first two columns in a multi-column index. The
optimizer will always filter the remaining keys from the data pages unless a KEY-ONLY
search is selected. This is not reported in the sqexplain.out file.
- OPT_GOAL and DIRECTIVES make it more useful to create additional indexes in 7.30 and
later. See the 7.3 Performance Guide for discussions of these features in depth.
The advantages of setting PDQPRIORITY may seem obvious, but there are non-obvious
issues and repercussions to take into consideration.
- Effect on searching fragmented tables. Allows searching multiple fragments in parallel
if sufficient CPU resources are available and your I/O subsystems can provide sufficient
throughput.
Note: versions prior to 7.24 cannot merge ordered results from table fragments (known
as a Merge of Sorted Sinks). As a result, the engine always sorts results from fragmented
tables when the query contains an ORDER BY clause-even if an index matching the ORDER BY
is selected by the optimizer. This is true even when only a single fragment is searched,
except when the index is detached and non-fragmented. In 7.24 and later, code to merge
ordered results from multiple fragments was added back into the code (it was removed in
7.10) but is only effective when the optimizer goal is set to FIRST_ROWS.
- Affect on multiple table joins. Allows both sides of merge-join and nested-loop query
paths to operate in parallel
- Affect on sorting. I know that the manual states that values of PSORT_NPROCS > 10 are
no-op. My own testing shows otherwise-and has been verified by several other users-so
I recommend the following settings to maximize sort speed for index builds, UPDATE
STATISTICS, and complex queries which must complete in minimum time:
PSORT_NPROCS=40
This is the maximum effective value I have been able to verify.
PSORT_MAXALLOC=10240
This undocumented variable controls the size of in-memory sorting. PSORT_DBTEMP=.....
Set this to a list of at least three filesystems. Remember that the smallest, in terms of
free space, of these will control the total amount of available sort-work space.
If you prefer to use DBSPACETEMP for a sort-work area, be sure that you have at least
three dbspaces in that list, and that PSORT_DBTEMP is unset.
- Affect on other users.
- Remember that you cannot allocate more than 100 percent of available resources. This
means that if there are two users with PDQPRIORITY>50 (and MAX_PDQPRIORITY=100), then
one of these users will have to wait for resources to free up. Therefore PDQPRIORITY
effectively limits concurrency. Not a problem if you are tuning a DSS system with response
time requirements measured in hours or even minutes.
- Remember that the granularity of PDQPRIORITY is not one percent, but the granularity of
each of the various resources that it controls. For example, if you have only two CPU VPs,
then a PDQPRIORITY setting of 2 is effectively the same as one of 50 for your system!
Three performance enhancing features are described here. The Cursor Buffer Sizing and
Fetch Array features are fully documented in the "Documentation Notes for
INFORMIX-ESQL/C Programmers Manual" for versions 7.23, and later in the release
directory structure in file ESQLCDOC_7.2. I present the highlights and justification.
- Multiple connections are effective for several purposes:
- To avoid locking large numbers of rows. If you update or delete rows based on the
results of a query, depending on the database's and/or the session's isolation level, you
may end up holding many locks on the rows touched by the query. By making the query in a
separate connection, there will be no locks held.
- To improve performance when accessing data from multiple databases whether they reside
on the same or separate servers. By opening a separate connection to each database you
accomplish two things:
- You reduce impact on the local server by routing the remote connection directly to the
remote server, rather than passing all communication through the local server.
- You improve throughput. Access to a "foreign" database's tables, even if the
database resides in the same server instance, is significantly faster with separate
connections. This is even truer if one of the database servers is an OnLine 5.xx engine
being access by INFORMIX-ESQL 7.xx source. This seems to stem from the local engine
pushing communications buffers and header information related to the remote connection
aside and having to swap it back in while alternating communications between two
databases. The INFORMIX-ESQL multiple connection code seems to be more efficient at doing
this.
- Adjusting the size of the SQL Cursor Buffer. Communications with the engine are
accomplished using a communication buffer, which by default is 4K in size. Increasing this
size will significantly reduce network traffic and overhead and speed large queries. The
range of valid buffer sizes is 4096 -> 32767. This may be set in the user's environment
with the environment variable FET_BUF_SIZE, pushed into the environment using putenv()
prior to opening a cursor, or can be set using the global variable FetBufSize, which is a
signed short integer as the number of bytes to use in the buffer.
This will improve
performance for all queries where more than one complete row can fit into a single
Cursor Buffer. This is supported in all OnLine Dynamic Server/Informix Dynamic Server
versions since 6.0, and has been officially supported since 7.23. FetBufSize is a session
wide value and is used for ALL communications buffers for all FETCH and PUT statements.
BTW you cannot depend on FetBufSize to be set to the value of FET_BUF_SIZE so use getenv()
to check if the user has set a buffer size preference in the environment.
- Array fetch feature. Normally, when an ESQL FETCH is executed, a single row is copied
from the Cursor Buffer to the applications local variables, either to bound host variables
or to the memory pointed to by an sqlda structure or SQL DESCRIPTOR AREA. There is
overhead associated with moving the data one row at a time, but that is the way most
software wants to deal with it.
Since OnLine Dynamic Server 6.0, there has existed a
poorly documented feature known as Array Fetching. The feature has always been a part of
OnLine Dynamic Server/Informix Dynamic Server and was officially sanctioned in 7.21.
Basically, through the global variable FetArrSize you specify the number of rows you are
capable of processing in a single FETCH (this is adjusted so that rowsize * FetArrSize
< FetBufSize), point each (sqlda).sqlvar.sqldata to an array of FetArrSize elements
large enough to hold that column, and FETCH. The sqlca field sqlerrd[2] will contain the
actual number of rows transferred into your arrays during the latest FETCH. You can then
loop through these rows in your own mini cursor code, walking the parallel arrays for each
column fetched. This is much faster than the row-by-row method.
Well that's it. The sum total of all of my knowledge of Informix Dynamic Server-at
least the tangibles that I can articulate easily. I hope that you will have found this
article useful, and have not exclaimed, "What in the world is this guy talking
about!" too frequently. I have refrained from making concrete recommendations, except
when I am confident of my footing. Take those recommendations to heart.
You are using, or are planning to use, the most technically advanced and most complex
database product on the market. The experience is rewarding. I know of very few
DBAs who have used Informix Dynamic Server and do not love it and recommend it. I know
that the same cannot be said for DBAs using competitive products.
Preparing this article and the presentation to the NY Informix Users Group that
preceded it has been a revelation and a rewarding experience. I have learned much that I
thought I already knew, and coalesced my thoughts on things I never knew I knew. Thanks to
everyone on the comp.databases.informix newsgroup-you were all of great help. I have
learned from many, especially from Jonathan Leffler, Mark Stock, and others too numerous
to risk forgetting some. Thanks to Tim Schaeffer and Ed Yee for encouraging me to flesh
out my talk into this article. Thanks to the many people in Menlo Park and Lenexa who have
given me great advice and technical assistance over the years, and have helped me to learn
the secrets of tuning this beast called Informix Dynamic Server.
Art S. Kagel has over 16 years of computer programming, systems analysis, software and
database design, project management, database administration and consulting experience. He
has served as project leader on project teams from two to fifteen programmers and
supervised over 30 technicians on a departmental basis. He has worked with Informix
products since about the time INFORMIX-4GL incorporated SQL.
Mr. Kagel was educated in Computer Science at the Polytechnic Institute of New York,
with post-graduate work at New York University in Information Systems and Decision Support
Systems. In addition, he has earned a degree in Business Management and Marketing from St.
John's University in New York. Mr Kagel is an expert in the areas of decision support
systems, database design, software design, probability and statistics, market analysis,
media planning analysis, financial application systems and time accounting. Mr. Kagel
first earned recognition in the field of decision support systems for having developed the
first working model base management component for a decision support system. Mr. Kagel has
also written for technical journals on various aspects of data structures and analysis,
and has taught classes in relational database topics and programming techniques in several
computer languages, as well as in the use of various software packages and database
systems. He humbly admits to being one the leading experts in the application and
maintenance of Informix database server products.
In 1985 Art Kagel invented and published the Unshuffle Algorithm. The Unshuffle
algorithm is a sort algorithm which is the most efficient available when applied to most
real-world data sets (don't tell him that quicksort is faster for random data-he gets really
testy). Mr. Kagel also developed an algorithm for the merging of sorted sinks, which can
be shown to be the most efficient algorithm possible.
Mr. Kagel is an active participant in a small number of Internet news groups and
mailing lists promoting proper software engineering and database usage, and enjoys sharing
his experience with others. He has always found he learns as much as he teaches