PRS Technologies, Inc. and Peter Schmidt - a resource for Informix Database Administration

Home

Resumes

References

Downloads

Elite

Links

Contacts

Tuning Informix Dynamic Server and Your System for Optimum Performance

by Art S. Kagel KAGEL@bloomberg.net
Tech Notes 1998, Volume 8, Issue 3

Overview

This article contains the following information:

Introduction
Assumptions
Performance Goals
System Resources and Limitations
Virtual Processor Tuning (processor affinity, etc.)
Undocumented Config File Parameters/Features
Disk Tuning Parameters
Memory Parameters
Optimizer Parameters and Hints
The Checkpoint Stall Problem
Disk Configuration
Database and Table Optimization
Parallel Data Query
Using Tool Features To Improve Performance
Summary
About the Author

Introduction

This article is a compilation and expansion of my notes from a talk I gave this past January to the New York Informix Users Group. I have expanded on the information I provided them in places where appropriate and condensed where I could.

The goal of the article is to provide DBAs and System Administrators with guidelines for tuning both Informix Dynamic Server and the underlying operating system and hardware platform for best performance. The information presented is the result of my experiences with UNIX and Informix systems, as well as the experience of some of the denizens of the Informix Users News Group, comp.databases.informix.

The platform-related information is based on a UNIX server, though much is applicable to Windows NT platforms as well. Informix Dynamic Server tuning tips are mostly geared toward Informix Dynamic Server 7.2x and later, though there is some applicability to earlier versions as well. I have tried to make note of this, and to point out where one would treat Informix Dynamic Server 7.30, INFORMIX-OnLine Dynamic Server 7.1x or INFORMIX-OnLine 5.xx differently. For further information about tuning and version-specific features, be sure to refer to the Informix Dynamic Server 7.30 Performance Guide-the most complete discussion of these topics to date.

Assumptions

What kind of system? What to tune for?

Server Configuration: I am assuming throughout that the machine on which Informix Dynamic Server is running is a dedicated server with no other applications, or only database clients executing. Tuning for a mixed environment where other, non-database client applications are also running is a different task which involves balancing resource usage between Informix Dynamic Server and other applications. (If this is not your environment, do not turn to the next article yet! There will be things here that you will find useful.)
Issues not covered:
1. Concurrency issues among large numbers of users. This gets mainly involved in the design of your database schema, locking levels, and application design, which are beyond the scope of this article.
2. Loading to and deleting from large tables/databases. This is a problem with many solutions dealt with elsewhere.
3. Stored procedures and triggers. Using stored procedures and triggers includes issues of trade-offs and involves both high costs and potentially large gains. This is also a separate article.
4. Transaction logging issues. I will deal with this right here and be done with it: tell your bosses: "We use logging, period." There is no more discussion needed on the matter.
5. Long transaction handling: This involves many areas, including application design, optimistic locking versus pessimistic locking, and number and size of log files (which I will cover). The solution to this problem involves one or more of those perennial religious issues we in DP have to deal with from time to time.

Performance Goals

What are we trying to accomplish? There are trade-offs.

Speed versus Safety: Issues affecting this trade-off include:
1. Log buffering. Buffered logging is faster as logs are only written to disk when a log file is complete. Unbuffered logging puts more stress on the system's I/O subsystems, as each log buffer is written to disk as it fills. For a busy, transaction-oriented system, this can mean a nearly constant stream of I/Os to the current logging device. Add to this speed-versus-impact debate the fact that buffered logging places more of your transaction logging information at risk during a system crash, and you have a complex issue. I always recommend unbuffered logging. Keep the logs on a separate disk farm from the data with separate controllers-and since we have agreed to talk about dedicated servers only (okay, I coerced the discussion) the issue of impact on other processes is moot.
2. Checkpoint frequency.
  1. More frequent checkpoints insure that data is flushed to disk and that rollback information is complete and consistent.
  2. Less frequent checkpoints use less resources and improve concurrency, buffer re-use (i.e. cache percentages), reduce application pauses, lower impact on server/application performance.
  This is a tough one. And there are some non-obvious issues. There is a defect in versions prior to 7.23 which will cause the engine to crash if the Physical Log overflows (i.e. fills before the checkpoint begun when the log reached 75 percent full completes).
  
  Normally, this condition will simply cause the engine to pause all activity until the checkpoint completes. If the checkpoint interval is too short, and the checkpoint duration is too long, you will not get much useful work done as any updates must pause during the entire checkpoint. There is a design flaw, discussed later in detail, that will block some SELECTs during the first half of the checkpoint under specific conditions. The answer? You have to consider your own needs and transaction mix. We will return to the issue of checkpoints when we discuss transaction mix and optimization goals.
3. Disk configuration is the issue of what kind of disk drives and disk farm should we use: Singleton disks versus various RAID configurations. We will return to this later. For now, note that this is a speed and safety issue that may have fewer trade-offs than one might imagine. Here we may have clear answers.
Transaction Mix: Determination and corresponding goals.
I propose that every relational database installation falls into one of the following categories: Decision Support, Data Warehouse, On-line Transaction Processing, or Simple Query Server. (The last is not generally talked about in the literature and I had to invent it to describe the systems that I manage.) Let's spend a few words describing each, so you can classify your own installation and describe the tuning goals required for each. I will deal with decision support systems and Data Warehouse systems together, because their issues and goals tend to coincide.
1. DSS/Data Warehouse. These are systems identifiable by very large datasets and very coomplex queries. It is typical to join five or more tables, for at least one of these tables to be a Gerund, containing no intrinsic data of interest, and for the typical user's question to require two or three queries to answer. Updates are rare and usually performed in batch, often involving replacing entire tables with completely new data. Modifications to the database schema are not unusual. The presence of indexes to support all common queries is rare, because the set of common queries is so large and changing. The possible goals for tuning a DSS server are three:
  1. Maximize total query throughput.
    To maximize the throughput of the query engine limit PDQPRIORITY to less than 25 percent and lean toward multiprocessor systems where the number of total processors is more important than CPU speed.
  2. Minimize processing time for each query.
    To minimize execution time for each query permit PDQPRIORITY to exceed 50 percent for critical queries and favor multiprocessor systems with FAST CPUs.
  3. Balanced priority.
    You can create a balanced environment where all queries receive critical resources by setting PDQPRIORITY between 25 percent and 50 percent. This will prevent one critical job from locking a second (or third) critical job out of resources. When choosing a system, balance the number of CPUs with CPU speed and get the most total CYCLES/sec you can for the dollars you have to spend. In all cases, large tables should be fragmented with performance and parallelism in mind.
2. On-Line Transaction Processing (OLTP). These systems are medium-to-large datasets, which are updated throughout the processing day. Updates involve modification of only a few columns of only a few rows in several tables. Most queries are simple, involving one to three tables where useful data is fetched from all tables involved. Views are commonly used to institutionalize these queries. Typically the ratio of queries to updates approaches one. A significant percentage of queries involve a large percentage of the data in the tables that they access. Again, there are three possible goals:
  1. Maximize update activity throughput.
    Use buffered logging, maximize checkpoint interval (CKPTINTVL), PDQPRIORITY=0, minimize checkpoint duration, increase physical log size and maximize write cache percent. (Veterans are now shaking their heads because several of these goals are contradictory. Stay tuned-I will tell you how to balance these disparate goals, and where the trade-offs are.)
  2. Maximize query activity throughput.
    Maximize BUFFERS, PDQPRIORITY=0 or 1, Maximize read cache percent.
  3. Maximize transaction safety.
    Minimize CKPTINTVL, unbuffered logging, redundant disks and I/O paths, reduce physical log size. In this case you may want to fragment large tables using some logical expression scheme to ease maintenance and improve update speed.
3. Simple Query Server. These servers share qualities of the other two. The datasets may be from very small (a few MB) to extremely large (a few TB). There may be on-line transactions, but the percent of total update operations is small (<25 percent). There may frequently be batch updates including table reloads, but the database schema is fairly stable. Queries typically involve one or two tables supported by indexes and access only small subsets of the available data. Here we need to keep PDQPRIORITY=0, maximize BUFFERS, fragment tables for performance, and eliminate the need for parallel query execution.

System Resources and Limitations

Processor Limits. How many CPUs, how fast are each?
1. The issue of number versus speed of CPUs depends on your transaction mix.
  1. More CPUs mean better support for more users, smaller databases and queries requiring increased parallelization.
  2. Faster CPUs mean better support of complex queries and larger databases-with the caveat that many DSS/Data Warehousing queries can also benefit from greater parallelization and more CPUs.
  3. For large fragmented tables you will need enough CPUs to be able to parallelize searching every fragment with a different CPU. So either you limit your fragments to the number of CPUs or buy enough CPUs to handle the fragments you desire for maximum parallelism.
2. Cache and memory arrangement. Different systems provide different cache and memory architectures of which you must be aware.
  More cache is better and multiple level caches can help process migration between processors in SMP systems if you know how to take advantage of the system's cache design. For example, Data General's Aviion 10000 series uses a multilevel cache design where four CPUs share an L2 cache. Processes migrating from one of these four CPUs to another suffer almost no penalty. Also, each group of eight CPUs owns a portion of the system's main memory. While migrating out of the four CPU group causes a cache flush, migrating out of the eight CPU group will cause the process to need to access its main memory image from the original memory board at reduced backplane access speed-causing an even greater penalty. By taking advantage of the memory architecture of our Aviion 10000's we have doubled server throughput!
3. Your system's backplane speed and the number of expansion components supported on the CPU Bus versus on the backplane can be critical to constructing an optimal server system. For example, on many Intel-based SMP systems, and all NUMA-machine architecture systems, the backplane runs several times slower that the CPU Bus. In contrast, Sun E series servers run the backplane at the same speed as the CPU bus (except for the newest fastest SPARC CPUs, for which Sun will have to develop new backplane technology). The backplane can limit the number of I/O channels you can add to your system or the additional throughput gains of adding yet another pair of controllers.

Current/Maximum Memory Available. How much memory, how much is unused, how much is allowed to be shared? How much can be made resident? How much swap is available? What percentage of memory is dedicated or reserved for OS system buffer cache? What is the system maximum memory? How much does it cost? Keep in mind that Informix can use as much memory as you can give it. Witness these server limits:

Buffer Cache	- up to 768,000 pages (3GB with 4K page size systems)
DSS Memory	- up to 1GB
Locks	- up to 8million (~ 32MB-see the manual for calculations)
Logical Log Buffer	- 3 are created =LOGSIZE up to 2GB each, 6GB total
Physical Log Buffer	- 2 are created =PHYSFILE up to 2GB each, 4GB total
Data Dictionary Cache	- no documented limit
Stored Procedure Cache	- no documented limit
Data Distribution Cache	- no documented limit

Given these limits, how much RAM should I expect to allocate to shared memory? That depends on your transaction mix, and on how much RAM you can cram into your box:

Small Simple Query Server	-> 200MB 400MB
Large Simple Query Server	-> 600MB 1GB
Large OLTP Server	-> 200MB 500MB
DSS/Data Warehouse Server	-> 500MB 9GB

Disk Limits. How many drives are available for database use? How many controllers? How much can be added easily? How much can be added with reconfiguration? Is a logical volume manager available? What RAID configuration options are available/supported?
1. Current/maximum number of drives. How many drives does your system have available? How many can it support? The points to remember are:
  1. More drives are better.
  2. More smaller drives are better than fewer larger drives.
  3. Newer drives tend to be faster, but they also tend to be larger!
2. Speed of drives. Obviously we want faster, but in what way? PC pundits have always touted the benefits of faster seek speeds over spindle speed and transfer rate. Is this true for us also?
  1. Faster spindle speeds are more important than seek speed if you have a good buffer cache. Informix's is the best in the industry and with version 7.30, it just got even better.
  2. Mirroring, striping, and other RAID configurations and on-drive and on-controller caching favor spindle speed and transfer rate over seek speed.
  3. For smaller servers with less memory available for BUFFERS, or for which large stripe sets are not practical, increasing seek speed can help.
  The good news is that the newer drives mostly have spindle speeds in excess of 7200RPM (the fastest are at 10000RPM-plus) and have seek times under 11ms so this is less of an issue than it used to be. However, if you have systems with 3600RPM drives with seek times > 11ms, it is time to start upgrading. This may be the single best investment toward improving performance that you can make.
3. Speed of controllers/channels.
  1. More channels-spread your drives across multiple controllers. Yes, an Ultra Wide controller can support up to 16 drives, but with newer, faster drives with on-drive caches and intelligent on-drive read ahead, it is likely that more than eight drives are capable of swamping the controller. You could increase throughput by spreading those 16 drives across two or even four controllers!
  2. Wider channels-Ultra Wide SCSI, SCSI Fibre, SSA all support transfer rates in excess of 40MB/second with promises of 80MB or even 120MB/sec on the way! Compare that to original SCSII at 5MB/sec or IDE at 3MB/sec!
  3. Caching controllers can make a big difference. The controllers on our system have cache, but at the beginning the caches kept becoming disabled. When that happened, we would notice everything seemed to just drag along. Write-back caches are not important, except for smaller servers with fewer Informix buffers. If you do get write-back caching, make sure there is a battery back-up and cache flush on reboot after a crash.
  4. Look for more intelligent controllers, which are capable of command reordering and elevator writes (a.k.a. scatter/gather). Although Informix tries to do this for you, the disk array schemes I will discuss below reduce the effectiveness of the engine's efforts.
4. High speed I/O subsystems. These include various memory disk schemes, including systems that are basically very sophisticated high-speed, battery-backed, memory caches in front of a traditional disk array. These can make tables with a high access frequency fly! They are expensive, but definitely keep them in mind. Even if you are planning to upgrade to Informix Dynamic Server 7.3x, you should consider these subsystems as an alternative to resident tables, as resident tables can cause lower priority tables to thrash the buffer cache.
Task Limits. What other tasks are running on the server? When? We agreed that for simplicity, we would not discuss shared systems-but practically, this is an issue for most of us and you will need to consider it. Also, what are your requirements for the following? They will each impact your specific tuning efforts:
1. Task completion window.
  What are your application response time requirements? Are they measured in days? Hours? Minutes? Seconds? What must be run when? When must the server be available? When can you run on your back-up system only? Primary only? When are both needed? Can you have a back-up system? Hot? Cold? Can you use your back-up for load balancing? Peak load relief?
2. Scheduling downtime window.
  When can scheduled maintenance be performed? Table reorganizations? Indexes added/dropped? ON-check run?
3. Statistics update window.
  When can statistics be updated without invalidating prepared cursors? Can your applications recover from a -710 error?
4. Damage correction window.
  How long can the engine be down to correct data related problems? Performance related problems? When?
5. Archive/Recovery window.
  1. Duration of Archives using ON-Tape/ON-Archive/ON-Bar? Can you afford a 23-hour, level-zero archive using ON-Tape with one tape drive on your 200GB server? Can you allocate additional tape units and take advantage of ON-Bar? Does your system support multiple tape drives? Tape libraries? Tape arrays (RAID 5 tape sets comprised of multiple drives seen as a single device)?
  2. Can you take advantage of Informix Dynamic Server 7.30 features that make it easier to create multiple mirrors and use rotating mirror sets as back-ups? Cost?
  3. What is acceptable downtime for a major restore?
  4. What is acceptable data loss due to catastrophe? Can you tolerate:
    1. Loss since last archive? No log back-up required
    2. Loss since last log back-up? Can use event alarm program/script to back up logs to disk-letting system archive-secure these to tape daily or even hourly.
    3. Loss to last completed transaction? Need continuous log back-up to tape or other removable device.
    4. No tolerance?
      1. Can you re-run transactions?
      2. Sync with a sibling server?
      3. Use Informix replication? Keep log files small.
OS/CPU limits and their effects on tuning.
1. AIX kernel parameter limits number of shared memory segments per process. Configurable.
2. HP/UX PA RISC has only four shared memory segment registers. If you have to access more than four shared memory segments concurrently, your process will become very slow. Informix needs access to at least: 1-Resident segment, 1-Virtual segment, and for each process 1-Message segment. If your engine has to add more than one additional virtual segment, the CPU VPs will become bogged down. If your application is already using its own shared memory and the engine allocates even one additional virtual segment, your application will bog down switching between the engine's shared memory and its own.
3. Solaris only supports four partitions per drive-meaning, in part, that drives larger than 8GB are only partially usable by Informix. There are third-party volume managers that can get around this problem. Sun has recently begun repackaging the Veritas VM themselves to alleviate this problem.
4. Logical disk volume management lacking. You may want to implement one of the disk schemes discussed below, but your system does not support it. There are third-party volume managers, as well as RAID software and hardware vendors that may be able to help you.
5. System memory disk cache allocation. On certain UNIX boxes, the percentage of memory that is allocated to OS file system buffers may be hard-wired and not configurable. Even if not some UNIX flavors use this as a guideline-freeing unused cache pages for system use if the system is swapping. Others free cache only if it is not being used. Still others treat cache pages as inviolate. A dedicated server system needs a very small UNIX buffer cache.

Virtual Processor Tuning (processor affinity , etc.)

Informix-based processor affinity versus OS-based affinity control.
1. AFF_SPROC, AFF_NPROCS - Affects only the CPU virtual processes (VPs) not AIO VPs, Net VPs, and others. The Informix engine does not do any load balancing! The first VP of each class assumes all of the load unless it is already occupied. If you do share the machine with other applications, a CPU VP can become blocked out of its processor by another process, since each VP has affinity to a single CPU.
2. OS level affinity control can give you both broader and finer-grained control. You will be able to assign affinity all VP classes. You may be able, depending on what your OS supports, to assign a VP affinity to a group of two or more CPUs so that they can float over a limited set of CPUs and avoid becoming blocked. Depending on your system memory architecture there may be little or no penalty for doing this. The OS's SMP task manager can often load balance the CPUs for you with little cost if limited floating is allowed.
  Which way? It depends. Here is a case study: Remember the Data General 10000 memory architecture I presented earlier? It means that a process feels no penalty from migrating from one CPU to another, as long as it stays on a four-CPU card sharing the same L2 cache, and little cost as long as it stays on the eight-CPU board where it's memory image resides.
  
  Originally, on our platform the Informix Dynamic Server 7.13 port did not support either affinity or KAIO. We got a program and sample script from Data General advanced support to permit us to create affinity for our VPs, as I described, to four processor groups-further allowing floating within the eight-CPU hardware group if all four CPUs with affinity are busy.
  
  The scripts I developed from this created affinity for the CPU VPs to groups round robin by increasing group number from one (group 0 is reserved for OS processes and our own system management processes), and the AIO VPs round robin by decreasing group number from seven (we have 32 processors). Since CPU VP number one and AIO VP number two do the most work, they are guaranteed not to interfere with each other as they run on independent processor groups.
  
  When Informix Dynamic Server 7.21 was ported to our platform, both affinity and KAIO were enabled. We jumped on the chance to try them, since the Data General porting group had reported a 25 percent increase in throughput using affinity and KAIO versus affinity, and AIO VP and affinity seemed to improve things even for AIO VP versus non-affined VPs. Unfortunately, we found our own throughput down 50 percent and went back to our own affinity and AIO VPs! Remember: YMMV (your mileage may vary). On other platforms the penalty for floating may be too great or non-existent.
3. NOAGE. On certain platforms, notably Solaris and HP/UX, which aggressively age long-running processes, this can make a big difference. Again-be careful if you are sharing the server with other applications. A NOAGEd engine can preempt other applications, whose priority will continue to drop through aging.
4. RESIDENT. This is a big gainer with relatively little cost if you have enough physical memory.
5. MULTIPLECPU. If you got them, use them. There are reports that on some systems it only pays to use one CPU for Informix Dynamic Server if you only have two. Certainly if you have three or more, configure to use all but one. If you are sharing the server, reduce the number of CPUs that you allocate to Informix Dynamic Server accordingly. It is one way to play nice.
6. NETTYPE. If you are using CPU VPs to listen to shared memory connections, use all that you have. In effect, if you have five CPU VPs, then your NETTYPE parameter for shared memory should configure five listeners, one in each CPU VP. Otherwise, the tendency for the first few CPU VPs do most of the work, will only become exacerbated.
  Also, since these first few VPs which do have the listener threads will be busy doing queries, new client connections will not be served promptly. Do not configure shared memory NETTYPES to use NET VPs if possible. NET VPs use polling for shared memory connections and that will spin your CPUs, tying them up. Similarly avoid TCP NETTYPES using CPU VPs for similar reasons. It increases the number of system calls per second and ties up OS services, because the CPU VPs cannot sleep while waiting for new connections and requests.

Undocumented Config File Parameters/Features

(I do not understand the secrecy surrounding the tuneability of these features. That other database vendor makes a big deal about the tuneability of just these three caches in their engine as a selling point against Informix Dynamic Server.)

Data dictionary cache. This caches your schema information at the table level, so that the system catalog tables do not have to be queried constantly. (This is the reason there is no need for concern over the extent sizing of your system catalog tables.) The default data dictionary cache contains 310 entries. If you have more than about 250 active tables (you need a few extras for catalog tables and to allow for unbalanced hash assignments) this cache will thrash. (In versions prior to 7.23 there was a defect that would cause a SEGV in an MT_NOTIFY thread if this condition persists too long causing the engine to crash.) You can adjust the size of the table by including the following two ONCONFIG file parameters:

DD_HASHSIZE	- # of hash buckets, must be a prime number (default: 31)
DD_HASHMAX	- # of entries per bucket, no restrictions (default: 10)

Use onstat -g dic to determine if your dictionary cache is nearing capacity. There should not be any hash buckets that are full (if your system has the defect discussed above you may actually see buckets that have more items than DD_HASHMAX).

Stored procedure cache. The cache of stored procedure information is also configurable. Unfortunately, I have not been able to acquire either the default values for these parameters, nor any restrictions on their values or ranges. The controlling variables are: PC_HASHPOOL and PC_HASHSIZE.
If you make extensive use of large numbers of stored procedures, and are seeing performance degradation from your procedures as load increases, you can make a case to Informix Support that you may need to increase the stored procedure cache. You may track activity on the stored procedure data dictionary tables in your databases with onstat -t or onstat -g ppf. There should be little or no activity once your system reaches a steady state, or for more ad-hoc systems, after a day or two of normal activity. You can monitor the stored procedure cache by running: onstat -g prc.
Data distribution cache. This cache maintains the data distributions generated by UPDATE STATISTICS MEDIUM/HIGH in memory for active tables. The configuration parameters are DS_HASHPOOL and DS_HASHSIZE. Again, I do not know the ranges of these parameters, nor the rules covering their use. You can monitor the data distribution cache by running onstat -g dsc. I would say that if you find a need to increase the data dictionary, this table probably needs to be expanded as well.

Disk Tuning Parameters

The following parameters affect I/O performance in various ways. I have grouped them according to which ones are most effected by modifying the others:

BUFFERS, LRUS, LRU_MAX_DIRTY, LRU_MIN_DIRTY, CLEANERS
1. Buffers. The more buffers the better. I dedicate as much as 40 percent of total shared memory for buffers. The trade-off is that more buffers can mean longer checkpoints.
2. LRUS. More LRUS better support large numbers of users by reducing bufwaits as users compete for buffers to write to. Due to a suspected defect in the LRU rehash code you should avoid setting LRUS to any of the following values: 16, 64, or 96. There may be other nodal values to the error function also (32, 127, and 128 are known to be okay) the symptom is very high bufwaits (a large percentage of dskreads and bufwrites).
3. LRU_MAX_DIRTY and LRU_MIN_DIRTY. Treat these as one pair of parameters. These are the main way to control checkpoint duration and to trade database throughput for system-wide throughput. Larger values leave more dirty buffers for checkpoint time, increasing the duration of the checkpoint but allowing the engine to optimize dirty buffer flushing to reduce the impact on the system. Smaller values cause small flushes of one or a few LRUS at a time between checkpoints, which minimizes the duration of the checkpoints. However, this can adversely affect other applications running on the system.
4. CLEANERS. Anyone who follows the postings on the newsgroup comp.databases.informix will know that I have always recommended that the value of CLEANERS be equal to the number of LRUS for 7.xx engines (or ==max(#drives,LRUS) for 5.xx engines). However, my most recent investigations indicate that if you have, on average, more than 500 dirty buffers at checkpoint time that need flushing, you may be better off with only one- or two-page cleaner threads if you have response time requirements measured in seconds. I will discuss the problem further in a later section. For now keep this in mind.
KAIOOFF, NUMAIOVPs
1. The environment variable KAIOOFF in the server's environment (if it exists and has been exported), will cause Kernel Asynchronous I/O to be disabled in the engine if your platform supports KAIO. The reasons for disabling KAIO are system/platform dependent and were presented above in the discussion of processor affinity. In addition, if you use only cooked devices, KAIO is non-functional and can be disabled with little or no cost.
2. NUMAIOVPs. If you have KAIO enabled and have only raw disks, then only a few AIO VPs are needed. If you do not have KAIO enabled, the number of AIO VPs needed varies depending on version (the AIO VPs have been steadily improved over the versions). In general, NUMAIOVPs should be equal to, or greater than, the value of CLEANERS. In addition, for versions 7.14 and lower, use 2.5 * #chunks, for 7.2x versions use 1.5 * #chunks, for 7.3x use #chunks + 1*(# especially active chunks) as a lower bound. For cooked files, you can follow the recommendations for non-KAIO installations, taking only cooked files into consideration.
LOGBUFF, PHYSBUFF, PHYSFILE, PHYSDB, CKPTINTVL
1. LOGBUFF and PHYSBUFF. Trade safety for speed. With bigger buffers, more data is at risk in the event of a hard crash. These are limited by the size of the smallest Logical Log file and of the Physical Log file, respectively.
2. PHYSFILE. This should be large enough to permit the desired checkpoint interval during normal and normal-peak loads. Using more than 75 percent of the physical log will force a premature checkpoint. (There is a defect in versions prior to 7.23 that may cause the engine to crash if outstanding transactions fill the remaining 25 percent of the physical log while the checkpoint is waiting for these sessions to complete critical processing.)
3. PHYSDB. Do not place the Physical Log in the rootdb space. After initialization, create a dbspace just for the physical, and/or the logical logs. Shutdown, and modify the ONCONFIG file so that when the engine comes back up it will place the physical log in the new dbspace. On a high transaction server, this should be a separate dbspace from the logical logs also. It is best to keep rootdb, plogdb, and llogdb separated onto different drives with different controllers from each other and from the data drives.
4. CKPTINTVL. Shorter intervals are safer and make for shorter faster recovery in the event of a hard crash. Longer checkpoint intervals will minimize the effects of checkpoints on server throughput. If the interval is too short, your system can spend more time waiting for checkpoints than processing data (been there, done that).
LOGSIZE and LOGFILES. Fewer larger log files are more convenient to manage. More smaller log files are safer, as they will be flushed to tape by continuous log back-up more frequently. This protects against a catastrophic crash.
RA_PAGES, RA_THRESHOLD. Faster systems, especially systems with faster disk farms, need less read ahead. These values, and their utility to us, is heavily dependent on I/O speed. If you use discrete drives look to set RA_PAGES around 64 with a large threshold of between 16 and 32. If you are using a large RAID10 configurations, with a stripe of many pairs of mirrored 10000RPM drives, using a caching controller capable of its own read ahead, you can run a very busy system with RA_PAGES=16 and RA_THRESHOLD=4. Onstat -p reports on read ahead usage stats. The sum of idx-RA, idxda-RA, and da-RA should be very close to the value of RA-pgsused. Increase these parameters until the ratio drops or the read-cache% begins to drop. Excessive read-ahead can cause needed data buffers to be flushed, causing the drop in read-cache%. This can happen even while the RA ratio is improving, so be careful to balance these. If this happens, either back off on the RA parameters until the read-cache% is again acceptable, or increase the number of buffers until the read-cache% improves.

Memory Parameters

Memory parameters and their results are monitored using onstat -g seg and onstat -g mgm. They affect allocation of shared memory from the OS, and partitioning that memory among queries.

SHMVIRTSIZE. This should encompass, at least, normal load memory requirements and is best configured to include normal peak load needs. This is especially important on HP PA RISC and IBM AIX hosts, as already discussed.
SHMADD. This should be at least 10 percent of SHMVIRTSIZE. Here is another trade off: smaller additional segments are faster to allocate and attach to, but fewer segments may be more efficient to use (and on AIX and HPUX are almost a requirement). Processing will halt while all of the oninit processes attach to the new segment, and smaller segments will minimize the effect. Some operating systems do this faster than others, but if you have many VPs it may still take more time than response-time requirements will allow for.
SHMTOTAL. Unless you have a very small memory system, just set this to zero (0) and let memory grow.

Optimizer Parameters and Hints

I am going to defer to the excellent discussion of the optimization parameters and optimizer hints in the 7.3 Performance Guide, which is available on line at http://www.informix.com/answers. Suffice it to say that 7.3x adds much power and control over the functioning of the optimizer.

The Checkpoint Stall Problem

We are all aware that updates must wait while a checkpoint completes, and that SELECTs are not affected. This is not quite true! Read on.

If you have more than 500 dirty buffers at checkpoint time, the checkpoint thread will not relinquish control of CPU VP#1 until all buffers have been handed off to CLEANER threads-as long as there are page cleaner threads available that are not busy. Since user threads currently executing in CPU VP#1 are not migrated to other VPs prior to beginning the checkpoint, or you may only have one CPU VP, this means that even FETCHs and OPENs will hang for approximately the first half of the checkpoint duration (longer if you set LRU_MAXDIRTY and LRU_MINDIRTY high). If you have response-time requirements measured in seconds, this can be deadly.

If no page cleaner thread is ready, then the checkpoint thread will relinquish the VP and place itself on a wait queue until a cleaner thread frees up. With less than 500 dirty buffers, a single cleaner is launched to handle all dirty buffers and the checkpoint thread relinquishes the CPU VP to other threads after that launch. This sounds better, but the result is the same.

Based on this, I am recommending that sites with large numbers of buffers and short response time requirements configure with one or only a few page cleaners so that the checkpoint will have to enter a wait state while the previous page cleaners complete their last assignments. This should cause the checkpoint thread to release the VP and permit queries to continue. This issue is being examined by Informix R&D and, by the time this goes to press, I should have had an opportunity to test the latest theoretical solution. Search the C.D.I. archives at http://www.iiug.org/ for my results.

Disk Configuration

These are Holy War issues-here are the options and issues, with a summary chart to follow.

Raw-versus-cooked files. True-for many years, Informix has been running cooked files in O_SYNC mode, so that the safety issue is long moot. However, I have performed the benchmarks. Raw disk performs 15->25 percent faster than a cooked file in a filesystem on the same device! If you need performance, you must use raw disks.
Singleton drives. Simple and cheap, and no recovery value. Nothing more to say.
RAID 0 - mirrored singleton drives. This gives you maximum safety and recovery through 100 percent redundancy of data. Read performance can be up to twice as fast as singleton drives under load, while write performance is at least as good as singleton drives. Downside? You may need more speed. Recovery cost: close to 100 percent of the surviving drive's resources will be dedicated to rebuilding the replacement drive. Some RAID 0 implementations permit mirroring three or more drives for maximum redundancy, which will reduce the impact of recovery but multiply the acquisition cost.
Informix Mirror-versus-OS Mirror-versus-Hardware/Firmware mirror. Hardware mirrors are the fastest in benchmarks that Informix conducted several years ago. OS mirrors were a distant second, followed closely by Informix mirroring. These were tests under INFORMIX-OnLine 5.0x. I suspect that OnLine Dynamic Server 7.3 mirroring might just edge out OS mirror now. Informix mirror is the easiest to maintain from a DBA standpoint, while your system administrators will make the same argument for either hardware or OS mirror, depending on what tools are available to them.
RAID 1. Many drives striped together so that sequential logical disk blocks are read from or written to the next drive in the set. This will give you speed by essentially doing large amounts of read ahead concurrently. All RA blocks will likely be available after one logical I/O. Downside? Complex to manage and no redundancy, no hope of recovery.
RAID 3 - N data drives + one parity drive. Good redundancy. Like RAID 1, there is the automated read ahead advantage. Parity is read and checked on every read, hardware support in newer RAID subsystems does this with little or no performance cost. Downside? Complex. Recovery cost: close to 100 percent of the surviving drives' bandwidth is occupied during recovery. All drives must be read to rebuild each block on the replacement drive. This can seriously impact production runs during a recovery.
RAID 5 - N+1 drives. Fair redundancy. The parity block for a stripe is alternated between the drives to minimize the impact of sequential writes. Normally blocks from the stripe that are not needed to satisfy a request are not read, including the parity block. This is to improve random reads, but will seriously impair sequential read performance and leaves the array open to errors in the recording media of a single drive. Parity is not checked at read time, and the written data is not verified. Parity is only used to rebuild a completely failed drive. Recovery cost: same as for RAID 3 in addition to the items already mentioned.
RAID 10 (or 1+0). A RAID 1 stripe set made up from N pairs of RAID 0 mirrored drives. This is fully redundant. Safe from any drive or media failure, you can recover with as many as half of the drives lost. In addition, if each side of the mirror are kept on separate controllers and power supplies, you are even protected against controller and power supply failure. Write performance is at least as good as RAID 1, while random read performance under heavy load shows much better scalability than any other configuration (almost double). Recovery cost: replacement of a single drive only affects its mirror which is only 1/Nth of the total array, so production queries are affected at most (100 * 1/N)% of the time. The down side? The safety costs you 100 percent over the cost of singleton drives or RAID 1 or up to 66 percent higher than an equivalent RAID 3 or RAID 5 array.
RAM/FlashRAM Pseudo-Disks. Very fast but volatile. Units with disk secondary storage amount to VERY SMART cache systems. Very good for highly active tables like a current transactions table. If you have a sufficiently large on-controller cache the advantages of these pseudo-disks may not be significant. Only your own benchmarks will provide the answer.

Read	Write	Reliability/ Availability	Recovery
Ramdisk	Ramdisk	RAID 10	RAID 10
RAID 10	RAID 10	RAID 10	RAID 10
RAID 1 (if big I/O)	RAID 0 (if big I/O)	RAID 5	RAID 5
RAID 0 (if big I/O)	RAID 1 (if big I/O)	RAID 3	RAID 3
RAID 3	RAID 3	Singleton	Singleton
Singleton	Singleton	RAID 1	RAID 1
Singleton	RAID 5	Ramdisk	Ramdisk

Table 1: Comparison of Storage Configuration Options.

Database and Table Optimization

Updating statistics for maximum effect.
1. The best strategy I have found for generally updating statistics, for maximum benefit in minimum time, is to follow the recommendations in the Informix Dynamic Server 7.2 release notes I received. Unfortunately, I have also seen at least two other versions of these notes, one with the 7.1 recommendations unchanged and one with no update statistics recommendations at all. The full recommendation is now contained in the 7.3 Performance Guide, and can also be found by searching the C.D.I. archives at http://www.iiug.org/, where I posted them several months ago. You can also download the package I submitted to the IIUG Software Repository named utils2_ak. which contains the latest version of a utility I wrote, dostats.ec, that implements these recommendations. If you have INFORMIX-ESQL/C or INFORMIX-4GL 7.10, you will be able to compile this.
2. When to update stats? The flip answer is when queries stop working-and in all seriousness, this is the main criterion most of us use. To be more proactive, you can do this weekly/monthly, etc., but that can get to be expensive and may interfere with production. Statistics need to be recreated when the nature of the data has changed. In effect, when the ratio of one key value to another is no longer as it was described in the current data distributions, it is time to recreate the distributions. You can peruse the current distributions with dbschema -d database -hd table.
Tables fragmented with large numbers of extents. Use oncheck -pT or use oncheck -pe to check the number of extents a table has and how they are laid out on disk.
1. When and how to compress a table to reduce the number of extents:
  1. Generally, a table with more than 33 extents is probably slower than it could be. (33 because the engine has already quadrupled the size of each extent at 32 extents and the table is still adding extents.)
  2. You can compress the table several ways:
    - Unload the table, drop and recreate with a larger extent/next size and reload the data.
    - Create a clustered index or re-cluster an existing one with ALTER INDEX TO NOT CLUSTER; ALTER INDEX TO CLUSTER;
    - ALTER FRAGMENT ON TABLE INIT IN . This works even for a non-fragmented table and is the fastest way to reorganize a table.
  It is a good idea to adjust the value of NEXT SIZE for the table before reorganizing to minimize future fragmentation.
2. Avoiding large numbers of interleaved extents.
  1. Place the table in a private dbspace dedicated to its use. Tables built this way can have at most one extent per chunk in the dbspace and cannot interfere with other tables' I/O needs. Obviously, you can only do this for a limited number of high growth tables.
3. Place high growth tables into dbspaces along with low growth tables or tables with different growth patterns (i.e., a table that is appended to monthly shares a dbspace with ones that are updated daily so it does not interfere).
4. Table fragmentation schemes. Again, the 7.3 performance guide has a very thorough discussion of this topic, so I will just touch the issues and refer you there.
  1. A table needs to be fragmented when it:
    - Needs more than 16 million pages. (A tablespace is limited to 16MM pages.)
    - Needs parallel search performance.
    - The optimizer can perform fragment elimination on critical queries.
  2. An index needs to be detached when:
    - Its table needs almost 16MM pages (detaching the index can permit the table to grow a bit more before needing to be fragmented).
    - The index key does not match the fragmentation expression. You may even gain by fragmenting the index according to its own key.
    - The table is fragmented ROUND ROBIN to speed searches.
    - There is a need to spread I/O load during complex DSS type queries.
5. Disk Location. Informix continues, even in the 7.3 Performance Guide, to recommend that you locate the busiest tables nearest to the center of each disk. This is a good recommendation if you use singleton drives. For more complex disk farms, this is both impractical and of little benefit. Why? It is very difficult to place tables and calculate required offsets in a large RAID set. The disk and controller caches and read ahead capabilities, as well as the gain from multiple spindles in RAID 0 and RAID 10 sets, reduces the impact of head movement. Also, Informix's own cache reduces the random I/O nature of the database's I/O patterns.
6. Indexing strategies. Things to keep in mind when designing indexes.
  1. Versions before 7.24 contain a defect (partially fixed in 7.24 and completed in 7.3) that causes the optimizer to only use the first two columns in a multi-column index. The optimizer will always filter the remaining keys from the data pages unless a KEY-ONLY search is selected. This is not reported in the sqexplain.out file.
  2. OPT_GOAL and DIRECTIVES make it more useful to create additional indexes in 7.30 and later. See the 7.3 Performance Guide for discussions of these features in depth.

Parallel Data Query

The advantages of setting PDQPRIORITY may seem obvious, but there are non-obvious issues and repercussions to take into consideration.

Effect on searching fragmented tables. Allows searching multiple fragments in parallel if sufficient CPU resources are available and your I/O subsystems can provide sufficient throughput.
Affect on multiple table joins. Allows both sides of merge-join and nested-loop query paths to operate in parallel
Affect on sorting. I know that the manual states that values of PSORT_NPROCS > 10 are no-op. My own testing shows otherwise-and has been verified by several other users-so I recommend the following settings to maximize sort speed for index builds, UPDATE STATISTICS, and complex queries which must complete in minimum time:
PSORT_NPROCS=40
This is the maximum effective value I have been able to verify.

PSORT_MAXALLOC=10240
This undocumented variable controls the size of in-memory sorting. PSORT_DBTEMP=.....
Set this to a list of at least three filesystems. Remember that the smallest, in terms of free space, of these will control the total amount of available sort-work space.

If you prefer to use DBSPACETEMP for a sort-work area, be sure that you have at least three dbspaces in that list, and that PSORT_DBTEMP is unset.
Affect on other users.
- Remember that you cannot allocate more than 100 percent of available resources. This means that if there are two users with PDQPRIORITY>50 (and MAX_PDQPRIORITY=100), then one of these users will have to wait for resources to free up. Therefore PDQPRIORITY effectively limits concurrency. Not a problem if you are tuning a DSS system with response time requirements measured in hours or even minutes.
- Remember that the granularity of PDQPRIORITY is not one percent, but the granularity of each of the various resources that it controls. For example, if you have only two CPU VPs, then a PDQPRIORITY setting of 2 is effectively the same as one of 50 for your system!

Using Tool Features To Improve Performance

Three performance enhancing features are described here. The Cursor Buffer Sizing and Fetch Array features are fully documented in the "Documentation Notes for INFORMIX-ESQL/C Programmers Manual" for versions 7.23, and later in the release directory structure in file ESQLCDOC_7.2. I present the highlights and justification.

Multiple connections are effective for several purposes:
1. To avoid locking large numbers of rows. If you update or delete rows based on the results of a query, depending on the database's and/or the session's isolation level, you may end up holding many locks on the rows touched by the query. By making the query in a separate connection, there will be no locks held.
2. To improve performance when accessing data from multiple databases whether they reside on the same or separate servers. By opening a separate connection to each database you accomplish two things:
  - You reduce impact on the local server by routing the remote connection directly to the remote server, rather than passing all communication through the local server.
  - You improve throughput. Access to a "foreign" database's tables, even if the database resides in the same server instance, is significantly faster with separate connections. This is even truer if one of the database servers is an OnLine 5.xx engine being access by INFORMIX-ESQL 7.xx source. This seems to stem from the local engine pushing communications buffers and header information related to the remote connection aside and having to swap it back in while alternating communications between two databases. The INFORMIX-ESQL multiple connection code seems to be more efficient at doing this.
Adjusting the size of the SQL Cursor Buffer. Communications with the engine are accomplished using a communication buffer, which by default is 4K in size. Increasing this size will significantly reduce network traffic and overhead and speed large queries. The range of valid buffer sizes is 4096 -> 32767. This may be set in the user's environment with the environment variable FET_BUF_SIZE, pushed into the environment using putenv() prior to opening a cursor, or can be set using the global variable FetBufSize, which is a signed short integer as the number of bytes to use in the buffer.
This will improve performance for all queries where more than one complete row can fit into a single Cursor Buffer. This is supported in all OnLine Dynamic Server/Informix Dynamic Server versions since 6.0, and has been officially supported since 7.23. FetBufSize is a session wide value and is used for ALL communications buffers for all FETCH and PUT statements. BTW you cannot depend on FetBufSize to be set to the value of FET_BUF_SIZE so use getenv() to check if the user has set a buffer size preference in the environment.
Array fetch feature. Normally, when an ESQL FETCH is executed, a single row is copied from the Cursor Buffer to the applications local variables, either to bound host variables or to the memory pointed to by an sqlda structure or SQL DESCRIPTOR AREA. There is overhead associated with moving the data one row at a time, but that is the way most software wants to deal with it.
Since OnLine Dynamic Server 6.0, there has existed a poorly documented feature known as Array Fetching. The feature has always been a part of OnLine Dynamic Server/Informix Dynamic Server and was officially sanctioned in 7.21. Basically, through the global variable FetArrSize you specify the number of rows you are capable of processing in a single FETCH (this is adjusted so that rowsize * FetArrSize < FetBufSize), point each (sqlda).sqlvar.sqldata to an array of FetArrSize elements large enough to hold that column, and FETCH. The sqlca field sqlerrd[2] will contain the actual number of rows transferred into your arrays during the latest FETCH. You can then loop through these rows in your own mini cursor code, walking the parallel arrays for each column fetched. This is much faster than the row-by-row method.

Summary

Well that's it. The sum total of all of my knowledge of Informix Dynamic Server-at least the tangibles that I can articulate easily. I hope that you will have found this article useful, and have not exclaimed, "What in the world is this guy talking about!" too frequently. I have refrained from making concrete recommendations, except when I am confident of my footing. Take those recommendations to heart.

You are using, or are planning to use, the most technically advanced and most complex database product on the market. The experience is rewarding. I know of very few DBAs who have used Informix Dynamic Server and do not love it and recommend it. I know that the same cannot be said for DBAs using competitive products.

Preparing this article and the presentation to the NY Informix Users Group that preceded it has been a revelation and a rewarding experience. I have learned much that I thought I already knew, and coalesced my thoughts on things I never knew I knew. Thanks to everyone on the comp.databases.informix newsgroup-you were all of great help. I have learned from many, especially from Jonathan Leffler, Mark Stock, and others too numerous to risk forgetting some. Thanks to Tim Schaeffer and Ed Yee for encouraging me to flesh out my talk into this article. Thanks to the many people in Menlo Park and Lenexa who have given me great advice and technical assistance over the years, and have helped me to learn the secrets of tuning this beast called Informix Dynamic Server.

About the Author

Art S. Kagel has over 16 years of computer programming, systems analysis, software and database design, project management, database administration and consulting experience. He has served as project leader on project teams from two to fifteen programmers and supervised over 30 technicians on a departmental basis. He has worked with Informix products since about the time INFORMIX-4GL incorporated SQL.

Mr. Kagel was educated in Computer Science at the Polytechnic Institute of New York, with post-graduate work at New York University in Information Systems and Decision Support Systems. In addition, he has earned a degree in Business Management and Marketing from St. John's University in New York. Mr Kagel is an expert in the areas of decision support systems, database design, software design, probability and statistics, market analysis, media planning analysis, financial application systems and time accounting. Mr. Kagel first earned recognition in the field of decision support systems for having developed the first working model base management component for a decision support system. Mr. Kagel has also written for technical journals on various aspects of data structures and analysis, and has taught classes in relational database topics and programming techniques in several computer languages, as well as in the use of various software packages and database systems. He humbly admits to being one the leading experts in the application and maintenance of Informix database server products.

In 1985 Art Kagel invented and published the Unshuffle Algorithm. The Unshuffle algorithm is a sort algorithm which is the most efficient available when applied to most real-world data sets (don't tell him that quicksort is faster for random data-he gets really testy). Mr. Kagel also developed an algorithm for the merging of sorted sinks, which can be shown to be the most efficient algorithm possible.