Progress
Performance
Frequently Asked Questions
Chris Schreiber
Fast 4GL Systems, Inc.
V3.0, 11/08/98
Progress Performance FAQ
Chris Schreiber (chris@fast4gl.com)
Revision History:
1.0
11/06/95 First complete, official release1.01 11/09/95 Minor correction to 2.1.4 (Conversion)
1.02 12/02/95 Cosmetic changes to how I create the document for easier updates
1.1 02/06/96 Revisions to all of 4.5 (dump/reload). Added 3.2.1 (schema caching), 1.4 (suggested reading path), 2.1.5 (V8). Notes sprinkled throughout with references to other places to look for information on some topics, mostly to Dan Foreman’s book.
1.2 04/07/96 Removed the examples section and renumbered accordingly. Added sections 2.1.7 (Workgroup vs. Enterprise), 2.3.4 (Server vs. Broker), 4.7.1 (How much space in one file/table). Added information on promon in 1.3.1, updated client parameters in 3.2, new stuff on extents in 3.3, keeping users from deleting the database under UNIX in 3.4, corrected description of BI use in 3.9.1, noted use of dbanalys utility to 4.7
2.0 09/02/96 Updated 2.1.4 with caveat about V7 memory use, 2.3.4 updates the server vs. broker distinction, 3.2 has new client parameters, 3.4 mentions buffered AI, 3.8.1 has more APW suggestions, 3.8.3 has a caveat with the AIW, 3.9.1 has BI cluster size recommendations, 3.10 has more on the 63K .r code limit, 5.2 has a variety of new UNIX specific information, added info to 7.1.2 on getting Knowledge Base entries. Added 1.3.4 (VMS monitoring), 1.5 (The Performance Big Picture), 3.3.4 (OS logical volumes usage), 3.9.2 (BI Buffer Caching), 4.5.6 (What’s procopy good for?) 4.9 (Controlling misbehaving Progress processes), 7.2.8 (Software). Changed Word document to print nicely.
2.1 10/02/96 Added 3.9.5 (BI Formatting), 5.2.10 (Digital UNIX specifics). Updated 3.2 with some more client parameters, 3.8 discusses writer process precedence.
2.2 10/05/97 Added 2.1.5 (V7.3), 2.1.6 (V8.1), 2.1.7 (V8.2), 3.11 (Promon Details), 4.10 (Smart Objects), 5.3 (Windows NT). Updated 5.2.6 (AIX Specifics).
3.0 11/08/98 Added 2.1.8 (V8.3), 2.1.9 (V9/Skywalker), renumbered 2.1.8 to 2.1.10 and 2.1.9 to 2.1.11
Note that items in section 7 (such as WWW info, books, contributors, etc.) are updated on almost every release without specific comments in the listing above. There are endless minor corrections that I don’t bother documenting as well throughout.
The master source for this FAQ at the moment available from Fast 4GL Systems, Inc at:
http://www.fast4gl.com/resources/pperform/
I would like to thank Greg Smith for having created this document and maintaining it for so long. I know the Progress community will miss Greg.
Feel free to distribute this, put it on your FTP site, include links to my page on yours, or otherwise use and abuse it. The only thing I ask is that you make sure to include my name, e-mail address, and web page address if you snip away a part of the document to give to someone; that way they can get a copy of the full document if they want.
Table of Contents
1.
Introduction *1.1
How do I know I have a problem? *1.2
I don't have time to read everything out there, what are some quick suggestions? *1.3
How can I tell what’s going on? *1.3.1
Using promon on shared memory systems *1.3.2
Monitoring UNIX systems *1.3.3
DOS, Windows, NLM *1.3.4
VMS *1.4
Suggested reading path *1.5
The Performance Big Picture *1.5.1
Benchmarking *1.5.2
Record access speed as the bottleneck *1.5.3
Transactions as the bottleneck *2.
Progress Performance Major Components *2.1
Which Progress version should I use? *2.1.1
6.2 *2.1.2
6.3 *2.1.3
7.X *2.1.4
7.3 *2.1.5
8.0 *2.1.6
8.1 *2.1.7
8.2 *2.1.8
8.3 *2.1.9
9/Skywalker *2.1.10
Conversion Issues *2.1.11
What’s the difference between Workgroup and Enterprise versions of Progress? *2.2
Why are shared memory systems better? *2.3
What's the best connection mode? *2.3.1
Single user *2.3.2
Direct connect (self service) *2.3.3
Client/server (remote service) *2.3.4
What’s the difference between a server and a broker? *2.4
What's the difference between character and GUI modes? *2.4.1
What about GUI with Windows? *2.4.2
What about GUI with X-windows? *2.5
Database design *2.5.1
How does database size change things? *2.5.2
What's all this normal form stuff mean? *2.5.3
How do I optimize the database for queries? *2.5.4
How big should my records be? *2.5.5
Core database and connected database architecture *2.5.6
Multiple connected databases *2.6
Speeding up Progress 4GL programs *3.
Progress parameter and process tuning *3.1
What should my server parameters be set to? *3.1.1
How big do I set -B to? *3.1.2
Shared memory servers *3.2
Client parameters *3.2.1
Schema caching *3.3
Why use a multivolume database? *3.3.1
File system considerations *3.3.2
Single process access limitation *3.3.3
Direct (raw) file systems *3.3.4
What should operating system logical volumes be used for? *3.4
Are after image files worth it? *3.5
How much RAM do I need? *3.5.1
RAM on servers *3.5.2
RAM on clients *3.6
Temp files *3.6.1
Temp files on servers *3.6.2
Temp files on clients *3.7
Multiprocessor systems *3.7.1
Using -spin *3.8
What do all these writer processes do? *3.8.1
Tuning APW *3.8.2
Tuning BIW *3.8.3
Tuning AIW *3.9
Why is the before image file important? *3.9.1
Changing BI Cluster and block size *3.9.2
BI Buffer caching *3.9.3
Why you shouldn't use -bibufsize *3.9.4
Delaying transactions with -M *3.9.5
BI Formatting *3.10
Application .r code *3.11
Promon Details *4.
Specialized Progress solutions *4.1
Read only databases *4.2
Disabled indices *4.3
Running in no integrity mode *4.4
Faster index rebuilds *4.4.1
Index rebuilds on very large databases *4.5
My dump/reload takes forever; what can I do? *4.5.1
Why should I bother doing a dump and reload, anyway? *4.5.2
Will a dump/reload decrease the database size? *4.5.3
What’s a parallel dump/reload? *4.5.4
Incremental loading methods *4.5.5
Binary dump and load *4.5.6
What’s procopy good for? *4.6
Off-hours batch programs *4.7
Ongoing database maintenance *4.7.1
How do I tell how much space is taken up by one particular file/table? *4.8
Backup and restore operations *4.9
Controlling misbehaving Progress programs *4.10
Smart Objects *5.
Platform-specific recommendations *5.1
What about Novell NLM servers? *5.1.1
Optimizing the server computer *5.1.2
Optimizing the NLM *5.2
UNIX Kernel parameters *5.2.1
Semaphores and shared memory *5.2.2
Process and memory limits *5.2.3
DG/UX Specifics *5.2.4
SCO UNIX Specifics *5.2.5
System V R4 Specifics *5.2.6
AIX Specifics *5.2.7
HP/UX Specifics *5.2.8
Sequent specifics *5.2.9
UnixWare specifics *5.2.10
Digital UNIX specifics *5.3
Windows NT Specifics *6.
Hardware selection *6.1
Bus architecture *6.2
Disk subsystem *6.2.1
What about disk controllers? *6.2.2
How many drives do I need? *6.2.3
Should I use SCSI or IDE drives? *6.2.4
What type of SCSI interface do I need? Fast? Wide? *6.2.5
What about caching? *6.3
Why should or shouldn't I use mirroring or RAID drives? *6.3.1
RAID Data integrity *6.3.2
RAID Performance *6.4
I've got a bunch of disk drives; Where should I put things? *6.5
What about the network? *7.
Credits and additional resources *7.1
How did you find out all this? *7.2
Where can I get more information? *7.2.1
Progress Software *7.2.2
PEG e-mail list *7.2.3
Progress WWW sites *7.2.4
Newsgroups (comp.databases.progress) *7.2.5
Progress FTP sites *7.2.6
Books *7.2.7
Magazines *7.2.8
Software *This FAQ was written in an attempt to describe the major factors that influence the performance of a computer running the Progress database system. The best resource is to find those who have worked with systems similar to your own who can give specific recommendations. But there is a set of common solutions that often work and should be investigated and tested before asking for specific help. This FAQ presents the common solutions so you can have the proper background before trying to get help for your particular case.
The information here was gathered through a variety of sources (credits are listed at the end), but ultimately any errors are my fault. There are a variety of risky things discussed here; you should never be experimenting with performance improvements on any database without a current, verified backup. A test system just for such purposes is even better, but prohibitive in cost for most sites. I should probably insert a bunch of legalese here to cover my butt, but what it boils down to is the suggestions here are tried at your risk; don't blame me if your database goes up in flames because you tried messing around with the filesystem, your last backup was two years ago, and you can't even read the tape when you try it.
Progress is a trademark of Progress Software Corporation. They can be contacted at (781) 280-4000. NetWare belongs to Novell, DG/UX to Data General, HP/UX to Hewlett Packard, VMS to DEC, and AIX to IBM. Who knows the owner of the UNIX trademark nowadays; last time I checked, it was SCO. Windows is a registered trademark of Microsoft, who would probably be quite happy to sue me I didn't mention that. If I missed anybody else's trademark, I'm sure I'll hear about it, hopefully by some other method then a summons.
Usually, if there's a performance problem with your systems, you don't have to ask this; the users will be knocking down your door trying to get things to go faster. It's a good idea to do your testing when performance is going well, though; that way, you have a little more slack in case you accidentally slow things down instead of improving them.
For most Progress systems, the first things recommended are:
1. Check the -B parameter to make sure you're using the memory you've got
2. Check out the system performance information to make sure what the bottleneck is, tune system parameters to improve it
3. Add more RAM and increase -B to use it
4. Add more disk drives to the system, move files around to balance the load on each drive
Beyond that, you're probably going to have to chew on quite a bit of information before you see something that helps you out. Even this "summary" is rather long, but that's because there is no getting around the fact that Progress is a complicated system with many components, as is your operating system. Expecting that magically tweaking a couple of parameters or throwing some money at the system in the form of RAM or disks will solve the problem is not necessarily the case; often, you'll just shift the bottleneck to some other component, possibly with little or no improvement. A whole system look is really what is required for proper tuning.
Using promon on shared memory systems
Progress supplies the promon utility for computers with shared memory (see below for a more detailed discussion of shared memory). Running promon will give you a variety of statistics on what your database server is doing. It's a valuable tool, and you should use it regularly, even when the system is performing well. Information gathered during good days is useful in figuring out what's going wrong on bad days. promon is documented extensively in the Progress System Administration manuals.
One thing to watch out for is that some of the percentage calculations done in promon are a bit flaky; if you see anything that looks obviously wrong, try and figure out where the % is calculated from and do the math yourself. For example, it’s not unusual to see the buffer hit % go nuts, going absurdly low or over 100%--but if you look at the database activity screen and figure out the ratio of logical to physical reads and writes you can compute it yourself as a sanity check if things look suspicious.
Note that when you first access statistics on something in promon, you get the information since when the database was started. If you then hit a key to get another screen, you’ll only get information on the activity since you last looked. Exiting back to the main menu and entering a section again gets you the whole thing again.
UNIX systems let you watch a number of things. The three major categories of items to monitor are CPU utilization, memory utilization, and disk utilization. On AT&T derived systems, the sar utility will give you all these numbers, although they are sometimes deceptive. On some BSD-style systems you might have to run iostat and vmstat to get some pieces of the puzzle. There are other utilities that often are helpful; swap -l for swap memory information and ipcs -m for shared memory information, for example. There are endless UNIX watching utilities to be found on the Internet. Newer systems can have fancy utilities like rtpm built-in for your use.
General UNIX performance tuning is a bit beyond the scope of this document. You can check out the manuals that hopefully came with your system and the man pages on the system for more in-depth discussion of the topic. My favorite introduction to most UNIX topics is the Nemeth, Snyder, etc. UNIX System Administration Handbook published by Prentice-Hall; it is particularly useful because it discusses many different versions of UNIX and what the different commands are for each version. Other useful books to read on the topic include System Performance Tuning (which is admittedly a bit dated) and Essential System Administration (this one was recently updated) from O'Rielly and Associates. UNIX System V R4 is covered in good detail in UNIX System V Performance Management. SCO UNIX users can check out SCO Performance Tuning (Miscovich/Simmons, Prentice Hall). There are a variety of places on the Internet where this sort of thing is discussed; you might want to try the newsgroup for your variant of UNIX as a start.
These platforms lack shared memory (so they can't run promon) and are missing most of the good tools users of other operating systems have for monitoring performance. You can look at the Novell monitor to get an idea what’s going on at the OS level there, but the statistics you get when the NLM server is running are less than totally reliable (in particular, I’ve seen the CPU usage go way up even when the NLM wasn’t doing anything and other such strangeness). Usually you will have to resort to benchmarking operations (running long reports is usually a good test for some things, but it doesn’t say anything about record creation or update speed) and seeing before/after performance changes to see if what you altered is effecting run times. Specific hints are for the these operating systems are speckled around this FAQ, you might find something that helps you out further along. None of these are top performing Progress server platforms no matter how well tweaked. The lack of shared memory hurts you in several ways, as does the older code (non-32 bit and such, unlike your average UNIX).
There are a couple of tools to monitor VMS performance at the operating system level. The monitor command has a huge number of command switches to let you check out things like caching, disk efficiency, and CPU usage. The show command will let you examine things like memory and the system processes.
There’s a specific audience I’m trying to write to: intermediate Progress administrators who know their way around, but haven’t earned their guru merit badge yet. If that’s not you, things may be less than totally helpful to you. References to products here are all clarified at the end in the books section.
There are two basic paths to follow when picking up Progress information. The first involves learning all about Progress programming details, while the second deals with administrative details. Most people who work with Progress don’t have the luxury of being purists in either category but instead work both sides of the fence.
Programmers should start with the stuff in the Progress box; the tutorial and programming books are good enough that no one else tries and duplicate the material given there, most of the other things you’ll read assume you already understand all of the stuff in there and want to build on it. I’d next recommend picking up White Star Software’s Making Good Progress. For in-depth details of specific language functions, the line of books from TJD does a good job of explaining things, although last I checked they were a bit dated. After you’ve digested some of this material, if you’re looking for more performance oriented information, get Dan Foreman’s Progress Performance Tuning Guide. It includes a wide variety of programming techniques that improve performance, a topic I don’t even scratch the surface of here. All these books will pay for themselves the first day you find out something that makes some report you’ve had problems with in the past suddenly run twice as fast as it used to.
Programmers who are using some of the automated tools Progress provides (like the UIB or SmartObjects) aren’t going to find much to help them out in any of the books mentioned above--they’re all aimed (so far) that those writing code directly in the 4GL. Some books on those topics are trickling out. There is popular one about Progress V8 available from Que, however it has been sold out and out-of-print for some time.
Progress administrators who concentrate more on that than programming should start with the system administration books that come in the box; again, these are good enough that no one tries to duplicate much of that material. After you’ve absorbed them, the majority of this FAQ should make sense to you. Make sure to check out what’s available to you in the Progress knowledgebase as well, you’ll find tons of answers to common questions. The books from White Star will also be useful to you. The Progress Performance Tuning Guide has tons of good comments on all sorts of things you can do to reorganize your system. Most of the administration topics there are touched upon briefly here, but Dan gives examples, benchmarks, and details about a number of things that are simply mentioned in passing by myself. There’s also a number of sections describing database internals information that you just won’t find anywhere else. Again, all of these will pay for themselves the first time they point something out to you that you didn’t know and make something run bunches faster than it used to. I wish I had all of them available when I first started working with Progress, there’s any number of dead end paths I wouldn’t have had to walk down if I could have read them before.
In an effort to make this FAQ a more useful guide, I've created a diagram (with Visio, an excellent package for this sort of thing) that encapsulates the relationships between most of the things in and related to Progress that impact performance. All the major pieces are labeled with the appropriate section numbers to direct you along.

The most readable introduction I’ve found of how many of these pieces fit together is in a slim Progress document titled "Version 6.3 Documentation Bulletin"; it came with that particular version, and anyone who went straight from V6.2 to V7 missed it. Some of this material is so spread out in later documentation that it’s not nearly as helpful for getting an overview (although you do need to realize that some of the minor details, like parameter names, have changed a bit in V7 and up). I often find myself referring back to it when trying to clearly fix in my mind exactly what is meant by works like "checkpoint".
It's helpful to recognize that there are two major facets to Progress performance problems. The first is having a limit on just how fast records can be accessed. The second is a limit on how fast transactions can be pushed through the system. Each of these deserves its own discussion, along with some information on benchmarking to help you tell which category you fall into.
In order to verify that you have actually improved performance, you need a way to measure that performance. Just about everyone who is concerned with performance sits down and starts to write a simple benchmark that lets them measure how fast their system is.
Here’s how it usually goes. You write a simple program that iterates over some file in your system; usually a simple for-each loop that does some trivial operation, with some elapsed time computations. When you run it, you’ll usually find that what happens is that your CPU usage skyrockets to take over the entire machine, and the performance is totally unaffected by how you’ve configured the system (altering, say, the -B buffers does nothing). The problem with this approach is whatever data you’re looping over ends up getting into the buffer cache on the first run, and after that all you’re doing is testing how fast your computer can move that data around in memory. That’s why the CPU usage skyrockets; without any disk access to slow it down, Progress can and will sift through those records as fast as possible, using up all the CPU time it can get to do so, just like you should want it to.
Obviously, this isn’t a good approach. You can get slightly better results if you pick some monstrous file that won’t fit in the -B cache, but then what happens is that you constantly blow out the buffers and essentially they become useless. At that point, all you’re testing is how fast you can load new buffers off the disk, since the buffer cache always has old data you’re not looping over any more.
In real life, the buffer cache and its associated processes work well because there are bunches of users, each looking at different data. The common data ends up staying in the cache, while infrequently accessed records get pulled into memory and head back to disk when the processing on them is done.
For effective Progress benchmarking, you need to control quite a few factors. First, you obviously need consistency in the parameters used to start all the Progress programs. Second, you need to be consistent in what data is in memory when you start your program. The initial run of any benchmark will result in some information loading into the buffer cache, so later runs will run faster. Besides, the initial run is unrealistic; the vast majority of time programs are run against the database in reality, the buffer cache is already "primed" with some data. Third, you really do need to have multiple processes running at once to realistically assess performance. Consider using the batch facility to start a variety of different types of programs; maybe some that do reports, some that create or delete records, and some that update existing information.
If you’re thinking this is starting to sound complicated, you’re right. You are not going to "throw together" a benchmark that is really of any use at realistically measuring performance in a couple of hours. The last one I did that ended up being useful at all took several days worth of work. Things like controlling transaction scope so that your programs aren’t crippled because they overload the record lock table becomes critical. There seems to be some movement in the Progress community on measuring performance more accurately; check out the offerings that are appearing in the software section at the end of this FAQ for information on the products starting to trickle out that address problems like this.
What most people end up doing is finding a complicated report or two and running them, just to get an approximate figure on how things are running or how they are improving. That’s about as accurate a benchmark as you’ll get without putting some serious time into studying the issues involved and measuring things more accurately.
The reason most simple benchmarks are useless is because all they test is how fast you can access records that are already in memory. Sometimes, though, that’s a very important thing to know; it’s possible that this sort of situation is the performance problem with your system. Usually the only time you’ll run into this is if you have very complicated reports to run. If most of the data for your report can fit into the buffer cache on your system (perhaps because it’s running late at night when it has the whole system to itself), the only limit on how fast processing can happen is how quickly Progress can shuffle memory around and compute the information that appears on the report. The easiest way to tell if this is your problem is to watch the system statistics when the program you’re interested in is running. If the CPU usage is way up, the disk I/O is low, and promon tells you not much is happening to the buffer cache, you’re in a situation where record access speed is the only bottleneck on your system.
If this is your problem, about the only choice you have is to upgrade to a faster CPU. It’s possible that adding more processors to your system might help, but that depends on how well the programs you’re running can utilize the additional processors (if it’s one big report, having more processors to wait around and watch the one execute the one program won’t help).
One thing to remember is that, on a client/server system, much of the record processing is occurring on the client systems. Because of that, it’s much easier for a record access speed bottleneck to appear in that situation. The only real solution is to upgrade the horsepower of the clients with faster CPUs, more data stored locally, and a higher bandwidth network connection to the server. More on that topic appears in the client sections below.
This is the situation almost all sites running self-service clients find themselves (client/server sites usually have their problems in other spots). How fast things flow through the system is limited primarily by how fast transactions can be written to the BI file (more details on that process appear in 3.9) and then synchronized to the database.
Just what’s involved here? Look at the diagram. The self-service clients do their access to the shared memory buffer cache; this happens at for the most part at memory speeds. The updates that they do are organized into transactions, and they go into the BI buffer cache. From there, the BIW writes them out, and then the database block in memory is marked as modified. Those modified blocks get written out by an APW. When an entire BI cluster worth of changes has occurred, the system does a checkpoint, where it forces outs writes of all the modified database buffers.
That process is the major thing that determines how fast a transaction oriented system can go. It’s efficiency is the one of the first things you should investigate. Most of the tools you need are sitting in promon.
Obviously, how big the buffer cache is factors in. If it isn’t large enough, Progress can’t keep enough data in memory to make any of the rest of these processes useful. The broker will be forced to "evict" records from memory back to disk prematurely because it needs more space to deal with new data. You can tell if you have enough buffer space by looking at what portion of the database reads are being cached.
Checkpoints are a very high overhead process, but they only occur when a BI cluster is full. You can lower the overhead of a checkpoint by increasing the BI cluster size. For larger databases, this is the first thing to do after making sure your buffer cache is big enough. Don’t go making them too large. The system freezes as far as the users are concerned when a checkpoint occurs, because no access to the database can happen until the checkpoint processing is finished. As the cluster size increases, the amount of time that checkpoint takes will increase. Make the cluster too big, and the pause will be too lengthy.
Next, you need to have a BIW and APW(s) running for each database. If you have a version before V6.3, or you have a new version but have a Workgroup license, these are not options. V6.2 and earlier systems can have serious problems with the checkpoint freezes mentioned above. By writing changed BI and database buffers out constantly, the BIW and APW processes make the checkpoints take a minimum of time. That was the reason these processes were introduced. If you’re on a Workgroup system, you are essentially back to V6.2 performance, with its associated problems.
If you can start a BIW, do so, and make sure you’re specifying an appropriate size for its buffer cache. If you’re using AI, start an AIW and make sure it’s buffered correctly as well. Start one or two APWs. You monitor the efficiency of all these processes with promon. The BIW and AIW have "buffer wait" statistics that should be at 0 if they are properly configured, and looking at them should show that they wrote a number of buffers out. The "writes by APW" figure should be as close to 100% as possible. Hopefully you have a V7 system that tunes APWs automatically; if not, you’ll have to do some unpleasant tuning on the frequency with which they run.
After you get all that configured correctly, the limit should be how fast your OS can deal with the disk drives. If you database is large (or if you’re using an OS that has a single-process limitation), you may find a significant performance improvement by splitting the database into a multiple volume one to lower file system overhead. Also, having multiple extents for your BI file can help in some circumstances, as it can ease the overhead of BI file maintenance.
Next, you’ll need to examine the disks on your system to see how much traffic they get. If it’s more than they are keeping up with, you’ll need to get more of them and intelligently split things up (putting high-activity areas like the BI and temporary files onto another disk, for example). If you already have things split to many disks, examine what things you can split up further; for example, if your temp file disk is overloaded, you can add another one, since the location of temporary file space can be modified on a user to user basis. If you partition things up well and the system still can’t deal with the disks properly, consider adding more disk controllers to spread the load out.
Even though I/O performance is what has been focused on here, you will need to have sufficient CPU power to keep up. Consider adding more processors. If you have multiple processors, make sure you are tuning the spin parameter to properly utilize them.
If you’ve gone through this checklist, and everything seems tuned fine, you may need to consider redesigning parts of the application you are using if things still aren’t performing well. If you haven’t done so, a review of the indexes and code by someone familiar with performance problems is certainly in order before you spend too much cash on hardware.
Before you start asking about parameters to alter or other such things that usually give relatively minor improvements, you should first understand the big architecture choices that can really impact performance.
Progress is continually improving its product, and usually new releases have better performance than the ones before them. You should also realize that in a client/server environment it is possible to use a different version for the client program then what is running on the server (in fact, it's often mandatory, because Progress releases for clients tend to come out before the ones for servers). It is possible to have a V6 database accessed by clients running V7. Progress supports a one version difference between client and server (you're not supposed to be able to connect to a V6 server with a V8 client). And there's no backward compatibility for the clients--V6 clients can't talk to a V7 database server.
There are still lots of 6.2 databases out there. This and earlier versions are much improved upon by versions 6.3 and up. If your database is running too slow and you've got something before 6.3, you should probably upgrade.
6.3 is the easiest version to upgrade to for those running V5 or earlier V6 releases, as the differences between the releases are fairly minor. You'll probably get away with running the proutil conversion program and recompiling your source code (the 6.2 to 6.3 update doesn’t require a recompile and is even reversible if you have problems). There are major performance enhancements in 6.3, including the addition of writer processes and support for multiple processors; that is, if you have the Enterprise version (most do, as the Workgroup/Enterprise distinction came long after most 6.3 installations).
The various V7 releases are have a large amount of improvement over the past versions.from Progress. The performance improvements can be quite significant; in particular, client-server performance is greatly improved. The indexes used for files are compressed, making their use much more efficient. This happens whether you use the conv67 utility (after compression, empty space is left in the database where the old index blocks used to be) or a dump/reload (database will reduce in size) to upgrade; . Source code is usually ported over to this version without too many problems, but ultimately you may need to rewrite quite a bit of your application to get the full benefit of the new features in V7. You’ll get most of the performance improvements automatically, but the new GUI features will require new code. V7 also releases the 63K limit on .r code, in itself a reason for many people to upgrade (section 3.10 has more information on this topic). The maximum number of extents on a database has been increased from 100 to 256. One minor downside is that the all the new stuff in V7 usually results in things compiling slower. The big drawback is that all these new capabilities require more memory, and it's managed differently. These increased memory requirements make many sites that were correctly equipped for V6 run into problems with running out of physical memory after upgrading, resulting in more system paging.
V7.3 introduced a number of enhancements, the most important of which was the automatic tuning of writer processes.
The performance improvements introduced in V8.0 of Progress are not as dramatic as those added in V7. One component that was revised was the way client/server transactions are controlled. A 4GL construct called field lists lets you control which of the fields of a record are passed to your client. This facility has the potential to greatly reduce the amount of network traffic that is passed between the server and client. I’ve been told that you can use field lists with V8 clients if you server is running 7.3C or later; supposedly, that’s the version support for that feature was added. One other potential performance enhancement is the inclusion of the BI grow utility, used to add and format BI clusters easily.
V8.1 introduced application servers which allow more complicated query programs to execute directly on the server and avoid some of the overhead of C/S connections. Applications using the AppServer can minimize network bandwidth performance problems.
This version of Progress adds a variety of useful performance parameters. It’s now possible to make a database with an internal block size that’s larger (this is not to be confused with the file system block size). Using an 8K block size can cut down on the amount of indirection inside the database itself. It will also allow better index compression. You get the bigger block sizes by using a different empty database.
V8.2 also includes a documented binary dump and load, which doesn’t use the same mechanism as the undocumented version (which is still available). The documented version is slower because it uses a safer access mechanism, and it also benefits from the usual performance enhancing parameters (-i, -B, etc.).
This is the latest version of Progress available. It adds a variety of useful performance parameters, especially for larger Progress sites. Enterprise database enhancements improve application scalability, raising the user limit from 2,000 concurrent connections to 4,000 concurrent connections, and adding multiple semaphore sets. There is also a new 32-bit report engine and a 32-bit version of the report builder product. There were also some new VST tables added for retrieving performance data from promon directly from 4GL code.
This next version of Progress currently being developed will add many database performance enhancements including separate table spaces.
If you update your version of Progress, you can usually convert your old database with the proutil utility (check the documentation for proutil to see what the correct -C parameter is for your particular conversion, it will be something like conv67). Don't even think of trying this one out without a backup.
Something to watch out for is the changes that occur in parameter names and defaults when switching between versions. For example, you use the -e parameter in V6 for environment size, but it’s -mmax in V7 (and, although basically the same function, there are subtle differences between what exactly happens behind the scenes with that buffer between the versions).
One of the new performance features of V7 is the reorganized, compressed indexes; if you have many indexes, you may find that your database could take up considerably less space with V7 (a 50% reduction is not unheard of). But if you just use the conv67 utility, you don’t get that space back--it becomes free space that will get used eventually as the database expands. You’ll have to do a dump and reload on the database to reclaim the space fully (and to reduce the amount of scatter on your data that results from this process).
In an attempt to reorganize and simplify their pricing structure, Progress has created two tiers of Progress database servers. The Enterprise version is the full product with all the performance enhancing features. The Workgroup version is stripped down by not including any of the APW/BIW/AIW processes and not having improved multiple processor support via the -spin parameter (effectively returning performance of the database to the same level it used to be before V6.3). There is also no support for raw disk usage. Accordingly, the Workgroup server is only recommended for smaller systems with the database considerably less than 1GB, a single processor, and less than 30 users (there is supposedly a hard-coded limit of 65 users). Users who are currently running V6.3-V7.3 databases can contact Progress in order to upgrade to an equivalent product under the reorganized scheme.
In later versions of Progress, Workgroup products also don’t include some of the new performance enhancements. There’s no support for quiet points or variable database block sizes.
Most of the high-performance platforms that Progress runs on support shared memory (including most UNIX systems, VMS, NT, and OS/2). There are three major advantages to using shared memory. First, you can use utilities like promon to monitor your database in these environments. Second, starting with V6.3, you can run writer processes that improve the handling of updates to your database. Third, you can have clients that directly connect to and do their own access to the database; this will greatly improve performance.
Usually you don't have a choice of what connection mode you use; it's forced on your by your server and network configuration.
Single user mode is needed to update the schema of the database, the most common use. One big advantage of this mode is that record locking concerns disappear--since there is no one else using the database that can lock records, that whole mechanism is eliminated. Because of this, there are often system utilities distributed by the manufacturer of a database that are designed to only be run in single user mode. Trying to run them in any other mode will overflow the database lock table (you can get around this by greatly increasing the -L parameter in client startup, but this steals lots of memory from that would be better served elsewhere, like -B).
Because you lose the efficiencies of the writer processes that group transactions into blocks and lower overhead, single user mode may be slower than running a single client processes against a server if there are many updates to the database involved (that is, if you’ve got at least V6.3 where the writer processes were introduced).
On shared memory systems, the best multi-user way to access the database is with a self servicing client. This is done by directly connecting to the system the database server is on and running Progress from there. Telnet is the common method for UNIX systems to get into a system for this purpose. Note that if your clients are using the -H or -S parameters, they are not connecting directly even if they are running on the server; they are still connecting as separate clients and suffer much of the performance degradation of a full client/server connection.
Client/server systems are not the best performing choice for most Progress systems. The overhead of sending records to the clients over the network makes performance worse than a self servicing client running on the server. How much worse depends on several factors. Under V6, the performance difference can be as bad as 10x worse with client-server configurations. Improvements in V7 lower this to around 3x as bad. Two major factors are involved: not being able to use shared memory and the overhead of putting the data into packets and waiting for the network card to processes them and send them across the network.
The impact can be minimized by proper database and application design, to reduce the amount of data sent over the network. Designing your queries so that the server can select the appropriate records instead of sending them all over the wire for the client to sort through is particularly important. You may want to investigate making as much data as possible available on the local client if you have a poor performing C/S application (see the section on use a core database below, the core database might be a good choice for local data if it doesn't change often).
If you have a GUI application, you usually don't have any choice but to use a client-server configuration. In that case, performance tuning gets quite complicated. You have to tune the server, the clients, and the network connecting them; bottlenecks on any of the three can kill your system's performance.
Note that if your server supports high performance direct self-service users, you can still use them even if the majority of your users are client/server. Just run Progress without the -H and -S parameters on the command line while on the server, and your clients are self-service, even if the server was started with a networked client in mind. You will need to start multiple server processes in order to support a configuration like this, but the additional ones will require minimal resources. The direct connections can be useful for less interactive programs (reports, batch updates, etc.) even if the application was designed with a GUI. This sort of configuration is an excellent idea for upgrading NetWare NLM systems--you can keep the same clients, and they won't notice any difference between the server on the NLM and a separate (probably UNIX) server, after everything is configured correctly you only need to change a parameter or two on the command line (or .pf file) for the clients. You always have the option of using the high-performance self-service sessions by using Telnet into the new server for more intense programs. There is a -da parameter you can use on client sessions to force direct access, say if the .pf file you’re using has -H and -S in it by default.
Another factor to consider is that a client/server will not require anywhere near as much power in the server itself as far as the CPU goes; given sufficient network bandwidth and good disk performance, you should be able to handle more clients in a client/server environment than a self-service one. They will use less memory on the server as well.
Queries involving multiple related database files/tables in a C/S environment will send all appropriate records from each of the related files to the client and let the client join them together (there is a knowledgebase entry on how C/S joins are performed). This means that joins will send a whole lot of information over the network. The advantage of this technique is that you can do multiple table joins where the tables in different databases correctly; most database systems don’t allow joins across multiple databases.
Usually these two terms are used interchangeably here (with server being the more common), but there is a fine distinction between them on shared memory systems. The database broker is the process that sits between the physical database on the disk and other parts of the system while also coordinating the different users of the database. If you’re running self-service clients, the broker is all that sits between them and the database (with shared memory to pass information among the group). Normally, a self-service client reads from and writes directly to the database buffer pool itself. The broker can become involved with the reading itself if locked transactions are present. The broker handles flushing database updates out the disk during a checkpoint, although if an APW is running it will be writing those changed blocks out as well. The broker handles buffered writes to the BI file if you don’t have a BIW process running (unless you have the -Mf parameter set to 0 and there is no buffering on those writes).
A database server is a process that takes requests from Progress clients that aren’t self-service (usually in the form of network packets) and, as the server is a self-service client itself, deals with the database buffers for them.
The reason this seems complicated is that both the broker and server jobs are handled by the same Progress service (the one started with proserve or the like); you don’t see the seam between them. Clients started with -H and -S go through the server to the broker, self-service clients go right to the broker’s shared memory.
Also muddling things is that this distinction only became apparent in recent versions of Progress (I believe this was part of the enhancement to C/S performance first introduced with V7). Before that, everything went through the server, anyway (the server itself was actually the process that handled the use of shared memory).
Progress supports two major graphical types of clients, Windows ones (starting with V7) and X-Windows ones (in V7 only). Starting with V7, the Progress 4GL has commands to support event-driven applications that fully utilize these platforms for clients. You can still run the same programs in character modes if they were written correctly, however. Much is going on lately with layered applications in Progress that keep the user interface code separate from the main application logic.
The GUI code you can write with Progress is often less than totally portable between Windows and X clients.
This is rapidly becoming the most popular Progress client, and lately is the first platform new releases of Progress appear on. Tuning your Windows client is it's own adventure. You should put as much RAM as possible in it. I've gotten by with as little as 4MB, but 8MB works far better, 16MB is recommended, and 32MB is not unreasonably much. If you’ve got anything less than 32MB you should consider setting up a well configured swap file. Fast disk drives are a good idea; using a version of Windows (like Windows for Workgroups or Windows 95/98/NT) that supports 32-bit disk and file access can improve performance greatly at the disk end; as a bonus, you can get a free Microsoft TCP/IP client that is pretty good. Put as many files as possible on the client (temporary files and .r code) to avoid needing to deal with them over the network. If you find the screens don't pop up as quickly as you'd like or programs that display lots of text are slow, you should consider upgrading to a video card with built in Windows acceleration.
There’s also any number of things that are usually only considered at the server level that become important as you start working with a busy Windows client. You might move the temporary files to a RAM disk. You might add more drives and spread the disk load around. You might switch to SCSI disks to better support the I/O load.
Another option you might want to consider is to move to Windows NT clients, which are often far more reliable and better performing with Progress. You will definitely need to get more memory for this option, however; with NT, 16MB is a minimum, 32MB is recommended, and 64MB is not too much.
The biggest concern here is that the client computer will not only have to keep up with running the Progress client, but will also have to be running the X-server as well. The overhead on the database server to send information to the clients is said to be minimal by those who are running such a setup. I don't have much information myself on what the issues are for your average X workstation. Some like them because X clients tend to be more stable than Windows clients (like I often say, it doesn't matter how fast it might run at if it doesn’t work). Support for Motif clients ended with the release of V8.
How your database is designed is ultimately what determines how fast your applications will run, regardless of how well tuned everything else is. Often, you don't have any control over this, so it isn't discussed much, but changes made at the design level can greatly improve (or destroy) performance.
As your database gets bigger, getting things out of it (or creating new ones) gets slower. A growing database will ultimately require more RAM for buffering then it did when smaller. You can make a lot of mistakes in your database and computer implementation that will be unnoticeable if your database is 10MB; let it grow to 100MB and you'll strongly regret them. There are people out there with 1GB (and bigger) Progress databases--but you really need to have your database optimized for accessing this much data, and you better have great (and properly designed/configured) hardware to pull it off. It's impossible to put exact numbers on what database growth will do to your performance, but it's guaranteed that it will get slower; you'll have to compensate by improving things if you want to keep your level of performance constant.
The design criteria become even more critical when dealing with large databases. It's trivially easy to totally blow out all the data in the -B buffer cache and beat the drives to death by running an unindexed query on a big database file. If this is a danger, you might want to investigate using the parameters to give users private database buffers--that way, they can keep the data they are working with around even if the main cache clears because of a bad query.
If you need to get statistics on a database such as how big each of the files are, you can run the proutil program with a variety of options (like dbanalys) to get them; there are many more of these informational utilities available with V7.
Normal forms are organizational methods from database theory that tell you how to split your data up into files/tables; a database that follows them is said to be normalized. For example, if you have a list of customers for your company who all place orders, one of the normal form rules would tell you that the address for the customer should be stored in a separate file from the order file. That way, you can update that address in one place and not have to alter every order. Like any guideline, you shouldn't be a slave to normal forms. For the order example, many applications have the customer's address in a customer file, and another address spot in each order as well. The address for the order defaults to the customer's address, but you can change it. This is essential for some businesses that occasionally deliver to places other then the customer's regular address. There are ways to address this problem while staying within normal form rules (for example, let each customer have multiple address that are stored in a customer address file separate from the customer and order), but the point is that there are occasionally business idiosyncrasies that don't quite seem to mesh well with strict normal form design. In the vast majority of cases, however, proper thought put into database normalization is time well spent and quite a good idea, especially in keeping maintenance easier.
As far as performance goes, a normalized database may or may not be faster than an unnormalized one. For our customer/order example, running an order report that needs the address on it might run faster on the unnormalized database because you only need to access one file (order); the normalized one needs to access customer and order (while managing the relations between them). Depending on the selection and break criteria used, the normalized one could result in far fewer records accessed than the unnormalized one, and run faster; it's impossible to say in the general case. Normalization is a difficult topic to get concrete answers on.
For more information on normal forms and database normalization, check the Progress manuals first (V7 in particular includes an excellent section in the Database Design Guide). White Star’s Programmer’s Progress has a good section on the topic with Progress examples. Any good book on database theory will discuss them in extensive theoretical detail should you want further information.
The key to getting good query performance out of Progress it to pick your index fields correctly. Adding an index on the field you are searching by or sorting by can radically improve the speed your reports run at, if the program is written correctly so that it can use it (look at the output from the detailed listings that you can get from the compile command to see what indexes your program is using). Understand, however, that each of those indexes needs to be updated whenever a record is inserted or deleted from the database. More indexes help reports, less indexes help record updating. The proper number is a delicate balance of these two.
It's also impossible in many cases to create the number of index required to address every possible report users might run. If there are 10 fields users routinely sort on, there are 10! possible index combinations needed to handle every one of them; no way can you have that many, so you have to pick the more important ones to include. You'd need 90 just to get the top two levels, also not practical. If you've got a really general report that can run against any of the fields of the database, you're almost certainly going to end up with a few break by clauses that don't find any appropriate index to use. Version 7 has introduced multi-level indexes that address this concern. Also, the V7 server knows how to use multiple indexes at once to resolve a query.
One trick often used is to dump the data to a file and use operating system calls to sort the data external to Progress. Depending on how complicated your criteria are you may get better performance, but at the total loss of portability (UNIX fans who prefer writing shell scripts to Progress code always mention this possibility).
Another factor is the data types used for the file relations. Lookups run much faster using integers for the key fields that are shared than characters. Integers that hold recids are even better, but become problematic when doing dump and reloads; it's possible to provide recid pointers that are redundant, so that they can be used for lookups and regenerated using the extra information after a dump/reload. Examples of using this technique are given in the section labeled "recid joins" in Dan Foreman’s Progress Performance Tuning Guide.
Getting good report speed is a big topic; it is addressed in the Progress manuals in depth (look up "bracketing" for a discussion of the record selection method). There are also literally books written about just this subject from some of the book sources listed later. I’m most fond of White Star’s Programmer’s Progress/What do I do now that I have V7? and Dan Foreman’s Progress Performance Tuning Guide for this information, both discuss index usage extensively. White Star also has a book devoted to speeding up reports. You can tell your report has problems if it takes the same amount of time to run now matter how little you are trying to get it to do (for example, the report takes just as long to process one day worth of data as it does an entire month). These are the cases where the lack of information Progress has about your data results in a run time dominated by how long it takes to sort through your data just trying to find the necessary records.
It is most efficient if you can keep the size of the individual records of a file/table less than the block size of the system you are on (usually 1024 bytes, 2048 on some platforms like VMS and Sequent). Progress doesn't ensure that records larger than a single block are kept in the same area of the database, it is theoretically possible that a single record could have its first block at the beginning of the database while the remainder was stored at the end; this introduces large waits by the disk drives to seek around the database just to read one record.
You can compute the size of your records using information in the Progress manuals (there is a table that shows how big characters, integers, etc. are). Note that there is approximately a 20% overhead per record used by Progress for internal information, limiting the actual size you can use to about 80% of your system block size.
Because performance slows as databases get bigger, one solution is to make your databases smaller again. A popular method for doing this is to take the more static core files/tables from your database and put them in one database (parameters and customer information are often good choices), while putting the rapidly changing data (like orders, shipping data, or accounting entries that disappear at the end of the month) in another. You an connect to each of these databases either by including parameters to use them all when starting your Progress session or with the connect command in a 4GL program.
By doing this, you can tune the parts of your database more specifically. More active parts of the database can get put on faster disks, their own -B cache, their own BI files. The read-only database doesn't use valuable time dealing with BI files or resources to support database updates. The trade-off is that there is far more administration involved, there is more overhead involved with starting everything up, and you may end up using more resources (like files handles and semaphores) because you are starting more processes up to handle the additional databases. Also see the section below on multiple connected databases for some other comments.
Dan Foreman gives examples of configurations for this sort of setup in his Progress Performance Tuning Guide, where he also warns that also connecting to a local database violates the usual license for the client networking package, so investigate at your own legal peril.
Usually, in any situation where multiple databases are connected, it is recommended to enable two-phase commit to ensure database integrity across databases. You'll also have potential problems backing up the databases, as on-line backups from the databases can have transactions that span two or more databases that get backed up separately (this isn't a problem with strictly a read-only core and one database, but gets hairy if you split the production information in the database up further).
Some of the tools you might want to use to connect to your databases might not work with multiple, related databases. For example, at this moment most Progress ODBC drivers don’t support it; Report Builder doesn’t support it either.
This is much too big of a topic to properly address here. There are some easy things to do. Make sure every variable that is not associated with database transactions gets declared no-undo. Use no-lock on any database access you can. Always group statements into assign blocks whenever possible. Writing status information to the screen or disk as infrequently as possible (many a program has been found to have most of its time spent displaying counts and such to the screen). Avoid recursion and its stack overhead by placing the items in an array, workfile, or temp table. Try not to escape to the operating system, because this results in a big chunk of overhead.
For V7, use internal and persistent procedures instead of the older separate ones; also consider using temp tables instead of workfiles.
One very large issue to consider is how big the transactions generated in your application are. It’s fairly easy to write 4GL programs that strangle themselves with record locking issues (using up entries in the lock table and hampering performance). Also, transactions require more BI file activity, and this is the major bottleneck for most sites. Keeping transactions controlled is definitely a skill worth learning.
Make sure you obtain Gus Bjorklund’s paper on optimizing 4GL queries (available from http://www.peg.com). One thing to note is the find statement has fallen out of favor with later versions of Progress. Due to backward compatibility constraints, it doesn’t include many of the performance enhancements (like multiple index capability). New programs should be using queries instead.
Many of the things in the server configuration set themselves based on how many users you specify for the database with the -n parameter. You’ll still have to check a number of other parameters; some are discussed here, also check the subsections below on this topic.
Each of the record locks on a database uses an entry in the Progress lock table. Unless you get an error saying the lock table has overflowed, you don’t need to worry about this. There is some memory used by each lock table entry. You set lock table size with the -L parameter.
A variety of parameters that affected performance in earlier versions (like -Mr -stsh -pwscan -pwsdelay -pwwmax) have become largely automatic on V7.
The increasing -B buffer size parameter on the server is probably the easiest way to improve the performance of your system with minimal changes. The default value for the parameter is ludicrously small for any reasonably large database. The units of -B are the block size for your database, either 1K, 2K, 4K or 8K. The default value is a multiple of the number of users specified with -n. These buffers should get the majority of the RAM on your system (if Progress is the main application on the server, that is); it is not unusual at all for -B to get one-third, one-half, or even more of the total server RAM. It is better to give the memory to Progress to manage instead of leaving it for the operating system to use, as Progress's paging algorithms for determining what to swap out of the buffers are tuned better for Progress databases.
To see if you've allocated enough buffers, run promon and check the Buffer Hits %; you definitely want it to be over 95% while your application is running. You should increase -B rapidly until you hit this figure, then back off more slowly until the rate drops noticeably. That way you’ll get the best performance without wasting memory by setting it too high.
The database buffers are stored in a hash table, so as you increase the size of -B, there is a corresponding hashing figure that you may need to increase. The -hash parameter should be set to the first prime number greater than the # of buffers / 4 for most cases; this means that the chains of buffers that need to be searched to find your data average less than 4 entries deep. Some claim that increasing -hash to be the first prime number greater than -B, providing one hash table entry for each buffer, can improve performance further. One thing to be aware of: versions of Progress before 6.3 didn't support increasing the hash tables for the buffers above a very high number. This effectively limited the useful range of -B to an upper limit of 32,000 (32MB on most systems). Increases above this aren't effective in improving performance on these older systems. Progress claims that you don’t need to adjust -hash at all with versions after 6.2, that it will take care of itself as long as -B is less than 500,000 (the system has a lookup table of prime numbers that it uses to calculate -hash in these versions, and the maximum entry in that table is 98,407). If you want to see how Progress is allocating hash buffers for the buffer size you specify, you can look at that by using the R&D promon.
Also beware that you may have to increase some of the system parameters (the kernel under UNIX) to support big increases in -B (the maximum amount of memory a single process can use in particular is often limited).
There is a chapter in the Progress System Administration manual that gives specific details on the operation of server when running on shared memory systems, it is very good reading to give a good idea of just how the parameters you set alter operations.
Most of the client parameters you might want to change are discussed in other sections; for example, -q is discussed in the section on .r code and -RO is discussed in its own section.
In V6, the -e parameter controls how big the edit buffer size is. This is where all the .r code is loaded into and where the procedure editor programs get put at; as the buffer fills, the information is swapped out to a temporary file. It has to hold all the parts of the client process data.
Superseding -e in V7 is the -mmax parameter ("initial amount of memory allocated for r-code segments" as the manual puts it). It controls how much memory Progress can allocate for the edit buffer, workfiles, and some other data structures. This much memory isn’t automatically used, it’s more of an upper limit on the size of the buffer (which is dynamically managed as programs run). Since .r code is usually larger in V7, you’ll have to run -mmax higher than you set -e in V6, especially if you are using lots of workfiles which now go in that space as well.
Each of the client sessions get its own local buffer space that is used for things like workfiles in V6. If you don’t have enough, Progress with crash with an error message to that effect. You set this size with -l. Note that on many smaller platforms (like DOS) -l is limited to 63K.
With V7, the only way to limit that amount of memory a Progress session can use up is to adjust the kernel MAXUPROC parameter--unlike V6, where the combination of parameters like -e and -l let you figure out an upper limit.
Index cursors are used by each FOR EACH or FIND that uses an index. You’ll get an error if the -c parameter used to control the number of these allocated is too low.
You can set the size of the stack the clients start with -s. Presumably this also generates an error if set too low, but I’ve never seen it.
It is possible to improve performance in some environments by increasing the values for the -TM and -TB parameters. Typically, these parameters are only considered by those maintenance procedures like an index rebuild. You may find a performance improvement using them on client sessions as well, especially if report data you’re sorting through is not indexed.
With shared memory systems, it’s possible to have private database buffers that work like the main buffer cache, but belong to the one process. There are allocated with the -O parameter.
Fo