Comparing and contrasting the runtime error detection technologies used in HeapAgent™ 3.0, Purify NT® 4.0, and BoundsChecker Pro™ 4.0

HeapAgent, Purify, and BoundsChecker Pro use three very different techniques for detecting runtime errors. The error detection breadth, compile/link intrusiveness, and runtime performance of each product are direct results of the respective techniques used and dictate where in the development life cycle each product is appropriate:

HeapAgent

HeapAgent uses heap replacement/instrumentation and stack instrumentation. HeapAgent replaces your application's heap manager with a debugging implementation that instruments the individual allocations to detect invalid operations performed on the heap, regardless of which code is responsible for the error. HeapAgent also instruments the stack and checks for invalid stack operations.

Error detection breadth: HeapAgent detects heap and stack memory errors only; it does not detect static memory errors.
Compile/link intrusiveness: HeapAgent automatically launches itself and begins checking on debug builds without a recompile or relink.
Runtime performance: HeapAgent takes advantage of idle CPU cycles to perform most checking in a background thread and, therefore, incurs negligible runtime slowdowns. In the section "Runtime tests," later in this paper, we describe tests that we ran with a small, independently developed application. In those tests, HeapAgent ran in the same 4 seconds as the unchecked debug build (1x).

Purify NT

Purify NT uses object code insertion (OCI). This technique translates your application's object files, inserting checking instructions between your app's instructions. Only those object files that are instrumented are checked.

Error detection breadth: Like HeapAgent, OCI detects only heap and stack memory errors; OCI does not detect static memory errors.
Compile/link intrusiveness: Each time an app is to be checked with Purify, the app must first be object-code instrumented.
Runtime performance: On the small test app we used, Purify NT caused an 11x runtime degradation (44 seconds vs. 4 seconds).

BoundsChecker Pro

BoundsChecker Pro uses compile-time instrumentation (CTI) to instrument your source code. This technique uses a pre-processor to insert checking statements between your application's code statements. Only those source files that are recompiled with instrumentation are checked.

BoundsChecker Pro includes BoundsChecker Standard Edition. Because this paper assesses memory error detection and because BoundsChecker Standard detects so few memory errors, we have not done an in-depth analysis of the details of its "malloc wrapper" technology. For more information, see the detailed chart at "Comparison of memory errors detected."

Error detection breadth: In addition to detecting heap and stack errors, BoundsChecker Pro with CTI also detects errors in the use of static memory.
Compile/link intrusiveness: Each time an app is to be checked with BoundsChecker Pro with CTI, the app must first be instrumented during a special compile. For small apps, the added compile time is minimal; however, compile time goes up exponentially with source file size. Our recompile of MFC went from about 23 minutes to about 101 minutes.
Runtime performance: On the small test app we used, BoundsChecker Pro's CTI caused a 158x runtime degradation (10 minutes and 35 seconds vs. 4 seconds). This was at the "Quick" setting, which provides minimal memory error detection. At Normal and Maximum settings, the test ran 236x and 994x slower, respectively (15 minutes, 45 seconds for Normal and 66 minutes, 18 seconds for Maximum).

Note: BoundsChecker Pro and Standard also validate/detect a variety of non-memory-related errors including Windows and OLE API errors and leaks, and resource leaks. In addition, BoundsChecker Pro, by instrumenting your app with CTI, can detect invalid pointer manipulations and source code logic errors.

Summary

HeapAgent doesn't require a special build, incurs negligible runtime penalty, and automatically begins checking every time the application is run or debugged. As a result, HeapAgent is most valuable during development, where it catches the nastiest bugs, heap and stack errors, immediately after they're introduced, when they are easiest to fix.

Purify detects the same classes of errors as HeapAgent, but requires a special build and incurs a significant runtime penalty. As a result, Purify is used less frequently during development and more during testing.

The error detection of BoundsChecker Pro with CTI is the broadest, but its special compile and often-debilitating runtime performance make it useful almost exclusively during testing.

How HeapAgent works

HeapAgent re-routes your application's memory allocation calls through a debugging heap implementation that "instruments" the heap with special fill values (guard fill, free fill, and in-use fill). HeapAgent validates each heap-related call and continually scans the heap itself, in a background thread, looking for corruption. HeapAgent also instruments the stack with fill values and checks for stack overwrites during function return. By watching the heap and validating all heap calls, HeapAgent detects all heap-related errors, including those caused by system and third-party DLLs. By watching the stack during function calls, HeapAgent detects many stack errors as well.

How heap replacement/instrumentation and stack instrumentation work

HeapAgent uses heap replacement, heap- and stack-memory instrumentation, heap-call validation, and deferred freeing to detect heap and stack errors. HeapAgent's technology should not be confused with the simple "malloc wrapper" techniques used by many home-grown and low-end heap-checking tools such as BoundsChecker Standard Edition. Neither HeapAgent's performance nor its comprehensive error detection can be achieved without implementing the heap itself.

Diagram 1, below, shows how heap replacement/instrumentation works.

Heap- and stack-memory instrumentation

HeapAgent instruments the heap and stack with special fill values (guard fill, free fill, and in-use fill). Thanks to these fill values, HeapAgent always knows how the heap and stack should look and can check the heap and stack themselves for corruption caused by overwrites. This technique detects overwrites regardless of the code that caused them, for example, the application code, a system DLL, or a third-party DLL.

Instrumenting the heap and stack also lets HeapAgent catch invalid references to previously freed or uninitialized memory. Because HeapAgent's free and in-use fill patterns are intentionally invalid addresses, the processor generates an exception fault if your application tries to dereference freed or uninitialized memory.

Finally, because it implements the heap, HeapAgent can identify, without question, all of the allocations that haven't been freed.
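
The fill-value idea can be illustrated with a deliberately simplified sketch. It is not HeapAgent's implementation: it sits on top of malloc (exactly the kind of wrapper the paper says HeapAgent is not), and the guard and in-use fill values, the guard size, and the names debug_alloc, guards_intact, and debug_free are all assumptions made for illustration. It shows only how known fill bytes around an allocation reveal an overwrite.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical values: the paper documents only the free fill (0xDD).
constexpr unsigned char kGuardFill = 0xFC;
constexpr unsigned char kInUseFill = 0xEB;
constexpr std::size_t   kGuardSize = 8;

// Block layout: [requested size][front guard][user data][rear guard]
void* debug_alloc(std::size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(std::size_t) + 2 * kGuardSize + size));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof size);               // record the requested size
    unsigned char* front = raw + sizeof(std::size_t);
    unsigned char* user  = front + kGuardSize;
    std::memset(front, kGuardFill, kGuardSize);         // guard before the data
    std::memset(user, kInUseFill, size);                // in-use fill marks uninitialized data
    std::memset(user + size, kGuardFill, kGuardSize);   // guard after the data
    return user;
}

// True if both guards still hold the guard fill.
bool guards_intact(const void* p) {
    const unsigned char* user  = static_cast<const unsigned char*>(p);
    const unsigned char* front = user - kGuardSize;
    std::size_t size;
    std::memcpy(&size, front - sizeof(std::size_t), sizeof size);
    for (std::size_t i = 0; i < kGuardSize; ++i)
        if (front[i] != kGuardFill || user[size + i] != kGuardFill)
            return false;
    return true;
}

void debug_free(void* p) {
    std::free(static_cast<unsigned char*>(p) - kGuardSize - sizeof(std::size_t));
}

// Usage: an off-by-one string copy lands in the rear guard and is detected.
bool demo_detects_overwrite() {
    char* str = static_cast<char*>(debug_alloc(11));
    bool clean_before = guards_intact(str);
    std::strcpy(str, "Joe Schmoe!");   // 12 bytes with the NUL: one byte too many
    bool clean_after = guards_intact(str);
    debug_free(str);
    return clean_before && !clean_after;
}
```

Because the check looks only at the bytes in the heap, it does not matter which module performed the bad write, which is the property the text above attributes to heap instrumentation.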

Heap-call validation

On every call to allocate or free memory, HeapAgent validates the parameters and examines the state of the allocation to verify that this use of memory is legal. HeapAgent immediately detects and reports all invalid or failing memory allocation or deallocation calls.
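
As a rough illustration of call validation, the sketch below tracks which pointers are currently live so that a free of an already-freed or never-allocated pointer can be reported instead of corrupting the heap. The class ValidatingHeap and the FreeStatus values are hypothetical names; a heap that owns its own data structures, as the paper describes, would consult per-allocation state rather than side tables like these.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

enum class FreeStatus { Ok, DoubleFree, UnknownPointer };

// Illustrative only: validates free calls against the set of live allocations.
class ValidatingHeap {
public:
    void* allocate(std::size_t size) {
        void* p = std::malloc(size);
        if (p) {
            freed_.erase(p);     // the address may be a recycled one
            live_.insert(p);
        }
        return p;
    }

    FreeStatus release(void* p) {
        if (live_.erase(p) == 1) {          // legal free of a live allocation
            freed_.insert(p);
            std::free(p);
            return FreeStatus::Ok;
        }
        // Not live: either freed earlier or never returned by allocate().
        return freed_.count(p) ? FreeStatus::DoubleFree
                               : FreeStatus::UnknownPointer;
    }

private:
    std::unordered_set<void*> live_;
    std::unordered_set<void*> freed_;
};
```

In this sketch an invalid call is reported at the moment it is made, and the underlying free is never executed for it.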

Deferred freeing

HeapAgent uses deferred freeing to temporarily delay the recycling of memory. When memory is freed, the freed memory is filled with the free-fill value, but it isn't actually freed. Instead, it's placed in a queue of defer-freed allocations.

Deferred freeing lets HeapAgent detect invalid reads, writes, and frees of previously freed memory that would have gone undetected if the memory had been returned to the heap and reused.
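
A minimal sketch of deferred freeing, under stated assumptions: the queue capacity, the class name DeferredFreeQueue, and the requirement that the caller pass the block size are all inventions for illustration (a real debugging heap would read the size from its own header). The 0xDD free fill is the value given in this paper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <deque>

constexpr unsigned char kFreeFill = 0xDD;   // free-fill value cited in the paper

class DeferredFreeQueue {
public:
    explicit DeferredFreeQueue(std::size_t capacity) : capacity_(capacity) {}
    DeferredFreeQueue(const DeferredFreeQueue&) = delete;
    DeferredFreeQueue& operator=(const DeferredFreeQueue&) = delete;
    ~DeferredFreeQueue() { while (!queue_.empty()) recycle_oldest(); }

    // "Free" a block: fill it, park it in the queue, recycle the oldest entry if full.
    void defer_free(void* p, std::size_t size) {
        unsigned char* bytes = static_cast<unsigned char*>(p);
        std::memset(bytes, kFreeFill, size);
        queue_.push_back(Entry{bytes, size});
        if (queue_.size() > capacity_) recycle_oldest();
    }

    std::size_t late_writes() const { return late_writes_; }

private:
    struct Entry { unsigned char* ptr; std::size_t size; };

    void recycle_oldest() {
        Entry e = queue_.front();
        queue_.pop_front();
        for (std::size_t i = 0; i < e.size; ++i) {
            if (e.ptr[i] != kFreeFill) {   // someone wrote to freed memory
                ++late_writes_;
                break;
            }
        }
        std::free(e.ptr);                  // only now return the memory for reuse
    }

    std::deque<Entry> queue_;
    std::size_t capacity_;
    std::size_t late_writes_ = 0;
};

// Usage: a write through a stale pointer is caught when the block leaves the queue.
std::size_t demo_write_after_free() {
    DeferredFreeQueue queue(1);
    char* a = static_cast<char*>(std::malloc(16));
    queue.defer_free(a, 16);
    a[3] = 'x';                            // bug: write after free
    char* b = static_cast<char*>(std::malloc(16));
    queue.defer_free(b, 16);               // pushes 'a' out of the queue and checks it
    return queue.late_writes();
}
```

Had the first block been handed straight back to the allocator, the stray write could have landed in a new, unrelated allocation and gone unnoticed.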

Heap browsers

While not an error detection technology as such, HeapAgent's heap browsers are unique to HeapAgent and critical to error diagnosis. These browsers let a programmer browse all heap data at any point in a running application. HeapAgent browsers offer several different views of data:

Allocation Browser: Displays detailed information about individual objects.
Dump Browser: Displays the current contents of memory, in a variety of formats.
Source Browser: Displays the source code that calls heap-related functions or that is responsible for memory errors.

These browsers are hot-linked together, so it's easy to navigate from one view of data to another. For example, from the Allocation Browser, the programmer can select an object and either display a Dump Browser that shows the contents of memory for the object or display a Source Browser that shows the source code that allocated the object.

Diagram 1: How heap replacement/instrumentation works

1. Your heap expands with memory from the OS

HeapAgent automatically fills the memory retrieved from the operating system with the free fill character (0xDD).

[Your heap expands with memory from the OS]

2. Your application calls `operator new`

foo.cpp
int main()
{
    char *str = new char[11];   // 11 bytes: a 10-character string plus the terminating NUL
}

When allocating memory, HeapAgent:

Allocates the memory.
Records header information.
Fills the allocation with the in-use fill.

The HeapAgent header contains:

The name of the function that created the allocation (for example, malloc).
File, line, and pass count.
A sequential allocation number (assigned by HeapAgent).
The requested size of the allocation.
The call stack at the time of the allocation.
Checkpoint (a user-definable tag that you can use to group related allocations).
Read-only, no-free, and no-realloc status.
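
For readers who think in data structures, the header fields listed above might be modeled roughly as follows. The field names, types, and the HeaderLog helper are assumptions; the paper does not describe the header's actual layout, and "pass count" is interpreted here as the number of times a given file and line has allocated.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of the per-allocation header described above.
struct AllocationHeader {
    std::string api;                         // creating function, e.g. "malloc"
    std::string file;                        // source file of the call
    int line = 0;                            // source line of the call
    unsigned long pass_count = 0;            // times this file/line has allocated (assumed meaning)
    unsigned long alloc_number = 0;          // sequential allocation number
    std::size_t requested_size = 0;          // size the caller asked for
    std::vector<std::uintptr_t> call_stack;  // return addresses at allocation time
    unsigned checkpoint = 0;                 // user-definable grouping tag
    bool read_only = false;
    bool no_free = false;
    bool no_realloc = false;
};

// Assigns allocation numbers and pass counts as headers are recorded.
class HeaderLog {
public:
    AllocationHeader record(std::string api, std::string file, int line,
                            std::size_t size) {
        AllocationHeader h;
        h.api = std::move(api);
        h.file = std::move(file);
        h.line = line;
        h.requested_size = size;
        h.alloc_number = next_number_++;
        h.pass_count = 1;
        for (const AllocationHeader& prev : headers_)
            if (prev.file == h.file && prev.line == h.line) ++h.pass_count;
        headers_.push_back(h);
        return h;
    }

private:
    unsigned long next_number_ = 1;
    std::vector<AllocationHeader> headers_;
};
```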

3. Your application enters a value in the allocation

strcpy(str, "Joe Schmoe");

4. Your application calls `operator delete`

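foo.cpp
#include <cstring>

int main()
{
    char *str = new char[11];        // room for "Joe Schmoe" plus the terminating NUL
    strcpy(str, "Joe Schmoe");
    delete[] str;                    // array form of delete matches new char[11]
}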

When de-allocating memory, HeapAgent:

Checks local guards for overwrites.
Updates the header information to reflect the call that freed the allocation.
Fills the allocation and its guards with the free-fill character (0xDD).
Adds the allocation to the queue of defer-freed allocations and returns the oldest allocation in the queue to the heap for recycling.

[Your application calls operator delete]

5. Whenever your application is running but not using every CPU cycle

HeapAgent uses a separate thread, set at idle priority, to incrementally and continually scan the heap for overwrites. This checking is completely independent of the overwrite checking that HeapAgent performs when an individual object is allocated or freed. Because the thread is set at idle priority, it has negligible effect on runtime performance.
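
The incremental part of this scheme can be sketched as follows. Everything here is an assumption for illustration: the guard value, the four-byte guard, the IncrementalScanner class, and the budget of blocks checked per pass. The point is only that the scanner holds the heap lock for a bounded amount of work and then lets the application back in.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

constexpr unsigned char kGuardFill = 0xFC;   // hypothetical guard value

struct Block {
    std::array<unsigned char, 4> guard;      // stand-in for a real block's guard bytes
    bool reported = false;
};

class IncrementalScanner {
public:
    std::size_t add_block() {
        std::lock_guard<std::mutex> hold(lock_);
        Block b;
        b.guard.fill(kGuardFill);
        blocks_.push_back(b);
        return blocks_.size() - 1;
    }

    void simulate_overwrite(std::size_t index) {
        std::lock_guard<std::mutex> hold(lock_);
        blocks_[index].guard[0] = 0;
    }

    // Check at most `budget` blocks, then release the lock so the app can proceed.
    void scan_some(std::size_t budget) {
        std::lock_guard<std::mutex> hold(lock_);
        for (std::size_t n = 0; n < budget && !blocks_.empty(); ++n) {
            Block& b = blocks_[cursor_];
            if (!b.reported) {
                for (unsigned char byte : b.guard) {
                    if (byte != kGuardFill) {
                        b.reported = true;
                        ++overwrites_;
                        break;
                    }
                }
            }
            cursor_ = (cursor_ + 1) % blocks_.size();   // resume here next time
        }
    }

    std::size_t overwrites_found() {
        std::lock_guard<std::mutex> hold(lock_);
        return overwrites_;
    }

private:
    std::mutex lock_;
    std::vector<Block> blocks_;
    std::size_t cursor_ = 0;
    std::size_t overwrites_ = 0;
};

// Body for a background thread: scan a little, yield, repeat until told to stop.
void scanner_loop(IncrementalScanner& scanner, const std::atomic<bool>& stop) {
    while (!stop.load()) {
        scanner.scan_some(4);
        std::this_thread::yield();
    }
}
```

In a real program scanner_loop would run on its own std::thread. Lowering that thread to idle priority is platform-specific (on Windows, SetThreadPriority with THREAD_PRIORITY_IDLE) and is left out of the sketch.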

Limitations of HeapAgent's error detection

HeapAgent detects only heap and stack errors; it does not detect static memory or parameter errors.
HeapAgent's platform availability is currently restricted to Intel Windows. However, HeapAgent's underlying technology is highly portable and is available on all major platforms as a library: SmartHeap, a related product from MicroQuill.
HeapAgent's technique for automatically starting up as the application loads works with windowed GUI apps built with the Microsoft Visual C++ compiler.
With non-Microsoft compilers or console apps, the programmer must include the HeapAgent header file, recompile the application, and link the EXE with the HeapAgent library. The header file is only required if file and line information is desired in error reports. (File/line information on Microsoft Visual C++ EXEs is automatically captured from the PDB debugging information.)

How Purify works

Purify uses a technique called object code insertion (OCI) to insert checking code around every instruction in your application that references, allocates, or deallocates memory.

How object code insertion works

Purify's object code insertion works by analyzing each instruction in your application, finding all of the instructions that reference memory, and rewriting those object files with checking code inserted.

Purify uses a memory-coloring scheme to keep track of the state of every byte of memory in the application. This requires every instruction in the app, including system DLLs, to be fully instrumented, or Purify will be unable to know the correct state of memory, since non-instrumented code could otherwise initialize, allocate, or free memory without Purify's knowledge.
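
A toy model may make the memory-coloring idea concrete. It is not Purify's implementation: the three states, the ShadowMemory class, and the use of offsets into a simulated address space are assumptions chosen for illustration. Each hook stands for checking code that instrumentation would insert around an allocation, free, read, or write.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Color : unsigned char { Unallocated, Uninitialized, Initialized };
enum class Verdict { Ok, InvalidAccess, UninitializedRead };

// One color per byte of a small simulated address space.
class ShadowMemory {
public:
    explicit ShadowMemory(std::size_t bytes) : color_(bytes, Color::Unallocated) {}

    void on_alloc(std::size_t addr, std::size_t n) { paint(addr, n, Color::Uninitialized); }
    void on_free(std::size_t addr, std::size_t n)  { paint(addr, n, Color::Unallocated); }

    Verdict on_write(std::size_t addr, std::size_t n) {
        if (!allocated(addr, n)) return Verdict::InvalidAccess;
        paint(addr, n, Color::Initialized);
        return Verdict::Ok;
    }

    Verdict on_read(std::size_t addr, std::size_t n) const {
        if (!allocated(addr, n)) return Verdict::InvalidAccess;
        for (std::size_t i = 0; i < n; ++i)
            if (color_[addr + i] == Color::Uninitialized)
                return Verdict::UninitializedRead;
        return Verdict::Ok;
    }

private:
    bool in_range(std::size_t addr, std::size_t n) const {
        return addr <= color_.size() && n <= color_.size() - addr;
    }
    bool allocated(std::size_t addr, std::size_t n) const {
        if (!in_range(addr, n)) return false;
        for (std::size_t i = 0; i < n; ++i)
            if (color_[addr + i] == Color::Unallocated) return false;
        return true;
    }
    void paint(std::size_t addr, std::size_t n, Color c) {
        if (!in_range(addr, n)) return;
        for (std::size_t i = 0; i < n; ++i) color_[addr + i] = c;
    }

    std::vector<Color> color_;
};
```

The model also shows why uninstrumented code is a problem: if a write happens without on_write being called, the bytes stay colored Uninitialized and a later, perfectly valid read is misreported.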

Each time any portion of your code changes, it must be reinstrumented after it is compiled. This is roughly equivalent to an extra code-generation and link phase on each compile.

For more detailed information on OCI, see Pure's white paper:

http://www.pure.com/products/purify/PYhasjoywp.html

Limitations of Purify's error detection

The instrumentation that is inserted can actually introduce errors into your application. Because object instrumentation changes your application's object code so drastically, it is prone to introduce subtle changes in logic. When testing an object-instrumented app, it is important to be aware that the app being tested isn't the same app that will later be shipped.
Object code insertion is predicated on all code in the application being instrumented. As a result, if an error is caused by a DLL that mistakenly didn't get instrumented, a system or third-party DLL that couldn't be instrumented, or object code that intentionally is not instrumented for performance reasons, the error will not be detected.
Because Purify operates at the instruction level, it is processor-specific. Moreover, because Purify must understand every instruction in an application, it is very sensitive to processor revisions. Likewise, because Purify must instrument system and compiler CRT DLLs, it is sensitive to operating system and compiler revisions. Consequently, Purify will be exclusive to a specific compiler, OS, and chip version.
Purify for NT will support only Windows NT, so Purify'd applications cannot be tested under Windows 95. Because there are so many differences between Windows NT and Windows 95, it is critical that testing tools run on both platforms if Win95 is a target, even if application development occurs in NT.

How BoundsChecker Pro works

BoundsChecker Pro uses compile-time instrumentation (CTI), which is a completely different technique for detecting errors than the technique HeapAgent uses. As we mentioned earlier, BoundsChecker Pro is a superset of BoundsChecker Standard so it can detect some memory errors on debug builds, without CTI. This paper discusses the CTI technology in depth, and does not discuss the malloc wrapper, because it's the CTI technology that lets BoundsChecker Pro detect the great majority of the memory errors that the product can detect. For more information, see the detailed chart at "Comparison of memory errors detected."

NuMega™ publishes very little information on how CTI works. However, because they licensed this technology (they didn't invent it or build it themselves), there is plentiful information available on the web site of Parasoft, the company NuMega licensed the technology from. For more detail, see the Parasoft white paper "Insure++: A tool to support Total Quality Software™":

http://www.parasoft.com/insure++/papers/technical.html

How compile-time instrumentation works

BoundsChecker Pro's compile-time instrumentation technology adds a pre-process step to the Microsoft Visual C++ build procedure. When a BoundsChecker Pro build is started, BoundsChecker Pro first analyzes each of the source files and, based on this analysis, inserts calls to runtime checking functions and also inserts temporary variables into the code. (The temporary variables are needed to hold intermediate results.) The checking functions are inserted at every location in the source that:

Allocates, frees, or reallocates memory.
Assigns a value to a pointer.
Reads a pointer's value.
References an array.
Calls a function (tests are inserted both before and after function calls).

The following "before" and "after" code example shows how this works.

Diagram 2: How compile-time instrumentation works

Your source code before instrumentation

CDB_ERR dbSession::Close( void )
{
    CDB_ERR     err = CDB_ERR_NONE;
    dbConnect   *connect;

Your source code after instrumentation

CDB_ERR dbSession ::Close(void)
{
    int _Insight_spmark;
    class dbConnect *_Insight_1;
    CDB_ERR _Insight_2;
    if (_Insight_init) 
        _Insight_I_src_cdb_c823552227();
    _Insight_func_top(1995, 29852, (long int) &_Insight_spmark, _Insight_strtable);
    _Insight_pop_this((void *) this);
    _Insight_decl_lwptr(_Insight_fid_44, 133L, 13, (void *) &_Insight_1,0, 1);
    _Insight_decl_lwptr(_Insight_fid_44, 133L, 20, (void *) &_Insight_2,0, 1);

    CDB_ERR err;
    _Insight_decl_lwptr(_Insight_fid_44, 134L, 652, (void *) &err,0, 2);
    _Insight_assign_ptr2(1234, (void **) &err, (void *) ((dbError *) 0),
        (void *) ((dbError *) 0));
    err = ((dbError *) 0);

    dbConnect *connect;
    _Insight_decl_lwptr(_Insight_fid_44, 135L, 286, (void *) &connect,0, 3);

Before

    while ((connect = dbConnect::FindConnect( this )) != NULL) {
        CDBVERIFY( connect->CloseDb( ));
        delete connect;
    }

After

    _Insight_stack_call(0);
    _Insight_1 = dbConnect ::FindConnect(this);
    _Insight_assign_ptr3((void **) &_Insight_1, 1990, 1);
    _Insight_after_call();
    _Insight_assign_ptr1(1991, (void **) &connect, (void **) &_Insight_1,(void *) _Insight_1);
    connect = _Insight_1;
    _Insight_ptra_check(286, (void **) &connect, (void *) connect);
    while(connect != 0) {
        _Insight_clear_temp((void **) &_Insight_1);
        _Insight_stack_call(0);
        _Insight_2 = (connect->CloseDb());
        _Insight_assign_ptr3((void **) &_Insight_2, 1992, 1);
        _Insight_after_call();
        _Insight_assign_ptr1(1993, (void **) &err, (void **) &_Insight_2,(void *) _Insight_2);
        err = _Insight_2;
        _Insight_ptra_check(652, (void **) &err, (void *) err);
        if (err != 0) {
            _Insight_clear_temp((void **) &_Insight_2);
            goto fail;
        } else 
            _Insight_clear_temp((void **) &_Insight_2);
        ;
        _Insight_deletea(286, (void **) &connect, (void *) connect, 8192);
        delete connect;
        _Insight_stack_call(0);
        _Insight_1 = dbConnect ::FindConnect(this);
        _Insight_assign_ptr3((void **) &_Insight_1, 1990, 1);
        _Insight_after_call();
        _Insight_assign_ptr1(1991, (void **) &connect, (void **) &_Insight_1,(void *) _Insight_1);
        connect = _Insight_1;
        _Insight_ptra_check(286, (void **) &connect, (void *) connect);
    }
    _Insight_clear_temp((void **) &_Insight_1);

Before

    // free what appears to be leaks
    m_username.Empty( );
    m_password.Empty( );

After

    // free what appears to be leaks
    _Insight_stack_call(0);
    m_username.Empty();
    _Insight_after_call();
    _Insight_stack_call(0);
    m_password.Empty();
    _Insight_after_call();

Before

    if (m_hsession) gblHandles->ReleaseUserHandle( m_hsession );
    if (m_dberr) delete m_dberr;

After

    if (m_hsession) {
        _Insight_stack_call(0);
        gblHandles->ReleaseUserHandle(m_hsession);
        _Insight_after_call();
    } 

    _Insight_ptra_check(1994, (void **) &m_dberr, (void *) m_dberr);
    if (m_dberr) {
        _Insight_deletea(1994, (void **) &m_dberr, (void *) m_dberr, 0);
        delete m_dberr;
    }

Before

    TheSystem.RemoveSession( this );

After

    _Insight_stack_call(0);
    TheSystem.RemoveSession(this);
    _Insight_after_call();

Before

fail:
    return err;
}

After

fail:
    _Insight_returning((void *) &err, 4096, 1);
    return err;
}

Limitations of BoundsChecker Pro with CTI error detection

BoundsChecker Pro's compile-time instrumentation detects memory errors using "data usage" analysis. This technique does not rely on, or even have any direct knowledge of, what is happening in the heap. Errors are judged to be errors based entirely on whether or not your use of pointers and arrays is "proper."

A very real problem with this technique is that the indeterminate nature of the use of pointers in C and C++ makes it impossible to reliably determine "what is proper" from the code itself. For example, casting a pointer into a local variable can be interpreted as leaked memory due to pointer assignment, when in fact this is a perfectly acceptable and necessary operation in a comparison routine. (This same problem is the bane of code optimizers, whose data flow analyses are prone to incorrectly interpreting pointer usage and introducing errors during optimization.)

There are several other by-products of this technique:

Only instances of errors that meet BoundsChecker Pro's "rules of improper use" will be detected.
It is very prone to generating false positives. Casts, side effects of constructors and destructors, assignments, and overloaded operators are all examples that can confuse the source instrumentation into seeing errors in valid code.
The instrumentation that is inserted can actually introduce errors into your application. Because source instrumentation changes your application's source code so drastically, it is very prone to introducing subtle changes in logic. When testing a source-instrumented app, it is important to be aware that the app being tested isn't the same app that will later be shipped.
Compile-time instrumentation is predicated on every module being instrumented. As a result, an error will not be detected if it's caused by a module that mistakenly didn't get instrumented, an OS API, or a compiler runtime library (such as MFC) that intentionally is not instrumented for performance reasons (or because source isn't available).
BoundsChecker Pro with CTI disguises compile-time warnings in your application's code by turning warnings off during compilation of instrumented code. The reason BoundsChecker Pro with CTI turns warnings off is that its instrumented code contains numerous constructs that themselves generate compiler warnings. The result is that errors that would normally be caught at compile time are not caught until runtime, if ever.
Because BoundsChecker Pro with CTI does not implement the heap or implement deferred freeing, it cannot catch all instances of references to free memory or overwrites of heap data structures caused by non-instrumented code.

Compile/link time tests

The test and the results

We recompiled some of the MFC source code (approximately 110K lines in 190 files) and measured the amount of time required by Purify and by BoundsChecker Pro with CTI. We performed three runs for each; the results listed below are the average times:

Product              Total compile/link time      Factor
========================================================
Visual C++ 
  debug build        23 minutes, 25 seconds       1x

HeapAgent            23 minutes, 25 seconds       1x
                     (works on debug build, so
                     no recompile is required)

Purify 
  object-code 
  insertion          24 minutes, 20 seconds       1.04x

BoundsChecker Pro
  compile-time 
  instrumentation    100 minutes, 55 seconds      4.31x

Note: To enable stack checking, HeapAgent requires the /Ge compile flag, which activates stack probes for every function call that requires storage for local variables.

If you'd like to try the BoundsChecker Pro recompile test yourself, or if you'd like to see which compile flags we used, you can download the batch file we used to compute these timings. (No special files are required for HeapAgent or Purify.)

Download cti_test.zip

How HeapAgent affects compile/link time

HeapAgent does not affect compile or link time because it works on debug builds.

Each time your EXE is launched, HeapAgent is notified by the operating system. As the EXE is loaded, HeapAgent transparently patches the heap calls in the application's EXE and DLLs (including any C runtime library DLLs) to re-route all heap calls in the application to the HeapAgent debugging heap implementation. This patching affects only the currently loaded copy of the EXE and its DLLs; patching doesn't affect the on-disk copy of the program files or any other processes running on the system. All of this is done completely transparently -- without making the programmer recompile or relink to produce a special checkable build.
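
The effect of re-routing can be pictured with a toy model in which the application reaches the heap through a table of function pointers. This is an analogy only: how HeapAgent actually patches a loaded EXE and its DLLs is OS-specific and is not shown in this paper, and every name below (HeapCalls, g_heap, install_debug_heap, and so on) is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// The application reaches the heap only through this table of entry points.
struct HeapCalls {
    void* (*alloc)(std::size_t);
    void  (*release)(void*);
};

static void* plain_alloc(std::size_t n) { return std::malloc(n); }
static void  plain_free(void* p)        { std::free(p); }

static std::size_t g_debug_allocs = 0;
static void* debug_alloc(std::size_t n) { ++g_debug_allocs; return std::malloc(n); }
static void  debug_free(void* p)        { std::free(p); }

static HeapCalls g_heap = { plain_alloc, plain_free };

// Swap the table entries in the running process; nothing on disk changes.
void install_debug_heap() {
    g_heap.alloc = debug_alloc;
    g_heap.release = debug_free;
}

std::size_t debug_alloc_count() { return g_debug_allocs; }

// Application code is identical before and after the swap.
void application_code() {
    void* p = g_heap.alloc(32);
    g_heap.release(p);
}
```

The application function is never recompiled or relinked; only the in-memory table changes, which mirrors the claim that patching affects the loaded copy of the program and not its files.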

Working with standard builds of an application has a major benefit apart from ease of use: it ensures that the actual application is being tested, with no possibility of incorrect error reports or incorrect program logic resulting from the modifications that the source and object instrumentation techniques make to the application's code.

How Purify affects compile/link time

Purify requires a special "checkable" build that instruments your app's object code (unlike BoundsChecker Pro's CTI, which instruments your app's source code). Purify's instrumentation has two compile-time repercussions. First, it requires substantially more disk space than normal builds to accommodate the much larger object modules, as well as the redundant system and third-party DLLs that must also be instrumented. Second, it slows the build. For small apps this build time degradation is minimal. For larger apps, the larger object modules often exceed physical memory and force the system to page to disk, thereby causing severe slowdowns.

How BoundsChecker Pro with CTI affects compile/link time

BoundsChecker Pro's compile-time instrumentation is like Purify's object-code insertion in that it requires substantially more disk space than normal builds to accommodate the preprocessed source and the much larger object modules. In addition, BoundsChecker Pro's instrumentation can cause the compile time to increase exponentially.

Runtime tests

The test and the results

We wanted to test runtime performance on an app that performed some operation that most C++ developers would readily understand. In addition, we wanted an app that was:

Readily available, so any reader could reproduce our results.
Independently developed, so no one would wonder whether we had designed a benchmark application that favors HeapAgent.
Non-trivial but not huge.
Modestly memory intensive, yet small enough not to represent some extreme condition.

For the runtime tests, we used GNU Bison 1.25. Bison takes a grammar file as a command-line parameter and generates C code that implements a parser for that grammar. Bison contains 11,000 lines of C code; compiled with Microsoft Visual C++ debugging information, bison.exe is 163 KB.

The goal was to compare runtime performance for typical debugging sessions. Execution time was measured by printing the time of day at the top and bottom of main(). Execution times were compared for a debug Visual C++ build, HeapAgent 3.0, Purify 4.0, and BoundsChecker Pro 4.0 with and without compile-time instrumentation (CTI).

Bison is available at:

http://www.cdrom.com/pub/gnu/bison-1.25.tar.gz

For this benchmark, we chose the C++ grammar file that is distributed with GNU GCC. This file, parse.y, can be found at:

http://www.cdrom.com/pub/gnu/gcc-2.7.2.tar.gz

Note: Bison uses long file names, so you can only run this test on Windows NT or Windows 95.

Also note: You can extract the contents of these files using WinZip, which is available at:

http://www.winzip.com

If you'd like to run these tests yourself, you can download a zip file that contains a makefile, an INI file for the BoundsChecker compile-time instrumentation test, and the description you're now reading:

Download ha_bench.zip

The test environment

We ran the tests in the following test environment:

Gateway P5-133 with 32MB of RAM
No network connection
Microsoft Windows NT Workstation Version 3.51
- 44-100 MB Paging File Size
- Foreground and Background Applications Equally Responsive
Microsoft Visual C++ Version 4.1
HeapAgent 3.0
Purify for Windows NT Version 4.0 Build 446
BoundsChecker Professional Edition Version 4.00.193

The test code

We modified main.c to print the time of day at the top and bottom of main():

main.c
======
24d23
< #include <time.h>
48d46
<  { time_t curtime;time(&curtime);puts(ctime(&curtime)); }
90d87
<  { time_t curtime;time(&curtime);puts(ctime(&curtime)); }

The other change to the standard distribution was the addition of a makefile, which is included in ha_bench.zip, as noted above. The command line to build the executable is:

nmake CC=cl

Running the benchmark tests

Run Bison with only the Visual C++-supplied memory debugging:

bison parse.y

HeapAgent test

Start HeapAgent and open bison.exe. Set the command-line arguments to parse.y. Use all default settings.

Purify test

Start Purify and run bison.exe parse.y. Use all default settings.

BoundsChecker test (without compile-time instrumentation)

Start BoundsChecker and open bison.exe. Set the arguments to parse.y. Set the Error Detection mode to Quick in BoundsChecker Settings.

BoundsChecker test (with compile-time instrumentation)

For the BoundsChecker compile-time instrumentation test, you need to recompile bison.exe. We created an INI file, bc_cti.ini, for BoundsChecker to use during this recompile (bc_cti.ini is included in ha_bench.zip, as noted above):

checking_level full
checking_uninit_compile on
compiler_fault_recovery on
precompiled_header off

Then rebuild:

nmake clean
nmake CC="bcompile -Zop bc_cti.ini"

Finally, run bison.exe under BoundsChecker as described in the previous test.

The results

Each test was performed three times. The displayed starting time was subtracted from the displayed ending time to find the execution time. The three execution times were then averaged to arrive at the following results:

Test                                              mm:ss    Factor
==================================================================
Visual C++                                        00:04      1x
HeapAgent                                         00:04      1x
Purify                                            00:44     11x
BoundsChecker Pro (without CTI--"Quick" mode)     06:56    104x
BoundsChecker Pro (without CTI--"Normal" mode)    06:57    104x
BoundsChecker Pro (without CTI--"Maximum" mode)   57:43    865x
BoundsChecker Pro (with CTI--"Quick" mode)        10:35    158x
BoundsChecker Pro (with CTI--"Normal" mode)       15:45    236x
BoundsChecker Pro (with CTI--"Maximum" mode)      66:18    994x
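
The Factor column appears to be each run time divided by the 4-second baseline, with the fraction dropped. That rule is inferred from the numbers and is not stated in the text; the helper names below are made up. A short check of the arithmetic under that assumption:

```cpp
#include <cassert>

// Convert an mm:ss table entry to seconds.
constexpr int to_seconds(int minutes, int seconds) { return minutes * 60 + seconds; }

// Slowdown factor, truncated to a whole number (assumed rounding rule).
constexpr int slowdown_factor(int run_seconds, int baseline_seconds) {
    return run_seconds / baseline_seconds;   // integer division drops the fraction
}

constexpr int kBaseline = to_seconds(0, 4);  // Visual C++ debug build

static_assert(slowdown_factor(to_seconds(0, 44), kBaseline) == 11,   "Purify");
static_assert(slowdown_factor(to_seconds(6, 56), kBaseline) == 104,  "no CTI, Quick");
static_assert(slowdown_factor(to_seconds(10, 35), kBaseline) == 158, "CTI, Quick");
static_assert(slowdown_factor(to_seconds(66, 18), kBaseline) == 994, "CTI, Maximum");
```

For example, 10:35 is 635 seconds, and 635 / 4 = 158.75, which the table lists as 158x.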

How HeapAgent affects runtime performance

HeapAgent's design goal was to build a memory checker that could be used on every run. As a result, runtime performance was a major consideration in the design. Here are the major reasons HeapAgent has minimal runtime impact on your application:

First, HeapAgent takes advantage of the fact that memory overwrites necessarily change the contents of the heap. This lets HeapAgent catch overwrites the most efficient way possible -- by watching the heap itself for changes that lie outside the bounds of in-use memory blocks.
Second, to minimize the impact on application response time, HeapAgent performs this heap checking in a background thread using idle CPU cycles or a second CPU (if there is one). In a pre-emptive multi-threaded OS like NT or Windows 95, the CPU is actually idle and awaiting I/O events a significant portion of the time that a typical application is executing -- CPU utilization is well below 75% for most applications. You can see this by running NT's performance monitor and charting the Processor Time counter. Using a low-priority thread, HeapAgent exploits these idle cycles to perform its checking without slowing the app.
HeapAgent's background heap checking is incremental, which is critical for good performance since the heap-checking thread would otherwise become heap-bound (that is, a non-incremental heap-checking thread would keep the heap locked most of the time, causing the app to frequently block, waiting for the heap to become available). Incremental heap checking such as HeapAgent's cannot be implemented efficiently, if at all, unless the heap checker also implements the heap.
Third, HeapAgent's free and in-use fill patterns are intentionally invalid addresses that cause the processor to generate an exception fault if your application tries to dereference freed or uninitialized memory. This is much more efficient than inserting checking code between every memory reference. With HeapAgent, no extra runtime overhead is incurred because the error is detected by hardware rather than by software.
Fourth, since HeapAgent implements the heap manager itself, it can instrument the heap directly and internally. This is faster and much more efficient than maintaining elaborate data structures outside of the heap (externally) to record the status of the heap, such as whether or not a block has been previously freed.
Fifth, by implementing the heap, HeapAgent can integrate code checking directly into the allocation/deallocation functions. This eliminates the need to call any external functions. As a result, these checks are constant time, so performance isn't affected by app size or heap size.
And finally, HeapAgent's replacement heap implementation uses the same algorithms used in the ultra-fast SmartHeap runtime library, so your app's calls to allocate and free memory are faster.
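
To make the fourth and fifth points concrete, here is a minimal sketch of a heap wrapper that keeps its bookkeeping and guard bytes inside each block and validates them in the free path. It is not HeapAgent's implementation: the names `debug_alloc`, `guards_intact`, and `debug_free`, the guard value, and the block layout are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

constexpr unsigned char kGuardByte = 0xFD;                       // hypothetical guard value
constexpr std::size_t   kGuard     = alignof(std::max_align_t);  // hypothetical guard width

// Bookkeeping lives inside the heap block itself, not in an external table.
// Assumed layout: [Header][pre-guard][user bytes][post-guard]
struct alignas(std::max_align_t) Header {
    std::size_t size;  // number of user bytes requested
};

void* debug_alloc(std::size_t size) {
    void* raw = std::malloc(sizeof(Header) + kGuard + size + kGuard);
    if (raw == nullptr) return nullptr;
    new (raw) Header{size};
    unsigned char* user = static_cast<unsigned char*>(raw) + sizeof(Header) + kGuard;
    std::memset(user - kGuard, kGuardByte, kGuard);  // pre-guard
    std::memset(user + size, kGuardByte, kGuard);    // post-guard
    return user;
}

static bool guard_ok(const unsigned char* p) {
    for (std::size_t i = 0; i < kGuard; ++i)
        if (p[i] != kGuardByte) return false;
    return true;
}

// The cost depends only on the guard width, never on heap or application size.
bool guards_intact(const void* user_ptr) {
    assert(user_ptr != nullptr);
    const unsigned char* user = static_cast<const unsigned char*>(user_ptr);
    const Header* h = reinterpret_cast<const Header*>(user - kGuard - sizeof(Header));
    return guard_ok(user - kGuard) && guard_ok(user + h->size);
}

// The check is part of the deallocation path. Returns false if the block
// was underwritten or overwritten before it was freed.
bool debug_free(void* user_ptr) {
    if (user_ptr == nullptr) return true;
    const bool ok = guards_intact(user_ptr);
    std::free(static_cast<unsigned char*>(user_ptr) - kGuard - sizeof(Header));
    return ok;
}
```

Because the size and guards travel with the block, the free path needs no lookup in a separate data structure, which is the property the list above attributes to implementing the heap directly.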

Note: HeapAgent does affect runtime performance under certain conditions. HeapAgent incurs some extra overhead with each allocation (for guard bytes, etc.) and defers the freeing of allocations (to help find writes into previously freed memory). As a result, the HeapAgent heap is larger than the normal heap and will exceed available physical memory earlier. In cases where the HeapAgent heap exceeds physical memory and the compiler allocator's heap does not, there will be performance degradation from disk hits.

However, most programmers have enough RAM that their test cases rarely exceed physical memory even with the larger HeapAgent heap. Overall, HeapAgent's checking is so efficient that the performance hit is typically more than offset by the performance gains an app gets from using the (SmartHeap-based) faster alloc and free functions.
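
The deferred freeing mentioned above, and the extra memory it holds, can be sketched as a small quarantine queue. This is an illustration under stated assumptions, not HeapAgent's code: the class name `DeferredFreeQueue`, the fill byte, and the fixed capacity are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <deque>

constexpr unsigned char kFreeFill = 0xDD;  // hypothetical free-fill byte

// Holds "freed" blocks for a while instead of releasing them immediately,
// so that a later write through a stale pointer disturbs the fill pattern.
class DeferredFreeQueue {
public:
    explicit DeferredFreeQueue(std::size_t capacity) : capacity_(capacity) {}
    DeferredFreeQueue(const DeferredFreeQueue&) = delete;
    DeferredFreeQueue& operator=(const DeferredFreeQueue&) = delete;
    ~DeferredFreeQueue() { while (!held_.empty()) release_oldest(); }

    // Called in place of free(): fill the block, hold it, and release the
    // oldest held block only once the queue is over capacity.
    void defer_free(void* p, std::size_t size) {
        assert(p != nullptr);
        std::memset(p, kFreeFill, size);
        held_.push_back({static_cast<unsigned char*>(p), size});
        if (held_.size() > capacity_) release_oldest();
    }

    // Number of held blocks that were written to after being freed.
    std::size_t count_damaged() const {
        std::size_t damaged = 0;
        for (const Held& h : held_) {
            for (std::size_t i = 0; i < h.size; ++i) {
                if (h.ptr[i] != kFreeFill) { ++damaged; break; }
            }
        }
        return damaged;
    }

    // Memory the application believes it has freed but the heap still holds.
    std::size_t held_bytes() const {
        std::size_t total = 0;
        for (const Held& h : held_) total += h.size;
        return total;
    }

private:
    struct Held { unsigned char* ptr; std::size_t size; };

    void release_oldest() {
        std::free(held_.front().ptr);
        held_.pop_front();
    }

    std::size_t capacity_;
    std::deque<Held> held_;
};
```

`held_bytes()` shows where the larger footprint comes from: every block in the queue is memory the ordinary allocator would already have reused.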

How Purify affects runtime performance

The most severe impact of OCI is on runtime performance. Purify was originally designed to work on RISC processors, where most instructions manipulate registers and only two instructions (load and store) reference memory, always a word at a time. In the Intel CISC architecture, however, dozens of different instructions reference memory in very complex ways: a single instruction can both load and store memory, or load or store one, two, or dozens of bytes. This means that Purify on Intel must insert checking code in many more locations in your application. It also means that the checking code must be significantly more complex. Worse, OCI interferes with the Pentium instruction pipeline and on-chip cache, resulting in even more slowdown than Purify's designers may have anticipated.

Finally, Purify causes an application's runtime memory footprint and working set to explode, which can result in excessive swapping even for small applications. Two factors cause this ballooning of memory usage:

To implement its memory-coloring scheme, Purify must maintain huge state tables to track the state of every byte of memory in the address space. These tables need to be even larger under NT than under UNIX since NT's address space is sparse rather than contiguous.
Because Purify must instrument system and other shared DLLs, the "Purify'd" system DLLs that are loaded with your application do not share code space with the operating system DLLs. The extra copy of these DLLs can add megabytes to your app's footprint.
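
The state-table idea behind the first factor can be sketched as a shadow map that stores one state per tracked byte and is consulted before every load and store. This is a simplified illustration, not Purify's actual data structure: the class `ShadowMap`, its three states, and the one-entry-per-byte encoding are assumptions, and a real tool would likely use a more compact encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class ByteState : std::uint8_t { Unallocated, Uninitialized, Initialized };

// Tracks a toy address range [0, tracked_bytes) with one state per byte.
class ShadowMap {
public:
    explicit ShadowMap(std::size_t tracked_bytes)
        : state_(tracked_bytes, ByteState::Unallocated) {}

    void on_alloc(std::size_t offset, std::size_t n) { set(offset, n, ByteState::Uninitialized); }
    void on_free(std::size_t offset, std::size_t n)  { set(offset, n, ByteState::Unallocated); }

    // Check that would run before a store: legal only on allocated bytes,
    // which then become initialized.
    bool check_write(std::size_t offset, std::size_t n) {
        assert(offset + n <= state_.size());
        for (std::size_t i = 0; i < n; ++i)
            if (state_[offset + i] == ByteState::Unallocated) return false;
        set(offset, n, ByteState::Initialized);
        return true;
    }

    // Check that would run before a load: legal only on initialized bytes.
    bool check_read(std::size_t offset, std::size_t n) const {
        assert(offset + n <= state_.size());
        for (std::size_t i = 0; i < n; ++i)
            if (state_[offset + i] != ByteState::Initialized) return false;
        return true;
    }

    // Size of the table itself: it grows with the memory being tracked.
    std::size_t table_bytes() const { return state_.size() * sizeof(ByteState); }

private:
    void set(std::size_t offset, std::size_t n, ByteState s) {
        assert(offset + n <= state_.size());
        for (std::size_t i = 0; i < n; ++i) state_[offset + i] = s;
    }

    std::vector<ByteState> state_;
};
```

The table grows in proportion to the address range it covers, and every checked load or store pays for a lookup, which is consistent with the footprint and slowdown described above.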

How BoundsChecker Pro with CTI affects runtime performance

The runtime impact of BoundsChecker Pro with CTI is even more severe than its compile-time impact. This is because operations, such as using a pointer, that normally consume just a few machine instructions now call at least one additional function, and often two. These functions will consume orders of magnitude more cycles to find the pointer, test its value, and record changes to its state (not to mention the function call/return overhead itself). In the code example in Diagram 2, "How compile-time instrumentation works," earlier in this paper, the four-line while loop, which consisted of three calls, ballooned to over 30 lines of code. Most of this code was added to make 20 additional calls to the checking functions of BoundsChecker Pro with CTI.
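
The cost described above can be illustrated with a hypothetical checked copy loop. This is not the code BoundsChecker Pro generates: `PointerRegistry`, `copy_plain`, and `copy_checked` are invented names, and the lookup table stands in for whatever bookkeeping the real checking functions perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-in for a checker's table of known memory blocks.
class PointerRegistry {
public:
    void track(const void* base, std::size_t size) {
        assert(base != nullptr);
        blocks_[reinterpret_cast<std::uintptr_t>(base)] = size;
    }

    // Finds the block containing p and tests that n bytes fit inside it.
    bool check(const void* p, std::size_t n) {
        ++checks_;
        const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
        auto it = blocks_.upper_bound(addr);
        if (it == blocks_.begin()) return false;
        --it;  // block with the largest base address <= addr
        return addr + n <= it->first + it->second;
    }

    std::size_t checks() const { return checks_; }

private:
    std::map<std::uintptr_t, std::size_t> blocks_;
    std::size_t checks_ = 0;
};

// Uninstrumented loop: a handful of machine instructions per character.
std::size_t copy_plain(char* dst, const char* src) {
    std::size_t n = 0;
    while ((dst[n] = src[n]) != '\0') ++n;
    return n;
}

// Instrumented loop: every pointer use is preceded by a checking call.
// Returns false as soon as a check fails, before the bad access happens.
bool copy_checked(PointerRegistry& reg, char* dst, const char* src) {
    for (std::size_t n = 0;; ++n) {
        if (!reg.check(src + n, 1)) return false;
        if (!reg.check(dst + n, 1)) return false;
        dst[n] = src[n];
        if (src[n] == '\0') return true;
    }
}
```

Copying a three-character string this way makes eight checking calls, each with a tree lookup, where the plain loop executes only a few instructions per character.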

Comparison of memory errors detected

The following table shows the memory errors detected by HeapAgent, Purify, BoundsChecker Pro with CTI, and BoundsChecker Standard, and compares error reporting and diagnosis features.

                                                       BoundsChecker
Errors detected                   HeapAgent   Purify   Pro w/CTI  Std.
======================================================================
Underwrites before the 
  beginning of heap objects          Yes        Yes       Yes      No
Overwrites beyond the 
  end of heap objects                Yes        Yes       Yes      Yes
Reads before the 
  beginning of heap objects          Yes        Yes       Yes      No
Reads beyond the 
  end of heap objects                Yes        Yes       Yes      No
Overwrites over internal 
  heap data structures               Yes        Yes       Yes      No
Reads of uninitialized 
  heap objects                       Yes        Yes       Yes      No
Writes into freed 
  heap objects                       Yes        Yes       Yes      No
Reads of free heap objects           Yes        Yes       Yes      No
Double frees                         Yes        Yes       Yes      Yes
Invalid parameters to 
  heap-related calls                 Yes        Yes       Yes      Yes
Retained references to 
  reallocated memory                 Yes        Yes       No       No
Leakage -- objects not freed 
  at end of program                  Yes        Yes       Yes      Yes
Leakage -- at other points 
  in program                         Yes        Yes       Yes      No
Stack underwrites                    Yes        Yes       Yes      No
Stack overwrites                     Yes        No        Yes      Yes
Reads of uninitialized 
  stack objects                      Yes        No        Yes      No
Reads beyond stack pointer           Yes        Yes       Yes      No
Underwrites of static 
  (global variables) memory          No         No        Yes      No
Overwrites of static 
  (global variables) memory          No         No        Yes      Yes
Reads of uninitialized 
  static (global variables) memory   No         No        Yes      No

Note: The table above lists memory errors only. BoundsChecker also validates Windows and OLE API calls and detects OLE leaks, resource leaks, and, by instrumenting your source, invalid pointer manipulations and source code logic errors.

Also note that BoundsChecker Pro has three error detection levels: Quick, Normal, and Maximum. More errors are detected at higher levels, but it's unclear from NuMega's marketing literature, printed documentation, or online help exactly which errors are detected at each level. The table above lists the errors detected at Maximum.

                                                       BoundsChecker
Error reporting                   HeapAgent   Purify   Pro w/CTI  Std.
======================================================================
File name, line number, and 
  stack trace identifying the call
  at which the error was detected    Yes        Yes       Yes      Yes
File name, line number, and 
  stack trace identifying the call
  that allocated the object          Yes        Yes       Yes      Yes
Pass count at file/line to 
  pinpoint specific object           Yes        Yes       No       No
Unique identifier for each object    Yes        No        No       No
Object parameters, including size, 
  contents, and address              Yes        Yes       Yes      No
User-defined object checkpoints      Yes        No        No       No


                                                       BoundsChecker
Error diagnosis                   HeapAgent   Purify   Pro w/CTI  Std.
======================================================================
Browse source code associated 
  with the detected error            Yes        Yes       Yes      Yes
Dump and browse contents of 
  memory related to the error        Yes        No        No       No
Format memory contents as ASCII,
  int, float, pointer, etc.          Yes        No        No       No
Select any byte of memory and 
  display the source that 
  allocated it                       Yes        No        No       No
Search the heap for allocations 
  matching filter criteria           Yes        No        No       No
Select any source line and 
  display the objects it allocated   Yes        No        No       No
Set breakpoints on 
  user-specified heap events         Yes        Yes       No       No
Change error detection settings 
  based on user-specified 
  heap events                        Yes        No        No       No
Break to the compiler-supplied 
  debugger from any error            Yes        Yes       Yes      Yes

Additional comments on the errors detected

HeapAgent, BoundsChecker Pro with CTI, and Purify each use very different techniques to detect errors. These techniques determine when errors are, and are not, detected by each product.

Overwrites

Pure and NuMega like to point out how their OCI and CTI technologies, respectively, pinpoint the location of errors by catching them as they occur at runtime. They suggest that HeapAgent catches errors later -- meaning after many lines of code have executed. This is misleading in that overwrites are the only errors that Purify/BoundsChecker Pro catch earlier than HeapAgent in a given run. All other errors are caught at the same time by all three products.

The reason Purify and BoundsChecker Pro with CTI detect overwrites at the instruction level is that they explicitly check for overwrites before code instructions that can potentially cause an overwrite. HeapAgent, on the other hand, uses a background thread running at idle priority to perform continuous incremental scanning of the heap for actual overwrites.
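
The incremental scanning described above can be sketched as follows. This is a simplified model, not HeapAgent's implementation: the toy `Block` layout, the guard value, and the `IncrementalScanner` class are assumptions. For simplicity, the caller invokes `step` directly, where a real tool would call it from a low-priority background thread.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

constexpr unsigned char kGuardByte = 0xFD;  // hypothetical guard value
constexpr std::size_t   kGuard     = 4;     // hypothetical guard width

// Toy heap block laid out as [guard | user bytes | guard].
struct Block {
    std::vector<unsigned char> mem;

    explicit Block(std::size_t user_size) : mem(user_size + 2 * kGuard, 0) {
        for (std::size_t i = 0; i < kGuard; ++i) {
            mem[i] = kGuardByte;
            mem[mem.size() - 1 - i] = kGuardByte;
        }
    }

    unsigned char* user() { return mem.data() + kGuard; }

    bool intact() const {
        for (std::size_t i = 0; i < kGuard; ++i)
            if (mem[i] != kGuardByte || mem[mem.size() - 1 - i] != kGuardByte)
                return false;
        return true;
    }
};

class IncrementalScanner {
public:
    IncrementalScanner(std::vector<Block>& heap, std::mutex& heap_lock)
        : heap_(heap), lock_(heap_lock) {}

    // Checks at most `budget` blocks, holding the heap lock only for that
    // slice, and resumes where the previous call stopped. Returns the index
    // of the first damaged block found, or -1 if the slice was clean.
    std::ptrdiff_t step(std::size_t budget) {
        assert(budget > 0);
        std::lock_guard<std::mutex> hold(lock_);
        for (std::size_t i = 0; i < budget && !heap_.empty(); ++i) {
            if (cursor_ >= heap_.size()) cursor_ = 0;
            const std::size_t index = cursor_++;
            if (!heap_[index].intact()) return static_cast<std::ptrdiff_t>(index);
        }
        return -1;
    }

private:
    std::vector<Block>& heap_;
    std::mutex& lock_;
    std::size_t cursor_ = 0;
};
```

The budget keeps each lock hold short, so allocation calls in the application rarely wait on the scanner. The trade-off is the one discussed in this section: an overwrite is reported when the scan reaches the damaged block, not at the instruction that caused it.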

With BoundsChecker Pro with CTI, you can only be sure you're catching all overwrites if you recompile every source file in your application, including third-party libraries. BoundsChecker Pro doesn't catch overwrites that are caused by a source file that hasn't been instrumented with CTI.

There are two reasons why HeapAgent's technique is, in practice, more effective at finding overwrites earlier in the development cycle:

Catching the instruction responsible for an overwrite, while often a valuable clue, does not by itself make the underlying logic error self-evident. The overwrite itself is merely a symptom of the underlying bug. The most common cause of overwrite errors is allocating too little memory for data that is stored. To diagnose such an overwrite, you need to know where the allocation was created as well as the allocation's size and contents. HeapAgent identifies the source location that created an overwritten allocation, and HeapAgent's allocation and dump browsers provide details on the exact contents, number of bytes, and data location of the overwrite, which makes diagnosis a snap.
Even more important to diagnosing an overwrite is detection as soon as possible after the error is introduced. Because HeapAgent is fast enough to be there every time the app runs, overwrite errors are always detected by HeapAgent's background thread and immediately reported to the programmer who introduced them. Pinpointing the instruction responsible for an overwrite isn't necessary in this case because the programmer already knows what logic changes he just made. On the other hand, if the memory checker can only be run occasionally due to its poor performance, as is the case with Purify and BoundsChecker Pro with CTI, an overwrite will be hard to diagnose even with instruction-level pinpointing. Because the code base being exercised may not have been checked for days or even weeks, any number of changes by any number of programmers may be responsible for the underlying logic error.

HeapAgent detects the same errors but without a performance hit. By being there all the time and catching each error as it's introduced, HeapAgent eliminates errors much earlier than the other tools, which are too slow to use all the time.

Reads of uninitialized and freed memory

Purify and BoundsChecker Pro with CTI detect invalid memory reads the same way they detect overwrites: by explicitly checking for the error before every code instruction that reads memory. This technique generally lets both products detect all instances of these errors, though there are some exceptions. Because BoundsChecker Pro with CTI doesn't place guards before the beginning of memory allocations and doesn't implement deferred freeing, it can miss invalid reads and references to free memory caused by non-source-instrumented code.

HeapAgent, on the other hand, avoids this slow checking by leveraging the hardware's error-detection capabilities. HeapAgent fills free, uninitialized, and pre- and post-guard areas of memory with a unique fill value that is an intentionally invalid address. When a pointer is read from uninitialized, free, or out-of-bounds memory, the processor generates an exception fault as soon as the pointer is dereferenced because the fill pattern contains an invalid address.

HeapAgent then intercepts the exception fault and reports the error as an invalid read or write, identifying the file, line, and pass count of both the error and the allocation involved. This is much more efficient than inserting checking code between each memory reference -- with HeapAgent, no extra runtime overhead is incurred because the error is detected by hardware rather than software. The only time HeapAgent won't detect an invalid reference is when the value read from free memory is an integer or floating-point value instead of a pointer.
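
The fill-pattern technique, and its stated limitation, can be shown without triggering a fault. The sketch below fills an object the way a debugging heap might fill a freed block and then inspects what a pointer member and an integer member contain. The fill byte and the helper names are hypothetical, and whether a given pattern is really an invalid address depends on the platform's address-space layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr unsigned char kFreeFill = 0xDD;  // hypothetical free-fill byte

struct Node {
    Node* next;
    int   value;
};

// The pointer-sized value that results when fill bytes are read as an address.
std::uintptr_t fill_as_address() {
    std::uintptr_t v;
    std::memset(&v, kFreeFill, sizeof v);
    return v;
}

// What a debugging heap might do to a block when it is freed. The storage
// stays alive in this sketch so that inspecting it is well defined.
void fill_as_freed(void* block, std::size_t n) {
    std::memset(block, kFreeFill, n);
}

// Reads the raw bits of the pointer member without dereferencing it.
std::uintptr_t pointer_bits(const Node& n) {
    std::uintptr_t v;
    std::memcpy(&v, &n.next, sizeof v);
    return v;
}
```

After `fill_as_freed`, `pointer_bits` returns the fill address, so following `next` would fault on a platform where that address is unmapped, and the hardware does the detection. The `value` member simply holds a garbage integer, which is the non-pointer case that, as noted above, goes undetected.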

For the most part, however, references to uninitialized memory are caused by a programmer forgetting to initialize an entire data structure. HeapAgent detects references to the items in the data structure that contain pointers; when the first invalid reference is detected, the underlying logic error -- the failure to initialize the data structure -- will be obvious. The approach used by Purify and BoundsChecker Pro with CTI, which reports each uninitialized member of the data structure, is not only slower but redundant and overwhelms the programmer with volumes of irritating reports of the same logic error.

Likewise, the vast majority of references to free memory involve reading a class member of a deleted object. HeapAgent immediately detects any dereference of a pointer stored in a deleted object, including any call through a virtual function. Here again, the underlying logic error (premature free of the object) is apparent from HeapAgent's error report. Error reports of reads of other (non-pointer) members of the data structure would be of no further help because the error is already identified.

As with overwrites, HeapAgent detects the same errors but without a performance hit. By being there all the time and catching each error as it's introduced, HeapAgent eliminates errors much earlier than the other tools, which are too slow to use all the time.

Stack errors

HeapAgent will miss reads of non-pointer values from unallocated or uninitialized stack space. HeapAgent doesn't detect under/overwrites of individual local variables, only under/overwrites of the stack frame as a whole. Purify doesn't detect stack errors within any stack frame currently in use. Purify only detects reads/writes beyond the current stack pointer.

Leaks

Purify and BoundsChecker Pro with CTI use garbage collection and source code analysis, respectively, to identify what they believe are "true" memory leaks. HeapAgent, on the other hand, identifies all memory allocations that the program fails to free.

HeapAgent's leak detection is designed to identify every object that was not deleted. We chose this technique because many apparently benign memory leaks actually expose logic bugs whose ramifications go far beyond the actual leak itself. This is especially true in C++, where failing to delete an object also means that its destructor is not called. Since important resource management is often done during object destruction, failure to report this type of memory leak can later result in unforeseen problems. For diagnosis of leaks, HeapAgent provides an allocation checkpointing mechanism, a search and filter facility, and allocation browsers that let a programmer see the contents of and relationship between any subset of allocations in the heap in real time. Finally, any time a HeapAgent user determines that a reported leak is not significant, the "suppress" button lets the user turn off reporting of the error in all future runs.
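
A minimal sketch of this report-everything approach with checkpoints follows. It is an illustration only: `LeakTracker`, `AllocRecord`, and their members are hypothetical names, and the sketch tracks addresses supplied by the caller rather than hooking a real allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct AllocRecord {
    unsigned long id;          // unique identifier for the object
    std::string   file;        // source location that allocated it
    int           line;
    std::size_t   size;
    unsigned      checkpoint;  // checkpoint that was current at allocation
};

class LeakTracker {
public:
    void on_alloc(const void* p, std::size_t size, const std::string& file, int line) {
        assert(p != nullptr);
        live_[p] = AllocRecord{next_id_++, file, line, size, checkpoint_};
    }

    void on_free(const void* p) { live_.erase(p); }

    // Starts a new checkpoint and returns its number.
    unsigned mark_checkpoint() { return ++checkpoint_; }

    // Every allocation made at or after `checkpoint` that is still not freed.
    // No attempt is made to decide whether a leak is "benign".
    std::vector<AllocRecord> unfreed_since(unsigned checkpoint) const {
        std::vector<AllocRecord> out;
        for (const auto& entry : live_)
            if (entry.second.checkpoint >= checkpoint) out.push_back(entry.second);
        return out;
    }

private:
    std::map<const void*, AllocRecord> live_;
    unsigned long next_id_ = 1;
    unsigned checkpoint_ = 0;
};
```

Calling `unfreed_since(0)` at exit lists every object that was never deleted, while a later checkpoint narrows the report to allocations made during one operation, such as opening and closing a document.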

Purify and BoundsChecker Pro with CTI use different leakage detection techniques that do not report seemingly "benign" leaks. As a result, these products will sometimes miss the serious logic errors that these leaks represent.

Worse, the technique of source code analysis used by BoundsChecker Pro with CTI often reports false positives where there is no leakage at all. The following is an example:

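class Foo;
Foo *foo_global;        // declared before the constructor uses it

class Foo {
public:                 // constructor must be accessible to main()
    Foo() { foo_global = this; }
};

int main()
{
    new Foo();          // BoundsChecker Pro with CTI 
                        // reports leakage
    delete foo_global;  // object is deleted here
    return 0;
}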

Pure and NuMega redundantly detect errors caught by the hardware

Our "errors detected" list above is smaller than the lists advertised by Pure and NuMega. One reason is that our list includes only memory errors. Another reason is that we haven't included, in our list, all of the errors that Purify and BoundsChecker Pro with CTI claim credit for but that are, in fact, detected by the hardware or the operating system.

Following are the errors on the Purify or BoundsChecker Pro with CTI "errors detected" lists that are automatically detected by the hardware or OS and reported by the Visual C++ debugger right at the responsible line of code. Because Purify and BoundsChecker Pro with CTI insert checking code before every instruction, they do detect some of these errors in software rather than leaving them to the hardware, but there is no net error detection gain to you (only slower performance).

Errors detected            Claimed by           Claimed by
by hardware or OS            Purify     BoundsChecker Pro with CTI
==================================================================
Writing null pointer          Yes                  Yes
Reading null pointer          Yes                  Yes
Freeing null pointer                               Yes
Array parameter is null                            Yes
Expression uses null                               Yes
Function pointer is null                           Yes
Exception continued           Yes
Exception handled             Yes
Exception ignored             Yes
Exception unhandled           Yes
Output debug string           Yes

Conclusion

Pure's and NuMega's advertisements of error-detection omnipotence sound impressive until you realize that the memory errors their products detect aren't markedly different from those HeapAgent detects, that they can require special builds, and that they can slow runtime performance by 10x to 900x or more, even in simple test cases. Both BoundsChecker Pro's compile-time instrumentation and Purify's object code insertion techniques incur substantial slowdowns because checking code is being inserted between every instruction in the app! The bottom line: both Purify and BoundsChecker Pro were designed as testing tools, not as development tools.

HeapAgent's design goal was very different from that of Purify and BoundsChecker Pro with CTI. We wanted to detect the same errors, but without the compile/link obtrusiveness and runtime degradation endemic to these products. As a result, HeapAgent's detection techniques were carefully designed to avoid any runtime slowdown by leveraging the multi-threading and hardware exception capabilities of modern OSs and CPUs, thereby producing a tool that programmers can use every time they run their apps.

HeapAgent's design is centered around the fact that the cause of runtime errors is logic errors introduced by programmers. By giving programmers a tool that they'll actually use every day, you'll leverage your most valuable asset in eliminating runtime errors: the programmers who introduce them. Nothing else you do could possibly have the same impact on the quality or timeliness of your software deliverables.

By all means, get a copy of Purify and/or BoundsChecker Pro to use for regression testing. But if you're serious about eliminating runtime errors from your software without killing productivity, give each of your programmers a copy of HeapAgent that they'll use "every run."
