For many applications, simply relinking with SmartHeap results in a noticeable performance improvement. However, you may not see an immediate improvement, depending on how your application uses dynamic memory, the test cases you try, and the hardware you use to measure performance.
This appendix provides some suggestions for evaluating performance, and some specific guidelines for benchmarking memory managers.
Memory manager performance is inherently difficult to measure because the time required for a call to the memory manager depends on so many variables, including:
The number of allocations currently in the heap (allocation time generally grows with the size of the heap).
The size of previous allocations (the greater the variety of allocation sizes, the slower subsequent allocations will be).
The number of free blocks in the heap and the degree of their fragmentation.
The amount of free physical memory (if the heap size exceeds physical memory in a virtual memory environment, page faults will affect performance more than any other factor).
A real-world application that heavily uses dynamic memory (especially a C++ application) typically starts out by creating a large number of small allocations of different sizes. Eventually, the application reaches a plateau where allocations are equally balanced with frees. This is the worst possible scenario for traditional malloc/new implementations because it fragments the heap into a multitude of small free blocks. Each subsequent allocation request requires a long search over the same list of small fragments of memory to find a block big enough to fulfill the request. In virtual memory environments, the situation is further aggravated because the heap’s list of free blocks spans many pages of the virtual memory space. As a result, the application wastes a large portion of its execution time on repeated page faults. SmartHeap is specifically designed to overcome the deficiencies of these allocation environments.
To construct a test case that most realistically exercises a memory manager:
1. Create a lot of allocations (at least tens of thousands) of random sizes in the range that is typical for your application (generally heavily weighted to allocations under 100 bytes).
2. Once the number of allocations is comparable to the number in use when your application is heavily stressed, proceed to a random mix (in equal proportions) of allocations and frees.
3. If your application resizes allocations, include an appropriate number of calls to realloc.
For virtual memory environments, your test must be run on the minimum physical memory configuration supported by your application. In this case, it’s especially important that you allocate a large amount of memory to evaluate the swapping behavior of the memory manager. Having a total heap size that exceeds free physical memory is the best way to demonstrate the strengths and weaknesses of a memory manager.
To create a memory manager benchmark test case that best reflects how your application will run on a day-to-day basis, take care to avoid the following pitfalls:
Don’t make a comparison based on a small number of allocations: Allocation speed degrades in proportion to the number of allocations in the heap, so it’s important to test with realistically large heap sizes. Most allocators degrade markedly as heap size increases, and you’ll want to measure that degradation.
Don’t just test a series of allocations without any frees: An allocator’s most time-consuming task is the search among previously freed allocations, which are now available for reuse. All allocators are relatively fast when there aren’t many free blocks.
Don’t create allocations of just one size: The search for a free block is always trivial if all blocks on the free list are the right size. Allocators are challenged only when they have a large free list of blocks in a variety of different sizes.
Don’t use a machine that contains more physical memory than your typical user has: Swapping occurs only when free physical RAM is exhausted, and this condition slows down allocators more than any other.
The SmartHeap implementation includes many features that maximize performance. These features include the following:
SmartHeap uses its own page table to ensure that heap searches have the best possible locality of reference, which minimizes swapping.
The per-allocation overhead in SmartHeap is very small. For most platforms, including 16-bit Windows, NT, and OS/2, SmartHeap uses only 2 bytes to store header information for variable-size allocations, and SmartHeap fixed-size allocations have zero overhead. This compares with per-allocation overhead as high as 16 bytes with some memory managers (8 bytes of overhead is common). As well as saving space, smaller overhead reduces swapping by reducing the working-set size of your application.
When memory is freed in SmartHeap, blocks are coalesced with neighboring free blocks without any searching. Many memory managers either fail to coalesce free blocks, resulting in greater fragmentation, or perform a search of the heap to coalesce, which can make performance of free painfully slow.
SmartHeap allocates small blocks (those under 256 bytes) with an extremely fast algorithm rather than the normal general-purpose algorithm.
You can benefit from these features simply by relinking, since SmartHeap replaces malloc and operator new.
You’ll achieve an even greater performance improvement by creating multiple memory pools as described in section 2.4, “Using memory pools.” This technique gives you several performance advantages:
Allocations are faster because heap sizes are smaller and allocations vary less in size and extent.
Locality of reference improves, which can drastically reduce swapping in virtual memory environments. If allocations that are referenced in sequence are allocated from the same pool, reference speed improves as well as allocation speed.
You can create multiple fixed-size pools of different sizes to take advantage of SmartHeap’s fixed-size allocator, MemAllocFS, which uses a much faster algorithm than variable-size allocators such as malloc. MemAllocFS is especially valuable if your application creates large numbers of allocations of the same size.
If you’re coding in C++, you can have selected classes use SmartHeap’s fixed-size allocator without changing any of the calls to operator new in your application. To do this, overload operators new and delete for the classes that your application creates many instances of. For an example of this technique, see the code example for “new” in §4.2, “Function reference,” or see the “CPP” sample application in the SmartHeap samples directory.
If you want to evaluate SmartHeap’s performance without changing your code to call SmartHeap allocation APIs, you can fine-tune SmartHeap by changing the page size. SmartHeap allocates fixed-size pages of memory from the operating system and then sub-allocates from those pages. You can control the size of the pages that SmartHeap allocates from the operating system using the MemDefaultPoolPageSize global variable.
unsigned MemDefaultPoolPageSize = 8192;
You can experiment with different page sizes between 4096 and 65535 to see how performance is affected:
If you create a lot of large allocations, you may be able to improve performance by increasing the page size. For example, if your application creates many allocations larger than 4K, try increasing the page size to 32K or 64K.
If your application is swapping heavily, you might get the best performance by setting the page size equal to the virtual memory page size (4K). The default page size is documented in the Getting Started and Platform Guide.
Finally, if you are observing no performance improvement with SmartHeap in a test case that follows the above guidelines, it’s possible that SmartHeap is not being linked into your application. Make sure the SmartHeap Library appears before any other library on the linker command line, and check the map file for the symbol SmartHeap_malloc.
Also, make sure you don’t test performance with the Debug version of SmartHeap (usually a SmartHeap Library name ending in “d”) — Debug SmartHeap does a lot of error checking and is much slower than Runtime SmartHeap.