Overview

Industry leaders at Algorithmics, BMC, Boeing, Citigroup, Cognos, Discreet, Ericsson, Hewlett Packard, Hyperion, i2, Microsoft, OpenWave, Oracle, Raytheon, SAS, and many others know that memory management can affect an application's performance and reliability more than any other factor. These companies and many others rely on SmartHeap for their memory manager. Why? Because SmartHeap is the fastest, most portable, and most reliable allocator available. In addition, SmartHeap includes complete heap error detection facilities.

Speed

SmartHeap's proprietary algorithms deliver unparalleled malloc/new performance on Windows and UNIX (Sun, HP, IBM, SGI, Red Hat, SuSE). Benchmarks show SmartHeap is 2X to 100X+ faster than other commercial allocators. SmartHeap also provides multiple memory pools, which improve locality and further reduce fragmentation.

Error detection

SmartHeap doesn't stop with blazingly fast performance. It also provides the most complete heap error detection available. Memory bugs are typically the most insidious, elusive, and damaging bugs an application faces. Because SmartHeap controls and manages the heap, it can detect bugs that other add-on debugging tools miss. In addition to providing better error detection, SmartHeap uses its knowledge of the heap to report unsurpassed detail about the cause of each error. Bugs that SmartHeap detects include leakage, memory overwrites, double frees, wild pointers, invalid parameters, out-of-memory conditions, references to previously freed memory, and more.

Portability

In addition to outstanding speed and complete error detection, today's memory manager must be easily portable. SmartHeap ships as a binary linkable library for Windows and the major UNIX operating systems. Each version provides an identical API but is specifically optimized for its particular environment. Finally, SmartHeap's malloc and new are strictly ANSI compliant, so you don't have to code to a proprietary API to realize the benefits of SmartHeap.
Reliability

Because SmartHeap is a runtime library product, it must deliver absolutely bullet-proof reliability. To ensure the ultimate in error-free operation, we built a certification tester which calls each of the SmartHeap APIs hundreds of thousands of times. This tester actually includes more lines of code than SmartHeap itself and was designed to comprehensively exercise every condition SmartHeap might face, flushing out bugs before the product ever reaches you. The bottom line? We guarantee that SmartHeap is faster, more reliable, more portable, and more complete in its heap error detection than ANY memory manager you're using -- or your money back.
SmartHeap's architecture

SmartHeap is a faster malloc/new library because its underlying algorithms are superior to those used in compiler-supplied libraries.

The problem: heap management is harder than you think

At first glance, writing a malloc/new library appears to be a very simple task; all the library has to do is allocate and free a few blocks. However, building a library that can handle a random mix of thousands or millions of allocations and frees of objects from one byte to megabytes in size, while running on a virtual-memory, pre-emptively multi-tasking, multi-threading operating system, is not so easy. At least not if you also want the library to be fast and efficient in all conditions.

Applications must manage large numbers of objects

Today's applications, especially those written in C++, tend to be more memory intensive than ever before, often allocating, freeing, and referencing hundreds of thousands, or even millions, of small objects. Their allocation patterns are random, with calls to new interspersed with calls to delete. As a result, the heap quickly evolves into a fragmented, chaotic jumble. This fragmentation, in turn, causes many commercial allocators to "hit the wall": the performance of the allocator degrades exponentially as the heap size grows or when the allocator operates in virtual-memory, rather than physical-memory, conditions.

Large-footprint operating systems compete with your application for precious RAM

32-bit virtual memory operating systems provide the advantage of a huge address space. However, these operating systems are themselves huge, at least relative to the typical RAM configuration. As a result, they compete with your application for precious RAM and force your application's heap to swap far more frequently. Windows 95, for example, has a footprint of some 14 MB (several times the size of Microsoft's suggested 4 MB minimum memory configuration).
Its sheer size relative to typically available physical memory guarantees that your application's heap will always be at least partially non-resident, so each call to allocate, free, or even reference memory is likely to invoke agonizingly slow disk hits. Windows NT and UNIX systems run on machines with more memory, but their respective footprints are also larger, as are the apps that run on them. As a result, the identical competition for memory and the associated performance degradation occur.

Pre-emptive multi-tasking and multi-threading operating systems increase swapping frequency

Pre-emptive multi-tasking in Windows and UNIX further degrades performance. For example, your application (or one of its threads) may be in the middle of traversing a data structure when the operating system turns the processor over to another application or another thread. When your application (or that thread) gets its next slice of processor time, its data will often have been swapped to disk. Multiple threads exacerbate the problem still further. In multi-threaded environments, heap accesses are normally serialized so that only one thread can be active in the heap at a time. This makes the heap a real bottleneck for multi-threaded applications and hurts performance:
The solution: one allocator, three algorithms

Producing an allocator that's fast and efficient for objects of all sizes is not so easy. The algorithms that work best for allocating and freeing small objects don't work as well on large objects, and vice versa. SmartHeap solves this problem by implementing three distinct algorithms: one for small objects, one for medium-sized objects, and one for large objects. You don't have to change your code at all; you simply call new or malloc, just like before, and SmartHeap automatically uses the appropriate algorithm for the specified object size. Moreover, each SmartHeap algorithm scales well to very large heap sizes and is efficient in both physical- and virtual-memory conditions.

Allocating small objects (under 256 bytes)

The speed-space tradeoff of fixed-size allocators

If you've studied memory management, you know that a fixed-size allocator will always be faster than a variable-size allocator (2-10X faster, if not more, depending on heap size). Fixed-size allocators reduce memory management to simple free-list management, which is extremely fast. Rather than searching the heap for the best fit, the fixed-size allocator can simply pick the head off the free list. You also know that the vast majority of objects allocated by C++ apps are for things like fixed-size structures and classes -- objects that tend to be smaller than 256 bytes. If you could only find a way to get fixed-size allocation performance for all of these objects, you could get an immediate performance boost. The tradeoff has always been that fixed-size allocators waste much more memory unless all objects are the same size. And when you're dealing with an entire heap, you're going to run across a wide spectrum of block sizes.
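The free-list mechanics behind that speed advantage can be shown in a few lines. The FixedPool class below is a minimal sketch of the general technique, not SmartHeap's implementation: because every block has the same size, allocation and deallocation reduce to popping and pushing the head of a singly linked free list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal fixed-size allocator sketch (an illustration of the technique,
// not SmartHeap's code).  All blocks are the same size, so alloc/free
// reduce to popping and pushing the head of a singly linked free list --
// no searching, no best-fit.
class FixedPool {
public:
    FixedPool(std::size_t blockSize, std::size_t count) {
        // Round the stride up so each free block can hold an aligned
        // next-pointer while it sits on the free list.
        stride_ = (blockSize + sizeof(void*) - 1) / sizeof(void*) * sizeof(void*);
        if (stride_ == 0) stride_ = sizeof(void*);
        storage_.resize(stride_ * count);
        for (std::size_t i = 0; i < count; ++i) {  // thread every block onto the list
            void* block = &storage_[i * stride_];
            *static_cast<void**>(block) = freeList_;
            freeList_ = block;
        }
    }
    void* alloc() {                 // O(1): pop the head of the free list
        if (!freeList_) return nullptr;
        void* block = freeList_;
        freeList_ = *static_cast<void**>(block);
        return block;
    }
    void free(void* block) {        // O(1): push back onto the head
        *static_cast<void**>(block) = freeList_;
        freeList_ = block;
    }
private:
    std::vector<unsigned char> storage_;
    std::size_t stride_ = 0;
    void* freeList_ = nullptr;
};
```

Note that the free list lives inside the free blocks themselves, so the pool carries no per-block overhead while blocks are allocated -- which is why a fixed-size allocator's only real cost is the wasted space when object sizes vary.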
Therefore, if you want to use a fixed-size allocator, you have to tediously analyze your code to find out how many objects you have of each size, create multiple fixed-size "pools" (mini-heaps) which correspond to where the object sizes congregate, and finally change your source to call these specific fixed-size allocators. All this analysis and recoding takes time and precludes you from using an off-the-shelf malloc/new library. Alternatively, you could choose a single fixed-size allocator that handles all objects below a certain size and routes all others to a variable-size allocator. But this technique, which SmartHeap used in the 2.x release, isn't optimal either. No matter where you draw the line, any object smaller than the chosen size wastes memory. For example, if you route all objects up to 32 bytes long to a fixed-size allocator, every object smaller than 32 bytes wastes (32 minus the object's size) bytes of memory. This waste causes the heap to grow unnecessarily large. So you end up choosing a fixed-size threshold so small that only a few objects use the much faster fixed-size allocator, and the overall performance improvement is negligible.

SmartHeap's fixed-size allocator is fast and memory-efficient

To get around the speed-versus-waste tradeoff common to fixed-size allocators, SmartHeap dynamically establishes a separate fixed-size pool for each object size up to 255 bytes. You get 255 allocators without touching your code! For example, when your code first calls malloc or new to create a 32-byte object, SmartHeap automatically creates a 32-byte fixed-size pool and then allocates this object from it. All subsequent 32-byte objects are also allocated from this pool. The pools SmartHeap uses internally for these small allocations are very low-overhead, and free storage is shared among all the different sizes.
So SmartHeap doesn't have the problem, common to other fixed-size allocators, of wasting reserved memory dedicated to each individual pool. This technique delivers the performance of a fixed-size allocator, and because every object maps perfectly to its own fixed-size allocator, the per-object overhead of the SmartHeap small-object allocator is only a single byte for Win16, Win32, and OS/2. (It's five bytes for UNIX and Mac platforms.) In comparison, Visual C++ 4.0 incurs 16 bytes of overhead for every object allocated, an amount that is often larger than the actual objects being allocated. UNIX allocators incur from 8 to 16 bytes per object, depending on the vendor.

Allocating medium-sized objects (256 bytes to 64K)

The fixed-size allocator that SmartHeap uses for objects smaller than 256 bytes isn't appropriate for larger blocks. While small blocks commonly hold thousands of repeat instances of fixed-size structures and classes, larger blocks commonly hold variable-size objects such as arrays and buffers whose sizes are rarely repeated. Tying up memory to maintain a fixed-size pool for an object size that is rarely repeated causes the heap to grow (and stay) unnecessarily large. Hence, for objects larger than 256 bytes but smaller than the operating system page size or system allocator granularity, SmartHeap uses a very efficient variable-size allocation algorithm.

The perils of locality of reference when allocating

The problem with conventional variable-size algorithms is that they effectively treat the heap as one large region, maintaining a single free list that spans the entire heap. Over time, as objects are continually allocated and freed, the free list degenerates into a random path of pages through the heap. This causes the allocator to jump from page to page as it traverses the free list, which it must do on every call to malloc/new and sometimes even on every call to free/delete.
The heap in these conventional implementations exhibits poor page locality: data that is referenced consecutively (in this case, the heap free list) isn't stored in the same page of the heap. The free list's lack of data locality wouldn't be a big problem if each free block were always large enough to satisfy each subsequent allocation request and if the heap were always entirely resident in physical memory. However, the same cycle of allocating and freeing that randomizes the free list also fragments the heap. This causes an ever-shrinking average block size, which, in turn, lessens the likelihood that "the next" free block in the list will be large enough to fulfill the current request. Moreover, as discussed above, most applications don't run purely in physical memory. As a result, a call to malloc or new often touches multiple pages while looking for a free block large enough for the object, and some of these touches invoke performance-killing disk hits. When a heap is fragmented and in tight memory conditions, a single allocation call can take a second or more as the allocator thrashes while traversing the free list. Most UNIX allocators and Microsoft's Win32 allocator improve on this by storing the free list and associated header information in a memory area separate from the blocks of data. Because the heap headers are smaller (usually eight bytes each) than the actual data, the free list can be stored more compactly, so data locality improves and swapping is reduced. However, separating the free list from the data still doesn't eliminate swapping. For large heaps, the free list continues to span a large number of pages, so traversing it can still touch multiple pages. In addition, for heaps with a small median object size (common in C++), very little space is actually saved because the objects themselves take up very little space. So the free list turns out to be only marginally smaller, and substantial swapping still occurs. 
Locality of reference also affects deallocation performance

Locality of reference is not just an issue when allocating memory; it is equally important when freeing memory. To minimize fragmentation, most allocators "coalesce," or merge, adjacent free blocks to create a single larger space. To determine whether adjacent blocks are free, and thus could be merged, some allocators traverse the entire free list during calls to free. As a result, the same consequences of free-list locality apply.

SmartHeap's unique page table algorithm maintains better locality of reference and reduces swapping when allocating memory

For medium-sized objects, SmartHeap uses a much smarter algorithm that virtually eliminates swapping while traversing the free list. While other allocators treat the heap effectively as one large region, SmartHeap divides the heap into discrete pages that correspond with (and are perfectly aligned with) the pages of the underlying operating system virtual memory manager. And, also like the operating system, SmartHeap maintains a compact page table that keeps track of all of the pages in the heap. For each page in the heap, SmartHeap's page table stores the size of the largest free block in that page. This page table is much smaller than the compiler allocator's free list because the page table has just one entry per page, rather than one entry per free block in the heap. Rather than searching one long list of free blocks (and touching many pages in the process), SmartHeap quickly scans its much smaller page table for a page that it knows has space for the current allocation request. SmartHeap's actual free list is contained inside each page -- since the free list doesn't reference any other pages, only a single heap page is referenced during a SmartHeap allocation. With this technique, SmartHeap virtually eliminates swapping during allocation calls.
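The page-table idea can be sketched as follows. The PageTableHeap class and its bookkeeping are illustrative assumptions, not SmartHeap's actual data structures: one compact entry per page records the largest free block on that page, so finding space scans only the table and then touches exactly one page.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified page-table allocator sketch (illustration only, not
// SmartHeap's code).  One entry per page records the largest free run
// on that page; per-block free lists live inside each page and are
// never walked here.
struct Page {
    std::size_t largestFree;   // size of the biggest free block in this page
};

class PageTableHeap {
public:
    PageTableHeap(std::size_t pageCount, std::size_t pageSize)
        : pages_(pageCount, Page{pageSize}) {}

    // Return the index of a page known to hold `size` bytes, or -1.
    // Only the compact page table is scanned -- no heap pages are touched.
    int findPage(std::size_t size) const {
        for (std::size_t i = 0; i < pages_.size(); ++i)
            if (pages_[i].largestFree >= size)
                return static_cast<int>(i);
        return -1;
    }

    // Simulate carving `size` bytes out of a page; real code would also
    // update the page's internal free list and recompute largestFree.
    void take(int page, std::size_t size) {
        pages_[page].largestFree -= size;   // simplified bookkeeping
    }

private:
    std::vector<Page> pages_;
};
```

Because the table holds one word-sized entry per page instead of one node per free block, it stays resident even when the heap itself is mostly paged out, which is the source of the swapping reduction described above.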
Allocation speed is one clear benefit of SmartHeap's page-based allocation algorithm, but there is a more subtle benefit that can have an even greater impact on your application's overall performance. We mentioned earlier how the free list in traditional allocators follows a random path through the heap. A consequence of this is that each object your application creates will lie on a random page within the heap. SmartHeap, on the other hand, with its page-centric free list, always tries to allocate consecutive objects from the same page. The result is that the data referenced by your application has better locality. Applications often reference (and free) memory in the same pattern in which they allocate it. For example, elements successively inserted into a list will be allocated and referenced in the order of the list links. Therefore, referencing this data involves accessing fewer pages, which further minimizes swapping.

SmartHeap's coalescing algorithms eliminate free-list traversal when deallocating memory

As we mentioned above, many compiler allocators traverse the free list on every free to determine whether or not adjacent blocks are free, and thus can be merged. As a result, deallocation performance degrades as the heap size (and thus the free-list size) grows. SmartHeap, on the other hand, doesn't rely on the free list at all to determine if adjacent blocks are free. Instead, it maintains special bits in each object's block header that indicate whether the adjacent blocks are free. With each free/delete, SmartHeap checks this local header information and immediately coalesces any adjacent free blocks. This technique is constant-time; it is not affected by the size of the heap. As a result, on all but the most modestly sized heaps, SmartHeap is often orders of magnitude faster at freeing memory than compiler allocators.
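Header-bit coalescing of this kind can be sketched as below. The BlockHeader layout and freeBlock helper are illustrative assumptions, not SmartHeap's code, and blocks are modeled as an array for simplicity (a real allocator works on addresses, with the allocator keeping the neighbor bits up to date on every alloc and free).

```cpp
#include <cassert>
#include <cstdint>

// Sketch of constant-time coalescing (an illustration of the idea, not
// SmartHeap's implementation).  Each block header carries "is my
// neighbor free?" bits, so free() can merge adjacent free blocks
// immediately, without walking any free list.
struct BlockHeader {
    std::uint32_t size;
    bool thisFree;
    bool prevFree;   // maintained by the allocator on every alloc/free
    bool nextFree;
};

// Free blocks[i], merging with free neighbors in O(1); returns the
// index of the header that now describes the (possibly merged) block.
int freeBlock(BlockHeader* blocks, int i, int count) {
    blocks[i].thisFree = true;
    if (blocks[i].nextFree && i + 1 < count)   // absorb the right neighbor
        blocks[i].size += blocks[i + 1].size;
    if (blocks[i].prevFree && i > 0) {         // fold into the left neighbor
        blocks[i - 1].size += blocks[i].size;
        return i - 1;
    }
    return i;
}
```

The key property is that every decision reads only the local headers, so the cost of a free is independent of how many free blocks exist elsewhere in the heap.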
Allocating large objects (over 64K)

SmartHeap treats large objects -- those larger than the system page size and system allocator granularity -- separately from either small or medium-sized objects. On platforms such as NT that provide an efficient large-object allocator, SmartHeap passes large-object allocation requests directly to the OS. On other operating systems, such as UNIX, that don't provide an efficient heap for large objects, SmartHeap implements its own large-object allocator. You can control the threshold between "medium" and "large" with a SmartHeap API. The default value is different on each platform, but is generally between 4K and 64K.

Other cool features in SmartHeap

Use SmartHeap's multiple pools on a data-usage basis to achieve performance gains when referencing data

As we discussed above, SmartHeap automatically and transparently uses multiple memory pools for objects smaller than 256 bytes. In addition, you can explicitly create additional pools to further improve the performance of your application with minimal coding effort. (To allocate from a particular pool, you simply override the definition of new for a given class.) Multiple pools let you partition your data (regardless of the size range of the objects) into discrete "mini-heaps." This has a number of benefits:
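The per-class operator new override mentioned above might look like the sketch below. The poolAlloc/poolFree stand-ins are hypothetical stubs over malloc/free so the example is self-contained; a real build would link SmartHeap and call its pool APIs (MemPoolInit, MemAllocPtr, MemFreePtr) instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-ins for a pool API so this sketch compiles on its
// own; a real build would use SmartHeap's pool calls here.
typedef void* MEM_POOL;
static MEM_POOL nodePool = nullptr;   // imagine a pool created at startup

static void* poolAlloc(MEM_POOL, std::size_t n) { return std::malloc(n); }
static void  poolFree(MEM_POOL, void* p)        { std::free(p); }

// Route every ListNode allocation to its own pool by overriding the
// class's operator new/delete -- no call sites need to change.
struct ListNode {
    int value;
    ListNode* next;
    static void* operator new(std::size_t n)  { return poolAlloc(nodePool, n); }
    static void  operator delete(void* p)     { poolFree(nodePool, p); }
};
```

Since all ListNode instances now come from one mini-heap, nodes that are traversed together also tend to be stored together, which is exactly the locality benefit the text describes.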
Shared memory

Shared memory in Win32

In Windows NT and Windows 95, each 32-bit process has its own separate address space. To allow sharing of memory between processes, Microsoft's Win32 API provides memory-mapped files. Beginning in version 3.1, SmartHeap supports Windows 95 and Windows NT shared memory. SmartHeap uses memory-mapped files to implement shared memory pools. All of the SmartHeap allocation APIs that accept a memory pool parameter support Win32 shared memory. Because SmartHeap allocation APIs return direct pointers to memory, SmartHeap requires shared memory pools to be mapped to the same address in each process. In Windows 95, this is not a problem, since shared memory is always mapped to the same address in each process. NT, however, does not guarantee that shared memory is mapped to the same address in each process. To solve this problem, SmartHeap includes the API MemPoolInitNamedSharedEx. This API lets you specify the address at which a shared memory pool should be mapped and/or an array of process IDs (pids) that will access the shared pool. If you specify a non-NULL value for pids, SmartHeap will search the address space of each of these processes to find a suitable address that is available in all of the processes. If you specify address as NULL, SmartHeap chooses a random address in the upper half of the application address space for NT, or a random address in the shared memory address space for Win95. This minimizes the chance of collisions with other shared pools or VirtualAlloc objects (which are normally allocated from the beginning of the address space). If all of the processes that will share a memory pool are running at the time you create the pool, you can have SmartHeap find an address automatically by specifying the pidCount and pids parameters. In this case, the shared pool will be mapped into each process's address space before the MemPoolInitNamedSharedEx call returns.
If there is no address-space region of suitable size available in every process, MemPoolInitNamedSharedEx will fail (this would be very unusual, considering that each process has 2 GB of address space). Win32's memory-mapped files have a granularity of 4K: there's no heap API for allocating smaller blocks of shared memory. SmartHeap includes a malloc-like API to efficiently allocate small blocks (as small as 4 bytes) of shared memory and a free-like API to free individual small blocks. SmartHeap also provides overloaded operators new and delete for shared memory in C++. These APIs let you create and destroy sharable data structures of any kind, including those that contain pointers. The ability to allocate very small blocks and to map memory-mapped files at the same address in each process makes SmartHeap extremely useful when porting 16-bit code to Windows 95 or Windows NT. Note: To guarantee that a set of processes will be able to successfully share a memory pool in NT, you must use the DLL version of SmartHeap.

Shared memory for other platforms

Beginning in version 3.0, SmartHeap supports UNIX shared memory. SmartHeap uses the shared memory and semaphore facilities of the standard InterProcess Communication (IPC) package to implement shared memory pools. You can use all of the SmartHeap allocation and de-allocation APIs with shared memory pools. In debug SmartHeap, shared memory pools fully support all of the same debugging facilities as private memory pools. See the Getting Started and Programmer's Guide for platform-specific details of the SmartHeap shared memory implementation on your platform.

How important is malloc/free speed?

Consider a typical application that spends 40% of its total execution time managing memory and takes 10 minutes to run. The table below shows how a faster memory management library affects this application.

If malloc/new is     then malloc/new takes   the app takes       and the entire app
this much faster...  this much time...       this much time...   is this much faster
------------------------------------------------------------------------------------
no change (1X)       4.00 minutes            10.00 minutes        0%
1.5X                 2.67 minutes             8.67 minutes       13.3%
2X                   2.00 minutes             8.00 minutes       20%
4X                   1.00 minutes             7.00 minutes       30%
10X                  0.40 minutes             6.40 minutes       36%
100X                 0.04 minutes             6.04 minutes       39.6%

Note that even a 4X improvement in malloc can result in a 30% overall application performance improvement -- and remember that SmartHeap is generally a minimum of 4X faster than other commercial allocators and requires just a relink to implement.
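The arithmetic behind the table is Amdahl's law: if a fraction f of total runtime T is spent in the allocator and the allocator becomes k times faster, the new total runtime is (1 - f)*T + f*T/k. A few lines of code verify the rows above:

```cpp
#include <cassert>
#include <cmath>

// Amdahl's-law check of the table: f = fraction of runtime spent in the
// allocator, k = allocator speedup, T = original total runtime.
double newRuntime(double T, double f, double k) {
    return (1.0 - f) * T + (f * T) / k;
}
```

With T = 10 minutes and f = 0.4, a 4X allocator speedup gives 6.0 + 1.0 = 7.0 minutes, a 30% overall gain; even an infinitely fast allocator could never beat 6.0 minutes, which is why the 100X row levels off at 39.6%.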
Portability

SmartHeap provides portability to a broad set of platforms from a single, ANSI C compliant source code base. We support compilers from Microsoft, Borland, IBM, Sun, HP, Red Hat, SGI, and others. Platform-specific binary versions that are ready to be quickly and easily linked directly into an application are available for Windows (x86, x64, and Itanium), Sun Solaris, IBM AIX, HP-UX, Red Hat, SuSE, SGI, and other platforms. Source code licenses are also available for all of these platforms and include the necessary .mak files for each platform. To maximize performance and efficiency, we isolated all platform dependencies into a single module of SmartHeap. This module is carefully tuned for each platform, using manifest constants to control such architecture-sensitive variables as alignment, system page size, pointer size, and integer size. The following examples illustrate how SmartHeap is carefully tuned for each platform:
You can also readily compile SmartHeap on platforms and operating systems for which MicroQuill has not yet provided integration. Please contact MicroQuill for pricing and support details.
Debugging and error detection features

In addition to incredible runtime performance, SmartHeap provides the most complete heap error detection available. Because SmartHeap "owns" the heap, it not only detects more errors but also provides greater detail about each error than other "add-on" memory debuggers. As SmartHeap's debugging version allocates each block, it keeps track of the following information:
The SmartHeap debug library provides three levels of error detection, from simple to very exhaustive. dbgMemSetSafetyLevel controls how much error checking SmartHeap performs. The three "safety levels" are:
Information that SmartHeap includes in error reports

When SmartHeap detects an error, the following information is included in the error report:
Where you can send error reports

You can specify that SmartHeap send error reports to any combination of the following locations:
Errors detected by SmartHeap

SmartHeap detects the following types of errors:
For an example of error detection and reporting, see the sample program and output SHTESTD.C and SHTESTD.OUT on the following pages.

A sample application that illustrates Debug SmartHeap error reports

The following pages show the SHTESTD.C sample application, which illustrates Debug SmartHeap error reports. Following the listing of the test program is the output it generates. Note that the line numbers in the sample code correspond with those reported by SmartHeap in the error report.

     1
     2
     3
     4
     5  /* Note: the SmartHeap header file must be included _after_ any
     6   * files that declare malloc, etc.
     7   */
     8  #include "smrtheap.h"
     9  #include "shmalloc.h"
    10
    11  #ifndef MEM_DEBUG
    12  #error shtestd.c must be compiled with MEM_DEBUG defined
    13  #endif
    14
    15  #define TRUE 1
    16  #define FALSE 0
    17
    18
    19
    20  int main()
    21  {
    22     MEM_POOL pool;
    23     unsigned char *buf;
    24     int i;
    25     unsigned char c;
    26
    27     dbgMemSetSafetyLevel(MEM_SAFETY_DEBUG);
    28     dbgMemSetDefaultErrorOutput(DBGMEM_OUTPUT_PROMPT
    29        | DBGMEM_OUTPUT_CONSOLE | DBGMEM_OUTPUT_FILE, "shtestd.out");
    30
    31     pool = MemPoolInit(0);
    32     dbgMemPoolSetCheckFrequency(pool, 1);
    33     dbgMemPoolDeferFreeing(pool, TRUE);
    34     dbgMemPoolSetCheckpoint(pool, 1);
    35
    36     buf = MemAllocPtr(pool, 3, 0);  /* this alloc never freed (leakage) */
    37
    38     /* invalid buffer */
    39     MemPoolInfo(pool, NULL, NULL);
    40
    41     /* invalid pointer parameter */
    42     MemFreePtr((void *)ULONG_MAX);
    43
    44     /* underwrite */
    45     c = buf[-1];
    46     buf[-1] = 'x';
    47     MemValidatePtr(pool, buf);
    48     buf[-1] = c;
    49
    50     /* overwrite */
    51     buf = MemAllocPtr(pool, 3, 0);  /* more leakage */
    52     c = buf[3];
    53     buf[3] = 'z';
    54     MemValidatePtr(pool, buf);
    55     buf[3] = c;
    56
    57     dbgMemPoolSetCheckpoint(pool, 2);
    58
    59     /* write into read-only block */
    60     buf = MemAllocPtr(pool, 10, MEM_ZEROINIT);  /* more leakage */
    61     *buf = 'a';
    62     MemValidatePtr(pool, buf);
    63     dbgMemProtectPtr(buf, DBGMEM_PTR_READONLY);
    64     *buf = 'b';
    65     MemValidatePtr(pool, buf);
    66     *buf = 'a';
    67     dbgMemProtectPtr(buf, DBGMEM_PTR_NOFREE | DBGMEM_PTR_NOREALLOC);
    68     free(buf);
    69     realloc(buf, 44);
    70
    71     /* double free */
    72     buf = malloc(1);
    73     dbgMemPoolDeferFreeing(MemDefaultPool, TRUE);
    74     dbgMemPoolSetCheckFrequency(MemDefaultPool, 1);
    75     for (i = 0; i < 3; i++)
    76        MemFreePtr(buf);
    77
    78     /* write into free block */
    79     c = *buf;
    80     *buf = 'a';
    81     calloc(1, 3);
    82     *buf = c;
    83
    84     dbgMemReportLeakage(pool, 1, 2);
    85
    86     return 1;
    87  }

SmartHeap error output from SHTESTD.C