Improving Memory Compaction with Dedicated Malloc Arenas
malloc’s design can make it difficult to return memory to the OS
A POSIX process shares an address space, a memory allocator, and its heap with the shared libraries that it uses. Because the application and the shared libraries allocate memory from the same heap, it can be difficult to develop compact, memory-efficient services, even when there are no memory leaks.
An application might process a data stream and dynamically allocate memory as it processes elements in that stream. For example, a stream might describe a list of applications in a package repository, and the process might allocate memory for entries detailing the release history and for application icons. If the process uses a shared library as it processes elements, the shared library might also dynamically allocate memory for a private internal cache. In such a case, the heap will contain application allocations interleaved with allocations from the shared library.
Even if the application reliably tracks its allocations and frees them when it finishes processing the data stream, the heap might still contain small allocations from the shared library. Those allocations prevent libc from returning memory to the operating system: a handful of library allocations can pin a large amount of freed application memory in the heap.
One Arena with Interleaved Allocations
When both application and library code allocate from the same arena, their allocations become interleaved in memory.
Freeing memory allocated by the application may not be sufficient to return memory to the operating system and reduce RSS, because library allocations are scattered throughout the arena, fragmenting the heap. In this illustration, a few small library allocations prevent a large heap from being returned to the OS: a small amount of uncontrolled allocation has an outsized impact on memory compaction.
Dedicated Arenas for Library Code
glibc already provides per-thread arenas to reduce lock contention when allocating memory in a threaded process. I’d like to propose exposing an interface that allows an application to request an arena handle and to set a preferred arena for a thread.
Hypothetically, a malloc implementation could allow an application to register new memory arenas. The application could then set the preferred arena for a thread to an arena dedicated to a shared library before calling that shared library’s functions, and restore the default arena on return.
By segregating the arenas used by a shared library from those used by the rest of the process, an application could keep allocations it can’t track out of its own memory arena, improving its ability to compact its memory.
With dedicated arenas, application and library allocations are no longer interleaved. When the application’s allocations are contiguous and its arena is free of untracked allocations, the application can reduce its resident size when it releases allocations.
Example Implementation
For example, the application might look something like:
#include <malloc.h>

static arena_hd *netio_hd = NULL;

/* Create the dedicated arena for the netio library once. */
static void
app_register_netio_hd(void) {
    if (netio_hd)
        return;
    netio_hd = malloc_new_arena();
    if (netio_hd == NULL) {
        // check errno and handle allocation failure
    }
}

static void
app_process_element(AppElement *element) {
    arena_hd *current;

    // Switch to the arena dedicated to the netio library
    current = malloc_set_arena(netio_hd);
    netio_process_element(element);
    // Restore the previously active arena
    malloc_set_arena(current);
}