Anyone who has used or learned C is no stranger to malloc. Everyone knows that malloc can allocate a contiguous memory space and can be freed by free when it's no longer needed. However, many programmers are not familiar with the underlying mechanisms of malloc, and some even consider it a system call or a keyword provided by the operating system. In reality, malloc is just a standard library function in C, and its basic implementation isn't complicated. Any programmer with a basic understanding of C and operating systems can easily grasp it.
This article explores the inner workings of malloc by implementing a simple version. Although this implementation is not as efficient as the existing C standard library (such as glibc), it is much simpler than real-world implementations, making it easier to understand. What's important is that this implementation follows the same principles as actual implementations.
The article begins by introducing essential concepts such as how the operating system manages process memory and related system calls. It then gradually builds a simple malloc implementation. For simplicity, the discussion will focus on the x86_64 architecture and the Linux operating system.
1. What is malloc
2. Preliminary knowledge
2.1.1 Virtual Memory Address and Physical Memory Address
2.1.2 Page and Address Composition
2.1.3 Memory Pages and Disk Pages
2.2 Linux Process Level Memory Management
2.2.1 Memory Arrangement
2.2.2 Heap Memory Model
2.2.3 brk and sbrk
2.2.4 Resource Limit and rlimit
3. Implementing malloc
3.1 Toy Realization
3.2 Formal Implementation
3.3 Legacy Issues and Optimization
4. Other References
1. What is malloc
Before implementing malloc, we need to define it more formally. According to the standard C library function definition, malloc has the following prototype:
void* malloc(size_t size);
The purpose of this function is to allocate a contiguous block of memory from the system, with the following requirements:
- The allocated memory must be at least the number of bytes specified by the size parameter.
- The return value of malloc is a pointer to the starting address of the allocated memory.
- The addresses allocated by multiple calls to malloc must not overlap unless the previous allocation is released.
- Malloc should complete the allocation and return quickly (it shouldn't use NP-hard algorithms).
- When implementing malloc, you should also implement functions for resizing and freeing memory (i.e., realloc and free).
More information about malloc can be found by typing 'man malloc' on the command line.
2. Preliminary Knowledge
Before implementing malloc, it's necessary to explain some Linux memory-related knowledge.
2.1 Linux Memory Management
2.1.1 Virtual Memory Address and Physical Memory Address
Modern operating systems typically use virtual memory addressing. At the machine language level, virtual memory addresses are used. Each process seems to have access to a large amount of memory, like 2^N bytes, where N is the number of bits in the machine. For example, on a 64-bit CPU and OS, each process has a virtual address space of 2^64 bytes. This virtual address space simplifies program writing and helps the OS manage inter-process memory isolation. However, the actual physical memory used depends on the available RAM.
2.1.2 Page and Address Composition
In modern operating systems, both virtual and physical memory are managed in pages. A page is a fixed-size contiguous block of memory. On Linux, a typical page size is 4096 bytes (4K). Memory addresses can be divided into a page number and an offset. For example, on a 64-bit machine with 4G physical memory and 4K pages, the virtual and physical memory addresses are composed as follows:
The MMU (Memory Management Unit) maps virtual addresses to physical ones using page tables. This mapping is simplified here but reflects the basic principles of modern computers.
2.1.3 Memory Pages and Disk Pages
Memory is often seen as a cache for disks. When the MMU encounters a page not in physical memory, a page fault occurs, and the system loads the corresponding disk page into memory before re-executing the instruction. This part is transparent to malloc and won't be discussed in detail. For more information, refer to "In-depth Understanding of Computer Systems."
Finally, here is a process diagram showing real address translation, including TLB and page fault exceptions (image source: Wikipedia).
2.2 Linux Process Level Memory Management
2.2.1 Memory Arrangement
Understanding the relationship between virtual and physical memory and their mapping mechanisms, let's look at how memory is arranged within a process.
On a 64-bit Linux system, theoretically, the address space is 0x0000000000000000 to 0xFFFFFFFFFFFFFFFF, which is a very large range. However, Linux uses only a small portion (256T). The lower 47 bits and upper 17 bits are used, meaning the actual address space is 0x0000000000000000 to 0x00007FFFFFFFFFFF and 0xFFFF800000000000 to 0xFFFFFFFFFFFFFFFF, divided into user and kernel spaces.
User space includes code, data, BSS, heap, mapping area, and stack. The heap is the main focus of our discussion. The brk system call is used to allocate memory from the heap.
2.2.2 Heap Memory Model
Malloc typically allocates memory from the heap. The heap's virtual address space is mapped to physical memory as needed. Linux maintains a break pointer that points to an address in the heap. The address space from the heap start to the break is mapped and accessible, while the area beyond the break is unmapped and inaccessible.
2.2.3 brk and sbrk
To increase the heap size, the break pointer must be moved. Linux provides the brk and sbrk system calls for this. Their prototypes are:
int brk(void *addr);
void *sbrk(intptr_t increment);
Brk sets the break directly to an address, while sbrk moves it by a given increment. If the increment is zero, it returns the current break address.
2.2.4 Resource Limit and rlimit
Each process has a resource limit (rlimit) indicating the maximum memory it can use. You can get this using getrlimit. For example:
int main() {
struct rlimit *limit = (struct rlimit *)malloc(sizeof(struct rlimit));
getrlimit(RLIMIT_AS, limit);
printf("soft limit: %ld, hard limit: %ld", limit->rlim_cur, limit->rlim_max);
}
The rlimit structure has soft and hard limits, with the hard limit being the upper bound for the soft limit. Non-privileged processes can only set the soft limit.
3. Implementing malloc
3.1 Toy Realization
Before discussing the implementation of malloc, we can use the above knowledge to create a simple toy malloc. Here's an example:
/* A toy malloc */
#include
#include
void *malloc(size_t size) {
void *p;
p = sbrk(0);
if (sbrk(size) == (void *)-1)
return NULL;
return p;
}
This simple malloc increases the break by the size requested and returns the previous break address. It lacks record-keeping and is not suitable for real use.
3.2 Formal Implementation
Now we discuss a more serious implementation of malloc.
3.2.1 Data Structure
We need to define a data structure for managing heap memory. A block consists of a meta area and a data area. The meta area records the block's size, whether it's free, and a pointer to the next block. The data area is the actual allocated memory, and the first byte address is returned by malloc.
Here's a sample block structure:
typedef struct s_block *t_block;
struct s_block {
size_t size;
t_block next;
int free;
int padding;
char data[1];
};
For 64-bit machines, we add padding to ensure alignment. The schematic shows how the block is structured.
3.2.2 Finding the Right Block
To find the right block in the linked list, we can use either first fit or best fit algorithms. First fit is faster, so we'll use that.
/* First fit */
t_block find_block(t_block *last, size_t size) {
t_block b = first_block;
while (b && !(b->free && b->size >= size)) {
*last = b;
b = b->next;
}
return b;
}
This function finds the first block that meets the size requirement and returns it. If not found, it returns NULL.
3.2.3 Opening New Blocks
If no suitable block exists, we extend the heap by moving the break pointer.
#define BLOCK_SIZE 24
t_block extend_heap(t_block last, size_t s) {
t_block b;
b = sbrk(0);
if (sbrk(BLOCK_SIZE + s) == (void *)-1)
return NULL;
b->size = s;
b->next = NULL;
if (last)
last->next = b;
b->free = 0;
return b;
}
3.2.4 Splitting Blocks
First fit can lead to wasted space, so we split blocks if there's enough remaining space.
void split_block(t_block b, size_t s) {
t_block new;
new = b->data + s;
new->size = b->size - s - BLOCK_SIZE;
new->next = b->next;
new->free = 1;
b->size = s;
b->next = new;
}
3.2.5 Implementing malloc
With these functions, we can implement a simple malloc. We need to define the first_block and ensure proper alignment.
size_t align8(size_t s) {
if (s & 0x7 == 0)
return s;
return ((s >> 3) + 1) << 3;
}
#define BLOCK_SIZE 24
void *first_block = NULL;
void *malloc(size_t size) {
t_block b, last;
size_t s;
s = align8(size);
if (first_block) {
last = first_block;
b = find_block(&last, s);
if (b) {
if ((b->size - s) >= (BLOCK_SIZE + 8))
split_block(b, s);
b->free = 0;
} else {
b = extend_heap(last, s);
if (!b)
return NULL;
}
} else {
b = extend_heap(NULL, s);
if (!b)
return NULL;
first_block = b;
}
return b->data;
}
3.2.6 Implementing calloc
calloc involves allocating memory and initializing it to zero. We can do this efficiently by setting every 8 bytes to zero.
void *calloc(size_t number, size_t size) {
size_t *new;
size_t s8, i;
new = malloc(number * size);
if (new) {
s8 = align8(number * size) >> 3;
for (i = 0; i < s8; i++)
new[i] = 0;
}
return new;
}
3.2.7 Implementing free
Free requires checking the validity of the input address and merging adjacent free blocks to reduce fragmentation. We add a magic pointer to the block structure to verify the address.
typedef struct s_block *t_block;
struct s_block {
size_t size;
t_block next;
int free;
int padding;
void *ptr;
char data[1];
};
t_block get_block(void *p) {
char *tmp;
tmp = p;
return (p = tmp -= BLOCK_SIZE);
}
int valid_addr(void *p) {
if (first_block) {
if (p > first_block && p < sbrk(0))
return p == (get_block(p))->ptr;
}
return 0;
}
When blocks are freed, they may become fragmented. To mitigate this, we merge adjacent free blocks. This requires changing the block structure to a doubly linked list.
struct s_block {
size_t size;
t_block prev;
t_block next;
int free;
int padding;
void *ptr;
char data[1];
};
The fusion method merges adjacent free blocks:
t_block fusion(t_block b) {
if (b->next && b->next->free) {
b->size += BLOCK_SIZE + b->next->size;
b->next = b->next->next;
if (b->next)
b->next->prev = b;
}
return b;
}
The free function marks a block as free and merges adjacent blocks if possible. If the block is the last one, the break pointer is rolled back to release memory.
void free(void *p) {
t_block b;
if (valid_addr(p)) {
b = get_block(p);
b->free = 1;
if (b->prev && b->prev->free)
b = fusion(b->prev);
if (b->next)
fusion(b);
else {
if (b->prev)
b->prev->prev = NULL;
else
first_block = NULL;
brk(b);
}
}
}
3.2.8 Implementing realloc
Realloc requires copying data and possibly reallocating memory. We first implement a memory copy function that copies in 8-byte units:
void copy_block(t_block src, t_block dst) {
size_t *sdata, *ddata;
size_t i;
sdata = src->ptr;
ddata = dst->ptr;
for (i = 0; (i * 8) < src->size && (i * 8) < dst->size; i++)
ddata[i] = sdata[i];
}
Then we implement realloc, considering cases where the current block is large enough, needs to be split, or merged with subsequent blocks:
void *realloc(void *p, size_t size) {
size_t s;
t_block b, new;
void *newp;
if (!p)
return malloc(size);
if (valid_addr(p)) {
s = align8(size);
b = get_block(p);
if (b->size >= s) {
if (b->size - s >= (BLOCK_SIZE + 8))
split_block(b, s);
} else {
if (b->next && b->next->free && (b->size + BLOCK_SIZE + b->next->size) >= s) {
fusion(b);
if (b->size - s >= (BLOCK_SIZE + 8))
split_block(b, s);
} else {
newp = malloc(s);
if (!newp)
return NULL;
new = get_block(newp);
copy_block(b, new);
free(p);
return newp;
}
}
}
return p;
}
3.3 Legacy Issues and Optimization
The above implementation is a simple, functional version of malloc. There are many optimization opportunities, such as:
- Supporting both 32-bit and 64-bit systems
- Using mmap for large allocations instead of sbrk
- Maintaining multiple linked lists based on block sizes to reduce fragmentation
- Storing only free blocks in the linked list to improve efficiency
These optimizations are not detailed here. For further study, refer to the listed resources.
4. Other References
This article references several sources, including "A Malloc Tutorial," "Computer Systems: A Programmer's Perspective," and others. For real-world implementations, you can study the glibc source code.
Indoor Digital Signage,Display Advertising Machine,Led Digital Display Board,Indoor Digital Sign Board
Shanghai Really Technology Co.,Ltd , https://www.really-led.com