Understanding how malloc works by implementing a simple malloc

Anyone who has used or learned C is no stranger to malloc. Everyone knows that malloc allocates a contiguous block of memory and that the block is released with free when it is no longer needed. However, many programmers are not familiar with how malloc works underneath, and some even take it for a system call or a language keyword. In reality, malloc is just a function in the C standard library, and its basic implementation is not complicated. Any programmer with a basic understanding of C and the operating system can grasp it.

This article explores the inner workings of malloc by implementing a simple version of it. Although this implementation is less efficient than production ones such as the malloc in glibc, it is much simpler and easier to understand. What matters is that it follows the same basic principles as real implementations.

The article starts by introducing essential knowledge about the operating system's memory management and related system calls. Then, it gradually builds a simple malloc. For simplicity, the focus is on the x86_64 architecture running Linux.

1. What is malloc

2. Preliminary Knowledge

2.1 Linux Memory Management

2.1.1 Virtual Memory Address and Physical Memory Address

2.1.2 Page and Address Composition

2.1.3 Memory Pages and Disk Pages

2.2 Linux Process Level Memory Management

2.2.1 Memory Arrangement

2.2.2 Heap Memory Model

2.2.3 brk and sbrk

2.2.4 Resource Limit and rlimit

3. Implementing Malloc

3.1 Toy Implementation

3.2 Formal Implementation

3.3 Legacy Issues and Optimization

4. Other References

1. What is malloc

Before implementing malloc, it is necessary to define it properly. According to the C standard, malloc is declared in <stdlib.h> with the prototype:

void* malloc(size_t size);

The function must allocate a contiguous block of memory, meeting the following requirements:

- The allocated block must be at least as large as the number of bytes given by the size parameter.

- The return value is a pointer to the starting address of the allocated block.

- Blocks returned by separate calls to malloc must not overlap while they are still allocated; an address may be reused only after it has been freed.

- malloc should complete the allocation quickly; it should not rely on an expensive search (for example, an NP-hard allocation algorithm).

- The implementation must also provide the companion functions realloc (resize a block) and free (release a block).

More details about malloc can be found by typing 'man malloc' on the command line.

2. Preliminary Knowledge

Before implementing malloc, it's essential to understand how Linux manages memory.

2.1 Linux Memory Management

2.1.1 Virtual Memory Address and Physical Memory Address

Modern operating systems typically use virtual memory addressing. Each process sees its own large, private address space, but an address in that space is usable only once it has been mapped to physical memory, so the memory a process can actually use is bounded by the physical memory (and swap) available. The MMU (Memory Management Unit) translates virtual addresses into physical addresses.

2.1.2 Page and Address Composition

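Memory is managed in pages rather than individual bytes. A typical page size in Linux is 4096 bytes. An address is split into a page number (the high bits) and an offset within the page (the low bits); with 4096-byte pages, the offset occupies the low 12 bits. The MMU uses a page table to map each virtual page number to a physical page.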
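
As a small illustration of the split between page number and offset, the following sketch assumes 4096-byte pages. The names page_number and page_offset are made up for this example and are not part of any system API.

```c
#include <stdint.h>

#define PAGE_SIZE   4096u   /* assumed page size in bytes */
#define OFFSET_BITS 12u     /* log2(4096): number of offset bits */

/* High bits of the address: which page the address falls in. */
static uint64_t page_number(uint64_t addr)
{
    return addr >> OFFSET_BITS;
}

/* Low 12 bits of the address: byte position inside that page. */
static uint64_t page_offset(uint64_t addr)
{
    return addr & (PAGE_SIZE - 1u);
}
```

For the address 0x12345, the page number is 0x12 and the offset is 0x345. Only the page number is translated through the page table; the offset is carried over unchanged.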

2.1.3 Memory Pages and Disk Pages

Physical memory can be viewed as a cache for pages stored on disk. When a process touches a page that is not currently in physical memory, a page fault occurs, and the kernel loads the corresponding disk page into memory before the access is retried.

2.2 Linux Process Level Memory Management

2.2.1 Memory Arrangement

Understanding the relationship between virtual and physical memory helps explain how processes manage their memory. On a 64-bit Linux system, the user space is divided into sections such as code, data, BSS, heap, mapping area, and stack.

2.2.2 Heap Memory Model

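Malloc primarily allocates memory from the heap. For each process, Linux maintains a break pointer (the "program break") that marks the end of the heap. Addresses below the break are usable; addresses between the break and the end of the heap area are not yet mapped. The break can be moved with brk and sbrk.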

2.2.3 brk and sbrk

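brk sets the break to an absolute address, while sbrk moves it by a signed increment and returns the previous break, so sbrk(0) simply reports the current break. On Linux, brk is the underlying system call and sbrk is a library function built on top of it. Moving the break forward grows the heap and moving it back shrinks it, which is the basis of the dynamic allocation described here.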
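
A minimal sketch of growing the heap with sbrk is shown here. The helper name grow_heap is hypothetical; the only real API used is sbrk, which returns (void *)-1 on failure.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc headers */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Move the break forward by `bytes` and return the start of the newly
 * usable region, or NULL if the kernel refuses the request. */
static void *grow_heap(intptr_t bytes)
{
    void *old_break = sbrk(bytes);   /* sbrk returns the break before the move */
    if (old_break == (void *)-1)
        return NULL;
    return old_break;
}
```

Because sbrk returns the old break, the returned address is exactly where the new memory starts, and sbrk(0) afterwards reports the old break plus the increment.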

2.2.4 Resource Limit and rlimit

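Each process has limits on the resources it can use, so the break cannot be moved forward indefinitely. These limits can be read with getrlimit and changed with setrlimit. Each limit has a soft value, which is enforced, and a hard value, which caps how far the soft value may be raised.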
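
The following sketch reads one such limit. Choosing RLIMIT_DATA (the limit on the data segment, which covers memory obtained through brk and sbrk) is an assumption about which limit matters most here, and the helper name heap_soft_limit is invented for the example.

```c
#include <sys/resource.h>

/* Store the soft limit of the data segment in *out.
 * Returns 0 on success and -1 if getrlimit fails.
 * The value may be RLIM_INFINITY, meaning "no limit". */
static int heap_soft_limit(rlim_t *out)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_DATA, &lim) != 0)
        return -1;
    *out = lim.rlim_cur;   /* rlim_cur is the soft limit, rlim_max the hard one */
    return 0;
}
```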

3. Implementing Malloc

3.1 Toy Implementation

A toy implementation of malloc can be written by simply moving the break pointer forward with sbrk and returning the old break. However, this version records nothing about the blocks it hands out (not even their sizes), so it has no way to free or reuse memory.
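
A sketch of such a toy allocator might look like this. It is named toy_malloc here to avoid clashing with the real malloc in libc.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc headers */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hand out `size` bytes by pushing the break forward.
 * Nothing is recorded about the block, so it can never be freed. */
static void *toy_malloc(size_t size)
{
    void *p = sbrk(0);                        /* current break = start of block */

    if (sbrk((intptr_t)size) == (void *)-1)   /* try to move the break */
        return NULL;
    return p;
}
```

Two consecutive calls return back-to-back regions, which shows both the idea and its limitation: given only a pointer, there is no way to tell where a block ends.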

3.2 Formal Implementation

To create a more robust implementation, we need to use a linked list of blocks, each containing metadata and the actual data. This allows us to track allocated and free blocks efficiently.

3.2.1 Data Structure

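We define a structure that sits at the start of each block as its header. It holds the size of the data area, a pointer to the next block, a free flag, padding for alignment, and a "magic" pointer that points at the block's own data area. When free receives an address, it can compare the address with this pointer to check that the address really came from malloc. The data area follows the header directly.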
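
One possible layout for such a header is sketched here. The field names, the t_block typedef, and the helper get_block are assumptions made for illustration; the field list follows the description in this section.

```c
#include <stddef.h>

typedef struct block *t_block;

struct block {
    size_t  size;      /* size of the data area in bytes */
    t_block next;      /* next block in the list, or NULL */
    int     free;      /* 1 if the block is free, 0 if in use */
    int     padding;   /* keeps the following pointer naturally aligned */
    void   *ptr;       /* magic pointer: equals `data` for a genuine block */
    char    data[];    /* the data area handed to the caller starts here */
};

/* Size of the header alone, i.e. the offset of the data area. */
#define BLOCK_SIZE offsetof(struct block, data)

/* Recover the header from a pointer to the data area. */
static t_block get_block(void *p)
{
    return (t_block)((char *)p - BLOCK_SIZE);
}
```

On x86_64 Linux this header occupies 32 bytes, so every allocation costs 32 bytes of bookkeeping on top of the requested size.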

3.2.2 Finding the Right Block

To find a suitable block, we use a first-fit algorithm. This involves scanning the list of blocks until one that meets the size requirement is found.
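
A first-fit search can be sketched as follows. To keep the example short, the header is reduced to the three fields the search needs, and the function name find_block and its `last` out-parameter are assumptions of this sketch.

```c
#include <stddef.h>

typedef struct block *t_block;

struct block {          /* header reduced to what the search needs */
    size_t  size;       /* size of the data area in bytes */
    t_block next;       /* next block in the list, or NULL */
    int     free;       /* 1 if the block is free */
};

/* Return the first free block with at least `size` bytes, or NULL.
 * *last is set to the last block visited, so that the caller can
 * append a new block to the list when the search fails. */
static t_block find_block(t_block head, t_block *last, size_t size)
{
    t_block b = head;

    while (b != NULL && !(b->free && b->size >= size)) {
        *last = b;
        b = b->next;
    }
    return b;
}
```

First fit stops at the first match rather than looking for the tightest one, which keeps the search cheap at the cost of sometimes picking a block much larger than needed.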

3.2.3 Opening New Blocks

If no suitable block is found, we extend the heap by moving the break pointer forward with sbrk by the size of a header plus the requested data area. This creates a new block, which is appended to the end of the list.
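
Extending the heap might be sketched like this. The name extend_heap and the header layout are assumptions, and the sketch assumes that `size` has already been rounded up to a multiple of 8 and that nothing else in the process moves the break.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc headers */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

typedef struct block *t_block;

struct block {
    size_t  size;      /* size of the data area in bytes */
    t_block next;      /* next block in the list, or NULL */
    int     free;      /* 1 if the block is free */
    int     padding;   /* alignment filler */
    void   *ptr;       /* magic pointer: equals `data` */
    char    data[];    /* data area */
};

#define BLOCK_SIZE offsetof(struct block, data)

/* Create a new block of `size` data bytes at the current break and
 * link it after `last` (which may be NULL for the very first block). */
static t_block extend_heap(t_block last, size_t size)
{
    t_block b = sbrk(0);   /* the new header starts at the old break */

    if (sbrk((intptr_t)(BLOCK_SIZE + size)) == (void *)-1)
        return NULL;
    b->size = size;
    b->next = NULL;
    b->free = 0;
    b->ptr  = b->data;
    if (last != NULL)
        last->next = b;
    return b;
}
```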

3.2.4 Splitting Blocks

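First fit may return a block much larger than the request. When the leftover space is large enough to hold another header plus a minimal data area, we split the block in two: the first part satisfies the request and the remainder becomes a new free block. This avoids wasting the unused tail of the block and improves memory utilization.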
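
Splitting can be sketched as follows. The name split_block and the header layout are assumptions; the caller is expected to have checked that the block is large enough (at least size + BLOCK_SIZE + 8 bytes of data area) and that `size` is a multiple of 8.

```c
#include <stddef.h>

typedef struct block *t_block;

struct block {
    size_t  size;      /* size of the data area in bytes */
    t_block next;      /* next block in the list, or NULL */
    int     free;      /* 1 if the block is free */
    int     padding;   /* alignment filler */
    void   *ptr;       /* magic pointer: equals `data` */
    char    data[];    /* data area */
};

#define BLOCK_SIZE offsetof(struct block, data)

/* Shrink `b` to `size` data bytes and turn the rest into a free block. */
static void split_block(t_block b, size_t size)
{
    t_block rest = (t_block)(b->data + size);   /* header of the remainder */

    rest->size = b->size - size - BLOCK_SIZE;
    rest->next = b->next;
    rest->free = 1;
    rest->ptr  = rest->data;
    b->size = size;
    b->next = rest;
}
```

Note that the new header is carved out of the original data area, so the two resulting data areas together are BLOCK_SIZE bytes smaller than the original one.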

3.2.5 Malloc Implementation

Combining all the elements, we implement a basic malloc function that allocates memory, splits blocks when necessary, and tracks allocated and free blocks.
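
Putting the pieces together, one way such a malloc could look is sketched here. All names (my_malloc, heap_head, ALIGN8 and the helpers) are assumptions for illustration, sizes are rounded up to a multiple of 8, and the sketch ignores overflow for extremely large requests.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes sbrk() in glibc headers */
#endif
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

typedef struct block *t_block;
struct block {
    size_t  size;      /* size of the data area in bytes */
    t_block next;      /* next block in the list, or NULL */
    int     free;      /* 1 if the block is free */
    int     padding;   /* alignment filler */
    void   *ptr;       /* magic pointer: equals `data` */
    char    data[];    /* data area */
};
#define BLOCK_SIZE offsetof(struct block, data)
#define ALIGN8(x)  (((x) + 7u) & ~(size_t)7)   /* round up to a multiple of 8 */

static t_block heap_head = NULL;   /* first block of the list */

/* First fit: first free block with at least `size` bytes, or NULL. */
static t_block find_block(t_block *last, size_t size)
{
    t_block b = heap_head;
    while (b != NULL && !(b->free && b->size >= size)) {
        *last = b;
        b = b->next;
    }
    return b;
}

/* Append a new block of `size` data bytes at the current break. */
static t_block extend_heap(t_block last, size_t size)
{
    t_block b = sbrk(0);
    if (sbrk((intptr_t)(BLOCK_SIZE + size)) == (void *)-1)
        return NULL;
    b->size = size;
    b->next = NULL;
    b->free = 0;
    b->ptr  = b->data;
    if (last != NULL)
        last->next = b;
    return b;
}

/* Shrink `b` to `size` bytes; the remainder becomes a free block. */
static void split_block(t_block b, size_t size)
{
    t_block rest = (t_block)(b->data + size);
    rest->size = b->size - size - BLOCK_SIZE;
    rest->next = b->next;
    rest->free = 1;
    rest->ptr  = rest->data;
    b->size = size;
    b->next = rest;
}

static void *my_malloc(size_t size)
{
    t_block b, last = NULL;
    size_t s = ALIGN8(size);

    b = find_block(&last, s);
    if (b != NULL) {                          /* reuse a free block */
        if (b->size >= s + BLOCK_SIZE + 8)    /* room for another block? */
            split_block(b, s);
        b->free = 0;
    } else {                                  /* nothing fits: grow the heap */
        b = extend_heap(last, s);
        if (b == NULL)
            return NULL;
        if (heap_head == NULL)
            heap_head = b;
    }
    return b->data;
}
```

The caller only ever sees the address of the data area; the header stays hidden immediately before it.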

3.2.6 Calloc Implementation

Calloc is implemented by calling malloc and then zeroing out the allocated memory. This ensures that the memory is initialized to zero.
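
A sketch of this idea follows. To stay self-contained it uses a stand-in my_malloc that hands out memory from a small static arena; that stand-in, the arena, and the name my_calloc are all assumptions of the example and would be replaced by the real allocator. The overflow check on number * size is an addition that the description above does not mention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in allocator for this sketch only: bumps through a static arena. */
static unsigned char arena[256];
static size_t arena_used = 0;

static void *my_malloc(size_t size)
{
    if (size > sizeof arena - arena_used)
        return NULL;
    void *p = arena + arena_used;
    arena_used += size;
    return p;
}

/* Allocate room for `number` elements of `size` bytes each, zero-filled. */
static void *my_calloc(size_t number, size_t size)
{
    if (size != 0 && number > SIZE_MAX / size)
        return NULL;                 /* number * size would overflow */

    size_t total = number * size;
    void *p = my_malloc(total);
    if (p != NULL)
        memset(p, 0, total);         /* reused blocks may hold old data */
    return p;
}
```

Zeroing matters mostly for reused blocks: memory fresh from the kernel is already zero-filled, but a block that was freed and handed out again still contains whatever its previous owner wrote.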

3.2.7 Free Implementation

Freeing memory involves marking a block as free and merging it with adjacent free blocks if possible. This helps reduce fragmentation and improve memory efficiency.
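
One way to sketch this is shown here. The names my_free, merge_forward and heap_head are assumptions, and the sketch assumes that neighbouring list entries are also neighbours in memory, which holds when blocks are only ever created by extending the heap or by splitting. A complete version would also verify that the address lies inside the heap before reading the header.

```c
#include <stddef.h>

typedef struct block *t_block;

struct block {
    size_t  size;      /* size of the data area in bytes */
    t_block next;      /* next block in the list, or NULL */
    int     free;      /* 1 if the block is free */
    int     padding;   /* alignment filler */
    void   *ptr;       /* magic pointer: equals `data` */
    char    data[];    /* data area */
};

#define BLOCK_SIZE offsetof(struct block, data)

static t_block heap_head = NULL;   /* first block of the list */

/* Absorb the blocks that follow `b` for as long as they are free. */
static void merge_forward(t_block b)
{
    while (b->next != NULL && b->next->free) {
        b->size += BLOCK_SIZE + b->next->size;   /* header + data of neighbour */
        b->next = b->next->next;
    }
}

static void my_free(void *p)
{
    if (p == NULL)
        return;

    t_block b = (t_block)((char *)p - BLOCK_SIZE);
    if (b->ptr != p)               /* magic pointer mismatch: not our block */
        return;

    b->free = 1;
    /* One pass over the list merges every run of adjacent free blocks,
     * which covers both the previous and the next neighbour of `b`. */
    for (t_block cur = heap_head; cur != NULL; cur = cur->next)
        if (cur->free)
            merge_forward(cur);
}
```

Walking the whole list on every free is simple but linear in the number of blocks; keeping a pointer to the previous block in each header would allow merging in constant time.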

3.2.8 Realloc Implementation

Realloc adjusts the size of an existing allocation. It may involve splitting a block, merging with adjacent blocks, or allocating new memory if necessary.
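
A deliberately reduced sketch of realloc is given here. It covers only two paths: keeping the block when it is already large enough, and otherwise allocating a new block, copying the data and freeing the old block. Splitting and merging with a free neighbour are left out. The stand-in my_malloc and my_free (a bump allocator over a static arena) and all other names are assumptions of this example.

```c
#include <stddef.h>
#include <string.h>

typedef struct block *t_block;
struct block {
    size_t  size;      /* size of the data area in bytes */
    t_block next;      /* next block in the list, or NULL */
    int     free;      /* 1 if the block is free */
    int     padding;   /* alignment filler */
    void   *ptr;       /* magic pointer: equals `data` */
    char    data[];    /* data area */
};
#define BLOCK_SIZE offsetof(struct block, data)
#define ALIGN8(x)  (((x) + 7u) & ~(size_t)7)   /* round up to a multiple of 8 */

/* Stand-in allocator for this sketch only: never reuses freed blocks. */
static long long arena[512];
static size_t arena_used = 0;

static t_block get_block(void *p)
{
    return (t_block)((char *)p - BLOCK_SIZE);
}

static void *my_malloc(size_t size)
{
    size_t s = ALIGN8(size);
    if (arena_used + BLOCK_SIZE + s > sizeof arena)
        return NULL;
    t_block b = (t_block)((char *)arena + arena_used);
    arena_used += BLOCK_SIZE + s;
    b->size = s;
    b->next = NULL;
    b->free = 0;
    b->ptr  = b->data;
    return b->data;
}

static void my_free(void *p)
{
    if (p != NULL)
        get_block(p)->free = 1;
}

static void *my_realloc(void *p, size_t size)
{
    if (p == NULL)
        return my_malloc(size);      /* realloc(NULL, n) acts like malloc(n) */

    t_block b = get_block(p);
    if (b->size >= ALIGN8(size))
        return p;                    /* the current block is large enough */

    void *q = my_malloc(size);       /* otherwise move to a bigger block */
    if (q == NULL)
        return NULL;                 /* the old block stays valid on failure */
    memcpy(q, p, b->size);
    my_free(p);
    return q;
}
```

As with the standard realloc, the caller must use the returned pointer from then on, because the data may have moved.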

3.3 Legacy Issues and Optimization

The current implementation is simple but leaves several issues open. Possible improvements include supporting both 32-bit and 64-bit systems, using mmap for large allocations, maintaining multiple lists grouped by block size, and speeding up the search for free blocks, which currently scans the whole list.

4. Other References

This article draws heavily from "A Malloc Tutorial" and other resources like "Computer Systems: A Programmer's Perspective." For deeper insights, readers are encouraged to explore the Linux kernel's memory management and real-world implementations like glibc.
