Understanding the stack


Introduction

The stack is the first data structure that you encounter, often without even realizing it. You can envision it as a chain of data placed successively, representing previously called functions and their local variables within the program.

In this article, I would like to give you a semi-deep analysis of its internal structure and how function calls modify its content.

I will use the C language, ASCII art, and a simplified version of the (32-bit) x86 assembly language to illustrate various concepts. Below, you can find a quick summary of how assembly works.

Registers

To better understand the stack’s structure, we first have to understand the registers that are used to implement it.

The registers are the type of memory that is invisible for C programmers, but they are worth talking about. I will use the x86 architecture naming convention, but the same concepts are present in all assembly languages:

  • General-Purpose (EAX, EBX, ECX…): You can use these registers to hold any arbitrary value
  • Base Pointer (EBP): This holds the address of the current function’s stack frame
  • Stack Pointer (ESP): This register keeps track of where the stack ends
  • Program or Instruction Counter (PC): This register points to the address of the instruction to be executed next

Of course, this list goes on with Status or Flag registers, Control registers, Vector registers, etc…

Assembly commands

Throughout the article, you’ll find some assembly code that is important to understand. In assembly, the commands are referred to as mnemonics, and their inputs are called operands. For the examples of this article, I will use the following mnemonics:

  • push: Save the value of the first operand after the Stack Pointer and move the Stack Pointer forward
  • pop: Copy the value before the Stack Pointer into the first operand and move the Stack Pointer back
  • call: Invoke a subroutine (function)
  • add: Add the second operand’s value to the first operand
  • mov: Save the second operand’s value into the first operand

To illustrate its use, the following example will add 1 to the EAX register:

;  mnemonic  first operand  second operand
   add       eax,           1

Stack

When you call a function, the arguments and the current point of execution (called the return address) are placed on the stack. When the program returns from a function, the return address is removed from the stack and placed in the program counter, after which the arguments are removed from the stack.

This chain of memory has two main commands: push (which places a value after the stack pointer, and moves it forward) and pop (which removes the value before the stack pointer and moves it backwards). As these commands focus on the end of the stack, it is considered a First In Last Out (FILO) data structure. The first item you place on the stack will be the last item you receive when you start rolling back the stack.

Calling a function

To showcase how it works, let’s first take a look at how a function is called:

int c = add(1, 2);

Based on these instructions, the compiler could generate the following 32-bit assembly code:

01   push 2
02   push 1
03   call add
04   add esp, 4
05   add esp, 4
06   push eax

The instructions above can be interpreted as follows:

  1. Add the second 32-bit argument to the stack and move the stack pointer (ESP) forward by 4 bytes.
  2. Add the first 32-bit argument to the stack and move the ESP forward by 4 bytes.
  3. Call the add function by executing these steps automatically:
    1. Add the statement address after the Program Counter (PC) to the stack (this becomes the return address).
    2. Move the ESP forward by 4 bytes.
    3. Change the PC to point to the add function.
  4. Remove (pop) the first argument from the stack by moving ESP back by 4 bytes.
  5. Remove (pop) the second argument from the stack by moving ESP back by 4 bytes.
  6. Put the return value on the stack.

Several key points are noteable in these steps:

Instructions 01 and 02 show that the arguments are referenced from left to right. This convention makes it easier to reference the arguments inside the “add” function, with the first argument being closer to the base pointer (EBP). The interface between the caller and the function is defined by how the stack is populated with parameters.

Between steps 03 and 04, the “add” function is executed, and we’ll discuss how the program counter changes. There are different conventions how a function handles return values. For now, assume that, at the end of the call command, almost all registers remain the same as before calling add. The only register that changes is the first general-purpose register called EAX, which holds the return value.

In lines 04 and 05, you can observe that 4 bytes are added to the ESP to deallocate the function arguments. It is used to clean the stack from the arguments. In these instructions you can see that the stack, by convention, grows from higher addresses in memory to lower addresses.

To illustrate, this is how the stack and the registers look when we call the add function:

            0xffffffff
               ...
+--------------------------------+
|        Rest of the Stack       |
+--------------------------------+ <---- Base Pointer (EBP)
|               2                | # Second argument
|               1                | # First argument
|    Address of Instruction 04   | # Return address
+--------------------------------+ <---- Stack Pointer (ESP)
                ...
            0x00000000

The Program Counter (PC) is set to the beginning of the "add" function.

The memory between the EBP and the top of the stack, (where ESP points) is called the stack frame. It is representing the context of the currently executed function.

Executing a function

Let’s take a closer look at what happens in the add function:

01 int add(int a, int b) {
02   return a + b;
03 }

Based on this, the compiler could generate similar 32-bit assembly code:

01 add:
02    push ebp
03    mov ebp, esp
04    mov eax, [ebp+8]
05    mov ebx, [ebp+12]
06    add eax, ebx
07    pop ebp
08    ret

While this assembly code may appear more complex than the C code for the “add” function, a line-by-line analysis will clarify the process:

  1. The add function can be called from this address.
  2. Save the previous stack frame’s address by pushing the Base Pointer (EBP) on the stack and move the stack pointer (ESP) forward by 4 bytes.
  3. Save the current value of ESP into EBP, creating a new stack frame.
  4. Save the first argument into the first general-purpose register (EAX).
  5. Save the second argument into the second general-purpose register (EBX).
  6. Add EAX and EBX and store the result in EAX.
  7. Copy the previous value of the Base Pointer from the stack back into the Base Pointer and move the ESP back by 4 bytes.
  8. Return to the caller function:
    1. Copy the return address from the stack into the Program Counter.
    2. Move the Stack pointer back by 4 bytes.

You might wonder what [ebp+8] and [ebp+12] represent and why they correspond to the first and second arguments, respectively. In assembly, square brackets represent the value referenced by the address inside the brackets.

Remember that we placed the arguments on the stack from right to left when calling the add function. There’s also the return address and the address of the previous stack frame added to the stack.

At line 04, this is how the stack looks:

            0xffffffff
               ...
+--------------------------------+
|        Rest of the Stack       |
+--------------------------------+ <---- Previous Base Pointer
|        Second Argument         | # EBP + 12   --\
|         First Argument         | # EBP + 8       \  Previous Stack Frame
|         Return Address         | # EBP + 4       / ----------------------
|       Previous Base Pointer    | # EBP + 0    --/
+--------------------------------+ <---- ESP & EBP
                ...
            0x00000000

By evaluating [ebp+8], you can access the first argument on the stack, and similarly, [ebp+12] resolves to the second argument.

Summary

In conclusion, the stack is a fundamental data structure, that tracks function calls and local variables.

You can save values on the stack by calling the push mnemonic, and remove values by the pop mnemonic. These instructions will also modify the position of the Stack Pointer.

Function calls change the stack by pushing the arguments in a left-to-right order, save the Program Counter and then the Base Pointer.

Returning from a function reverses this process by loading the previous Base Pointer, then the Program Counter, and moving the stack pointer to deallocate the arguments.

I hope that this description helped you better understand the inner details of computer programs, and I hope you enjoyed reading it as much as I enjoyed writing it.