Trees with Arenas

A tree is a family photo binder. Each page has one photo on it — a person, a value, a thing. On the photo someone has scribbled two numbers in pencil. The first number is the page where you find their left child. The second number is the page where you find their right child. The pages live in numbered slots inside the binder. The binder owns every page. The photos own nothing — they only point at slot numbers. That is the whole trick of building a tree with an arena.

A 7-node binary search tree drawn the way you would sketch it on paper.

A tree the way you draw it on paper has a root at the top and two limbs underneath, each limb branching into two more, all the way down. The picture is clean. The code is not. The obvious Rust translation says "a node owns its two children," which spells out as struct Node { left: Box<Node>, right: Box<Node> }. That works fine until you try to do anything interesting with it. Add a parent pointer back up the tree and now the parent owns the child and the child owns the parent and the compiler refuses to compile a single line. Add a second branch that points at an existing subtree and the compiler refuses again, because two parents cannot both own the same child. Reach for Rc<RefCell<Node>> and the syntax explodes — every walk becomes .borrow(), every edit becomes .borrow_mut(), and the runtime starts panicking if two borrows overlap. Niko Matsakis, who has been writing the Rust borrow checker since 2014, ran into this same wall while building rustc itself. The compiler is one giant tree of syntax nodes that point at each other, often in cycles. Wrapping each node in a smart pointer would have made rustc both slow and unreadable. So the team did what compiler writers have done since the 1980s — they put every node in one flat array and replaced every pointer with an index into that array.

struct Node {
    value: i32,
    left: Option<usize>,
    right: Option<usize>,
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new() }
    }

    fn add(&mut self, value: i32, left: Option<usize>, right: Option<usize>) -> usize {
        let id = self.nodes.len();
        self.nodes.push(Node { value, left, right });
        id
    }
}

A Node holds a value and two Option<usize> slots. The Option is the way Rust spells "maybe nothing" — Some(7) means the left child lives in slot 7, None means there is no left child and the trail ends here. The usize is the slot number itself. An Arena is a Vec<Node> and one method. add takes a value and its two child slots, pushes a Node onto the back of the Vec, and returns the slot number it landed in. The slot number is just self.nodes.len() measured before the push, which is also the index where the new node now sits. The returned usize is the page number you scribble on someone else's photo when you want them to point here.

The same tree laid flat as a Vec of nodes, with arrows showing how each node points at its children by slot index.

Building the tree means filling pages from the bottom up, because a parent needs to know its children's slot numbers before it can be written. The four leaf photos go in first — they have no children, so both their slots stay None. Then the two middle photos, each scribbled with the slot numbers of the leaves it sits above. Finally the root photo, scribbled with the slot numbers of the two middle photos.

fn build_tree(arena: &mut Arena) -> usize {
    let n3 = arena.add(3, None, None);
    let n7 = arena.add(7, None, None);
    let n12 = arena.add(12, None, None);
    let n20 = arena.add(20, None, None);
    let n5 = arena.add(5, Some(n3), Some(n7));
    let n15 = arena.add(15, Some(n12), Some(n20));
    arena.add(10, Some(n5), Some(n15))
}

The tree we just built looks like this on paper. 10 at the top. 5 and 15 in the middle. 3, 7, 12, 20 across the bottom. Inside the binder it looks nothing like that. The binder has 7 pages. Page 0 holds 3. Page 1 holds 7. Pages 2 and 3 hold 12 and 20. Page 4 holds 5 and points down at pages 0 and 1. Page 5 holds 15 and points down at pages 2 and 3. Page 6 holds 10 and points down at pages 4 and 5. The shape lives only in the scribbled numbers. The pages themselves sit in a flat line.

Walking the tree means starting at the root and following the scribbled numbers. There are three classic orders to do this in. In-order says "walk my left subtree, print me, walk my right subtree" — which for a search tree spits out every value in sorted order. Pre-order says "print me first, then walk both subtrees" — which gives you a recipe to rebuild the tree later. Post-order says "walk both subtrees first, then print me" — which is what you want when you have to free or compute things bottom-up. All three are the same five lines with the out.push moved to a different spot.

fn in_order(arena: &Arena, id: usize, out: &mut Vec<i32>) {
    let node = &arena.nodes[id];
    if let Some(l) = node.left {
        in_order(arena, l, out);
    }
    out.push(node.value);
    if let Some(r) = node.right {
        in_order(arena, r, out);
    }
}

fn pre_order(arena: &Arena, id: usize, out: &mut Vec<i32>) {
    let node = &arena.nodes[id];
    out.push(node.value);
    if let Some(l) = node.left {
        pre_order(arena, l, out);
    }
    if let Some(r) = node.right {
        pre_order(arena, r, out);
    }
}

fn post_order(arena: &Arena, id: usize, out: &mut Vec<i32>) {
    let node = &arena.nodes[id];
    if let Some(l) = node.left {
        post_order(arena, l, out);
    }
    if let Some(r) = node.right {
        post_order(arena, r, out);
    }
    out.push(node.value);
}

Each walk takes an &Arena, a slot number, and a buffer to append into. It looks up the node by indexing the Vec — &arena.nodes[id] — peeks at its left and right slots, and recurses into whichever ones are Some. The borrow checker barely shows up here. There is one shared borrow of the whole arena during the walk, no RefCell, no Rc, no lifetime annotations to figure out. The recursion threads id numbers through the call stack instead of &Node references, and usize is Copy, so the compiler never asks who owns what. That silence is the whole reason this pattern exists.

Run the program. It prints the binder first so you can see the seven pages with their scribbled slot numbers, then walks the tree in all three orders.

-- arena layout --
index | value | left | right
  0   |    3  |   .  |   .
  1   |    7  |   .  |   .
  2   |   12  |   .  |   .
  3   |   20  |   .  |   .
  4   |    5  |   0  |   1
  5   |   15  |   2  |   3
  6   |   10  |   4  |   5 <-- root

in-order  : 3 -> 5 -> 7 -> 10 -> 12 -> 15 -> 20
pre-order : 10 -> 5 -> 3 -> 7 -> 15 -> 12 -> 20
post-order: 3 -> 7 -> 5 -> 12 -> 20 -> 15 -> 10

fn print_arena(arena: &Arena, root: usize) {
    println!("index | value | left | right");
    for (i, node) in arena.nodes.iter().enumerate() {
        let marker = if i == root { " <-- root" } else { "" };
        let left = match node.left {
            Some(l) => format!("{l}"),
            None => String::from("."),
        };
        let right = match node.right {
            Some(r) => format!("{r}"),
            None => String::from("."),
        };
        println!("  {i}   |   {:>2}  |   {left}  |   {right}{marker}", node.value);
    }
}

fn print_walk(label: &str, values: &[i32]) {
    print!("{label}: ");
    for (i, v) in values.iter().enumerate() {
        if i > 0 {
            print!(" -> ");
        }
        print!("{v}");
    }
    println!();
}

Read the arena table top to bottom. Index 0 through 3 are the leaves with no children — both their slots show a dot. Index 4 is the node holding 5, with children at slots 0 and 1, which are 3 and 7. Index 6 is the root, holding 10, with children at slots 4 and 5, which are the subtrees rooted at 5 and 15. The arrow on the right marks which slot the walks start from. Now look at the in-order walk: 3, 5, 7, 10, 12, 15, 20. The values came out sorted. That is the defining property of a binary search tree, and you got it for free because the walk visits left, then self, then right at every node. Pre-order starts at the root (10) and dives left before right (10, 5, 3, 7, 15, 12, 20), which is the order you would speak the tree out loud if you were dictating it to someone over the phone. Post-order leaves the root for last (3, 7, 5, 12, 20, 15, 10), which is the order you would use to compute the height of every subtree before you ever needed the root's height.

Side by side: a Box-based tree that fights the borrow checker versus an arena tree where every link is a bare usize.

A question to sit with. Why is this so much easier than Box<Node> if both versions end up with the same tree shape? Because Box says "I own this child outright, and you cannot have it." Indices say nothing at all — they are bare numbers, and bare numbers do not trip the borrow checker. The arena owns every node. The nodes own nothing. If you ever want a second parent pointing at the same child (a directed acyclic graph rather than a pure tree), you copy the usize. Done. If you ever want a child pointing back at its parent (a tree with parent pointers, which Box cannot do without cycles), you store the parent's usize on the child. Done. The cost is one extra indirection through the Vec on every access, which on modern CPUs is faster than chasing a pointer to a separate heap allocation because the whole Vec sits in one cache-friendly block of memory. rustc, the rust-analyzer language server, the id-arena crate, the slotmap crate, and every serious Rust graph library use this pattern for exactly this reason.

The arena trades one weakness for another. You cannot free a single node — removing slot 4 from the middle of the Vec would shift every later slot by one and break every index pointing past it. Real arenas handle this with generational indices or a free-list of deleted slots, which is exactly what slotmap ships. The next lesson takes the arena idea past trees into the messier shape where nodes can have any number of neighbors and cycles are fair game — graphs.