Architecture · Apr 12, 2026 · 9 min read

The fork() primitive: how 80 ms copy-on-write VM snapshots unlock tree-of-thought for AI agents

Multi-attempt code synthesis and tree-of-thought agent patterns need one primitive nobody else ships: mid-flight sandbox fork. Here's what fork() is, why it's 80 ms, and what it enables.

Robel Tegegne, founder, Podflare

AI-agent patterns that want to branch execution — tree-of-thought, self-consistency voting, multi-attempt code synthesis — all hit the same wall: spawning N child executions, each starting from the exact state the parent is in mid-flight, is expensive.

On Docker: docker commit takes seconds. On serverless: cold-starting N containers takes even longer. On every other cloud sandbox we’ve looked at: not exposed at all.

Podflare ships fork(n) as a first-class primitive. It takes ~80 ms server-side for n=5. This post is about how it works and why it’s the interesting primitive for the next generation of agent architectures.

What fork() does

with Sandbox() as parent:
    parent.run_code("""
        import pandas as pd
        df = pd.read_parquet('/data/big.parquet')   # 2 GB, 30s
        model = train_model(df)                     # 5 min
    """)

    # One call, 5 children inheriting the full parent state
    children = parent.fork(n=5)                     # ~80 ms server-side

    for child, strategy in zip(children, strategies):   # strategies: your candidate approaches
        child.run_code(strategy.code)                   # children run in parallel

    winner = pick_best(children)                        # your scoring function
    parent.merge_into(winner)                       # commit back
    for c in children:
        if c is not winner:
            c.close()

Every child starts with:

  • The parent’s full Python process state — all imports, all globals, df and model already in memory
  • The parent’s filesystem (copy-on-write)
  • Clones of the parent’s open network sockets (they’ll need to be re-established on each child, but the TCP state is preserved for the short window before the first write)
  • A fresh identity — separate IP, separate sandbox ID, separate lifecycle

Writes on any child are isolated. Child A’s df.sort_values() doesn’t affect child B’s view of df. The parent stays running the whole time, unchanged, until you explicitly call parent.merge_into(winner).
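The isolation semantics are the same as POSIX fork(): shared until written, private after. A toy in-process sketch of the observable behavior (plain Python, with copy.deepcopy standing in for the copy-on-write layer; no Podflare calls involved):

```python
import copy

def toy_fork(parent_state, n):
    # Stand-in for sandbox fork: each child gets a logically private
    # copy of the parent's state. Real CoW shares pages until first
    # write; deepcopy just models the observable isolation.
    return [copy.deepcopy(parent_state) for _ in range(n)]

parent = {"df": [3, 1, 2], "model": "trained"}
children = toy_fork(parent, 2)

children[0]["df"].sort()      # child A sorts its df
children[1]["df"].append(99)  # child B appends to its df

print(parent["df"])       # [3, 1, 2], parent unchanged
print(children[0]["df"])  # [1, 2, 3]
print(children[1]["df"])  # [3, 1, 2, 99]
```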

How it’s 80 ms

The server-side breakdown for n=5:

Phase                                          Time
Pause the running parent VM                    1 ms
Capture dirty-page diff snapshot               4–5 ms
Resume the parent                              1 ms
Prepare child memory pages (shared CoW)        ~1 ms
Merge the diff onto the shared base            2–3 ms
Spawn N children in parallel                   12–17 ms
Rootfs reflink clones (metadata-only, ×N)      ~1 ms total
Total, server-side                             ~80 ms

Two tricks carry most of the weight:

Diff snapshots, not full snapshots

A full microVM snapshot of a 1 GB VM is a 1 GB write. For a 2 GB VM, 2 GB. That’s seconds, not milliseconds. Our fork never writes a full snapshot — we capture only the pages the parent has dirtied since the VM first booted from its warm-pool template. On a fresh sandbox that’s usually 20–50 MB of dirty pages, captured in 4–5 ms.
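Back-of-envelope arithmetic for why that matters, assuming an illustrative ~10 GB/s of effective copy bandwidth (the exact figure varies by host; the numbers here are an assumption, not measurements):

```python
BW = 10 * 2**30  # bytes/s of copy bandwidth; illustrative assumption

def snapshot_ms(nbytes):
    # Time to write a snapshot of nbytes at bandwidth BW, in ms.
    return nbytes / BW * 1000

dirty = 50 * 2**20  # ~50 MB of dirty pages on a fresh sandbox
full = 2 * 2**30    # full snapshot of a 2 GB VM

print(f"diff snapshot: {snapshot_ms(dirty):.1f} ms")  # ~4.9 ms
print(f"full snapshot: {snapshot_ms(full):.0f} ms")   # ~200 ms
```

Milliseconds versus a fifth of a second, before you even count the serialization and I/O overhead of a multi-gigabyte write.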

Reflink rootfs clones

On a default filesystem, cloning a 4 GB rootfs is 4 GB of copies. On the filesystems we use (xfs reflink / ZFS CoW), cloning is a metadata-only operation — maybe 100 KB of inode writes, regardless of rootfs size. Sub-millisecond per child, parallelized across N. That’s the difference between a 3-second fork and an 80 ms fork.

The full architecture doc has the exact implementation — how we manage the memory-page CoW, how we tie children back to the parent for merge_into, how we handle the network namespace plumbing.

What fork() unlocks

1. Tree-of-thought, cheaply

Classic ToT: generate N candidate continuations at each reasoning step, score them, continue with the best. The scoring often requires actually running each candidate’s code and observing the output.

Without fork, running 5 candidates means either running them sequentially (5x latency) or spinning up 5 fresh sandboxes (5x setup cost, no shared context). With fork, you run them truly in parallel from identical state. The latency is max(t_1, t_2, ..., t_n) instead of sum.
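A minimal sketch of that latency claim, with threads and sleeps standing in for forked children running candidate code (not the Podflare API):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_candidate(seconds):
    # Stand-in for child.run_code(...) on one forked child.
    time.sleep(seconds)
    return seconds

durations = [0.2, 0.1, 0.3, 0.15, 0.25]  # per-candidate run times

start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(durations)) as pool:
    results = list(pool.map(run_candidate, durations))
elapsed = time.monotonic() - start

# Wall time tracks max(durations) ~ 0.3 s, not sum(durations) = 1.0 s.
print(f"parallel: {elapsed:.2f}s vs sequential: {sum(durations):.2f}s")
```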

2. Multi-attempt code synthesis

The model writes 5 different approaches to a problem. Fork, run each in a child, keep whichever compiles + passes the test suite, destroy the rest. The expensive setup (loaded dataset, imported libraries) is paid once on the parent.
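An in-process toy of the keep-the-passing-attempt loop (each candidate would really run in its own forked child; here exec stands in):

```python
# Three candidate implementations of the same function; keep the
# first one that passes the test suite, discard the rest.
candidates = [
    "def add(a, b): return a - b",  # wrong
    "def add(a, b): return a * b",  # wrong
    "def add(a, b): return a + b",  # correct
]

def passes_tests(src):
    ns = {}
    try:
        exec(src, ns)
        return ns["add"](2, 3) == 5 and ns["add"](-1, 1) == 0
    except Exception:
        return False

winner = next(src for src in candidates if passes_tests(src))
print(winner)  # def add(a, b): return a + b
```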

3. Self-consistency voting

The model generates the same output 5 different ways from the same prompt (different seeds / temperatures). You run all 5 in parallel, take the majority answer. Most papers that report big self-consistency gains assume the execution step is free; fork makes it free in practice.
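The voting step itself is small once the children's outputs have been collected; a sketch, assuming the five answers are already gathered into a list:

```python
from collections import Counter

# Outputs from 5 children that ran the same prompt at different
# seeds/temperatures; take the majority answer.
answers = ["42", "41", "42", "42", "17"]
winner, votes = Counter(answers).most_common(1)[0]
print(winner, votes)  # 42 3
```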

4. Time-travel debugging for agents

The agent hits an error on turn 7. You fork the sandbox at turn 6, let the agent try 5 different fixes in parallel children, pick the one that works. The main sandbox never sees the failed experiments. This is only practical with fast fork.
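A toy model of the pattern, with per-turn deepcopy snapshots standing in for the sandbox's dirty-page snapshots (the fix names and the choice of winner are made up for illustration):

```python
import copy

history = []  # one state snapshot per completed turn
state = {"turn": 0, "files": {}}

for turn in range(1, 7):  # turns 1-6 succeed
    state["turn"] = turn
    state["files"][f"step{turn}.txt"] = "ok"
    history.append(copy.deepcopy(state))

# Turn 7 fails: fork three children from the turn-6 snapshot and
# try a different fix in each.
fixes = ["fix_a", "fix_b", "fix_c"]
children = [copy.deepcopy(history[5]) for _ in fixes]
for child, fix in zip(children, fixes):
    child["files"]["fix.txt"] = fix

# Pretend fix_b is the one that makes turn 7 pass.
winner = next(c for c in children if c["files"]["fix.txt"] == "fix_b")
print(winner["turn"], winner["files"]["fix.txt"])  # 6 fix_b
```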

5. A/B testing agent strategies in production

You’re running a long-lived persistent agent and want to try a new prompt or tool set without risking the main session’s state. Fork, run the experiment in the child, keep the main sandbox untouched.

merge_into: committing a child back to the parent

The inverse of fork. Once the winning child is chosen, parent.merge_into(winner) takes the winner’s full state (memory + filesystem diff) and applies it to the parent, atomically. ~50 ms server-side.

It’s the "undo the fork, but with the winner’s changes" operation. After this, the parent "is" the winner, and you destroy the other children.
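The semantics, modeled in plain Python (dict state standing in for VM state; a sketch of the observable behavior, not the Podflare implementation):

```python
import copy

def toy_fork(parent, n):
    # Each child starts as a private copy of the parent's state.
    return [copy.deepcopy(parent) for _ in range(n)]

def toy_merge_into(parent, winner):
    # After the merge the parent's state *is* the winner's state.
    parent.clear()
    parent.update(winner)

parent = {"result": None}
children = toy_fork(parent, 3)
for i, child in enumerate(children):
    child["result"] = f"attempt-{i}"

toy_merge_into(parent, children[1])
print(parent["result"])  # attempt-1
```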

What fork() costs you

Being honest:

  • Memory. Each child starts with ~40 MB of its own RSS for the initial page table + control-plane state, plus any pages it dirties during its run. Five short-lived children that each dirty 10 MB burn ~250 MB of host RAM. For most workloads that’s fine; for massive fan-outs you pay attention.
  • Network identity. Each child gets its own DHCP-leased IP. TCP connections on the parent don’t migrate to children — if the child needs to make an outbound request, it opens a new socket. Almost always fine, but be aware.
  • Disk diff retention. A child’s writes to its rootfs live in a per-child overlay file until the child is destroyed. Large writes in many children = more disk. Cleaned on destroy.

What nobody else ships

We’ve surveyed every major cloud sandbox platform (E2B, Daytona, Blaxel, Modal sandboxes) as of April 2026. None of them expose a mid-flight fork primitive. The closest you get elsewhere is "snapshot + restore N times" which is measured in seconds and doesn’t preserve dirty-page diffs efficiently.

We ship fork because the agent patterns that want it are increasingly the patterns that define good AI-agent products. Self-consistency, tree-of-thought, multi-attempt synthesis — they all want fork, and the overhead of doing it without fork is usually what kills the idea at scale.

Try it

pip install podflare
export PODFLARE_API_KEY=pf_live_...

python -c "
from podflare import Sandbox

with Sandbox() as parent:
    parent.run_code('import numpy as np; a = np.ones((10000, 10000))')
    children = parent.fork(n=3)
    for i, c in enumerate(children):
        out = c.run_code(f'print(a.sum() + {i})')
        print(f'child {i}:', out.stdout.strip())
        c.close()
"

Free Podflare account — $200 starter credit. Fork is available on every tier. Full docs at docs.podflare.ai/concepts/fork.

