AI-agent patterns that want to branch execution — tree-of-thought, self-consistency voting, multi-attempt code synthesis — all hit the same wall: spawning N child executions, each starting from the exact state the parent is in mid-flight, is expensive.
On Docker, `docker commit` takes seconds. On serverless, cold-starting N containers takes even longer. On every other cloud sandbox we’ve looked at, it’s not exposed at all.
Podflare ships fork(n) as a first-class primitive. It takes ~80 ms server-side for n=5. This post is about how it works and why it’s the interesting primitive for the next generation of agent architectures.
What fork() does
```python
from podflare import Sandbox

with Sandbox() as parent:
    parent.run_code("""
import pandas as pd
df = pd.read_parquet('/data/big.parquet')  # 2 GB, 30 s
model = train_model(df)                    # 5 min
""")

    # One call: 5 children, each inheriting the full parent state
    children = parent.fork(n=5)  # ~80 ms server-side

    for child, strategy in zip(children, strategies):
        child.run_code(strategy.code)  # runs in parallel

    winner = pick_best(children)
    parent.merge_into(winner)  # commit the winner's state back
    for c in children:
        if c is not winner:
            c.close()
```

Every child starts with:
- The parent’s full Python process state — all imports, all globals, `df` and `model` already in memory
- The parent’s filesystem (copy-on-write)
- Open network sockets cloned (they’ll need to be re-established in each child; TCP state is preserved only for the short window before the first write)
- A fresh identity — separate IP, separate sandbox ID, separate lifecycle
Writes on any child are isolated. Child A’s `df.sort_values()` doesn’t affect child B’s view of `df`. The parent stays running the whole time, unchanged, until you explicitly call `parent.merge_into(winner)`.
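The isolation model is the same as POSIX `fork(2)`, just lifted to the microVM level. As a minimal local analogy (using Python’s own `os.fork`, nothing Podflare-specific), the child inherits the parent’s in-memory state, but its writes stay private:

```python
import os

data = [1, 2, 3]  # parent state, inherited by the child at fork time

pid = os.fork()
if pid == 0:
    # Child: sees the parent's state, but its writes are isolated
    data.append(99)
    os._exit(0)

os.waitpid(pid, 0)
print(data)  # the parent's view is unchanged: [1, 2, 3]
```

The OS gives you copy-on-write pages for free here; Podflare’s fork does the analogous thing for an entire sandbox, filesystem included.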
How it’s 80 ms
The server-side breakdown for n=5:
| Phase | Time |
|---|---|
| Pause the running parent VM | 1 ms |
| Capture dirty-page diff snapshot | 4–5 ms |
| Resume the parent | 1 ms |
| Prepare child memory pages (shared CoW) | ~1 ms |
| Merge the diff onto the shared base | 2–3 ms |
| Spawn N children in parallel | 12–17 ms |
| Rootfs reflink clones (metadata-only, ×N) | ~1 ms total |
| Total, server-side | ~80 ms |
Two tricks carry most of the weight:
Diff snapshots, not full snapshots
A full microVM snapshot of a 1 GB VM is a 1 GB write. For a 2 GB VM, 2 GB. That’s seconds, not milliseconds. Our fork never writes a full snapshot — we capture only the pages the parent has dirtied since the VM first booted from its warm-pool template. On a fresh sandbox that’s usually 20–50 MB of dirty pages, captured in 4–5 ms.
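To make the idea concrete, here’s a toy model of diff snapshotting in pure Python — page granularity and function names are illustrative, not our implementation:

```python
PAGE = 4096

def capture_diff(base, current):
    """Return {page_index: bytes} for only the pages that differ from the base."""
    diff = {}
    for i in range(0, len(current), PAGE):
        if current[i:i + PAGE] != base[i:i + PAGE]:
            diff[i // PAGE] = current[i:i + PAGE]
    return diff

def materialize(base, diff):
    """Rebuild a child's view: shared base image + the parent's dirty pages."""
    img = bytearray(base)
    for idx, page in diff.items():
        img[idx * PAGE:(idx + 1) * PAGE] = page
    return bytes(img)

base = bytes(8 * PAGE)              # pristine warm-pool template
cur = bytearray(base)
cur[3 * PAGE:3 * PAGE + 5] = b"dirty"  # parent touched one page

diff = capture_diff(base, bytes(cur))
print(len(diff))                    # only the single dirtied page is captured
assert materialize(base, diff) == bytes(cur)
```

The work scales with the dirty set, not the VM size — which is why a 2 GB VM with 30 MB of dirty pages snapshots in milliseconds.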
Reflink rootfs clones
On a default filesystem, cloning a 4 GB rootfs means copying 4 GB. On the filesystems we use (XFS reflink / ZFS copy-on-write), cloning is a metadata-only operation — maybe 100 KB of inode writes, regardless of rootfs size. Sub-millisecond per child, parallelized across N. That’s the difference between a 3-second fork and an 80 ms fork.
The full architecture doc has the exact implementation — how we manage the memory-page CoW, how we tie children back to the parent for `merge_into`, how we handle the network namespace plumbing.
What fork() unlocks
1. Tree-of-thought, cheaply
Classic ToT: generate N candidate continuations at each reasoning step, score them, continue with the best. The scoring often requires actually running each candidate’s code and observing the output.
Without fork, running 5 candidates means either running them sequentially (5x latency) or spinning up 5 fresh sandboxes (5x setup cost, no shared context). With fork, you run them truly in parallel from identical state. The latency is max(t_1, t_2, ..., t_n) instead of sum.
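You can see the max-vs-sum effect with any concurrent executor. A local stand-in, where each "candidate" just sleeps instead of running code in a forked sandbox:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_candidate(i):
    time.sleep(0.2)  # stand-in for executing one candidate's code
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(run_candidate, range(5)))
elapsed = time.monotonic() - start

# Sequential would be ~1.0 s (sum of the five); parallel is ~0.2 s (the max)
print(f"{elapsed:.2f}s for {len(results)} candidates")
```

Fork gives you the same shape at the sandbox level, with the added property that every candidate starts from byte-identical state.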
2. Multi-attempt code synthesis
The model writes 5 different approaches to a problem. Fork, run each in a child, keep whichever compiles + passes the test suite, destroy the rest. The expensive setup (loaded dataset, imported libraries) is paid once on the parent.
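Locally, the select-the-passing-attempt step looks like this — toy candidates and a one-assertion "test suite", purely for illustration:

```python
candidates = [
    "def fib(n): return n if n < 2 else fib(n-1) + fib(n-2)",  # correct
    "def fib(n): return fib(n-1) + fib(n-2)",                  # no base case
    "def fib(n): return n * 2",                                 # wrong answer
]

def passes(src):
    """Run one candidate and check it against the test suite."""
    ns = {}
    try:
        exec(src, ns)
        return ns["fib"](10) == 55   # the test suite, reduced to one check
    except Exception:                # RecursionError etc. count as failures
        return False

winner = next(src for src in candidates if passes(src))
print(candidates.index(winner))  # 0
```

With `fork`, each `exec` happens in its own child sandbox, so a candidate that corrupts state or loops forever can simply be destroyed.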
3. Self-consistency voting
The model generates the same output 5 different ways from the same prompt (different seeds / temperatures). You run all 5 in parallel, take the majority answer. Most papers that report big self-consistency gains assume the execution step is free; fork makes it free in practice.
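The voting step itself is a one-liner once the parallel runs come back:

```python
from collections import Counter

# stdout from 5 forked runs at different seeds / temperatures
answers = ["42", "42", "17", "42", "17"]

winner, votes = Counter(answers).most_common(1)[0]
print(winner, votes)  # 42 3
```

The hard part was never the vote — it was getting five independent executions from identical state cheaply enough to bother.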
4. Time-travel debugging for agents
The agent hits an error on turn 7. You fork the sandbox at turn 6, let the agent try 5 different fixes in parallel children, pick the one that works. The main sandbox never sees the failed experiments. This is only practical with fast fork.
5. A/B testing agent strategies in production
You’re running a long-lived persistent agent and want to try a new prompt or tool set without risking the main session’s state. Fork, run the experiment in the child, keep the main sandbox untouched.
merge_into: committing a child back to the parent
The inverse of fork. Once the winning child is chosen, parent.merge_into(winner) takes the winner’s full state (memory + filesystem diff) and applies it to the parent, atomically. ~50 ms server-side.
It’s the "undo the fork, but with the winner’s changes" operation. After this, the parent "is" the winner, and you destroy the other children.
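The fork/merge lifecycle, as a toy state model — plain dicts standing in for a sandbox’s memory and filesystem, not the real API:

```python
import copy

parent = {"fs": {"/out.txt": "v1"}, "mem": {"score": None}}

# fork: every child starts from an identical copy of the parent's state
children = [copy.deepcopy(parent) for _ in range(3)]
for i, child in enumerate(children):
    child["mem"]["score"] = i * 10   # children diverge independently

winner = max(children, key=lambda c: c["mem"]["score"])

# merge_into: atomically replace the parent's state with the winner's
parent.clear()
parent.update(copy.deepcopy(winner))
print(parent["mem"]["score"])  # 20
```

The real operation applies a memory + filesystem diff rather than copying dicts, but the semantics are the same: after the merge, the parent *is* the winner.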
What fork() costs you
Being honest:
- Memory. Each child starts with ~40 MB of its own RSS for the initial page table + control-plane state, plus any pages it dirties during its run. Five short-lived children that each dirty 10 MB burn ~250 MB of host RAM. For most workloads that’s fine; for massive fan-outs, keep an eye on it.
- Network identity. Each child gets its own DHCP-leased IP. TCP connections on the parent don’t migrate to children — if the child needs to make an outbound request, it opens a new socket. Almost always fine, but be aware.
- Disk diff retention. A child’s writes to its rootfs live in a per-child overlay file until the child is destroyed. Large writes in many children = more disk. Cleaned on destroy.
What nobody else ships
We’ve surveyed every major cloud sandbox platform (E2B, Daytona, Blaxel, Modal sandboxes) as of April 2026. None of them expose a mid-flight fork primitive. The closest you get elsewhere is "snapshot + restore N times" which is measured in seconds and doesn’t preserve dirty-page diffs efficiently.
We ship fork because the agent patterns that want it are increasingly the patterns that define good AI-agent products. Self-consistency, tree-of-thought, multi-attempt synthesis — they all want fork, and the overhead of doing it without fork is usually what kills the idea at scale.
Try it
```bash
pip install podflare
export PODFLARE_API_KEY=pf_live_...
python -c "
from podflare import Sandbox
with Sandbox() as parent:
    parent.run_code('import numpy as np; a = np.ones((10000, 10000))')
    children = parent.fork(n=3)
    for i, c in enumerate(children):
        out = c.run_code(f'print(a.sum() + {i})')
        print(f'child {i}:', out.stdout.strip())
        c.close()
"
```

Free Podflare account — $200 starter credit. Fork is available on every tier. Full docs at docs.podflare.ai/concepts/fork.
Related reading
- Cloud sandbox benchmark — the broader latency comparison.
- What is a cloud sandbox for AI agents?
- Persistent Python REPL — the other primitive that fork composes with.