Tutorials · Apr 15, 2026 · 6 min read

Wiring a Podflare sandbox into OpenAI Agents SDK as a FunctionTool

The OpenAI Agents SDK gives you agent handoffs, tracing, and tool-use loops out of the box. Here's how to drop in a hardware-isolated code-execution tool that composes with the rest of it.

Robel Tegegne, founder of Podflare

OpenAI’s Agents SDK (released late 2025) is an opinionated agent-orchestration framework that handles tool-use loops, handoffs between specialized sub-agents, and tracing with remarkably little setup code. What it doesn’t ship is a secure code-execution primitive — you bring your own, as a FunctionTool.

Podflare is designed for exactly this slot: a hardware-isolated microVM with a persistent Python REPL and ~190 ms cold starts. This post shows the few lines of code it takes to wire it in.

Install

pip install openai-agents podflare
export OPENAI_API_KEY=sk-...
export PODFLARE_API_KEY=pf_live_...

The integration

Podflare ships an Agents-SDK adapter at podflare.openai_agents:

from agents import Agent, Runner
from podflare import Sandbox
from podflare.openai_agents import podflare_code_interpreter

sandbox = Sandbox()

coder = Agent(
    name="coder",
    instructions=(
        "You are a senior data engineer. You have access to a "
        "persistent Python REPL via the run_python tool. Prefer "
        "pandas + polars over raw loops. State persists across "
        "tool calls."
    ),
    tools=[podflare_code_interpreter(sandbox=sandbox)],
)

result = Runner.run_sync(  # or `await Runner.run(...)` inside async code
    coder,
    "Download the latest NYC taxi data, compute the average "
    "trip length by borough, and plot the result.",
)

print(result.final_output)
sandbox.close()

podflare_code_interpreter() returns a FunctionTool with:

  • Pydantic schema for the code argument
  • Description that mentions state persistence (so the model uses it correctly across turns)
  • An execute function that streams stdout + stderr back
  • Tracing integration with the Agents SDK’s built-in run tracer
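Under the hood this is ordinary function calling. The schema the model sees looks roughly like this (field names are illustrative, not Podflare's exact output):

```json
{
  "name": "run_python",
  "description": "Execute Python in a persistent sandbox. Variables and imports persist across calls.",
  "parameters": {
    "type": "object",
    "properties": {
      "code": {
        "type": "string",
        "description": "Python source to execute in the REPL"
      }
    },
    "required": ["code"]
  }
}
```

The persistence note in the description is what nudges the model to reuse variables from earlier turns instead of re-downloading data.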

Multi-agent handoff pattern

The most useful Agents-SDK pattern is handoff — a coordinator agent routes the user’s request to a specialist sub-agent. You can scope sandbox access per sub-agent so only the ones that need code execution get it:

from agents import Agent, Runner, handoff

sandbox = Sandbox()

coder = Agent(
    name="coder",
    instructions="Run Python. Return data.",
    tools=[podflare_code_interpreter(sandbox=sandbox)],
)

writer = Agent(
    name="writer",
    instructions=(
        "Turn data into a human-readable summary. Never run code."
    ),
    # no tools — this agent doesn't get sandbox access
)

coordinator = Agent(
    name="coordinator",
    instructions=(
        "Route data-extraction work to the coder agent; route "
        "summarization work to the writer agent."
    ),
    handoffs=[handoff(coder), handoff(writer)],
)

result = Runner.run_sync(
    coordinator,
    "Analyze last week's app store reviews for sentiment.",
)

The coordinator sees the user’s request, decides it needs data extraction, and hands off to coder, which runs Python in the sandbox. Control then returns to the coordinator, which hands off to writer to narrate the result. Clean, traceable, and each agent has only the capabilities it needs.

Tracing

The Agents SDK’s built-in tracer (via OTEL / the OpenAI trace dashboard) picks up every tool call, so you can see per-run:

  • Which code the model ran
  • How long each exec took (server-side)
  • The stdout/stderr returned
  • Which agent made the call

Useful for debugging "why did my agent loop 15 times before giving up" and for long-term observability of production agent workloads.

Long-running sessions

For a chat-style agent where the same user comes back across sessions, flip persistent=True on the sandbox and wire the space_id into your session store:

# First request — create a persistent sandbox
sandbox = Sandbox(persistent=True)
session.sandbox_id = sandbox.space_id

# Later request — resume
sandbox = Sandbox.resume(session.sandbox_id)
# ...pandas, numpy, any DataFrames from last session are still in globals()
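The create-or-resume branch is worth factoring out of your request handler. A minimal sketch, with the Podflare calls injected as callables so the session logic stays testable (get_or_resume is a hypothetical helper of ours, not part of either SDK):

```python
def get_or_resume(session: dict, create, resume):
    """Return the user's persistent sandbox: resume it when the session
    already holds a space_id, create one (and record the id) otherwise."""
    space_id = session.get("sandbox_id")
    if space_id is not None:
        return resume(space_id)    # Sandbox.resume(space_id) in real code
    sandbox = create()             # Sandbox(persistent=True) in real code
    session["sandbox_id"] = sandbox.space_id
    return sandbox
```

On the first request the helper creates a sandbox and stores its space_id in the session; every later request with the same session resumes the same REPL.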

Forking sandboxes for tree-of-thought

A pattern that has recently proved useful: a coder agent that, when faced with ambiguity, forks its sandbox into N children, runs each child with a different strategy, picks the best result, and merges it back. Podflare’s fork(n) primitive (~80 ms server-side) makes this practical:

import asyncio

# Inside an async function. child_agent_on / prompt_variant / pick_best
# are your own helpers: build an agent bound to one child sandbox, vary
# the strategy prompt, and score the candidate results.
children = sandbox.fork(n=5)
results = await asyncio.gather(*[
    Runner.run(child_agent_on(c), prompt_variant(i))
    for i, c in enumerate(children)
])
winner = pick_best(results)
sandbox.merge_into(winner.sandbox)
for c in children:
    c.close()
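pick_best is left abstract above. One plausible scorer, assuming each candidate result exposes a final_output string and the child run's stderr (both attribute names are assumptions for illustration): prefer error-free runs, then the most substantial answer:

```python
def pick_best(results):
    """Rank candidate runs: error-free first, then longest final output.
    Assumes each result has .final_output (str) and .stderr (str)."""
    def score(r):
        clean = 1 if not r.stderr else 0          # penalize tracebacks
        return (clean, len(r.final_output or ""))  # tie-break on substance
    return max(results, key=score)
```

In practice you might replace the length tie-break with an LLM-as-judge call, but a cheap deterministic scorer is often enough to prune obviously failed branches.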

Why this beats OpenAI’s built-in code_interpreter

The Agents SDK does offer a code_interpreter built-in (via the Responses API). It’s a perfectly good option when you want zero infra. Switching to Podflare gives you:

  • Full control over the Python environment (custom packages, pre-loaded datasets)
  • Data stays in your perimeter, not OpenAI’s
  • fork(n) for tree-of-thought (not in the built-in)
  • Persistent Spaces across chat sessions
  • Swap to a non-OpenAI model without losing the code-execution layer

See the full comparison post.

Ship it

Free Podflare account ($200 starter credit). Full Agents-SDK example in PodFlare-ai/demo.

