Anthropic's Messages API gives Claude a clean tool-use protocol: you declare tools, Claude emits tool_use blocks when it wants to call one, and you return tool_result blocks with the output. It's the right shape for everything except running code.
For real code execution you want three properties that eval() or a subprocess can't give you:
- Hardware isolation so Claude's generated code can't touch your process or host.
- Persistent state across turns so when Claude imports pandas on turn 3, it's still imported on turn 7.
- A fast round-trip so each tool call feels interactive.
This post walks through wiring Anthropic's tool_use up to a Podflare cloud sandbox: a hardware-isolated Podflare Pod microVM with a persistent Python REPL and a round-trip under 200 ms. The full, runnable example is on GitHub at PodFlare-ai/demo.
The shape of the integration
Both Anthropic and Podflare expose simple, well-typed APIs that compose with each other cleanly. The flow is:
- You open a `Sandbox` at the start of the conversation.
- On every Claude turn you include a tool definition for `run_python` that calls `sandbox.run_code()`.
- You loop: model turn → maybe a `tool_use` block → execute → return `tool_result` → repeat, until the model returns a final text message.
- You `close()` the sandbox at the end, or keep it open as a persistent Space to resume later.
The full example
Install the two SDKs:
pip install anthropic podflare
Set your API keys:
export ANTHROPIC_API_KEY=sk-ant-...
export PODFLARE_API_KEY=pf_live_...
And here's the full loop:
import os
from anthropic import Anthropic
from podflare import Sandbox
client = Anthropic()
# Define the code-execution tool Claude can call.
TOOLS = [
{
"name": "run_python",
"description": (
"Execute Python code in a persistent REPL. Variables, "
"imports, and state carry across calls. Returns stdout "
"and stderr from the execution."
),
"input_schema": {
"type": "object",
"properties": {
"code": {
"type": "string",
"description": "Python source to execute.",
}
},
"required": ["code"],
},
},
]
def run_conversation(user_prompt: str) -> str:
"""Run a tool-using conversation with Claude until it returns text."""
messages = [{"role": "user", "content": user_prompt}]
with Sandbox() as sb:
while True:
resp = client.messages.create(
model="claude-sonnet-4-5",
max_tokens=4096,
tools=TOOLS,
messages=messages,
)
# Append the assistant's full turn to history.
messages.append({"role": "assistant", "content": resp.content})
if resp.stop_reason == "end_turn":
# Claude is done — pull out the final text.
return "".join(
b.text for b in resp.content if b.type == "text"
)
if resp.stop_reason == "tool_use":
# Execute every tool_use block in the turn.
tool_results = []
for block in resp.content:
if block.type != "tool_use":
continue
                    if block.name == "run_python":
                        code = block.input["code"]
                        result = sb.run_code(code)
                        out = (result.stdout or "") + (result.stderr or "")
                    else:
                        # Every tool_use block needs a matching tool_result,
                        # even for a tool name we don't recognize.
                        out = f"unknown tool: {block.name}"
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": out or "(no output)",
                    })
messages.append({"role": "user", "content": tool_results})
continue
raise RuntimeError(f"unexpected stop_reason: {resp.stop_reason}")
if __name__ == "__main__":
answer = run_conversation(
"Fetch the last 10 days of Bitcoin price from "
"api.coingecko.com, compute the daily return, "
"and tell me the standard deviation."
)
    print(answer)

What's happening on each turn
Trace the flow for the Bitcoin example:
- Turn 1 (Claude): a `tool_use` block with `run_python(code="import requests; ... fetch JSON")`. Your loop runs that in the sandbox, gets back the raw JSON as stdout, and returns it as a `tool_result`.
- Turn 2 (Claude): another `tool_use`, this time parsing the JSON and computing daily returns. It can assume `requests` is already imported because the sandbox REPL kept it.
- Turn 3 (Claude): a `tool_use` for the stdev computation using numpy. Claude writes `import numpy as np` inline; the sandbox installs it with pip if needed (or it was already present).
- Turn 4 (Claude): `end_turn` — Claude writes a natural-language summary of the result based on the stdev it saw in turn 3.
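Concretely, the message history the loop accumulates alternates assistant turns carrying `tool_use` blocks with user turns carrying the matching `tool_result` blocks. A minimal sketch of that shape, with placeholder ids and elided code strings rather than real API output:

```python
# Sketch of the accumulated message history for the Bitcoin example.
# Ids, code strings, and outputs are placeholders, not real API data.
history = [
    {"role": "user", "content": "Fetch the last 10 days of Bitcoin price ..."},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_01", "name": "run_python",
         "input": {"code": "import requests\n..."}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_01",
         "content": '{"prices": [...]}'},
    ]},
    {"role": "assistant", "content": "The standard deviation of daily returns is ..."},
]

# Invariant the loop maintains: every tool_use id in an assistant turn is
# answered by a tool_result with the same id in the next user turn.
for i, msg in enumerate(history):
    if msg["role"] == "assistant" and isinstance(msg["content"], list):
        uses = {b["id"] for b in msg["content"] if b["type"] == "tool_use"}
        results = {b["tool_use_id"] for b in history[i + 1]["content"]}
        assert uses == results
```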
The cost is 4 round-trips to the Anthropic API plus 3 round-trips to the sandbox. Each sandbox call is ~46 ms from an in-cloud agent, ~190 ms from a laptop. The whole interaction completes in a couple of seconds including Anthropic's model-side latency.
Persistent state is what makes this cheap
The big win over "spin up a container per tool call" is that the Python REPL stays alive between `run_code` calls. When Claude imports pandas on turn 1 and calls `pd.read_csv(...)` on turn 2, the import isn't re-evaluated; `globals()["pd"]` still points at the pandas module. Same for any heavy `df` already in memory.
This is the feature that makes Claude agents that do multi-turn data exploration actually affordable. Every container-per-call platform forces the agent to re-parse, re-import, re-load on every turn, and the cost compounds fast.
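The mechanics are easy to see with a local stand-in: a persistent REPL is, at its core, `exec` run against one long-lived namespace. This is a sketch of the concept, not Podflare's implementation:

```python
# Local stand-in for a persistent REPL: one namespace shared across calls.
ns: dict = {}

def run_code(code: str) -> None:
    exec(code, ns)  # same globals dict every time, so state accumulates

run_code("import math")           # "turn 1": the import happens once
run_code("x = math.sqrt(2)")      # "turn 2": math is still bound
run_code("y = x * x")             # "turn 3": x survived too

print(round(ns["y"], 6))  # → 2.0
```

A container-per-call design is the opposite: a fresh empty namespace on every call, so every turn pays the import and load cost again.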
Branching with fork() for tree-of-thought
For patterns where you want Claude to try multiple solutions in parallel and keep the best one, you can fork() the sandbox mid-conversation. Each child inherits the parent's full state — all imports, all variables:
with Sandbox() as parent:
parent.run_code("import pandas as pd")
parent.run_code("df = pd.read_csv('/data/big.csv')") # expensive
# Spawn 5 children, each with df already loaded
children = parent.fork(n=5)
for child, strategy in zip(children, strategies):
child.run_code(strategy.code)
    # ...pick the best, merge it back into parent, destroy the losers

Fork takes about 80 ms server-side for n=5, all parallel. No other cloud sandbox platform exposes this primitive today.
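The semantics are the same as snapshotting a namespace: each child starts from an independent copy of the parent's state and then diverges. A local analogue with `copy.deepcopy`, which mimics the behavior (though not the copy-on-write performance):

```python
import copy

# Local sketch of fork semantics: each child gets an independent snapshot
# of the parent's namespace, then mutates it without affecting anyone else.
parent_ns = {"df_rows": [1, 2, 3]}  # stand-in for the expensive df

children = [copy.deepcopy(parent_ns) for _ in range(3)]
children[0]["df_rows"].append(4)    # child 0 diverges

assert parent_ns["df_rows"] == [1, 2, 3]    # parent untouched
assert children[1]["df_rows"] == [1, 2, 3]  # siblings isolated
```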
What about streaming?
If you want to stream Claude's tool calls as they emerge — useful for a chat UI where the user sees the model "thinking" — use the client.messages.stream(...) variant. The tool-use loop structure stays the same; you just assemble the content blocks from the stream rather than reading them off resp.content. The Podflare call itself is already streaming: run_code returns stdout/stderr as NDJSON over the wire, and you can hook into it with sb.run_code(code, on_stdout=lambda chunk: ...).
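If you want to handle that stream yourself rather than use the callback, the consumer side is just line-at-a-time JSON decoding. A sketch under assumed field names (`stream` and `data` are illustrative here, not Podflare's documented wire format):

```python
import json
from typing import Callable, Iterable

# Illustrative NDJSON consumer: one JSON object per line, routed to the
# right callback. Field names are assumptions for this sketch.
def consume_ndjson(lines: Iterable[str],
                   on_stdout: Callable[[str], None],
                   on_stderr: Callable[[str], None]) -> None:
    for line in lines:
        event = json.loads(line)
        if event["stream"] == "stdout":
            on_stdout(event["data"])
        else:
            on_stderr(event["data"])

# Simulated wire traffic for the example.
wire = ['{"stream": "stdout", "data": "hello\\n"}',
        '{"stream": "stderr", "data": "warn\\n"}']
out, err = [], []
consume_ndjson(wire, out.append, err.append)
assert out == ["hello\n"] and err == ["warn\n"]
```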
Security defaults you probably want
If the user prompt is untrusted — say the agent is serving a consumer product — tighten the sandbox at create time:
with Sandbox(
egress=False, # no outbound network
max_lifetime_seconds=300,
idle_timeout_seconds=60,
) as sb:
    # ...

`egress=False` detaches the guest's tap device from the host bridge; the guest still sees eth0, but every outbound packet dies at the host. That's usually too restrictive for agent workloads (no pip install), but it's the right default when you're running known-adversarial code. Domain-allowlist egress is on the Enterprise roadmap.
Related posts and references
- Cloud sandbox benchmark for AI agents: E2B vs Daytona vs Podflare — latency distributions across all three platforms.
- Why Docker isn't enough for LLM-generated code — the security argument for microVMs when the code writer is a language model.
- Full working example on GitHub — the code in this post, ready to run.
- Anthropic tool-use API reference
Ready to try it? Create a free account, grab an API key, and the example above will be running in under a minute.