Vercel’s AI SDK is great at one thing: stream-first LLM apps. streamText, tool(), generateObject. It’s the fastest path to shipping a chat UI. But the moment your agent needs to run code — not describe code, actually execute Python — you hit a gap: the SDK doesn’t ship a code-execution primitive.
You can fill the gap yourself: define a run_python tool, route it to a hardware-isolated sandbox, and return stdout to the model. A full TypeScript example follows; it works with the current AI SDK (4.x+) and any model provider the SDK supports.
Install
npm install ai @ai-sdk/openai podflare zod
(Use @ai-sdk/anthropic, @ai-sdk/google, whatever — Podflare doesn’t care which model provider you wire in.)
Declare the tool
Podflare ships an AI-SDK helper at podflare/ai-sdk that returns a pre-shaped tool():
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { Sandbox } from "podflare";
import { podflareRunCode } from "podflare/ai-sdk";
const sandbox = new Sandbox();
await sandbox.open();
export async function POST(req: Request) {
const { messages } = await req.json();
const result = streamText({
model: openai("gpt-5"),
messages,
tools: {
run_python: podflareRunCode({ sandbox }),
},
maxSteps: 8, // allow multi-turn tool use
});
return result.toDataStreamResponse();
}

podflareRunCode is a thin wrapper: a zod schema for { code: string }, execution via sandbox.runCode(code), and stdout + stderr returned as a string. Nothing clever — just the boilerplate removed.
Hand-rolled version (for when you need custom behavior)
If you need to customize the tool description, add auth scoping, or wrap the result in structured output, skip the helper and build it yourself:
import { tool } from "ai";
import { z } from "zod";
tools: {
run_python: tool({
description:
"Execute Python code in a persistent REPL. Variables, "
+ "imports, and file state carry across calls.",
parameters: z.object({
code: z.string().describe("Python source code to execute."),
}),
execute: async ({ code }) => {
const result = await sandbox.runCode(code);
return (result.stdout ?? "") + (result.stderr ?? "");
},
}),
},

Streaming the tool output back to the UI
The AI SDK’s toDataStreamResponse() already streams tool calls + tool results to the client as they happen, so your chat UI can show "running code..." and then the stdout in real time. If you want to stream the sandbox’s stdout byte-by-byte (useful for long-running code), use Podflare’s streaming API:
execute: async ({ code }) => {
let out = "";
await sandbox.runCode(code, {
onStdout: (chunk) => { out += chunk; },
});
return out;
},

Lifecycle on a serverless Next.js route
Serverless routes (Vercel functions, Cloudflare Workers) are stateless between requests. So the sandbox has to be either (a) created per request and closed at the end, or (b) created once at user-session start and kept alive across requests by referencing its ID.
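Pattern (a) is only a few lines: create the sandbox at the top of the handler and close it when the stream finishes. A sketch, with error handling omitted; it reuses the podflareRunCode helper from above and the AI SDK's onFinish callback to dispose the sandbox after the last step:

```typescript
// Pattern (a): one sandbox per request, disposed when the turn finishes.
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { Sandbox } from "podflare";
import { podflareRunCode } from "podflare/ai-sdk";

export async function POST(req: Request) {
  const { messages } = await req.json();

  const sandbox = new Sandbox();
  await sandbox.open(); // adds sandbox-create latency to every request

  const result = streamText({
    model: openai("gpt-5"),
    messages,
    tools: { run_python: podflareRunCode({ sandbox }) },
    maxSteps: 8,
    onFinish: async () => {
      await sandbox.close(); // dispose the microVM once streaming completes
    },
  });
  return result.toDataStreamResponse();
}
```

The cost of this pattern is the sandbox cold start on every request, which is why long conversations want pattern (b).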
Pattern (b) for long conversations:
// On chat-open:
const sandbox = new Sandbox({ persistent: true });
await sandbox.open();
const sandboxId = sandbox.id;
// ...store sandboxId on the session / DB row
// On every tool call:
const sandbox = await Sandbox.attach(sandboxId); // reconnect
await sandbox.runCode(code);
// On chat-close or idle-timeout:
await sandbox.close();

Spaces-backed persistence means the Python REPL state survives across HTTP requests. You load a DataFrame in request 1 and use it in request 10 without re-loading.
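Wired into the route handler, pattern (b) looks roughly like this. A sketch: getSessionSandboxId is a hypothetical helper standing in for whatever DB or KV lookup holds the sandboxId you stored at chat-open:

```typescript
import { openai } from "@ai-sdk/openai";
import { streamText } from "ai";
import { Sandbox } from "podflare";
import { podflareRunCode } from "podflare/ai-sdk";

export async function POST(req: Request) {
  const { messages, sessionId } = await req.json();

  const sandboxId = await getSessionSandboxId(sessionId); // your DB/KV lookup
  const sandbox = await Sandbox.attach(sandboxId); // reconnect, no cold start

  const result = streamText({
    model: openai("gpt-5"),
    messages,
    tools: { run_python: podflareRunCode({ sandbox }) },
    maxSteps: 8,
  });
  return result.toDataStreamResponse();
}

// Hypothetical: resolve the persistent sandbox ID stored for this session.
declare function getSessionSandboxId(sessionId: string): Promise<string>;
```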
Why hardware isolation matters for Next.js agents
Your Next.js route handler almost certainly has access to your database, your API keys, and your file system. A PythonShell.run- or child_process.exec-based tool runs inside that process and shares all of it. One prompt injection and the model can exfiltrate your environment variables.
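To make that concrete, here is what in-process execution exposes. A minimal sketch using a fake secret in place of a real key; the child process inherits the parent's environment, so any "code" the model emits can read it:

```typescript
import { execSync } from "node:child_process";

// Stand-in for a real key sitting in your route handler's environment.
process.env.FAKE_SECRET = "sk-demo-123";

// A naive exec-based tool runs model-written code in a child process that
// inherits process.env wholesale:
const out = execSync('node -p "process.env.FAKE_SECRET"').toString().trim();
console.log(out); // the child sees the parent's secrets: sk-demo-123
```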
Podflare puts every code execution in a disposable Podflare Pod microVM with a dedicated kernel, no access to your app’s process, and no network access to internal services. See Why Docker isn’t enough for the threat-model argument.
Performance budget
- Sandbox create: ~190 ms p50 from a residential laptop; ~43 ms from a Vercel Serverless Function in a US region near Podflare. (The Vercel Edge runtime hitting Podflare via api.podflare.ai is similarly fast because Cloudflare’s edge is co-located with Vercel’s.)
- Hot exec: ~46 ms p50 per call on an already-open sandbox.
- Network budget: your Next.js route is the client, so these numbers dominate the tool-call latency your user sees. Pre-warming a sandbox when the chat opens (as in pattern (b) above) eliminates the cold-start cost from the first visible tool call.
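A quick back-of-envelope on the numbers above shows why pre-warming pays off; illustrative arithmetic only, using the Vercel-to-Podflare p50 figures:

```typescript
// p50 latencies (ms) from a same-region Vercel function, per the budget above.
const CREATE_MS = 43; // sandbox create
const EXEC_MS = 46;   // hot exec on an already-open sandbox

// Cold path: the first tool call pays for sandbox creation inline.
const coldFirstCall = CREATE_MS + EXEC_MS;
// Warm path: the sandbox was opened at chat start (pattern (b)).
const prewarmedCall = EXEC_MS;

console.log(coldFirstCall); // 89
console.log(prewarmedCall); // 46
```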
Ship it
Grab a free Podflare account ($200 starter credit). A working Vercel AI SDK example is in PodFlare-ai/demo. Deploy to Vercel, plug in your OPENAI_API_KEY and PODFLARE_API_KEY, ship.