
In this guide, we’ll use Python, LangChain, LangGraph and OpenAI to build a coding assistant. LangChain lets our model interface with predefined tools. LangGraph gives us an orchestration framework.
By the time you’re finished with this tutorial, you’ll be able to answer the following questions.
- What does an AI coding agent actually need to do?
- What tools does it need access to?
- How are AI models connected to LangChain and LangGraph?
- How do AI models call Python functions with LangChain?
- How is LangGraph used to orchestrate our workflow?
What does an AI coding agent actually do?
This is the real question. We can’t just plunge straight into building; before we can pick tools, we need to understand exactly what we’re building. Our AI coding assistant needs to do several really basic things.
- Create directories
- Inspect folders
- Read and write files
- Run tests
- Read test results
Before AI, we would have done most of this using Bash and Python. This hasn’t really changed. Most of this functionality comes built into the Python standard library.
These methods come from Python’s pathlib.
- mkdir(): Named after the shell command mkdir, this method makes a directory.
- iterdir(): Iterate through a directory and return its contents.
- read_text(): Read a file and return its text.
- write_text(): Write text to a file. This is about as basic as it gets when programming.
We need a few other tools from the standard library as well; a short sketch combining everything follows the list below.
- shutil.rmtree(): Delete an entire folder.
- subprocess.run(): Run a subprocess using the operating system.
- sys.executable: The path to the Python interpreter that is currently running, which lets us invoke pytest with the same interpreter.
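Here is a minimal sketch of those calls working together. The file and directory names are placeholders for illustration, not part of the agent itself.

from pathlib import Path
import shutil
import subprocess
import sys

workspace = Path("demo_workspace")
workspace.mkdir(exist_ok=True)                 # create a directory
(workspace / "notes.txt").write_text("hello")  # write text to a file
print((workspace / "notes.txt").read_text())   # read it back
print(list(workspace.iterdir()))               # inspect the folder

# Run pytest with the same interpreter that is running this script
result = subprocess.run(
    [sys.executable, "-m", "pytest", "-q"],
    capture_output=True,
    text=True,
)
print(result.returncode)  # 0 means every test passed

shutil.rmtree(workspace)                       # delete the whole folder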
LangChain provides us with a @tool decorator. AI models can call any function that’s been wrapped using @tool. LangGraph gives models a StateGraph for tracking task progress and a ToolNode for calling LangChain tools reliably.
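As a quick illustration, here is a made-up add tool (it is not one of the agent’s tools). Wrapping a plain function with @tool gives the model a callable with a name, a docstring and an argument schema, and the wrapped tool can also be invoked directly for testing.

from langchain_core.tools import tool

@tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# Tools can be called directly, which is handy for smoke tests
print(add.invoke({"a": 2, "b": 3}))  # prints 5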
We’ll split this functionality into five separate pieces.
- agent.py: The StateGraph and compiled agent workflow.
- config.py: A relatively simple configuration file. It lets us set our configuration variables as constants and tweak settings without breaking things.
- state.py: Track agent state using messages. Extract test results using regex.
- tools.py: Every tool called by the AI agent lives in this file.
- main.py: The actual agent runtime. It parses command line arguments for quick runs and can also run interactively.
Getting started
When building this system, you should be comfortable with the following.
- Python: All of our code is written in Python.
- Command line basics: You don’t need to be a shell scripting master, but you should know how commands like ls, mkdir, touch, read and rm work.
- Large Language Models (LLMs): Our AI coding agent is powered by an LLM. You should know that these models generate output based on their inputs.
- OpenAI API: You don’t need REST API experience. However, you do need an API key to access OpenAI models, which means creating an account with OpenAI.
To start, we need a project folder. The snippet below uses mkdir to create it and cd to move into the directory. Notice that we make both a coding-agent directory and a coding_agent directory. coding-agent exists just to hold our runtime during testing. coding_agent holds the actual AI coding agent program.
mkdir coding-agent
cd coding-agent
mkdir coding_agent
Create a new virtual environment.
python -m venv .venv
Activate your new environment. The command below will activate on macOS and Linux.
source .venv/bin/activate
If you’re on Windows, use this command instead.
.\.venv\Scripts\Activate.ps1
Next, we’ll make a requirements file.
langgraph==1.0.8
langchain==1.2.9
langchain-openai==1.1.8
python-dotenv==1.2.1
pytest==9.0.2
Install your dependencies.
pip install -r requirements.txt
Create a .env file and add the following line. Remember to replace the API key with your actual OpenAI API key.
OPENAI_API_KEY=<your-openai-api-key>
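To confirm the key actually loads, you can run a quick, optional check with python-dotenv. This snippet is just a sanity check and not part of the agent.

import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
print("key loaded:", bool(os.getenv("OPENAI_API_KEY")))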
In the image below, you can see the files inside our coding_agent directory. Notice that we also have a file called __init__.py. The init file tells Python to treat the whole folder as a package, which is what lets us run it as a single program.

The inside of the coding_agent folder
The Python code
__init__.py
We’ll start with our easiest file: __init__.py. All you need to do is create it. Don’t add anything to it. As you can see in the image below, it’s just a blank file. Python sees this file and treats the folder as a self-contained package.

Our init file is blank
agent.py
This file holds the actual AI agent. Our system prompt tells the agent exactly what it needs to do. build_graph() is used to piece everything together and compile a LangGraph workflow. should_continue() decides when to exit the workflow.
from __future__ import annotations
from langchain_core.messages import SystemMessage
from langchain_openai import ChatOpenAI
from langchain.agents import create_agent
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from .config import MAX_MESSAGES, MAX_TOOL_CALLS, dprint
from .state import AgentState, count_tool_messages, extract_last_pytest_exit_code
from .tools import TOOLS
SYSTEM_PROMPT = SystemMessage(content="""
You are a coding agent fixing a tiny Python project.
Goal: make `python -m pytest -q` exit with code 0.
Rules:
- Use read_file before editing
- Use write_file for changes (small edits preferred)
- Run run_pytest frequently to check
- Stop when pytest exit_code == 0
- Do not create new files unless necessary
""")
def build_graph(model_name: str, temperature: float):
llm = ChatOpenAI(model=model_name, temperature=temperature)
llm_with_tools = llm.bind_tools(TOOLS)
agent = create_agent(
model=llm_with_tools,
tools=TOOLS,
system_prompt=SYSTEM_PROMPT,
)
def should_continue(state: AgentState) -> str:
messages = state["messages"]
tool_calls = count_tool_messages(messages)
exit_code = extract_last_pytest_exit_code(messages)
state["tool_calls"] = tool_calls
state["last_exit_code"] = exit_code
if exit_code == 0:
dprint(">> STOP: pytest passed")
return END
if len(messages) > MAX_MESSAGES:
dprint(">> STOP: message cap reached")
return END
if tool_calls > MAX_TOOL_CALLS:
dprint(">> STOP: tool-call cap reached")
return END
return "tools" if tools_condition(state) else END
workflow = StateGraph(state_schema=AgentState)
workflow.add_node("agent", agent)
workflow.add_node("tools", ToolNode(TOOLS))
workflow.add_edge(START, "agent")
workflow.add_conditional_edges("agent", should_continue, {"tools": "tools", END: END})
workflow.add_edge("tools", "agent")
return workflow.compile()
config.py
Here’s our config file. Though it’s small, this file holds critical runtime data. We use os.getenv() to get environment variables and set defaults. For instance, with os.getenv("AGENT_DEBUG", "1"), AGENT_DEBUG defaults to 1, so we run in debug mode by default. By default, our AI agent runs on gpt-5-mini; because the model name is read from an environment variable, it’s easy to swap in newer models and protect against model drift. Other options can be found on OpenAI’s models page.
import os
from pathlib import Path
PROJECT_ROOT = Path(os.environ.get("AGENT_PROJECT_ROOT", "./mini_sandbox")).resolve()
PROJECT_ROOT.mkdir(exist_ok=True)
DEBUG = os.getenv("AGENT_DEBUG", "1") == "1"
MAX_MESSAGES = int(os.getenv("AGENT_MAX_MESSAGES", "40"))
MAX_TOOL_CALLS = int(os.getenv("AGENT_MAX_TOOL_CALLS", "20"))
PYTEST_TIMEOUT_SECONDS = int(os.getenv("AGENT_PYTEST_TIMEOUT", "15"))
MODEL_NAME = os.getenv("AGENT_MODEL", "gpt-5-mini")
TEMPERATURE = float(os.getenv("AGENT_TEMPERATURE", "0"))
RECURSION_LIMIT = int(os.getenv("AGENT_RECURSION_LIMIT", "30"))
def dprint(*args: object) -> None:
if DEBUG:
print(*args, flush=True)
def safe_path(rel: str) -> Path:
full_path = (PROJECT_ROOT / rel).resolve()
if PROJECT_ROOT not in full_path.parents and full_path != PROJECT_ROOT:
raise ValueError("Path tries to escape project root")
return full_path
Also, pay attention to the two functions defined in this file.
- dprint(): This is short for debug print. dprint() is called to print debug output to the terminal.
- safe_path(): Constrains all file actions to the root of the project, so our agent can’t accidentally delete parts of the operating system.
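Here is a quick sketch of that constraint in action, assuming you run it from the coding-agent directory so the package imports resolve.

from coding_agent.config import safe_path

print(safe_path("src/app.py"))  # resolves to a path inside the sandbox

try:
    safe_path("../../etc/passwd")  # attempts to escape the project root
except ValueError as e:
    print("Blocked:", e)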
state.py
Here’s our state file. In this file, we define our AgentState class. It holds an annotated sequence of messages; the add_messages annotation tells LangGraph to append new messages rather than replace the list. tool_calls is used to keep track of how many tools the agent has called. last_exit_code lets the AI agent keep track of test results.
import re
from typing import Annotated, Optional, Sequence, TypedDict
from langchain_core.messages import BaseMessage, ToolMessage
from langgraph.graph.message import add_messages
class AgentState(TypedDict):
messages: Annotated[Sequence[BaseMessage], add_messages]
tool_calls: int
last_exit_code: Optional[int]
_exit_re = re.compile(r"exit_code=(\d+)")
def extract_last_pytest_exit_code(messages: Sequence[BaseMessage]) -> Optional[int]:
for m in reversed(messages):
if isinstance(m, ToolMessage) and getattr(m, "name", None) == "run_pytest":
match = _exit_re.search(m.content or "")
if match:
return int(match.group(1))
return None
def count_tool_messages(messages: Sequence[BaseMessage]) -> int:
return sum(1 for m in messages if isinstance(m, ToolMessage))
Our functions in this file are used for basic state tracking.
- extract_last_pytest_exit_code(): Scan the message history for the most recent run_pytest result and return its exit code.
- count_tool_messages(): Count the tool messages so the agent can keep track of task status.
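Here is a small, self-contained check of that extraction logic. The message contents below are fabricated for illustration.

from langchain_core.messages import ToolMessage
from coding_agent.state import count_tool_messages, extract_last_pytest_exit_code

messages = [
    ToolMessage(content="exit_code=1\nstdout:\n1 failed", name="run_pytest", tool_call_id="c1"),
    ToolMessage(content="exit_code=0\nstdout:\n3 passed", name="run_pytest", tool_call_id="c2"),
]
print(extract_last_pytest_exit_code(messages))  # 0 -- the most recent run
print(count_tool_messages(messages))            # 2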
tools.py
The tools file holds most of the magic that powers our AI agent. Here, we import our config variables and define the actual tools called by the AI agent. Notice how each function in this file has the @tool decorator.
import shutil
import subprocess
import sys
from pathlib import Path
from langchain_core.tools import tool
from .config import PROJECT_ROOT, PYTEST_TIMEOUT_SECONDS, safe_path
@tool
def list_tree(path: str = ".", max_entries: int = 300, max_depth: int = 6) -> str:
"""
Recursively list files/dirs under `path` (relative to project root).
Safe + fast: skips huge dirs and limits recursion depth.
"""
try:
root = safe_path(path)
except ValueError as e:
return f"Error: {e}"
if not root.exists():
return f"Not found: {path}"
# Skip common “giant” directories by default
skip_names = {
".venv", "venv", "node_modules", ".git",
"__pycache__", ".pytest_cache", ".mypy_cache",
".ruff_cache", ".tox", ".idea", ".vscode",
}
entries: list[str] = []
def walk(dir_path: Path, depth: int) -> None:
nonlocal entries
if len(entries) >= max_entries:
return
if depth > max_depth:
return
try:
children = sorted(dir_path.iterdir(), key=lambda p: (p.is_file(), p.name.lower()))
except Exception:
return
for child in children:
if len(entries) >= max_entries:
return
if child.name in skip_names:
rel = child.relative_to(PROJECT_ROOT)
entries.append(f"{rel}/ (skipped)")
continue
rel = child.relative_to(PROJECT_ROOT)
if child.is_dir():
entries.append(f"{rel}/")
walk(child, depth + 1)
else:
entries.append(str(rel))
if root.is_file():
return str(root.relative_to(PROJECT_ROOT))
walk(root, 0)
if len(entries) >= max_entries:
entries.append(f"... truncated (max_entries={max_entries})")
return "n".join(entries) if entries else "(empty)"
@tool
def write_file(path: str, content: str) -> str:
"""Write/overwrite file relative to project root."""
try:
full_path = safe_path(path)
except ValueError as e:
return f"Error: {e}"
full_path.parent.mkdir(parents=True, exist_ok=True)
full_path.write_text(content, encoding="utf-8")
return f"Wrote {path}"
@tool
def read_file(path: str) -> str:
"""Read file relative to project root."""
try:
full_path = safe_path(path)
except ValueError as e:
return f"Error: {e}"
if not full_path.is_file():
return f"File not found: {path}"
return full_path.read_text(encoding="utf-8")
@tool
def run_pytest() -> str:
"""Run pytest quietly and return result summary."""
try:
result = subprocess.run(
[sys.executable, "-m", "pytest", "-q", "--tb=no"],
cwd=PROJECT_ROOT,
capture_output=True,
text=True,
timeout=PYTEST_TIMEOUT_SECONDS,
)
stdout = (result.stdout or "").strip()
stderr = (result.stderr or "").strip()
return (
f"exit_code={result.returncode}n"
f"stdout:n{stdout[:1200]}n"
f"stderr:n{stderr[:800]}"
)
except subprocess.TimeoutExpired:
return f"exit_code=124nstdout:nnstderr:npytest timed out after {PYTEST_TIMEOUT_SECONDS}s"
except Exception as e:
return f"exit_code=1nstdout:nnstderr:npytest failed to run: {str(e)}"
@tool
def delete_file(path: str) -> str:
"""Delete a file relative to project root."""
try:
full_path = safe_path(path)
except ValueError as e:
return f"Error: {e}"
if not full_path.exists():
return f"Not found: {path}"
if full_path.is_dir():
return f"Refusing to delete directory: {path}"
full_path.unlink()
return f"Deleted {path}"
@tool
def rm_rf(path: str) -> str:
"""
Delete a file OR directory tree inside the sandbox.
Safe because it is constrained by safe_path() and refuses PROJECT_ROOT.
Equivalent-ish to: rm -rf <path>
"""
try:
target = safe_path(path)
except ValueError as e:
return f"Error: {e}"
if target == PROJECT_ROOT:
return "Error: refusing to delete project root"
if not target.exists():
return f"Not found: {path}"
try:
if target.is_dir():
shutil.rmtree(target)
return f"Deleted directory tree: {path}"
target.unlink()
return f"Deleted file: {path}"
except Exception as e:
return f"Error deleting {path}: {e}"
@tool
def move_file(src: str, dst: str) -> str:
"""Rename/move a file within the project root."""
try:
src_path = safe_path(src)
dst_path = safe_path(dst)
except ValueError as e:
return f"Error: {e}"
if not src_path.exists() or not src_path.is_file():
return f"Not found: {src}"
dst_path.parent.mkdir(parents=True, exist_ok=True)
src_path.replace(dst_path) # atomic rename on same filesystem
return f"Moved {src} -> {dst}"
TOOLS = [
list_tree,
write_file,
read_file,
move_file,
delete_file,
rm_rf,
run_pytest,
]
- list_tree(): List the file tree of a given directory. If the agent needs to, it can walk nested directories from within this directory as well.
- write_file(): Write a string of text to a specified file path.
- read_file(): Read a file’s text and return it as a string. The AI agent can use this to find errors.
- move_file(): Rename a file or move it to a new location.
- delete_file(): Delete a file. This is important when writing new code and removing messy or unneeded code files.
- rm_rf(): Delete an entire directory. The AI agent should use this for scrapping projects and removing artifacts.
- run_pytest(): Run pytest and return the test results. Without this function, the model has no idea whether its code actually works.

Because each tool is a LangChain tool, you can also smoke-test them directly; see the sketch below.
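Since @tool wraps each function as a LangChain tool, you can exercise the tools without the model in the loop. The paths below are placeholders inside the sandbox.

from coding_agent.tools import list_tree, read_file, write_file

print(write_file.invoke({"path": "demo.txt", "content": "hello sandbox"}))
print(read_file.invoke({"path": "demo.txt"}))
print(list_tree.invoke({"path": "."}))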
main.py
Finally, we get to our main file. The DEFAULT_PROMPT tells the agent what to do when the file is run without arguments. _print_new() keeps the user informed of any code changes. run_once() compiles and runs a workflow once. parse_args() extracts any arguments passed when running the file with flags such as --prompt.
# coding_agent/main.py
from __future__ import annotations
import argparse
import sys
from typing import Any, List
from dotenv import load_dotenv
load_dotenv()
from langchain_core.messages import HumanMessage, AIMessage, ToolMessage
from .config import MODEL_NAME, TEMPERATURE, RECURSION_LIMIT
from .agent import build_graph
DEFAULT_PROMPT = (
"Create a minimal project in the current folder called 'mathy' "
"with one function that adds two numbers and a passing pytest test. "
"Then make sure pytest passes. Fix anything that fails."
)
def _print_new(messages: List[Any], start_idx: int) -> int:
"""Print only new AI/tool messages and return new cursor index."""
for m in messages[start_idx:]:
if isinstance(m, AIMessage):
content = (m.content or "").strip()
if content:
print("n🤖 Agent:n" + content)
elif isinstance(m, ToolMessage):
# Tool output can be noisy; keep it short but visible for debugging.
print(f"n🛠 Tool:{m.name}n{(m.content or '')[:600]}")
return len(messages)
def run_once(prompt: str, model: str, temp: float, recursion_limit: int) -> None:
graph = build_graph(model_name=model, temperature=temp)
state = {
"messages": [HumanMessage(content=prompt)],
"tool_calls": 0,
"last_exit_code": None,
}
print("Starting agent...n")
final_state = graph.invoke(state, {"recursion_limit": recursion_limit})
print("n=== Final state ===")
print("tool_calls:", final_state.get("tool_calls"))
print("last_exit_code:", final_state.get("last_exit_code"))
print("n=== Final messages ===")
cursor = 0
cursor = _print_new(list(final_state["messages"]), cursor)
def chat_loop(model: str, temp: float, recursion_limit: int, seed_prompt: str, no_seed: bool) -> None:
graph = build_graph(model_name=model, temperature=temp)
state = {"messages": [], "tool_calls": 0, "last_exit_code": None}
cursor = 0
if not no_seed and seed_prompt:
state["messages"] = [HumanMessage(content=seed_prompt)]
state = graph.invoke(state, {"recursion_limit": recursion_limit})
cursor = _print_new(list(state["messages"]), cursor)
print("nInteractive chat mode. /help for commands.n")
while True:
try:
text = input("👤 You> ").strip()
except (EOFError, KeyboardInterrupt):
print("nExiting.")
return
if not text:
continue
if text in ("/q", "/quit", "/exit"):
print("Exiting.")
return
if text == "/help":
print(
"nCommands:n"
" /help show this helpn"
" /exit quitn"
" /state show tool_calls + last_exit_coden"
" /reset clear conversation state (keeps sandbox files)n"
" /seed run the seed prompt oncen"
)
continue
if text == "/state":
print(f"tool_calls={state.get('tool_calls')} last_exit_code={state.get('last_exit_code')}")
continue
if text == "/reset":
state = {"messages": [], "tool_calls": 0, "last_exit_code": None}
cursor = 0
print("State cleared.")
continue
if text == "/seed":
if not seed_prompt:
print("No seed prompt set.")
continue
state["messages"] = list(state["messages"]) + [HumanMessage(content=seed_prompt)]
state = graph.invoke(state, {"recursion_limit": recursion_limit})
cursor = _print_new(list(state["messages"]), cursor)
continue
state["messages"] = list(state["messages"]) + [HumanMessage(content=text)]
state = graph.invoke(state, {"recursion_limit": recursion_limit})
cursor = _print_new(list(state["messages"]), cursor)
if state.get("last_exit_code") == 0:
print("n✅ Pytest exit_code=0 (agent would stop in one-shot mode).")
def parse_args(argv: List[str]) -> argparse.Namespace:
p = argparse.ArgumentParser(description="Mini coding agent (LangGraph + tools)")
p.add_argument("--chat", action="store_true", help="Interactive chat mode (state persists across turns)")
p.add_argument("--no-seed", action="store_true", help="In chat mode, do not auto-run the seed prompt")
p.add_argument("--prompt", default=DEFAULT_PROMPT, help="Seed prompt / one-shot prompt")
p.add_argument("--model", default=MODEL_NAME, help=f"Model name (default: {MODEL_NAME})")
p.add_argument("--temp", type=float, default=TEMPERATURE, help=f"Temperature (default: {TEMPERATURE})")
p.add_argument("--recursion-limit", type=int, default=RECURSION_LIMIT, help=f"Recursion limit (default: {RECURSION_LIMIT})")
return p.parse_args(argv)
def main(argv: List[str] | None = None) -> None:
args = parse_args(argv or sys.argv[1:])
if args.chat:
chat_loop(
model=args.model,
temp=args.temp,
recursion_limit=args.recursion_limit,
seed_prompt=args.prompt,
no_seed=args.no_seed,
)
else:
run_once(
prompt=args.prompt,
model=args.model,
temp=args.temp,
recursion_limit=args.recursion_limit,
)
if __name__ == "__main__":
main()
There are two other functions to pay attention to here as well. These handle the actual runtime; you can also drive main() programmatically, as sketched after the list.
- chat_loop(): Keep a continuous interactive chat running. The user inputs a prompt, the model creates and runs a workflow, then the user inputs another prompt.
- main(): Determine whether to run continuously using chat_loop() or to use run_once() and then exit.
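Because main() accepts an argv list, you can call it from another script or a REPL. This is a small optional convenience (it still needs OPENAI_API_KEY set), not something the tutorial requires.

from coding_agent.main import main

# Equivalent to: python -m coding_agent.main --prompt "..."
main(["--prompt", "add a subtract function to the mathy project"])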
Usage
To run the agent once and then exit, use the command below. Without any arguments, it will create a new project called mathy. It’s a quick way to build template projects, skip boilerplate and get straight to writing your own code.
python -m coding_agent.main
Additional flags
--prompt
The prompt flag injects a new prompt rather than reading the default. The snippet below tells our coding agent to expand mathy into a fully functional calculator.
python -m coding_agent.main --prompt "create a functional calculator with add, multiply, subtract and divide and add it to the mathy project"
After executing your instructions, the AI agent displays the test results and informs users of any changes made to the code.

Example --prompt usage
After telling it to rewrite mathy as a calculator, the model generated the code below. As you can see, we now have four functions: add(), multiply(), subtract() and divide().

Calculator code generated by the AI agent
--chat
To run interactively, use the chat flag.
python -m coding_agent.main --chat
When chat mode starts up, the agent inspects the project and runs the tests.

The AI agent inspects and tests the existing project
We then told the model to add server functionality to the project. It informs us that it created a new file and added some endpoints for a REST API.

The agent adds server functionality and tells the user how to run it
Below is a screenshot of the server generated by the AI coding agent.

Server code generated by the AI coding agent
Conclusion
AI coding agents aren’t magic. AI models have been generating code for years. AI coding agents use that same generative output with additional connectors for standard command line operations. By connecting AI models to basic I/O and file handling processes, we can pipe their code straight into runnable files.
More importantly, when defining model tools, we need to write them with correct constraints. An AI coding agent needs to clean its workspace. That same AI agent should not be able to clean folders or files outside its workspace. When these connectors are written carelessly, the model can destroy the system. When the constraints are sufficient, the model can build things and clean up after itself.
Tools like Codex and Claude Code are not sorcery. AI coding agents are generative AI models with the power to create, read, update, delete and test files. They continue to expand but their core principles remain the same.