
Build an AI Agent with Memory Using MongoDB

This guide shows how to build an AI agent with persistent memory using Python, MongoDB, and the OpenAI API. It explains why models are stateless, how chat history works, and how to create a practical memory backend for real applications.
By Jake Nulty

AI agent with MongoDB diagram

In this guide, we’ll build an AI agent using Python, the OpenAI API, and MongoDB. Building your own memory system gives you a better understanding of how AI-powered apps really function, and it lets you create modular architectures where conversation and task state persist even though the AI model itself is stateless.

Once you’ve finished this tutorial, you’ll be able to answer the following questions.

  • Why are AI models stateless?
  • How is memory addressed within the OpenAI API?
  • How do we build our own persistent memory backend?

How does model memory work?

AI model memory is something of an illusion. By nature, AI models are stateless: they don’t actually remember anything between requests; they rely on knowledge retrieval. Without knowledge retrieval, agentic and generative AI cannot properly manage context. When you send a prompt, the model generates output. When you read that output and send another prompt, the model has no memory of your first prompt. Take a look at the following basic chat log.

Basic conversation with an AI model

Now, inspect the first prompt. We inform the model of the user’s name and then the model generates an output based on the input it received: Hi Jake.... The model finishes its output by asking what it can help with. The user responds with Nothing!.

The model response to the second prompt is where the magic really happens. If our model is stateless, how did it remember the user’s name in the second output?

In reality, it didn’t remember. The chat interface injects the chat history into the next prompt, so the model re-reads the full context every time it generates output. On the user’s end, we only see the prompt Nothing!, but what happens under the hood is critical to chat continuity. When Nothing! is sent to the model, it actually reads something like this.

User: Hi! My name is Jake!
Model: Hi Jake... What can I help you with today?
User: Nothing!
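To make this concrete, here is a minimal sketch of what a chat interface does behind the scenes: it keeps a running transcript and prepends it to every new prompt. The render_prompt helper and the role labels are illustrative, not part of any particular API.

```python
# Minimal sketch of history injection: the "memory" lives in our list,
# not in the model. Every turn, the full transcript is sent again.
def render_prompt(history, new_message):
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"User: {new_message}")
    return "\n".join(lines)

history = [
    ("User", "Hi! My name is Jake!"),
    ("Model", "Hi Jake... What can I help you with today?"),
]

# The model receives all three lines, which is how it "remembers" the name.
print(render_prompt(history, "Nothing!"))
```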

Using the OpenAI API, we normally create a continuous chat by passing the ID of the previous response into responses.create(). Every response object carries an ID (an opaque string such as resp_…). If we send Hi! My name is Jake! as our first prompt, we capture the ID from the response it produces. We then generate the next response with responses.create(input="Nothing!", previous_response_id=<that ID>), and the API replays the input and output of the first exchange as context for the new one. We need to reverse engineer this chat continuity using an external database.
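As a sketch of that pattern (not run here; it assumes an OPENAI_API_KEY in the environment, and gpt-5-mini is a placeholder model name):

```python
def chained_chat(first_prompt: str, second_prompt: str, model: str = "gpt-5-mini") -> str:
    """Two chained turns using previous_response_id (requires OPENAI_API_KEY)."""
    from openai import OpenAI  # deferred import so the sketch loads without the SDK

    client = OpenAI()

    # First turn: no prior context.
    first = client.responses.create(model=model, input=first_prompt)

    # Second turn: reference the first response's ID (an opaque string,
    # e.g. "resp_..."); the API replays that exchange as context.
    second = client.responses.create(
        model=model,
        input=second_prompt,
        previous_response_id=first.id,
    )
    return second.output_text
```

Calling chained_chat("Hi! My name is Jake!", "Nothing!") reproduces the chat log above without us resending the transcript ourselves — that is the convenience we’ll rebuild on top of MongoDB.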

Prerequisites

You should have a basic familiarity with Python and the OpenAI API. A web data MCP tool is an optional enhancement — MCP servers give your AI agent reliable access to web scraping and search tools for data discovery.

Creating a MongoDB account

Head over to the MongoDB website and create an account on their registration page. If you’re running MongoDB locally, you can skip this step.

Next, deploy a MongoDB cluster. They offer three tiers: M10, Flex and Free. The Free tier is sufficient for this tutorial. If your application needs to scale, MongoDB allows you to upgrade plans.

Deploying a cluster with MongoDB

Click on the button titled “Create database user” and then move on to choosing a connection method. When choosing one, select “Drivers”. This is how we generate the URI of our database.

MongoDB connection methods

Now, enter your project details. Here, we’ll use “Python” and we’ll select “4.7 or later”. Then, copy the connection string and store it somewhere safe. Without it, you won’t be able to access your database.
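For reference, Atlas connection strings follow this general shape (every value below is a placeholder — copy the exact string Atlas generates for you):

```
mongodb+srv://<username>:<password>@<cluster-host>.mongodb.net/?retryWrites=true&w=majority
```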

Selecting your driver and version

Getting your OpenAI API key

If you don’t have one already, you’ll need to create an OpenAI developer account. From the dashboard, open the “API keys” tab. When you generate a key, store it somewhere safe.

OpenAI developer dashboard

Project setup

Create a new project folder and cd into the new project.

mkdir agent_with_memory
cd agent_with_memory

Create a new virtual environment.

python -m venv .venv

Activate the environment on Linux/macOS.

source .venv/bin/activate

If you’re on Windows, you can use the command below instead.

.\.venv\Scripts\Activate.ps1

Now, install the dependencies. The pymongo and openai packages are the official software development kits (SDKs) for MongoDB and the OpenAI API, so we can connect to both services easily; streamlit provides our web frontend.

pip install streamlit pymongo openai

We can use pip freeze to save our dependencies to a requirements.txt file.

pip freeze > requirements.txt

Building the AI agent with memory

Now, it’s time to build our AI agent. The application consists of three files.

  • settings.py: Basic helpers to save and load our settings.
  • agent.py: This holds all of our connections to the database and to the OpenAI API as well as tool helpers to use for logging tasks.
  • app.py: The Streamlit frontend contains a dashboard where we can input config variables like OpenAI key, model, MongoDB URI and MCP URL. This also holds our chat interface.

settings.py

Our settings file is pretty small. We’ve got three functions.

  • load_settings(): Read the app settings and return them. If no settings are found, return an empty dict object.
  • save_settings(): Take in a dict of settings and write them to the settings file.
  • get_value(): Find and return a specific value, checking an explicitly passed value first, then .app_settings.json, then environment variables, and finally a default.
# settings.py
import json
import os
from pathlib import Path
from typing import Any, Dict, Optional

PROJECT_DIR = Path(__file__).resolve().parent
SETTINGS_PATH = PROJECT_DIR / ".app_settings.json"


def load_settings() -> Dict[str, Any]:
    if not SETTINGS_PATH.exists():
        return {}
    try:
        return json.loads(SETTINGS_PATH.read_text(encoding="utf-8"))
    except Exception:
        return {}


def save_settings(s: Dict[str, Any]) -> None:
    SETTINGS_PATH.write_text(json.dumps(s, indent=2), encoding="utf-8")


def get_value(key: str, user_value: Optional[str] = None, default: str = "") -> str:
    if user_value and user_value.strip():
        return user_value.strip()

    s = load_settings()
    v = str(s.get(key, "") or "").strip()
    if v:
        return v

    env = str(os.environ.get(key, "") or "").strip()
    if env:
        return env

    return default

agent.py

Our agent file is the largest of the three. We begin with some helper functions for interacting with the database. utcnow() returns the current UTC time. db() is a thin wrapper around MongoClient() that returns a database handle. ensure() creates the indexes required by our memory system.

The two functions below help to provide us with continuous chat.

  • save_msg(): Save a message to the database and update sessions.last_seen_at.
  • history_str(): Find all messages attached to a given session ID, sort them by timestamp, and return them as an ordered transcript. If the limit is exceeded, older messages are dropped.

We have a variety of task helpers. Tracking tasks helps us manage costs and prevents the model from running the same task twice.

  • active_task(): Find and return the session’s active task document, or None if there isn’t one.
  • set_active_task(): Set (or clear) the active task ID stored on a session.
  • start_task(): Insert a new task with status running and mark it as the session’s active task.
  • update_task(): Update the status and/or state of an existing task in the database.

Next, we have our OpenAI and MCP helpers.

  • LLMConfig: Holds the model name that we’re using to power the AI agent.
  • _client(): A wrapper for the OpenAI client.
  • mcp_tools(): Return a list of MCP tools available for the AI agent to use.

We also have a couple of preprocessing functions to help with task extraction.

  • extract_task_patch(): Find a ```task fenced block in a given string of text using regex and parse its JSON. If no valid task block is found, return None.
  • strip_task_block(): Remove the task block from the model’s reply so only the user-visible text remains.
# agent.py
import os
import re
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Tuple

from pymongo import MongoClient, ASCENDING


# ----------------------------
# Mongo
# ----------------------------
def utcnow() -> datetime:
    return datetime.now(timezone.utc)


def db(mongo_uri: str, name: str = "ai_agent"):
    return MongoClient(mongo_uri)[name]


def ensure(db) -> None:
    db.sessions.create_index([("session_id", ASCENDING)], unique=True)
    db.messages.create_index([("session_id", ASCENDING), ("ts", ASCENDING)])
    db.tasks.create_index([("task_id", ASCENDING)], unique=True)


def save_msg(db, sid: str, role: str, content: str) -> None:
    db.messages.insert_one({"session_id": sid, "role": role, "content": content, "ts": utcnow()})
    db.sessions.update_one(
        {"session_id": sid},
        {"$set": {"last_seen_at": utcnow()}, "$setOnInsert": {"created_at": utcnow()}},
        upsert=True,
    )


def history_str(db, sid: str, limit: int) -> str:
    docs = list(db.messages.find({"session_id": sid}, {"_id": 0}).sort("ts", 1))
    docs = [d for d in docs if d.get("role") in ("user", "assistant")]
    if limit:
        docs = docs[-limit:]
    lines = []
    for d in docs:
        role = "User" if d["role"] == "user" else "Assistant"
        lines.append(f"{role}: {d.get('content','')}".strip())
    return "\n".join(lines).strip()


# ----------------------------
# Tasks (one active per session)
# ----------------------------
def active_task(db, sid: str) -> Optional[Dict[str, Any]]:
    s = db.sessions.find_one({"session_id": sid}, {"_id": 0, "active_task_id": 1}) or {}
    tid = s.get("active_task_id")
    if not tid:
        return None
    return db.tasks.find_one({"task_id": tid}, {"_id": 0})


def set_active_task(db, sid: str, tid: Optional[str]) -> None:
    db.sessions.update_one({"session_id": sid}, {"$set": {"active_task_id": tid}}, upsert=True)


def start_task(db, sid: str, goal: str) -> str:
    tid = str(uuid.uuid4())
    db.tasks.insert_one(
        {
            "task_id": tid,
            "session_id": sid,
            "goal": goal,
            "status": "running",  # running|done|paused|error
            "state": {},
            "created_at": utcnow(),
            "updated_at": utcnow(),
        }
    )
    set_active_task(db, sid, tid)
    return tid


def update_task(db, tid: str, *, status: Optional[str] = None, state: Optional[Dict[str, Any]] = None) -> None:
    patch: Dict[str, Any] = {"updated_at": utcnow()}
    if status:
        patch["status"] = status
    if state is not None:
        patch["state"] = state
    db.tasks.update_one({"task_id": tid}, {"$set": patch})


# ----------------------------
# OpenAI + MCP
# ----------------------------
@dataclass
class LLMConfig:
    model: str


def _client():
    from openai import OpenAI  # type: ignore
    return OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))


def mcp_tools(mcp_servers: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    out: List[Dict[str, Any]] = []
    for s in (mcp_servers or []):
        tool: Dict[str, Any] = {
            "type": "mcp",
            "server_label": s.get("server_label", "WebResearch"),
            "server_url": s["server_url"],
            "require_approval": s.get("require_approval", "never"),
        }
        if s.get("allowed_tools"):
            tool["allowed_tools"] = s["allowed_tools"]
        if s.get("headers"):
            tool["headers"] = s["headers"]
        out.append(tool)
    return out


# Capture EVERYTHING between ```task ... ``` so nested JSON works.
_TASK_RE = re.compile(r"(?s)```task\s*(.*?)\s*```")


def extract_task_patch(text: str) -> Optional[Dict[str, Any]]:
    m = _TASK_RE.search(text or "")
    if not m:
        return None
    import json
    raw = (m.group(1) or "").strip()
    try:
        data = json.loads(raw)
        return data if isinstance(data, dict) else None
    except Exception:
        return None


def strip_task_block(text: str) -> str:
    return _TASK_RE.sub("", text or "").strip()


# ----------------------------
# Turn runner (no streaming)
# ----------------------------
def run_turn(
    *,
    db,
    session_id: str,
    user_text: str,
    cfg: LLMConfig,
    system_prompt: str,
    mcp_servers: Optional[List[Dict[str, Any]]] = None,
    history_limit: int = 20,
) -> Tuple[str, Dict[str, Any]]:
    """
    Tutorial agent:
      - chat in Mongo
      - /task start|status|stop
      - MCP tools via OpenAI Responses API
      - task memory in Mongo (tasks.state)
    """
    t = (user_text or "").strip()

    # Save user message first
    save_msg(db, session_id, "user", t)

    # Task commands (deterministic)
    if t.startswith("/task "):
        cmd = t[6:].strip()

        if cmd.startswith("start "):
            goal = cmd[6:].strip()
            tid = start_task(db, session_id, goal)
            reply = f"Task started: `{tid}`\nGoal: {goal}"
            save_msg(db, session_id, "assistant", reply)
            return reply, {"mode": "task_cmd", "task_id": tid}

        if cmd == "status":
            task = active_task(db, session_id)
            reply = "No active task."
            if task:
                reply = str({k: task.get(k) for k in ("task_id", "goal", "status", "state")})
            save_msg(db, session_id, "assistant", reply)
            return reply, {"mode": "task_cmd", "task": task}

        if cmd == "stop":
            set_active_task(db, session_id, None)
            reply = "Active task cleared."
            save_msg(db, session_id, "assistant", reply)
            return reply, {"mode": "task_cmd"}

        reply = "Commands: /task start <goal>, /task status, /task stop"
        save_msg(db, session_id, "assistant", reply)
        return reply, {"mode": "task_cmd"}

    task = active_task(db, session_id)

    # Make task updates reliable: if a task is active, require a task patch.
    task_block = ""
    if task:
        task_block = (
            "\n\nActive task:\n"
            f"- task_id: {task['task_id']}\n"
            f"- goal: {task['goal']}\n"
            f"- status: {task['status']}\n"
            f"- state: {task.get('state', {})}\n"
            "\nYou MUST include exactly one task update block at the end of your reply:\n"
            "```task\n"
            '{"status":"running|done|paused|error","state":{...}}\n'
            "```\n"
            "Only include JSON in the task block."
        )

    transcript = history_str(db, session_id, history_limit)
    tools = mcp_tools(mcp_servers or [])

    client = _client()
    resp = client.responses.create(
        model=cfg.model,
        instructions=(system_prompt + task_block).strip(),
        tools=tools if tools else None,
        # Transcript already includes the latest user message (no duplication).
        input=transcript or f"User: {t}",
    )

    raw_text = (getattr(resp, "output_text", None) or "").strip() or "(no output)"
    patch = extract_task_patch(raw_text) if task else None
    visible_text = strip_task_block(raw_text)

    save_msg(db, session_id, "assistant", visible_text)

    if task and patch:
        update_task(
            db,
            task["task_id"],
            status=str(patch.get("status") or task["status"]),
            state=patch.get("state", task.get("state", {})),
        )

    return visible_text, {
        "active_task_id": task["task_id"] if task else None,
        "mcp": len(mcp_servers or []),
        "tool_calls": tool_trace(resp),
    }


def tool_trace(resp) -> List[Dict[str, Any]]:
    """
    Dump a lightweight view of ALL output items so we can see what the SDK is returning.
    This will catch MCP even if the item types aren't "tool_call/tool_result".
    """
    out: List[Dict[str, Any]] = []
    items = getattr(resp, "output", None) or []
    for it in items:
        if isinstance(it, dict):
            out.append({
                "type": it.get("type"),
                "keys": sorted(list(it.keys()))[:40],
                "preview": {k: it.get(k) for k in ("type", "id", "name", "tool_name", "call_id", "server_label", "server_url") if k in it},
            })
            continue

        t = getattr(it, "type", None)
        # try to expose fields that commonly exist on tool items
        preview = {"type": t}
        for k in ("id", "name", "tool_name", "call_id", "server_label", "server_url", "status"):
            if hasattr(it, k):
                try:
                    preview[k] = getattr(it, k)
                except Exception:
                    pass

        # if there's nested content, record its types too
        nested = []
        content = getattr(it, "content", None) or []
        for c in content:
            if isinstance(c, dict):
                nested.append(c.get("type"))
            else:
                nested.append(getattr(c, "type", None))
        if nested:
            preview["content_types"] = nested

        out.append(preview)
    return out


def as_dict(x: Any) -> Dict[str, Any]:
    if isinstance(x, dict):
        return x
    d: Dict[str, Any] = {}
    for k in ("type", "name", "id", "tool_name", "status", "arguments", "output", "content"):
        if hasattr(x, k):
            try:
                d[k] = getattr(x, k)
            except Exception:
                pass
    return d

The heavy lifting in agent.py happens in run_turn(). When a user enters a prompt from the dashboard, it gets passed into run_turn(). The function first checks the text for /task commands: /task start inserts a new task into the database, /task status reports on the active task, and /task stop clears it. For ordinary prompts, the agent’s reply is parsed for a task JSON block, and if one is found, the task’s status and state are updated on the backend.
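To see the task-block round trip in isolation, here’s a self-contained sketch using the same regex as agent.py (the sample reply text is made up for illustration):

```python
import json
import re

# Same pattern as agent.py: capture everything between ```task ... ```.
TASK_RE = re.compile(r"(?s)```task\s*(.*?)\s*```")

# A made-up model reply ending in a task update block.
reply = (
    "I scraped the page and summarized it.\n"
    "```task\n"
    '{"status": "done", "state": {"pages_fetched": 1}}\n'
    "```"
)

match = TASK_RE.search(reply)
patch = json.loads(match.group(1))        # the task update written to Mongo
visible = TASK_RE.sub("", reply).strip()  # what the user actually sees

print(patch["status"])  # done
print(visible)          # I scraped the page and summarized it.
```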

app.py

Here is our app file. This is the frontend to our agent. The sidebar contains input boxes for variables such as MongoDB URI, MCP URL, model and OpenAI API key. When the user clicks the “Save” button, their settings file is updated. If a user wishes to remove their settings, they simply click the “Clear” button.

When a user enters a prompt, the prompt is displayed in the chat immediately and we wait for the model to respond — similar to the chat interfaces you see with mainstream LLMs. We use st.expander() to create a dropdown for inspecting model tasks and memory.

# app.py
import os
import uuid
import streamlit as st

from agent import LLMConfig, run_turn, db as get_db, ensure
from settings import load_settings, save_settings, get_value

st.set_page_config(page_title="AI agent (Mongo + MCP)", page_icon="🧠", layout="centered")


def clean_url(u: str) -> str:
    u = (u or "").strip()
    return u.strip("[](){}").strip("'\"").strip()


# ----------------------------
# Sidebar
# ----------------------------
st.sidebar.header("Settings")
s = load_settings()

mongo_uri = st.sidebar.text_input(
    "MongoDB URI",
    value=get_value("MONGODB_URI", default="mongodb://localhost:27017"),
    type="password",
)

api_key = st.sidebar.text_input(
    "OpenAI API Key",
    value=get_value("OPENAI_API_KEY"),
    type="password",
)

model = st.sidebar.text_input(
    "Model",
    value=get_value("OPENAI_MODEL", default="gpt-5-mini"),
)

mcp_urls_in = st.sidebar.text_area(
    "MCP URLs (1 per line)",
    value=get_value("MCP_URLS", default=""),
    height=90,
    placeholder="http://localhost:3001/mcp\nhttp://localhost:3400/mcp",
)

c1, c2 = st.sidebar.columns(2)
with c1:
    if st.button("Save", use_container_width=True):
        s["MONGODB_URI"] = mongo_uri.strip()
        s["OPENAI_API_KEY"] = api_key.strip()
        s["OPENAI_MODEL"] = model.strip()
        s["MCP_URLS"] = mcp_urls_in.strip()
        save_settings(s)
        st.sidebar.success("Saved.")
with c2:
    if st.button("Clear", use_container_width=True):
        for k in ("MONGODB_URI", "OPENAI_API_KEY", "OPENAI_MODEL", "MCP_URLS"):
            s.pop(k, None)
        save_settings(s)
        st.sidebar.success("Cleared.")

st.sidebar.caption("Tasks: `/task start <goal>`, `/task status`, `/task stop`")

# Apply runtime env (OpenAI SDK reads env)
if api_key.strip():
    os.environ["OPENAI_API_KEY"] = api_key.strip()
os.environ["OPENAI_MODEL"] = (model.strip() or "gpt-5-mini")

# MCP servers list passed to agent.py
mcp_urls = [clean_url(u) for u in (mcp_urls_in or "").splitlines() if clean_url(u)]
mcp_servers = [
    {"server_label": "WebResearch", "server_url": url, "require_approval": "never"}
    for url in mcp_urls
]

# ----------------------------
# DB init (cached)
# ----------------------------
@st.cache_resource
def get_cached_db(uri: str):
    d = get_db(uri or "mongodb://localhost:27017")
    ensure(d)
    return d

db = get_cached_db(mongo_uri.strip() or "mongodb://localhost:27017")

# ----------------------------
# Session
# ----------------------------
if "session_id" not in st.session_state:
    st.session_state.session_id = str(uuid.uuid4())
sid = st.session_state.session_id

# ----------------------------
# UI
# ----------------------------
st.title("🧠 AI agent (Mongo + MCP)")
st.caption("Chat + task state stored in MongoDB. MCP tools are connected via OpenAI tools.")

# Render history directly from Mongo
history = list(db.messages.find({"session_id": sid}, {"_id": 0}).sort("ts", 1))
for m in history:
    if m.get("role") in ("user", "assistant"):
        with st.chat_message(m["role"]):
            st.markdown(m.get("content", ""))

prompt = st.chat_input("Message…")
if prompt:

    with st.chat_message("user"):
        st.markdown(prompt)

    cfg = LLMConfig(model=os.environ.get("OPENAI_MODEL", "gpt-5-mini"))

    with st.chat_message("assistant"):
        text, debug = run_turn(
            db=db,
            session_id=sid,
            user_text=prompt,
            cfg=cfg,
            system_prompt="""
You are an AI agent with access to MCP tools.

RULES:
- If MCP tools are available, you MUST use them for tasks like browsing, scraping or research.
- Do NOT hallucinate tool results.
- Be concise and direct.
""",
            mcp_servers=mcp_servers,
            history_limit=20,
        )
        st.markdown(text)

    with st.expander("Inspector", expanded=False):
        st.write("MCP servers configured:", len(mcp_servers))
        tool_calls = debug.get("tool_calls") or []
        st.write("Tool calls captured:", len(tool_calls))
        st.json(tool_calls)

Usage

Once everything is in place, you can use the snippet below to launch the application.

streamlit run app.py

This screenshot shows the UI in action: the sidebar with config variables on the left and the chat UI on the right. In our prompt, we use /task to start a task and tell the model to use its web-access MCP tool to search for Data4AI. It runs the search and returns a summary of the site.

Prompting the AI agent with /task to access Data4AI and provide a summary

At the end of the output, the model emits our task object.

AI agent updated the status and emitted the task object

Conclusion

AI model memory is a separate system component. It’s not a direct part of the AI model. Models generate their outputs based on your prompts. Chat and task persistence need to live outside the AI model.

Today, we built a memory backend using MongoDB. You can also connect models to external storage using vector databases.

Once you control the memory layer, you’re no longer dependent on a single provider’s session behavior.

Written by

Jake Nulty

Software Developer & Writer at Independent

Jacob is a software developer and technical writer with a focus on web data infrastructure, systems design and ethical computing.
