
A Guide to LLM Grounding for AI Agents

This guide explains what LLM grounding is, why it matters, and how it helps reduce hallucinations with fresh external data. It also compares practical grounding methods, including RAG and CLI-based web data access, so you can pick the right approach for your infrastructure.
Author Jake Nulty

In this guide, we’ll go over the basics of LLM grounding for AI agents. Grounding means supplying a model with external data at the time it answers. This keeps outputs tied to current facts, even as the model’s training data ages.

By the time you’re finished reading, you’ll be able to answer the following questions.

  • What is LLM grounding?
  • What methods of grounding are available when working with web data?
  • Which method is right for your infrastructure?

What exactly is LLM grounding?

We don’t usually think about it, but LLMs are not fixed databases. They are predictive text models: their outputs are often predictable, but they are not deterministic. No matter how well you curate your pipeline, the model’s training data is going to age. On top of that, no dataset is perfect.

Grounding helps address the following issues.

  • Hallucinations: Models regularly produce confident but false statements. Often this is harmless and can be corrected by adjusting your prompt, but grounding adds external data that helps guard against false outputs.
  • Obsolete information: Has an AI model ever corrected you, telling you that current events have not happened yet? Live data helps address this. The model can run a search to verify what you’ve told it.
  • Precision: A model can’t tell current weather conditions from its training data. To make real-time decisions, models need real-time data access.

Retrieval-Augmented Generation (RAG) is one of the most common methods of grounding. Unlike training, grounding injects fresh data into the model’s context window.
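To make the idea of injecting data into the context window concrete, here is a minimal sketch. It is not how any particular RAG library works: the Snippet type, the word-overlap scoring, and the sample corpus (including the weather reading) are all made up for illustration, and a real system would typically retrieve with embeddings or a search API instead.

```python
from dataclasses import dataclass


@dataclass
class Snippet:
    source: str  # hypothetical label for where the text came from
    text: str    # the retrieved passage


def score(query: str, snippet: Snippet) -> int:
    """Count how many distinct query words appear in the snippet text."""
    words = set(query.lower().split())
    haystack = set(snippet.text.lower().split())
    return len(words & haystack)


def retrieve(query: str, corpus: list[Snippet], k: int = 2) -> list[Snippet]:
    """Return up to k snippets that share at least one word with the query."""
    ranked = sorted(corpus, key=lambda s: score(query, s), reverse=True)
    return [s for s in ranked[:k] if score(query, s) > 0]


def build_grounded_prompt(query: str, snippets: list[Snippet]) -> str:
    """Place retrieved text in the prompt so the model can answer from it."""
    context = "\n".join(f"[{s.source}] {s.text}" for s in snippets)
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Made-up sample data standing in for freshly collected web content.
corpus = [
    Snippet("weather-feed", "current weather in Boston is 4 C with light snow"),
    Snippet("blog-index", "a list of blogs about web data collection"),
]
question = "what is the current weather in Boston"
prompt = build_grounded_prompt(question, retrieve(question, corpus, k=1))
print(prompt)
```

The model’s weights never change here. Only the prompt does, which is why grounding can stay current without retraining.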

Grounding with shell access

First, we’ll look at a CLI-based approach for grounding. You can try this out using either Bright Data or Firecrawl. We’ll start by installing the Bright Data CLI, then run through the same process with the Firecrawl CLI.

These tools give an agent that has shell access a set of commands and skills it can use to collect data.

Bright Data CLI

To install the Bright Data CLI, run the npm command below. Notice that we install globally. With a global install, LLMs can call the package with simple shell commands.

npm install -g @brightdata/cli

You can verify the installation with the --version flag.

brightdata --version

Next, we need to authenticate. The login command allows you to open a browser and connect your API key to the CLI instance.

brightdata login

After you’ve logged in, you’ll see a confirmation similar to the one below. Your CLI is now configured, so you don’t need to handle API keys manually.

Successfully logged into Bright Data CLI

Skills can be added using brightdata skill add <name-of-skill>. Below, we add brightdata-cli to give our LLM access to the entire CLI.

brightdata skill add brightdata-cli

As you can see in the next screenshot, brightdata-cli has been installed for use by a variety of different LLMs.

[Screenshot: Skills are installed]

Now, it’s time to test. Here, we prompted Claude Code to run a search for the best AI data blogs. As you can see, Claude initiated the search using the CLI. The JSON results were then printed to the shell for Claude to read.

[Screenshot: Shell output from Bright Data skills]

Claude then generates a list of blogs based on the information it read from the shell.
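The flow above (run a command, read the JSON it prints, then work from that data) can be sketched in a few lines. This is only an illustration of the pattern, not the Bright Data CLI’s actual interface: the command below is a stand-in Python one-liner, and the "title" and "url" fields are hypothetical rather than a real output schema.

```python
import json
import subprocess
import sys


def run_and_parse(command: list[str]) -> list[dict]:
    """Run a command, capture its stdout, and parse it as a JSON list."""
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)


# Stand-in for a real search CLI: a Python one-liner that prints JSON.
# The result fields ("title", "url") are hypothetical, not a real CLI schema.
fake_search = [
    sys.executable,
    "-c",
    'import json; print(json.dumps([{"title": "Example blog", "url": "https://example.com"}]))',
]

results = run_and_parse(fake_search)
for item in results:
    print(f'{item["title"]}: {item["url"]}')
```

Passing the command as a list avoids shell quoting issues, and check=True raises an error if the command exits with a non-zero status instead of silently returning empty output.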

[Screenshot: Claude's output after using Bright Data skills]

Firecrawl CLI

To start, we need to install the Firecrawl CLI. As with the Bright Data CLI, we install it globally using npm.

npm install -g firecrawl-cli

You can verify the installation with --version.

firecrawl --version
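If you would rather check both installs from a script than run each --version command by hand, a small preflight check is one option. This sketch only assumes that the two commands are named brightdata and firecrawl, as used in this guide. It checks that they are on your PATH and does not verify versions or login state.

```python
import shutil


def missing_clis(names: list[str]) -> list[str]:
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# "brightdata" and "firecrawl" are the command names used in this guide.
missing = missing_clis(["brightdata", "firecrawl"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("Both CLIs are available.")
```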

We can use the login command to authenticate. Once again, we do not need to manually handle our API keys.

firecrawl login

When you’re finished logging in, Firecrawl will say you’re all set.

[Screenshot: We've logged into Firecrawl using the browser]

The setup skills command installs the skills we need for a variety of LLMs.

firecrawl setup skills

When the setup is finished, the CLI will list the skills installed.

[Screenshot: Skills are installed]

Below, we tell Claude to use Firecrawl to find the best AI data blogs. The output gets piped into the shell so Claude can read it.

[Screenshot: Output from Firecrawl skills]

Claude then builds us a list of the best blogs using the data it read from the CLI.

[Screenshot: Claude's output after using Firecrawl skills]

Grounding with live web access

Now, let’s see what grounding looks like with live access to the web. Using Model Context Protocol (MCP) servers, AI agents get live web access through an API.

To add the Firecrawl and Bright Data MCP servers to Claude or Claude Code, open up your developer settings and click the “Edit Config” button.

[Screenshot: Editing your Claude config]

Then paste the following JSON snippet into the file. Save the file and restart Claude Code.

{
  "mcpServers": {
    "Bright Data": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.brightdata.com/mcp?token=<your-bright-data-api-key>"
      ]
    },
    "firecrawl-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "firecrawl-mcp"
      ],
      "env": {
        "FIRECRAWL_API_KEY": "<your-firecrawl-api-key>"
      }
    }
  }
}
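If you prefer not to paste keys into the file by hand, one option is to generate the same structure from environment variables. The sketch below mirrors the JSON above. FIRECRAWL_API_KEY comes from that snippet, while BRIGHT_DATA_API_KEY is a hypothetical variable name chosen for this example. Where the output should be written depends on your client, so the sketch only prints it.

```python
import json
import os


def build_mcp_config(bright_data_token: str, firecrawl_key: str) -> dict:
    """Build the same mcpServers structure as the JSON snippet above."""
    return {
        "mcpServers": {
            "Bright Data": {
                "command": "npx",
                "args": [
                    "mcp-remote",
                    f"https://mcp.brightdata.com/mcp?token={bright_data_token}",
                ],
            },
            "firecrawl-mcp": {
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                "env": {"FIRECRAWL_API_KEY": firecrawl_key},
            },
        }
    }


# BRIGHT_DATA_API_KEY is a hypothetical variable name chosen for this sketch.
# If a variable is unset, the placeholder from the snippet above is kept.
config = build_mcp_config(
    os.environ.get("BRIGHT_DATA_API_KEY", "<your-bright-data-api-key>"),
    os.environ.get("FIRECRAWL_API_KEY", "<your-firecrawl-api-key>"),
)
print(json.dumps(config, indent=2))
```

Keep in mind that the printed output contains your keys in plain text, so treat it like any other secret.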

Bright Data MCP

Now, we open up our project and tell Claude to perform the same search using Bright Data’s MCP server rather than the CLI.

[Screenshot: Running the search using the Bright Data MCP]

The results are slightly different. In this case, we’ve got a live data feed going directly into the AI agent.

[Screenshot: Results from the same search using the Bright Data MCP]

Firecrawl MCP

Now, we’ll do the same with Firecrawl. As you can see, we also get live search results feeding into the AI agent.

[Screenshot: Running the same search using the Firecrawl MCP]

With Firecrawl, we received the same results from both the MCP server and the search skill. The principle still holds, though: with MCP, results go straight from the API into the model’s context instead of being written to an output file or printed to the shell.

[Screenshot: Search results using the Firecrawl MCP]

Key breakdown of approaches

Feature         | Without grounding    | CLI                           | MCP
Setup           | None                 | Global install                | JSON config
Authentication  | None                 | Browser login                 | API key in config
Data freshness  | Training data only   | Shell or file output          | Live
LLM integration | N/A                  | Via shell skills              | Direct context injection
Best for        | Closed, simple tasks | Offline pipelines, batch jobs | Real-time agents
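As a rough illustration, the "Best for" row can be read as a two-question decision. The function and parameter names below are hypothetical, and real projects will weigh more factors than these two. Treat it as a simplification of the comparison, not a rule.

```python
def choose_grounding(needs_external_data: bool, needs_up_to_the_minute_data: bool) -> str:
    """Map two yes/no requirements onto the three approaches compared above."""
    if not needs_external_data:
        return "Without grounding"  # closed, simple tasks
    if needs_up_to_the_minute_data:
        return "MCP"  # real-time agents
    return "CLI"  # offline pipelines, batch jobs


print(choose_grounding(needs_external_data=True, needs_up_to_the_minute_data=False))
```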

Conclusion

Without grounding, LLMs are severely limited. When your chatbot runs a web search, that is grounding in action. Without it, AI assistants couldn’t keep up with news, best practices or anything else in our constantly changing world.

LLM grounding is the basis on which real AI-powered applications are built. If your data pipeline relies on batch updates and reviewable files, a CLI tool is a good fit for grounding. When your system depends on up-to-the-minute freshness, MCP servers are the better option because they reduce latency and the number of steps between the data source and the model ingesting it.

Written by

Jake Nulty

Software Developer & Writer at Independent

Jacob is a software developer and technical writer with a focus on web data infrastructure, systems design and ethical computing.