In this guide, we’ll go over the basics of LLM grounding for AI agents. With grounding, we provide external data to a model at inference time. This keeps outputs tied to current facts, even as the model’s training data ages.
By the time you’re finished reading, you’ll be able to answer the following questions.
- What is LLM grounding?
- What methods of grounding are available when working with web data?
- Which method is right for your infrastructure?
What exactly is LLM grounding?
We don’t usually think about it, but LLMs are not fixed databases. They are predictive text algorithms. Their outputs are often predictable, but they are not deterministic. No matter how well you curate your pipeline, the model’s training data is going to age. On top of that, no dataset is perfect.
Grounding helps address the following issues.
- Hallucinations: Models sometimes state false information with confidence. Often, it’s harmless and can be corrected by simply altering your prompt. However, grounding provides additional data that helps guard against false outputs.
- Obsolete information: Has an AI model ever corrected you, telling you that current events have not happened yet? Live data helps address this. The model can run a search to verify what you’ve told it.
- Precision: A model can’t tell current weather conditions based on training data. To make real-time decisions, models need real-time data access.
Retrieval-Augmented Generation (RAG) is one of the most common methods of grounding. Unlike training, which bakes knowledge into the model’s weights, grounding injects fresh data into the model’s context window at inference time.
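To make context injection concrete, here is a minimal sketch in Python. The function name `build_grounded_prompt`, the snippet format and the sample data are all hypothetical, and the retrieval step (a search, a scrape or a vector lookup) is assumed to have already happened.

```python
from datetime import date


def build_grounded_prompt(question: str, snippets: list[dict]) -> str:
    """Return a prompt that places retrieved snippets ahead of the question.

    Each snippet is a dict with hypothetical "source" and "text" keys.
    """
    lines = [
        f"Today's date: {date.today().isoformat()}",
        "Answer using only the sources below. If they do not contain the answer, say so.",
        "",
        "Sources:",
    ]
    # Number each snippet so the model can cite it in its answer.
    for number, snippet in enumerate(snippets, start=1):
        lines.append(f"[{number}] {snippet['source']}: {snippet['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


# Hypothetical snippet, standing in for data fetched at request time.
snippets = [{"source": "example.com/weather", "text": "Light rain, 12 C."}]
prompt = build_grounded_prompt("Do I need an umbrella today?", snippets)
print(prompt)
```

The model’s weights never change here. Only the text placed in its context window does, which is why grounded answers can stay current without retraining.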
Grounding with shell access
First, we’ll look at a CLI-based approach for grounding. You can try this out using either Bright Data or Firecrawl. We’ll start by installing the Bright Data CLI. Then, we’ll run through the same process using the Firecrawl CLI.
These tools provide LLMs with shell access and skills they can use to collect data.
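Conceptually, shell-based grounding comes down to running a command, capturing what it prints and handing that text to the model. The sketch below illustrates the pattern in Python under stated assumptions: the command is a stand-in child process that prints JSON (not a real Bright Data or Firecrawl invocation), and `run_and_parse` is a hypothetical helper name.

```python
import json
import subprocess
import sys

# Stand-in for a data-collection CLI: a child process that prints JSON to
# stdout. A real agent would run the installed tool here instead.
STAND_IN_COMMAND = [
    sys.executable,
    "-c",
    'import json; print(json.dumps([{"title": "Example blog", "url": "https://example.com"}]))',
]


def run_and_parse(command: list[str]) -> list[dict]:
    """Run a command, capture its stdout, and parse it as JSON."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


results = run_and_parse(STAND_IN_COMMAND)
for item in results:
    print(f"{item['title']} -> {item['url']}")
```

Because the output passes through stdout, it can just as easily be redirected to a file and reviewed before the model ever sees it.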
Bright Data CLI
To install the Bright Data CLI, run the npm command below. Notice that we install globally. With a global install, LLMs can call the package with simple shell commands.
npm install -g @brightdata/cli
You can verify the installation with the --version flag.
brightdata --version
Next, we need to authenticate. The login command allows you to open a browser and connect your API key to the CLI instance.
brightdata login
After you’ve logged in, you’ll see a verification similar to the one below. Your CLI is configured and you don’t need to manually handle API keys.

Skills can be added using brightdata skill add <name-of-skill>. Below, we add brightdata-cli to give our LLM access to the entire CLI.
brightdata skill add brightdata-cli
As you can see in the next screenshot, brightdata-cli has been installed for use by a variety of different LLMs.

Now, it’s time to test. Here, we prompted Claude Code to run a search for the best AI data blogs. As you can see, Claude initiated the search using the CLI. The JSON was then piped into the shell for Claude to read.

Claude then generates a list of blogs based on the information it read from the shell.

Firecrawl CLI
To start, we need to install the Firecrawl CLI. Like the Bright Data CLI, we install it globally using npm.
npm install -g firecrawl-cli
You can verify the installation with --version.
firecrawl --version
We can use the login command to authenticate. Once again, we do not need to manually handle our API keys.
firecrawl login
When you’re finished logging in, Firecrawl will say you’re all set.

The setup skills command creates all the skills we need for use with a variety of LLMs.
firecrawl setup skills
When the setup is finished, the CLI will list the skills installed.

Below, we tell Claude to use Firecrawl to find the best AI data blogs. The output gets piped into the shell so Claude can read it.

Claude then builds us a list of the best blogs using the data it read from the CLI.

Grounding with live web access
Now, let’s see what grounding looks like with live access to the web. Using Model Context Protocol (MCP) servers, AI agents get live web access through an API.
To add the Firecrawl and Bright Data MCP servers to Claude or Claude Code, open up your developer settings and click the “Edit Config” button.

Then, paste the following JSON snippet into the file, replacing the placeholder API keys with your own. Save the file and restart Claude Code.
{
  "mcpServers": {
    "Bright Data": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "https://mcp.brightdata.com/mcp?token=<your-bright-data-api-key>"
      ]
    },
    "firecrawl-mcp": {
      "command": "npx",
      "args": [
        "-y",
        "firecrawl-mcp"
      ],
      "env": {
        "FIRECRAWL_API_KEY": "<your-firecrawl-api-key>"
      }
    }
  }
}
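A malformed config or a forgotten placeholder key is an easy mistake to make at this step. As a rough sanity check, the Python sketch below parses the same structure as the config above and flags any server that still contains a `<your-...>` placeholder. The helper name `find_placeholders` is hypothetical, and the config is embedded as a string only to keep the example self-contained. In practice you would read your own config file.

```python
import json

# Same structure as the config above, with the placeholders left in.
CONFIG_TEXT = """
{
  "mcpServers": {
    "Bright Data": {
      "command": "npx",
      "args": ["mcp-remote", "https://mcp.brightdata.com/mcp?token=<your-bright-data-api-key>"]
    },
    "firecrawl-mcp": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {"FIRECRAWL_API_KEY": "<your-firecrawl-api-key>"}
    }
  }
}
"""


def find_placeholders(config: dict) -> list[str]:
    """Return the names of servers whose args or env still contain '<your-'."""
    flagged = []
    for name, server in config.get("mcpServers", {}).items():
        values = list(server.get("args", [])) + list(server.get("env", {}).values())
        if any("<your-" in str(value) for value in values):
            flagged.append(name)
    return flagged


config = json.loads(CONFIG_TEXT)  # raises json.JSONDecodeError on invalid JSON
print(find_placeholders(config))
```

An empty list means every placeholder has been replaced. It does not confirm that the keys themselves are valid.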
Bright Data MCP
Now, we open up our project and tell Claude to perform the same search using Bright Data’s MCP server rather than the CLI.

The results differ slightly from the CLI run. In this case, we’ve got a live data feed going directly into the AI agent.

Firecrawl MCP
Now, we’ll do the same with Firecrawl. As you can see, we also get live search results feeding into the AI agent.

With Firecrawl, we received the same results from both the MCP server and the search skill. Either way, the principle holds. When running a live search, results go straight from the API into the model context instead of being piped into an output file or the shell.

Key breakdown of approaches
| Feature | Without grounding | CLI | MCP |
|---|---|---|---|
| Setup | None | Global npm install | JSON config |
| Authentication | None | Browser login | API key in config |
| Data freshness | Training data only | As of the last CLI run (shell or file output) | Live |
| LLM integration | N/A | Via shell skills | Direct context injection |
| Best for | Closed, simple tasks | Offline pipelines, batch jobs | Real-time agents |
Conclusion
Without grounding, LLMs are limited to what they learned in training. When your personal chatbot runs a search, that is grounding. Without it, AI assistants couldn’t keep up with news, best practices or anything else in our constantly changing world.
LLM grounding is the basis on which real AI-powered applications are built. If your data pipeline relies on batch updates and reviewable files, a CLI tool is an excellent fit for grounding. When your system depends on up-to-the-minute freshness, MCP servers are the better option because they reduce latency and the number of steps between the data source and the model ingesting it.