
Building scalable AI agents: The power of cloud browser infrastructure

Learn when AI agents need browser access, why local headless browsers fail under concurrency, and how cloud browser infrastructure enables production-grade agentic workflows
Author Jake Nulty

In this guide, you’ll learn about headless browsers and how cloud browsers differ from traditional headless browsers when building agentic workflows.

By the time you’ve finished reading, you’ll be able to answer the following questions.

  • When do AI agents need browser access?
  • What are headless browsers?
  • How do cloud browsers help your AI agents execute at a production level?

Why AI agents break at scale

At a big enough scale, all software hits a hardware bottleneck. This is no secret. Picture the following scenario. You’ve got an agentic system handling hundreds of customers at once, and the agent controls a headless browser running locally on your laptop. First, one customer needs it to execute an action. Then another. This goes on until hundreds of customers are each awaiting a different task.

Meanwhile, the system has two options.

  • Launch hundreds of browsers simultaneously and crash the machine.
  • Handle the customers with blocking, synchronous logic, slowing the system to a crawl.

Neither option is acceptable for customer-facing software. With proper cloud browser infrastructure, these bottlenecks don’t happen because the system scales as your usage does.
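The two failure modes above can be simulated without any real browsers. The sketch below uses asyncio.sleep as a stand-in for browser work, so the numbers are illustrative rather than real measurements: sequential handling pays the full cost of every task in a row, while concurrent handling overlaps them.

```python
import asyncio
import time

async def browser_task():
    # Stand-in for a real browser action (page load, click, extraction).
    await asyncio.sleep(0.05)

async def sequential(n: int):
    # Option 2: blocking, synchronous handling, one customer at a time.
    for _ in range(n):
        await browser_task()

async def concurrent(n: int):
    # Option 1: launch everything at once (fine with sleeps,
    # fatal with hundreds of real local browsers).
    await asyncio.gather(*[browser_task() for _ in range(n)])

start = time.perf_counter()
asyncio.run(sequential(10))
seq_elapsed = time.perf_counter() - start

start = time.perf_counter()
asyncio.run(concurrent(10))
con_elapsed = time.perf_counter() - start

print(f"sequential={seq_elapsed:.2f}s concurrent={con_elapsed:.2f}s")
```

Sequential handling takes roughly ten times as long here, which is exactly the "slow to a crawl" path; the concurrent path is fast with fake tasks but is the one that exhausts RAM when each task is a real browser.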

What modern AI agents actually do

Before we talk about browser infrastructure, we need to zoom out and take a look at the bigger picture of what AI agents actually do. Here’s the basic loop that an AI agent runs on.

  1. Interpret data — depending on the system, this comes from either a prompt or an automated data feed.
  2. Decide the proper course of action based on the given data.
  3. Create inputs to call a tool.
  4. Extract outputs from the tool.
  5. Go back to step one.
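The five steps above can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: fetch_headlines is a stub tool standing in for a real browser or API call, and the "interpretation" step is a naive keyword filter rather than a real model call.

```python
def fetch_headlines(keywords):
    # Hypothetical tool stub; a real agent would call a browser or news API.
    return [f"Top story about {kw}" for kw in keywords]

def agent_step(prompt: str):
    # Step 1: interpret the data (here, a naive all-caps keyword filter).
    keywords = [w for w in prompt.split() if w.isupper()]
    # Step 2: decide a course of action -- only call the tool if we found keywords.
    if not keywords:
        return []
    # Step 3: create tool inputs and call the tool.
    # Step 4: extract the tool output.
    return fetch_headlines(keywords)

def run_agent(prompts):
    # Step 5: go back to step one for each new input until the stream ends.
    return [agent_step(p) for p in prompts]

print(run_agent(["Get news about AI", "no keywords here"]))
```

Swapping the stub for a real tool (a browser session, a search API, a regression model) changes nothing about the loop itself, which is the point of the sections that follow.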

This loop does not end until the program shuts off. Let’s take a look at the workflow for a news agent. The workflow follows the same basic steps.

  1. Interpret data: Read the user prompt and decide what the user’s actual goal is.
  2. Determine a course of action: Break the user’s goal into steps.
  3. Create inputs to call a tool: In this case, the agent would likely extract URLs and keywords from the user’s prompt.
  4. Extract the tool output: The AI agent pipes the news headlines back to the user.
  5. Back to step one: Wait for the next user input.

Even as the workflow increases in complexity, the loop remains the same. Here’s what it looks like when using AI for trading. Even without executing trades, our agent needs to do the following just to flag different assets for us.

  1. Interpret data: Read incoming data from an API or websocket connection.
  2. Determine a course of action: Decide which assets to look into more deeply.
  3. Create inputs to call a tool: Call an external tool. In this case it’s likely a regression model.
  4. Extract the tool output: Output the projected price of certain assets.
  5. Back to step one: Interpret the most recent update to the data feed.

As complexity increases, so do hardware requirements. For lightweight cases, the news agent and the trading advisor can get by with static web pages and plain HTTP requests. When sites require browser interaction or JavaScript rendering, the AI agent needs a browser. When an agent needs to perform deep research on news stories or financial assets, a browser becomes a necessary tool.

This is true for most agentic workflows, not just the ones listed above. Humans need browser access for complex web tasks. AI agents do too.

Why local headless browsers don’t scale

The first piece of the solution is a headless browser. With a headless browser, teams can write scripts and control a real browser from their development environment. AI agents can even drive one through integration layers like LangChain.

Headless browsers provide the following benefits to developers and AI models.

  • Resource control: Using a headless browser, developers (or AI models) can turn rendering on or off. This saves on RAM and processing power.
  • Standardized access: Chrome DevTools Protocol and Selenium’s WebDriver API give AI models a standardized way to communicate with the tool and perform tasks.
  • JavaScript rendering: On many sites, content is rendered dynamically. To read that content, JavaScript needs to execute inside a browser environment, and a headless browser provides exactly that.
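As a small, hedged illustration of the resource-control point, here's a helper that builds Playwright launch options. The browser_launch_options name and the choice of the --disable-gpu flag are assumptions made for illustration; the launch call itself is shown commented out because it needs a local Playwright and Chromium install.

```python
def browser_launch_options(render: bool) -> dict:
    # headless=True skips the visible UI and rendering pipeline, which is
    # where a headless browser's RAM and CPU savings come from.
    opts = {"headless": not render}
    if not render:
        # Chromium accepts extra flags through Playwright's args list;
        # --disable-gpu trims resource use further on headless runs.
        opts["args"] = ["--disable-gpu"]
    return opts

# Usage with Playwright's sync API (not executed here):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.launch(**browser_launch_options(render=False))

print(browser_launch_options(render=False))
```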

That being said, running a headless browser on a local machine re-introduces several problems that cloud software has already solved.

  • Uptime: Imagine your headless browser is running on a home server or a Pi cluster. A spill or power outage can produce unnecessary downtime.
  • Resource limitations: Each browser instance eats RAM and processing power. There’s no preventing this. Eventually, the local machine cannot realistically support the workload.
  • Automated access difficulties: CAPTCHAs and IP blocking can derail even the best data systems.

In all of these cases, the local environment produces or magnifies the real issue. Uptime is threatened by day-to-day life. Resources are limited by the hardware inside the building: whenever you hit a bottleneck, you need to purchase new hardware and migrate your applications. Site blocks, especially CAPTCHAs, are a holdover from a different paradigm; today, AI agents are everywhere and they need reliable access to execute tasks.

What cloud browser infrastructure is

Cloud infrastructure revolutionized the web over the last two decades. In the 1990s and early 2000s, it was common practice to host infrastructure and early web apps on an on-site server. This introduced the same vulnerabilities listed above. Cloud infrastructure is designed to prevent many of the failure modes that come from local hosting.

When we run a headless browser in the cloud, it inherits the same benefits the rest of the software world gained from cloud infrastructure: reliable uptime, scalable resources and, in most cases, better access through managed solutions. Providers like Hyperbrowser and Bright Data let users manage concurrent browsing sessions remotely.

  • Uptime: Most cloud providers deliver uptime very close to 100%. A power outage doesn’t take down the system, and when outages do happen they’re typically scheduled, with teams alerted beforehand so they can prepare.
  • Resources: Most modern cloud browsers are capable of horizontal scaling. Many companies will offer 20, 30 or even 100 concurrent sessions. Bright Data even offers unlimited concurrency for their Scraper API.
  • Automated access: Cloud browser providers almost all run on top of proxy networks by default. This reduces roadblocks and helps teams zero in on geo-sensitive data. In many cases, providers also offer CAPTCHA solving or CAPTCHA avoidance.

Cloud browsers provide AI agents with stable access, scalable hardware and reliable access using proxies and web unlocking.

A small test of both architectures

Now, let’s make a simple AI agent to demonstrate what we’ve been talking about. We’re not going to use a full agentic loop because our agent here only needs to accomplish a single task: opening many browsers at once. First, we’ll do it using a local installation of Playwright. Afterward, we’ll use cloud browser infrastructure.

The initial setup

We’ll begin by installing dependencies.

pip install openai playwright
playwright install chromium

In some cases, particularly on Windows, teams may need to use the command below to properly install Chromium.

python -m playwright install chromium

Now, we’ll begin our basic setup. We use asyncio to support asynchronous operations. The openai and playwright packages give us access to the APIs we’ll need. Our prompt is simple. We really just want the AI agent to operate browser instances at scale so we can see how the system behaves.

A couple of notes before you run the code. Remember to swap the placeholder credentials for the WebSocket connection and for OpenAI with your own. Also, there are two BRIGHTDATA_WSS assignments and one is commented out; this is intentional, so you can toggle between local and remote browsers.

import json
import time
import asyncio
from openai import OpenAI
from playwright.async_api import async_playwright

MODEL = "gpt-4.1-mini"
BRIGHTDATA_WSS = "wss://brd-customer-<your-username>-<your-zone-name>:<your-password>@brd.superproxy.io:9222"
#BRIGHTDATA_WSS = None
client = OpenAI(api_key="<your-openai-api-key>")

PROMPT = """
Open 10 browser instances to confirm the title of example.com.

Return JSON only with:
- count (int)
- url (string)

Example:
{"count":10,"url":"https://example.com"}
""".strip()

The programming logic

Here’s where the actual work gets done. agent_plan() lets our agent read and interpret the prompt. Our open_one() function simply opens a browser and extracts the title from the page. Inside main(), the agent interprets the prompt and decides how many instances need to be opened. The agent outputs the number and we run open_one() that many times.

For example, in our setup code, we ask the agent to open 10 instances. The agent then inputs 10 as our count. The program then reads the count and opens that many browser instances simultaneously.


def agent_plan():
    r = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT}],
    )
    return json.loads(r.choices[0].message.content)

async def open_one(i: int, url: str):
    async with async_playwright() as p:
        if BRIGHTDATA_WSS:
            browser = await p.chromium.connect_over_cdp(BRIGHTDATA_WSS)
            via = "brightdata"
        else:
            browser = await p.chromium.launch(headless=True)
            via = "local"

        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        title = await page.title()
        await context.close()
        await browser.close()
        
        return title, via

async def main():
    plan = agent_plan()
    count = plan["count"]
    url = plan["url"]

    start = time.perf_counter()

    results = await asyncio.gather(
        *[open_one(i, url) for i in range(count)]
    )

    elapsed = time.perf_counter() - start

    titles = {t for t, _ in results}
    via = results[0][1]

    print(f"mode={via}")
    print(f"instances={count}")
    print(f"unique_titles={titles}")
    print(f"elapsed_seconds={elapsed:.2f}")

Full code

import json
import time
import asyncio
from openai import OpenAI
from playwright.async_api import async_playwright

MODEL = "gpt-4.1-mini"
BRIGHTDATA_WSS = "wss://brd-customer-<your-username>-<your-zone-name>:<your-password>@brd.superproxy.io:9222"
#BRIGHTDATA_WSS = None
client = OpenAI(api_key="<your-openai-api-key>")

PROMPT = """
Open 10 browser instances to confirm the title of example.com.

Return JSON only with:
- count (int)
- url (string)

Example:
{"count":10,"url":"https://example.com"}
""".strip()

def agent_plan():
    r = client.chat.completions.create(
        model=MODEL,
        temperature=0,
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": PROMPT}],
    )
    return json.loads(r.choices[0].message.content)

async def open_one(i: int, url: str):
    async with async_playwright() as p:
        if BRIGHTDATA_WSS:
            browser = await p.chromium.connect_over_cdp(BRIGHTDATA_WSS)
            via = "brightdata"
        else:
            browser = await p.chromium.launch(headless=True)
            via = "local"

        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(url, wait_until="domcontentloaded")
        title = await page.title()
        await context.close()
        await browser.close()
        
        return title, via

async def main():
    plan = agent_plan()
    count = plan["count"]
    url = plan["url"]

    start = time.perf_counter()

    results = await asyncio.gather(
        *[open_one(i, url) for i in range(count)]
    )

    elapsed = time.perf_counter() - start

    titles = {t for t, _ in results}
    via = results[0][1]

    print(f"mode={via}")
    print(f"instances={count}")
    print(f"unique_titles={titles}")
    print(f"elapsed_seconds={elapsed:.2f}")

if __name__ == "__main__":
    asyncio.run(main())

10 browsers

First, we’ll try this script using the 10 browsers we initially wrote into the prompt. Below are screenshots of both outputs. For full transparency, these tests were all run on an HP Omnibook X 14 with a 10-core Snapdragon X Elite processor and 16GB of RAM.

10 local browsers

In this case, we ran title extraction using 10 instances in 6.09 seconds. This is pretty efficient.

10 cloud browsers

Our cloud browsers performed similarly. In total it took 6.97 seconds for the operation to complete.

100 browsers

With 100 browsers, we’re really beginning to scale the load. This is where the faults of local hardware begin to show.

As mentioned, to change the browser count, we just change the prompt.

PROMPT = """
Open 100 browser instances to confirm the title of example.com.

Return JSON only with:
- count (int)
- url (string)

Example:
{"count":100,"url":"https://example.com"}
""".strip()

100 local browsers

This time, it took 32.21 seconds to run 100 title extractions. The machine didn’t seem to struggle much but the time taken was noticeable.

100 cloud browsers

In contrast, the cloud browser infrastructure handled this with no issues at all. In total, it took only 10.47 seconds, roughly three times as fast as the local run.

200 browsers

With 200 title extractions, the difference is undeniable. This time, the local browsers couldn’t even finish. Within minutes, the fans were screaming and the cursor stopped responding. After multiple failed attempts and one full hardware crash, we gave up: at one point, we waited over 9,000 seconds (2.5 hours) and the operation still could not finish.

200 cloud browsers

Using 200 cloud browsers, the operation finished in 36.55 seconds. The cloud browser infrastructure didn’t break a sweat as we scaled this operation.

Realistic outcomes when AI agents use cloud browsers

When AI agents are powered by cloud browser infrastructure, the biggest change isn’t raw performance. You’re replacing a fragile environment with one built for production-grade runtimes.

Concurrency

This is the most immediate outcome. AI agents no longer need to manage multiple browser sessions on a strained machine. The system assisting hundreds of customers no longer needs to choose between crashing the program or slowing to a crawl. Shared browser state is no longer a bottleneck. Each request or prompt from your users can be handled in a separate hardware environment independent of other tasks.
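To make the contrast concrete, here's a hedged sketch of the throttling a local machine forces on you: a semaphore caps how many "browser" tasks run at once, with asyncio.sleep standing in for real browser work. MAX_LOCAL_SESSIONS is an assumed hardware limit, not a measurement; cloud infrastructure's job is to make this cap unnecessary.

```python
import asyncio

MAX_LOCAL_SESSIONS = 5  # assumed limit a single local machine can bear

async def serve(n: int, cap: int):
    # The semaphore is created inside the running event loop.
    sem = asyncio.Semaphore(cap)

    async def handle_request(i: int) -> int:
        # At most `cap` requests hold a "browser" at any moment;
        # the rest queue up behind the semaphore.
        async with sem:
            await asyncio.sleep(0.01)  # stand-in for real browser work
            return i

    # 20 customer requests, throttled to `cap` concurrent sessions.
    return await asyncio.gather(*[handle_request(i) for i in range(n)])

results = asyncio.run(serve(20, MAX_LOCAL_SESSIONS))
print(results)
```

Every request still completes, but only `cap` at a time make progress; with cloud sessions, each request can instead run in its own remote environment and the cap disappears.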

Recoverable failure

When a browser and agent run locally on the same machine, a crash in one can take down the other, and corrupted state often means resetting the entire system. When the AI model and its tools run as separate cloud services, the tool doesn’t take down the agent and the agent doesn’t take down the tool.

Monitoring

Both cloud browser providers and AI model providers give you access to dashboards, monitoring and often alerts, even at lower usage tiers. With effective monitoring, you can track usage and predict scaling costs with real-world data. When outages are expected, you get advance warning so you can prepare. Rather than scrambling when something breaks, you can treat it as an operational decision: integrate a backup provider or inform users of scheduled downtime.

Infrastructure shapes intelligence

AI agents don’t run into scaling issues because of model performance. If an environment wasn’t designed to handle a production workload, it will naturally struggle when it meets production-level demand. This is simple cause and effect.

Think back to the AI agent handling hundreds of customers from a browser hosted on a laptop. As soon as the application succeeds, it demands an architecture change. Starting small isn’t inherently wrong, but it guarantees real bottlenecks the moment demand hits.

This is architectural reality. Teams can keep patching a local setup that was never designed for scale or they can deploy AI agents using tools and infrastructure that absorb and sustain growth.

Cloud browsers provide AI agents with stable environments built for production software.

Written by

Jake Nulty

Software Developer & Writer at Independent

Jacob is a software developer and technical writer with a focus on web data infrastructure, systems design and ethical computing.
