AI agents are becoming central to how we automate research, analyze data and interact with the web. Yet most developers struggle to build AI agents that do this successfully, especially when faced with the defenses websites deploy against automated traffic: CAPTCHAs, rate limits, JavaScript challenges and IP-based blocking.
These challenges exist because AI agents do not naturally produce the subtle signals that human users generate. They often rely on static IPs or predictable headers. Furthermore, modern web architectures depend heavily on dynamic JavaScript rendering and complex navigation flows that simple HTTP libraries cannot execute.
When building our own agent infrastructure, we evaluated several approaches, from managing our own pool of rotating proxies to using headless browser clusters. We found that stitching together disparate tools for proxies, CAPTCHA solving and browser automation created a high operational burden.
To demonstrate this hybrid architecture, we evaluated several providers. We decided to use Bright Data for this implementation because their platform offers both an Unlocker API and a Browser API that can be integrated into a unified Model Context Protocol (MCP) server. We found that we could demonstrate the concept of dynamic tool switching without the overhead of integrating multiple providers.
Below is a guide on how to implement this architecture to keep your agent unblocked.
The two-tool architecture: Retrieval vs. Interaction
When architecting an AI agent, we identified two distinct types of web access requirements:
- Fast retrieval: Fetching static or lightly dynamic content (e.g., news feeds, product details).
- Complex interaction: Navigating multi-step flows (e.g., clicking through modals, handling infinite scrolls or pagination).
Attempting to use a single tool for both is inefficient. Full browsers are too expensive for simple text, and simple proxies fail at complex interactions. To solve this, we utilized two specific APIs that cover this spectrum.
Unlocker API: Fast, reliable content retrieval
For situations where our agent simply needed clean HTML or JSON from a public URL, we utilized the Unlocker API.
We selected this tool for the retrieval layer because it abstracts away the backend complexity. Instead of writing custom logic to rotate User-Agents or manage headers, the API handles IP rotation, cookie control and JavaScript rendering on the server side. It returns the final HTML directly.
Browser API: Complete interaction with dynamic websites
For workflows requiring actual user simulation, we switched to the Browser API. This tool provides a remote browser session compatible with libraries like Selenium and Playwright.
We found this necessary for sites built on frameworks like React or Vue, where the content only exists after client-side code executes. It allows the agent to perform “human” actions such as clicking buttons, scrolling to trigger lazy loading or selecting options in a UI, while maintaining a consistent fingerprint.
Decision framework: Cost vs. Capability
A major factor in our tool selection was the cost structure. Browser sessions are computationally expensive and cost significantly more than simple API requests.
Our rule of thumb:
- Default to the Unlocker API for low-cost, high-speed data extraction.
- Escalate to the Browser API only when the target page requires complex interaction or fails to render via standard requests.
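The escalation decision can be reduced to a small heuristic. The sketch below is an illustration only: the size threshold, the block-page markers and the `fetch_with_browser` fallback named in the usage comment are our own assumptions, not part of any provider API.

```python
from typing import Optional

# Phrases that suggest the cheap response is a block page or an unrendered shell.
# These markers and the size threshold are heuristics, not provider guarantees.
BLOCK_MARKERS = ["verify you are human", "access denied", "enable javascript"]

def needs_browser(html: Optional[str], min_bytes: int = 1000) -> bool:
    """Return True when a response from the cheap retrieval tier looks blocked
    or unrendered, meaning the agent should escalate to the Browser API."""
    if not html or len(html) < min_bytes:
        return True
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)

# Usage sketch:
#   html = fetch_product_page(url)        # the Unlocker API call
#   if needs_browser(html):
#       html = fetch_with_browser(url)    # hypothetical Browser API fallback
```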
MCP Server: Unified web access for AI agents
While having specialized tools is effective, managing the logic to switch between them manually can clutter the agent’s codebase. We explored various ways to orchestrate this and decided to implement the Model Context Protocol (MCP).
Bright Data provides a Web MCP Server that acts as a unified interface. Rather than hard-coding if/else logic to choose an API, the MCP server allows the AI agent to express intent via natural language.
We found that the MCP server successfully abstracted the complexity of tool selection:
- Search intent: If the agent asked to “find competitors,” the request was routed to a search-specific API.
- Retrieval intent: If the agent needed to “read text from a URL,” it routed to the Unlocker API.
- Action intent: If the agent needed to “click and navigate,” it spun up a Browser session.
This architectural choice decoupled our agent’s reasoning capabilities from the underlying scraping infrastructure, making the code more maintainable.
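To make the decoupling concrete, here is a toy version of that routing decision. The real MCP server selects tools from the model's expressed intent, not keyword matching, so treat the keywords and tool names below as illustrative assumptions only.

```python
def route_intent(task: str) -> str:
    """Toy router: map a natural-language task to the cheapest capable tool.
    Tool names are illustrative; the MCP server does this selection for you."""
    lowered = task.lower()
    action_words = ("click", "scroll", "navigate", "select", "type", "log in")
    search_words = ("search", "find", "look up", "discover")
    if any(word in lowered for word in action_words):
        return "browser_api"   # action intent: needs a real browser session
    if any(word in lowered for word in search_words):
        return "search_api"    # search intent: search-specific endpoint
    return "unlocker_api"      # default: cheap retrieval of a known URL
```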
Implementation tutorial: Building the Agent
In this section, we will build a script capable of switching between simple data extraction and full browser automation. While we are using Bright Data for this implementation, similar patterns can be applied using other managed browser services such as Browserbase, ZenRows or ScrapFly.
Prerequisites
Before starting, ensure you have Python 3.8+ installed. You will need to install the following libraries:
```shell
pip install requests selenium beautifulsoup4
```
Using the Unlocker API
We used the Unlocker API to “read” web pages that blocked standard requests. To run this code, we first set up a “Web Unlocker” zone in the provider’s dashboard to generate our API token.
Setup: Getting Your Credentials
- Create a Zone: Log in to the Bright Data Dashboard, go to “My Zones” and select “Web Unlocker.” Name your zone (e.g., web_unlocker2) and save.

- Get Your API Token:
- Go to Settings > Account Settings > Users and API keys.
- Click “Add Token,” and copy the generated string.
In this example, the script sends a request to the API endpoint. The Unlocker API visits the target URL (in this case, a public Amazon product page), handles the anti-bot checks and returns the clean HTML.
```python
import requests
from bs4 import BeautifulSoup


def fetch_product_page(product_url):
    """Fetch the product page HTML using the Bright Data Unlocker API."""
    api_endpoint = "https://api.brightdata.com/request"
    payload = {
        "zone": "web_unlocker2",
        "url": product_url,
        "format": "raw",
        "method": "GET"
    }
    headers = {
        "Authorization": "Bearer YOUR_GENERATED_API_TOKEN",
        "Content-Type": "application/json"
    }
    try:
        print(f"Agent is attempting to access: {product_url}")
        response = requests.post(api_endpoint, json=payload, headers=headers)
        if response.status_code == 200:
            html_content = response.text
            if len(html_content) > 1000:
                print(f"Success! Retrieved {len(html_content)} bytes of data.")
                return html_content
            else:
                print("Warning: Retrieved empty or small response.")
                return None
        else:
            print(f"Failed with status code: {response.status_code}")
            print(f"Error message: {response.text}")
            return None
    except Exception as e:
        print(f"Error occurred: {e}")
        return None


def extract_product_data(html_content):
    """Extract price, title and availability from Amazon product page HTML."""
    soup = BeautifulSoup(html_content, 'html.parser')
    product_data = {
        'title': None,
        'price': None,
        'availability': None
    }
    # Extract product title
    title_element = soup.find('span', {'id': 'productTitle'})
    if title_element:
        product_data['title'] = title_element.get_text().strip()
    # Extract price: try multiple methods as Amazon has various price formats
    # Method 1: look for the priceToPay class
    price_element = soup.find('span', {'class': 'priceToPay'})
    if price_element:
        price_whole = price_element.find('span', {'class': 'a-price-whole'})
        price_fraction = price_element.find('span', {'class': 'a-price-fraction'})
        if price_whole and price_fraction:
            # Remove the decimal point from the whole number and combine
            whole = price_whole.get_text().replace('.', '').strip()
            fraction = price_fraction.get_text().strip()
            product_data['price'] = f"${whole}.{fraction}"
    # Method 2: alternative price location
    if not product_data['price']:
        price_element = soup.find('span', {'class': 'a-offscreen'})
        if price_element:
            product_data['price'] = price_element.get_text().strip()
    # Extract availability
    availability_element = soup.find('div', {'id': 'availability'})
    if availability_element:
        availability_span = availability_element.find('span')
        if availability_span:
            product_data['availability'] = availability_span.get_text().strip()
    return product_data


def print_product_info(product_data):
    """Print the extracted product information in a readable format."""
    print("\n" + "=" * 50)
    print("PRODUCT INFORMATION")
    print("=" * 50)
    print(f"Title: {product_data['title'] or 'Not found'}")
    print(f"Price: {product_data['price'] or 'Not found'}")
    print(f"Availability: {product_data['availability'] or 'Not found'}")
    print("=" * 50 + "\n")


# Main execution
if __name__ == "__main__":
    # Fetch the product page
    url = "https://www.amazon.com/dp/B08FC5L3RG"
    product_html = fetch_product_page(url)
    if product_html:
        # Extract product data
        product_data = extract_product_data(product_html)
        # Display the results
        print_product_info(product_data)
        # You can also access individual fields:
        # print(f"Just the price: {product_data['price']}")
    else:
        print("Failed to retrieve product page.")
```
When you run this script, it logs the retrieval status and then prints the extracted product title, price and availability to the console.
Using the Browser API
For tasks requiring interaction, such as filtering search results or handling pagination, we connected to the Browser API.
However, first you need to create a Browser API zone in the Bright Data dashboard.
Setup: getting your browser credentials
- Create a Zone: In the Bright Data Dashboard, select “Scraping Browser” under “My Zones.”

- Assign a name to your Browser API
- Click Add

- Click on your new zone.
- Look for the “Access Details” section:
- Note your Username (e.g., brd-customer-hl_xxxxx-zone-scraping_browser) and Password.
- Your host is typically brd.superproxy.io.

In this example, we use Selenium to connect to the Browser API. The script searches for a product (for example, "PS5"), waits for the dynamic results to load and clicks the first item.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
import time

# Bright Data Browser API endpoint
# REPLACE WITH YOUR CREDENTIALS
BROWSER_URL = "https://<username>:<password>@brd.superproxy.io:9515"
AMAZON_URL = "https://www.amazon.com/"


def setup_driver():
    print("Connecting to browser...")
    options = webdriver.ChromeOptions()
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--window-size=1920,1080")
    options.add_argument("--lang=en-US,en;q=0.9")
    options.add_argument(
        "--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
    )
    driver = webdriver.Remote(
        command_executor=BROWSER_URL,
        options=options
    )
    print("Connected!")
    return driver


def search_amazon(driver, search_query):
    print(f"Searching for {search_query}...")
    search_url = f"https://www.amazon.com/s?k={search_query.replace(' ', '+')}"
    driver.get(search_url)
    try:
        WebDriverWait(driver, 20).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "[data-component-type='s-search-result']")
            )
        )
        print("Search results loaded!")
        return True
    except Exception:
        html = driver.page_source.lower()
        if "robot check" in html or "captcha" in html:
            print("Amazon blocked the request: CAPTCHA detected.")
        else:
            print("Search results not detected: unexpected page layout.")
        return False


def click_first_product(driver):
    print("Looking for first product...")
    try:
        # Find all search result blocks
        products = driver.find_elements(
            By.CSS_SELECTOR,
            "div[data-component-type='s-search-result'][data-asin]"
        )
        if not products:
            print("No valid product blocks found.")
            return False
        # Amazon uses many different link structures, so try them all
        title_selectors = [
            "[data-cy='title-recipe'] a",
            "h2 a",
            "h2 span a",
            "a.a-link-normal.a-text-normal",
            "a.a-link-normal.s-no-outline",
            "a[href*='/dp/']",  # fallback DP link
        ]
        for index, product in enumerate(products):
            print(f"Checking product block {index}...")
            for selector in title_selectors:
                try:
                    link = product.find_element(By.CSS_SELECTOR, selector)
                    # Skip sponsored/empty links
                    href = link.get_attribute("href")
                    if not href or "/dp/" not in href:
                        continue
                    title = link.text.strip()
                    print(f"Found product link using selector: {selector}")
                    print(f"Title: {title}")
                    print("Clicking...")
                    driver.execute_script("arguments[0].click();", link)
                    WebDriverWait(driver, 15).until(
                        EC.presence_of_element_located((By.ID, "productTitle"))
                    )
                    print("Product page loaded!")
                    return True
                except Exception:
                    continue
        print("No clickable product link found in any result block.")
        return False
    except Exception as e:
        print(f"Error clicking product: {e}")
        return False


def extract_product_data(driver):
    """Extract product data from the current page."""
    print("Extracting product data...")
    # Get the page source and parse it with BeautifulSoup
    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')
    product_data = {
        'title': None,
        'price': None,
        'availability': None
    }
    # Extract product title
    try:
        title_element = driver.find_element(By.ID, "productTitle")
        product_data['title'] = title_element.text.strip()
    except Exception:
        title_element = soup.find('span', {'id': 'productTitle'})
        if title_element:
            product_data['title'] = title_element.get_text().strip()
    # Extract price
    try:
        # Try finding the price with Selenium first
        price_element = driver.find_element(By.CSS_SELECTOR, ".priceToPay")
        product_data['price'] = price_element.text.strip()
    except Exception:
        # Fall back to BeautifulSoup
        price_element = soup.find('span', {'class': 'priceToPay'})
        if price_element:
            price_whole = price_element.find('span', {'class': 'a-price-whole'})
            price_fraction = price_element.find('span', {'class': 'a-price-fraction'})
            if price_whole and price_fraction:
                whole = price_whole.get_text().replace('.', '').strip()
                fraction = price_fraction.get_text().strip()
                product_data['price'] = f"${whole}.{fraction}"
    # Alternative price location
    if not product_data['price']:
        price_element = soup.find('span', {'class': 'a-offscreen'})
        if price_element:
            product_data['price'] = price_element.get_text().strip()
    # Extract availability
    try:
        availability_element = driver.find_element(By.ID, "availability")
        product_data['availability'] = availability_element.text.strip()
    except Exception:
        availability_element = soup.find('div', {'id': 'availability'})
        if availability_element:
            availability_span = availability_element.find('span')
            if availability_span:
                product_data['availability'] = availability_span.get_text().strip()
    return product_data


def print_product_info(product_data):
    """Print the extracted product information."""
    print("\n" + "=" * 60)
    print("PRODUCT INFORMATION")
    print("=" * 60)
    print(f"Title: {product_data['title'] or 'Not found'}")
    print(f"Price: {product_data['price'] or 'Not found'}")
    print(f"Availability: {product_data['availability'] or 'Not found'}")
    print("=" * 60 + "\n")


def main():
    """Main function to run the scraper."""
    driver = None
    try:
        # Set your search query here
        search_query = "PS5"  # Change this to "laptop" or any other product
        # Set up the browser connection
        driver = setup_driver()
        # Search for the product
        if not search_amazon(driver, search_query):
            print("Failed to search Amazon")
            return
        # Small delay to ensure the page is fully loaded
        time.sleep(2)
        # Click the first product
        if not click_first_product(driver):
            print("Failed to click on first product")
            return
        # Small delay for the product page to load
        time.sleep(2)
        # Extract product data
        product_data = extract_product_data(driver)
        # Display the results
        print_product_info(product_data)
        # Return data for further processing if needed
        return product_data
    except Exception as e:
        print(f"An error occurred: {e}")
    finally:
        if driver:
            print("Closing browser...")
            driver.quit()
            print("Browser closed!")


if __name__ == "__main__":
    main()
```
Note: Replace the BROWSER_URL variable with your specific credentials formatted as: https://<username>:<password>@brd.superproxy.io:9515
When you run this script, it logs each stage (connection, search, click) and then prints the extracted product information.
Using the MCP server
To unify these tools, we configured the MCP server in Claude Desktop. This allowed us to bypass writing the routing logic manually.
Here are the steps:
- Download and install Claude Desktop
- Get your API token:
- Go to Bright Data user settings.
- Copy your API token
- Configure your MCP server
- Open Claude Desktop
- Go to: Settings → Developer → Edit Config
- Add this to your claude_desktop_config.json:
```json
{
  "mcpServers": {
    "Bright Data": {
      "command": "npx",
      "args": ["@brightdata/mcp"],
      "env": {
        "API_TOKEN": "<replace_with_your_api_token>",
        "WEB_UNLOCKER_ZONE": "<replace_with_your_web_unlocker_zone>",
        "BROWSER_ZONE": "<replace_with_your_browser_zone>"
      }
    }
  }
}
```
- Next, save the file and restart Claude Desktop.
Once configured, we could prompt the agent with natural language. Ask Claude:
Get the product title, price, and availability from this Amazon page:
Claude will attempt to get the result and will fail; it will then ask for your permission to use the agent you have configured. When you permit it, the agent will retrieve the information you requested from Amazon. The agent automatically determines when to use the browser for searching and when to use the API for extraction.

You can also try complex workflows.
Advanced Patterns: Optimizing performance
Through our testing, we identified several architectural patterns that improve reliability and reduce costs.
Pattern 1: Navigation and handoff (Cookie priming)
Complex sites often require setting specific session states, such as selecting a “delivery location” or dismissing a “first-time user” modal, before the correct data is displayed.
A cost-effective strategy is to use the Browser API only for the setup phase.
- Browser API: Navigates to the site, handles the modal/location selection and solves any initial CAPTCHAs.
- Extraction: The agent extracts the session cookies (session_id, preferences).
- Handoff: The agent passes these cookies to the cheaper Unlocker API to scrape the actual data pages.
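A minimal sketch of the handoff step: the helper below flattens Selenium's cookie dicts into a single Cookie header value. Whether the Unlocker API accepts forwarded headers in the request payload is an assumption on our part; check your provider's documentation before relying on it.

```python
def cookies_to_header(selenium_cookies):
    """Flatten driver.get_cookies() output into a Cookie header value."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)

# Handoff sketch (payload shape follows the Unlocker example earlier;
# the "headers" field is our assumption, not a documented parameter):
# primed = driver.get_cookies()                 # after the Browser API setup phase
# payload = {
#     "zone": "web_unlocker2",
#     "url": data_page_url,
#     "format": "raw",
#     "method": "GET",
#     "headers": {"Cookie": cookies_to_header(primed)},
# }
```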
Pattern 2: Soft-block detection
We found that a “200 OK” status code can be misleading. Sophisticated sites often return a successful HTTP status while serving a “Please verify you are human” page.
To prevent data poisoning, we implemented a validation layer:
```python
def validate_response(html_content):
    suspicious_phrases = ["verify you are human", "access denied", "please wait"]
    if any(phrase in html_content.lower() for phrase in suspicious_phrases):
        return False
    return True
```
If validation fails, our logic triggers a retry with a new IP or escalates to the Browser API.
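A wrapper like the following puts the validator to work. It is a sketch: the retry count, the backoff schedule and the tier labels are arbitrary choices, and `fetch_fn` stands in for whatever cheap fetch function your agent uses.

```python
import time

def validate_response(html_content):
    # Same soft-block check as the validator above
    suspicious_phrases = ["verify you are human", "access denied", "please wait"]
    return not any(p in html_content.lower() for p in suspicious_phrases)

def fetch_with_validation(fetch_fn, url, max_retries=2):
    """Call a cheap fetch function, retrying on soft blocks; return the HTML
    and which tier handled it, or signal escalation to the Browser API."""
    for attempt in range(max_retries):
        html = fetch_fn(url)                 # e.g. an Unlocker API call
        if html and validate_response(html):
            return html, "unlocker"
        time.sleep(2 ** attempt)             # brief backoff; a retry gets a new IP
    return None, "escalate_to_browser"
```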
Pattern 3: The “Diet” Browser
When using the Browser API, we reduced bandwidth costs by intercepting and aborting requests for non-essential resources like high-res images, video ads and custom fonts. This reduced data usage by about 60% in our tests.
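Here is a sketch of the idea using Playwright, since the Browser API is advertised as Playwright-compatible. The blocking decision is a pure function; the browser wiring is shown as comments because the connection endpoint format is an assumption you should verify against your provider's docs.

```python
# Resource types we consider non-essential for text extraction.
BLOCKED_TYPES = {"image", "media", "font"}

def should_block(resource_type: str) -> bool:
    """Decide whether a request should be aborted to save bandwidth."""
    return resource_type in BLOCKED_TYPES

# Wiring sketch with Playwright (endpoint format is an assumption):
# from playwright.sync_api import sync_playwright
# with sync_playwright() as p:
#     browser = p.chromium.connect_over_cdp("wss://<user>:<pass>@brd.superproxy.io:9222")
#     page = browser.new_page()
#     page.route("**/*", lambda route: route.abort()
#                if should_block(route.request.resource_type)
#                else route.continue_())
#     page.goto("https://www.amazon.com/")
```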
Real-world use case: Complex travel aggregation
This architecture can be tested in a real-world scenario, such as a travel analytics tool, where you would need to aggregate flight prices from a travel aggregator. Those travel aggregator sites require complex interactions: selecting “Multi-city” from a dropdown, choosing dates via a calendar widget and waiting for a dynamic list of results that loads via infinite scroll.
The Solution:
- Infrastructure: Deployed the Browser API to handle the UI interactions (calendar/dropdowns).
- Workflow: The agent connected to the remote browser, executed the search parameters and scrolled to load results.
- Result: The browser session successfully mimicked a human user’s journey, generating the necessary cookies and fingerprints to keep the session valid, which simple requests failed to do.
Alternative solutions and tool selection
While we chose Bright Data for this tutorial due to its unified MCP support, it is important to consider the other tools when selecting your stack.
- Open source frameworks (Selenium/Playwright): They are free, highly customizable, and have a large community. However, they require you to build and manage your own proxy infrastructure and unblocking logic. Scaling this locally often leads to immediate IP bans.
- Managed browser clouds (e.g., Browserbase): They are excellent for “browser-first” workflows with strong debugging tools. However, they often lack a dedicated, lightweight “Unlocker” equivalent for simple requests, which can make high-volume data extraction more expensive.
- Unblocking APIs (e.g., ZenRows): They offer strong capabilities for retrieving static HTML without blocks. However, they may offer less granular control over complex browser orchestration compared to dedicated browser grids.
Conclusion
Building an unblocked AI agent is more about adopting the right architecture than about the AI itself. Through our implementation, we learned that distinguishing between simple extraction and complex interaction is important for building scalable AI agents.
By using a tiered approach, starting with lightweight APIs and escalating to full browsers only when necessary, developers can build agents that are resilient to modern web defenses without blowing their budget.