AI for e-commerce: Leveraging web data for competitive analysis and product intelligence

Explore how you can use an AI agent with live web data to perform product analysis for e-commerce

Why web data matters for AI e-commerce

The modern e-commerce industry is reactive. Supply shifts, competitor pricing and environmental factors all feed into algorithmic pricing. Retailers like Amazon adjust prices based on location, supply and competitor activity, sometimes as often as every 15 minutes.

By combining an AI agent with a data pipeline, we can analyze product data in real time. If an agent can reference historical data, it can quickly identify trends. This architecture lets your company react immediately to ever-changing market conditions.

In this article, we’ll explore how you can use an AI agent with live web data to perform product analysis. We’ll even build a small agent that identifies loss leaders, products sold at a loss to get customers in the door, for competitive analysis.

Why would I want to know which products are loss leaders?

Loss leaders are risky for the sellers running them, and that risk creates opportunities for you. You can buy out a competitor’s discounted stock, or raise your own prices to meet the demand surge once they run out of product.

Data sources and extraction strategies (APIs, browser, scraping and marketplaces)

Data comes in all shapes and sizes. You might find fair market pricing from an API feed, or you might calculate it yourself from raw listings, which is often the better decision. To collect those listings, you might use a headless browser like Playwright or Selenium, or static parsing with BeautifulSoup.

  • API Feeds: Many providers will sell data through an Application Programming Interface (API). Your pipeline receives structured JSON data — often clean and ready for use.
  • Browser-Based Scraping: We mentioned headless browsers above. There are also point-and-click tools and even fully AI-controlled browsers like Firecrawl.
  • Scraping: Traditional scrapers use a static parser to extract data from a web page. You can get your data without the resource overhead or high cost of the other options.
  • Marketplaces: Not everyone’s a developer and not everyone needs to know how to scrape. Providers like Bright Data offer real-time data and historical datasets ready for use out of the box.
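As a quick illustration of the API-feed option, here’s how a pipeline might consume a structured JSON payload. The schema (`products`, `asin`, `price`) is a made-up example for this sketch, not any specific vendor’s format:

```python
import json

# hypothetical JSON payload from a pricing API feed (field names are assumptions)
payload = """
{
  "products": [
    {"asin": "B0EXAMPLE1", "title": "15.6 Inch Laptop", "price": 209.99, "currency": "USD"},
    {"asin": "B0EXAMPLE2", "title": "HP 14 Laptop", "price": 173.40, "currency": "USD"}
  ]
}
"""

# structured feeds arrive clean: one json.loads and the data is ready for use
data = json.loads(payload)
for product in data["products"]:
    print(f'{product["asin"]}: {product["title"]} @ ${product["price"]:.2f}')
```

Compare this with scraping, where the same two prices would have to be located and parsed out of raw HTML first.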

Setting up the workflow: tools and key steps

“90% of the game is half mental.”
— Yogi Berra

Before you build an AI agent for competitive analysis, you need to understand the workflow. Without proper planning, every project is doomed to fail.

1. Define your objective

What does the agent need to do? When the workflow finishes running, what happens?

Basic e-commerce scraping workflow

Our workflow in this case is pretty simple. We need a function that retrieves Amazon listings, and our AI agent needs access to that function as a tool. The agent then generates a report, and after reading it, we can adjust the pricing at our hypothetical store.

2. Access the data

You need the right tools to meet your objective. Does this require a browser? An API feed? Traditional scraping? Do you plan on obtaining datasets?

In our project, we need the following tools.

  • Python Requests
  • BeautifulSoup
  • LangGraph
  • LangChain’s OpenAI extension
  • A proxy provider or web unblocker

First, install our dependencies.

pip install requests beautifulsoup4 langchain-openai langgraph

To access our data, we’ll write a small function. Note that `verify=False` disables TLS certificate verification; many unblocker-style proxies require it, but only use it with a provider you trust.

import requests
from bs4 import BeautifulSoup

proxy = "http://<your-proxy-username>:<your-password>@<your-proxy-domain>:<your-port-number>"
proxies = {
    "http": proxy,
    "https": proxy
}

def get_amazon_page_text(url: str) -> str:
    """Fetch visible text from an Amazon search or product page."""
    try:
        # verify=False is required by many unblocker proxies
        res = requests.get(url, proxies=proxies, verify=False, timeout=15)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, "html.parser")
        # strip non-visible elements before extracting text
        for tag in soup(["script", "style", "noscript"]):
            tag.decompose()
        return soup.get_text(separator="\n", strip=True)
    except requests.RequestException as e:
        return f"Error fetching page: {str(e)}"

If you need a proxy provider, there are plenty of vendors offering residential and datacenter proxies.
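If you want to sanity-check the text-extraction logic before spending proxy traffic, the same BeautifulSoup cleanup can be run offline against a static HTML snippet:

```python
from bs4 import BeautifulSoup

# a static stand-in for a fetched page: scripts and styles are noise,
# the heading and paragraph are the visible text we want
html = """
<html><head><style>body { color: red; }</style></head>
<body><script>console.log("noise");</script>
<h1>Laptop Deals</h1><p>HP 14 Laptop - $173.40</p></body></html>
"""

soup = BeautifulSoup(html, "html.parser")
# remove the same non-visible tags our fetch function strips
for tag in soup(["script", "style", "noscript"]):
    tag.decompose()
text = soup.get_text(separator="\n", strip=True)
print(text)
```

The output keeps only the human-visible text, which is exactly what we want to hand to the language model.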

3. Automate the process

Now, we need to actually automate this process. If we wanted to change our pricing every 15 minutes, we’d wrap the logic in a `while True` loop with a sleep between iterations. Since this is just a demonstration, we’ll write a simple top-down script.

import os
import requests
from bs4 import BeautifulSoup
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode, tools_condition
from langgraph.graph.message import MessagesState

os.environ["OPENAI_API_KEY"] = "your-openai-api-key"

proxy = "http://<your-proxy-username>:<your-password>@<your-proxy-domain>:<your-port-number>"
proxies = {
    "http": proxy,
    "https": proxy
}

# wrap our function as a tool
@tool
def get_amazon_page_text(url: str) -> str:
    """Fetch visible text from an Amazon search or product page."""
    try:
        res = requests.get(url, proxies=proxies, verify=False, timeout=15)
        res.raise_for_status()
        soup = BeautifulSoup(res.text, "html.parser")
        for tag in soup(["script", "style", "noscript"]):
            tag.decompose()
        return soup.get_text(separator="\n", strip=True)
    except requests.RequestException as e:
        return f"Error fetching page: {str(e)}"

# create the agent and bind it to the tools
tools = [get_amazon_page_text]
llm = ChatOpenAI(model="gpt-4o", temperature=0).bind_tools(tools)

# create a state class
class AgentState(MessagesState):
    pass

# create a chatbot function
def chatbot(state: AgentState):
    # wrap the response in a list so the add_messages reducer appends it
    return {"messages": [llm.invoke(state["messages"])]}

# create the agent graph
graph = StateGraph(AgentState)
graph.add_node("chatbot", chatbot)
graph.add_node("tools", ToolNode(tools=tools))
graph.add_edge(START, "chatbot")
graph.add_conditional_edges(
    "chatbot",
    tools_condition,
    {
        "tools": "tools",
        END: END
    }
)
graph.add_edge("tools", "chatbot")

# compile the graph
app = graph.compile()

# invoke the agent
url = "https://www.amazon.com/s?k=laptop"
query = f"Go to {url} and tell me the top 5 best laptop deals. Look for big discounts or potential loss leaders."
result = app.invoke({"messages": [{"role": "user", "content": query}]})

# print the results
print("=== Agent Output ===")
print(result["messages"][-1].content)

We now have a full-fledged agent with access to our scraping tool.

4. Clean and validate the data

Next, we need to clean and validate our output data from the agent. Make sure that this data will fit within your system and that noise has been removed. In our case, the AI agent removes all noise and prints a simple summary for us to read.

=== Agent Output ===
Here are the top 5 laptop deals on Amazon based on significant discounts or potential loss leaders:

1. **15.6 Inch Laptop with Office 365**
   - **Price:** $209.99
   - **Typical Price:** $619.99
   - **Discount:** Significant reduction from the typical price.
   - **Specs:** 4GB RAM, 128GB Storage, Windows 11.

2. **HP 14 Laptop**
   - **Price:** $173.40
   - **List Price:** $229.99
   - **Discount:** Reduced from the list price.
   - **Specs:** Intel Celeron N4020, 4GB RAM, 64GB Storage, Windows 11 Home.

3. **HP 2025 14 inch HD Laptop**
   - **Price:** $358.00
   - **Typical Price:** $1,299.00
   - **Discount:** Huge discount from the typical price.
   - **Specs:** Intel Processor N150, 16GB RAM, 384GB Storage, Windows 11 Pro.

4. **ACEMAGIC Laptop**
   - **Price:** $379.99
   - **Typical Price:** $1,399.99
   - **Discount:** Large discount from the typical price.
   - **Specs:** Intel Quad-Core Processor, 16GB DDR4, 512GB SSD, Windows 11.

5. **HP Stream 14" HD BrightView Laptop**
   - **Price:** $262.65
   - **List Price:** $399.00
   - **Discount:** Reduced from the list price.
   - **Specs:** Intel Celeron N4120, 16GB RAM, 288GB Storage, Windows 11 S.

These deals offer substantial savings compared to their typical or list prices, making them attractive options for budget-conscious buyers.

If you’re familiar with the laptop market, you should spot that our agent has made some errors. If you’re not familiar, we’ll point them out anyway.

Laptops 1, 3 and 4 from our list all have discrepancies.

Laptop #1

1. **15.6 Inch Laptop with Office 365**
   - **Price:** $209.99
   - **Typical Price:** $619.99
   - **Discount:** Significant reduction from the typical price.
   - **Specs:** 4GB RAM, 128GB Storage, Windows 11.

The discrepancy here is the smallest of the three: roughly two-thirds off the typical price. This could be a real listing, although the typical price is pretty inflated.

Laptops 3 and 4

Both of these machines are listed at nearly $1,000 off. This is possible, but highly unlikely. These listings need to be either double checked or thrown out.

3. **HP 2025 14 inch HD Laptop**
   - **Price:** $358.00
   - **Typical Price:** $1,299.00
   - **Discount:** Huge discount from the typical price.
   - **Specs:** Intel Processor N150, 16GB RAM, 384GB Storage, Windows 11 Pro.

4. **ACEMAGIC Laptop**
   - **Price:** $379.99
   - **Typical Price:** $1,399.99
   - **Discount:** Large discount from the typical price.
   - **Specs:** Intel Quad-Core Processor, 16GB DDR4, 512GB SSD, Windows 11.
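One way to catch discrepancies like these automatically is a plausibility check on the discount percentage before the report reaches your pricing system. The 60% cutoff below is an arbitrary assumption to tune for your market; on the numbers above, it flags exactly laptops 1, 3 and 4:

```python
# parsed deals from the agent report (prices copied from the output above)
deals = [
    {"title": "15.6 Inch Laptop with Office 365", "price": 209.99, "typical_price": 619.99},
    {"title": "HP 14 Laptop", "price": 173.40, "typical_price": 229.99},
    {"title": "HP 2025 14 inch HD Laptop", "price": 358.00, "typical_price": 1299.00},
    {"title": "ACEMAGIC Laptop", "price": 379.99, "typical_price": 1399.99},
    {"title": "HP Stream 14\" HD BrightView Laptop", "price": 262.65, "typical_price": 399.00},
]

SUSPICIOUS_DISCOUNT = 0.60  # flag anything more than 60% off for manual review

def discount(deal: dict) -> float:
    """Fraction off the typical price, e.g. 0.25 means 25% off."""
    return 1 - deal["price"] / deal["typical_price"]

flagged = [d["title"] for d in deals if discount(d) > SUSPICIOUS_DISCOUNT]
print(flagged)
```

Flagged listings can then be re-checked by a second fetch, or simply dropped before the report moves downstream.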

5. Integrate with your system

Our remaining data needs to be integrated with our overall system. In most cases, our data report would come in JSON form or something similar — a portable format easily read by our software.

We’ve been using Markdown simply because it’s easier for you to read. However, here’s what it would look like as JSON — this is what your server would use to render a site or run a pricing algorithm.

{
  "top_laptop_deals": [
    {
      "title": "15.6 Inch Laptop with Office 365",
      "price": "$209.99",
      "typical_price": "$619.99",
      "discount": "Significant reduction from the typical price.",
      "specs": "4GB RAM, 128GB Storage, Windows 11"
    },
    {
      "title": "HP 14 Laptop",
      "price": "$173.40",
      "list_price": "$229.99",
      "discount": "Reduced from the list price.",
      "specs": "Intel Celeron N4020, 4GB RAM, 64GB Storage, Windows 11 Home"
    },
    {
      "title": "HP Stream 14\" HD BrightView Laptop",
      "price": "$262.65",
      "list_price": "$399.00",
      "discount": "Reduced from the list price.",
      "specs": "Intel Celeron N4120, 16GB RAM, 288GB Storage, Windows 11 S"
    }
  ]
}

When structured as JSON, our data’s ready to feed into a site server for rendering, or anywhere else we want to put our data. Perhaps we even leave it in this format and feed it to another agent for price adjustment.
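To sketch the consumption side, here’s how downstream code might load that report and turn the dollar strings into numeric discounts a pricing algorithm can use. The `to_float` helper is our own illustrative addition, and the report is trimmed to two entries with slightly shortened titles:

```python
import json

# a trimmed copy of the JSON report above
report = """
{
  "top_laptop_deals": [
    {"title": "HP 14 Laptop", "price": "$173.40", "list_price": "$229.99"},
    {"title": "HP Stream 14 inch HD BrightView Laptop", "price": "$262.65", "list_price": "$399.00"}
  ]
}
"""

def to_float(dollars: str) -> float:
    # strip the currency symbol and thousands separators: "$1,299.00" -> 1299.0
    return float(dollars.replace("$", "").replace(",", ""))

results = []
for deal in json.loads(report)["top_laptop_deals"]:
    pct_off = round(100 * (1 - to_float(deal["price"]) / to_float(deal["list_price"])), 1)
    results.append((deal["title"], pct_off))
    print(f"{deal['title']}: {pct_off}% off")
```

From here, the numeric discounts can drive rendering, alerting or an automated re-pricing rule.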

Practical use cases: price, review, inventory, market monitoring, agent tasks

AI agents give you access to real-time data spanning key areas of e-commerce.

  • Price tracking: Monitor competitor pricing and react in real time.
  • Sentiment analysis: Monitor consumer reviews and sentiment across products and competitors.
  • Inventory checks: Learn what your competitors have in stock to anticipate shortages and adjust for surge pricing.
  • Market trends: Identify consumer trends early to reveal which products you need to push harder upfront.
  • Agent tasks: When you have agents that can update frontend data feeds or pricing, you can achieve a near-autonomous pricing strategy.
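The price-tracking case reduces to a simple comparison rule. A minimal sketch, with made-up numbers, that undercuts a cheaper competitor by a fixed 2% margin:

```python
OUR_PRICE = 499.99
UNDERCUT = 0.02  # undercut a cheaper competitor by 2% (an arbitrary policy choice)

def react_to_competitor(competitor_price: float, our_price: float) -> float:
    """Return an adjusted price given the latest competitor observation."""
    if competitor_price < our_price:
        # competitor is cheaper: slide just under their price
        return round(competitor_price * (1 - UNDERCUT), 2)
    return our_price  # already the cheaper option: hold

new_price = react_to_competitor(479.00, OUR_PRICE)
print(new_price)
```

A real policy would also enforce a floor price so the rule can’t chase a competitor’s loss leader below your own margin.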

Agent automation examples: browser-based action flows, dynamic checks

Once your agent can interact with tools, it’s time to start chaining actions into a flow. Browser automation lets your agent simulate real user behavior.

  • Cart checks: Add items to a cart and verify their pricing after shipping and promotions have been applied.
  • Competitor monitoring: Browse through listings to identify which competitors have the strongest sales and happiest customers.
  • Loss leader identification: By identifying loss leaders, you can react strategically, whether that means matching the offer, raising your own price or buying them out.

Tips for scaling, error handling and workflow maintenance

  • Scale your requests: Tools like asyncio and concurrent.futures let you scale your HTTP traffic by making multiple requests concurrently.
  • Retry and fallbacks: Error handling is paramount. Eventually, something is going to fail. Your agent should retry and implement fallback plans if the failure is persistent.
  • Modularity: Break your software (including your AI agent) into smaller pieces. When you change your proxy URL, it shouldn’t break the frontend of a site or take out your pricing system.
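The retry-and-fallback advice can be sketched as a small wrapper. The attempt count and backoff delays here are arbitrary choices, and `flaky_fetch` is a stand-in for a real page fetch:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # persistent failure: let a fallback path take over
            time.sleep(base_delay * (2 ** attempt))

# stand-in for a real fetch: fails twice, then succeeds
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "page text"

result = with_retries(flaky_fetch, attempts=3, base_delay=0.01)
print(result)
```

In a production pipeline, the fallback after the final failure might switch proxy pools, queue the URL for later, or alert a human.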

AI for e-commerce isn’t slowing down

Your data strategy shouldn’t either. With AI agents and real-time data, your business can react to market shifts as they occur. These insights give you the keys to anticipate and outmaneuver your competitors.

In the 2010s, Agile completely reshaped the way we think of software. In 2025, AI agents are reshaping competitive analysis in the same way.