Why drift is a risk for web-fed AI
All software, and frankly all human creations, degrades over time. Addressing these changes is the purpose of routine maintenance. When an AI model's outputs shift away from what we expect, we call this drift. When the training data comes from the web, model drift becomes especially pronounced.
The web is constantly evolving. In the time it took you to read the paragraph above, terabytes of new content were likely added. News, social media and e-commerce sites are among the fastest-changing on the web. As that live data changes, your original pretraining data doesn't keep up. This is why AI models need frequent fine-tuning and sometimes even retraining.
As time goes by, your model’s training data becomes stale. In this guide, we’ll learn how to detect and address model drift caused by stale training data.

An AI model drifting out to sea
Types of drift and web data challenges
Model drift usually occurs in one of two areas: data and concept. Eventually, your model will receive inputs that differ too drastically from its training data, and its outputs won't be accurate. It's also going to see new datasets with relationships and patterns that it doesn't understand.
On the web, today’s truths are different from yesterday’s truths. As the world and the internet change, your model’s understanding of truth becomes outdated.
Types of model drift
- Data drift: When the input data distribution changes from the model's initial training data, the model can't generate accurate output. Here are some common causes of data drift.
  - Website layout changes: The model no longer understands the site layout, so it can't find its data.
  - New data types: Product categories, slang terms and unseen sentiment patterns can easily throw off a model's output.
  - Seasonal changes and unseen events: Imagine a model that forecasts retail demand. A blizzard arrives unexpectedly. The model won't predict customers stocking up on ice melt and snow shovels.
- Concept drift: When the relationships between data points and their labels change, the model fails to see the new patterns and inference suffers. Here are some examples that cause concept drift.
  - Pattern failures: A model trained on old patterns gets thrown off by new ones. A conversion-rate predictor built on last year's sales campaigns doesn't understand new consumer priorities.
  - Sentiment shifts: Introduce new slang terms and the model won't understand consumer tone. In the mid-2010s, "salty" came to mean irritated, upset or sour. Before that, good people were known as the "salt of the earth."
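To make the data-drift half of this concrete, here's a minimal sketch of the core idea: compare the distribution a model was trained on against the distribution of its live inputs. It uses a two-sample Kolmogorov-Smirnov test; the normal distributions, the 1.5-unit shift and the 0.05 cutoff are illustrative assumptions, not part of any specific tool (scipy is assumed to be installed):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training = rng.normal(loc=0.0, scale=1.0, size=1000)    # distribution the model learned
production = rng.normal(loc=1.5, scale=1.0, size=1000)  # shifted live inputs

# The KS test asks: could both samples come from the same distribution?
stat, p_value = ks_2samp(training, production)
if p_value < 0.05:
    print(f"Data drift detected (KS statistic {stat:.2f})")
```

A small p-value means the two samples are unlikely to share a distribution, which is exactly the data-drift signal described above.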
Web data makes models drift harder
The internet is volatile and constantly changing. Depending on your target source, the underlying data can change within days, hours or even minutes. Let's pretend we train an AI model on bits and pieces of selective data from years past, with the goal of producing an absurd amount of model drift.
- In 2005, generative models existed, but practical usage was largely a thing of fiction.
- In 2015, global political sentiments were vastly different from how they are today.
- In 2017, Bitcoin at $17,000 was near its all-time high. By 2023, that same price was considered an existential crisis.
Imagine training a model on AI use cases from 2005, political climates from 2015 and asset prices from 2017 or even 2023. The model would understand these concepts as the data was presented but real world changes would immediately render the model unusable.
Consider the following user prompts and the outputs a model trained on that stale data might produce in 2025.
- Prompt: Where is AI currently being used?
  Output: AI models are increasingly used in dictation software and speech-to-text platforms. Autonomous robots are a thing of science fiction, but one day we'll hopefully see them.
- Prompt: What is the current outlook on the geopolitical climate?
  Output: US-China relations are better than they've ever been. Tensions in the Middle East have resolved since the end of the Iraq War.
- Prompt: I bought some Bitcoin at $20,000, should I sell it now?
  Output: Having bought at $20,000, you immensely overpaid. Even at its all-time high, Bitcoin was around $17,000.
In the examples above, we use extremely stale data to highlight the issues with model output. Years-old datasets can result in a model that’s completely out of touch with reality. Days-old datasets can result in misunderstanding of news and events. Hours-old datasets fail to reflect price and sentiment changes in current conditions.
Monitoring and detection workflow
Now that we understand what model drift is, let's create some real drift so we can evaluate it. We'll use pandas for basic data handling, scikit-learn to generate synthetic datasets and train a simple classifier, and Evidently to detect the drift.
Getting started
We'll start by installing dependencies. All three can be installed with a single pip command:

pip install pandas scikit-learn evidently
The code
In our code below, we first use make_classification() to create some training data. Next, we use LogisticRegression() to create a simple classification model and train it with model.fit(). We then use model.predict() to fill a prediction column, which serves as our baseline or "gold standard" behavior.
We then use make_classification() again to create a dataset simulating production drift. The keyword argument random_state=99 changes the distribution of data even before we manually inject the drift. Just to make our drift more noticeable, we inject a shift into the dataset with X_current[:, 0] += 2.5.
Finally, we create a Report object and run the report to obtain our result. We save our report results as an HTML file to view in the browser and we then use accuracy_score to generate a real score for the model.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from evidently import Report
from evidently.presets import DataDriftPreset

# Create some synthetic training data
X_base, y_base = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)
baseline_df = pd.DataFrame(X_base, columns=[f"feature_{i}" for i in range(5)])
baseline_df["label"] = y_base

# Train a model
model = LogisticRegression()
model.fit(baseline_df.drop(columns="label"), baseline_df["label"])
baseline_df["prediction"] = model.predict(baseline_df.drop(columns="label"))

# Create production data with synthetic drift
X_current, y_current = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=99
)

# Inject a shift to make the drift more pronounced
X_current[:, 0] += 2.5

# Create a dataframe from X_current
current_df = pd.DataFrame(X_current, columns=[f"feature_{i}" for i in range(5)])
current_df["label"] = y_current
current_df["prediction"] = model.predict(current_df.drop(columns="label"))

# Create a drift report instance
drift_report = Report(metrics=[DataDriftPreset()])

# Run the report
result = drift_report.run(reference_data=baseline_df, current_data=current_df)

# Save the result
result.save_html("drift_report.html")

# Compare the drifted dataset's predictions against its ground truth labels
accuracy = accuracy_score(current_df["label"], current_df["prediction"])
print(f"Current accuracy: {accuracy:.2f}")

if accuracy < 0.85:
    print("Model accuracy below threshold — retraining recommended.")
else:
    print("Model accuracy is within acceptable range.")

print("Drift report saved as 'drift_report.html'. Open it in your browser to view details.")
The output
If everything went well, you'll see output similar to the following. Our accuracy score was 0.34, and the script recommends retraining the model.

Current accuracy: 0.34
Model accuracy below threshold — retraining recommended.
Drift report saved as 'drift_report.html'. Open it in your browser to view details.
If you open up drift_report.html in your browser, you'll see a full breakdown of the results in a clean, tabular format. Each column of the training data is evaluated for drift. As you can see, label and feature_2 show no drift, while significant drift was detected in our prediction column. When we injected the shift into feature_0, it significantly impacted the prediction column because the two columns are tightly correlated.
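If you'd like a quick numeric cross-check of the HTML report, a per-column KS test over the same two dataframes gives a comparable signal. This sketch rebuilds the data with the same seeds and drift injection as the script above; scipy is assumed as an extra dependency, and the 0.05 cutoff is an illustrative choice:

```python
import pandas as pd
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification

# Rebuild the two feature sets exactly as in the main script
X_base, _ = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)
X_curr, _ = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=99
)
X_curr[:, 0] += 2.5  # same injected shift

baseline = pd.DataFrame(X_base, columns=[f"feature_{i}" for i in range(5)])
current = pd.DataFrame(X_curr, columns=[f"feature_{i}" for i in range(5)])

# Test each column independently and record the p-values
results = {}
for col in baseline.columns:
    stat, p = ks_2samp(baseline[col], current[col])
    results[col] = p
    flag = "DRIFT" if p < 0.05 else "ok"
    print(f"{col}: KS={stat:.2f} p={p:.4f} {flag}")
```

The shifted feature_0 column should be flagged clearly; columns the report marked as stable should come out closer to "ok".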
Tools and automation for retraining/mitigation

When we created and ran our example, we ran a Python script manually to detect drift. In production, you can’t realistically watch for changes by hand. Once drift exceeds your threshold, your system should flag the issue, refresh the data and automatically schedule a retraining workflow that doesn’t disrupt everything else.
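The "flag and schedule" step can be sketched in a few lines of plain Python. The ACCURACY_FLOOR value, the drift_share cutoff and the schedule_retraining() hook are all hypothetical placeholders you'd wire to your own scheduler, not real APIs:

```python
ACCURACY_FLOOR = 0.85  # assumed threshold; tune for your use case

def schedule_retraining(reason: str) -> str:
    # Stand-in for a real trigger call into Airflow, Dagster or Prefect
    print(f"Retraining scheduled: {reason}")
    return "scheduled"

def handle_metrics(accuracy: float, drift_share: float) -> str:
    """Flag the run when accuracy drops or too many columns drift."""
    if accuracy < ACCURACY_FLOOR:
        return schedule_retraining(f"accuracy {accuracy:.2f} below floor")
    if drift_share > 0.5:
        return schedule_retraining(f"{drift_share:.0%} of columns drifted")
    return "healthy"

print(handle_metrics(accuracy=0.34, drift_share=0.6))
```

In production this function would run on every monitoring cycle, fed by whatever metrics your drift reports emit.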
Monitoring and detection
- Evidently AI: Track data drift, prediction drift, schema changes and distribution shifts over time.
- Great Expectations: Check your data quality before it enters the pipeline. Find anomalies and identify model drift early before it corrupts your model output.
- TensorFlow Data Validation: Drift detection and data profiling at scale for TensorFlow projects.
- Grafana/Prometheus: Use interactive dashboards for live monitoring of drift in your metrics and thresholds.
Data refresh and enrichment
- Bright Data: Run data collection on demand and at scale using prebuilt scrapers. Power your collection infrastructure with proxies and even use their MCP server to give AI agents autonomy when using these tools.
- Apify: A self-contained ecosystem where developers create and sell scraping tools.
- Firecrawl: Scrape and process unstructured web content into structured datasets. With their extract feature, you can also define custom schema.
- Oxylabs: Get proxy and scraping infrastructure for reliable data extraction.
Pipeline orchestration
- Apache Airflow: Programmatically author, schedule and monitor workflows. Automate your entire pipeline from a single interface.
- Dagster: Orchestrate your AI data with strong typing and monitor both data quality and lineage.
- Prefect: Build Pythonic workflows that respond to your data. With dashboards and alerts, you can see exactly why parts of your pipeline are breaking.
How it all fits together
Here’s how a typical drift mitigation setup fits together.
- Monitor: Evidently or TensorFlow Data Validation monitors incoming data and predictions in real time.
- Detect: When drift exceeds a defined threshold, an alert is raised via Grafana or Prometheus.
- Refresh data: The pipeline then calls Bright Data, Apify, Firecrawl or Oxylabs to gather updated web data.
- Validate: Great Expectations checks the new dataset for quality and consistency within your schema.
- Retrain: The orchestration tool (Airflow, Dagster or Prefect) kicks off a retraining job. Then it tests the retrained model and deploys it.
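The five steps above can be sketched as a plain-Python skeleton. Every function here is a hypothetical stub standing in for the named tools, not their real APIs; the point is the control flow, not the implementations:

```python
def monitor():            # Evidently / TFDV: score incoming batches
    return {"drift_share": 0.6}

def detect(metrics):      # Grafana/Prometheus: alert past a threshold
    return metrics["drift_share"] > 0.5

def refresh_data():       # Bright Data / Apify / Firecrawl / Oxylabs collection
    return "fresh_dataset"

def validate(dataset):    # Great Expectations: schema and quality checks
    return dataset is not None

def retrain(dataset):     # Airflow / Dagster / Prefect retraining job
    return f"model_v2 trained on {dataset}"

# One end-to-end cycle of the self-healing loop
metrics = monitor()
if detect(metrics):
    dataset = refresh_data()
    if validate(dataset):
        print(retrain(dataset))
```

In a real deployment each stub becomes a task in your orchestrator, so a drift alert flows automatically into collection, validation and retraining.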
Drift detection isn’t just a report you run. It’s a trigger for a self-healing AI system that can adapt to ever evolving web data.
Real-world case scenarios
- E-commerce pricing: A price prediction model stops applying certain discounts after a retailer changes their category labels. Your system should flag the category change, trigger a collection and then retrain the model.
- Social sentiment: Imagine a new meme blows up and slang changes the meaning of a word. Your system should spot the conceptual drift, trigger a collection and then fine-tune the model for better understanding.
Final recommendations and best practices
AI models are only as good as their data. When dealing with web data, model drift is inevitable. The best systems can identify and react to model drift as it occurs.
Always make sure you have adequate monitoring, detection and training workflows to be proactive against model drift.