Scrapling is a new framework for scraping the web. It was initially launched in 2024. In this guide, we’ll walk through how to install and use Scrapling for both minimal prototypes and scraping at scale.
By the time you’ve finished reading, you’ll be able to answer the following questions.
- How do I fetch sites using Scrapling?
- How do I extract data using Scrapling?
- How do Scrapling spiders make it easier to scrape at scale?
What is Scrapling?
Scrapling is a web scraping framework that borrows from the major frameworks the industry has relied on in the past. It has a similar feel to Scrapy with less of a learning curve. With its project-based architecture, Scrapy is a powerhouse among the older generation of extraction frameworks. However, between its middleware and settings system, its own set of custom shell commands and a deeply nested project layout, Scrapy introduces complexity that other frameworks avoid.
Scrapling borrows major concepts from Scrapy, like spiders and asynchronous execution, and packages them in a more modern and intuitive library. With Scrapling, you can write async scrapers and run your spiders without first mastering an entire framework, which Scrapy effectively requires.
With Scrapling, you can create and run spiders with just a single file rather than setting up a nested project folder. Take a look at some of Scrapling’s features below.
- Fetching pages: Users can fetch pages with traditional HTTP requests or render them in a headless browser.
- Parsing: Scrapling supports several parsing methods, including widely used standards such as CSS selectors and XPath. It also supports text-based parsing.
- Spiders: As mentioned above, Scrapling lets you write spiders much as you would in Scrapy. Spiders make scraping at scale much easier.
- Output: Users can write their scraped data straight to a JSON file using the .to_json() method on spider results.
- Proxy integration: Scrapling offers first-class proxy support and even comes with a built-in proxy rotator.
Getting started
Users should make sure to have an installation of Python 3.10 or higher. That is the only requirement. Basic familiarity with Python helps but is not required.
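If you’re not sure which interpreter your python command points to, the short script below checks it against the 3.10 minimum. It is a standalone sketch that uses only the standard library. The helper name python_is_supported is our own and is not part of Scrapling.

```python
import sys

# Minimum Python version stated in this guide's requirements
MIN_VERSION = (3, 10)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old for Scrapling"
    print(f"Python {current}: {status}")
```

Run it with the same python command you’ll use to create the virtual environment below.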
First, we need to create a new project folder.
mkdir scrapling_demo
cd scrapling_demo
Now, we’ll create a new virtual environment.
python -m venv .venv
Activate the environment. Use the snippet below for macOS/Linux.
source .venv/bin/activate
On Windows (PowerShell), use this command instead.
.\.venv\Scripts\Activate.ps1
Install Scrapling. The quotes keep shells such as zsh from treating the square brackets as a wildcard pattern.
pip install "scrapling[all]"
Parsing
Before we write a parser, we need to examine the page. At Books to Scrape, each book is inside an article element and uses a CSS class called product_pod.
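To see what the selector article.product_pod will match before bringing Scrapling in, here is a standard-library sketch. The markup is an abridged, hand-written copy of one listing, so check it against the live page in your browser’s inspector. The ProductPodCounter class is hypothetical. It counts product_pod articles and reads the title attribute from the link inside each h3, which is the same element the h3 > a selector targets later.

```python
from html.parser import HTMLParser

# Abridged, hand-written copy of one Books to Scrape listing (illustrative only)
SAMPLE_HTML = """
<article class="product_pod">
  <h3><a href="catalogue/a-light-in-the-attic_1000/index.html" title="A Light in the Attic">A Light in the ...</a></h3>
  <div class="product_price">
    <p class="price_color">£51.77</p>
    <p class="instock availability">In stock</p>
  </div>
</article>
"""


class ProductPodCounter(HTMLParser):
    """Counts <article class="product_pod"> elements and collects the title
    attribute of every <a> found inside an <h3>."""

    def __init__(self):
        super().__init__()
        self.pods = 0
        self.titles = []
        self._inside_h3 = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self.pods += 1
        elif tag == "h3":
            self._inside_h3 = True
        elif tag == "a" and self._inside_h3:
            self.titles.append(attributes.get("title"))

    def handle_endtag(self, tag):
        if tag == "h3":
            self._inside_h3 = False


parser = ProductPodCounter()
parser.feed(SAMPLE_HTML)
print(parser.pods, parser.titles)
```

The full title lives in the link’s title attribute. The visible link text is truncated, which is why the parser below reads the attribute.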

Scrapling lets us extract all of these products with a simple CSS selector. The Scrapling documentation lists all of its parsing options. To make HTTP requests, we can use Scrapling’s Fetcher class. We then call .get() to make a GET request, much like Python Requests.
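from scrapling import Fetcher

# Fetch the homepage with a plain HTTP GET request
page = Fetcher.get("https://books.toscrape.com")
# Every book listing is an <article class="product_pod"> element
books = page.css("article.product_pod")

for book in books:
    print(book.get_all_text(), "\n----------------")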
Here’s what our output looks like at this point. It’s not perfect, but we are getting each product and printing its data to the terminal. We need to add structure to these extracted objects.

To finish writing our parser, we need to extract individual details from each book and build a structured object. Take a look at how we’re pulling the data from each book.
- .css(): Takes a CSS selector and returns a list of the elements that match it. Even a single match comes back in a list. Each book has only one a element nested inside its h3, but we still select it with book.css("h3 > a")[0].
- .attrib.get(): .attrib gives us access to an element’s attributes, and .get() extracts the value of one attribute.
- .get_all_text(): Returns all of the text within an element.
from scrapling import Fetcher

page = Fetcher.get("https://books.toscrape.com")
books = page.css("article.product_pod")

for book in books:
    # The title and product link both live on the <a> inside the <h3>
    product_link_element = book.css("h3 > a")[0]
    title = product_link_element.attrib.get("title")
    product_url = product_link_element.attrib.get("href")
    image = book.css("img")[0].attrib.get("src")
    price = book.css("p.price_color")[0].get_all_text()
    # strip() removes the whitespace around the availability text
    availability = book.css("p.instock.availability")[0].get_all_text().strip()

    # Collect the fields into a single structured record
    parsed_book = {
        "title": title,
        "product_url": product_url,
        "image": image,
        "price": price,
        "availability": availability
    }
    print(parsed_book, "\n----------------")
Take a look at the output from the parser now. Each product is now a structured object with fields that hold our extracted data. This object is similar to what you’d get when using a JSON API.
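The [0] pattern has one caveat. If a selector matches nothing on some listing, indexing the empty result raises an IndexError and stops the whole loop. A small helper can return None instead. This is a sketch: first is a hypothetical name, and it only relies on the selector result supporting truthiness and [0] indexing, the same operations the code above already uses.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first(matches: Sequence[T], default: Optional[T] = None) -> Optional[T]:
    """Return the first element of a selector result, or `default` if it is empty.

    Unlike matches[0], this never raises IndexError on an empty result.
    """
    return matches[0] if matches else default
```

In the loop, you could write price_element = first(book.css("p.price_color")), followed by price = price_element.get_all_text() if price_element is not None else None. That way one malformed listing yields a missing field instead of aborting the run.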

Headless browsing
Switching from HTTP requests to a headless browser is straightforward. Replace Fetcher with DynamicFetcher (imported from scrapling.fetchers) and replace .get() with .fetch(). DynamicFetcher uses Playwright to load the page in a browser. The rest of our code stays the same.
Before running the script, run playwright install from your terminal. This downloads the browser binaries that Playwright drives. Playwright is a widely used browser automation library.
from scrapling.fetchers import DynamicFetcher

# Load the page in a Playwright-driven browser instead of a plain HTTP request
page = DynamicFetcher.fetch("https://books.toscrape.com")
books = page.css("article.product_pod")

for book in books:
    product_link_element = book.css("h3 > a")[0]
    title = product_link_element.attrib.get("title")
    product_url = product_link_element.attrib.get("href")
    image = book.css("img")[0].attrib.get("src")
    price = book.css("p.price_color")[0].get_all_text()
    availability = book.css("p.instock.availability")[0].get_all_text().strip()

    parsed_book = {
        "title": title,
        "product_url": product_url,
        "image": image,
        "price": price,
        "availability": availability
    }
    print(parsed_book, "\n----------------")
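The Fetcher and DynamicFetcher scripts differ only in how the page is fetched, so the extraction logic can live in one function that both share. The sketch below is a refactor of the code above, not a Scrapling feature. extract_book and extract_books are hypothetical names, and the functions only assume the .css(), .attrib and .get_all_text() calls already used in this guide.

```python
def extract_book(book) -> dict:
    """Build one structured record from a product_pod element.

    `book` can be any object exposing .css(selector) returning a list,
    a dict-like .attrib, and .get_all_text(), as used throughout this guide.
    """
    link = book.css("h3 > a")[0]
    return {
        "title": link.attrib.get("title"),
        "product_url": link.attrib.get("href"),
        "image": book.css("img")[0].attrib.get("src"),
        "price": book.css("p.price_color")[0].get_all_text(),
        "availability": book.css("p.instock.availability")[0].get_all_text().strip(),
    }


def extract_books(page) -> list:
    """Apply extract_book to every listing on a fetched page."""
    return [extract_book(book) for book in page.css("article.product_pod")]
```

With this in place, extract_books(Fetcher.get("https://books.toscrape.com")) and extract_books(DynamicFetcher.fetch("https://books.toscrape.com")) should return the same records. A spider’s parse() method could also yield extract_book(book) for each listing.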
Scrapling at scale: Crawling an entire product catalogue
Our parser can easily be converted into a spider we can use for scraping at scale. Instead of parsing one page, now we’ll crawl the entire site. There are 1,000 books in total.
In the spider below, we move our parsing logic from the example above into the parse() method. Whenever the spider receives a response, it calls this method to extract the data. Rather than printing the extracted values, we yield each record so the spider collects it. The spider then checks for a link to the next page. If one is found, it uses response.follow() to fetch that page and parse it with the same method. Finally, result.items.to_json() saves the extracted data to a JSON file.
from scrapling.spiders import Spider, Response


class BookSpider(Spider):
    name = "books"
    start_urls = ["https://books.toscrape.com"]

    async def parse(self, response: Response):
        # Yield one record per listing on the current page
        for book in response.css("article.product_pod"):
            product_link_element = book.css("h3 > a")[0]
            title = product_link_element.attrib.get("title")
            product_url = product_link_element.attrib.get("href")
            image = book.css("img")[0].attrib.get("src")
            price = book.css("p.price_color")[0].get_all_text()
            availability = book.css("p.instock.availability")[0].get_all_text().strip()
            yield {
                "title": title,
                "product_url": product_url,
                "image": image,
                "price": price,
                "availability": availability
            }

        # Follow the "next" pagination link until the last page
        next_page = response.css("li.next > a")
        if next_page:
            yield response.follow(next_page[0].attrib.get("href"), callback=self.parse)


# Run the crawl and write every collected item to books.json
result = BookSpider().start()
result.items.to_json("books.json", indent=True)
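The spider’s pagination logic boils down to a loop: parse a page, collect its items, resolve the next link and repeat until there is none. The standard-library sketch below models that loop sequentially, whereas Scrapling itself runs requests asynchronously. A hypothetical fetch_page callable stands in for the network.

The sketch uses urljoin because the hrefs on Books to Scrape are relative. The homepage links to catalogue/page-2.html, while later pages link to page-3.html and so on, so each href has to be resolved against the URL of the page it came from. Presumably response.follow() handles this for you. The same urljoin call is also handy for turning the relative product_url and image values into absolute URLs.

```python
from collections import deque
from urllib.parse import urljoin


def crawl(start_url, fetch_page):
    """Follow "next" links from start_url and collect every page's items.

    fetch_page(url) is a stand-in for fetching and parsing one page. It must
    return (items, next_href), where next_href is the raw href of the "next"
    link, or None on the last page.
    """
    queue = deque([start_url])
    seen = set()
    items = []
    while queue:
        url = queue.popleft()
        if url in seen:  # guard against pagination that loops back on itself
            continue
        seen.add(url)
        page_items, next_href = fetch_page(url)
        items.extend(page_items)
        if next_href:
            # Resolve the relative href against the page it was found on
            queue.append(urljoin(url, next_href))
    return items
```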
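Unlike a Scrapy spider, which is typically launched through the scrapy command-line tool, a Scrapling spider runs as a plain Python file. Save the code above as book_spider.py and run it.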
python book_spider.py
Take a look at the output. Scrapling made 16.05 requests per second, and it took roughly three seconds to crawl and parse the entire product catalogue.
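Those two figures line up. The catalogue lists its 1,000 books 20 per page. Assuming one request per listing page, the spider fetches 50 pages, which at the reported rate takes about three seconds:

```latex
\text{pages} = \frac{1000\ \text{books}}{20\ \text{books/page}} = 50
\qquad
t \approx \frac{50\ \text{requests}}{16.05\ \text{requests/s}} \approx 3.1\ \text{s}
```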

Now, we’ll check the integrity of our JSON file.

Each book is parsed properly and has the following fields.
- title
- product_url
- image
- price
- availability
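Eyeballing the file works for a spot check, but a short script can confirm the count and fields across all 1,000 records. This standard-library sketch assumes to_json() wrote a single JSON array of objects, so check your own file if the layout differs. check_books is a hypothetical name.

```python
import json
from pathlib import Path

EXPECTED_FIELDS = {"title", "product_url", "image", "price", "availability"}


def check_books(path, expected_count=1000):
    """Return a list of problems found in an exported JSON file (empty if none).

    Assumes the file contains one JSON array with one object per book.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        return ["top-level JSON value is not a list"]

    problems = []
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(records)}")
    for index, record in enumerate(records):
        keys = set(record)
        missing = EXPECTED_FIELDS - keys
        if missing:
            problems.append(f"record {index} is missing {sorted(missing)}")
        # Flag fields that exist but hold an empty string or null
        empty = [field for field in sorted(EXPECTED_FIELDS & keys) if not record[field]]
        if empty:
            problems.append(f"record {index} has empty {empty}")
    return problems
```

After a crawl, print(check_books("books.json") or "books.json looks good") prints any problems it finds, or the all-clear message if there are none.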
Proxy support
Scrapling makes proxy integration simple. The ProxyRotator class takes a list of proxies and automatically rotates between them. To use it, we add the configure_sessions() method below to our spider class.
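from scrapling.fetchers import FetcherSession, ProxyRotator

# Add this method to the spider class
def configure_sessions(self, manager):
    # The rotator cycles through the proxies in this list
    rotator = ProxyRotator([
        "http://proxy1:8080",
        "http://proxy2:8080",
        "http://user:pass@proxy3:8080",
    ])
    # Register the proxied session under the name "default"
    manager.add("default", FetcherSession(proxy_rotator=rotator))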
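The guide doesn’t spell out how ProxyRotator picks the next proxy. To illustrate the idea, here is a hypothetical round-robin rotator built on itertools.cycle. Scrapling’s own class may use a different strategy.

```python
from itertools import cycle


class RoundRobinProxies:
    """Hypothetical rotator: hands out proxies in order, wrapping around."""

    def __init__(self, proxies):
        proxies = list(proxies)
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(proxies)

    def next_proxy(self) -> str:
        return next(self._pool)


rotator = RoundRobinProxies([
    "http://proxy1:8080",
    "http://proxy2:8080",
    "http://user:pass@proxy3:8080",
])
for _ in range(4):
    print(rotator.next_proxy())  # proxy1, proxy2, proxy3, then proxy1 again
```

Round-robin spreads requests evenly across the pool. Production rotators often also skip proxies that keep failing, which this sketch does not do.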
In the code below, we’ve added proxy integration using Bright Data’s residential proxies. Feel free to use any proxy provider you choose. With Bright Data’s residential proxies, we only need to pass one URL into ProxyRotator, because Bright Data handles the rotation on its end. Note that we pass verify=False into the FetcherSession. This turns off TLS certificate verification, which avoids SSL certificate errors when traffic is routed through the proxy. Keep in mind that it removes that check for every request in the session.
from scrapling.spiders import Spider, Response
from scrapling.fetchers import FetcherSession, ProxyRotator


class BookSpider(Spider):
    name = "books"
    start_urls = ["https://books.toscrape.com"]

    def configure_sessions(self, manager):
        # Bright Data rotates IPs on its side, so one proxy URL is enough
        rotator = ProxyRotator([
            "http://brd-customer-<your-username>-zone-<your-zone-name>:<your-password>@brd.superproxy.io:33335",
        ])
        # verify=False disables TLS certificate verification for this session
        manager.add("default", FetcherSession(proxy_rotator=rotator, verify=False))

    async def parse(self, response: Response):
        for book in response.css("article.product_pod"):
            product_link_element = book.css("h3 > a")[0]
            title = product_link_element.attrib.get("title")
            product_url = product_link_element.attrib.get("href")
            image = book.css("img")[0].attrib.get("src")
            price = book.css("p.price_color")[0].get_all_text()
            availability = book.css("p.instock.availability")[0].get_all_text().strip()
            yield {
                "title": title,
                "product_url": product_url,
                "image": image,
                "price": price,
                "availability": availability
            }

        # Follow the "next" pagination link until the last page
        next_page = response.css("li.next > a")
        if next_page:
            yield response.follow(next_page[0].attrib.get("href"), callback=self.parse)


result = BookSpider().start()
result.items.to_json("books.json", indent=True)
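Hard-coding the username and password in the proxy URL makes them easy to leak, for example by committing the file. One alternative is to read them from environment variables. In this sketch, BRD_USERNAME, BRD_ZONE and BRD_PASSWORD are hypothetical variable names, and the host and port come from the example above. quote() percent-encodes characters such as @ or : in the password, which would otherwise break the URL.

```python
import os
from urllib.parse import quote


def brightdata_proxy_url(env=os.environ) -> str:
    """Build the Bright Data proxy URL from environment variables.

    BRD_USERNAME, BRD_ZONE and BRD_PASSWORD are hypothetical names chosen for
    this sketch; the URL layout, host and port match the spider above.
    """
    username = env["BRD_USERNAME"]
    zone = env["BRD_ZONE"]
    # Percent-encode the password so characters like "@" can't split the URL
    password = quote(env["BRD_PASSWORD"], safe="")
    return f"http://brd-customer-{username}-zone-{zone}:{password}@brd.superproxy.io:33335"
```

Inside configure_sessions(), the rotator would then become ProxyRotator([brightdata_proxy_url()]).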
Expect the proxy connection to slow down your spider. With a proxy, your web traffic is routed through the proxy as an intermediary instead of going directly to the target site, which adds an extra hop to every request.
You can see the output in the screenshot below. Once again, we extracted and saved 1,000 products.

Our JSON file looks exactly like it did when we crawled the full site earlier. The spider is working as intended.

Conclusion
Scrapling is one of many open source web scraping frameworks making data extraction easier. With Scrapling, we can scrape using simple HTTP requests, and we can also render dynamic content in a browser. Perhaps Scrapling’s most useful feature is the spider, which gives us the kind of scalability we expect from established frameworks like Scrapy.
Combine the features above with intuitive proxy support, and Scrapling checks all the boxes required to make a splash in the scraping world. If you’re interested in learning more about Scrapling, take a look at its documentation.