How To Develop a New Scraper

Under Construction

This section is being updated. Some information may be outdated or inaccurate.

Find a website

First, check if the website is already supported:

from recipe_scrapers import SCRAPERS
# Check if site is supported
print(SCRAPERS.get("bbcgoodfood.com"))

Track Your Progress

Create an issue to track your work.

Setup Repository

Fork the recipe-scrapers repository on GitHub and follow these steps:

Quick Setup

# Clone your fork
git clone https://github.com/YOUR-USERNAME/recipe-scrapers.git
cd recipe-scrapers

# Set up Python environment
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
python -m pip install --upgrade pip
pip install -e ".[all]"

Create a new branch:

git checkout -b site/website-name

Run Tests

python -m unittest

# Optional: Parallel testing
pip install unittest-parallel
unittest-parallel --level test

Generate Scraper Files

1. Select Recipe URL

Recipe Selection

Choose a recipe with multiple instructions when possible. Single-instruction recipes may indicate parsing errors, unless explicitly handled.
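As a quick sanity check before settling on a recipe, you can count the parsed instruction steps. This is a minimal sketch using only the standard library; it assumes you already have the scraper's instructions() text, where steps are separated by newlines:

```python
def count_steps(instructions: str) -> int:
    """Count non-empty instruction lines (steps are newline-separated)."""
    return len([line for line in instructions.splitlines() if line.strip()])

# A recipe that collapses to a single step may signal a parsing error.
print(count_steps("Preheat oven.\nMix ingredients.\nBake 30 minutes."))  # 3
```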

2. Check Schema Support

Test if the site uses Recipe Schema:

from urllib.request import urlopen
from recipe_scrapers import scrape_html

url = "https://example.com/your-recipe"
html = urlopen(url).read().decode("utf-8")

scraper = scrape_html(html, url, wild_mode=True)
print(scraper.schema.data)  # Empty dict if schema not supported
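Even when schema.data comes back non-empty, it can still be incomplete. As a rough check, you can diff it against common schema.org Recipe properties; the helper below is a hypothetical sketch that treats schema.data as a plain dict:

```python
# Common schema.org Recipe properties (not an exhaustive or official checklist).
RECIPE_FIELDS = (
    "name", "recipeIngredient", "recipeInstructions",
    "totalTime", "recipeYield", "image",
)

def missing_fields(schema_data: dict) -> list:
    """List common Recipe properties absent from a parsed schema dict."""
    return [field for field in RECIPE_FIELDS if field not in schema_data]

print(missing_fields({"name": "Carrot Cake", "recipeIngredient": ["carrots"]}))
# ['recipeInstructions', 'totalTime', 'recipeYield', 'image']
```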

3. Generate Files

python generate.py <ClassName> <URL>

<URL> should be the recipe page you selected in the first step. The script downloads this recipe and uses it to create the initial test data.

This creates:

  • Scraper file in recipe_scrapers/
  • Test files in tests/test_data/<host>/
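The generated scraper file is typically a small class extending the library's AbstractScraper. The sketch below is illustrative only: a stub base class stands in for recipe_scrapers' AbstractScraper so the snippet is self-contained, and the real generated template may differ:

```python
class AbstractScraper:
    """Stub standing in for recipe_scrapers' AbstractScraper, for illustration only."""
    def __init__(self, html: str, url: str):
        self.page_data = html
        self.url = url

class WebsiteName(AbstractScraper):
    """Sketch of a generated scraper class; methods get filled in during implementation."""
    @classmethod
    def host(cls):
        # Should match the domain used in tests/test_data/<host>/
        return "website-name.com"

print(WebsiteName.host())  # website-name.com
```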

Implementation

If the site provides Recipe Schema data, the generated scraper usually works with little or no modification. Check its output interactively:

from recipe_scrapers import scrape_html

scraper = scrape_html(html, url)
print(scraper.title())
print(scraper.ingredients())

If a field is missing or wrong, override the corresponding method in your scraper class and parse the HTML directly, for example:

def title(self):
    return self.soup.find("h1").get_text()

Testing

1. Update Test Data

Edit tests/test_data/<host>/test.json:

{
    "host": "<host>",
    "canonical_url": "...",
    "site_name": "...",
    "author": "...",
    "language": "...",
    "title": "...",
    "ingredients": "...",
    "instructions_list": "...",
    "total_time": "...",
    "yields": "...",
    "image": "...",
    "description": "..."
}
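Before running the tests, you can sanity-check the file with a short standard-library script. The required key list below simply mirrors the fields shown above; it is an assumption for illustration, not an official list:

```python
import json

# Assumed required fields, mirroring the test.json template above.
REQUIRED_KEYS = {
    "host", "canonical_url", "site_name", "author", "language", "title",
    "ingredients", "instructions_list", "total_time", "yields", "image", "description",
}

def missing_keys(test_json_text: str) -> set:
    """Return any required fields absent from a test.json document."""
    return REQUIRED_KEYS - set(json.loads(test_json_text))

# Example: a document missing "author" and "image"
sample = (
    '{"host": "example.com", "canonical_url": "", "site_name": "", '
    '"language": "", "title": "", "ingredients": [], "instructions_list": [], '
    '"total_time": 0, "yields": "", "description": ""}'
)
print(sorted(missing_keys(sample)))  # ['author', 'image']
```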

Test Data Population Help

The HTML file generated by generate.py can help you fill in the required fields in the test JSON file:

from pathlib import Path
from recipe_scrapers import scrape_html
import json

html = Path("tests/test_data/<host>/<TestFileName>.testhtml").read_text(encoding="utf-8")
scraper = scrape_html(html, "<URL>")
print(json.dumps(scraper.to_json(), indent=2, ensure_ascii=False))

This prints the scraper's output to your terminal for reference.

2. Run Tests

python -m unittest -k <classname>

Replace <classname> with your scraper's class name in lowercase.

Edge Cases

Test with multiple recipes to catch potential edge cases.

Submit Changes

  1. Commit your work:

    git add -p  # Review changes
    git commit -m "Add scraper for example.com"
    git push origin site/website-name
    

  2. Create a pull request at recipe-scrapers