
How To Develop a New Scraper

Under Construction

This section is being updated. Some information may be outdated or inaccurate.

Find a website

First, check if the website is already supported:

from recipe_scrapers import SCRAPERS
# Check if site is supported
print(SCRAPERS.get("bbcgoodfood.com"))
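
SCRAPERS maps each supported host name to its scraper class, so if you are unsure of the exact host string you can search the keys instead (the substring below is just an example):

from recipe_scrapers import SCRAPERS

# Print every supported host containing a given substring
print([host for host in SCRAPERS if "bbc" in host])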

Track Your Progress

Create an issue to track your work.

Setup Repository

Fork the recipe-scrapers repository on GitHub and follow these steps:

Quick Setup

# Clone your fork
git clone https://github.com/YOUR-USERNAME/recipe-scrapers.git
cd recipe-scrapers

# Set up Python environment
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate
python -m pip install --upgrade pip
pip install -e ".[all]"

Create a new branch:

git checkout -b site/website-name

Run Tests

python -m unittest

# Optional: Parallel testing
pip install unittest-parallel
unittest-parallel --level test

Generate Scraper Files

1. Select Recipe URL

Recipe Selection

Choose a recipe with multiple instruction steps when possible. A recipe that parses to a single instruction often indicates a parsing error rather than a genuinely one-step recipe, unless the scraper handles that case explicitly.
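
One quick way to apply this check is to fetch the page and count the parsed steps; a rough sketch (the URL is a placeholder, and some sites may need extra request headers):

from urllib.request import urlopen

from recipe_scrapers import scrape_html

url = "https://example.com/some-recipe"  # placeholder
html = urlopen(url).read().decode("utf-8")
scraper = scrape_html(html, url, wild_mode=True)
print(len(scraper.instructions_list()))  # more than one step is a good sign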

2. Check Schema Support

Test if the site uses Recipe Schema:

from recipe_scrapers import scrape_html

scraper = scrape_html(html, url, wild_mode=True)
print(scraper.schema.data)  # Empty dict if schema not supported
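
If the dictionary is populated, the individual schema accessors usually work as well; which fields are available depends on the site:

print(scraper.schema.title())
print(scraper.schema.total_time())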

3. Generate Files

python generate.py <ClassName> <URL>

This creates:

  • Scraper file in recipe_scrapers/
  • Test files in tests/test_data/<host>/
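
For reference, the generated scraper is a small class roughly along these lines (module, class, and host names are illustrative, and the generator's exact output may differ):

# recipe_scrapers/websitename.py (illustrative)
from ._abstract import AbstractScraper


class WebsiteName(AbstractScraper):
    @classmethod
    def host(cls):
        return "website-name.com"

    def title(self):
        return self.schema.title()

    def ingredients(self):
        return self.schema.ingredients()

    def instructions(self):
        return self.schema.instructions()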

Implementation

Check what the generated scraper returns for your saved HTML:

from recipe_scrapers import scrape_html

scraper = scrape_html(html, url)
print(scraper.title())
print(scraper.ingredients())

If a field is missing or wrong, override the corresponding method in your scraper class. For example, to take the title straight from the page markup:

def title(self):
    return self.soup.find("h1").get_text()
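
Fields that the schema does not expose can be read from the page markup via self.soup (a BeautifulSoup object). The selector below is purely hypothetical; inspect the site's HTML to find the right one:

def ingredients(self):
    # Hypothetical selector; adjust to the site's actual markup
    items = self.soup.select("li.ingredient")
    return [item.get_text(strip=True) for item in items]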

Testing

1. Update Test Data

Edit tests/test_data/<host>/test.json:

{
    "host": "<host>",
    "canonical_url": "...",
    "site_name": "...",
    "author": "...",
    "language": "...",
    "title": "...",
    "ingredients": "...",
    "instructions_list": "...",
    "total_time": "...",
    "yields": "...",
    "image": "...",
    "description": "..."
}
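
One way to fill in these values is to print what your scraper returns for the saved test page and copy the output across. A sketch, where the HTML file name and URL are placeholders for whatever generate.py created:

from recipe_scrapers import scrape_html

# Placeholder path and URL: use the HTML file saved next to test.json
with open("tests/test_data/<host>/test.testhtml", encoding="utf-8") as f:
    scraper = scrape_html(f.read(), "https://<host>/recipe-url")

for field in ("author", "title", "total_time", "yields", "image", "description"):
    print(field, getattr(scraper, field)())
print("ingredients", scraper.ingredients())
print("instructions_list", scraper.instructions_list())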

2. Run Tests

python -m unittest -k <classname>  # the lowercase form of your scraper's class name

Edge Cases

Test with multiple recipes to catch potential edge cases.
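
For example, you can save a few structurally different recipe pages locally and compare what the scraper returns for each; the file names and URLs below are placeholders:

from recipe_scrapers import scrape_html

pages = [
    ("simple_recipe.html", "https://example.com/simple-recipe"),
    ("multi_section_recipe.html", "https://example.com/multi-section-recipe"),
]

for path, url in pages:
    with open(path, encoding="utf-8") as f:
        scraper = scrape_html(f.read(), url)
    print(url, scraper.title(), len(scraper.ingredients()), len(scraper.instructions_list()))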

Submit Changes

  1. Commit your work:

    git add -p  # Review changes
    git commit -m "Add scraper for example.com"
    git push origin site/website-name
    

  2. Open a pull request against the upstream recipe-scrapers repository on GitHub.