How To Develop a New Scraper
Under Construction
This section is being updated. Some information may be outdated or inaccurate.
Find a website
First, check if the website is already supported:
- Check the Supported Sites list
- Or verify programmatically:
from recipe_scrapers import SCRAPERS
# Check if site is supported
print(SCRAPERS.get("bbcgoodfood.com"))
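When starting from a full recipe URL rather than a bare host name, the host has to be extracted before the lookup above can be used. The following is a minimal sketch, assuming registry keys are bare host names without a `www.` prefix (as in the `bbcgoodfood.com` example). `host_from_url` and `is_supported` are hypothetical helpers, not part of recipe_scrapers, and the `registry` dict is a stand-in for `SCRAPERS`.

```python
from urllib.parse import urlparse


def host_from_url(url: str) -> str:
    """Return the lowercase host of a URL with any leading 'www.' removed."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def is_supported(url: str, registry: dict) -> bool:
    """Check whether the URL's host is a key in the given registry mapping."""
    return host_from_url(url) in registry


# Stand-in for recipe_scrapers.SCRAPERS (assumption: keys are host names).
registry = {"bbcgoodfood.com": object}

print(is_supported("https://www.bbcgoodfood.com/recipes/example", registry))
print(is_supported("https://example.org/recipes/example", registry))
```

In a real checkout, passing `SCRAPERS` as the `registry` argument should give the same kind of answer as the direct lookup.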
Track Your Progress
Create an issue in the recipe-scrapers repository to track your work.
Setup Repository
Fork the recipe-scrapers repository on GitHub and follow these steps:
Quick Setup
Create a new branch:
Run Tests
Generate Scraper Files
1. Select Recipe URL
Recipe Selection
Choose a recipe with multiple instructions when possible. Single-instruction recipes may indicate parsing errors, unless explicitly handled.
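One way to spot the single-instruction case early is a small check on the parsed steps. This is a minimal sketch; `split_steps` and `has_multiple_steps` are hypothetical helpers, and the code assumes the instructions arrive either as a list of steps or as a newline-separated string.

```python
def split_steps(instructions) -> list:
    """Normalize instructions (list or newline-separated string) to non-empty steps."""
    if isinstance(instructions, str):
        instructions = instructions.splitlines()
    return [step.strip() for step in instructions if step.strip()]


def has_multiple_steps(instructions) -> bool:
    """Return True when more than one non-empty step is present."""
    return len(split_steps(instructions)) > 1


# A single block of text counts as one step and is worth a second look.
print(has_multiple_steps("Mix everything and bake."))
print(has_multiple_steps(["Mix the batter.", "Bake for 30 minutes."]))
```

A `False` result does not prove a parsing error, since some recipes really do have one step; it only marks the recipe as a weaker choice for test data.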
2. Check Schema Support
Test whether the site uses Recipe Schema:
from recipe_scrapers import scrape_html
# html: the page source as a string; url: the address it was fetched from
scraper = scrape_html(html, url, wild_mode=True)
print(scraper.schema.data)  # Empty dict if schema not supported
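For a quick look at a saved page without involving the library, the JSON-LD blocks can be inspected with the standard library alone. This is a sketch of an independent check, not how recipe_scrapers detects schema support; `JsonLdCollector` and `has_recipe_json_ld` are hypothetical names. It covers only JSON-LD markup, so a site that embeds Recipe Schema as microdata would not be detected.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def _contains_recipe(node) -> bool:
    """Recursively look for an object whose @type is, or includes, 'Recipe'."""
    if isinstance(node, dict):
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if isinstance(types, list) and "Recipe" in types:
            return True
        return any(_contains_recipe(value) for value in node.values())
    if isinstance(node, list):
        return any(_contains_recipe(item) for item in node)
    return False


def has_recipe_json_ld(html: str) -> bool:
    """Return True if any JSON-LD block in the page describes a Recipe."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # Skip malformed JSON-LD rather than failing the check.
        if _contains_recipe(data):
            return True
    return False


sample_html = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Recipe", "name": "Soup"}'
    "</script></head><body></body></html>"
)
print(has_recipe_json_ld(sample_html))
```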
3. Generate Files
This creates:
- Scraper file in recipe_scrapers/
- Test files in tests/test_data/<host>/
Implementation
Testing
1. Update Test Data
Edit tests/test_data/<host>/test.json:
{
  "host": "<host>",
  "canonical_url": "...",
  "site_name": "...",
  "author": "...",
  "language": "...",
  "title": "...",
  "ingredients": "...",
  "instructions_list": "...",
  "total_time": "...",
  "yields": "...",
  "image": "...",
  "description": "..."
}
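Before running the test suite, it can help to confirm that no field from the template was dropped or left as a placeholder. This is a minimal sketch: the key list is copied from the template above, whether every key is mandatory for every site is an assumption, and `missing_keys` and `unfilled_keys` are hypothetical helpers rather than part of the project's tooling.

```python
import json

# Keys copied from the test.json template.
EXPECTED_KEYS = (
    "host", "canonical_url", "site_name", "author", "language", "title",
    "ingredients", "instructions_list", "total_time", "yields", "image",
    "description",
)
PLACEHOLDER = "..."


def missing_keys(test_data: dict) -> list:
    """Return template keys that are absent from the test data."""
    return [key for key in EXPECTED_KEYS if key not in test_data]


def unfilled_keys(test_data: dict) -> list:
    """Return keys whose value is still the '...' placeholder."""
    return [key for key, value in test_data.items() if value == PLACEHOLDER]


# Inline sample standing in for the contents of a real test.json file.
sample = json.loads('{"host": "example.com", "title": "Example Soup", "author": "..."}')
print("missing:", missing_keys(sample))
print("unfilled:", unfilled_keys(sample))
```

For a real file, the `sample` dict would come from `json.load` on the opened test.json instead of the inline string.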
2. Run Tests
Edge Cases
Test with multiple recipes to catch potential edge cases.
Submit Changes
- Commit your work.
- Create a pull request at recipe-scrapers.