How To Develop a New Scraper
Under Construction
This section is being updated. Some information may be outdated or inaccurate.
Find a website
First, check if the website is already supported:
- Check the Supported Sites
- Or verify programmatically:
from recipe_scrapers import SCRAPERS
# Check if site is supported
print(SCRAPERS.get("bbcgoodfood.com"))
Track Your Progress
Create an issue to track your work.
Setup Repository
Fork the recipe-scrapers repository on GitHub and follow these steps:
Quick Setup
Create a new branch:
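For example, assuming you have already cloned your fork locally (the branch name is only a suggestion):

```bash
# Work on a dedicated branch for the new scraper
git checkout -b scraper/<host>
```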
Run Tests
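Before changing anything, run the existing test suite to confirm your environment works. The tests are unittest-based, so from the repository root something along these lines should work (the exact invocation may differ between versions):

```bash
python -m unittest
```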
Generate Scraper Files
1. Select Recipe URL
Recipe Selection
Choose a recipe with multiple instruction steps when possible. A scraper that returns only a single instruction often indicates a parsing error, unless that case is explicitly handled.
2. Check Schema Support
Test if the site uses Recipe Schema:
from urllib.request import urlopen
from recipe_scrapers import scrape_html
url = "https://example.com/your-recipe"
html = urlopen(url).read().decode("utf-8")
scraper = scrape_html(html, url, wild_mode=True)
print(scraper.schema.data) # Empty dict if schema not supported
3. Generate Files
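The generator script lives at the repository root. The exact arguments may differ between versions, but the invocation is typically the new scraper's class name followed by the recipe URL:

```bash
python generate.py <ClassName> <URL>
```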
<URL> should be the recipe page you selected in the first step. The script downloads this recipe and uses it to create the initial test data.
This creates:
- a scraper file in recipe_scrapers/
- test data files in tests/test_data/<host>/
Implementation
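As a rough sketch, a new scraper class usually looks something like the following (the module, class name, and host are illustrative; on sites that expose Schema.org data, most methods can simply delegate to self.schema):

```python
# recipe_scrapers/examplesite.py (illustrative file name)
from ._abstract import AbstractScraper


class ExampleSite(AbstractScraper):
    @classmethod
    def host(cls):
        # Hostname the scraper is registered for
        return "examplesite.com"

    def author(self):
        return self.schema.author()

    def title(self):
        return self.schema.title()

    def total_time(self):
        return self.schema.total_time()

    def yields(self):
        return self.schema.yields()

    def image(self):
        return self.schema.image()

    def ingredients(self):
        return self.schema.ingredients()

    def instructions(self):
        return self.schema.instructions()

    def description(self):
        return self.schema.description()
```

If the generator has not already done so, the new class typically also needs to be imported and registered in the SCRAPERS dictionary in recipe_scrapers/__init__.py. Sites without usable schema data instead implement these methods against self.soup (the parsed BeautifulSoup document).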
Testing
1. Update Test Data
Edit tests/test_data/<host>/test.json:
{
"host": "<host>",
"canonical_url": "...",
"site_name": "...",
"author": "...",
"language": "...",
"title": "...",
"ingredients": "...",
"instructions_list": "...",
"total_time": "...",
"yields": "...",
"image": "...",
"description": "..."
}
Test Data Population Help
The HTML file generated by generate.py can be used to help you fill in the required fields within the test JSON file:
from pathlib import Path
from recipe_scrapers import scrape_html
import json
html = Path("tests/test_data/<host>/<TestFileName>.testhtml").read_text(encoding="utf-8")
scraper = scrape_html(html, "<URL>")
print(json.dumps(scraper.to_json(), indent=2, ensure_ascii=False))
This prints the scraper's output to your terminal for reference while filling in the test JSON.
2. Run Tests
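Run the test suite again so that your new test data is picked up; with unittest you can optionally filter tests by a name pattern using -k, for example:

```bash
python -m unittest -k <ClassName>
```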
Edge Cases
Test with several different recipes from the site to catch potential edge cases, such as optional or missing fields.
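One informal way to spot-check additional recipes before turning them into test data is to run the new scraper against live pages (the URL below is a placeholder):

```python
from urllib.request import urlopen

from recipe_scrapers import scrape_html

# Placeholder: substitute another recipe from the site you are adding
url = "https://<host>/another-recipe"
html = urlopen(url).read().decode("utf-8")

scraper = scrape_html(html, url)
# Spot-check fields that commonly expose edge cases
print(scraper.title())
print(scraper.total_time())
print(len(scraper.ingredients()), "ingredients")
print(len(scraper.instructions_list()), "instruction steps")
```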
Submit Changes
- Commit your work and push it to your fork (a typical command sequence is sketched below).
- Create a pull request at the recipe-scrapers repository on GitHub.
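A typical commit-and-push sequence might look like this (file names, commit message, and branch name are placeholders):

```bash
git add recipe_scrapers/<newscraper>.py tests/test_data/<host>/
git commit -m "Add scraper for <host>"
git push origin scraper/<host>
```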