Recipe: Scrapy logs to DB
Store Zyte-like Scrapy log lines in the job_logs table with minimal noise.
1) Enable (settings.py)
EXTENSIONS = {
    'scrapy_item_ingest.LoggingExtension': 500,
}
# Minimum level to store
LOG_DB_LEVEL = 'INFO' # or 'DEBUG' for more
# Capture level for Scrapy (DB only; console unchanged)
LOG_DB_CAPTURE_LEVEL = 'DEBUG'
# Allowed namespaces (defaults include [spider.name, 'scrapy'])
# LOG_DB_LOGGERS = ['scrapy']
# Reduce noise (defaults already exclude scraper dumps)
# LOG_DB_EXCLUDE_LOGGERS = ['scrapy.core.scraper']
# LOG_DB_EXCLUDE_PATTERNS = ['Scraped from <']
# Database
DB_URL = 'postgresql://user:password@localhost:5432/database'
# or discrete fields
# DB_HOST = 'localhost'
# DB_PORT = 5432
# DB_USER = 'user'
# DB_PASSWORD = 'password'
# DB_NAME = 'database'
CREATE_TABLES = True
2) Run
scrapy crawl your_spider
Expected
job_logs contains start/close and Scrapy framework lines (e.g., Crawled, engine events).
No item-dump “Scraped from <…>” lines by default.
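To make the exclusion behaviour concrete, here is a minimal sketch of how logger-name and pattern exclusions could be applied with the standard library's logging.Filter. It is not the extension's actual implementation: the class name ExcludeFilter, the helper make_record, and the prefix-matching rule are assumptions made for illustration only.

```python
import logging


class ExcludeFilter(logging.Filter):
    """Illustrative only: drop records by logger-name prefix or message substring."""

    def __init__(self, exclude_loggers=(), exclude_patterns=()):
        super().__init__()
        self.exclude_loggers = tuple(exclude_loggers)
        self.exclude_patterns = tuple(exclude_patterns)

    def filter(self, record):
        # Assumption: an excluded logger also excludes its child loggers.
        for prefix in self.exclude_loggers:
            if record.name == prefix or record.name.startswith(prefix + "."):
                return False
        # Assumption: patterns are plain substrings of the formatted message.
        message = record.getMessage()
        return not any(pattern in message for pattern in self.exclude_patterns)


def make_record(name, msg):
    """Hypothetical helper that builds an INFO record for trying the filter."""
    return logging.LogRecord(name, logging.INFO, "example.py", 0, msg, None, None)


log_filter = ExcludeFilter(
    exclude_loggers=["scrapy.core.scraper"],
    exclude_patterns=["Scraped from <"],
)
kept = log_filter.filter(make_record("scrapy.core.engine", "Spider opened"))
dropped = log_filter.filter(make_record("scrapy.core.scraper", "Scraped from <200 ...>"))
```

Under these assumptions, kept is True and dropped is False, which mirrors the expectation that framework lines are stored while item dumps are not.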
Tips
To hide the telnet line, add 'scrapy.extensions.telnet' to LOG_DB_EXCLUDE_LOGGERS.
Use the discrete DB fields instead of DB_URL if your password has special characters, since those would otherwise need URL-encoding.
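If you would rather keep a single DB_URL, one option is to percent-encode the credentials before building the URL. The sketch below uses only the standard library. The function name build_db_url is hypothetical, and it assumes the extension hands DB_URL to a driver that decodes percent-encoded credentials, which is worth verifying for your setup.

```python
from urllib.parse import quote


def build_db_url(user, password, host, port, name):
    """Build a PostgreSQL URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" also encodes '/', '@', ':' and '#', which would otherwise break URL parsing.
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        name,
    )


# settings.py usage, with a made-up password for illustration
DB_URL = build_db_url("user", "p@ss:w/rd#1", "localhost", 5432, "database")
```

If the driver does not decode the credentials, the discrete DB_HOST / DB_USER / DB_PASSWORD fields remain the safer choice.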