Basic Recipe: Items + Requests + Logs
The fastest way to store items, requests, and Scrapy logs in PostgreSQL.
1) Install
pip install scrapy-item-ingest
2) Enable (settings.py)
ITEM_PIPELINES = {
'scrapy_item_ingest.DbInsertPipeline': 300,
}
EXTENSIONS = {
'scrapy_item_ingest.LoggingExtension': 500,
}
# Either one URL
DB_URL = 'postgresql://user:password@localhost:5432/database'
# or discrete fields (no URL encoding)
# DB_HOST = 'localhost'
# DB_PORT = 5432
# DB_USER = 'user'
# DB_PASSWORD = 'password'
# DB_NAME = 'database'
CREATE_TABLES = True # auto-create tables on first run
# JOB_ID = 1 # optional; spider name if omitted
3) Run
scrapy crawl your_spider
Expected tables
job_items: JSON itemsjob_requests: requests with parent/response_timejob_logs: spider + selected Scrapy lines
Tips
Password contains
@or$? In URLs encode:@->%40,$->%24.Prefer discrete DB fields to avoid encoding entirely.