Quick Start
Get running in minutes.
1) Install
pip install scrapy-item-ingest
2) Enable (settings.py)
ITEM_PIPELINES = {
    'scrapy_item_ingest.DbInsertPipeline': 300,
}
EXTENSIONS = {
    'scrapy_item_ingest.LoggingExtension': 500,
}
# EITHER a single URL
DB_URL = 'postgresql://user:password@localhost:5432/database'
# OR discrete fields (no URL encoding needed)
# DB_HOST = 'localhost'
# DB_PORT = 5432
# DB_USER = 'user'
# DB_PASSWORD = 'password'
# DB_NAME = 'database'
# Optional
CREATE_TABLES = True
# JOB_ID = 1 # or omit to use spider name
3) Run
scrapy crawl your_spider
4) Verify
Data is written into these tables (created automatically when CREATE_TABLES = True):
job_items
job_requests
job_logs
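One way to confirm that rows arrived is to count them in each table. The sketch below is an illustration, not part of scrapy-item-ingest: count_rows is a hypothetical helper that works with any DB-API 2.0 connection, and the in-memory SQLite tables are placeholders with a made-up single-column schema so the snippet runs without a database server. Against a real PostgreSQL database you would pass a connection from your driver of choice instead.

```python
import sqlite3

# Table names taken from the list above.
TABLES = ("job_items", "job_requests", "job_logs")


def count_rows(conn, tables=TABLES):
    """Return {table: row count} using any DB-API 2.0 connection (hypothetical helper)."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from the fixed tuple above, never from user input.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    cur.close()
    return counts


# Stand-in demo: SQLite replaces PostgreSQL here, and the one-column
# schema is an assumption made only so the example can run.
demo = sqlite3.connect(":memory:")
for name in TABLES:
    demo.execute(f"CREATE TABLE {name} (id INTEGER PRIMARY KEY)")
demo.execute("INSERT INTO job_items DEFAULT VALUES")

row_counts = count_rows(demo)
print(row_counts)
```

A nonzero count in job_items after a crawl suggests the pipeline is writing data.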
5) Troubleshooting
Password contains @ or $? If you use DB_URL, percent-encode those characters (@ -> %40, $ -> %24).
Alternatively, use the discrete DB_* fields, which need no encoding.
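Rather than encoding by hand, you can let Python's standard library do it. This is a minimal sketch: build_db_url is a hypothetical helper (not part of scrapy-item-ingest), and the credentials are made-up examples.

```python
from urllib.parse import quote


def build_db_url(user, password, host, port, database):
    """Build a PostgreSQL URL with percent-encoded credentials (hypothetical helper)."""
    # safe="" tells quote() to encode every reserved character,
    # including "@", "$", ":" and "/".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


# Example password containing both @ and $ (made up for illustration).
DB_URL = build_db_url("user", "p@ss$word", "localhost", 5432, "database")
print(DB_URL)  # postgresql://user:p%40ss%24word@localhost:5432/database
```

Since settings.py is ordinary Python, the resulting value can be assigned to DB_URL there directly.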
That’s it.