Pipelines API Reference

Minimal, auto-generated API docs for pipelines. See README/Quickstart for usage.

Main pipelines

DbInsertPipeline

class scrapy_item_ingest.DbInsertPipeline(settings)

Bases: ItemsPipeline, RequestsPipeline

Main pipeline that combines item processing and request tracking by inheriting from both ItemsPipeline and RequestsPipeline.
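Because both parent classes derive from BasePipeline, DbInsertPipeline sits at the bottom of a diamond-shaped hierarchy. The sketch below uses empty stand-in classes (not the real scrapy_item_ingest code) that mirror only the documented "Bases:" relationships, to show the order in which Python would resolve methods under that layout.

```python
# Stand-in classes that mirror only the documented inheritance tree.
# The real classes live in scrapy_item_ingest; these bodies are placeholders.
class BasePipeline:
    def __init__(self, settings):
        self.settings = settings


class ItemsPipeline(BasePipeline):
    pass


class RequestsPipeline(BasePipeline):
    pass


class DbInsertPipeline(ItemsPipeline, RequestsPipeline):
    pass


# Python linearizes the diamond so that BasePipeline appears exactly once,
# after both intermediate classes.
mro_names = [cls.__name__ for cls in DbInsertPipeline.__mro__]
print(mro_names)
# ['DbInsertPipeline', 'ItemsPipeline', 'RequestsPipeline', 'BasePipeline', 'object']
```

Under this layout, a method defined on both parents (for example from_crawler, if ItemsPipeline also provided one) would be taken from ItemsPipeline first.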

__init__(settings)

classmethod from_crawler(crawler)

Create a pipeline instance from the crawler.

open_spider(spider)

Called when the spider is opened.

close_spider(spider)

Called when the spider is closed.

process_item(item, spider)

Process the item and store it in the database.

ItemsPipeline

class scrapy_item_ingest.ItemsPipeline(settings)

Bases: BasePipeline

Pipeline for handling scraped items.

process_item(item, spider)

Process the item and store it in the database.
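To illustrate the general shape of a process_item that stores items, here is a minimal sketch. It is not the ItemsPipeline implementation: it uses an in-memory SQLite database so it runs anywhere, and the class name SketchItemsPipeline and the two-column job_items layout are assumptions made for the example. The one part that follows Scrapy's documented contract is returning the item so later pipelines receive it.

```python
import json
import sqlite3
from types import SimpleNamespace


class SketchItemsPipeline:
    """Hypothetical stand-in; the real schema and database driver may differ."""

    def open_spider(self, spider):
        # In-memory database keeps the sketch self-contained.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE job_items (spider TEXT, item TEXT)")

    def process_item(self, item, spider):
        # Serialize the item as JSON and insert it with a parameterized query.
        self.conn.execute(
            "INSERT INTO job_items (spider, item) VALUES (?, ?)",
            (getattr(spider, "name", None), json.dumps(dict(item))),
        )
        self.conn.commit()
        return item  # Scrapy hands the returned item to the next pipeline

    def close_spider(self, spider):
        self.conn.close()


# Usage with a stand-in spider object (only .name is needed here).
spider = SimpleNamespace(name="demo")
pipeline = SketchItemsPipeline()
pipeline.open_spider(spider)
returned = pipeline.process_item({"title": "x"}, spider)
rows = pipeline.conn.execute("SELECT spider, item FROM job_items").fetchall()
pipeline.close_spider(spider)
```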

RequestsPipeline

class scrapy_item_ingest.RequestsPipeline(settings)

Bases: BasePipeline

Pipeline for handling request tracking.

__init__(settings)

classmethod from_crawler(crawler)

Create a pipeline instance from the crawler.

log_request(request, spider, response=None)

Log the request to the database, including response details when a response is supplied.

request_scheduled(request, spider)

Called when a request is scheduled. Records the request's start time.

response_received(response, request, spider)

Called when a response is received. Logs the request with complete information.
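The pairing of request_scheduled (record a start time) and response_received (log with complete information) can be sketched as a small timing tracker. This is an illustration only: TimingTracker, its string keys, and the injectable clock are hypothetical and are not part of scrapy_item_ingest, whose internal bookkeeping is not documented here.

```python
import time


class TimingTracker:
    """Hypothetical helper showing start-time tracking between two callbacks."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = {}  # request key -> start timestamp

    def request_scheduled(self, key):
        # Remember when the request was scheduled.
        self._started[key] = self._clock()

    def response_received(self, key):
        # Return elapsed seconds, or None if the request was never tracked.
        started = self._started.pop(key, None)
        if started is None:
            return None
        return self._clock() - started


# Usage with a fake clock so the result is deterministic.
ticks = iter([10.0, 12.5])
tracker = TimingTracker(clock=lambda: next(ticks))
tracker.request_scheduled("GET https://example.com/")
elapsed = tracker.response_received("GET https://example.com/")
print(elapsed)  # 2.5
```

Removing the entry when the response arrives keeps the dictionary from growing over a long crawl.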

Base class

class scrapy_item_ingest.pipelines.base.BasePipeline(settings)

Bases: object

Base pipeline with functionality common to all pipelines.

__init__(settings)

classmethod from_crawler(crawler)

Create a pipeline instance from the crawler.
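Scrapy calls from_crawler on a pipeline class to build the instance, passing the running crawler. Given that the constructor takes settings, a common pattern is to forward crawler.settings, as sketched below. ExamplePipeline is a hypothetical stand-in rather than BasePipeline itself, and the crawler is faked with SimpleNamespace so the sketch runs without Scrapy installed (a real crawler exposes a Settings object, not a plain dict).

```python
from types import SimpleNamespace


class ExamplePipeline:
    """Hypothetical pipeline; mirrors only the documented signatures."""

    def __init__(self, settings):
        self.settings = settings

    @classmethod
    def from_crawler(cls, crawler):
        # Forward the crawler's settings to the constructor.
        return cls(crawler.settings)


# Stand-in for a Scrapy crawler; the DB_URL value is a placeholder.
fake_crawler = SimpleNamespace(
    settings={"DB_URL": "postgresql://user:password@localhost/db"}
)
pipeline = ExamplePipeline.from_crawler(fake_crawler)
print(pipeline.settings["DB_URL"])
```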

open_spider(spider)

Called when the spider is opened.

close_spider(spider)

Called when the spider is closed.

get_identifier_info(spider)

Get the identifier column and its value for the spider.

Notes

  • Tables: job_items, job_requests, job_logs (created when CREATE_TABLES = True).

  • Configure DB via DB_URL or discrete fields (DB_HOST, DB_USER, etc.).

  • See the configuration page for all settings, and the extensions page for DB logging controls.
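As a rough illustration of the notes above, a project's settings.py might enable the main pipeline like this. ITEM_PIPELINES is Scrapy's standard setting. The priority 300, the PostgreSQL-style URL, and its credentials are placeholder assumptions, so check the configuration page for the accepted values.

```python
# settings.py (sketch; values are placeholders, not recommended defaults)

ITEM_PIPELINES = {
    # Priority 300 is an arbitrary choice; lower numbers run earlier.
    "scrapy_item_ingest.DbInsertPipeline": 300,
}

# Single connection URL (assumed format), used instead of discrete
# fields such as DB_HOST and DB_USER.
DB_URL = "postgresql://user:password@localhost:5432/scrapy_db"

# Create the job_items, job_requests, and job_logs tables.
CREATE_TABLES = True
```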