Background
ZyteJobsComparisonMonitor is a monitor that compares item_scraped_count from the current job against past jobs from ScrapyCloud.
By default, it fetches the number of past jobs given by SPIDERMON_JOBS_COMPARISON, and these jobs can be filtered using:
SPIDERMON_JOBS_COMPARISON_STATES (only keep jobs that have those states)
SPIDERMON_JOBS_COMPARISON_TAGS (only keep jobs with those tags if they are present in the current job too)
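For reference, the existing filters might be configured along these lines in a project's settings.py. This is only an illustrative sketch: the values (5 jobs, the "production" tag) are assumptions chosen for the example, not recommended defaults.

```python
# settings.py (illustrative values only)

# Number of past jobs to compare the current job against.
SPIDERMON_JOBS_COMPARISON = 5

# Only keep past jobs that are in one of these states.
SPIDERMON_JOBS_COMPARISON_STATES = ("finished",)

# Only keep past jobs carrying these tags, provided the current job has them too.
# "production" is a made-up tag for this example.
SPIDERMON_JOBS_COMPARISON_TAGS = ("production",)
```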
This works, but it's common to need more filtering options in practice.
In particular, it would be useful to filter by close_reason rather than state.
state can be "finished", "running", "pending", or "deleted", which is not very helpful because we mostly want to compare against "finished" jobs. close_reason, on the other hand, would let us keep only past jobs that finished successfully and exclude failed or banned ones, whose truncated item counts we would not want to compare the current job against.
Proposal
Add a new SPIDERMON_JOBS_COMPARISON_CLOSE_REASONS setting to allow ZyteJobsComparisonMonitor to filter by the ScrapyCloud jobs' close_reason stat.
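A rough sketch of how such a filter could behave is shown below. Everything here is hypothetical: the setting does not exist yet, filter_jobs_by_close_reason is a made-up helper rather than part of Spidermon, and the job dictionaries (with key, close_reason, and items fields) are an assumed shape used only to make the example self-contained.

```python
# Hypothetical setting proposed in this issue; not an existing Spidermon setting.
SPIDERMON_JOBS_COMPARISON_CLOSE_REASONS = ("finished",)


def filter_jobs_by_close_reason(jobs, close_reasons):
    """Keep only the jobs whose close_reason is in close_reasons.

    An empty close_reasons value disables the filter and keeps every job.
    """
    if not close_reasons:
        return list(jobs)
    allowed = set(close_reasons)
    return [job for job in jobs if job.get("close_reason") in allowed]


# Made-up past jobs; the dictionary shape is an assumption for illustration.
past_jobs = [
    {"key": "1/2/3", "close_reason": "finished", "items": 1000},
    {"key": "1/2/4", "close_reason": "banned", "items": 12},
    {"key": "1/2/5", "close_reason": "failed", "items": 40},
    {"key": "1/2/6", "close_reason": "finished", "items": 980},
]

kept = filter_jobs_by_close_reason(
    past_jobs, SPIDERMON_JOBS_COMPARISON_CLOSE_REASONS
)
print([job["key"] for job in kept])
```

With the filter in place, only the two jobs that closed with "finished" would remain in the comparison set, so the truncated counts from the banned and failed runs would no longer skew the comparison.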