#205 added the user-submitted `args` and `settings` to the listjobs.json response for pending jobs. We can consider doing the same for other jobs. I had started work on this, but noticed a few things to consider.
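For context, a pending entry in the listjobs.json response might then look roughly like this (a hedged sketch; only the `args` and `settings` keys come from the description above, and the surrounding fields and value types are assumptions):

```python
# Hedged sketch of a pending job entry in listjobs.json after #205.
# Only the "args" and "settings" keys are taken from the description above;
# the other keys, nesting, and value types are assumptions for illustration.
pending_job = {
    "id": "78391cc0fcaf11e1b0090800272a6d06",
    "project": "myproject",
    "spider": "s2",
    "args": {"arg1": "val1"},
    "settings": {"DOWNLOAD_DELAY": "2"},
}
```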
Running (and finished) jobs differ from pending jobs in that their `args` looks like:

```python
["/usr/bin/python", "-m", "scrapyd.runner", "crawl", "s2", "-s", "DOWNLOAD_DELAY=2", "-a", "arg1=val1"]
```
- This format is specific to the default Launcher service's implementation; other configurations might not produce it, so we need to be careful about hardcoding behavior that only applies to the default Launcher.
- It contains details that are not user-submitted and are implementation-specific, like the Python path. The information might be hard for users to read or use, since it doesn't match what they submitted.
- If we add `args`, we'll need to implement a migration in `SqliteJobStorage`, similar to what was done in 1a0cb2b#diff-40a7dd64b23747429cf84c808a761ad8185bd2a1b96400a512800b7bb0ae6f8fR145-R152:
```python
def ensure_insert_time_column(self):
    q = "SELECT sql FROM sqlite_master WHERE type='table' AND name='%s'" % self.table
    if 'insert_time TIMESTAMP' not in self.conn.execute(q).fetchone()[0]:
        q = "ALTER TABLE %s ADD COLUMN insert_time TIMESTAMP" % self.table
        self.conn.execute(q)
        q = "UPDATE %s SET insert_time=CURRENT_TIMESTAMP" % self.table
        self.conn.execute(q)
        self.conn.commit()
```
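By analogy, a migration for `args` might look something like this (a minimal sketch, assuming a single TEXT column and this method name; the real column type, serialization, and placement are open questions):

```python
def ensure_args_column(self):
    # Hypothetical migration modelled on ensure_insert_time_column above.
    # The column name, TEXT type, and how args are serialized are assumptions.
    q = "SELECT sql FROM sqlite_master WHERE type='table' AND name='%s'" % self.table
    if 'args TEXT' not in self.conn.execute(q).fetchone()[0]:
        q = "ALTER TABLE %s ADD COLUMN args TEXT" % self.table
        self.conn.execute(q)
        self.conn.commit()
```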
Opening issue for discussion.
Note: Running (and finished) jobs also store an `env`, but this should not be published via the API, because Scrapyd adds the main process' environment variables, which might contain secrets (e.g. an admin might have deployed Scrapyd with secrets in env vars that spiders need in order to log in to remote services, and Scrapyd users might not otherwise have access to those).
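A minimal sketch of the concern, assuming the stored `env` is built by merging per-job variables into the Scrapyd process' own environment (the exact merge is an assumption, not Scrapyd's actual code):

```python
import os

# If the stored env is roughly the Scrapyd process environment plus a few
# job-specific variables, it carries every secret the admin set for Scrapyd itself.
job_env = dict(os.environ, SCRAPY_PROJECT="myproject")  # assumed merge, for illustration
# Returning job_env from listjobs.json would expose e.g. any database password or
# API token in Scrapyd's environment to every API user, so it should stay private.
```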