listjobs.json: args in running jobs and finished jobs #529

Open
@jpmckinney


#205 added the user-submitted args and settings to the listjobs.json response for pending jobs. We can consider doing the same for other jobs. I had started work on this, but noticed a few things to consider.

Running (and finished) jobs differ from pending jobs in that their args look like:

["/usr/bin/python", "-m", "scrapyd.runner", "crawl", "s2", "-s", "DOWNLOAD_DELAY=2", "-a", "arg1=val1"]
  • This is specific to the implementation of the default Launcher service. Other configurations might not have the same format. We need to be careful about hardcoding behavior that is specific to the default Launcher.
  • It contains details that are not user-submitted and are implementation-specific, like the Python path. The information might be hard for users to read or use, since it doesn't match what they submitted.
  • If we add args, we'll need to implement a migration in SqliteJobStorage, similar to what was in 1a0cb2b#diff-40a7dd64b23747429cf84c808a761ad8185bd2a1b96400a512800b7bb0ae6f8fR145-R152
    def ensure_insert_time_column(self):
        q = "SELECT sql FROM sqlite_master WHERE type='table' AND name='%s'" % self.table
        if 'insert_time TIMESTAMP' not in self.conn.execute(q).fetchone()[0]:
            q = "ALTER TABLE %s ADD COLUMN insert_time TIMESTAMP" % self.table
            self.conn.execute(q)
            q = "UPDATE %s SET insert_time=CURRENT_TIMESTAMP" % self.table
            self.conn.execute(q)
            self.conn.commit()
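
To make the first two points concrete, here is a rough sketch of recovering the spider, settings and args from a command line in the format shown above. `split_launcher_args` is a hypothetical helper, not something in Scrapyd, and it assumes the default Launcher's layout (everything up to `crawl` is launcher detail, followed by the spider name and `-s`/`-a` pairs). It returns `None` for anything else, which is exactly the case other Launcher configurations would hit. It also cannot tell user-submitted values apart from any that the launcher may add itself.

```python
def split_launcher_args(argv):
    """Split a command line like the one above into spider, settings and args.

    Hypothetical helper. Assumes the default Launcher's layout:
    [..., "crawl", <spider>, "-s", "NAME=VALUE", "-a", "NAME=VALUE", ...]
    Returns None if the command line does not match that layout.
    """
    try:
        start = argv.index("crawl")
    except ValueError:
        return None  # no "crawl" item: not the layout we assume

    rest = argv[start + 1:]
    if not rest:
        return None  # no spider name after "crawl"

    spider, options = rest[0], rest[1:]
    if len(options) % 2:
        return None  # options must come in flag/value pairs

    settings, args = {}, {}
    targets = {"-s": settings, "-a": args}
    for flag, pair in zip(options[::2], options[1::2]):
        target = targets.get(flag)
        if target is None or "=" not in pair:
            return None  # unknown flag or malformed NAME=VALUE
        name, value = pair.split("=", 1)
        target[name] = value

    return {"spider": spider, "settings": settings, "args": args}
```

Note that the Python path and the `-m scrapyd.runner` part are dropped, and that all values come back as strings, so the result still might not match what the user originally submitted.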

Opening issue for discussion.


Note: Running (and finished) jobs also store an env, but this should not be published via the API, because Scrapyd adds the main process's environment variables, which might contain secrets (e.g. an admin might have deployed Scrapyd with secrets in env vars that spiders need to log in to remote services, and Scrapyd users might not otherwise have access to those).
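
One way to guard against leaking the env is to build the response from an explicit allowlist of fields rather than serialising whatever the job carries. This is only a sketch: `PUBLIC_FIELDS` and `public_job` are hypothetical names, and the field list is an assumption, not Scrapyd's actual response format.

```python
# Hypothetical allowlist: only these keys are ever copied into the response.
PUBLIC_FIELDS = ("id", "project", "spider", "args", "settings")


def public_job(job):
    """Return a copy of `job` with only allowlisted keys, so `env` is never exposed."""
    return {key: job[key] for key in PUBLIC_FIELDS if key in job}
```

With an allowlist, a field added to the job later (like env) stays private unless someone opts it in.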
