Skip to content

Web API

Marc Mengel edited this page Apr 16, 2025 · 1 revision

Python API

Python API is the recommended way for client side Python applications to communicate with the Data Dispatcher server. To use the API, you need to install Data Dispatcher client module:

$ pip install --user datadispatcher

Then import the API module and create a DataDispatcherClient object:

from data_dispatcher.api import DataDispatcherClient
client = DataDispatcherClient("http://server.host.domain:8080/dd/data")
class data_dispatcher.api.DataDispatcherClient(server_url=None, auth_server_url=None, worker_id=None, worker_id_file=None, token=None, token_file=None, token_library=None, cpu_site='DEFAULT', timeout=300)

Initializes the DataDispatcherClient object

Keyword Arguments:
  • server_url (str) – The server endpoint URL. If unspecified, the value of the DATA_DISPATCHER_URL environment will be used

  • auth_server_url (str) – The endpoint URL for the Authentication server. If unspecified, the value of the DATA_DISPATCHER_AUTH_URL environment will be used

  • worker_id_file (str) – File path to read/store the worker ID. Default: <cwd>/.data_dispatcher_worker_id

  • worker_id (str) – Worker ID to use when reserving next file. If unspecified, will be read from the worker ID file.

  • cpu_site (str) – Name of the CPU site where the client is running, optional. Will be used when reserving project files.

  • timeout (float or int) – Number of seconds to wait for a response.

activate_project(project_id)

Resets the state of an abandoned project back to “active”

auth_info()

Returns information about current authentication token.

Returns:
  • str – username of the authenticated user

  • numeric – token expiration timestamp

cancel_project(project_id)

Cancels a project by id

Parameters:

project_id (str) – project id

Returns:

(dict) project information

copy_project(project_id, common_attributes={}, project_attributes={}, worker_timeout=None, idle_timeout=None)

Creates new project

Parameters:

project_id (int) – id of the project to copy

Keyword Arguments:
  • common_attributes (dict) – file attributes to override

  • project_attributes (dict) – project attributes to override

  • worker_timeout (int or float) – worker timeout to override

Returns:

(dict) new project information

create_project(files, common_attributes={}, project_attributes={}, query=None, worker_timeout=None, idle_timeout=259200, users=[], roles=[])

Creates new project

Parameters:
  • files (list) – Each item in the list is either a dictionary with keys: “namespace”, “name”, “attributes” (optional) or a string “namespace:name”

  • common_attributes (dict) – attributes to attach to each file, will be overridden by the individual file attribute values with the same key

  • project_attributes (dict) – attriutes to attach to the new project

  • query (str) – MQL query to be associated with the project. Thit attribute optiona and is not used by Data Dispatcher in any way. It is used for informational purposes only.

  • worker_timeout (int or float) – If not None, all file handles will be automatically released if allocated by same worker for longer than the worker_timeout seconds

  • idle_timeout (int or float) – If there is no file reserve/release activity for the specified time interval, the project goes into “abandoned” state. Default is 72 hours (3 days). If set to None, the project remains active until complete.

  • users (list of strings) – List of users who can use the worker interface (next_file, done, failed…), in addition to the project creator.

  • roles (list of strings) – List of roles, members of which are authorized to use the worker interface.

Returns:

new project information

Return type:

dict

file_done(project_id, did, worker_id=None)

Notifies Data Dispatcher that the file was successfully processed and should be marked as “done”.

Parameters:
  • project_id (int) – project id

  • did (str) – file DID (“<namespace>:<name>”)

file_failed(project_id, did, retry=True, worker_id=None)

Notifies Data Dispatcher that the file was successfully processed and should be marked as “done”.

Parameters:
  • project_id (int) – project id

  • did (str) – file DID (“<namespace>:<name>”)

get_file(namespace, name)

Deprecated

get_handle(project_id, namespace, name)

Gets information about a file handle

Parameters:
  • project_id (str) – project id

  • namespace (str) – file namespace

  • name (str) – file name

Returns:

(dict) file handle information or None if not found

get_project(project_id, with_files=True, with_replicas=False)

Gets information about the project

Parameters:

project_id (str) – project id

Keyword Arguments:
  • with_files (boolean) – whether to include iformation about project files. Default: True

  • with_replicas (boolean) – whether to include iformation about project file replicas. Default: False

Returns:

(dict) project information or None if project not found.

The dictionary will include the following values:

  • project_id: numeric, project id

  • owner: str, project owner username,

  • state: str, current project state,

  • attributes: dict, project metadata attributes as set by the create_project(),

  • created_timestamp: numeric, timestamp for the project creation time,

  • ended_timestamp: numeric or None, project end timestamp,

  • active: boolean, whether the project is active - at least one handle is not done or failed,

  • query: str, MQL query string associated with the project,

  • worker_timeout: numeric or None, worker idle timeout, in seconds

  • idle_timeout: numeric or None, project inactivity timeout in seconds

get_rse(name)

Returns information about RSE

Parameters:

name (str) – RSE name

Returns:

dictionary with RSE information or None if not found

list_handles(project_id, state=None, not_state=None, with_replicas=False)

Deprecated

list_projects(owner=None, state='active', not_state='abandoned', attributes=None, with_files=True, with_replicas=False)

Lists existing projects

Keyword Arguments:
  • owner (str) – Include only projects owned by the specified user. Default: all users

  • state (str) – Include only projects in specified state. Default: active only

  • not_state (str) – Exclude projects in the specified state. Default: exclude abandoned

  • attributes (dict) – Include only projects with specified attribute values. Default: do not filter by attributes

  • with_files (boolean) – Include information about files. Default: True

  • with_replicas (boolean) – Include information about file replics. Default: False

Returns:

list of dictionaries with information about projects selected

list_rses()

Return information about all RSEs

Args:

Returns:

list of dictionaries with RSE information

login_digest(username, password, save_token=False)

Performs password-based authentication and stores the authentication token locally.

Parameters:
  • username (str)

  • password (str) – Password is not sent over the network. It is hashed and then used for digest authentication (RFC 2617).

Returns:
  • str – username of the authenticated user (same as usernme argument)

  • numeric – token expiration timestamp

login_ldap(username, password)

Performs password-based authentication and stores the authentication token locally using LDAP.

Parameters:
  • username (str)

  • password (str) – Password

Returns:
  • str – username of the authenticated user (same as usernme argument)

  • numeric – token expiration timestamp

login_password(username, password)

Combines LDAP and RFC 2617 digest authentication by calling login_ldap first and then, if it fails, ldap_digest methods

Parameters:
  • username (str)

  • password (str) – Password

Returns:
  • str – username of the authenticated user (same as usernme argument)

  • numeric – token expiration timestamp

login_token(username, encoded_token)

Authenticate using a JWT or a SciToken.

Parameters:
  • username (str)

  • encoded_token (str or bytes)

Returns:
  • str – username of the authenticated user (same as usernme argument)

  • numeric – authentication expiration timestamp

login_x509(username, cert, key=None)

Performs X.509 authentication and stores the authentication token locally.

Parameters:
  • username (str)

  • cert (str) – Path to the file with the X.509 certificate or the certificate and private key

  • key (str) – Path to the file with the X.509 private key

Returns:
  • str – username of the authenticated user (same as usernme argument)

  • numeric – token expiration timestamp

new_worker_id(new_id=None, worker_id_file=None)

Sets or generates new worker ID to be used for next file allocation.

Keyword Arguments:
  • new_id (str or None) – New worker id to use. If None, a random worker_id will be generated.

  • worker_id_file (str or None) – Path to store the worker id. Default: <cwd>/.data_dispatcher_worker_id

Returns:

(str) assigned worker id

next_file(project_id, cpu_site=None, worker_id=None, timeout=None, stagger=10)

Reserves next available file from the project

Parameters:
  • project_id (int) – project id to reserve a file from

  • cpu_site (str) – optional, if specified, the file will be reserved according to the CPU/RSE proximity map

  • timeout (int or float) – optional, if specified, time to wait for a file to become available. Otherwise, will wait indefinitely

  • stagger (int or float) – optional, introduce a random delay between 0 and <stagger> seconds before sending first request. This will help mitigate the effect of synchronous stard of multiple workers. Default: 10

Returns:

Dictionary or boolean. If dictionary, the dictionary contains the reserved file information. “replicas” field will be a dictionary will contain a subdictionary with replicas information indexed by RSE name. If True: the request timed out, but can be retried. If False: the project has ended.

static random_worker_id(prefix='')

Static method to generate random worker id

reserved_handles(project_id, worker_id=None)

Returns list of file handles reserved in the project by given worker

Parameters:
  • project_id (int) – Project id

  • worker_id (str or None) – Worker id. If None, client’s worker id will be used

Returns:

list of dictionaries with the file handle information

restart_handles(project_id, done=False, failed=False, reserved=False, all=False, handles=[])

Restart processing of project file handles

Parameters:

project_id (int) – id of the project to restart

Keyword Arguments:
  • done (boolean) – default=False, restart done handles

  • reserved (boolean) – default=False, restart reserved handles

  • failed (boolean) – default=False, restart failed handles

  • all (boolean) – default=False, restart all handles

  • handles (list of DIDs) – default=[], restart specific handles

Returns:

(dict) project information

retry_request(method, url, timeout=None, **args)

Implements the functionality to retry on 503 response with random exponentially growing delay Use timemout = 0 to try the request exactly once Returns the response with status=503 on timeout

search_projects(search_query, owner=None, state='active', with_files=True, with_replicas=False)

Lists existing projects

Parameters:

search_query (str) – project search query in subset of MQL

Keyword Arguments:
  • owner (str) – Include only projects owned by the specified user. Default: all users

  • with_files (boolean) – Include information about files. Default: True

  • with_replicas (boolean) – Include information about file replics. Default: False

Returns:

list of dictionaries with information about projects found

set_rse_availability(name, available)

Changes RSE availability flag. The user must be an admin.

Parameters:
  • name (str) – RSE name

  • available (boolean) – RSE availability

Returns:

dictionary with updated RSE information or None if not found

version()

Returns the server version as a string

Clone this wiki locally