From c2ff942701a16113bcad0068045395d69388ccaa Mon Sep 17 00:00:00 2001
From: Josh Sumner <51797700+joshqsumner@users.noreply.github.com>
Date: Tue, 21 Jan 2025 09:41:14 -0600
Subject: [PATCH 1/4] bullet points

---
 docs/parallel_config.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/docs/parallel_config.md b/docs/parallel_config.md
index d45fb8477..59593f9cf 100644
--- a/docs/parallel_config.md
+++ b/docs/parallel_config.md
@@ -155,26 +155,26 @@ After defining the cluster, parameters are used to define the size of and reques
 environment. These settings are defined in the `cluster_config` parameter. We define by default the following
 parameters:
 
-**n_workers**: (int, required, default = 1): the number of workers/slots to request from the cluster. Because we
+* **n_workers**: (int, required, default = 1): the number of workers/slots to request from the cluster. Because we
 generally use 1 CPU per image analysis workflow, this is effectively the maximum number of concurrently running
 workflows.
 
-**cores**: (int, required, default = 1): the number of compute cores per workflow. This should be left as 1 unless a
+* **cores**: (int, required, default = 1): the number of compute cores per workflow. This should be left as 1 unless a
 workflow is designed to use multiple CPUs/cores/threads.
 
-**memory**: (str, required, default = "1GB"): the amount of memory/RAM used per workflow. Can be set as a number plus
+* **memory**: (str, required, default = "1GB"): the amount of memory/RAM used per workflow. Can be set as a number plus
 units (KB, MB, GB, etc.).
 
-**disk**: (str, required, default = "1GB"): the amount of disk space used per workflow. Can be set as a number plus
+* **disk**: (str, required, default = "1GB"): the amount of disk space used per workflow. Can be set as a number plus
 units (KB, MB, GB, etc.).
 
-**log_directory**: (str, optional, default = `None`): directory where worker logs are stored. Can be set to a path or
+* **log_directory**: (str, optional, default = `None`): directory where worker logs are stored. Can be set to a path or
 environmental variable.
 
-**local_directory**: (str, optional, default = `None`): dask working directory location. Can be set to a path or
+* **local_directory**: (str, optional, default = `None`): dask working directory location. Can be set to a path or
 environmental variable.
 
-**job_extra_directives**: (dict, optional, default = `None`): extra parameters sent to the scheduler. Specified as a dictionary
+* **job_extra_directives**: (dict, optional, default = `None`): extra parameters sent to the scheduler. Specified as a dictionary
 of key-value pairs (e.g. `{"getenv": "true"}`).
 
 !!! note

From 683695450a680c20d9cbdebb4abcc6b63e307cf2 Mon Sep 17 00:00:00 2001
From: Josh Sumner <51797700+joshqsumner@users.noreply.github.com>
Date: Tue, 21 Jan 2025 09:41:21 -0600
Subject: [PATCH 2/4] typo

---
 docs/parallel_config.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/parallel_config.md b/docs/parallel_config.md
index 59593f9cf..2a7eb1bfa 100644
--- a/docs/parallel_config.md
+++ b/docs/parallel_config.md
@@ -180,7 +180,7 @@ of key-value pairs (e.g. `{"getenv": "true"}`).
 
 !!! note
     `n_workers` is the only parameter used by `LocalCluster`, all others are currently ignored. `n_workers`, `cores`,
     `memory`, and `disk` are required by the other clusters. All other parameters are optional. Additional parameters
-    defined in the [dask-jobqueu API](https://jobqueue.dask.org/en/latest/api.html) can be supplied.
+    defined in the [dask-jobqueue API](https://jobqueue.dask.org/en/latest/api.html) can be supplied.
 
 ### Example

From faae6062cd49a210661cd842d163d6c6beec8b7b Mon Sep 17 00:00:00 2001
From: Josh Sumner <51797700+joshqsumner@users.noreply.github.com>
Date: Tue, 21 Jan 2025 09:53:01 -0600
Subject: [PATCH 3/4] change txt to json

---
 docs/parallel_config.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/parallel_config.md b/docs/parallel_config.md
index 2a7eb1bfa..01deb24c2 100644
--- a/docs/parallel_config.md
+++ b/docs/parallel_config.md
@@ -8,7 +8,7 @@ to run workflows in parallel.
 Create a configuration file from a template:
 
 ```bash
-plantcv-run-workflow --template my_config.txt
+plantcv-run-workflow --template my_config.json
 ```
 
 *class* **plantcv.parallel.WorkflowConfig**

From 478d380c3df89c17e9356dcd24757752e77fa04a Mon Sep 17 00:00:00 2001
From: Josh Sumner <51797700+joshqsumner@users.noreply.github.com>
Date: Tue, 21 Jan 2025 09:53:39 -0600
Subject: [PATCH 4/4] change txt to json and clarify where to find guidance
 for manual edits

---
 docs/pipeline_parallel.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/docs/pipeline_parallel.md b/docs/pipeline_parallel.md
index ce74ec07a..6d5a8c0c8 100644
--- a/docs/pipeline_parallel.md
+++ b/docs/pipeline_parallel.md
@@ -18,14 +18,15 @@ a configuration file can be edited and input.
 
 To create a configuration file, run the following:
 
 ```bash
-plantcv-run-workflow --template my_config.txt
+plantcv-run-workflow --template my_config.json
 ```
 
 The code above saves a text configuration file in JSON format using the built-in defaults for parameters. The
 parameters can be modified directly in Python as demonstrated in the
 [WorkflowConfig documentation](parallel_config.md). A configuration can be saved at any time using the `save_config`
 method to save for later use. Alternatively, open the saved config
-file with your favorite text editor and adjust the parameters as needed.
+file with your favorite text editor and adjust the parameters as needed (refer to the attributes section of
+[WorkflowConfig documentation](parallel_config.md) for details about each parameter).
 
 **Some notes on JSON format:**
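
For readers following this series: the `cluster_config` parameters that PATCH 1/4 reformats into a bulleted list can
also be set directly from Python. Below is a minimal sketch, assuming the `WorkflowConfig` class and `cluster_config`
attribute documented in `docs/parallel_config.md`; the `cluster` attribute name, the `HTCondorCluster` choice, and
the resource values are illustrative assumptions rather than project defaults.

```python
from plantcv import parallel

# Minimal sketch of defining cluster resources from Python. The cluster
# type and resource values are illustrative assumptions, not defaults.
config = parallel.WorkflowConfig()
config.cluster = "HTCondorCluster"  # assumed attribute selecting the cluster type
config.cluster_config = {
    "n_workers": 16,          # effectively the max number of concurrent workflows
    "cores": 1,               # leave at 1 unless a workflow uses multiple threads
    "memory": "2GB",          # RAM per workflow, a number plus units
    "disk": "2GB",            # disk space per workflow, a number plus units
    "log_directory": None,    # optional path or environment variable for worker logs
    "local_directory": None,  # optional dask working directory
    "job_extra_directives": {"getenv": "true"},  # extra scheduler key-value pairs
}
```

Under `LocalCluster` only `n_workers` would take effect; the remaining keys apply to the dask-jobqueue cluster types,
per the note amended in PATCH 2/4.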
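PATCH 3/4 and PATCH 4/4 rename the template file from `my_config.txt` to `my_config.json`. A sketch of the round trip
the updated docs describe (generate a template with `plantcv-run-workflow --template my_config.json`, adjust a
parameter, and write the file back) follows. Only `save_config` is named in the patched text; the `import_config`
loader is an assumption about the same `WorkflowConfig` API, used here for illustration.

```python
from plantcv import parallel

# Sketch: edit a JSON template previously generated by
#   plantcv-run-workflow --template my_config.json
# save_config is documented in the patched text; import_config is an
# assumed loader method on WorkflowConfig.
config = parallel.WorkflowConfig()
config.import_config(config_file="my_config.json")  # load the saved template
config.cluster_config["n_workers"] = 8              # adjust a parameter
config.save_config(config_file="my_config.json")    # write the JSON back out
```

Hand-editing the saved JSON, as PATCH 4/4 suggests, should be equivalent: each top-level key in the file corresponds
to a `WorkflowConfig` attribute, which is why the patch points readers to the attributes section of the WorkflowConfig
documentation.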