deploy: f2d5c77
ErinWeisbart committed Oct 28, 2024
1 parent c9889aa commit a5158ae
Showing 8 changed files with 89 additions and 39 deletions.
Binary file added _images/sample_DCP_config_2.png
1 change: 1 addition & 0 deletions _sources/config_examples.md
@@ -46,6 +46,7 @@ Our internal configurations for each pipeline are as follows:
| EBS_VOL_SIZE (if using S3 mounted as a file system) | 22 | 22 | 22 | 22 | 22 | Files are read directly off of S3, mounted as a file system when `DOWNLOAD_FILES = False`. |
| EBS_VOL_SIZE (if downloading files) | 22 | 200 | 22 | 22 | 40 | Files are downloaded to the EBS volume when `DOWNLOAD_FILES = True`. |
| DOWNLOAD_FILES | 'False' | 'False' | 'False' | 'False' | 'False' | |
| ASSIGN_IP | 'False' | 'False' | 'False' | 'False' | 'False' | |
| DOCKER_CORES | 4 | 4 | 4 | 4 | 3 | If using c class machines and large images (2k + pixels) then you might need to reduce this number. |
| CPU_SHARES | DOCKER_CORES * 1024 | DOCKER_CORES * 1024 | DOCKER_CORES * 1024 | DOCKER_CORES * 1024 | DOCKER_CORES * 1024 | We never change this. |
| MEMORY | 7500 | 7500 | 7500 | 7500 | 7500 | This must match your machine type. m class use 15000, c class use 7500. |
24 changes: 18 additions & 6 deletions _sources/costs.md
@@ -2,18 +2,28 @@

Distributed-CellProfiler is run by a series of three commands, only one of which incurs costs at a typical scale of usage:

[`setup`](step_1_configuration.md) creates a queue in SQS and a cluster, service, and task definition in ECS.
ECS is entirely free.
SQS queues are free to create and use up to 1 million requests/month.

[`submitJobs`](step_2_submit_jobs.md) places messages in the SQS queue, which is free (under 1 million requests/month).

[`startCluster`](step_3_start_cluster.md) is the only command that incurs costs: it initiates your spot fleet request, creates machine alarms, and optionally creates a run dashboard.

The spot fleet is the major cost of running Distributed-CellProfiler; its exact price depends on the number of machines, the type of machines, and the duration of use.
Your bid is configured in the [config file](step_1_configuration.md).
Spot fleet costs can be reduced with a few simple configuration choices:

1) Optimize `MACHINE_TYPE` and `EBS_VOL_SIZE` based on the actual memory and hard drive needs of your run.
2) When possible, mount your S3 bucket using S3FS so that you can set `DOWNLOAD_FILES = 'False'` and avoid file egress costs.
See [Step 1 Configuration](step_1_configuration.md) for more information.
Data egress charges vary depending on factors such as whether traffic crosses AWS regions and/or availability zones, but are [often $0.08–$0.12 per GB](https://aws.amazon.com/blogs/apn/aws-data-transfer-charges-for-server-and-serverless-architectures/).
3) Set `ASSIGN_IP = 'False'` so that you don't pay for IPv4 addresses per EC2 instance in your spot fleet.
Public IPv4 costs are minimal ([$0.005/IP/hour as of February 1, 2024](https://aws.amazon.com/blogs/aws/new-aws-public-ipv4-address-charge-public-ip-insights/)) but there is no need to incur even this minimal cost unless you have a specific need for it.
See [Step 1 Configuration](step_1_configuration.md) for more information.
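To see how these choices add up, here is a rough cost estimate as a minimal Python sketch. `estimate_run_cost` is a hypothetical helper, not part of Distributed-CellProfiler. It assumes one public IPv4 address per instance and uses the example prices quoted above as defaults ($0.005/IP/hour for IPv4, and the low end of the $0.08 to $0.12 per GB egress range), so substitute current AWS prices for your region and machine type.

```python
def estimate_run_cost(
    n_machines: int,
    hours: float,
    spot_price_per_hour: float,
    egress_gb: float = 0.0,
    egress_price_per_gb: float = 0.08,  # assumption: low end of the quoted egress range
    assign_ip: bool = False,
    ipv4_price_per_hour: float = 0.005,  # public IPv4 price quoted above (Feb 2024)
) -> dict:
    """Return a hypothetical itemized cost estimate in USD for one run."""
    machine_hours = n_machines * hours
    # The spot fleet is billed per machine-hour at (at most) your bid price.
    fleet = machine_hours * spot_price_per_hour
    # Public IPv4 is billed per address per hour; one address per instance assumed.
    ipv4 = machine_hours * ipv4_price_per_hour if assign_ip else 0.0
    # Egress applies to downloaded data, e.g. when DOWNLOAD_FILES = 'True'.
    egress = egress_gb * egress_price_per_gb
    return {
        "fleet": round(fleet, 2),
        "ipv4": round(ipv4, 2),
        "egress": round(egress, 2),
        "total": round(fleet + ipv4 + egress, 2),
    }
```

For example, `estimate_run_cost(50, 10, 0.10, egress_gb=500, assign_ip=True)` (50 machines for 10 hours at a hypothetical $0.10/hour spot price) comes to about $92.50, of which $40 is egress and $2.50 is IPv4; those two items are the ones that `DOWNLOAD_FILES = 'False'` and `ASSIGN_IP = 'False'` are meant to avoid.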

Spot fleet costs can be minimized/stopped in multiple ways:

1) We encourage running [`monitor`](step_4_monitor.md) during your job to help minimize spot fleet cost: it automatically scales down your spot fleet request as your job queue empties and cancels the request when no jobs remain in the queue.
You can also have monitor downscale your fleet more aggressively by engaging Cheapest mode (see [more information here](step_4_monitor.md)).
2) If your job is finished, you can still initiate [`monitor`](step_4_monitor.md) to perform the same cleanup (without the automatic scaling).
@@ -23,14 +33,16 @@ Note that you can also perform a more aggressive downscaling of your fleet by mo
After the spot fleet has started, a Cloudwatch instance alarm is automatically placed on each instance in the fleet.
Cloudwatch instance alarms [are currently $0.10/alarm/month](https://aws.amazon.com/cloudwatch/pricing/).
Cloudwatch instance alarm costs can be minimized/stopped in multiple ways:

1) If you run monitor during your job, it will automatically delete Cloudwatch alarms for any instance that is no longer in use, once an hour while running and again at the end of the run.
2) If your job is finished, you can still initiate [`monitor`](step_4_monitor.md) to delete Cloudwatch alarms for any instance that is no longer in use.
3) In the [AWS Cloudwatch console](https://console.aws.amazon.com/cloudwatch/), you can delete unused alarms by going to Alarms => All alarms, changing Any State to Insufficient Data, selecting all alarms, and then choosing Actions => Delete.
4) We provide a [hygiene script](hygiene.md) that will clean up old alarms for you.
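The console steps in item 3 can also be scripted. The sketch below uses boto3's CloudWatch client (the `describe_alarms` paginator filtered with `StateValue`, and `delete_alarms`, which accepts at most 100 alarm names per call). The function names are hypothetical and this is not the hygiene script. It targets every alarm in `INSUFFICIENT_DATA` in the client's region, not only those created by Distributed-CellProfiler, so review the names before deleting. At $0.10/alarm/month, 200 stale alarms would cost about $20/month.

```python
from typing import Iterable, List


def find_insufficient_data_alarms(cloudwatch) -> List[str]:
    """Collect names of metric alarms currently in the INSUFFICIENT_DATA state."""
    names: List[str] = []
    paginator = cloudwatch.get_paginator("describe_alarms")
    for page in paginator.paginate(StateValue="INSUFFICIENT_DATA"):
        names.extend(alarm["AlarmName"] for alarm in page.get("MetricAlarms", []))
    return names


def delete_alarms_in_batches(cloudwatch, names: Iterable[str], batch_size: int = 100) -> int:
    """Delete alarms in batches, since DeleteAlarms takes up to 100 names per call."""
    names = list(names)
    for start in range(0, len(names), batch_size):
        cloudwatch.delete_alarms(AlarmNames=names[start:start + batch_size])
    return len(names)
```

With credentials configured, usage would look like `cw = boto3.client("cloudwatch")`, then `stale = find_insufficient_data_alarms(cw)`, and, after inspecting `stale`, `delete_alarms_in_batches(cw, stale)`.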

Cloudwatch Dashboards [are currently free](https://aws.amazon.com/cloudwatch/pricing/) for 3 Dashboards with up to 50 metrics per month and are $3 per dashboard per month after that.
Cloudwatch Dashboard costs can be minimized/prevented in multiple ways:

1) You can choose not to have Distributed-CellProfiler create a Dashboard by setting `CREATE_DASHBOARD = 'False'` in your [config file](step_1_configuration.md).
2) We encourage running [`monitor`](step_4_monitor.md) during your job: if you have set `CLEAN_DASHBOARD = 'True'` in your [config file](step_1_configuration.md), it will automatically delete your Dashboard when your job is done.
3) If your job is finished, you can still initiate [`monitor`](step_4_monitor.md) to perform the same cleanup (without the automatic scaling).
4) You can manually delete Dashboards in the [Cloudwatch Console](https://console.aws.amazon.com/cloudwatch/) by going to Dashboards, selecting your Dashboard, and selecting Delete.
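Dashboard cost and manual cleanup can be sketched the same way. `monthly_dashboard_cost` applies the pricing quoted above (3 free dashboards, then $3 per dashboard per month) and ignores the 50-metric allowance; `delete_dashboards_with_prefix` uses boto3's `list_dashboards` paginator and `delete_dashboards`. Both helpers are hypothetical, and the idea that your Dashboard name starts with a known prefix (for example your `APP_NAME`) is an assumption, not a documented DCP naming rule.

```python
from typing import List


def monthly_dashboard_cost(n_dashboards: int, free_dashboards: int = 3,
                           price_per_month: float = 3.0) -> float:
    """Dashboards beyond the free allowance are billed per dashboard per month."""
    return max(0, n_dashboards - free_dashboards) * price_per_month


def delete_dashboards_with_prefix(cloudwatch, prefix: str) -> List[str]:
    """Delete every dashboard whose name starts with `prefix`; return the names."""
    names: List[str] = []
    paginator = cloudwatch.get_paginator("list_dashboards")
    for page in paginator.paginate(DashboardNamePrefix=prefix):
        names.extend(entry["DashboardName"] for entry in page.get("DashboardEntries", []))
    if names:
        cloudwatch.delete_dashboards(DashboardNames=names)
    return names
```

Prefix matching is broad: a prefix of `MyApp` also matches a dashboard named `MyAppOld`, so choose the prefix carefully or list before deleting.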
24 changes: 18 additions & 6 deletions _sources/step_1_configuration.md
@@ -17,6 +17,7 @@ It need not be unique, but it should be descriptive enough that you can tell job
***

### AWS GENERAL SETTINGS

These are settings that will allow your instances to be configured correctly and access the resources they need; see [Step 0: Prep](step_0_prep.md) for more information.

Bucket configurations allow you to read from and write to different buckets in accounts other than the one where you are running DCP.
@@ -48,15 +49,23 @@ Distinct clusters for each job are not necessary, but if you're running multiple
* **MACHINE_PRICE:** How much you're willing to pay per hour for each machine launched.
AWS has a handy [price history tracker](https://console.aws.amazon.com/ec2sp/v1/spot/home) you can use to make a reasonable estimate of how much to bid.
If your jobs complete quickly and/or you don't need the data immediately you can reduce your bid accordingly; jobs that may take many hours to finish or that you need results from immediately may justify a higher bid.
See also [AWS on-demand pricing](https://aws.amazon.com/ec2/pricing/on-demand/) to compare the cost savings of using spot fleets.
* **EBS_VOL_SIZE:** The size of the temporary hard drive associated with each EC2 instance in GB.
The minimum allowed is 22.
If you have multiple Dockers running per machine, each Docker will have access to (EBS_VOL_SIZE/TASKS_PER_MACHINE) - 2 GB of space.
* **DOWNLOAD_FILES:** Whether or not to download the image files to the EBS volume before processing, as opposed to accessing them all from S3FS.
This typically requires a larger EBS volume (depending on the size of your image sets, and how many sets are processed per group), but avoids occasional issues with S3FS that can crop up on longer runs.
By default, DCP uses S3FS to mount the S3 `SOURCE_BUCKET` as a pseudo-file system on each EC2 instance in your spot fleet to avoid file download.
If you are unable to mount the `SOURCE_BUCKET` (perhaps because of a permissions issue) you should proceed with `DOWNLOAD_FILES = 'True'`.
* **ASSIGN_IP:** Whether or not to assign a public IPv4 address to each instance in the spot fleet.
If set to 'False', it overrides whatever is in the Fleet file.
If set to 'True', it respects whatever is in the Fleet file.
Distributed-CellProfiler originally defaulted to assigning an IP address to each instance so that one could connect to the instance for troubleshooting, but that need has been largely obviated by the level of logging currently in DCP.
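Two of the settings above reduce to simple rules that are easy to check before launching. The sketch below is illustrative only: `per_docker_disk_gb` applies the (EBS_VOL_SIZE/TASKS_PER_MACHINE) - 2 GB rule and the 22 GB minimum, and `apply_assign_ip` mimics the ASSIGN_IP override under the assumption that the Fleet file follows the EC2 `RequestSpotFleet` layout (`LaunchSpecifications` containing `NetworkInterfaces` with `AssociatePublicIpAddress`). Both function names are hypothetical and do not describe DCP's actual code.

```python
import copy


def per_docker_disk_gb(ebs_vol_size: float, tasks_per_machine: int) -> float:
    """Scratch space per Docker: (EBS_VOL_SIZE / TASKS_PER_MACHINE) - 2 GB."""
    if ebs_vol_size < 22:
        raise ValueError("EBS_VOL_SIZE must be at least 22 GB")
    if tasks_per_machine < 1:
        raise ValueError("TASKS_PER_MACHINE must be at least 1")
    return ebs_vol_size / tasks_per_machine - 2


def apply_assign_ip(fleet_config: dict, assign_ip: str) -> dict:
    """Return a copy of the fleet config with the ASSIGN_IP rule applied.

    Assumed layout: LaunchSpecifications -> NetworkInterfaces -> AssociatePublicIpAddress.
    """
    config = copy.deepcopy(fleet_config)  # never mutate the caller's Fleet file data
    if assign_ip == "False":
        # 'False' overrides the Fleet file: no public IPv4 on any interface.
        for spec in config.get("LaunchSpecifications", []):
            for nic in spec.get("NetworkInterfaces", []):
                nic["AssociatePublicIpAddress"] = False
    # 'True' leaves whatever the Fleet file specifies untouched.
    return config
```

For example, `EBS_VOL_SIZE = 200` with four Dockers per machine leaves each Docker 48 GB, which is worth comparing against the size of one image group when `DOWNLOAD_FILES = 'True'`.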

***

### DOCKER INSTANCE RUNNING ENVIRONMENT

* **DOCKER_CORES:** How many copies of your script to run in each Docker container.
* **CPU_SHARES:** How many CPUs each Docker container may have.
* **MEMORY:** How much memory each Docker container may have.
@@ -83,8 +92,9 @@ See [Step 0: Prep](step_0_prep.med) for more information.

***

### MONITORING

* **AUTO_MONITOR:** Whether or not to have Auto-Monitor automatically monitor your jobs.

***

@@ -111,6 +121,7 @@ Useful when trying to detect jobs that may have exported smaller corrupted files
***

### CELLPROFILER SETTINGS

* **ALWAYS_CONTINUE:** Whether or not to run CellProfiler with the --always-continue flag, which will keep CellProfiler from crashing if it errors.
Use with caution.
This can be particularly helpful in jobs where a large number of files are loaded in a single run (such as during illumination correction) so that a corrupted or missing file doesn't prevent the whole job from completing.
@@ -120,6 +131,7 @@ We suggest using this setting in conjunction with a small number of JOB_RETRIES.
***

### PLUGINS

* **USE_PLUGINS:** Whether or not you will be using external plugins from the CellProfiler-plugins repository.
When True, passes the `--plugins-directory` flag to CellProfiler.
Defaults to the current v1.0 `CellProfiler-plugins/active_plugins` location for plugins but will revert to the historical location of plugins in the `CellProfiler-plugins` root directory if the `active_plugins` folder is not present.
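How ALWAYS_CONTINUE and USE_PLUGINS translate into extra CellProfiler arguments can be sketched as follows. The flags `--always-continue` and `--plugins-directory` and the `active_plugins` fallback rule come from the descriptions above; the function names and the `plugins_root` location (wherever CellProfiler-plugins is cloned) are hypothetical, and this is not DCP's actual worker code.

```python
from pathlib import Path
from typing import List


def plugins_directory(plugins_root: Path) -> Path:
    """Prefer the v1.0 active_plugins folder; fall back to the repository root."""
    active = plugins_root / "active_plugins"
    return active if active.is_dir() else plugins_root


def extra_cellprofiler_flags(always_continue: str, use_plugins: str,
                             plugins_root: Path) -> List[str]:
    """Build the optional flags from string-valued config settings ('True'/'False')."""
    flags: List[str] = []
    if always_continue == "True":
        flags.append("--always-continue")
    if use_plugins == "True":
        flags += ["--plugins-directory", str(plugins_directory(plugins_root))]
    return flags
```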
@@ -147,7 +159,7 @@ If you need to use deprecated plugin organization you can access previous commit

### EXAMPLE CONFIGURATIONS

![Sample_Distributed-CellProfiler_Configuration_1](images/sample_DCP_config_1.png)

This is an example of one possible configuration.
It's a fairly large machine that is able to process 64 jobs at the same time.
@@ -159,9 +171,9 @@ The Config settings for this example are:

**DOCKER_CORES** = 4 (copies of CellProfiler to run inside a docker)
**CPU_SHARES** = 4096 (number of cores for each Docker * 1024)
**MEMORY** = 15000 (MB for each Docker)
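The relationship between these three settings, and what they imply for a whole machine, can be written out explicitly. This is a minimal sketch with hypothetical helper names. It assumes, as the DOCKER_CORES description suggests, that concurrent jobs per machine equal Dockers per machine times DOCKER_CORES, and it uses the ECS convention that 1024 CPU shares correspond to one vCPU. Under those assumptions, the 64 concurrent jobs in this example would imply 16 Dockers, 64 vCPUs, and 240,000 MB of memory for the containers.

```python
def docker_settings(docker_cores: int, memory_mb: int) -> dict:
    """Per-Docker settings; CPU_SHARES is always DOCKER_CORES * 1024."""
    return {"DOCKER_CORES": docker_cores,
            "CPU_SHARES": docker_cores * 1024,
            "MEMORY": memory_mb}


def machine_totals(settings: dict, dockers_per_machine: int) -> dict:
    """Aggregate what one machine must provide for all of its Dockers."""
    return {
        # Each Docker runs DOCKER_CORES copies of CellProfiler.
        "concurrent_jobs": settings["DOCKER_CORES"] * dockers_per_machine,
        # ECS convention: 1024 CPU shares = 1 vCPU.
        "vcpus": settings["CPU_SHARES"] * dockers_per_machine // 1024,
        "memory_mb": settings["MEMORY"] * dockers_per_machine,
    }
```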

![Sample_Distributed-CellProfiler_Configuration_2](images/sample_DCP_config_2.png)

This is an example of another possible configuration.
When we run Distributed-CellProfiler, we tend to prefer running a larger number of smaller machines.
@@ -175,4 +187,4 @@ The Config settings for this example are:

**DOCKER_CORES** = 4 (copies of CellProfiler to run inside a docker)
**CPU_SHARES** = 4096 (number of cores for each Docker * 1024)
**MEMORY** = 15000 (MB for each Docker)
50 changes: 29 additions & 21 deletions config_examples.html
@@ -594,167 +594,175 @@ <h1>config.py configuration examples<a class="headerlink" href="#config-py-confi
<td><p>‘False’</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p>ASSIGN_IP</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p>DOCKER_CORES</p></td>
<td><p>4</p></td>
<td><p>4</p></td>
<td><p>4</p></td>
<td><p>4</p></td>
<td><p>3</p></td>
<td><p>If using c class machines and large images (2k + pixels) then you might need to reduce this number.</p></td>
</tr>
<tr class="row-even"><td><p>CPU_SHARES</p></td>
<td><p>DOCKER_CORES * 1024</p></td>
<td><p>DOCKER_CORES * 1024</p></td>
<td><p>DOCKER_CORES * 1024</p></td>
<td><p>DOCKER_CORES * 1024</p></td>
<td><p>DOCKER_CORES * 1024</p></td>
<td><p>We never change this.</p></td>
</tr>
<tr class="row-odd"><td><p>MEMORY</p></td>
<td><p>7500</p></td>
<td><p>7500</p></td>
<td><p>7500</p></td>
<td><p>7500</p></td>
<td><p>7500</p></td>
<td><p>This must match your machine type. m class use 15000, c class use 7500.</p></td>
</tr>
<tr class="row-even"><td><p>SECONDS_TO_START</p></td>
<td><p>60</p></td>
<td><p>3*60</p></td>
<td><p>60</p></td>
<td><p>3*60</p></td>
<td><p>3*60</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p>SQS_QUEUE_NAME</p></td>
<td><p>APP_NAME + ‘Queue’</p></td>
<td><p>APP_NAME + ‘Queue’</p></td>
<td><p>APP_NAME + ‘Queue’</p></td>
<td><p>APP_NAME + ‘Queue’</p></td>
<td><p>APP_NAME + ‘Queue’</p></td>
<td><p>We never change this.</p></td>
</tr>
<tr class="row-even"><td><p>SQS_MESSAGE_VISIBILITY</p></td>
<td><p>3*60</p></td>
<td><p>240*60</p></td>
<td><p>15*60</p></td>
<td><p>10*60</p></td>
<td><p>120*60</p></td>
<td><p>About how long you expect a job to take * 1.5 in seconds</p></td>
</tr>
<tr class="row-odd"><td><p>SQS_DEAD_LETTER_QUEUE</p></td>
<td><p>‘YOURNAME_DEADMESSAGES’</p></td>
<td><p>‘YOURNAME_DEADMESSAGES’</p></td>
<td><p>‘YOURNAME_DEADMESSAGES’</p></td>
<td><p>‘YOURNAME_DEADMESSAGES’</p></td>
<td><p>‘YOURNAME_DEADMESSAGES’</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p>JOB_RETRIES</p></td>
<td><p>3</p></td>
<td><p>3</p></td>
<td><p>3</p></td>
<td><p>3</p></td>
<td><p>3</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p>AUTO_MONITOR</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>Can be turned off if manually running Monitor.</p></td>
</tr>
<tr class="row-even"><td><p>CREATE_DASHBOARD</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p></p></td>
</tr>
<tr class="row-odd"><td><p>CLEAN_DASHBOARD</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p></p></td>
</tr>
<tr class="row-even"><td><p>CHECK_IF_DONE_BOOL</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>‘True’</p></td>
<td><p>Can be turned off if wanting to overwrite old data.</p></td>
</tr>
<tr class="row-odd"><td><p>EXPECTED_NUMBER_FILES</p></td>
<td><p>1 (an image)</p></td>
<td><p>number channels + 1 (an .npy for each channel and isdone)</p></td>
<td><p>3 (Experiment.csv, Image.csv, and isdone)</p></td>
<td><p>1 (an image)</p></td>
<td><p>5 (Experiment, Image, Cells, Nuclei, and Cytoplasm .csvs)</p></td>
<td><p>Better to underestimate than overestimate.</p></td>
</tr>
<tr class="row-even"><td><p>MIN_FILE_SIZE_BYTES</p></td>
<td><p>1</p></td>
<td><p>1</p></td>
<td><p>1</p></td>
<td><p>1</p></td>
<td><p>1</p></td>
<td><p>Count files of any size.</p></td>
</tr>
<tr class="row-odd"><td><p>NECESSARY_STRING</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>Not necessary for standard workflows.</p></td>
</tr>
<tr class="row-even"><td><p>ALWAYS_CONTINUE</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>Use with caution.</p></td>
</tr>
<tr class="row-odd"><td><p>USE_PLUGINS</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>Not necessary for standard workflows.</p></td>
</tr>
<tr class="row-even"><td><p>UPDATE_PLUGINS</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>Not necessary for standard workflows.</p></td>
</tr>
<tr class="row-odd"><td><p>PLUGINS_COMMIT</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>Not necessary for standard workflows.</p></td>
</tr>
<tr class="row-even"><td><p>INSTALL_REQUIREMENTS</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>‘False’</p></td>
<td><p>Not necessary for standard workflows.</p></td>
</tr>
<tr class="row-odd"><td><p>REQUIREMENTS_FILE</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>
<td><p>‘’</p></td>