
CP Headless Only Runs 1 Job at a Time #12

Open
roshankern opened this issue May 3, 2023 · 2 comments

@roshankern

CellProfiler headless (command line) seems to run only one job at a time, whereas the GUI version uses all available cores.
The tests below were run with CellProfiler 4.2.4.

Running a CellProfiler analysis headless on images from cell-health-data produces the following system usage:
[Screenshot: system usage during the headless run]

Running the same pipeline on the same images from the GUI produces the following system usage:
[Screenshot: system usage during the GUI run]
(Note the multiple workers reported in the command-line output for the GUI version.)

Beth Cimini mentions this aspect of CellProfiler in "Getting started using CellProfiler from the command line" (April 30, 2021):

Running CellProfiler from the command line has a major advantage — you don't need to spend computational power or memory creating the graphical user interface (GUI) that you're used to using in CellProfiler! It also has a major potential disadvantage in that while CellProfiler running graphically on your desktop will automatically use as many cores on your computer as you permit, CellProfiler running headlessly from the command line will only run one job at a time.

@jenna-tomkinson

jenna-tomkinson commented May 3, 2023

@roshankern

Thank you for bringing this up! I did not see the extent of the disadvantage until now.

I think the main upside of headless mode is that you can run multiple pipelines in a row using a bash script or the Python function that I have. I have not found a way to do this through the GUI.

@gwaybio I have two ideas for working around this issue with the GUI:

  1. If you are using the Images, Metadata, NamesAndTypes, and Groups modules to input data, you could make a directory in which each plate's images sit in a separate folder and use the Groups module to group by plate, so that each plate's images are run through the pipeline separately. You could then run the analysis with the GUI.

  2. If you are using the LoadData module, you could combine the LoadData CSVs for each plate into one file and group by plate, which would have the same effect as the first option.
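The second idea could be sketched roughly as below, assuming each per-plate LoadData CSV has the same columns and already carries a plate column. The column name `Metadata_Plate` and the function name `combine_load_data_csvs` are assumptions made for illustration; they are not taken from this repo, and like the idea itself this has not been tested against CellProfiler.

```python
import csv


def combine_load_data_csvs(csv_paths, output_path, plate_column="Metadata_Plate"):
    """Concatenate per-plate LoadData CSVs into one file.

    `plate_column` is an assumed column name; use whichever metadata
    column the pipeline is meant to group by.
    Returns the number of data rows written.
    """
    header = None
    rows = []
    for path in csv_paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle)
            # Every CSV needs the plate column, or grouping by plate cannot work.
            if plate_column not in (reader.fieldnames or []):
                raise ValueError(f"{path} has no {plate_column!r} column")
            if header is None:
                header = reader.fieldnames
            elif reader.fieldnames != header:
                raise ValueError(f"{path} has different columns than the first CSV")
            rows.extend(reader)

    if header is None:
        raise ValueError("no CSV files were given")

    with open(output_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=header)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The combined file would then be the one passed to LoadData, with grouping set on the plate column.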

I have not tried this before, and it is unfortunate that there is such a large difference between GUI and headless. I can look into opening an issue on the CellProfiler repository about this, since it would make sense for headless to handle jobs as well as the GUI does.

jenna-tomkinson added the bug and documentation labels May 3, 2023
@jenna-tomkinson

@gwaybio @roshankern

Further investigation shows that CellProfiler limits headless mode to one job at a time. We will have to work around this, since I have not found an answer or any evidence that you can output multiple SQLite files from the CellProfiler GUI.

I have developed a Python utility file with functions to run the CellProfiler CLI in parallel using multiprocessing. This will be added to a private repo, and I will use it to refactor the analysis module in this repo later. Once that function is added, I will close this issue.
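The utility itself is not shown in this thread. As a rough sketch of the general approach only, one headless CellProfiler process can be launched per plate, with several running at once. The function names `build_command` and `run_plates_in_parallel` are hypothetical. This sketch also uses a thread pool to start separate `cellprofiler` subprocesses instead of Python's `multiprocessing` module, because each CellProfiler run is already its own OS process. The flags used are `-c` (no GUI), `-r` (run the pipeline), `-p` (pipeline file), `-i` (input folder), and `-o` (output folder).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(pipeline, plate_dir, output_dir, executable="cellprofiler"):
    """Build one headless CellProfiler invocation for a single plate."""
    return [
        executable,
        "-c",  # run without the GUI
        "-r",  # run the pipeline immediately
        "-p", str(pipeline),    # pipeline file (.cppipe)
        "-i", str(plate_dir),   # folder holding this plate's images
        "-o", str(output_dir),  # separate output folder per plate
    ]


def run_plates_in_parallel(commands, max_workers=4, runner=subprocess.run):
    """Run each command as its own process, at most `max_workers` at a time.

    Returns the exit codes in the same order as `commands`.
    `runner` is injectable so the function can be exercised without
    CellProfiler installed.
    """
    def run_one(cmd):
        return runner(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Example use (paths are placeholders):
# commands = [
#     build_command("analysis.cppipe", f"images/{plate}", f"output/{plate}")
#     for plate in ["plate1", "plate2"]
# ]
# exit_codes = run_plates_in_parallel(commands, max_workers=2)
```

Giving each plate its own output folder keeps the per-plate output files from overwriting each other. A sensible `max_workers` depends on the cores and memory available, since every CellProfiler process loads its own images.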
