parallel implementation #31

jiwen90 · 2023-01-28T06:42:46Z

This resolves #6 and #17.

jackbibby1 · 2023-01-30T18:44:26Z

Hi. Thanks for this -- have been meaning to implement some parallelisation of the pathways, but haven't had the time yet. I'm currently getting an error when I run this on my machine, using the naive_cd4.rds dataset from this tutorial.

df <- readRDS("naive_cd4.rds")

pathways <- msigdbr("Homo sapiens", "H") %>%
  format_pathways()

p1 <- seurat_extract(df, meta1 = "Hour", value_meta1 = "0")
p2 <- seurat_extract(df, meta1 = "Hour", value_meta1 = "24")

scpa_out <- compare_pathways_parallel(samples = list(p1, p2), pathways = pathways, cores = 2)

--------

Cell numbers in population 1 = 4428
Cell numbers in population 2 = 5919
- If greater than 500 cells, these populations will be downsampled

All 50 pathways passed the min/max genes threshold

Calculating pathway fold changes...

Performing a two-sample analysis with SCPA...
Error in { : task 1 failed - "multi-argument returns are not permitted"

Is this running fine on your end?

Jack

jiwen90 · 2023-01-30T19:37:29Z

Hi,

It works for me. The "multi-argument returns are not permitted" is likely due to the tryCatch error message, which I just removed. I'm not sure what error was thrown on your end -- can you run it again, optionally without the tryCatch, to see what's the problem?

> df <- readRDS("naive_cd4.rds")

> pathways <- msigdbr("Homo sapiens", "H") %>%
  format_pathways()

> p1 <- seurat_extract(df, meta1 = "Hour", value_meta1 = "0")
> p2 <- seurat_extract(df, meta1 = "Hour", value_meta1 = "24")

> scpa_out <- compare_pathways_parallel(samples = list(p1, p2), pathways = pathways, cores = 2)

Cell numbers in population 1 = 4428
Cell numbers in population 2 = 5919
- If greater than 500 cells, these populations will be downsampled

All 50 pathways passed the min/max genes threshold

Calculating pathway fold changes...

Performing a two-sample analysis with SCPA...
Loading required package: doParallel
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel

> scpa_out
                                      Pathway          Pval       adjPval
1          HALLMARK_INTERFERON_GAMMA_RESPONSE 5.341958e-106 2.670979e-104
2                HALLMARK_ALLOGRAFT_REJECTION 1.253504e-102 6.267519e-101
3          HALLMARK_INTERFERON_ALPHA_RESPONSE  2.589021e-99  1.294511e-97
4              HALLMARK_INFLAMMATORY_RESPONSE  1.061001e-89  5.305003e-88
5            HALLMARK_TNFA_SIGNALING_VIA_NFKB  3.796084e-88  1.898042e-86
6            HALLMARK_IL6_JAK_STAT3_SIGNALING  4.522036e-82  2.261018e-80
7                     HALLMARK_MYC_TARGETS_V1  4.522036e-82  2.261018e-80
8                     HALLMARK_UV_RESPONSE_UP  1.379483e-80  6.897414e-79
9                          HALLMARK_APOPTOSIS  4.076176e-79  2.038088e-77
10                        HALLMARK_COMPLEMENT  3.234390e-76  1.617195e-74
...

jackbibby1 · 2023-01-30T19:57:57Z

Hmm, odd. New error, but this time:

df <- readRDS("naive_cd4.rds")

pathways <- msigdbr("Homo sapiens", "H") %>%
  format_pathways()

p1 <- seurat_extract(df, meta1 = "Hour", value_meta1 = "0")
p2 <- seurat_extract(df, meta1 = "Hour", value_meta1 = "24")

scpa_out <- compare_pathways_parallel(samples = list(p1, p2), pathways = pathways, cores = 2)

-------

Cell numbers in population 1 = 4428
Cell numbers in population 2 = 5919
- If greater than 500 cells, these populations will be downsampled

All 50 pathways passed the min/max genes threshold

Calculating pathway fold changes...

Performing a two-sample analysis with SCPA...
Error in `dplyr::arrange()`:
! Problem with the implicit `transmute()` step.
✖ Problem while computing `..1 = qval`.
Caused by error in `mask$eval_all_mutate()`:
! object 'qval' not found
Run `rlang::last_error()` to see where the error occurred.

Having a closer look, in the scpa_result <- foreach(pathway = pathways_filtered) %dopar% { ... }, error = function(e) { }) } part of the function I'm just getting a bunch of NULL values, which will be causing the error with arrange...

> scpa_result

[[1]]
NULL

[[2]]
NULL

...

[[49]]
NULL

[[50]]
NULL

jiwen90 · 2023-01-30T20:09:29Z

Can you

Check that all required packages are installed (doParallel and others)
Run without tryCatch? The NULL values are from the catch block, which indicates an error inside the main function.

Thanks!

jackbibby1 · 2023-01-30T20:14:25Z

Yup, my bad -- that was me being an idiot. I ran without the tryCatch and it's running fine now

jiwen90 · 2023-01-30T20:30:18Z

The tryCatch block itself shouldn't cause errors and should be included for safe cluster shutdown. If you can run it with the tryCatch block, then this should be good to merge. Otherwise we need to find the underlying problem.

jackbibby1 · 2023-01-30T20:56:48Z

Yup -- looks like I caused an issue with the %>% dependency when I was running it. Anyway, it all looks to be working well now, so have merged everything. Thanks a bunch for that, it was really useful

jiwen90 · 2023-01-31T17:42:47Z

It may be worth noting that the greatest speedup comes from running many pathways (eg 1000+). With only 50 pathways, as in the tutorial, the processes spawn time begins to exceed actual execution time.

jackbibby1 · 2023-02-01T16:06:35Z

Yeah, good point -- I'll add some more benchmarking with increasing gene set sizes

jiwen90 · 2023-02-02T05:50:33Z

Small usability point - I would remove the parallel parameter and instead set cores=1 as default. If cores > 1, then use parallel processing. This avoids an extra parameter and user confusion.

jackbibby1 · 2023-02-03T15:19:37Z

Hmm yeah, I think that's probably best on reflection. I'll change it up in the next version

jiwen90 added 2 commits January 28, 2023 01:37

parallel implementation

ee3ea2b

fix downsample

da5b7bf

fix catch

ed704d7

jackbibby1 merged commit cc2ec22 into jackbibby1:main Jan 30, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

parallel implementation #31

parallel implementation #31

jiwen90 commented Jan 28, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 30, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 30, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 30, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 31, 2023

jackbibby1 commented Feb 1, 2023

jiwen90 commented Feb 2, 2023

jackbibby1 commented Feb 3, 2023

parallel implementation #31

parallel implementation #31

Conversation

jiwen90 commented Jan 28, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 30, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 30, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 30, 2023

jackbibby1 commented Jan 30, 2023

jiwen90 commented Jan 31, 2023

jackbibby1 commented Feb 1, 2023

jiwen90 commented Feb 2, 2023

jackbibby1 commented Feb 3, 2023