Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The clonalCluster() function cannot generate clusters. #433

Closed
HaiqiaoSun opened this issue Oct 30, 2024 · 6 comments
Closed

The clonalCluster() function cannot generate clusters. #433

HaiqiaoSun opened this issue Oct 30, 2024 · 6 comments
Labels
more info needed Waiting on more information

Comments

@HaiqiaoSun
Copy link

My combined <- combineBCR(csv1, samples = c("sp_exp"), ID = c(NULL)) successfully created Cluster in the CTstrict column with the default threshold of 0.85, which seems perfectly correct.

combined[["sp_exp"]][["CTstrict"]]
[1] "IGH.1376.IGHV3-43_IGLC:Cluster.12.IGKV2-24" "IGH.1376.IGHV3-43_IGLC:Cluster.12.IGKV2-24"
[3] "IGH.1505.IGHV1-18_IGLC:Cluster.12.IGKV2-24" "IGH.1741.IGHV3-64D_IGLC:Cluster.12.IGKV2-24"

However, when using the clonalCluster function as follows:

igraph.object <- clonalCluster(combined,
chain = "IGH",
sequence = "aa",
threshold = 0.85,
samples = NULL,
exportGraph = TRUE)
it returns an error: clonalCluster(combined, chain = "IGH", sequence = "aa", threshold = 0.85, : No clusters detected with current parameters.

Do you have any ideas about this? Thank you.

@ncborcherding
Copy link
Owner

Hey @HaiqiaoSun

Thanks for reaching out.

Please make a reproducible example for this issue so I know what you are experiencing.

Also with the example, please make sure to include the output of sessionInfo().

Thanks a lot,
Nick

@ncborcherding ncborcherding added the more info needed Waiting on more information label Oct 30, 2024
@HaiqiaoSun
Copy link
Author

HaiqiaoSun commented Oct 30, 2024

Thank you for your prompt response. Here is the data and example I used.

https://gist.github.com/HaiqiaoSun/698bc2d3e14950feb085fa19a6d56077

> library(scRepertoire)
> csv1 <- read.csv("E:/R/filtered_contig_annotations.csv", stringsAsFactors = F)
> combined <- combineBCR(csv1, samples = c("sp_exp"),
+                        ID = c(NULL))
> head(combined[[1]])
                      barcode sample                          IGH                  cdr3_aa1
1 sp_exp_ATCCACGCCGCAGTCCAGTA sp_exp  IGHV3-33.IGHD3-3.IGHJ6.IGHM CARDRQVLRFLEWLLPYYYYGMDVW
2 sp_exp_AGTATCTAACCACTTGTAAG sp_exp        IGHV3-7.NA.IGHJ4.IGHM               CARDGGDDGYW
3 sp_exp_AACTGGCGAGCAATAACACT sp_exp IGHV3-21.IGHD3-10.IGHJ4.IGHM      CARIPPVTMVRGVIKFFDYW
4 sp_exp_GACAGCGAATGGAGTTAAGG sp_exp       IGHV3-53.NA.IGHJ4.IGHM               CARDSYSSHYW
5 sp_exp_TTGTCGAAGGCTCATACGCT sp_exp       IGHV4-34.NA.IGHJ6.IGHM       CASGIKQQLAYYYYGMDVW
6 sp_exp_TAGCGTTCGTAACGATTCAA sp_exp                         <NA>                      <NA>
                                                                     cdr3_nt1                IGLC    cdr3_aa2
1 TGTGCGAGAGATCGTCAAGTATTACGATTTTTGGAGTGGTTATTACCCTACTACTACTACGGTATGGACGTCTGG IGKV2-24.IGKJ2.IGKC CTQATQSPYTF
2                                           TGTGCGAGAGATGGCGGTGACGACGGGTACTGG IGKV2-24.IGKJ2.IGKC CTQATQFPYTF
3                TGTGCGAGAATACCTCCGGTTACTATGGTTCGGGGAGTTATTAAATTCTTTGACTACTGG IGKV2-24.IGKJ2.IGKC CTQATQFPYTF
4                                           TGTGCGAGAGATTCGTATAGCAGCCATTACTGG IGKV2-24.IGKJ1.IGKC CTQATQFPWTF
5                   TGTGCGAGTGGAATTAAGCAGCAGCTGGCCTACTACTACTACGGTATGGACGTCTGG IGKV2-24.IGKJ4.IGKC CTQATQFPHTF
6                                                                        <NA> IGKV2-24.IGKJ5.IGKC  CTQATQFPHF
                           cdr3_nt2                                           CTgene
1 TGCACGCAAGCTACACAATCCCCGTACACTTTT  IGHV3-33.IGHD3-3.IGHJ6.IGHM_IGKV2-24.IGKJ2.IGKC
2 TGCACGCAAGCTACACAATTTCCGTACACTTTT        IGHV3-7.NA.IGHJ4.IGHM_IGKV2-24.IGKJ2.IGKC
3 TGCACGCAAGCTACACAATTTCCGTACACTTTT IGHV3-21.IGHD3-10.IGHJ4.IGHM_IGKV2-24.IGKJ2.IGKC
4 TGCACGCAAGCTACACAATTTCCGTGGACGTTC       IGHV3-53.NA.IGHJ4.IGHM_IGKV2-24.IGKJ1.IGKC
5 TGCACGCAAGCTACACAATTTCCTCACACTTTC       IGHV4-34.NA.IGHJ6.IGHM_IGKV2-24.IGKJ4.IGKC
6    TGCACGCAAGCTACACAATTTCCTCACTTC                           NA_IGKV2-24.IGKJ5.IGKC
                                                                                                           CTnt
1 TGTGCGAGAGATCGTCAAGTATTACGATTTTTGGAGTGGTTATTACCCTACTACTACTACGGTATGGACGTCTGG_TGCACGCAAGCTACACAATCCCCGTACACTTTT
2                                           TGTGCGAGAGATGGCGGTGACGACGGGTACTGG_TGCACGCAAGCTACACAATTTCCGTACACTTTT
3                TGTGCGAGAATACCTCCGGTTACTATGGTTCGGGGAGTTATTAAATTCTTTGACTACTGG_TGCACGCAAGCTACACAATTTCCGTACACTTTT
4                                           TGTGCGAGAGATTCGTATAGCAGCCATTACTGG_TGCACGCAAGCTACACAATTTCCGTGGACGTTC
5                   TGTGCGAGTGGAATTAAGCAGCAGCTGGCCTACTACTACTACGGTATGGACGTCTGG_TGCACGCAAGCTACACAATTTCCTCACACTTTC
6                                                                             NA_TGCACGCAAGCTACACAATTTCCTCACTTC
                                   CTaa                                         CTstrict
1 CARDRQVLRFLEWLLPYYYYGMDVW_CTQATQSPYTF IGH:Cluster.32.IGHV3-33_IGLC:Cluster.14.IGKV2-24
2               CARDGGDDGYW_CTQATQFPYTF         IGH.403.IGHV3-7_IGLC:Cluster.14.IGKV2-24
3      CARIPPVTMVRGVIKFFDYW_CTQATQFPYTF         IGH.33.IGHV3-21_IGLC:Cluster.14.IGKV2-24
4               CARDSYSSHYW_CTQATQFPWTF       IGH.1202.IGHV3-53_IGLC:Cluster.14.IGKV2-24
5       CASGIKQQLAYYYYGMDVW_CTQATQFPHTF IGH:Cluster.56.IGHV4-34_IGLC:Cluster.14.IGKV2-24
6                         NA_CTQATQFPHF                   NA.NA_IGLC:Cluster.14.IGKV2-24
> #igraph
> igraph.object <- clonalCluster(combined,
+                                chain = "IGH", 
+                                sequence = "aa",
+                                threshold = 0.85,
+                                samples = NULL, 
+                                exportGraph = TRUE)
错误于clonalCluster(combined, chain = "IGH", sequence = "aa", threshold = 0.85, : 
  No clusters detected with current parameters.`

@HaiqiaoSun
Copy link
Author

sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.utf8 LC_CTYPE=Chinese (Simplified)_China.utf8
[3] LC_MONETARY=Chinese (Simplified)_China.utf8 LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.utf8

time zone: Asia/Shanghai
tzcode source: internal

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets methods base

other attached packages:
[1] ggraph_2.2.1 scater_1.32.1 scuttle_1.14.0 ggsci_3.2.0
[5] scRepertoire_2.0.7 plot1cell_0.0.0.9000 ComplexHeatmap_2.20.0 wordcloud_2.6
[9] simplifyEnrichment_1.14.1 GEOquery_2.72.0 rlang_1.1.4 cowplot_1.1.3
[13] EnsDb.Hsapiens.v86_2.99.0 ensembldb_2.28.1 AnnotationFilter_1.28.0 GenomicFeatures_1.56.0
[17] AnnotationDbi_1.66.0 loomR_0.2.0 itertools_0.1-3 iterators_1.0.14
[21] R6_2.5.1 hdf5r_1.3.11 data.table_1.16.2 DoubletFinder_2.0.4
[25] ComplexUpset_1.3.3 purrr_1.0.2 ggbeeswarm_0.7.2 reshape2_1.4.4
[29] biomaRt_2.60.1 RColorBrewer_1.1-3 progress_1.2.3 scales_1.3.0
[33] MASS_7.3-61 ggh4x_0.2.8 circlize_0.4.16 plotly_4.10.4
[37] ggrastr_1.0.2 shiny_1.9.1 Matrix_1.7-1 SeuratWrappers_0.3.0
[41] msigdbr_7.5.1 SCPA_1.6.2 Nebulosa_1.14.0 ggplot2_3.5.1
[45] viridis_0.6.5 viridisLite_0.4.2 patchwork_1.3.0 Seurat_5.1.0
[49] SeuratObject_5.0.2 sp_2.1-4 dplyr_1.1.4 monocle3_1.2.9
[53] SingleCellExperiment_1.26.0 SummarizedExperiment_1.34.0 GenomicRanges_1.56.2 GenomeInfoDb_1.40.1
[57] IRanges_2.38.1 S4Vectors_0.42.1 MatrixGenerics_1.16.0 matrixStats_1.4.1
[61] Biobase_2.64.0 BiocGenerics_0.50.0

loaded via a namespace (and not attached):
[1] igraph_2.1.1 hash_2.2.6.3 ica_1.0-3 Formula_1.2-5
[5] zlibbioc_1.50.0 tidyselect_1.2.1 bit_4.5.0 doParallel_1.0.17
[9] clue_0.3-65 tm_0.7-14 lattice_0.22-6 rjson_0.2.23
[13] evmix_2.12 blob_1.2.4 stringr_1.5.1 S4Arrays_1.4.1
[17] parallel_4.4.1 png_0.1-8 cli_3.6.3 ProtGenerics_1.36.0
[21] goftest_1.2-3 textshaping_0.4.0 BiocIO_1.14.0 BiocNeighbors_1.22.0
[25] stringdist_0.9.12 uwot_0.2.2 curl_5.2.3 mime_0.12
[29] evaluate_1.0.1 leiden_0.4.3.1 stringi_1.8.4 backports_1.5.0
[33] gsl_2.1-8 XML_3.99-0.17 httpuv_1.6.15 magrittr_2.0.3
[37] rappdirs_0.3.3 splines_4.4.1 mclust_6.1.1 org.Hs.eg.db_3.19.1
[41] sctransform_0.4.1 DBI_1.2.3 terra_1.7-83 jquerylib_0.1.4
[45] withr_3.0.2 systemfonts_1.1.0 lmtest_0.9-40 tidygraph_1.3.1
[49] rtracklayer_1.64.0 BiocManager_1.30.25 htmlwidgets_1.6.4 fs_1.6.4
[53] ggrepel_0.9.6 labeling_0.4.3 SparseArray_1.4.8 reticulate_1.39.0
[57] zoo_1.8-12 XVector_0.44.0 knitr_1.48 UCSC.utils_1.0.0
[61] foreach_1.5.2 fansi_1.0.6 quantreg_5.99 R.oo_1.26.0
[65] RSpectra_0.16-2 irlba_2.3.5.1 fastDummies_1.7.4 lazyeval_0.2.2
[69] yaml_2.3.10 survival_3.7-0 scattermore_1.2 crayon_1.5.3
[73] RcppAnnoy_0.0.22 tidyr_1.3.1 progressr_0.14.0 tweenr_2.0.3
[77] later_1.3.2 ggridges_0.5.6 codetools_0.2-20 base64enc_0.1-3
[81] GlobalOptions_0.1.2 KEGGREST_1.44.1 Rtsne_0.17 shape_1.4.6.1
[85] limma_3.60.6 Rsamtools_2.20.0 filelock_1.0.3 foreign_0.8-87
[89] pkgconfig_2.0.3 xml2_1.3.6 spatstat.univar_3.0-1 GenomicAlignments_1.40.0
[93] iNEXT_3.0.1 evd_2.3-7.1 spatstat.sparse_3.1-0 xtable_1.8-4
[97] highr_0.11 plyr_1.8.9 httr_1.4.7 tools_4.4.1
[101] globals_0.16.3 beeswarm_0.4.0 htmlTable_2.4.3 checkmate_2.3.2
[105] nlme_3.1-166 dbplyr_2.5.0 MatrixModels_0.5-3 assertthat_0.2.1
[109] lme4_1.1-35.5 digest_0.6.37 farver_2.1.2 tzdb_0.4.0
[113] ks_1.14.3 yulab.utils_0.1.7 rpart_4.1.23 glue_1.8.0
[117] cachem_1.1.0 BiocFileCache_2.12.0 nbpMatching_1.5.6 polyclip_1.10-7
[121] Hmisc_5.1-3 generics_0.1.3 proxyC_0.4.1 Biostrings_2.72.1
[125] ggalluvial_0.12.5 mvtnorm_1.3-1 parallelly_1.38.0 multicross_2.1.0
[129] statmod_1.5.0 RcppHNSW_0.6.0 ragg_1.3.3 crossmatch_1.4-0
[133] ScaledMatrix_1.12.0 minqa_1.2.8 pbapply_1.7-2 httr2_1.0.5
[137] spam_2.11-0 utf8_1.2.4 graphlayouts_1.2.0 gridExtra_2.3
[141] GenomeInfoDbData_1.2.12 R.utils_2.12.3 RCurl_1.98-1.16 memoise_2.0.1
[145] rmarkdown_2.28 R.methodsS3_1.8.2 future_1.34.0 RANN_2.6.2
[149] Cairo_1.6-2 spatstat.data_3.1-2 rstudioapi_0.17.1 cluster_2.1.6
[153] spatstat.utils_3.1-0 hms_1.1.3 fitdistrplus_1.2-1 munsell_0.5.1
[157] colorspace_2.1-1 ggdendro_0.2.0 DelayedMatrixStats_1.26.0 sparseMatrixStats_1.16.0
[161] truncdist_1.0-2 dotCall64_1.2 ggforce_0.4.2 xfun_0.48
[165] remotes_2.5.0 abind_1.4-8 GOSemSim_2.30.2 tibble_3.2.1
[169] readr_2.1.5 bitops_1.0-9 promises_1.3.0 RSQLite_2.3.7
[173] leidenbase_0.1.31 DelayedArray_0.30.1 proxy_0.4-27 GO.db_3.19.1
[177] compiler_4.4.1 prettyunits_1.2.0 boot_1.3-31 beachmat_2.20.0
[181] SparseM_1.84-2 listenv_0.9.1 Rcpp_1.0.13 BiocSingular_1.20.0
[185] tensor_1.5 BiocParallel_1.38.0 cubature_2.1.1 babelgene_22.9
[189] spatstat.random_3.3-2 fastmap_1.2.0 vipor_0.4.7 ROCR_1.0-11
[193] rsvd_1.0.5 nnet_7.3-19 gtable_0.3.6 KernSmooth_2.23-24
[197] miniUI_0.1.1.1 deldir_2.0-4 htmltools_0.5.8.1 bit64_4.5.2
[201] spatstat.explore_3.3-3 lifecycle_1.0.4 NLP_0.3-0 nloptr_2.1.1
[205] restfulr_0.0.15 sass_0.4.9 vctrs_0.6.5 isoband_0.2.7
[209] rsconnect_1.3.2 VGAM_1.1-12 slam_0.1-54 spatstat.geom_3.3-3
[213] future.apply_1.11.3 pracma_2.4.4 bslib_0.8.0 pillar_1.9.0
[217] jsonlite_1.8.9 GetoptLong_1.0.5

@HaiqiaoSun
Copy link
Author

HaiqiaoSun commented Oct 31, 2024

I found that setting the exportGraph parameter of the clonalCluster function to F allows the function to run and generate the igraph.object, but it does not produce the clusters.

To generate an igraph similar to the one which Nick did in https://www.bioconductor.org/packages/devel/bioc/vignettes/scRepertoire/inst/doc/vignette.html#1_Introduction, I tried using the cluster generated by combineBCR function to draw the clone connection plot, which produces a similar effect, though not as visually appealing as Nick's example.
P10 plot
git
Below is the code I used. Note that my dataset does not include multiple groups, and I did not adjust the node size based on the cluster size.

#igraph BCR connection plot

library(scRepertoire)
library(igraph)
library(ggraph)
library(tidyverse)

# Load data
csv1 <- read.csv("E:/R/filtered_contig_annotations.csv", stringsAsFactors = F)
combined <- combineBCR(csv1, samples = c("sp_exp"), ID = NULL,
                       threshold = 0.95)

# Create clonal graph object
igraph.object <- clonalCluster(combined,
                               chain = "IGH", 
                               sequence = "aa",
                               threshold = 0.95,
                               samples = NULL, 
                               exportGraph = F)

# Convert graph object to a dataframe
df <- as.data.frame(igraph.object)
if (!is.data.frame(df)) {
  df <- as.data.frame(df)
}

# Extract IGH:Cluster part from sp_exp.CTstrict column to define each node's cluster
df <- df %>%
  dplyr::mutate(IGH_cluster = sapply(strsplit(as.character(sp_exp.CTstrict), "_"), function(x) x[1]))

# Filter rows containing IGH cluster information and remove NA values
df_igh <- df %>% dplyr::filter(!is.na(IGH_cluster))

# Keep only clusters with at least 3 nodes
df_filtered <- df_igh %>%
  group_by(IGH_cluster) %>%
  dplyr::filter(n() >= 3) %>%  # Keep only clusters with at least 3 nodes
  ungroup()

# Create an edge list, connecting samples within the same cluster
edge_list <- df_filtered %>% 
  group_by(IGH_cluster) %>% 
  summarise(pairs = list(combn(sp_exp.barcode, 2, simplify = FALSE))) %>% 
  unnest_longer(pairs) %>% 
  mutate(from = sapply(pairs, `[`, 1),
         to = sapply(pairs, `[`, 2)) %>% 
  dplyr::select(from, to)

# Remove possible NA values
edge_list <- edge_list %>% dplyr::filter(!is.na(from) & !is.na(to))

# Create graph object, only keep nodes that are in the filtered edge list
graph <- graph_from_data_frame(edge_list, directed = TRUE)

# Update graph node attributes
V(graph)$cluster <- df_filtered$IGH_cluster[match(V(graph)$name, df_filtered$sp_exp.barcode)]
V(graph)$size <- 1.5  # Node size

# Assign a unique color to each cluster
library(viridis)
unique_clusters <- unique(V(graph)$cluster)
color_map <- setNames(viridis(length(unique_clusters)), unique_clusters)
col_samples <- color_map[V(graph)$cluster]

# Use MDS layout for graph plotting
layout <- layout_with_mds(graph)

plot(
  graph,
  layout = layout,
  vertex.size = V(graph)$size,
  vertex.label = NA,
  edge.arrow.size = 0.1,
  vertex.color = col_samples,
  vertex.frame.width = 0.2
)

ncborcherding added a commit that referenced this issue Oct 31, 2024
Per issue #433 exportGraph = TRUE breaks in the context of a single sample, using igraph::union before evaluation statement
@ncborcherding
Copy link
Owner

@HaiqiaoSun

Thank you so much for the excellent followup and rundown of the problem. I was able to find the issue - evaluation statement during exportGraph = TRUE assumed that there would be more than 1 sample - so feeding in a single sequencing run broke it.

We now combine the entire graph before we evaluate:

cluster <- do.call(igraph::union, output.list)

Here is a rundown of the fixed function:

BCR <- read.csv("https://www.borch.dev/uploads/contigs/b_contigs.csv")
combined <- combineBCR(BCR, 
                       samples = "Patient1", 
                       threshold = 0.85)

igraph.object <- clonalCluster(combined,
                               chain = "IGH", 
                               sequence = "aa",
                               threshold = 0.85,
                               samples = NULL, 
                               exportGraph = TRUE)
plot(igraph.object)
Screenshot 2024-10-31 at 2 40 54 PM

Fix is up in the dev branch - @Qile0317 and I are working on some minor changes before we get this version shipped and pushed to bioconductor/main branch of github

Thanks again,
Nick

@HaiqiaoSun
Copy link
Author

@ncborcherding
Thank you for your response and for the continuous updates to scRepertoire. The updated clonalCluster function runs perfectly on my single sequencing dataset. Not only does it facilitate research on T cells, but scRepertoire also provides convenience for B cell researchers like me. Thanks again.

Haiqiao

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
more info needed Waiting on more information
Projects
None yet
Development

No branches or pull requests

2 participants