Using PyScenic with C.elegans #602

rayajallad · 2025-02-03T17:59:32Z

We're trying to use a C. elegans dataset with your pipeline; however, both our dataset and the public datasets seem to lack orthologous identity data for our motif annotations. Is there a recommended way to compute it that works best with your pipeline?

ghuls · 2025-02-05T09:15:46Z

If you have your own motifs, you can create your own cisTarget gene databases with https://github.com/aertslab/create_cisTarget_databases (e.g. define your gene regions as 2kb upstream of the TSS) and create your own motif-to-TF file. Orthologous identity is not required; it is just a way to be able to use more motifs that were annotated in another species.
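As a rough sketch of the "2kb upstream of the TSS" region definition, the helper below turns 1-based, GTF-style gene coordinates into 0-based, half-open BED intervals covering the 2 kb immediately upstream of each TSS, taking strand into account. The `Gene` type and `upstream_bed` function are hypothetical illustrations, not part of create_cisTarget_databases. The sketch assumes the TSS is the gene start on the + strand and the gene end on the - strand, excludes the TSS base itself, and clips at position 0 but not at chromosome ends.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    """Gene coordinates as in a GTF file: 1-based, inclusive."""
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def upstream_bed(gene: Gene, upstream: int = 2000) -> str:
    """Return a BED line (0-based, half-open) for the region upstream of the TSS.

    The TSS base itself is excluded. The region is clipped at position 0
    but NOT at the chromosome end (no chromosome sizes are used here).
    """
    if gene.strand == "+":
        # TSS is the gene start; its 0-based index is start - 1.
        tss0 = gene.start - 1
        bed_start, bed_end = max(0, tss0 - upstream), tss0
    elif gene.strand == "-":
        # TSS is the gene end; "upstream" lies to its right on the forward strand.
        tss0 = gene.end - 1
        bed_start, bed_end = tss0 + 1, tss0 + 1 + upstream
    else:
        raise ValueError(f"unknown strand {gene.strand!r} for {gene.name}")
    return f"{gene.chrom}\t{bed_start}\t{bed_end}\t{gene.name}\t0\t{gene.strand}"


if __name__ == "__main__":
    # Hypothetical example genes.
    genes = [
        Gene("geneA", "I", 5001, 7000, "+"),
        Gene("geneB", "II", 8001, 10000, "-"),
    ]
    for g in genes:
        print(upstream_bed(g))
```

The resulting BED file could then be turned into a FASTA file of per-gene sequences (for example with bedtools getfasta and its -s option, so minus-strand regions are reverse-complemented); check that the FASTA headers end up as the plain gene names you want in the database.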

Do you have a link to the motifs that you have for C.elegans?

rayajallad · 2025-02-05T20:57:46Z

Hey again,

I'm working with create_cisTarget_databases at the moment and have run into an issue. I keep getting this error:

Every motif listed via -m has a corresponding .cb file in the -M directory. I also double-checked that everything has the proper read, write, and execute permissions.

The .cb data is of this format:

As for where we got the data, we were given the motif and PWM data directly by CIS-BP's main contact: https://cisbp.ccbr.utoronto.ca/contact.php

Thanks so much for your help!

ghuls · 2025-02-05T23:07:01Z

Your motifs are not in Cluster-Buster format:

>cisbp__M0001
countA countC countG countT
countA countC countG countT
...
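To make the expected layout concrete, here is a small, hypothetical checker for the format shown above: a header line starting with `>` followed by one line of four numbers (A, C, G, T) per motif position. It is a sketch, not Cluster-Buster's own parser. It assumes whitespace-separated values, skips blank lines and lines starting with `#` (treated here as non-matrix lines), and rejects anything else.

```python
from typing import Dict, List


def parse_cluster_buster(text: str) -> Dict[str, List[List[float]]]:
    """Parse Cluster-Buster-style motifs into {motif_id: [[A, C, G, T], ...]}.

    Raises ValueError for matrix rows before any '>' header, rows without
    exactly four numeric values, duplicate motif IDs, or motifs with no rows.
    """
    motifs: Dict[str, List[List[float]]] = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or '#' line: not part of the matrix
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"line {lineno}: empty motif name")
            current = fields[0]
            if current in motifs:
                raise ValueError(f"line {lineno}: duplicate motif {current!r}")
            motifs[current] = []
            continue
        if current is None:
            raise ValueError(f"line {lineno}: matrix row before any '>' header")
        values = line.split()
        if len(values) != 4:
            raise ValueError(
                f"line {lineno}: expected 4 columns (A C G T), got {len(values)}"
            )
        try:
            motifs[current].append([float(v) for v in values])
        except ValueError:
            raise ValueError(f"line {lineno}: non-numeric value in {line!r}") from None
    empty = [name for name, rows in motifs.items() if not rows]
    if empty:
        raise ValueError(f"motifs without matrix rows: {empty}")
    return motifs
```

Running each .cb file through a check like this before building the database gives an error that points at the offending line, rather than a failure deep inside the database tooling.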

Frequency matrices need to be converted to count matrices for Cluster-Buster because of the default pseudocount it adds to each value in the matrix; normally I multiply frequency matrices by 100.
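A minimal sketch of that conversion, assuming each CIS-BP PWM file is a whitespace-separated table with a `Pos A C G T` header followed by one row of frequencies per position (check this against the files you actually received). Both function names are hypothetical, and the output is written tab-separated, which is also an assumption. The scale factor of 100 is the one mentioned in the reply.

```python
from typing import List, Sequence


def read_cisbp_pwm(text: str) -> List[List[float]]:
    """Read a CIS-BP-style PWM table: a 'Pos A C G T' header, then one row per position."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty PWM file")
    header = lines[0].split()
    if header[1:5] != ["A", "C", "G", "T"]:
        raise ValueError(f"unexpected header: {header}")
    rows: List[List[float]] = []
    for line in lines[1:]:
        fields = line.split()
        # fields[0] is the position index; the next four are A, C, G, T frequencies.
        rows.append([float(x) for x in fields[1:5]])
    return rows


def to_cluster_buster(
    motif_id: str, freqs: Sequence[Sequence[float]], scale: float = 100.0
) -> str:
    """Scale a frequency matrix to counts and format it as Cluster-Buster text."""
    out = [f">{motif_id}"]
    for row in freqs:
        if len(row) != 4:
            raise ValueError(f"expected 4 values per row, got {len(row)}")
        # ':g' drops float noise such as 90.00000000000001 -> "90".
        out.append("\t".join(f"{value * scale:g}" for value in row))
    return "\n".join(out) + "\n"
```

Each converted motif would then be saved as its own `<motif_id>.cb` file in the directory passed with -M, matching the one-file-per-motif layout described earlier in this thread.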

Biopython has code (which I wrote) for reading many different motif formats.

Because CIS-BP v2.00 is already part of our motif collection, you will not have to do this conversion yourself. If you somehow got access to v3.00, which seems to be becoming available soon, you will have to do the conversion yourself for now; we will likely include it ourselves later.

rayajallad · 2025-02-10T17:35:52Z

Hey again!

We did the conversion ourselves and it went perfectly. Our only problem now is that, even though the orthologous identity and motif similarity q-value aren't needed, we still get this error:

ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', 'orthologous_identity']

when running prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME).

(For context, these are the columns we do have: TF_ID Family_ID TSource_ID #motif_id MSource_ID DBID gene_name TF_Species TF_Status Family_Name DBDs DBD_Count Cutoff DBID Motif_Type MSource_Identifier MSource_Type MSource_Author MSource_Year PMID MSource_Version SR_Model SR_NoThreshold TfSource_Name TfSource_URL TfSource_Year TfSource_Month TfSource_Day)

We understand that those values aren't strictly necessary, but prune2df still requires the columns. Other than making dummy columns just to satisfy that requirement, is there a workaround for this?

Thank you so much again.

ghuls · 2025-02-12T14:53:28Z

You will need to provide dummy values.
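One way to add such dummy values is sketched below. It assumes that pySCENIC reads the annotation file with a fixed list of column names (hence the usecols error) and then keeps only rows whose motif_similarity_qvalue is at most the FDR threshold and whose orthologous_identity is at least the identity threshold (prune2df's defaults are, as far as I know, 0.001 and 0.0). The values 0.0 and 1.0 are therefore chosen so that every row passes. `DUMMY_VALUES` and `add_dummy_columns` are hypothetical names. The code works on raw tab-separated lines rather than csv.DictReader so that duplicated header names, such as the two DBID columns above, are preserved.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical dummy values, chosen (as an assumption) so every row passes
# the q-value <= FDR and orthologous identity >= threshold filters.
DUMMY_VALUES: Dict[str, str] = {
    "motif_similarity_qvalue": "0.0",
    "orthologous_identity": "1.0",
}


def add_dummy_columns(
    lines: Iterable[str], dummy: Dict[str, str] = DUMMY_VALUES
) -> Iterator[str]:
    """Yield tab-separated lines with any missing dummy columns appended.

    Columns that already exist in the header are left untouched, and blank
    lines are skipped. Yielded lines have no trailing newline.
    """
    it = iter(lines)
    header = next(it).rstrip("\r\n").split("\t")
    missing = [col for col in dummy if col not in header]
    extra = [dummy[col] for col in missing]
    yield "\t".join(header + missing)
    for line in it:
        line = line.rstrip("\r\n")
        if line:
            yield "\t".join(line.split("\t") + extra)
```

To use it, iterate over the open annotation file, write each yielded line plus a newline to a new file, and pass that new file to prune2df as MOTIF_ANNOTATIONS_FNAME. If pandas then reports another missing column (some pySCENIC versions may also read a description column), add it to the dictionary the same way.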

rayajallad · 2025-02-12T20:53:30Z

So we provided dummy values and that worked perfectly. However, we came across yet another issue, in which this error comes up:

We tried verifying our installations and cross-referencing our dataset with the example datasets on your GitHub, and everything seems to match up. After a lot of debugging, we more or less chalked the error up to some quirk of pySCENIC that we aren't aware of.

Thank you so much again for your patience.
