Using PyScenic with C.elegans #602

Open
rayajallad opened this issue Feb 3, 2025 · 6 comments

Comments

@rayajallad

We're trying to use a C. elegans dataset with your pipeline; however, both our own and the public datasets seem to lack orthologous identity data for our motif annotations. Is there a specific way to compute it that works best with your pipeline?

@ghuls
Member

ghuls commented Feb 5, 2025

If you have your own motifs, you can create your own cisTarget gene databases with https://github.com/aertslab/create_cisTarget_databases (e.g. define your gene regions as 2kb upstream of the TSS) and create your own motif to TF file. Orthologous identity is not required: it is just a way to use more motifs that were annotated in another species.
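As a rough illustration of "2kb upstream of the TSS", here is a minimal sketch that computes a strand-aware upstream interval in BED-style 0-based, half-open coordinates. The function name `upstream_region` and the coordinate convention are assumptions made for this example and are not part of create_cisTarget_databases; the sequences for these regions would still need to be extracted in a separate step.

```python
def upstream_region(chrom, tss, strand, gene, length=2000):
    """Return a BED-style tuple covering `length` bp upstream of a TSS.

    `tss` is the 0-based position of the TSS base (an assumption of this
    sketch). The interval is half-open: [start, end).
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies at lower coordinates.
        start, end = max(0, tss - length), tss
    elif strand == "-":
        # Upstream of a reverse-strand gene lies at higher coordinates.
        start, end = tss + 1, tss + 1 + length
    else:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    return (chrom, start, end, gene, 0, strand)
```

Note that the sketch clamps the start at 0 but does not clamp the end to the chromosome length, so regions near the end of a chromosome would need an extra check.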

Do you have a link to the motifs that you have for C.elegans?

@rayajallad
Author

rayajallad commented Feb 5, 2025

Hey again,

I'm working with create_cisTarget_databases at the moment and I keep running into this error:
[Image: screenshot of the error]

Every motif listed in the file passed through -m has a corresponding .cb file in -M. I double-checked that everything has the proper read, write and execute permissions too.

The .cb data is of this format:
[Image: screenshot of the .cb file contents]

In terms of where we got the data, we were directly given motif and pwm data from CIS-BP's main contact: https://cisbp.ccbr.utoronto.ca/contact.php

Thanks so much for your help!

@ghuls
Member

ghuls commented Feb 5, 2025

Your motifs are not in Cluster-Buster format:

>cisbp__M0001
countA countC countG countT
countA countC countG countT
...

Frequency matrices need to be converted to count matrices for Cluster-Buster, because Cluster-Buster adds a default pseudocount to each value in the matrix. I normally multiply frequency matrices by 100.
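A minimal sketch of that conversion in plain Python, assuming each motif is already available as a list of per-position [A, C, G, T] frequencies. The function name `frequencies_to_cluster_buster`, the rounding to integers, and the space separator are choices made for this example, not a documented requirement.

```python
def frequencies_to_cluster_buster(motif_id, freq_rows, scale=100):
    """Format one motif as Cluster-Buster text from A/C/G/T frequencies.

    `freq_rows` is a list of [A, C, G, T] frequencies, one row per motif
    position. Each value is multiplied by `scale` and rounded to an integer
    count (rounding is an assumption of this sketch).
    """
    lines = [f">{motif_id}"]
    for position, row in enumerate(freq_rows, start=1):
        if len(row) != 4:
            raise ValueError(f"position {position}: expected 4 values, got {len(row)}")
        counts = [round(value * scale) for value in row]
        lines.append(" ".join(str(count) for count in counts))
    return "\n".join(lines) + "\n"


# Hypothetical two-position motif, for illustration only.
example = frequencies_to_cluster_buster(
    "cisbp__M0001",
    [[0.25, 0.25, 0.25, 0.25], [0.7, 0.1, 0.1, 0.1]],
)
print(example)
```

Writing the returned text to a file named after the motif ID with a .cb extension would then give one file per motif.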

Biopython has code, which I wrote, for reading many different motif formats.

As the CIS-BP v2.00 motifs are already part of our motif collection, you will not have to do this conversion yourself. If you somehow got access to v3.00, which seems to become available soonish, you will have to do the conversion yourself for now. We will likely include that version ourselves later.

@rayajallad
Author

Hey again!

We did the conversion ourselves and it went perfectly. Our only problem now is that even though orthologous identity and similarity q value aren't needed, we still get this error:

ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', 'orthologous_identity']
when running prune2df(dbs, modules, MOTIF_ANNOTATIONS_FNAME)

(For context, these are columns we do have: TF_ID Family_ID TSource_ID #motif_id MSource_ID DBID gene_name TF_Species TF_Status Family_Name DBDs DBD_Count Cutoff DBID Motif_Type MSource_Identifier MSource_Type MSource_Author MSource_Year PMID MSource_Version SR_Model SR_NoThreshold TfSource_Name TfSource_URL TfSource_Year TfSource_Month TfSource_Day)

We understand that those values aren't inherently necessary, but the prune2df function still requires that we have them. Other than making dummy columns just to fill out the requirement, is there some workaround to this?

Thank you so much again.

@ghuls
Member

ghuls commented Feb 12, 2025

You will need to provide dummy values.
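A minimal sketch of adding the two columns to a tab-separated motif annotation table, using only the standard library. The values 0.0 for motif_similarity_qvalue and 1.0 for orthologous_identity are an assumption, chosen so that every row looks like a direct annotation; the function name `add_dummy_columns` is hypothetical.

```python
import csv
import io

# Assumed dummy values: a q-value of 0.0 and an identity of 1.0.
DUMMY_COLUMNS = {"motif_similarity_qvalue": "0.0", "orthologous_identity": "1.0"}


def add_dummy_columns(tsv_text):
    """Return the TSV text with any missing dummy columns appended."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    present = list(reader.fieldnames or [])
    missing = [name for name in DUMMY_COLUMNS if name not in present]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=present + missing, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for name in missing:
            row[name] = DUMMY_COLUMNS[name]
        writer.writerow(row)
    return out.getvalue()


# Hypothetical one-row table, for illustration only.
table = "#motif_id\tgene_name\nM0001\texample-gene\n"
print(add_dummy_columns(table))
```

Columns that already exist are left untouched, so running the function twice gives the same result as running it once.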

@rayajallad
Author

rayajallad commented Feb 12, 2025

So we provided dummy values and that worked perfectly. However, we came across yet another issue, in which this error comes up:

[Images: three screenshots of the error]

We tried verifying our installations and cross-referencing our dataset with the example datasets on your GitHub, and everything seems to match up just fine. After a lot of debugging, we more or less chalked the error up to some quirk of pySCENIC that we aren't aware of.

Thank you so much again for your patience.
