Using PyScenic with C.elegans #602
Comments
If you have your own motifs, you can create your own cisTarget gene databases with https://github.com/aertslab/create_cisTarget_databases (e.g. define your gene regions as 2kb upstream of TSS) and create your own motif-to-TF file. Orthologous identity is not required: it is just a way to make use of additional motifs that were annotated in another species. Do you have a link to the motifs that you have for C.elegans?
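For anyone following along, here is a minimal sketch of the "2kb upstream of TSS" idea mentioned above. It is not part of create_cisTarget_databases; the function name `upstream_region`, the 0-based TSS convention and the gene names are assumptions made for illustration only.

```python
def upstream_region(chrom, tss, strand, gene, length=2000):
    """Return a BED-style (chrom, start, end, name) tuple covering
    `length` bp upstream of a TSS.

    Assumptions (illustrative, not taken from any tool):
    - `tss` is a 0-based coordinate of the transcription start site;
    - the returned interval is half-open, as in BED;
    - the TSS base itself is not included in the region.
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies at lower coordinates.
        start, end = max(0, tss - length), tss
    elif strand == "-":
        # Upstream of a reverse-strand gene lies at higher coordinates.
        start, end = tss + 1, tss + 1 + length
    else:
        raise ValueError(f"unknown strand: {strand!r}")
    return (chrom, start, end, gene)


# Hypothetical genes, for illustration only.
print(upstream_region("I", 5000, "+", "geneA"))
print(upstream_region("I", 5000, "-", "geneB"))
```

Note that the sketch clips at the start of the chromosome but not at its end, so reverse-strand genes near a chromosome end would need the chromosome length as an extra check.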
Hey again, I'm working with create_cisTarget_databases at the moment and I've come across an issue. I keep coming across this error: Every listed motif passed through -m has a corresponding .cb file in -M. I double-checked and made sure that everything had the proper read, write and execute permissions too. The .cb data is of this format: In terms of where we got the data, we were given the motif and PWM data directly by CIS-BP's main contact: https://cisbp.ccbr.utoronto.ca/contact.php Thanks so much for your help!
Your motifs are not in Cluster-Buster format:
Frequency matrices need to be converted to count matrices for Cluster-Buster because of the default pseudocount it adds to each value in the matrix, so normally I multiply frequency matrices by 100. Biopython includes code that I wrote for reading many different motif formats. As the CIS-BP v2.00 motifs are already part of our motif collection, you will not have to do this conversion yourself. If you somehow got access to v3.00, which seems to become available soon, you will have to do the conversion yourself for now. We will likely include that one ourselves later.
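A rough sketch of the conversion described above, for readers with the same problem: each frequency is multiplied by 100 and the matrix is written as a ">name" header followed by one A C G T row per position. The function name `freq_to_cluster_buster`, the rounding to integers, the tab separator and the motif ID "M1" are assumptions for illustration, not the maintainers' code.

```python
def freq_to_cluster_buster(motif_id, freq_rows, scale=100):
    """Convert a frequency matrix to Cluster-Buster style text.

    `freq_rows` is a list of positions, each a sequence of four
    frequencies in A, C, G, T order. Frequencies are multiplied by
    `scale` and rounded to integers (the rounding is an assumption).
    """
    lines = [f">{motif_id}"]
    for position, row in enumerate(freq_rows, start=1):
        if len(row) != 4:
            raise ValueError(f"position {position}: expected 4 values, got {len(row)}")
        counts = [round(p * scale) for p in row]
        lines.append("\t".join(str(c) for c in counts))
    return "\n".join(lines) + "\n"


# Hypothetical two-position motif, for illustration only.
text = freq_to_cluster_buster(
    "M1",
    [[0.25, 0.25, 0.25, 0.25],
     [0.7, 0.1, 0.1, 0.1]],
)
print(text)
```

Each motif would then be written to its own .cb file named after the motif ID, so that the IDs in the motif list match the file names.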
Hey again! We did the conversion ourselves and it went perfectly. Our only problem now is that even though orthologous identity and the similarity q-value aren't needed, we still get this error: ValueError: Usecols do not match columns, columns expected but not found: ['motif_similarity_qvalue', 'orthologous_identity'] (For context, these are the columns we do have: TF_ID Family_ID TSource_ID #motif_id MSource_ID DBID gene_name TF_Species TF_Status Family_Name DBDs DBD_Count Cutoff DBID Motif_Type MSource_Identifier MSource_Type MSource_Author MSource_Year PMID MSource_Version SR_Model SR_NoThreshold TfSource_Name TfSource_URL TfSource_Year TfSource_Month TfSource_Day) We understand that those values aren't inherently necessary, but the prune2df function still requires them. Other than making dummy columns just to satisfy the requirement, is there a workaround? Thank you so much again.
You will need to provide dummy values.
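One possible way to add the dummy values, sketched with the standard library only. The two column names come from the error message above; the helper name `add_dummy_columns`, the placeholder values 0.0 and 1.0 (picked on the assumption that they will not be filtered out by q-value or identity thresholds) and the example row are illustrative assumptions, not something the maintainers specified.

```python
import csv
import io


def add_dummy_columns(tsv_text, defaults=None):
    """Append missing annotation columns, filled with constant values,
    to a tab-separated table given as text.

    Rows are handled as plain lists so that existing columns,
    including duplicated header names, are left untouched.
    """
    if defaults is None:
        # Placeholder values; an assumption, adjust as needed.
        defaults = {
            "motif_similarity_qvalue": "0.0",
            "orthologous_identity": "1.0",
        }
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    missing = [name for name in defaults if name not in header]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header + missing)
    for row in body:
        writer.writerow(row + [defaults[name] for name in missing])
    return out.getvalue()


# Hypothetical one-row table, for illustration only.
table = "#motif_id\tgene_name\nM1\tgeneA\n"
print(add_dummy_columns(table))
```

Columns that are already present are not overwritten, so running it twice on the same table leaves it unchanged.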
So we provided dummy values and that worked perfectly. However, we came across yet another issue, in which this error comes up: We tried verifying our installations and cross-referencing our dataset with the example datasets on your GitHub, and everything seems to match up just fine. After a lot of debugging, we more or less chalked the error up to some quirk of pySCENIC that we aren't aware of. Thank you so much again for your patience.
We're trying to use a C.elegans dataset with your pipeline; however, both our own and the public datasets seemingly lack orthologous identity data for our motif annotations. Do you have a specific way to compute it that works best for your pipeline?