Flexidot computational time #6

Open
paolo002 opened this issue Nov 29, 2018 · 8 comments

Comments

@paolo002

paolo002 commented Nov 29, 2018

Hi
Thanks a lot for developing this toolkit; it looks really amazing.
However, may I know the computational time and number of threads needed to obtain the graphs?
I am trying it, but it seems to take a long time to perform even a first calculation.
Does it need to be run on a server?
Best Regards
PL

@paolo002
Author

paolo002 commented Dec 1, 2018

Hi again
I have tried to run FlexiDot on a server, but the time needed is still quite long.
Does the tool not support parallelisation?
Also, after some time I got some output as a txt file, but I cannot find the images of the plots...
Regards
PL

@crimBubble
Collaborator

Hi @paolo002
I've worked with FlexiDot a few times now and everything has run in a reasonable time. May I ask what kind of data you would like to compare or plot?

@paolo002
Author

Hi @crimBubble

Thanks for your reply.
At the moment I have downloaded from UCSC a DNA sequence of a region which encompasses several genes (the region is pretty large, it should be thousands of base pairs...). I would like to compare it to itself in order to find repeats or inversions.
In the past I wanted to do a pairwise comparison of 3 DNA sequences (which are shorter).
At first the run seemed stuck and I could not get any output. Then, for some reason, the tool suddenly started working and I got the output for the 3 sequences immediately. I don't know why the run seemed stuck at the beginning and then started working; I'm not sure whether that depends on the memory available at the time of the run.
For the longer sequence, I still did not get any output when I ran it on my laptop. When I ran it on a server, the run completed but I can't see any output.
Please, let me know your advice
Thanks

@molbio-dresden
Owner

Hi @paolo002,

thank you for giving FlexiDot a try. We are a bit unsure about where the problem actually resides. Do you get some kind of error message? FlexiDot should give a warning, if parameters are incorrect or files are missing. Maybe you can post the command-line output, so that we can see if there is something wrong. After the run is finished, you should expect "Thank you for using FlexiDot!" to be printed.

We regularly use FlexiDot on SMRT reads (up to 50-100 kb) with reasonable run times. Depending on the repetitiveness, a run should take on the order of minutes. To check the command line and the tool's performance, we recommend that you crop your sequence to something short, maybe 5000 bp, and test whether you get the expected output files. Usually, the output comprises text and image files. The dotplot image itself is the last one generated. Alternatively, you can try to analyse your sequences with a longer word size to rule out memory issues, e.g. -k 15 -S 2 (default -k 7 -S 0).
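The cropping step suggested above could be done with a few lines of Python. This is only a minimal sketch using the standard library; the function names and the file names in the usage comment are hypothetical and not part of FlexiDot.

```python
def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def crop_records(records, max_len=5000):
    """Keep only the first max_len residues of every sequence."""
    return [(header, seq[:max_len]) for header, seq in records]


def write_fasta(records, path, width=60):
    """Write (header, sequence) tuples as FASTA, wrapped at `width` characters."""
    with open(path, "w") as handle:
        for header, seq in records:
            handle.write(">" + header + "\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")


# Hypothetical usage (file names are placeholders):
# write_fasta(crop_records(read_fasta("region.fasta")), "region_5kb.fasta")
```

Running FlexiDot on the cropped file first should show quickly whether the text and image outputs are produced at all, before trying the full-length sequence.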

For your information, we are currently preparing the next FlexiDot release which clearly shortens the runtime for long, repetitive sequences. We are testing it at the moment with the most common commands, and if we do not run into any trouble, it should be online this week. In the future, we would like to parallelize FlexiDot. However, we did not yet find a satisfying library that works cross-platform.

In general, we would not recommend using FlexiDot for pseudochromosomes or other sequences on the Mbp scale, as it would simply take too long. Especially with small word sizes, this might also cause memory issues. For long sequences, we recommend longer word sizes, maybe with mismatches. There are other tools (such as dgenies) which perform better on super-long sequences.
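To illustrate why a longer word size helps, here is a rough sketch that counts how many exact word matches a self-comparison would have to store for a given word size. It is not FlexiDot's actual implementation (forward strand only, no mismatches), and the function name is made up for this example.

```python
from collections import Counter


def count_kmer_matches(seq, k):
    """Count ordered position pairs (i, j) whose words of length k are identical.

    Each such pair corresponds to one dot in a self-dotplot
    (forward strand only, exact matches only).
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # A word occurring c times matches itself at c * c position pairs.
    return sum(c * c for c in counts.values())


for k in (2, 4, 8):
    print(k, count_kmer_matches("ACGTACGT", k))
```

On real sequences, short words occur many times by chance, so the number of matches to hold in memory grows quickly with sequence length; longer words mostly leave the main diagonal and true repeats.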

Best wishes, Kathrin and Tony

@paolo002
Author

Hi Kathrin and Tony

I just tried it, increasing the word size -k to 20 as you suggested, and it worked. I got the output within a few minutes. By the way, my sequence is 600 kb, so this kind of size should be fine, right?
Good to hear that you are releasing a new version. This tool is very nice, especially the graphics of the output.
Thanks
Regards
Paolo

@paolo002
Author

Hi
Sorry to disturb you again. I was trying it with a larger sequence, around 1 Mbp.
It seems to be stuck again.
Maybe there is still some issue with memory and longer sequences.
If it is ready this week, I will try the new version too and see how it performs compared to the old one.
Thanks a lot
Best
Paolo

@molbio-dresden
Owner

Hi @paolo002 and @crimBubble,

we just uploaded the new FlexiDot version. Maybe you would like to try it and see how it performs on your data? It won't address the memory issue, but it will be faster.

Best,

Kathrin & Tony

@skeffington

Hi, I'm enjoying using FlexiDot, but I had similar issues at first in that it took a long time to run. Then I realized I'd missed the -t option: when I specify that the sequences are protein, it runs much quicker (surprisingly, it runs to completion even with the wrong sequence type). I guess it might be useful if FlexiDot checked whether the -t option is correct for the sequences it reads, and threw an error if not.
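A check like that could be as simple as looking at the letter composition. The following is just a sketch of the idea, not something FlexiDot provides; the function name and the 0.9 threshold are assumptions for illustration.

```python
def guess_sequence_type(seq, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the letter composition.

    A sequence is called nucleotide if at least `threshold` of its
    letters are A, C, G, T, U or N (the threshold is an arbitrary choice).
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    if nucleotide_like / len(letters) >= threshold:
        return "nucleotide"
    return "protein"


print(guess_sequence_type("ACGTNNACGT"))
print(guess_sequence_type("MKVLAAGIVGLL"))
```

A tool could compare this guess with the type given on the command line and warn when they disagree. Short or low-complexity peptides made up mostly of A, C, G and T would fool it, so a warning is probably safer than a hard error.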

Thanks for the nice tool!
Alastair
