Despite updating the evaluate_paf script to handle queries better, the performance of the script is inadequate for large-scale CI jobs.
One solution is to ditch the interval tree data structure and instead rely on sorted PAF input. For large PAF files this may still be slow, but it should greatly reduce memory usage: only two PAF records would need to be held in memory at a time, whereas currently all truth set records are kept in memory.
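To make the sorted-input idea concrete, here is a minimal sketch of a streaming comparison. It is not the existing evaluate_paf code. It assumes both files are sorted by query name and then target name in plain byte order, and that an overlap is identified only by its (query, target) pair with at most one record per pair per file. The names `PafKey`, `iter_keys`, and `count_sorted` are made up for illustration.

```python
import io
from typing import Iterator, NamedTuple, TextIO, Tuple


class PafKey(NamedTuple):
    """Hypothetical sort key: PAF column 1 (query name) and column 6 (target name)."""
    query_name: str
    target_name: str


def iter_keys(handle: TextIO) -> Iterator[PafKey]:
    """Yield one key per PAF line, reading lazily so only one record is held."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # PAF has 12 mandatory columns; skip anything shorter
            continue
        yield PafKey(fields[0], fields[5])


def count_sorted(truth: TextIO, test: TextIO) -> Tuple[int, int, int]:
    """Merge-walk two sorted PAF streams and return (true_pos, false_pos, false_neg).

    Assumption: both inputs are sorted by (query name, target name) and each
    pair appears at most once per file.
    """
    truth_iter, test_iter = iter_keys(truth), iter_keys(test)
    t, s = next(truth_iter, None), next(test_iter, None)
    true_pos = false_pos = false_neg = 0
    while t is not None and s is not None:
        if t == s:  # same overlap present in both files
            true_pos += 1
            t, s = next(truth_iter, None), next(test_iter, None)
        elif t < s:  # truth record has no counterpart in the test set
            false_neg += 1
            t = next(truth_iter, None)
        else:  # test record has no counterpart in the truth set
            false_pos += 1
            s = next(test_iter, None)
    while t is not None:  # leftover truth records were all missed
        false_neg += 1
        t = next(truth_iter, None)
    while s is not None:  # leftover test records are all spurious
        false_pos += 1
        s = next(test_iter, None)
    return true_pos, false_pos, false_neg


def paf_line(query: str, target: str) -> str:
    """Build a dummy 12-column PAF line for the demo (coordinates are placeholders)."""
    return "\t".join([query, "100", "0", "100", "+", target, "100", "0", "100", "100", "100", "60"]) + "\n"


if __name__ == "__main__":
    truth = io.StringIO(paf_line("r1", "r2") + paf_line("r1", "r3") + paf_line("r2", "r3"))
    test = io.StringIO(paf_line("r1", "r2") + paf_line("r2", "r3") + paf_line("r2", "r4"))
    print(count_sorted(truth, test))
```

Memory stays constant regardless of file size because each stream contributes one record at a time. The cost moves to the sort step, which would have to happen before the script runs. Matching on coordinates as well as names would need a slightly larger look-ahead window per key.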
Another option would be to provide random access to bgzipped PAF files, either through TABIX or some other API.
edawson changed the title from "[pygenomeworks] evaluate_paf script is too slow to be practical" to "[pygenomeworks] evaluate_paf script is too slow to be practical for very large PAF files" on Sep 23, 2020