It would be nice if a Faidx instance could be constructed that accesses the FASTA file through a memory-mapped file rather than explicit file seek/read calls. This would have several benefits, including:
- Removing the need to protect reads against concurrent access from multiple threads
- Avoiding a system call for each sequence read
I'd imagine adding a `file_mapped=True` parameter to the `Faidx` constructor and using the existing `mutable` parameter to decide whether the mapping is read-only. The `Faidx.file` object should probably be wrapped in a small object that implements random-access reads and writes of sequences, so the same code path works for both normal file access and memory-mapped access.
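A minimal sketch of the wrapper idea, assuming both backends share a `read(offset, length)` / `write(offset, data)` interface. The class names `SeekReadAccess` and `MappedAccess` and the helper `open_sequence_file` are hypothetical and not part of the existing `Faidx` code; only `file_mapped` and `mutable` come from the proposal above. The seek/read backend keeps a lock because the file position is shared mutable state, while the mapped backend slices the map directly, so there is no position to protect (concurrent writes to overlapping ranges would still need coordination).

```python
import mmap
import threading


class SeekReadAccess:
    """Hypothetical backend: random access via explicit seek()/read() calls."""

    def __init__(self, path, mutable=False):
        self._fh = open(path, "r+b" if mutable else "rb")
        # The file position is shared by all callers, so guard seek+read/write.
        self._lock = threading.Lock()

    def read(self, offset, length):
        with self._lock:
            self._fh.seek(offset)
            return self._fh.read(length)

    def write(self, offset, data):
        with self._lock:
            self._fh.seek(offset)
            self._fh.write(data)

    def close(self):
        self._fh.close()


class MappedAccess:
    """Hypothetical backend: random access by slicing a memory map."""

    def __init__(self, path, mutable=False):
        self._mutable = mutable
        self._fh = open(path, "r+b" if mutable else "rb")
        # `mutable` selects a writable or read-only mapping.
        # Note: mapping an empty file raises ValueError.
        access = mmap.ACCESS_WRITE if mutable else mmap.ACCESS_READ
        self._map = mmap.mmap(self._fh.fileno(), 0, access=access)

    def read(self, offset, length):
        # Slicing has no shared file position, so no lock is taken here.
        return self._map[offset:offset + length]

    def write(self, offset, data):
        # Raises TypeError on a read-only map; the map cannot grow the file.
        self._map[offset:offset + len(data)] = data

    def close(self):
        if self._mutable:
            self._map.flush()
        self._map.close()
        self._fh.close()


def open_sequence_file(path, file_mapped=False, mutable=False):
    """Hypothetical factory choosing a backend from the proposed flags."""
    cls = MappedAccess if file_mapped else SeekReadAccess
    return cls(path, mutable=mutable)
```

One consequence of this design: a memory map has a fixed size, so unlike the seek/write path it cannot append past the end of the file.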
If you would consider this an acceptable addition and would accept a patch, I should be able to provide a possible implementation in the next few days.
Thanks,
Doug
Thanks for the suggestion! I actually implemented Faidx using a memory-mapped file in very early versions, but didn't see much performance benefit. I think you only avoid a system call when the data already resides in the OS page cache. Otherwise the access triggers a page fault, which causes the OS to read the data from disk, and that isn't really faster than reading from disk with a system call. I also wanted to avoid problems with mapping large FASTA files on 32-bit operating systems.
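One way to check this trade-off on a given machine is a small micro-benchmark. The sketch below (all helper names are hypothetical) times random fixed-length reads through `seek()`/`read()` against slicing an `mmap.mmap` of the same file. Because the file has just been written and is read once for a sanity check, it measures the warm-cache case, which is where mmap can actually skip system calls; a cold-cache comparison would need the OS page cache dropped between runs (on Linux, for example, by writing to `/proc/sys/vm/drop_caches` as root), which this sketch does not attempt.

```python
import mmap
import os
import random
import tempfile
import timeit


def make_test_file(size=8 * 1024 * 1024):
    """Write `size` random bytes to a temp file (content is irrelevant here)."""
    fd, path = tempfile.mkstemp(suffix=".fa")
    with os.fdopen(fd, "wb") as fh:
        fh.write(os.urandom(size))
    return path


def read_with_seek(fh, offsets, length):
    # One seek plus one read per access, each potentially a system call.
    out = []
    for off in offsets:
        fh.seek(off)
        out.append(fh.read(length))
    return out


def read_with_mmap(mm, offsets, length):
    # Slicing the map; pages already cached are served without a system call.
    return [mm[off:off + length] for off in offsets]


def compare(path, n_reads=10_000, length=100, repeat=5):
    """Return the best-of-`repeat` times (seconds) for both access methods."""
    size = os.path.getsize(path)
    offsets = [random.randrange(0, size - length) for _ in range(n_reads)]
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Sanity check: both access paths must return identical bytes.
        assert read_with_seek(fh, offsets, length) == \
            read_with_mmap(mm, offsets, length)
        t_seek = min(timeit.repeat(
            lambda: read_with_seek(fh, offsets, length),
            number=1, repeat=repeat))
        t_mmap = min(timeit.repeat(
            lambda: read_with_mmap(mm, offsets, length),
            number=1, repeat=repeat))
    return t_seek, t_mmap


if __name__ == "__main__":
    path = make_test_file()
    try:
        t_seek, t_mmap = compare(path)
        print(f"seek/read: {t_seek:.4f}s  mmap: {t_mmap:.4f}s")
    finally:
        os.remove(path)
```

Results from a sketch like this depend heavily on file size relative to available memory, the OS, and cache state, so they say little about the cold-disk case described above.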
However, if you'd like to submit a PR that adds the kind of wrapper you describe, I'd be all for it, since there are definitely use cases where it could improve performance. I'd prefer to leave the locks in place unless removing them gives a clear performance benefit. The locks around writing a new index file should stay in place no matter what.