ros3 vs. s3fs #78
bendichter
started this conversation in
Show and tell
-
Are there any important tests I missed?
-
Digging a little further, it looks like data read time is non-linear. Here I am reading more rows of the dataset and measuring the time.
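A minimal sketch of this kind of sweep, not the code used for the measurement above. It assumes `dataset` is any object that supports row slicing (an `h5py.Dataset` would qualify); the function name and the row counts are hypothetical. Reporting seconds per row makes non-linearity easy to spot: if read time were linear, that column would stay roughly constant.

```python
import time


def sweep_read_times(dataset, row_counts):
    """Time dataset[:n] for each n and return (n, seconds, seconds_per_row) tuples."""
    results = []
    for n_rows in row_counts:
        if n_rows <= 0:
            raise ValueError("row counts must be positive")
        start = time.perf_counter()
        _ = dataset[:n_rows]  # for a remote HDF5 dataset this triggers the actual read
        elapsed = time.perf_counter() - start
        results.append((n_rows, elapsed, elapsed / n_rows))
    return results


# Local stand-in so the sketch runs without network access.
demo = sweep_read_times(list(range(10_000)), [10, 100, 1_000])
for n_rows, seconds, per_row in demo:
    print(f"{n_rows:>6} rows: {seconds:.6f} s total, {per_row:.2e} s/row")
```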
-
Here I am comparing two different ways of directly accessing S3 buckets: the HDF5 ros3 driver vs. s3fs. It looks like we will gain substantial performance benefits from switching to s3fs.
In order to get this to work, I had to relax a few input validation mechanisms in both pynwb and hdmf. See changes here and here.
Tests were done on DANDI Hub.
First, I set up the environment:
nwb-events is needed because we don't have loading namespaces working for s3fs yet. I don't think this would be hard to implement, but it didn't seem necessary for this performance test.
Then I get the S3 location for a Neuropixels NWB file. s3fs requires the URL in a slightly different form, but it points to the same underlying data.
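As an illustration of the two URL forms, here is a small sketch that converts a virtual-hosted-style HTTPS URL (the form ros3 takes) into the `s3://bucket/key` form that s3fs accepts. The helper name, the bucket, and the key are placeholders, not the actual file used in this test.

```python
from urllib.parse import urlparse


def https_to_s3_url(https_url: str) -> str:
    """Convert https://<bucket>.s3[.<region>].amazonaws.com/<key> to s3://<bucket>/<key>."""
    parsed = urlparse(https_url)
    # Split the host into the bucket name and the S3 endpoint suffix.
    bucket, sep, endpoint = parsed.netloc.partition(".s3.")
    if not sep or not endpoint.endswith("amazonaws.com"):
        raise ValueError(f"not a virtual-hosted-style S3 URL: {https_url}")
    return f"s3://{bucket}{parsed.path}"


# Placeholder URL for illustration only.
print(https_to_s3_url("https://example-bucket.s3.amazonaws.com/blobs/abc/123"))
```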
Now let's get a baseline for the ros3 read:
The first time I execute this it takes about 4 seconds. Interestingly, subsequent runs take about 0.15 seconds, which indicates that there must be some caching going on.
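One way to make the first-run vs. repeat-run difference visible is to time the same call several times in a row. The sketch below does that with a stand-in function that is slow only on its first call; `time_repeated` and `fake_open` are hypothetical names, and in a real test the callable would be whatever opens the remote file.

```python
import time


def time_repeated(fn, repeats=3):
    """Call fn() `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


# Stand-in for an expensive open: slow on the first call, fast once "cached".
_cache = {}


def fake_open():
    if "header" not in _cache:
        time.sleep(0.05)  # simulate fetching header info over the network
        _cache["header"] = True


timings = time_repeated(fake_open)
print([round(t, 3) for t in timings])
```

A first timing that is much larger than the rest is the signature of caching somewhere in the stack.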
The analogous lines for s3fs take about 4.5 seconds. That is consistently longer than the first ros3 run, but not by much.
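For readers who want to try the two access paths, here is a rough sketch at the h5py level. It is not the pynwb code timed above, which also needed the relaxed validation mentioned earlier. It assumes a publicly readable bucket, an h5py build that includes the ros3 driver, and that `s3fs` is installed; the function names are hypothetical.

```python
def open_with_ros3(https_url):
    """Open a remote HDF5 file through the HDF5 ros3 driver."""
    import h5py  # needs an HDF5 build that was compiled with ros3 support

    return h5py.File(https_url, "r", driver="ros3")


def open_with_s3fs(s3_url):
    """Open the same file through s3fs, handing h5py a file-like object."""
    import h5py
    import s3fs

    fs = s3fs.S3FileSystem(anon=True)  # anonymous access; assumes a public bucket
    remote_file = fs.open(s3_url, "rb")
    return h5py.File(remote_file, "r")


# Usage (requires network access; URLs are placeholders):
# f_ros3 = open_with_ros3("https://example-bucket.s3.amazonaws.com/blobs/abc/123")
# f_s3fs = open_with_s3fs("s3://example-bucket/blobs/abc/123")
```

The imports sit inside the functions so that the sketch can be loaded even where only one of the two backends is available.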
Then I timed a data read with ros3 and compared it to the same data read with s3fs.
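A sketch of how such a read comparison could be timed, assuming two already-open datasets that support row slicing (for example the same `h5py.Dataset` reached through each access path). The helper names and the row count are hypothetical, and the local lists below only stand in for remote datasets so the sketch runs offline.

```python
import time


def time_row_read(dataset, n_rows):
    """Return the wall-clock seconds needed to read the first n_rows rows."""
    start = time.perf_counter()
    _ = dataset[:n_rows]  # slicing a remote HDF5 dataset triggers the download
    return time.perf_counter() - start


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Local stand-ins; in a real test these would be the ros3 and s3fs datasets.
ros3_like = list(range(100_000))
s3fs_like = list(range(100_000))
t_ros3 = time_row_read(ros3_like, 10_000)
t_s3fs = time_row_read(s3fs_like, 10_000)
print(f"ros3: {t_ros3:.6f} s, s3fs: {t_s3fs:.6f} s")
```

Because timings vary by run, repeating each measurement a few times before computing `speedup` gives a more trustworthy ratio.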
My findings are that with s3fs you pay a small time penalty on initialization compared to ros3, but you can then read data roughly 2.5-3x as fast (it varies by run). Also, it looks like the ros3 driver caches header info, so subsequent nwbfile initializations are really fast, whereas s3fs does not. Note that the caching does not appear to apply to the dataset values, only to the header info. I don't think this is much of a benefit, though, because I don't expect users to initialize the same file multiple times.