Hi @kaixinbear. The original intent of the repository was to stream from uninterrupted stream readers (e.g. hard-to-seek video, online data, etc.), so by definition 1 process = 1 or more streams. The use case where 1 stream is handled by several processes in parallel (so num_workers can be >> batch_size) is possible if you can seek inside your stream. In that case you can use the original PyTorch dataloader: you just need to map the batch index to a file position and seek to it in the `__getitem__` method.
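A minimal sketch of that seekable approach, assuming a file of fixed-size binary records (the path, `RECORD_SIZE`, and record layout here are placeholders for your actual format):

```python
import torch
from torch.utils.data import DataLoader, Dataset


class SeekableStreamDataset(Dataset):
    """Random-access view over a seekable stream of fixed-size records."""

    RECORD_SIZE = 4096  # bytes per record -- an assumption, adapt to your format

    def __init__(self, path, num_records):
        self.path = path
        self.num_records = num_records
        self._fh = None  # opened lazily, once per worker process

    def __len__(self):
        return self.num_records

    def __getitem__(self, index):
        # Each worker opens its own handle so concurrent seeks don't collide.
        if self._fh is None:
            self._fh = open(self.path, "rb")
        # Map the sample index to a byte offset and seek straight to it.
        self._fh.seek(index * self.RECORD_SIZE)
        raw = self._fh.read(self.RECORD_SIZE)
        return torch.frombuffer(bytearray(raw), dtype=torch.uint8)


# num_workers is now independent of batch_size, as with any map-style dataset.
loader = DataLoader(SeekableStreamDataset("data.bin", num_records=10_000),
                    batch_size=4, num_workers=16)
```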
However, I am not totally convinced this is so much slower; can you provide an example?
Hi @etienne87,
I find that this repo clamps num_workers to be no larger than batch_size (https://github.com/etienne87/pytorch-stream-dataloader/blob/master/pytorch_stream_dataloader/stream_dataloader.py#L45).
However, this makes the dataloader much slower than the vanilla dataloader, which lengthens training time.
Do you have any suggestions on how to support num_workers just like the original PyTorch dataloader?