Downloading data: why is the success rate always 0.14 when the downloaded size is 80% of the total (total size is in the README)? #39
Comments
Did you set up DNS resolving? See the README.
On Wed, Jul 26, 2023, 08:40 KylinChen wrote:
Sharding file number 214 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1df9462bc6885e969f11aaa635d9332c.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 215 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1df9dd8c3710199fc1a3553e2c32c088.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 216 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e035aecc740dc7c69d6078e631623b1.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 217 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e04906d042a35489b73c3b4ac13dca2.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 218 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e06f0994865438d88fd682a32a406a4.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 219 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e15153d29898fddda371065d92a3690.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 220 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e5389481e14532f3bafbd7a863154d3.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 221 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e72542b65bdbb87aacee8dc4dc77108.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 222 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e763882b04e4d151feb536eeb41c3b6.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 223 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e93a84dd21c161eb9d58d8bd2a13824.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 224 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1e962285d730d3e11dc685ebbd09af05.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 225 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1ea5c7f274e3ea11f34eba02d7737502.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 226 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1ed3681d826a23ad6ac71368a2c70c55.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 227 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1ed52da08ab1b413fd9cfa39d0142933.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 228 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1f1fcad950bd01a8473a1486c6970b09.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 229 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1f2b17df53b505a0d0b892a649cfeb12.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 230 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1f3b931c1473c40a0f54b343158f963e.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 231 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1f69531c3338a697864f0be16a031b09.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 232 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1f8934426b1e4464a7f427c24e10ce6f.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 233 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1f89fb35014d629dd9eb9aa536354463.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 234 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1f99a90abc00dda416cf9f9fda2f033c.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 235 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1fa4fcf25d3e3f17943912b5dfcffb8a.parquet
File sharded in 0 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 236 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1fbf923f6041602b76974860585f70e6.parquet
File sharded in 18 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
Sharding file number 237 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1feceb12579113e8578942797b962e01.parquet
File sharded in 51 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
6it [03:21, 15.04s/it]Sharding file number 238 of 253 called /hetu_group/chenqilin/datacomp/data_medium/metadata/1ff0057a457efcdae3ac0aafbc3ace3d.parquet
File sharded in 51 shards
Downloading starting now, check your bandwidth speed (with bwm-ng)your cpu (with htop), and your disk usage (with iotop)!
11it [03:24, 2.89s/it]worker - success: 0.141 - failed to download: 0.858 - failed to resize: 0.001 - images per sec: 95 - count: 10000
total - success: 0.141 - failed to download: 0.858 - failed to resize: 0.001 - images per sec: 95 - count: 10000
worker - success: 0.145 - failed to download: 0.855 - failed to resize: 0.001 - images per sec: 97 - count: 10000
total - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 190 - count: 20000
worker - success: 0.144 - failed to download: 0.854 - failed to resize: 0.001 - images per sec: 98 - count: 10000
total - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 285 - count: 30000
worker - success: 0.139 - failed to download: 0.860 - failed to resize: 0.001 - images per sec: 98 - count: 10000
total - success: 0.142 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 380 - count: 40000
worker - success: 0.146 - failed to download: 0.853 - failed to resize: 0.001 - images per sec: 97 - count: 10000
total - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 474 - count: 50000
worker - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 97 - count: 10000
total - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 569 - count: 60000
worker - success: 0.142 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 98 - count: 10000
total - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 664 - count: 70000
worker - success: 0.146 - failed to download: 0.853 - failed to resize: 0.001 - images per sec: 96 - count: 10000
total - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 759 - count: 80000
worker - success: 0.137 - failed to download: 0.863 - failed to resize: 0.001 - images per sec: 95 - count: 10000
total - success: 0.143 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 854 - count: 90000
16it [03:30, 1.61s/it]worker - success: 0.140 - failed to download: 0.858 - failed to resize: 0.001 - images per sec: 89 - count: 10000
total - success: 0.142 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 892 - count: 100000
worker - success: 0.144 - failed to download: 0.854 - failed to resize: 0.001 - images per sec: 92 - count: 10000
total - success: 0.143 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 981 - count: 110000
worker - success: 0.141 - failed to download: 0.858 - failed to resize: 0.001 - images per sec: 93 - count: 10000
total - success: 0.142 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 1070 - count: 120000
worker - success: 0.143 - failed to download: 0.856 - failed to resize: 0.001 - images per sec: 94 - count: 10000
total - success: 0.142 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 1159 - count: 130000
worker - success: 0.139 - failed to download: 0.860 - failed to resize: 0.001 - images per sec: 94 - count: 10000
total - success: 0.142 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 1248 - count: 140000
worker - success: 0.136 - failed to download: 0.863 - failed to resize: 0.001 - images per sec: 93 - count: 10000
total - success: 0.142 - failed to download: 0.857 - failed to resize: 0.001 - images per sec: 1337 - count: 150000
worker - success: 0.138 - failed to download: 0.862 - failed to resize: 0.001 - images per sec: 92 - count: 10000
total - success: 0.142 - failed to download: 0.858 - failed to resize: 0.001 - images per sec: 1427 - count: 160000
17it [04:06, 10.24s/it]worker - success: 0.138 - failed to download: 0.861 - failed to resize: 0.001 - images per sec: 97 - count: 4493
total - success: 0.141 - failed to download: 0.858 - failed to resize: 0.001 - images per sec: 1107 - count: 164493
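To compare runs or settings, the `worker` and `total` progress lines in a log like the one above can be parsed with a short script. This is only a sketch that assumes the exact line format shown in this thread; `parse_stats`, `last_total`, and the regular expression are illustrative helpers and not part of img2dataset.

```python
import re

# Matches the "worker"/"total" progress lines as they appear in the log above.
# Assumption: the field order and labels are exactly those shown in this thread.
STATS_RE = re.compile(
    r"(?P<scope>worker|total)\s+-\s+success:\s+(?P<success>[\d.]+)"
    r"\s+-\s+failed to download:\s+(?P<failed_download>[\d.]+)"
    r"\s+-\s+failed to resize:\s+(?P<failed_resize>[\d.]+)"
    r"\s+-\s+images per sec:\s+(?P<images_per_sec>\d+)"
    r"\s+-\s+count:\s+(?P<count>\d+)"
)


def parse_stats(log_text):
    """Return one dict per progress line found in log_text, in order."""
    rows = []
    for match in STATS_RE.finditer(log_text):
        rows.append(
            {
                "scope": match.group("scope"),
                "success": float(match.group("success")),
                "failed_download": float(match.group("failed_download")),
                "failed_resize": float(match.group("failed_resize")),
                "images_per_sec": int(match.group("images_per_sec")),
                "count": int(match.group("count")),
            }
        )
    return rows


def last_total(log_text):
    """Return the most recent 'total' line as a dict, or None if there is none."""
    totals = [row for row in parse_stats(log_text) if row["scope"] == "total"]
    return totals[-1] if totals else None
```

Calling `last_total(open("download.log").read())` on a saved log (the file name is a placeholder) gives the latest cumulative rates, which makes it easier to see whether a change in DNS or concurrency settings moved the success rate.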
|
On Wed, Jul 26, 2023, 13:19 KylinChen wrote:
I'm sorry, I am working on a campus network, and for security-related reasons I can't change the DNS. Is there an alternative solution for DNS? |
On Wed, Jul 26, 2023, 13:26 Romain Beaumont wrote:
You can ask your DNS provider to set up knot resolver on your behalf.
img2dataset cannot work if DNS resolving is slow.
|
You can also try downloading more slowly by turning down the thread and process settings.
|
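A minimal sketch of what turning down the thread and process settings could look like when driving img2dataset from Python. The keyword names `processes_count` and `thread_count` are those of img2dataset's `download` function; the baseline values of 16 and 128 are the defaults quoted elsewhere in this thread, while the scaling factor and the helper `slow_download_kwargs` are assumptions made for illustration.

```python
def slow_download_kwargs(processes_count=16, thread_count=128, factor=4):
    """Scale concurrency down by `factor`, never going below 1 of each.

    Hypothetical helper: it only builds keyword arguments and does not
    call img2dataset itself.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return {
        "processes_count": max(1, processes_count // factor),
        "thread_count": max(1, thread_count // factor),
    }


kwargs = slow_download_kwargs()

# With img2dataset installed, the result would be passed along with the
# arguments a real run needs (left elided here on purpose):
#   from img2dataset import download
#   download(url_list=..., output_folder=..., **kwargs)
```

Fewer concurrent threads means fewer simultaneous DNS lookups, at the cost of lower throughput, so it is worth lowering the values in steps and watching the success rate after each change.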
If I use a proxy to download the data, should I set any parameters in img2dataset? Also, I find it strange that the downloaded medium_data/shards takes up 660G while the complete size is 750G (per the README). Why does the shell show |
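One way to cross-check disk usage against the rates printed in the shell is to sum the per-shard statistics on disk. The sketch below assumes each shard in the output folder has a companion `*_stats.json` file with `count` and `successes` fields, which is how img2dataset output is usually laid out, but this should be verified against your own shards directory. `summarize_shards` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize_shards(shards_dir):
    """Sum attempted and successful samples over all *_stats.json files.

    Assumption: every stats file is a JSON object with integer "count"
    and "successes" fields; missing fields are treated as 0.
    """
    total = 0
    successes = 0
    for path in sorted(Path(shards_dir).glob("*_stats.json")):
        with open(path) as handle:
            stats = json.load(handle)
        total += stats.get("count", 0)
        successes += stats.get("successes", 0)
    rate = successes / total if total else 0.0
    return {"count": total, "successes": successes, "success_rate": rate}
```

Running `summarize_shards("medium_data/shards")` (path taken from the comment above) gives an overall success rate across everything already written to disk, which can then be compared with the figure reported during the current run.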
On Wed, Aug 9, 2023, 17:44 Alexander Remmerie wrote:
Hi Team,
We are getting a success rate of about 86% (downloading the small dataset, 12 million images):
15it [44:26, 19.65s/it]worker - success: 0.867 - failed to download: 0.128 - failed to resize: 0.005 - images per sec: 4 - count: 10000
total - success: 0.865 - failed to download: 0.130 - failed to resize: 0.005 - images per sec: 56 - count: 150000
16it [44:49, 20.77s/it]worker - success: 0.860 - failed to download: 0.134 - failed to resize: 0.006 - images per sec: 4 - count: 10000
total - success: 0.864 - failed to download: 0.131 - failed to resize: 0.005 - images per sec: 60 - count: 160000
We are using knot DNS resolver (8 instances), the default 16 processes, and just 16 threads (instead of the default 128). We did this to slow down the DNS resolve requests. We use an e2-highcpu-32 (Efficient Instance, 32 vCPUs, 32 GB RAM) instance on GCP.
86% is the best we could get, but due to the low number of threads, the downloading is very slow. Can you share any advice or tips to improve speed and/or success rate? Many thanks! |
Can you make sure knot resolver is actually being used? You can check this by verifying that the knot resolver's CPU usage is above 0.
If it is not, you can try killing your other DNS resolver and adapting the resolv.conf file (see the instructions in the img2dataset README).
|
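As a rough complement to watching the resolver's CPU usage, one can check which nameservers the system is configured to use. This sketch assumes a Linux host where `/etc/resolv.conf` decides resolution and where the local resolver listens on a loopback address; both are assumptions about the setup, and `nameservers` and `uses_local_resolver` are hypothetical helpers.

```python
def nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines, in file order."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        line = raw_line.strip()
        # resolv.conf comments start with '#' or ';'.
        if not line or line[0] in "#;":
            continue
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers


def uses_local_resolver(resolv_conf_text):
    """True only if at least one nameserver is set and all are loopback."""
    servers = nameservers(resolv_conf_text)
    return bool(servers) and all(
        server.startswith("127.") or server == "::1" for server in servers
    )
```

Reading the file with `open("/etc/resolv.conf").read()` and passing it to `uses_local_resolver` returns `False` when queries may still be going to another resolver, which is the situation described above where `resolv.conf` needs adapting.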