clip-retrieval inference: Failed to read Parquet files from Huggingface laion/laion-pop dataset #393

Open
jenniferlin0815 opened this issue Nov 25, 2024 · 3 comments

@jenniferlin0815

Hi! Thank you for the great work!

I am trying to build a system that, given an image, returns the k most similar images. I originally planned to use laion/relaion2B-en-research from Hugging Face, but decided to start with the laion/laion-pop dataset to familiarize myself with the tool. I downloaded laion/laion-pop with Hugging Face's snapshot_download() function, so the downloaded Parquet files store the actual data, unlike the cached copies, which store links.

When I ran clip-retrieval inference --input_dataset '<folder>/part-{00000..00127}-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet' --output_folder $output_folder --input_format webdataset --enable_text False, the output was as follows:

/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00004-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00004-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
...
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('bad checksum', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00044-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00044-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))

In total there were 125 warnings of the form UserWarning: ReadError('invalid header'... and 3 of the form UserWarning: ReadError('bad checksum'...
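
The two messages, 'invalid header' and 'bad checksum', match the wording Python's standard tarfile module uses when it is handed a file that is not a tar archive, and 125 + 3 = 128 is exactly the number of shards in the {00000..00127} range, so every file was rejected. A minimal sketch of that behaviour follows, assuming the webdataset input format opens each shard as a tar stream (the stream mode "r|*" is an assumption, not confirmed from the clip-retrieval source). The helper name probe_as_tar and the file names are hypothetical, and the "Parquet" file here is only a stand-in that begins with the PAR1 magic bytes.

```python
import tarfile
import tempfile
from pathlib import Path
from typing import Optional


def probe_as_tar(path: Path) -> Optional[str]:
    """Return None if `path` opens as a tar stream, else the ReadError message."""
    try:
        with open(path, "rb") as fileobj:
            # Assumption: the shard reader opens files in tar stream mode.
            with tarfile.open(fileobj=fileobj, mode="r|*"):
                return None
    except tarfile.ReadError as exc:
        return str(exc)


workdir = Path(tempfile.mkdtemp())

# Hypothetical stand-in for a Parquet shard: "PAR1" magic plus filler bytes.
fake_parquet = workdir / "part-00000.snappy.parquet"
fake_parquet.write_bytes(b"PAR1" + b"\x01" * 1020)

# A genuine, tiny tar shard for comparison.
payload = workdir / "000000.txt"
payload.write_text("a caption")
real_tar = workdir / "00000.tar"
with tarfile.open(real_tar, "w") as archive:
    archive.add(payload, arcname="000000.txt")

print("parquet-like file:", probe_as_tar(fake_parquet))
print("tar file:", probe_as_tar(real_tar))
```

If the same probe rejects the downloaded shards while accepting a known-good .tar shard, the files are probably intact Parquet being read with the wrong input format rather than damaged downloads.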

The $output_folder has the following structure:

.
├── img_emb
├── metadata
├── stats
│   ├── 0.json
│   ├── 1.json
│   └── 2.json
└── text_emb

where 0.json, 1.json, and 2.json each contain {} and nothing else.

I tried re-downloading the Hugging Face dataset, but that did not help. The official example uses test_1000.parquet, which has just three columns:

URL: string
TEXT: string
__index_level_0__: int64

The Parquet files of laion/laion-pop have many more columns:

url: string
key: string
cogvlm_caption: string
llava_caption: string
nsfw_prediction: float
alt_txt: string
alt_txt_similarity: double
width: double
height: double
original_width: double
original_height: double
exif: string

and I wonder whether this is the cause of the error.
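
One way to separate a column mismatch from a corrupted or mislabelled file is to look at the container's magic bytes before any column is read. The sketch below uses only the standard library and relies on two format facts: a complete Parquet file begins and ends with the four bytes PAR1, and a POSIX tar header carries the string ustar at byte offset 257. The function name sniff_container and the sample files are hypothetical, and this checks the container type only, not the schema.

```python
import io
import os
import tarfile
import tempfile
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # present at both the start and the end of a Parquet file
TAR_MAGIC = b"ustar"     # found at byte offset 257 of a POSIX tar header


def sniff_container(path: Path) -> str:
    """Classify a file as 'parquet', 'tar', or 'unknown' from its magic bytes."""
    size = path.stat().st_size
    with open(path, "rb") as handle:
        head = handle.read(512)
        tail = b""
        if size >= 4:
            handle.seek(-4, os.SEEK_END)  # jump to the last four bytes
            tail = handle.read(4)
    if head[:4] == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    if head[257:262] == TAR_MAGIC:
        return "tar"
    return "unknown"


workdir = Path(tempfile.mkdtemp())

# Hypothetical samples: a complete Parquet-like file, a truncated one, and a tar.
parquet_like = workdir / "complete.parquet"
parquet_like.write_bytes(b"PAR1" + b"\x00" * 100 + b"PAR1")

truncated = workdir / "truncated.parquet"
truncated.write_bytes(b"PAR1" + b"\x00" * 100)  # trailing magic missing

tar_file = workdir / "00000.tar"
with tarfile.open(tar_file, "w") as archive:
    data = b"a caption"
    info = tarfile.TarInfo(name="000000.txt")
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))

for sample in (parquet_like, truncated, tar_file):
    print(sample.name, "->", sniff_container(sample))
```

A shard reported as parquet is structurally complete at the byte level, which would point away from a bad download; the extra columns would then matter only once the file is read as Parquet in the first place.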

My virtual environment has the following packages:

Package                        Version
------------------------------ ----------------------------
absl_py                        2.1.0+computecanada
accelerate                     1.0.1+computecanada
aiohappyeyeballs               2.4.3+computecanada
aiohttp                        3.10.9+computecanada
aiosignal                      1.3.1+computecanada
albucore                       0.0.19+computecanada
albumentations                 1.4.20+computecanada
all_clip                       1.2.0+computecanada
aniso8601                      9.0.1+computecanada
annotated_types                0.7.0+computecanada
anyio                          4.6.2.post1+computecanada
appdirs                        1.4.4+computecanada
argon2_cffi                    23.1.0+computecanada
argon2_cffi_bindings           21.2.0+computecanada
arrow                          1.3.0+computecanada
asttokens                      2.4.1+computecanada
astunparse                     1.6.3+computecanada
async_lru                      2.0.4+computecanada
async_timeout                  4.0.3+computecanada
attrs                          24.2.0+computecanada
autofaiss                      2.17.0+computecanada
autokeras                      1.0.18
babel                          2.16.0+computecanada
beautifulsoup4                 4.12.3+computecanada
bleach                         6.2.0+computecanada
blinker                        1.9.0+computecanada
braceexpand                    0.1.7+computecanada
cachetools                     5.5.0+computecanada
certifi                        2024.8.30+computecanada
cffi                           1.16.0+computecanada
charset_normalizer             3.4.0+computecanada
click                          8.1.7+computecanada
clip_anytorch                  2.6.0+computecanada
clip_retrieval                 2.44.0+computecanada
comm                           0.2.2+computecanada
contourpy                      1.2.1+computecanada
cycler                         0.12.1+computecanada
dataclasses                    0.6+computecanada
dataclasses_json               0.6.7+computecanada
datasets                       3.1.0
debugpy                        1.8.1+computecanada
decorator                      5.1.1+computecanada
defusedxml                     0.7.1+computecanada
dill                           0.3.8+computecanada
distro                         1.9.0+computecanada
docker-pycreds                 0.4.0+computecanada
embedding_reader               1.7.0+computecanada
eval_type_backport             0.2.0+computecanada
exceptiongroup                 1.2.1+computecanada
executing                      2.0.1+computecanada
ExifRead_nocycle               3.0.1+computecanada
faiss                          1.8.0
fastjsonschema                 2.20.0+computecanada
filelock                       3.16.1+computecanada
fire                           0.4.0+computecanada
flask                          3.0.3+computecanada
Flask_Cors                     4.0.1+computecanada
Flask_RESTful                  0.3.10+computecanada
flatbuffers                    24.3.25+computecanada
fonttools                      4.53.0+computecanada
fqdn                           1.5.1+computecanada
frozenlist                     1.5.0
fsspec                         2024.9.0+computecanada
ftfy                           6.3.0+computecanada
gast                           0.6.0+computecanada
gitdb                          4.0.11+computecanada
GitPython                      3.1.43+computecanada
google-auth                    2.36.0
google_auth_oauthlib           1.2.1+computecanada
google-pasta                   0.2.0+computecanada
greenlet                       3.1.1+computecanada
grpcio                         1.67.0+computecanada
h11                            0.14.0+computecanada
h5py                           3.12.0+computecanada
httpcore                       1.0.7
httpx                          0.27.2+computecanada
httpx-sse                      0.4.0
huggingface_hub                0.26.2+computecanada
idna                           3.10+computecanada
img2dataset                    1.45.0+computecanada
ipykernel                      6.29.4+computecanada
ipython                        8.25.0+computecanada
isoduration                    20.11.0+computecanada
itsdangerous                   2.2.0+computecanada
jedi                           0.19.1+computecanada
jinja2                         3.1.4+computecanada
jiter                          0.6.1+computecanada
joblib                         1.4.2+computecanada
json5                          0.9.28
jsonpatch                      1.33+computecanada
jsonpointer                    3.0.0+computecanada
jsonschema                     4.23.0+computecanada
jsonschema_specifications      2024.10.1+computecanada
jupyter_client                 8.6.2+computecanada
jupyter_core                   5.7.2+computecanada
jupyter_events                 0.10.0+computecanada
jupyter_lsp                    2.2.5+computecanada
jupyter_server                 2.14.2+computecanada
jupyter_server_terminals       0.5.3+computecanada
jupyterlab                     4.3.1
jupyterlab_pygments            0.3.0+computecanada
jupyterlab_server              2.27.3+computecanada
keras                          2.15.0+computecanada
Keras-Preprocessing            1.1.2+computecanada
keras_tuner                    1.4.7+computecanada
kiwisolver                     1.4.5+computecanada
kt_legacy                      1.0.5+computecanada
langchain                      0.3.7
langchain-community            0.3.7
langchain-core                 0.3.20
langchain-huggingface          0.1.2
langchain-openai               0.2.9
langchain-text-splitters       0.3.2
langchainhub                   0.1.21
langgraph                      0.2.53
langgraph-checkpoint           2.0.5
langgraph-sdk                  0.1.36
langsmith                      0.1.145
libclang                       14.0.1+computecanada
Markdown                       3.7+computecanada
markdown_it_py                 3.0.0+computecanada
MarkupSafe                     2.1.5+computecanada
marshmallow                    3.23.1
matplotlib                     3.9.0+computecanada
matplotlib_inline              0.1.7+computecanada
mdurl                          0.1.2+computecanada
mistune                        3.0.2+computecanada
ml_dtypes                      0.3.2
mpmath                         1.3.0+computecanada
msgpack                        1.1.0+computecanada
multidict                      6.1.0+computecanada
multilingual_clip              1.0.10+computecanada
multiprocess                   0.70.16+computecanada
mypy_extensions                1.0.0+computecanada
namex                          0.0.8+computecanada
nbclient                       0.10.0+computecanada
nbconvert                      7.16.4+computecanada
nbformat                       5.10.4+computecanada
nest_asyncio                   1.6.0+computecanada
networkx                       3.4.2+computecanada
nose                           1.3.7+computecanada
notebook_shim                  0.2.4+computecanada
numpy                          1.26.4+computecanada
oauthlib                       3.2.2+computecanada
open_clip_torch                2.29.0+computecanada
openai                         1.55.0
opencv_contrib_python          4.10.0
opencv_contrib_python_headless 4.10.0
opencv_python                  4.10.0
opencv_python_headless         4.10.0
opt_einsum                     3.4.0+computecanada
optree                         0.12.1+computecanada
orjson                         3.10.5+computecanada
overrides                      7.7.0+computecanada
packaging                      24.1+computecanada
pandas                         2.2.1+computecanada
pandocfilters                  1.5.1+computecanada
parso                          0.8.4+computecanada
pexpect                        4.9.0+computecanada
Pillow                         9.4.0
Pillow_SIMD                    9.5.0.post2+computecanada
pip                            23.0.1
platformdirs                   3.9.1+computecanada
prometheus_client              0.21.0+computecanada
prompt_toolkit                 3.0.47+computecanada
propcache                      0.2.0+computecanada
protobuf                       4.25.5
psutil                         5.9.8
ptyprocess                     0.7.0+computecanada
pure_eval                      0.2.2+computecanada
pyarrow                        14.0.1
pyasn1                         0.6.1+computecanada
pyasn1_modules                 0.4.1+computecanada
pycparser                      2.22+computecanada
pydantic                       2.10.1
pydantic_core                  2.27.1
pydantic-settings              2.6.1
pygments                       2.18.0+computecanada
pyparsing                      3.1.2+computecanada
python_dateutil                2.9.0.post0+computecanada
python_dotenv                  1.0.1+computecanada
python_json_logger             2.0.7+computecanada
pytz                           2024.1+computecanada
PyYAML                         6.0.2+computecanada
pyzmq                          26.0.3+computecanada
referencing                    0.35.1+computecanada
regex                          2024.9.11+computecanada
requests                       2.32.3+computecanada
requests_oauthlib              2.0.0+computecanada
requests_toolbelt              1.0.0+computecanada
rfc3339_validator              0.1.4+computecanada
rfc3986_validator              0.1.1+computecanada
rich                           13.9.4+computecanada
rpds_py                        0.20.0+computecanada
rsa                            4.9+computecanada
safetensors                    0.4.5+computecanada
scikit_learn                   1.5.2+computecanada
scipy                          1.11.2+computecanada
Send2Trash                     1.8.3+computecanada
sentence_transformers          2.7.0+computecanada
sentry_sdk                     2.17.0+computecanada
setproctitle                   1.3.2+computecanada
setuptools                     65.5.0
six                            1.16.0+computecanada
smmap                          5.0.1+computecanada
sniffio                        1.3.1+computecanada
soupsieve                      2.6+computecanada
SQLAlchemy                     2.0.35+computecanada
stack_data                     0.6.3+computecanada
stringzilla                    3.10.5+computecanada
sympy                          1.13.1+computecanada
tenacity                       9.0.0+computecanada
tensorboard                    2.15.2+computecanada
tensorboard_data_server        0.7.2+computecanada
tensorflow                     2.15.1+computecanada
tensorflow_estimator           2.15.0+computecanada
tensorflow_io_gcs_filesystem   0.32.0+computecanada
termcolor                      2.5.0+computecanada
terminado                      0.18.1+computecanada
threadpoolctl                  3.5.0+computecanada
tiktoken                       0.7.0+computecanada
timm                           1.0.11+computecanada
tinycss2                       1.4.0+computecanada
tokenizers                     0.20.0+computecanada
tomli                          2.1.0
torch                          2.5.0+computecanada
torchvision                    0.20.0+computecanada
tornado                        6.3.3+computecanada
tqdm                           4.67.0+computecanada
traitlets                      5.14.3+computecanada
transformers                   4.45.0
types_python_dateutil          2.9.0.20241003+computecanada
types-requests                 2.32.0.20241016
typing_extensions              4.12.2+computecanada
typing_inspect                 0.9.0+computecanada
tzdata                         2024.1+computecanada
uri_template                   1.3.0+computecanada
urllib3                        1.26.20+computecanada
wandb                          0.16.0+computecanada
wcwidth                        0.2.13+computecanada
webcolors                      24.11.1+computecanada
webdataset                     0.2.48+computecanada
webencodings                   0.5.1+computecanada
websocket_client               1.8.0+computecanada
werkzeug                       3.1.3+computecanada
wheel                          0.45.1
wrapt                          1.14.1+computecanada
xxhash                         3.5.0+computecanada
yarl                           1.18.0

I would appreciate any input. Please also let me know if there is a tutorial on using a Hugging Face dataset with clip-retrieval.

Thank you!

@jenniferlin0815
Author

Because of the character limit on a single post, I am posting the full output of clip-retrieval inference below:

/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00004-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00004-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00000-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00000-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00006-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00006-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00002-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00002-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00008-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00008-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00012-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00012-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00014-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00014-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00010-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00010-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00016-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00016-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00018-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00018-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00020-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00020-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00022-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00022-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00024-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00024-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00026-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00026-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00030-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00030-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00028-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00028-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00032-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00032-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00034-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00034-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00038-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00038-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00036-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00036-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00042-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00042-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00040-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00040-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00046-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00046-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('bad checksum', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00044-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00044-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00050-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00050-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00048-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00048-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00054-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00054-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00052-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00052-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00058-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00058-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00056-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00056-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00062-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00062-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00060-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00060-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00066-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00066-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00064-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00064-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00070-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00070-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00068-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00068-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00074-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00074-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00072-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00072-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00078-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00078-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00076-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00076-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00082-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00082-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00080-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00080-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00086-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00086-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00084-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00084-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00090-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00090-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00088-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00088-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00094-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00094-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00092-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00092-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00098-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00098-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00096-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00096-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00102-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00102-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00100-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00100-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('bad checksum', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00106-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00106-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00104-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00104-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00110-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00110-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00108-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00108-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00114-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00114-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00112-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00112-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00118-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00118-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00116-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00116-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00122-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00122-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00120-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00120-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00126-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00126-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00124-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00124-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
Starting work on task 1
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00003-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00003-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00001-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00001-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00005-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00005-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00007-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00007-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00011-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00011-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00009-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00009-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00013-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00013-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00015-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00015-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00017-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00017-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00021-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00021-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00019-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00019-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00023-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00023-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00027-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00027-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00029-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00029-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00025-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00025-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00031-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00031-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00037-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00037-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00033-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00033-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00035-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00035-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00039-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00039-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00045-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00045-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00041-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00041-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00043-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00043-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00047-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00047-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00053-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00053-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00049-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00049-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00051-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00051-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00055-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00055-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00061-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00061-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00057-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00057-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00059-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00059-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00063-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00063-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00065-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00065-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00069-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00069-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00067-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00067-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00071-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00071-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00073-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00073-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00077-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00077-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00075-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00075-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00079-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00079-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00081-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00081-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00085-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00085-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00083-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00083-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00087-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00087-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00089-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00089-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00093-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00093-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00091-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00091-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00095-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00095-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00097-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00097-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00101-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00101-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00099-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00099-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00103-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00103-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00105-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00105-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('bad checksum', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00109-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00109-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00107-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00107-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00111-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00111-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00113-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00113-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00117-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00117-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00115-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00115-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00119-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00119-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00121-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00121-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00125-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00125-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00123-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00123-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/webdataset/handlers.py:33: UserWarning: ReadError('invalid header', <_io.BufferedReader name='/project/jl0815/laion_pop_snapdownloaded/part-00127-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'>, '/project/jl0815/laion_pop_snapdownloaded/part-00127-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet')
  warnings.warn(repr(exn))
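
The `ReadError('invalid header')` and `ReadError('bad checksum')` messages above are the ones Python's `tarfile` module produces when it cannot parse a tar header, which is consistent with `--input_format webdataset` trying to read the Parquet files as tar shards. A minimal sketch for checking what a given file actually is, assuming only that Parquet files start and end with the 4-byte magic `PAR1`; the helper name `sniff_shard` is hypothetical and not part of clip-retrieval or webdataset:

```python
import tarfile
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def sniff_shard(path):
    """Return 'parquet', 'tar', or 'unknown' for the file at `path`.

    Hypothetical helper for illustration only.
    """
    path = Path(path)
    size = path.stat().st_size
    head, tail = b"", b""
    if size >= 8:
        with path.open("rb") as f:
            head = f.read(4)   # leading magic
            f.seek(-4, 2)      # 4 bytes before end of file
            tail = f.read(4)   # trailing magic
    if head == PARQUET_MAGIC and tail == PARQUET_MAGIC:
        return "parquet"
    # tarfile.is_tarfile returns False instead of raising ReadError
    if tarfile.is_tarfile(str(path)):
        return "tar"
    return "unknown"
```

If this reports `parquet` for the files under `/project/jl0815/laion_pop_snapdownloaded/`, the warnings above would be expected for any reader that assumes tar input.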

@jenniferlin0815
Author

jenniferlin0815 commented Nov 26, 2024

When I ran clip-retrieval end2end '/project/jl0815/laion_pop_snapdownloaded/part-{00000..00127}-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet' $tmp_folder, I got the following messages:

/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/albumentations/__init__.py:24: UserWarning: A new version of Albumentations is available: 1.4.21 (you have 1.4.20+computecanada). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
  check_for_updates()
Starting the downloading of this file
Sharding file number 1 of 1 called /project/jl0815/laion_pop_snapdownloaded/part-{00000..00127}-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet
0it [00:00, ?it/s]
0it [01:00, ?it/s]
Traceback (most recent call last):
  File "/project/jl0815/venv_llama32_laion/bin/clip-retrieval", line 8, in <module>
    sys.exit(main())
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/clip_retrieval/cli.py", line 18, in main
    fire.Fire(
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/fire/core.py", line 466, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/clip_retrieval/clip_end2end.py", line 24, in clip_end2end
    download(
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/img2dataset/main.py", line 262, in download
    distributor_fn(
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/img2dataset/distributor.py", line 36, in multiprocessing_distributor
    failed_shards = run(reader)
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/img2dataset/distributor.py", line 31, in run
    for status, row in tqdm(process_pool.imap_unordered(downloader, gen)):
  File "/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/tqdm/std.py", line 1181, in __iter__
    for obj in iterable:
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/multiprocessing/pool.py", line 873, in next
    raise value
FileNotFoundError: [Errno 2] No such file or directory: '/project/jl0815/laion_pop_snapdownloaded/part-{00000..00127}-a5835434-5909-4f72-a89e-2fc1d17efc62-c000.snappy.parquet'
/project/jl0815/venv_llama32_laion/lib/python3.10/site-packages/albumentations/__init__.py:24: UserWarning: A new version of Albumentations is available: 1.4.21 (you have 1.4.20+computecanada). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.
  check_for_updates()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2023/x86-64-v3/Compiler/gcccore/python/3.10.13/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
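
The first `FileNotFoundError` above names the path with `{00000..00127}` still in it, which suggests the brace range reached the file open call as a literal string rather than being expanded into 128 file names. A minimal sketch of what that expansion looks like and how to check which expanded paths exist on disk; `expand_numeric_range` and `missing_files` are hypothetical helpers written for illustration, not functions from clip-retrieval or img2dataset, and the sketch makes no claim about which input forms `clip-retrieval end2end` accepts:

```python
import re
from pathlib import Path

_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_range(pattern):
    """Expand the first {start..end} numeric range in `pattern`.

    Zero padding is kept when either bound has a leading zero,
    as in part-{00000..00127}. Hypothetical helper.
    """
    m = _RANGE.search(pattern)
    if m is None:
        return [pattern]
    start, end = m.group(1), m.group(2)
    padded = (len(start) > 1 and start[0] == "0") or (len(end) > 1 and end[0] == "0")
    width = max(len(start), len(end)) if padded else 0
    return [
        pattern[: m.start()] + str(i).zfill(width) + pattern[m.end():]
        for i in range(int(start), int(end) + 1)
    ]


def missing_files(pattern):
    """Return the expanded paths that do not exist as regular files."""
    return [p for p in expand_numeric_range(pattern) if not Path(p).is_file()]
```

Calling `missing_files` on the pattern from the command above would show whether any of the 128 expanded paths are absent, separately from the question of whether the tool expands the pattern itself.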

@rom1504
Owner

rom1504 commented Nov 26, 2024 via email
