Test passes locally but fails on CI with identical environments #84

Open
qin-yu opened this issue Jul 11, 2024 · 4 comments
Contributor

qin-yu commented Jul 11, 2024

TorchScript is deterministic: given the same input, model state, and environment, it should in theory always produce the same output. Both bioimageio test rdf.yml pytorch_state_dict and bioimageio test rdf.yml torchscript pass in environments I created with any of the following commands:

  • mamba create -n bioimageio.core -c conda-forge -c pytorch bioimageio.core pytorch, or
  • mamba create -n bioimageio.core -c conda-forge -c pytorch bioimageio.core pytorch torchvision torchaudio cpuonly, or
  • mamba create -n bioimageio.core.online -c pytorch -c conda-forge "bioimageio.core==0.6.7" "pytorch==2.3.1" "blas==1.0" "mkl==2022.2.1" "numpy==1.26.4" torchvision torchaudio cpuonly, which produces a list of packages and versions identical to the CI environment

Nevertheless, the same tests fail on CI, for example in this run: https://github.com/bioimage-io/collection/actions/runs/9892771006/job/27326341115

conda list output

$ mamba list
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                  2_kmp_llvm    conda-forge
annotated-types           0.7.0              pyhd8ed1ab_0    conda-forge
aom                       3.9.1                hac33072_0    conda-forge
bioimageio.core           0.6.7              pyhd8ed1ab_0    conda-forge
bioimageio.spec           0.5.3.post4        pyhd8ed1ab_0    conda-forge
blas                      1.0                         mkl    conda-forge
brotli-python             1.1.0           py312h30efb56_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2024.7.4             hbcca054_0    conda-forge
cairo                     1.18.0               hbb29018_2    conda-forge
certifi                   2024.7.4           pyhd8ed1ab_0    conda-forge
cffi                      1.16.0          py312hf06ca03_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
cpuonly                   2.0                           0    pytorch
dav1d                     1.2.1                hd590300_0    conda-forge
distro                    1.9.0              pyhd8ed1ab_0    conda-forge
dnspython                 2.6.1              pyhd8ed1ab_1    conda-forge
email-validator           2.2.0              pyhd8ed1ab_0    conda-forge
email_validator           2.2.0                hd8ed1ab_0    conda-forge
expat                     2.6.2                h59595ed_0    conda-forge
ffmpeg                    7.0.1           gpl_h9be9148_104    conda-forge
filelock                  3.15.4             pyhd8ed1ab_0    conda-forge
fire                      0.6.0              pyhd8ed1ab_0    conda-forge
font-ttf-dejavu-sans-mono 2.37                 hab24e00_0    conda-forge
font-ttf-inconsolata      3.000                h77eed37_0    conda-forge
font-ttf-source-code-pro  2.038                h77eed37_0    conda-forge
font-ttf-ubuntu           0.83                 h77eed37_2    conda-forge
fontconfig                2.14.2               h14ed4e7_0    conda-forge
fonts-conda-ecosystem     1                             0    conda-forge
fonts-conda-forge         1                             0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
fribidi                   1.0.10               h36c2ea0_0    conda-forge
gettext                   0.22.5               h59595ed_2    conda-forge
gettext-tools             0.22.5               h59595ed_2    conda-forge
gmp                       6.3.0                hac33072_2    conda-forge
gmpy2                     2.1.5           py312h1d5cde6_1    conda-forge
gnutls                    3.7.9                hb077bed_0    conda-forge
graphite2                 1.3.13            h59595ed_1003    conda-forge
h2                        4.1.0              pyhd8ed1ab_0    conda-forge
harfbuzz                  9.0.0                hfac3d4d_0    conda-forge
hpack                     4.0.0              pyh9f0ad1d_0    conda-forge
hyperframe                6.0.1              pyhd8ed1ab_0    conda-forge
icu                       73.2                 h59595ed_0    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
imageio                   2.34.2             pyh12aca89_0    conda-forge
jinja2                    3.1.4              pyhd8ed1ab_0    conda-forge
lame                      3.100             h166bdaf_1003    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 hf3520f5_7    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20240116.2      cxx17_h59595ed_0    conda-forge
libasprintf               0.22.5               h661eb56_2    conda-forge
libasprintf-devel         0.22.5               h661eb56_2    conda-forge
libass                    0.17.1               h39113c1_2    conda-forge
libblas                   3.9.0            16_linux64_mkl    conda-forge
libcblas                  3.9.0            16_linux64_mkl    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libdrm                    2.4.122              h4ab18f5_0    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 14.1.0               h77fa898_0    conda-forge
libgettextpo              0.22.5               h59595ed_2    conda-forge
libgettextpo-devel        0.22.5               h59595ed_2    conda-forge
libglib                   2.80.3               h8a4344b_1    conda-forge
libhwloc                  2.11.0          default_h5622ce7_1000    conda-forge
libiconv                  1.17                 hd590300_2    conda-forge
libidn2                   2.3.7                hd590300_0    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0            16_linux64_mkl    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenvino               2024.2.0             h2da1b83_1    conda-forge
libopenvino-auto-batch-plugin 2024.2.0             hb045406_1    conda-forge
libopenvino-auto-plugin   2024.2.0             hb045406_1    conda-forge
libopenvino-hetero-plugin 2024.2.0             h5c03a75_1    conda-forge
libopenvino-intel-cpu-plugin 2024.2.0             h2da1b83_1    conda-forge
libopenvino-intel-gpu-plugin 2024.2.0             h2da1b83_1    conda-forge
libopenvino-intel-npu-plugin 2024.2.0             he02047a_1    conda-forge
libopenvino-ir-frontend   2024.2.0             h5c03a75_1    conda-forge
libopenvino-onnx-frontend 2024.2.0             h07e8aee_1    conda-forge
libopenvino-paddle-frontend 2024.2.0             h07e8aee_1    conda-forge
libopenvino-pytorch-frontend 2024.2.0             he02047a_1    conda-forge
libopenvino-tensorflow-frontend 2024.2.0             h39126c6_1    conda-forge
libopenvino-tensorflow-lite-frontend 2024.2.0             he02047a_1    conda-forge
libopus                   1.3.1                h7f98852_1    conda-forge
libpciaccess              0.18                 hd590300_0    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libprotobuf               4.25.3               h08a7969_0    conda-forge
libsqlite                 3.46.0               hde9e2c9_0    conda-forge
libstdcxx-ng              14.1.0               hc0a3c3a_0    conda-forge
libtasn1                  4.19.0               h166bdaf_0    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libunistring              0.9.10               h7f98852_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libva                     2.22.0               hb711507_0    conda-forge
libvpx                    1.14.1               hac33072_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.16                 hd590300_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libxml2                   2.12.7               h4c95cb1_2    conda-forge
libzlib                   1.3.1                h4ab18f5_1    conda-forge
llvm-openmp               15.0.7               h0cdce71_0    conda-forge
loguru                    0.7.2           py312h7900ff3_1    conda-forge
markdown-it-py            3.0.0              pyhd8ed1ab_0    conda-forge
markupsafe                2.1.5           py312h98912ed_0    conda-forge
mdurl                     0.1.2              pyhd8ed1ab_0    conda-forge
mkl                       2022.2.1         h84fe81f_16997    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.1                h9458935_1    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
ncurses                   6.5                  h59595ed_0    conda-forge
nettle                    3.9.1                h7ab15ed_0    conda-forge
networkx                  3.3                pyhd8ed1ab_1    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
ocl-icd                   2.3.2                hd590300_1    conda-forge
openh264                  2.4.1                h59595ed_0    conda-forge
openjpeg                  2.5.2                h488ebb8_0    conda-forge
openssl                   3.3.1                h4ab18f5_1    conda-forge
p11-kit                   0.24.1               hc5aa10d_0    conda-forge
packaging                 24.1               pyhd8ed1ab_0    conda-forge
pandas                    2.2.2           py312h1d6d2e6_1    conda-forge
pcre2                     10.44                h0f59acf_0    conda-forge
pillow                    10.4.0          py312h287a98d_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pixman                    0.43.2               h59595ed_0    conda-forge
platformdirs              4.2.2              pyhd8ed1ab_0    conda-forge
pooch                     1.8.2              pyhd8ed1ab_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pugixml                   1.14                 h59595ed_0    conda-forge
pycparser                 2.22               pyhd8ed1ab_0    conda-forge
pydantic                  2.8.2              pyhd8ed1ab_0    conda-forge
pydantic-core             2.20.1          py312hf008fa9_0    conda-forge
pydantic-settings         2.3.4              pyhd8ed1ab_0    conda-forge
pygments                  2.18.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.12.4          h194c7f8_0_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python-dotenv             1.0.1              pyhd8ed1ab_0    conda-forge
python-tzdata             2024.1             pyhd8ed1ab_0    conda-forge
python_abi                3.12                    4_cp312    conda-forge
pytorch                   2.3.1              py3.12_cpu_0    pytorch
pytorch-mutex             1.0                         cpu    pytorch
pytz                      2024.1             pyhd8ed1ab_0    conda-forge
pyyaml                    6.0.1           py312h98912ed_1    conda-forge
readline                  8.2                  h8228510_1    conda-forge
requests                  2.32.3             pyhd8ed1ab_0    conda-forge
rich                      13.7.1             pyhd8ed1ab_0    conda-forge
ruyaml                    0.91.0             pyhd8ed1ab_0    conda-forge
setuptools                70.2.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snappy                    1.2.1                ha2e4443_0    conda-forge
sniffio                   1.3.1              pyhd8ed1ab_0    conda-forge
svt-av1                   2.1.2                hac33072_0    conda-forge
sympy                     1.12.1          pypyh2585a3b_103    conda-forge
tbb                       2021.12.0            h434a139_2    conda-forge
termcolor                 2.4.0              pyhd8ed1ab_0    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
torchaudio                2.3.1                 py312_cpu    pytorch
torchvision               0.18.1                py312_cpu    pytorch
tqdm                      4.66.4             pyhd8ed1ab_0    conda-forge
typing-extensions         4.12.2               hd8ed1ab_0    conda-forge
typing_extensions         4.12.2             pyha770c72_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
urllib3                   2.2.2              pyhd8ed1ab_1    conda-forge
wayland                   1.23.0               h5291e77_0    conda-forge
wayland-protocols         1.36                 hd8ed1ab_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
x264                      1!164.3095           h166bdaf_2    conda-forge
x265                      3.5                  h924138e_3    conda-forge
xarray                    2024.6.0           pyhd8ed1ab_1    conda-forge
xorg-fixesproto           5.0               h7f98852_1002    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libice               1.1.1                hd590300_0    conda-forge
xorg-libsm                1.2.4                h7391055_0    conda-forge
xorg-libx11               1.8.9                hb711507_1    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-libxext              1.3.4                h0b41bf4_2    conda-forge
xorg-libxfixes            5.0.3             h7f98852_1004    conda-forge
xorg-libxrender           0.9.11               hd590300_0    conda-forge
xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
xorg-xextproto            7.3.0             h0b41bf4_1003    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zlib                      1.3.1                h4ab18f5_1    conda-forge
zstandard                 0.22.0          py312h5b18bf6_1    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge
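
Comparing two such listings by eye is error-prone, so a small diff helper can confirm the "identical packages and versions" claim. The following is a minimal sketch, not part of bioimageio.core: the function names `parse_conda_list` and `diff_environments` are made up for illustration, and the CI listing in the example is hypothetical.

```python
def parse_conda_list(text):
    """Map package name -> (version, build, channel) from `conda list` / `mamba list` text."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the "# Name ..." header, and an echoed "$ mamba list" prompt.
        if not line or line.startswith(("#", "$")):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        name, version, build = fields[:3]
        channel = fields[3] if len(fields) > 3 else ""
        packages[name] = (version, build, channel)
    return packages


def diff_environments(local_text, ci_text):
    """Return {name: (local_entry, ci_entry)} for every package that differs.

    An entry of None means the package is missing on that side.
    """
    local, ci = parse_conda_list(local_text), parse_conda_list(ci_text)
    return {
        name: (local.get(name), ci.get(name))
        for name in sorted(set(local) | set(ci))
        if local.get(name) != ci.get(name)
    }


if __name__ == "__main__":
    local_listing = """\
# Name                    Version                   Build  Channel
mkl                       2022.2.1         h84fe81f_16997    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
"""
    # Hypothetical CI listing, invented for this example only.
    ci_listing = """\
# Name                    Version                   Build  Channel
mkl                       9999.0                   fake_0    conda-forge
numpy                     1.26.4          py312heda63a1_0    conda-forge
"""
    for name, (mine, theirs) in diff_environments(local_listing, ci_listing).items():
        print(name, "local:", mine, "ci:", theirs)
```

An empty result means the two listings agree on version, build string, and channel for every package, which is a stricter check than comparing versions alone.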

qin-yu changed the title from "Bioimage.IO Core test passes locally but fails on CI" to "Test passes locally but fails on CI with identical environments" on Jul 12, 2024
Contributor Author

qin-yu commented Jul 15, 2024

Hey @oeway, this job https://github.com/bioimage-io/collection/actions/runs/9942967885 has been queued for a while even though no other jobs are running. Could you have a look?

Contributor Author

qin-yu commented Jul 16, 2024

Using the same conda environment from my EMBL disk, running bioimageio test rdf.yaml on the EMBL Kreshuk node shows a 15.1% mismatch at 4-decimal precision for a package exported on the EMBL Jupyter Hub VM, and vice versa.
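
To make "mismatch at 4-decimal precision" concrete, here is a minimal sketch of how such a percentage can be computed. It assumes a criterion like the one documented for NumPy's `assert_array_almost_equal`, where two elements match when `abs(actual - desired) < 1.5 * 10**(-decimal)`; whether bioimageio.core uses exactly this comparison is an assumption, and `count_mismatches` is a hypothetical helper name.

```python
def count_mismatches(actual, desired, decimal=4):
    """Count elements that differ by at least 1.5 * 10**(-decimal).

    Returns (number_of_mismatches, number_of_elements).
    """
    if len(actual) != len(desired):
        raise ValueError("inputs must have the same length")
    tolerance = 1.5 * 10 ** (-decimal)
    mismatched = sum(
        1 for a, d in zip(actual, desired) if not abs(a - d) < tolerance
    )
    return mismatched, len(actual)


if __name__ == "__main__":
    # Made-up values: only the second element differs by more than the tolerance.
    desired = [0.12345, 0.5, 1.0, 2.0]
    actual = [0.12349, 0.5003, 1.0, 2.0]
    bad, total = count_mismatches(actual, desired, decimal=4)
    print(f"Mismatched elements: {bad} / {total} ({100 * bad / total:.3g}%)")
```

Under this criterion, a 15.1% mismatch means roughly one output element in seven differs by about 1.5e-4 or more, which is far larger than typical rounding noise.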

I believe the problem is caused by environment variables and/or the CPU architecture.
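
One way to test the CPU-architecture hypothesis is to record which SIMD instruction sets each machine advertises. The sketch below parses the Linux `/proc/cpuinfo` format; the selection of flags in `SIMD_FLAGS` and the helper names are assumptions chosen for illustration, and it does not tell you which code path PyTorch or MKL actually dispatches to.

```python
from pathlib import Path

# Flags of interest; this selection is an assumption, extend it as needed.
SIMD_FLAGS = ("sse4_2", "avx", "avx2", "fma", "avx512f")


def parse_cpuinfo(text):
    """Return (model_name, set_of_flags) for the first processor in /proc/cpuinfo text."""
    model, flags = None, set()
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "model name" and model is None:
            model = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return model, flags


def simd_report(text):
    """Summarize the CPU model and whether each flag in SIMD_FLAGS is present."""
    model, flags = parse_cpuinfo(text)
    report = {"model": model}
    for flag in SIMD_FLAGS:
        report[flag] = flag in flags
    return report


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        print(simd_report(cpuinfo.read_text()))
    else:
        print("no /proc/cpuinfo found; this sketch only covers Linux")
```

Running it on the local machine, the cluster node, and a CI runner gives three small dictionaries that can be compared directly.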

@qin-yu qin-yu self-assigned this Jul 17, 2024
Contributor Author

qin-yu commented Jul 17, 2024

Alright, now we know that AVX2 and AVX512 on Xeon give similar but slightly different results (Mismatched elements: 4073 / 2073600 (0.196%)), while both differ substantially from the results on non-Xeon machines (Mismatched elements: 313129 / 2073600 (15.1%)). So the problem does not come from the use of AVX512 on Xeon, but from something else.
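
A plausible (unconfirmed here) explanation for the small AVX2 vs. AVX512 difference is that floating-point addition is not associative, so a different vector width changes the order in which partial sums are combined. The pure-Python sketch below only emulates that effect: `lanewise_sum` is a hypothetical helper, the lane counts 8 and 16 stand in for float32 lanes in 256-bit and 512-bit registers, and Python floats are 64-bit, so the magnitudes are not representative of the real kernels.

```python
import random


def lanewise_sum(values, lanes):
    """Sum `values` with one running partial sum per lane, then combine the lanes.

    This mimics the accumulation order of a SIMD reduction with `lanes` lanes.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    total = 0.0
    for p in partial:
        total += p
    return total


if __name__ == "__main__":
    # Smallest form of the effect: regrouping changes the rounded result.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # prints False

    rng = random.Random(0)
    data = [rng.uniform(-1.0, 1.0) for _ in range(100_000)]
    s8 = lanewise_sum(data, 8)
    s16 = lanewise_sum(data, 16)
    print("8 lanes: ", s8)
    print("16 lanes:", s16)
    print("absolute difference:", abs(s8 - s16))
```

Differences of this kind are tiny per operation, which fits a 0.196% mismatch better than a 15.1% one, consistent with the conclusion that the large discrepancy has another cause.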

Contributor Author

qin-yu commented Jul 17, 2024

Another test I ran: I opened a Jupyter Hub instance on a Xeon machine, and its output matches that of my Xeon node kreshuk-gpu1. This rules out user-set environment variables as the cause.
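
The environment-variable comparison can also be made explicit by snapshotting the variables that commonly influence numerical backends on each machine and diffing the snapshots. This is a sketch under assumptions: the prefix list is a guess at what is relevant (for example, `MKL_CBWR` and `ATEN_CPU_CAPABILITY` are variables that select code paths in MKL and PyTorch), and `snapshot` and `diff_snapshots` are hypothetical helper names.

```python
import json
import os

# Prefixes assumed to be relevant for threading and numerical backends.
PREFIXES = ("OMP_", "MKL_", "KMP_", "ATEN_", "TORCH_", "PYTORCH_")


def snapshot(environ=None, prefixes=PREFIXES):
    """Return the subset of environment variables whose names start with a known prefix."""
    environ = os.environ if environ is None else environ
    return {k: v for k, v in sorted(environ.items()) if k.startswith(prefixes)}


def diff_snapshots(a, b):
    """Return {name: (value_in_a, value_in_b)} for variables that differ; None means unset."""
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(set(a) | set(b))
        if a.get(k) != b.get(k)
    }


if __name__ == "__main__":
    # Save this output on each machine, then load both files and pass them to diff_snapshots.
    print(json.dumps(snapshot(), indent=2))
```

An empty diff between two machines that still produce different outputs supports the conclusion above that user-set environment variables are not responsible.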
