
seed_everything(..., workers=True) causes the Dataloader to apply exactly the same augmentations each epoch if they sample values from torch.distributions #20412

Open · nan-dre opened this issue Nov 12, 2024 · 1 comment
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.4.x

Comments


nan-dre commented Nov 12, 2024

Bug description

If seed_everything is used with the workers=True flag, the seeds that _generate_seed_sequence produces for use in pl_worker_init_fn make each worker apply exactly the same augmentations/transforms every epoch, whenever those augmentations draw random numbers from torch.distributions (for example torchvision.transforms.v2.MixUp).

Concretely, when you call torch.manual_seed() with seeds returned by _generate_seed_sequence, the value returned by torch.distributions.Beta().sample() is the same for every seed, even though the seeds themselves differ. Examples of these generated seeds and the described behaviour are shown in the Example seeds section below. This is a problem because these augmentations need to vary each epoch; otherwise they serve no purpose.

Example seeds

import torch

# Seeds generated by _generate_seed_sequence
seeds = [
    17057675637851947009,
    15432886765291044865,
    9319658520661983233,
    5078955623891075073,
    5019553885430218753,
    16668740089769099265,
    9604690464733134849,
    3797031770071760897
]
    
for seed in seeds:
    torch.manual_seed(seed)
    print(torch.distributions.Beta(torch.tensor([0.8]), torch.tensor([0.8])).sample(()))

"""
Output - different seeds generate the same result:
tensor([0.6195])
tensor([0.6195])
tensor([0.6195])
tensor([0.6195])
tensor([0.6195])
tensor([0.6195])
tensor([0.6195])
tensor([0.6195])
"""

Minimal reproduction using pytest

On my fork of pytorch lightning, I added 3 tests that replicate the issue described above. I added this line in fabric/utilities/seed.py to sample values in the actual lightning implementation. I also copied the relevant functions into the test_worker_custom_seed_everything test for easier debugging; both show the behaviour described above. In the test_worker_no_seed test, by contrast, no seed is set, and a custom worker_init_fn attached to the DataLoader simply samples a value from a distribution and prints it.

Outputs

python -m pytest tests/tests_fabric/test_fabric.py::test_worker_default_seed_everything -v -s

tests/tests_fabric/test_fabric.py::test_worker_default_seed_everything Seed set to 3
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])
Distribution: tensor([0.6195])

python -m pytest tests/tests_fabric/test_fabric.py::test_worker_custom_seed_everything -v -s

tests/tests_fabric/test_fabric.py::test_worker_custom_seed_everything 
Initial seed: 7806346530661755125
Generated seed: 12756162343541407745
Distribution sample: tensor([0.6195])

Initial seed: 341095593985703797
Generated seed: 13505154612184743937
Distribution sample: tensor([0.6195])

Initial seed: 319168898032224452
Generated seed: 8469589144509612033
Distribution sample: tensor([0.6195])

Initial seed: 2646418513755438653
Generated seed: 4689922562870738945
Distribution sample: tensor([0.6195])

Initial seed: 2838346758071243699
Generated seed: 14917333150370627585
Distribution sample: tensor([0.6195])

Initial seed: 4003951842641992261
Generated seed: 7713286416326721537
Distribution sample: tensor([0.6195])

Initial seed: 5779861845470787470
Generated seed: 4475370722889302017
Distribution sample: tensor([0.6195])

Initial seed: 2159604158492218676
Generated seed: 15752753823300452353
Distribution sample: tensor([0.6195])

Initial seed: 48532092260506644
Generated seed: 11352619751932166145
Distribution sample: tensor([0.6195])

Initial seed: 5586823654916928366
Generated seed: 6292404260059480065
Distribution sample: tensor([0.6195])

python -m pytest tests/tests_fabric/test_fabric.py::test_worker_no_seed -v -s

tests/tests_fabric/test_fabric.py::test_worker_no_seed 
Initial seed: 5868116136142833399
Distribution sample: tensor([0.3792])

Initial seed: 6977025398337771032
Distribution sample: tensor([0.0429])

Initial seed: 7110739861327367931
Distribution sample: tensor([0.1800])

Initial seed: 1354156111310776062
Distribution sample: tensor([0.8291])

Initial seed: 4218882694964505147
Distribution sample: tensor([0.9748])

Initial seed: 4652230168177803610
Distribution sample: tensor([0.0358])

Initial seed: 7909160312112240407
Distribution sample: tensor([0.1071])

Initial seed: 7149162527360068890
Distribution sample: tensor([0.8885])

Initial seed: 881190753266873297
Distribution sample: tensor([0.9649])

Initial seed: 3918191294271262714
Distribution sample: tensor([0.6064])

In this case, a DataLoader with only 1 worker is created to make the problem easier to observe; the same behaviour occurs with multiple workers.

You can observe that even though the seed passed to torch.manual_seed in the worker's init function differs each epoch, the sampled values are identical across epochs. These tests are not complete; I could not figure out how to retrieve either the initial seed or the workers' sampled values inside the test function. One possible way around this is sketched below.
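
Here is a minimal, self-contained sketch (mine, not the fork's actual tests) that surfaces the worker-side sample by returning it through the batch. It assumes pl_worker_init_fn and seed_everything can be imported from lightning.fabric.utilities.seed, and that pl_worker_init_fn can be passed directly as worker_init_fn (its first parameter is the worker id):

import torch
from torch.utils.data import DataLoader, Dataset

from lightning.fabric.utilities.seed import pl_worker_init_fn, seed_everything


class BetaSampleDataset(Dataset):
    """Draws the Beta sample inside the worker and returns it as the item."""

    def __len__(self) -> int:
        return 1

    def __getitem__(self, index):
        # Runs in the worker process, after pl_worker_init_fn has reseeded it.
        return torch.distributions.Beta(torch.tensor([0.8]), torch.tensor([0.8])).sample(())


def test_worker_seeding_repeats_across_epochs():
    seed_everything(3, workers=True)
    loader = DataLoader(BetaSampleDataset(), num_workers=1, worker_init_fn=pl_worker_init_fn)
    # Each iter() re-creates the worker, simulating a fresh epoch.
    samples = [next(iter(loader)) for _ in range(5)]
    # With the current pl_worker_init_fn, every "epoch" yields the identical sample.
    assert all(torch.equal(s, samples[0]) for s in samples)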

Probable cause

Removing torch.manual_seed(seed_sequence[0]) from pl_worker_init_fn resolves the issue of repeated values across epochs. This is also how PyTorch recommends implementing this function in its documentation: setting the random seed only for numpy and random, without reseeding torch itself.
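
For reference, a minimal sketch of a worker_init_fn along the lines of the PyTorch randomness documentation (torch is already seeded per worker by the DataLoader, so only numpy and random are derived from it):

import random

import numpy as np
import torch


def worker_init_fn(worker_id: int) -> None:
    # torch has already seeded this worker with base_seed + worker_id;
    # derive the numpy/random seeds from it instead of reseeding torch.
    worker_seed = torch.initial_seed() % 2**32
    np.random.seed(worker_seed)
    random.seed(worker_seed)

As for why different 64-bit seeds collide at all, here is my own analysis; treat it as a hypothesis. The combined seed entering the LCG is (base_seed << 32) | (worker_id << 16) | global_rank, so its low 32 bits depend only on worker_id and global_rank, never on the base seed. One LCG step x -> a*x + 1 (mod 2**64) maps those low 32 bits deterministically, so a given worker receives a torch seed with identical low 32 bits every epoch (for worker 0, rank 0 they are always exactly 1). If torch's CPU Mersenne Twister initializes its state from only the low 32 bits of the seed, all of these seeds would produce identical RNG streams. A quick check:

import torch

# The seeds from the "Example seeds" section should share their low 32 bits:
seeds = [17057675637851947009, 15432886765291044865, 9319658520661983233]
print({s & 0xFFFFFFFF for s in seeds})  # expected: a single value

# Hypothesis: CPU seeds that agree modulo 2**32 yield identical streams.
torch.manual_seed(1)
a = torch.rand(3)
torch.manual_seed(1 + 2**32)
b = torch.rand(3)
print(torch.equal(a, b))  # expected True if the hypothesis holds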

What version are you seeing the problem on?

master

How to reproduce the bug

Below is a full end-to-end training example of the issue. The code is a reimplementation of this paper.

How to run

python main.py none - will run without setting any seed
python main.py custom - will run using a version of seed_everything implemented in the current file, in order to see the worker seeds and sampled values
python main.py lightning - will run using the default seed_everything function from lightning

Plotting the train_loss graphs on wandb, you can see that the custom and lightning runs have the same suspiciously smooth loss, while the run without a set seed is spiky. This indicates that the no-seed run varies its augmentations correctly, while the seeded runs do not.

[Image: wandb train_loss curves for the none / custom / lightning runs]

import gc
import random
from typing import List, Tuple, Dict, Any, Optional, no_type_check

import lightning as pl
import numpy as np
import torch
import torchvision
import sys
from dataclasses import dataclass, field, asdict
from lightning import Trainer, LightningModule, LightningDataModule, seed_everything
from lightning.pytorch.loggers import WandbLogger
from sklearn.model_selection import train_test_split
from torch import nn
from torch.utils.data import DataLoader, Dataset, default_collate
from torchmetrics.functional.classification import multiclass_accuracy
from torchvision.transforms import v2


def custom_seed_everything(seed : int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

@dataclass
class DictDataclass:
    def to_dict(self, expand: bool = True):
        return transform_dict(asdict(self), expand)

@dataclass
class LoggerConfig(DictDataclass):
    project: str = 'scaling-mlps'
    name: str = '1_worker_lightning_seed'
    offline : bool = False

@dataclass
class ModelConfig(DictDataclass):
    resolution : int = 64
    width : int = 1024
    depth : int = 6
    dim_in : int = resolution ** 2 * 3
    dim_out : int = 10
    example_input_array : Optional[torch.Tensor] = None
    checkpoint : Optional[str] = None 
    lr : float = 1e-5
    weight_decay : float = 1e-3
    label_smoothing : float = 0.3

@dataclass
class DataConfig(DictDataclass):
    num_classes : int = 10
    img_size : Tuple[int, int] = (64, 64)
    dataset_path : str = 'sets/cifar10/'
    batch_size : int = 1024
    num_workers : int = 1
    crop_ratio : Tuple[float, float] = (0.8, 1.0)
    crop_scale : Tuple[float, float] = (0.4, 1.0)
    mixup : float = 0.8

@dataclass
class Config(DictDataclass):
    seed: Optional[str] = None
    # default_factory avoids the mutable-default error raised on newer Python versions
    logger: LoggerConfig = field(default_factory=LoggerConfig)
    model: ModelConfig = field(default_factory=ModelConfig)
    data: DataConfig = field(default_factory=DataConfig)

# Helper function that converts a dataclass into a dictionary, because wandb cannot log dataclasses
@no_type_check
def transform_dict(config_dict: Dict[str, Any], expand: bool = True):
    ret: Dict[str, Any] = {}
    for k, v in config_dict.items():
        if v is None or isinstance(v, (int, float, str)):
            ret[k] = v
        elif isinstance(v, (list, tuple, set)):
            t = transform_dict(dict(enumerate(v)), expand)
            ret[k] = t if expand else [t[str(i)] for i in range(len(v))]
        elif isinstance(v, dict):
            ret[k] = transform_dict(v, expand)
        else:
            vname = v.__name__ if hasattr(v, '__name__') else v.__class__.__name__
            ret[k] = f"{v.__module__}:{vname}"
    return ret

class BottleneckMLP(pl.LightningModule):
    def __init__(self, cfg: Config):
        super(BottleneckMLP, self).__init__()
        self.cfg = cfg.model
        self.block_dims = [
            [4 * cfg.model.width, cfg.model.width] for _ in range(cfg.model.depth)
        ]
        self.norm = nn.LayerNorm
        self.example_input_array = cfg.model.example_input_array
        self.linear_in = nn.Linear(cfg.model.dim_in, self.block_dims[0][1])
        self.linear_out = nn.Linear(self.block_dims[-1][1], cfg.model.dim_out)
        blocks = []
        layernorms = []

        for block_dim in self.block_dims:
            wide, thin = block_dim
            blocks.append(BottleneckBlock(thin=thin, wide=wide))
            layernorms.append(self.norm(thin))

        self.blocks = nn.ModuleList(blocks)
        self.layernorms = nn.ModuleList(layernorms)

    def forward(self, x):
        x = torch.reshape(x, (x.shape[0], -1))
        x = self.linear_in(x)

        for block, norm in zip(self.blocks, self.layernorms):
            x = x + block(norm(x))

        out = self.linear_out(x)

        return out

    def training_step(self, batch, batch_idx):
        outputs, probs, metrics = self._compute_metrics(batch, "train")
        self.log_dict(metrics, on_epoch=True, on_step=False, prog_bar=True)
        return metrics["train_loss"]

    def validation_step(self, batch, batch_idx):
        outputs, probs, metrics = self._compute_metrics(batch, "val")
        self.log_dict(metrics, on_epoch=True, on_step=False, prog_bar=True)
        return metrics["val_loss"]

    def test_step(self, batch, batch_idx):
        outputs, probs, metrics = self._compute_metrics(batch, "test")
        self.log_dict(metrics, on_epoch=True, on_step=False, prog_bar=True)
        return metrics["test_loss"]

    def configure_optimizers(self) -> torch.optim.Optimizer:
        return torch.optim.Adam(self.parameters(), lr=self.cfg.lr)

    def _compute_metrics(self, batch, split) -> tuple:
        x, y = batch
        outputs = self(x)
        loss = nn.functional.cross_entropy(outputs, y, label_smoothing=self.cfg.label_smoothing)

        probs = torch.argmax(torch.softmax(outputs, dim=-1), dim=1)

        metrics: Dict[str, Any] = {f"{split}_loss": loss}
        if split == "val":
            metrics["val_acc"] = multiclass_accuracy(preds=probs, target=y, num_classes=self.cfg.dim_out)

        if split == "test":
            metrics["test_acc"] = multiclass_accuracy(preds=probs, target=y, num_classes=self.cfg.dim_out)
        return outputs, probs, metrics


class BottleneckBlock(nn.Module):
    def __init__(self, thin, wide, act=nn.GELU()):
        super(BottleneckBlock, self).__init__()
        self.block = nn.Sequential(nn.Linear(thin, wide), act, nn.Linear(wide, thin))

    def forward(self, x):
        out = self.block(x)
        return out

class MyDataset(Dataset):
    def __init__(self, subset, transform=None):
        self.subset = subset
        self.transform = transform
        
    def __getitem__(self, index):
        x, y = self.subset[index]
        if self.transform:
            x = self.transform(x)
        return x, y
        
    def __len__(self):
        return len(self.subset)

class CIFAR10DataModule(LightningDataModule):
    def __init__(self, cfg : Config):
        super().__init__()
        self.root_cfg = cfg
        self.cfg = cfg.data
        self.generator = torch.Generator()
        self.generator.manual_seed(0)
        self.save_hyperparameters()
        

    # Function copied from fabric/utilities/seed.py
    def _generate_seed_sequence(self, base_seed: int, worker_id: int, global_rank: int, count: int) -> List[int]:
        """Generates a sequence of seeds from a base seed, worker id and rank using the linear congruential generator (LCG)
        algorithm."""
        # Combine base seed, worker id and rank into a unique 64-bit number
        combined_seed = (base_seed << 32) | (worker_id << 16) | global_rank
        seeds = []
        for _ in range(count):
            # x_(n+1) = (a * x_n + c) mod m. With c=1, m=2^64 and a is D. Knuth's constant
            combined_seed = (combined_seed * 6364136223846793005 + 1) & ((1 << 64) - 1)
            seeds.append(combined_seed)
        return seeds

    def _worker_init_fn(self, worker_id):
        process_seed = torch.initial_seed()
        print(f"Worker {worker_id}")
        print(f"Initial seed: {process_seed}")
        base_seed = process_seed - worker_id
        seed_sequence = self._generate_seed_sequence(base_seed=base_seed, worker_id=worker_id, global_rank=0, count=4)
        torch.manual_seed(seed_sequence[0])  # torch takes a 64-bit seed
        print(f"Generated seed: {seed_sequence[0]}")
        print(f"Dist: {torch.distributions.Beta(torch.tensor([0.8]), torch.tensor([0.8])).sample(())}")
        print(f"Binomial: {torch.distributions.Binomial(100, torch.tensor([0 , .2, .8, 1])).sample()}")
        random.seed((seed_sequence[1] << 32) | seed_sequence[2])  # combine two 64-bit seeds
        np.random.seed(seed_sequence[3] & 0xFFFFFFFF)  # numpy takes 32-bit seed only
        
    def setup(self, stage: Optional[str] = None) -> None:
        if stage == "fit" or stage == "test":
            train_transforms = v2.Compose([
                    v2.ToImage(),
                    v2.RandomResizedCrop(size=self.cfg.img_size, ratio=self.cfg.crop_ratio, scale=self.cfg.crop_scale),
                    v2.ToDtype(torch.float32, scale=True),
                    v2.Normalize(mean=[0.49139968, 0.48215827, 0.44653124], std=[0.24703233, 0.24348505, 0.26158768]),
            ])
            
            test_transforms = v2.Compose([
                    v2.ToImage(),
                    v2.Resize(size=self.cfg.img_size),
                    v2.ToDtype(torch.float32, scale=True),
                    v2.Normalize(mean=[0.49139968, 0.48215827, 0.44653124], std=[0.24703233, 0.24348505, 0.26158768]),
            ])

            train_data = torchvision.datasets.CIFAR10(root=self.cfg.dataset_path, train=True, download=True)
            test_data = torchvision.datasets.CIFAR10(root=self.cfg.dataset_path, train=False, download=True)
            train_indices, val_indices = train_test_split(
                list(range(len(train_data))),
                test_size=10000,
                stratify=train_data.targets
            )
            self.train_set = MyDataset(torch.utils.data.Subset(train_data, train_indices), transform=train_transforms)
            self.val_set = MyDataset(torch.utils.data.Subset(train_data, val_indices), transform=test_transforms)
            self.test_set = MyDataset(test_data, transform=test_transforms)
    
    def train_dataloader(self) -> DataLoader:
        if self.root_cfg.seed == 'lightning' or self.root_cfg.seed == 'none':
            return DataLoader(
                dataset=self.train_set,
                batch_size=self.cfg.batch_size,
                num_workers=self.cfg.num_workers,
                pin_memory=True,
                shuffle=True,
                drop_last=True,
                collate_fn=self._mixup_collate_fn,
            )
        elif self.root_cfg.seed == 'custom':
            return DataLoader(
                dataset=self.train_set,
                batch_size=self.cfg.batch_size,
                num_workers=self.cfg.num_workers,
                pin_memory=True,
                shuffle=True,
                drop_last=True,
                collate_fn=self._mixup_collate_fn,
                worker_init_fn=self._worker_init_fn
            )
        else:
            exit()


    def val_dataloader(self) -> DataLoader:
        return DataLoader(
            dataset=self.val_set,
            batch_size=self.cfg.batch_size,
            num_workers=self.cfg.num_workers,
            pin_memory=True,
            drop_last=False,
        )

    def test_dataloader(self) -> DataLoader:
        return DataLoader(
            dataset=self.test_set,
            batch_size=self.cfg.batch_size,
            num_workers=self.cfg.num_workers,
            drop_last=False,
        )
        
    def _mixup_collate_fn(self, batch):
        mixup = v2.MixUp(alpha = self.cfg.mixup, num_classes=self.cfg.num_classes)(*default_collate(batch)) # type: ignore
        return mixup

def training_loop(cfg: Config) -> tuple[LightningModule, LightningDataModule]:
    if cfg.seed == 'lightning':
        seed_everything(0, workers=True)
    elif cfg.seed == 'custom':
        custom_seed_everything(0)
    model = BottleneckMLP(cfg)
    datamodule = CIFAR10DataModule(cfg)

    logger = WandbLogger(
        project=cfg.logger.project,
        name=cfg.logger.name,
        offline=cfg.logger.offline,
        config=cfg.to_dict(),
    )

    trainer = Trainer(
        devices = 1,
        max_epochs = 100,
        precision = 'bf16-mixed',
        deterministic = True,
        log_every_n_steps = 100,
        logger=logger,
    )
    trainer.fit(model=model, datamodule=datamodule)
    return model, datamodule


if __name__ == "__main__":
    torch.cuda.empty_cache()
    gc.collect()
    
    config = Config()
    config.seed = sys.argv[1]
    
    training_loop(config)

Error messages and logs

No response

Environment

Current environment
  • CUDA:
    - GPU:
    - NVIDIA GeForce RTX 3060
    - available: True
    - version: 12.4
  • Lightning:
    - lightning: 2.5.0.dev0
    - lightning-utilities: 0.11.8
    - pytorch-lightning: 2.4.0
    - torch: 2.5.1
    - torchmetrics: 1.5.2
    - torchvision: 0.20.1
  • Packages:
    - absl-py: 2.1.0
    - aiohappyeyeballs: 2.4.3
    - aiohttp: 3.10.10
    - aiosignal: 1.3.1
    - alabaster: 0.7.16
    - antlr4-python3-runtime: 4.9.3
    - anyio: 4.6.2.post1
    - appdirs: 1.4.4
    - argcomplete: 3.4.0
    - argon2-cffi: 23.1.0
    - argon2-cffi-bindings: 21.2.0
    - arrow: 1.3.0
    - astroid: 3.2.4
    - asttokens: 2.4.1
    - async-lru: 2.0.4
    - async-timeout: 4.0.3
    - attrs: 23.2.0
    - autocommand: 2.2.2
    - babel: 2.16.0
    - backcall: 0.2.0
    - backports.tarfile: 1.2.0
    - beautifulsoup4: 4.12.3
    - bitsandbytes: 0.44.1
    - black: 24.8.0
    - bleach: 6.2.0
    - boltons: 23.0.0
    - brotli: 1.0.9
    - certifi: 2024.8.30
    - cffi: 1.17.1
    - cfgv: 3.4.0
    - charset-normalizer: 3.3.2
    - click: 8.1.7
    - cloudpickle: 2.2.1
    - colorama: 0.4.6
    - coloredlogs: 15.0.1
    - comm: 0.2.2
    - conda: 23.3.1
    - conda-package-handling: 2.3.0
    - conda-package-streaming: 0.10.0
    - contourpy: 1.2.0
    - coverage: 7.3.1
    - cryptography: 43.0.0
    - curio: 1.6
    - cycler: 0.12.1
    - debugpy: 1.8.8
    - decorator: 5.1.1
    - deepspeed: 0.9.3
    - defusedxml: 0.7.1
    - dill: 0.3.8
    - distlib: 0.3.9
    - docstring-parser: 0.16
    - docutils: 0.21.2
    - dotty-dict: 1.3.1
    - exceptiongroup: 1.2.2
    - executing: 2.1.0
    - fastapi: 0.115.4
    - fastjsonschema: 2.20.0
    - filelock: 3.16.1
    - flatbuffers: 24.3.25
    - fonttools: 4.47.0
    - fqdn: 1.5.1
    - frozenlist: 1.5.0
    - fsspec: 2024.10.0
    - grpcio: 1.67.1
    - h11: 0.14.0
    - halo: 0.0.31
    - hid: 1.0.6
    - hjson: 3.1.0
    - httpcore: 1.0.6
    - httpx: 0.27.2
    - humanfriendly: 10.0
    - hydra-core: 1.3.2
    - identify: 2.6.2
    - idna: 3.7
    - imagesize: 1.4.1
    - importlib-metadata: 8.5.0
    - importlib-resources: 6.1.1
    - inflect: 7.3.1
    - iniconfig: 2.0.0
    - ipykernel: 6.29.5
    - ipyparallel: 9.0.0
    - ipython: 8.1.1
    - ipywidgets: 8.1.5
    - isoduration: 20.11.0
    - isort: 5.13.2
    - jaraco.collections: 5.1.0
    - jaraco.context: 5.3.0
    - jaraco.functools: 4.0.1
    - jaraco.text: 3.12.1
    - jedi: 0.19.2
    - jinja2: 3.1.4
    - joblib: 1.4.2
    - json5: 0.9.28
    - jsonargparse: 4.34.0
    - jsonpatch: 1.33
    - jsonpointer: 2.1
    - jsonschema: 4.23.0
    - jsonschema-specifications: 2023.12.1
    - jupyter-client: 8.6.3
    - jupyter-core: 5.7.2
    - jupyter-events: 0.10.0
    - jupyter-lsp: 2.2.5
    - jupyter-server: 2.14.2
    - jupyter-server-terminals: 0.5.3
    - jupyterlab: 4.2.5
    - jupyterlab-pygments: 0.3.0
    - jupyterlab-server: 2.27.3
    - jupyterlab-widgets: 3.0.13
    - kiwisolver: 1.4.5
    - lightning: 2.5.0.dev0
    - lightning-utilities: 0.11.8
    - log-symbols: 0.0.14
    - markdown: 3.7
    - markdown-it-py: 3.0.0
    - markupsafe: 3.0.2
    - matplotlib: 3.9.2
    - matplotlib-inline: 0.1.7
    - mccabe: 0.7.0
    - mdurl: 0.1.2
    - milc: 1.8.0
    - mistune: 3.0.2
    - more-itertools: 10.3.0
    - mpmath: 1.3.0
    - multidict: 6.1.0
    - mypy-extensions: 1.0.0
    - nbclient: 0.10.0
    - nbconvert: 7.16.4
    - nbformat: 5.10.4
    - nest-asyncio: 1.6.0
    - networkx: 3.2.1
    - ninja: 1.11.1.1
    - nodeenv: 1.9.1
    - notebook: 7.2.2
    - notebook-shim: 0.2.4
    - numpy: 1.26.4
    - nvidia-cublas-cu12: 12.4.5.8
    - nvidia-cuda-cupti-cu12: 12.4.127
    - nvidia-cuda-nvrtc-cu12: 12.4.127
    - nvidia-cuda-runtime-cu12: 12.4.127
    - nvidia-cudnn-cu12: 9.1.0.70
    - nvidia-cufft-cu12: 11.2.1.3
    - nvidia-curand-cu12: 10.3.5.147
    - nvidia-cusolver-cu12: 11.6.1.9
    - nvidia-cusparse-cu12: 12.3.1.170
    - nvidia-nccl-cu12: 2.21.5
    - nvidia-nvjitlink-cu12: 12.4.127
    - nvidia-nvtx-cu12: 12.4.127
    - omegaconf: 2.3.0
    - onnx: 1.17.0
    - onnxruntime: 1.19.2
    - outcome: 1.3.0.post0
    - overrides: 7.7.0
    - packaging: 24.1
    - pandas: 2.2.3
    - pandocfilters: 1.5.1
    - parso: 0.8.4
    - pathspec: 0.10.3
    - pexpect: 4.9.0
    - pickleshare: 0.7.5
    - pillow: 10.4.0
    - pip: 24.2
    - platformdirs: 3.10.0
    - pluggy: 1.0.0
    - pre-commit: 4.0.1
    - prometheus-client: 0.21.0
    - prompt-toolkit: 3.0.48
    - propcache: 0.2.0
    - protobuf: 5.28.3
    - psutil: 5.9.8
    - ptyprocess: 0.7.0
    - pure-eval: 0.2.3
    - py-cpuinfo: 9.0.0
    - pycocotools: 2.0
    - pycosat: 0.6.6
    - pycparser: 2.21
    - pydantic: 1.10.19
    - pygments: 2.18.0
    - pylint: 3.2.7
    - pyopenssl: 24.2.1
    - pyparsing: 3.1.1
    - pyserial: 3.5
    - pysocks: 1.7.1
    - pytest: 7.4.0
    - pytest-asyncio: 0.23.8
    - pytest-cov: 4.1.0
    - pytest-random-order: 1.1.0
    - pytest-rerunfailures: 12.0
    - pytest-timeout: 2.1.0
    - python-dateutil: 2.9.0.post0
    - python-json-logger: 2.0.7
    - pytorch-lightning: 2.4.0
    - pytz: 2024.2
    - pyusb: 1.2.1
    - pyyaml: 6.0.2
    - pyzmq: 26.2.0
    - qmk: 1.1.5
    - qtconsole: 5.6.1
    - qtpy: 2.4.2
    - referencing: 0.35.1
    - requests: 2.32.3
    - rfc3339-validator: 0.1.4
    - rfc3986-validator: 0.1.1
    - rich: 13.9.4
    - rpds-py: 0.19.1
    - ruamel.yaml: 0.17.21
    - ruamel.yaml.clib: 0.2.8
    - scikit-learn: 1.5.2
    - scipy: 1.13.1
    - send2trash: 1.8.3
    - setuptools: 75.1.0
    - six: 1.16.0
    - sniffio: 1.3.1
    - snowballstemmer: 2.2.0
    - sortedcontainers: 2.4.0
    - soupsieve: 2.6
    - sphinx: 7.4.7
    - sphinxcontrib-applehelp: 2.0.0
    - sphinxcontrib-devhelp: 2.0.0
    - sphinxcontrib-htmlhelp: 2.1.0
    - sphinxcontrib-jsmath: 1.0.1
    - sphinxcontrib-qthelp: 2.0.0
    - sphinxcontrib-serializinghtml: 2.0.0
    - spinners: 0.0.24
    - stack-data: 0.6.3
    - starlette: 0.41.2
    - sympy: 1.13.1
    - tensorboard: 2.18.0
    - tensorboard-data-server: 0.7.2
    - tensorboardx: 2.6.2.2
    - termcolor: 2.4.0
    - terminado: 0.18.1
    - testpath: 0.6.0
    - threadpoolctl: 3.5.0
    - tinycss2: 1.4.0
    - tomli: 2.0.1
    - tomlkit: 0.13.2
    - toolz: 0.12.0
    - torch: 2.5.1
    - torchmetrics: 1.5.2
    - torchvision: 0.20.1
    - tornado: 6.4.1
    - tqdm: 4.66.5
    - traitlets: 5.14.3
    - trio: 0.27.0
    - triton: 3.1.0
    - typeguard: 4.3.0
    - types-colorama: 0.4.15.20240311
    - types-python-dateutil: 2.9.0.20241003
    - typeshed-client: 2.7.0
    - typing-extensions: 4.11.0
    - tzdata: 2024.2
    - uri-template: 1.3.0
    - urllib3: 2.2.3
    - uvicorn: 0.32.0
    - virtualenv: 20.27.1
    - wcwidth: 0.2.13
    - webcolors: 24.11.1
    - webencodings: 0.5.1
    - websocket-client: 1.8.0
    - werkzeug: 3.1.3
    - wheel: 0.44.0
    - widgetsnbextension: 4.0.13
    - yarl: 1.17.1
    - zipp: 3.21.0
    - zstandard: 0.23.0
  • System:
    - OS: Linux
    - architecture:
    - 64bit
    - ELF
    - processor: x86_64
    - python: 3.9.20
    - release: 5.15.153.1-microsoft-standard-WSL2
    - version: #1 SMP Fri Mar 29 23:14:13 UTC 2024

More info

No response

@nan-dre nan-dre added bug Something isn't working needs triage Waiting to be triaged by maintainers labels Nov 12, 2024
@nan-dre nan-dre changed the title seed_everything with workers=True causes the Dataloader to apply exactly the same augmentations each epoch if they sample values from torch.distributions seed_everything(..., workers=True) causes the Dataloader to apply exactly the same augmentations each epoch if they sample values from torch.distributions Nov 12, 2024
nan-dre (Author) commented Nov 14, 2024

@awaelchli @01AbhiSingh I see that this commit, which you authored, changed the code around seed generation. Could you take a look at this issue?
