RuntimeError: CUDA error: device-side assert triggered #158

Open
sunyiwk opened this issue Oct 1, 2024 · 0 comments
sunyiwk commented Oct 1, 2024

[2024-10-01 21:30:25] [INFO] Running bash "/gemini/code/fluxgym/outputs/uwyy/train.sh"
[2024-10-01 21:31:17] [INFO] The following values were not passed to accelerate launch and had defaults used instead:
[2024-10-01 21:31:17] [INFO] --num_processes was set to a value of 1
[2024-10-01 21:31:17] [INFO] --num_machines was set to a value of 1
[2024-10-01 21:31:17] [INFO] --dynamo_backend was set to a value of 'no'
[2024-10-01 21:31:17] [INFO] To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
[2024-10-01 21:32:08] [INFO] highvram is enabled / highvramが有効です
[2024-10-01 21:32:08] [INFO] 2024-10-01 21:32:08 WARNING cache_latents_to_disk is enabled, so cache_latents is also enabled / train_util.py:4022
[2024-10-01 21:32:08] [INFO] cache_latents_to_disk is enabled, so cache_latents will be enabled
[2024-10-01 21:32:09] [INFO] 2024-10-01 21:32:09 INFO t5xxl_max_token_length: 512 flux_train_network.py:155
[2024-10-01 21:32:09] [INFO] INFO load tokenizer from cache: strategy_base.py:43
[2024-10-01 21:32:09] [INFO] /gemini/code/fluxgym/sd-scripts/library/models--openai--clip-vit-large-patch
[2024-10-01 21:32:09] [INFO] 14
[2024-10-01 21:32:09] [INFO] /gemini/code/fluxgym/env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be depracted in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
[2024-10-01 21:32:09] [INFO] warnings.warn(
[2024-10-01 21:32:09] [INFO] INFO load tokenizer from cache: strategy_base.py:43
[2024-10-01 21:32:09] [INFO] /gemini/code/fluxgym/sd-scripts/library/models--openai--clip-vit-large-patch
[2024-10-01 21:32:09] [INFO] 14
[2024-10-01 21:32:10] [INFO] 2024-10-01 21:32:10 INFO Loading dataset config from /gemini/code/fluxgym/outputs/uwyy/dataset.toml train_network.py:280
[2024-10-01 21:32:10] [INFO] INFO prepare images. train_util.py:1872
[2024-10-01 21:32:10] [INFO] INFO get image size from name of cache files train_util.py:1810
[2024-10-01 21:32:10] [INFO] 0%| | 0/2 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 3653.57it/s]
[2024-10-01 21:32:10] [INFO] INFO set image size from cache files: 0/2 train_util.py:1817
[2024-10-01 21:32:10] [INFO] INFO found directory /gemini/code/fluxgym/datasets/uwyy contains 2 image files train_util.py:1819
[2024-10-01 21:32:10] [INFO] INFO 2 train images with repeating. train_util.py:1913
[2024-10-01 21:32:10] [INFO] INFO 0 reg images. train_util.py:1916
[2024-10-01 21:32:10] [INFO] WARNING no regularization images / 正則化画像が見つかりませんでした train_util.py:1921
[2024-10-01 21:32:10] [INFO] INFO [Dataset 0] config_util.py:570
[2024-10-01 21:32:10] [INFO] batch_size: 1
[2024-10-01 21:32:10] [INFO] resolution: (512, 512)
[2024-10-01 21:32:10] [INFO] enable_bucket: False
[2024-10-01 21:32:10] [INFO] network_multiplier: 1.0
[2024-10-01 21:32:10] [INFO]
[2024-10-01 21:32:10] [INFO] [Subset 0 of Dataset 0]
[2024-10-01 21:32:10] [INFO] image_dir: "/gemini/code/fluxgym/datasets/uwyy"
[2024-10-01 21:32:10] [INFO] image_count: 2
[2024-10-01 21:32:10] [INFO] num_repeats: 1
[2024-10-01 21:32:10] [INFO] shuffle_caption: False
[2024-10-01 21:32:10] [INFO] keep_tokens: 1
[2024-10-01 21:32:10] [INFO] keep_tokens_separator:
[2024-10-01 21:32:10] [INFO] caption_separator: ,
[2024-10-01 21:32:10] [INFO] secondary_separator: None
[2024-10-01 21:32:10] [INFO] enable_wildcard: False
[2024-10-01 21:32:10] [INFO] caption_dropout_rate: 0.0
[2024-10-01 21:32:10] [INFO] caption_dropout_every_n_epoches: 0
[2024-10-01 21:32:10] [INFO] caption_tag_dropout_rate: 0.0
[2024-10-01 21:32:10] [INFO] caption_prefix: None
[2024-10-01 21:32:10] [INFO] caption_suffix: None
[2024-10-01 21:32:10] [INFO] color_aug: False
[2024-10-01 21:32:10] [INFO] flip_aug: False
[2024-10-01 21:32:10] [INFO] face_crop_aug_range: None
[2024-10-01 21:32:10] [INFO] random_crop: False
[2024-10-01 21:32:10] [INFO] token_warmup_min: 1,
[2024-10-01 21:32:10] [INFO] token_warmup_step: 0,
[2024-10-01 21:32:10] [INFO] alpha_mask: False,
[2024-10-01 21:32:10] [INFO] is_reg: False
[2024-10-01 21:32:10] [INFO] class_tokens: uwyya
[2024-10-01 21:32:10] [INFO] caption_extension: .txt
[2024-10-01 21:32:10] [INFO]
[2024-10-01 21:32:10] [INFO]
[2024-10-01 21:32:10] [INFO] INFO [Dataset 0] config_util.py:576
[2024-10-01 21:32:10] [INFO] INFO loading image sizes. train_util.py:909
[2024-10-01 21:32:10] [INFO] 0%| | 0/2 [00:00<?, ?it/s]
100%|████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 728.37it/s]
[2024-10-01 21:32:10] [INFO] INFO prepare dataset train_util.py:917
[2024-10-01 21:32:10] [INFO] INFO preparing accelerator train_network.py:345
[2024-10-01 21:32:10] [INFO] Using config file: /etc/orion/env/env.conf
[2024-10-01 21:32:12] [INFO] 2024-10-01 21:32:12 WARNING Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; other.py:349
[2024-10-01 21:32:12] [INFO] this can cause the process to hang. It is recommended to upgrade the kernel to the
[2024-10-01 21:32:12] [INFO] minimum version or higher.
[2024-10-01 21:32:12] [INFO] accelerator device: cuda
[2024-10-01 21:32:12] [INFO] INFO Building Flux model dev flux_utils.py:45
[2024-10-01 21:32:12] [INFO] INFO Loading state dict from /gemini/code/fluxgym/models/unet/flux1-dev.sft flux_utils.py:52
[2024-10-01 21:32:20] [INFO] 2024-10-01 21:32:20 INFO Loaded Flux: flux_utils.py:55
[2024-10-01 21:32:20] [INFO] INFO prepare split model flux_train_network.py:110
[2024-10-01 21:32:20] [INFO] INFO load state dict for lower flux_train_network.py:117
[2024-10-01 21:32:20] [INFO] INFO load state dict for upper flux_train_network.py:122
[2024-10-01 21:32:20] [INFO] INFO prepare upper model flux_train_network.py:125
[2024-10-01 21:37:18] [INFO] 2024-10-01 21:37:18 INFO split model prepared flux_train_network.py:140
[2024-10-01 21:37:18] [INFO] INFO Building CLIP flux_utils.py:74
[2024-10-01 21:37:18] [INFO] INFO Loading state dict from /gemini/code/fluxgym/models/clip/clip_l.safetensors flux_utils.py:167
[2024-10-01 21:37:20] [INFO] 2024-10-01 21:37:20 INFO Loaded CLIP: flux_utils.py:170
[2024-10-01 21:37:20] [INFO] INFO Loading state dict from flux_utils.py:215
[2024-10-01 21:37:20] [INFO] /gemini/code/fluxgym/models/clip/t5xxl_fp16.safetensors
[2024-10-01 21:37:26] [INFO] 2024-10-01 21:37:26 INFO Loaded T5xxl: flux_utils.py:218
[2024-10-01 21:37:26] [INFO] INFO Building AutoEncoder flux_utils.py:62
[2024-10-01 21:37:26] [INFO] INFO Loading state dict from /gemini/code/fluxgym/models/vae/ae.sft flux_utils.py:66
[2024-10-01 21:37:28] [INFO] 2024-10-01 21:37:28 INFO Loaded AE: flux_utils.py:69
[2024-10-01 21:37:28] [INFO] import network module: networks.lora_flux
[2024-10-01 21:37:33] [INFO] 2024-10-01 21:37:33 INFO [Dataset 0] train_util.py:2396
[2024-10-01 21:37:33] [INFO] INFO caching latents with caching strategy. train_util.py:1017
[2024-10-01 21:37:33] [INFO] INFO checking cache validity... train_util.py:1044
[2024-10-01 21:37:33] [INFO] 0%| | 0/2 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1707.08it/s]
[2024-10-01 21:37:33] [INFO] INFO caching latents... train_util.py:1091
[2024-10-01 21:38:00] [INFO] 0%| | 0/2 [00:00<?, ?it/s]
50%|████████████████████████████████████████████▌ | 1/2 [00:26<00:26, 26.63s/it]
100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:26<00:00, 4.21it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:26<00:00, 13.43s/it]
[2024-10-01 21:38:00] [INFO] 2024-10-01 21:38:00 INFO move vae and unet to cpu to save memory flux_train_network.py:208
[2024-10-01 21:38:00] [INFO] INFO move text encoders to gpu flux_train_network.py:216
[2024-10-01 21:42:11] [INFO] 2024-10-01 21:42:11 INFO [Dataset 0] train_util.py:2417
[2024-10-01 21:42:11] [INFO] INFO caching Text Encoder outputs with caching strategy. train_util.py:1179
[2024-10-01 21:42:11] [INFO] INFO checking cache validity... train_util.py:1185
[2024-10-01 21:42:11] [INFO] 0%| | 0/2 [00:00<?, ?it/s]
100%|███████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1346.49it/s]
[2024-10-01 21:42:11] [INFO] INFO caching Text Encoder outputs... train_util.py:1211
[2024-10-01 21:42:13] [INFO] 0%| | 0/2 [00:00<?, ?it/s]../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [32,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [33,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [34,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [35,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [36,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [37,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [38,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [39,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [40,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [41,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [42,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [43,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [44,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [45,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [46,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [47,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [48,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [49,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [50,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [51,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [52,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [53,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [54,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [55,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [56,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [57,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [58,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [59,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [60,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [61,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [62,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [250,0,0], thread: [63,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [96,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [97,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [98,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [99,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [100,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [101,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [102,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [103,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [104,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [105,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [106,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [107,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [108,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [109,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [110,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [111,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [112,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [113,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [114,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [115,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [116,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [117,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [118,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [119,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [120,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [121,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [122,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [123,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [124,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [125,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [126,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:13] [INFO] ../aten/src/ATen/native/cuda/Indexing.cu:1284: indexSelectLargeIndex: block: [134,0,0], thread: [127,0,0] Assertion srcIndex < srcSelectDimSize failed.
[2024-10-01 21:42:14] [INFO] 0%| | 0/2 [00:02<?, ?it/s]
[2024-10-01 21:42:14] [INFO] Traceback (most recent call last):
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/sd-scripts/flux_train_network.py", line 519, in <module>
[2024-10-01 21:42:14] [INFO] trainer.train(args)
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/sd-scripts/train_network.py", line 402, in train
[2024-10-01 21:42:14] [INFO] self.cache_text_encoder_outputs_if_needed(args, accelerator, unet, vae, text_encoders, train_dataset_group, weight_dtype)
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/sd-scripts/flux_train_network.py", line 228, in cache_text_encoder_outputs_if_needed
[2024-10-01 21:42:14] [INFO] dataset.new_cache_text_encoder_outputs(text_encoders, accelerator.is_main_process)
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/sd-scripts/library/train_util.py", line 2418, in new_cache_text_encoder_outputs
[2024-10-01 21:42:14] [INFO] dataset.new_cache_text_encoder_outputs(models, is_main_process)
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/sd-scripts/library/train_util.py", line 1214, in new_cache_text_encoder_outputs
[2024-10-01 21:42:14] [INFO] caching_strategy.cache_batch_outputs(tokenize_strategy, models, text_encoding_strategy, batch)
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/sd-scripts/library/strategy_flux.py", line 162, in cache_batch_outputs
[2024-10-01 21:42:14] [INFO] l_pooled, t5_out, txt_ids, _ = flux_text_encoding_strategy.encode_tokens(tokenize_strategy, models, tokens_and_masks)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/sd-scripts/library/strategy_flux.py", line 76, in encode_tokens
[2024-10-01 21:42:14] [INFO] t5_out, _ = t5xxl(t5_tokens.to(t5xxl.device), attention_mask, return_dict=False, output_hidden_states=True)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[2024-10-01 21:42:14] [INFO] return self._call_impl(*args, **kwargs)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[2024-10-01 21:42:14] [INFO] return forward_call(*args, **kwargs)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/transformers/models/t5/modeling_t5.py", line 1971, in forward
[2024-10-01 21:42:14] [INFO] encoder_outputs = self.encoder(
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[2024-10-01 21:42:14] [INFO] return self._call_impl(*args, **kwargs)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[2024-10-01 21:42:14] [INFO] return forward_call(*args, **kwargs)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/transformers/models/t5/modeling_t5.py", line 1012, in forward
[2024-10-01 21:42:14] [INFO] inputs_embeds = self.embed_tokens(input_ids)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
[2024-10-01 21:42:14] [INFO] return self._call_impl(*args, **kwargs)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
[2024-10-01 21:42:14] [INFO] return forward_call(*args, **kwargs)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 164, in forward
[2024-10-01 21:42:14] [INFO] return F.embedding(
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/torch/nn/functional.py", line 2267, in embedding
[2024-10-01 21:42:14] [INFO] return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[2024-10-01 21:42:14] [INFO] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2024-10-01 21:42:14] [INFO] RuntimeError: CUDA error: device-side assert triggered
[2024-10-01 21:42:14] [INFO] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
[2024-10-01 21:42:14] [INFO]
[2024-10-01 21:42:20] [INFO] Traceback (most recent call last):
[2024-10-01 21:42:20] [INFO] File "/gemini/code/fluxgym/env/bin/accelerate", line 8, in <module>
[2024-10-01 21:42:20] [INFO] sys.exit(main())
[2024-10-01 21:42:20] [INFO] ^^^^^^
[2024-10-01 21:42:20] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
[2024-10-01 21:42:20] [INFO] args.func(args)
[2024-10-01 21:42:20] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/accelerate/commands/launch.py", line 1106, in launch_command
[2024-10-01 21:42:20] [INFO] simple_launcher(args)
[2024-10-01 21:42:20] [INFO] File "/gemini/code/fluxgym/env/lib/python3.11/site-packages/accelerate/commands/launch.py", line 704, in simple_launcher
[2024-10-01 21:42:20] [INFO] raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
[2024-10-01 21:42:20] [INFO] subprocess.CalledProcessError: Command '['/gemini/code/fluxgym/env/bin/python', 'sd-scripts/flux_train_network.py', '--pretrained_model_name_or_path', '/gemini/code/fluxgym/models/unet/flux1-dev.sft', '--clip_l', '/gemini/code/fluxgym/models/clip/clip_l.safetensors', '--t5xxl', '/gemini/code/fluxgym/models/clip/t5xxl_fp16.safetensors', '--ae', '/gemini/code/fluxgym/models/vae/ae.sft', '--cache_latents_to_disk', '--save_model_as', 'safetensors', '--sdpa', '--persistent_data_loader_workers', '--max_data_loader_n_workers', '2', '--seed', '42', '--gradient_checkpointing', '--mixed_precision', 'bf16', '--save_precision', 'bf16', '--network_module', 'networks.lora_flux', '--network_dim', '4', '--optimizer_type', 'adafactor', '--optimizer_args', 'relative_step=False', 'scale_parameter=False', 'warmup_init=False', '--split_mode', '--network_args', 'train_blocks=single', '--lr_scheduler', 'constant_with_warmup', '--max_grad_norm', '0.0', '--learning_rate', '8e-4', '--cache_text_encoder_outputs', '--cache_text_encoder_outputs_to_disk', '--fp8_base', '--highvram', '--max_train_epochs', '1', '--save_every_n_epochs', '4', '--dataset_config', '/gemini/code/fluxgym/outputs/uwyy/dataset.toml', '--output_dir', '/gemini/code/fluxgym/outputs/uwyy', '--output_name', 'uwyy', '--timestep_sampling', 'shift', '--discrete_flow_shift', '3.1582', '--model_prediction_type', 'raw', '--guidance_scale', '1', '--loss_type', 'l2']' returned non-zero exit status 1.
[2024-10-01 21:42:20] [ERROR] Command exited with code 1
[2024-10-01 21:42:20] [INFO] Runner:
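
For context on the assertion in the log: `Assertion srcIndex < srcSelectDimSize failed` in `indexSelectLargeIndex`, reached through `F.embedding` inside the T5 encoder's `embed_tokens`, generally means that at least one index passed to the embedding lookup is not a valid row of the embedding table. The log does not show which token ID was out of range or why. The snippet below is only a minimal, framework-free sketch of how such IDs could be located before the lookup; `find_out_of_range_ids` and the example values are hypothetical and are not part of sd-scripts or fluxgym.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    token_ids: Iterable[int], num_embeddings: int
) -> List[Tuple[int, int]]:
    """Return (position, token_id) pairs that an embedding table with
    `num_embeddings` rows cannot look up."""
    return [
        (pos, tok)
        for pos, tok in enumerate(token_ids)
        if tok < 0 or tok >= num_embeddings
    ]


# Hypothetical values for illustration only; they are not taken from the log.
example_ids = [3, 17, 42, 9]
example_table_rows = 40

print(find_out_of_range_ids(example_ids, example_table_rows))  # [(2, 42)]
```

In a real run the IDs would come from the T5 tokenizer output and the row count from the loaded embedding weight. Rerunning the failing step with `CUDA_LAUNCH_BLOCKING=1`, or on CPU, usually produces a more precise Python-side error than the asynchronous device-side assert.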
