IndexError in tune_reconstruction task with Larger Datasets #102

Open
delillo-florencia opened this issue Jan 28, 2025 · 0 comments

Hi! When I run the tune_reconstruction task, it fails with an IndexError caused by a shape mismatch between the mask and the dataset. The failure depends on dataset size: smaller datasets run without problems, but larger ones (e.g., 5000 or 10000 samples) fail. In the run shown below, the mask has 100 entries while the continuous data tensor has 10000 rows.
I’m not sure what is causing the error. I’ve run several tests with different configurations and hit the same issue each time. All details are included below to help with troubleshooting. Do you have any idea what could be causing this?

Thanks in advance for your help!

hydra:
  mode: MULTIRUN
  sweeper:
    params:
      task.batch_size: 10, 50
      task.model.num_hidden: "[500],[1000]"
      task.training_loop.num_epochs: 40, 60, 100
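
For reference, the sweep above expands to 2 × 2 × 3 = 12 jobs. The sketch below reproduces that expansion with `itertools.product`; it is only an illustration of the grid, not Hydra's own sweeper code, and `expand_sweep` is a hypothetical helper name. The resulting order matches the job names that appear in the reconstruction log.

```python
from itertools import product

# Values copied from the sweeper params in this issue.
sweep = {
    "task.batch_size": [10, 50],
    "task.model.num_hidden": ["[500]", "[1000]"],
    "task.training_loop.num_epochs": [40, 60, 100],
}


def expand_sweep(params):
    """Return one 'key=value;key=value' job name per grid point."""
    keys = list(params)
    return [
        ";".join(f"{key}={value}" for key, value in zip(keys, combo))
        for combo in product(*params.values())
    ]


jobs = expand_sweep(sweep)
for number, name in enumerate(jobs, start=1):
    print(number, name)
```

Every one of these jobs reads the same data, so a mask/data mismatch would be expected to affect the whole sweep rather than a single parameter combination.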

The error is:

[INFO  - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=100
Error executing job with overrides: ['task.batch_size=10', 'task.model.num_hidden=[500]', 'task.training_loop.num_epochs=40', 'experiment=gtex__tune_reconstruction_10000_samples']
Traceback (most recent call last):
  File "/home/local/tools/anaconda3-24.10.1/bin/move-dl", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/main.py", line 94, in decorated_main
    _run_hydra(
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 394, in _run_hydra
    _run_app(
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 465, in _run_app
    run_and_report(
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 223, in run_and_report
    raise ex
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 220, in run_and_report
    return func()
           ^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/utils.py", line 466, in <lambda>
    lambda: hydra.multirun(
            ^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/hydra.py", line 162, in multirun
    ret = sweeper.sweep(arguments=task_overrides)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/_internal/core_plugins/basic_sweeper.py", line 181, in sweep
    _ = r.return_value
        ^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/core/utils.py", line 260, in return_value
    raise self._return_value
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/hydra/core/utils.py", line 186, in run_job
    ret.return_value = task_function(task_cfg)
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/__main__.py", line 38, in main
    move.tasks.tune_model(config)
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/tasks/tune_model.py", line 250, in tune_model
    _tune_reconstruction(task_config)
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/tasks/tune_model.py", line 170, in _tune_reconstruction
    train_dataloader = make_dataloader(
                       ^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/data/dataloaders.py", line 188, in make_dataloader
    dataset = make_dataset(cat_list, con_list, mask)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/local/tools/anaconda3-24.10.1/lib/python3.12/site-packages/move/data/dataloaders.py", line 157, in make_dataset
    con_all = con_all[mask]
              ~~~~~~~^^^^^^
IndexError: The shape of the mask [100] at index 0 does not match the shape of the indexed tensor [10000, 53336] at index 0
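
The last line of the traceback says a 100-element mask is being used to index a tensor with 10000 rows. A possible way to surface this earlier is a length check before the indexing step. The sketch below is an assumption-laden illustration: `check_mask_length` is a hypothetical helper that is not part of MOVE, and plain lists stand in for the real mask and data (it relies only on `len()`, which also works on arrays and tensors with at least one dimension).

```python
def check_mask_length(mask, data):
    """Raise a descriptive error if mask and data disagree on the sample count."""
    n_mask, n_data = len(mask), len(data)
    if n_mask != n_data:
        raise ValueError(
            f"mask has {n_mask} entries but data has {n_data} samples"
        )
    return n_mask


# Stand-ins with the lengths reported in the traceback (hypothetical data).
mask = [True] * 100
data = list(range(10000))

try:
    check_mask_length(mask, data)
except ValueError as err:
    message = str(err)
    print(message)
```

Running a check like this right after the data is read would show whether the mask was built for a different number of samples than the dataset being loaded.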

The log for the reconstruction:

[2025-01-28 20:22:24] [INFO  - tune_model]: Beginning task: tune model reconstruction 1
[2025-01-28 20:22:24] [INFO  - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[500];task.training_loop.num_epochs=40
[2025-01-28 20:22:24] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:28] [INFO  - tune_model]: Beginning task: tune model reconstruction 2
[2025-01-28 20:22:28] [INFO  - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[500];task.training_loop.num_epochs=60
[2025-01-28 20:22:28] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:32] [INFO  - tune_model]: Beginning task: tune model reconstruction 3
[2025-01-28 20:22:32] [INFO  - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[500];task.training_loop.num_epochs=100
[2025-01-28 20:22:32] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:36] [INFO  - tune_model]: Beginning task: tune model reconstruction 4
[2025-01-28 20:22:36] [INFO  - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[1000];task.training_loop.num_epochs=40
[2025-01-28 20:22:36] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:39] [INFO  - tune_model]: Beginning task: tune model reconstruction 5
[2025-01-28 20:22:39] [INFO  - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[1000];task.training_loop.num_epochs=60
[2025-01-28 20:22:39] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:43] [INFO  - tune_model]: Beginning task: tune model reconstruction 6
[2025-01-28 20:22:43] [INFO  - tune_model]: Job name: task.batch_size=10;task.model.num_hidden=[1000];task.training_loop.num_epochs=100
[2025-01-28 20:22:43] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:47] [INFO  - tune_model]: Beginning task: tune model reconstruction 7
[2025-01-28 20:22:47] [INFO  - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[500];task.training_loop.num_epochs=40
[2025-01-28 20:22:47] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:51] [INFO  - tune_model]: Beginning task: tune model reconstruction 8
[2025-01-28 20:22:51] [INFO  - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[500];task.training_loop.num_epochs=60
[2025-01-28 20:22:51] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:55] [INFO  - tune_model]: Beginning task: tune model reconstruction 9
[2025-01-28 20:22:55] [INFO  - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[500];task.training_loop.num_epochs=100
[2025-01-28 20:22:55] [DEBUG - tune_model]: Reading data
[2025-01-28 20:22:59] [INFO  - tune_model]: Beginning task: tune model reconstruction 10
[2025-01-28 20:22:59] [INFO  - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=40
[2025-01-28 20:22:59] [DEBUG - tune_model]: Reading data
[2025-01-28 20:23:03] [INFO  - tune_model]: Beginning task: tune model reconstruction 11
[2025-01-28 20:23:03] [INFO  - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=60
[2025-01-28 20:23:03] [DEBUG - tune_model]: Reading data
[2025-01-28 20:23:06] [INFO  - tune_model]: Beginning task: tune model reconstruction 12
[2025-01-28 20:23:06] [INFO  - tune_model]: Job name: task.batch_size=50;task.model.num_hidden=[1000];task.training_loop.num_epochs=100
[2025-01-28 20:23:06] [DEBUG - tune_model]: Reading data