ValueError: Unable to retrieve parameter 'w' when trying to use eval_inference #38

Open
jeremyrcoyle opened this issue Jul 20, 2023 · 10 comments

@jeremyrcoyle

When invoking experiment.py to do inference:

python3 ./tapnet/experiment.py \
  --config=./tapnet/configs/tapnet_config.py \
  --jaxline_mode=eval_inference \
  --config.checkpoint_dir=./tapnet/checkpoint/ \
  --config.experiment_kwargs.config.inference.input_video_path=fixed10.mp4 \
  --config.experiment_kwargs.config.inference.output_video_path=result.mp4 \
  --config.experiment_kwargs.config.inference.resize_height=256 \
  --config.experiment_kwargs.config.inference.resize_width=256 \
  --config.experiment_kwargs.config.inference.num_points=20

I get the following error:

Traceback (most recent call last):
  File "./tapnet/experiment.py", line 431, in <module>
    app.run(main)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "./tapnet/experiment.py", line 424, in main
    platform.main(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 484, in inner_wrapper
    return f(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/platform.py", line 137, in main
    train.evaluate(experiment_class, config, checkpointer, writer,
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 620, in inner_wrapper
    return fn(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/train.py", line 225, in evaluate
    scalar_values = utils.evaluate_should_return_dict(experiment.evaluate)(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/jaxline/utils.py", line 521, in evaluate_with_warning
    evaluate_out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 405, in evaluate
    eval_scalars = point_prediction_task.evaluate(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 370, in evaluate
    self._eval_inference(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 981, in _eval_inference
    outputs, _ = self._infer_batch(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 440, in _infer_batch
    output, _ = functools.partial(wrapped_forward_fn, input_key=input_key)(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/transform.py", line 357, in apply_fn
    out = f(*args, **kwargs)
  File "./tapnet/experiment.py", line 125, in forward
    return self.point_prediction.forward_fn(
  File "/home/jrcoyle/tapnet/supervised_point_prediction.py", line 150, in forward_fn
    return shared_modules[self.model_key](
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/tapnet/tapnet_model.py", line 215, in __call__
    latent = self.tsm_resnet(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/tapnet/models/tsm_resnet.py", line 383, in __call__
    net = hk.Conv2D(
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 426, in wrapped
    out = f(*args, **kwargs)
  File "/usr/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/module.py", line 272, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/conv.py", line 200, in __call__
    w = hk.get_parameter("w", w_shape, inputs.dtype, init=w_init)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 448, in wrapped
    return wrapped._current(*args, **kwargs)
  File "/home/jrcoyle/.local/lib/python3.8/site-packages/haiku/_src/base.py", line 524, in get_parameter
    raise ValueError(
ValueError: Unable to retrieve parameter 'w' for module 'tap_net/~/tsm_resnet_video/tsm_resnet_stem' All parameters must be created as part of `init`.

I am attempting to use a local GPU. The live_demo.py script works for me, so I'm not sure what the issue is here.

@cdoersch
Collaborator

live_demo.py uses a TAPIR model, but it looks like you're using a TAP-Net config. What checkpoint file are you trying to use with that code? Did you perhaps intend to use a TAPIR config?

@jeremyrcoyle
Author

I'm using the checkpoint from https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy. I see that above I'm referencing a TAP-Net config, so I tried again with a TAPIR config, and got the same error.

@jeremyrcoyle
Author

Please let me know if there is any more info I can provide or debugging steps on my end.

@cdoersch
Collaborator

Is it an option to use the code snippet we provide for inference in the colab?

Has the error message changed from using the tapir config? The traceback you provided above has tap_net/ as the prefix for the variable names; it should be tapir if you're actually running a tapir model. Without more information it's difficult to guess why that's happening.
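The prefix check described in this comment can be done directly on the parameter names. The following is a minimal sketch, assuming the parameters are available as a mapping whose keys are Haiku module paths such as `tap_net/~/tsm_resnet_video/tsm_resnet_stem` (the path in the traceback). The helper name `top_level_prefixes` and the sample TAPIR-style key are hypothetical and used only for illustration.

```python
from collections import Counter


def top_level_prefixes(param_names):
    """Count parameters by the first component of their Haiku module path."""
    return Counter(name.split("/", 1)[0] for name in param_names)


# Module paths for illustration only. The first is taken from the traceback
# in this issue; the second is a made-up example of a TAPIR-style path.
names = [
    "tap_net/~/tsm_resnet_video/tsm_resnet_stem",
    "tapir/~/some_module",
]
prefixes = top_level_prefixes(names)
print(dict(prefixes))
```

With a real checkpoint, the names would come from the loaded parameter dictionary. If none of the prefixes matches the model that the config builds, Haiku cannot find the parameters it asks for, which is consistent with the `Unable to retrieve parameter` error.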

@wenshengyoung

Did you solve the problem? I had the same thing happen to me.

@ldg810

ldg810 commented Aug 26, 2023

I am getting the same error... Is there any solution?

@ldg810

ldg810 commented Aug 26, 2023

> I am getting the same error... Is there any solution?

I found the problem. You should have a checkpoint.npy file in the checkpoint path:

wget https://storage.googleapis.com/dm-tapnet/checkpoint.npy -O tapnet/checkpoint/checkpoint.npy
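Before launching the experiment, it may help to confirm that the file is where the script will look for it. This is a minimal sketch, assuming (as this comment suggests) that the experiment expects a file named `checkpoint.npy` inside the directory passed as `--config.checkpoint_dir`. The helper name `find_checkpoint` is hypothetical.

```python
from pathlib import Path


def find_checkpoint(checkpoint_dir, filename="checkpoint.npy"):
    """Return the checkpoint path if it exists and is non-empty, else None."""
    path = Path(checkpoint_dir) / filename
    if path.is_file() and path.stat().st_size > 0:
        return path
    return None


# Directory taken from the --config.checkpoint_dir flag in the original command.
found = find_checkpoint("./tapnet/checkpoint/")
print(found if found else "checkpoint.npy not found; download it first")
```

This only checks that a non-empty file is present. It does not verify that the file matches the model the config builds.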

@cdalinghaus

> > I am getting the same error... Is there any solution?
>
> I found the problem. You should have a checkpoint.npy file in the checkpoint path:
>
> wget https://storage.googleapis.com/dm-tapnet/checkpoint.npy -O tapnet/checkpoint/checkpoint.npy

Also, this is a different checkpoint than the one in:

> I'm using the checkpoint from https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy. I see that above I'm referencing a TAP-Net config, so I tried again with a TAPIR config, and got the same error.

Using https://storage.googleapis.com/dm-tapnet/checkpoint.npy, I got it to work with the experiment script.

@nutsintheshell

> > > I am getting the same error... Is there any solution?
> >
> > I found the problem. You should have a checkpoint.npy file in the checkpoint path:
> >
> > wget https://storage.googleapis.com/dm-tapnet/checkpoint.npy -O tapnet/checkpoint/checkpoint.npy
>
> Also, this is a different checkpoint than the one in:
>
> > I'm using the checkpoint from https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy. I see that above I'm referencing a TAP-Net config, so I tried again with a TAPIR config, and got the same error.
>
> Using https://storage.googleapis.com/dm-tapnet/checkpoint.npy, I got it to work with the experiment script.

I tried your method, but another error occurs:

Traceback (most recent call last):
  File "/home/jishengyin/anaconda3/envs/tapnet/lib/python3.10/site-packages/numpy/lib/npyio.py", line 465, in load
    return pickle.load(fid, **pickle_kwargs)
_pickle.UnpicklingError: invalid load key, '-'.
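`np.load` falls back to `pickle.load` when a file starts with neither the `.npy` magic string nor a zip signature, so `invalid load key, '-'` suggests that the file at the checkpoint path begins with a `-` character and is not a NumPy file. One possible cause, which is an assumption and not confirmed in this thread, is that the download wrote something other than the checkpoint to that path (with wget, lowercase `-o` names the log file, while uppercase `-O` names the output document). The sketch below inspects the leading bytes without unpickling anything. The helper name `sniff_checkpoint` and the sample file contents are hypothetical.

```python
import os
import tempfile

NPY_MAGIC = b"\x93NUMPY"   # first bytes of a .npy file
ZIP_MAGIC = b"PK\x03\x04"  # first bytes of a .npz (zip) archive


def sniff_checkpoint(path):
    """Classify a file by its leading bytes without unpickling it."""
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(NPY_MAGIC):
        return "npy"
    if head.startswith(ZIP_MAGIC):
        return "npz"
    if not head:
        return "empty"
    return "not numpy: starts with %r" % head


# Demonstration on a made-up text file that begins with '-', standing in
# for a file that was saved to the checkpoint path by mistake.
with tempfile.TemporaryDirectory() as tmp:
    bad = os.path.join(tmp, "checkpoint.npy")
    with open(bad, "wb") as f:
        f.write(b"--2023-01-01 00:00:00--  https://example.com/\n")
    result = sniff_checkpoint(bad)
print(result)
```

Running `sniff_checkpoint` on the real `tapnet/checkpoint/checkpoint.npy` should report `npy` for a valid download. Any other result means the file should be downloaded again.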

@nutsintheshell

> > > I am getting the same error... Is there any solution?
> >
> > I found the problem. You should have a checkpoint.npy file in the checkpoint path:
> >
> > wget https://storage.googleapis.com/dm-tapnet/checkpoint.npy -O tapnet/checkpoint/checkpoint.npy
>
> Also, this is a different checkpoint than the one in:
>
> > I'm using the checkpoint from https://storage.googleapis.com/dm-tapnet/causal_tapir_checkpoint.npy. I see that above I'm referencing a TAP-Net config, so I tried again with a TAPIR config, and got the same error.
>
> Using https://storage.googleapis.com/dm-tapnet/checkpoint.npy, I got it to work with the experiment script.

I would like to evaluate a model. As I understand it, evaluation means first training a model and then evaluating it on an evaluation dataset (maybe), which would mean it doesn't need a pretrained model. So I can't understand why I got the error.

6 participants