
RuntimeError when loading clay-v1-base.ckpt into ClayMAEModule #333

Open
jxiongbayer opened this issue Nov 23, 2024 · 2 comments

@jxiongbayer

I tried to run docs/tutorials/clay-v1-wall-to-wall.ipynb on AWS SageMaker Studio.

It turns out the ckpt is not available. Any alternative link?

# Select GPU if available, otherwise fall back to CPU
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
ckpt = "https://clay-model-ckpt.s3.amazonaws.com/v0.5.7/mae_v0.5.7_epoch-13_val-loss-0.3098.ckpt"
torch.set_default_device(device)

# Load the pretrained Clay MAE model from the checkpoint URL
model = ClayMAEModule.load_from_checkpoint(
    ckpt, metadata_path="../../configs/metadata.yaml", shuffle=False, mask_ratio=0
)
model.eval()

model = model.to(device)
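
A possible workaround, sketched under the assumption that the checkpoint is mirrored in the made-with-clay/Clay Hugging Face repo referenced later in this thread, is to download it locally with huggingface_hub and load it from disk:

# Sketch (assumption: the made-with-clay/Clay repo hosts the checkpoint, per the URLs below)
from huggingface_hub import hf_hub_download

local_ckpt = hf_hub_download(
    repo_id="made-with-clay/Clay",   # repo id taken from the Hugging Face URLs in this thread
    filename="clay-v1-base.ckpt",    # filename taken from the Hugging Face URLs in this thread
)
model = ClayMAEModule.load_from_checkpoint(
    local_ckpt, metadata_path="../../configs/metadata.yaml", shuffle=False, mask_ratio=0
)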
@jxiongbayer (Author)

I changed the checkpoint URL to:

ckpt = "https://huggingface.co/made-with-clay/Clay/resolve/main/Clay_v0.1_epoch-24_val-loss-0.46.ckpt?download=true"

Kernel Restarting
The kernel for model/clay-v1-wall-to-wall.ipynb appears to have died. It will restart automatically.
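
A kernel that dies silently while a checkpoint is loading is often an out-of-memory kill; that is an assumption here, not something confirmed in this thread. One common mitigation is to map the checkpoint tensors to CPU while loading and move the model to the device only afterwards:

# Sketch (assumption: the kernel died from OOM during checkpoint deserialization)
model = ClayMAEModule.load_from_checkpoint(
    ckpt,
    map_location="cpu",  # Lightning forwards this to torch.load; keeps tensors off the GPU during load
    metadata_path="../../configs/metadata.yaml",
    shuffle=False,
    mask_ratio=0,
)
model = model.to(device)  # move to the GPU only after loading succeeds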

@jxiongbayer jxiongbayer changed the title Access Denied in clay-v1-wall-to-wall.ipynb Access Denied in clay-v1-wall-to-wall.ipynb and kernel died after changed to huggingface Nov 23, 2024
@jxiongbayer (Author)

I changed it to the following, but it still fails. Any idea?

ckpt = "https://huggingface.co/made-with-clay/Clay/resolve/main/clay-v1-base.ckpt?download=true"
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[25], line 10
      7 ckpt = "https://huggingface.co/made-with-clay/Clay/resolve/main/clay-v1-base.ckpt?download=true"
      8 torch.set_default_device(device)
---> 10 model = ClayMAEModule.load_from_checkpoint(
     11     ckpt, metadata_path="configs/metadata.yaml", shuffle=False, mask_ratio=0
     12 )
     13 model.eval()
     15 model = model.to(device)

File /opt/conda/lib/python3.11/site-packages/lightning/pytorch/utilities/model_helpers.py:125, in _restricted_classmethod_impl.__get__.<locals>.wrapper(*args, **kwargs)
    120 if instance is not None and not is_scripting:
    121     raise TypeError(
    122         f"The classmethod `{cls.__name__}.{self.method.__name__}` cannot be called on an instance."
    123         " Please call it on the class type and make sure the return value is used."
    124     )
--> 125 return self.method(cls, *args, **kwargs)

File /opt/conda/lib/python3.11/site-packages/lightning/pytorch/core/module.py:1586, in LightningModule.load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
   1497 @_restricted_classmethod
   1498 def load_from_checkpoint(
   1499     cls,
   (...)
   1504     **kwargs: Any,
   1505 ) -> Self:
   1506     r"""Primary way of loading a model from a checkpoint. When Lightning saves a checkpoint it stores the arguments
   1507     passed to ``__init__``  in the checkpoint under ``"hyper_parameters"``.
   1508 
   (...)
   1584 
   1585     """
-> 1586     loaded = _load_from_checkpoint(
   1587         cls,  # type: ignore[arg-type]
   1588         checkpoint_path,
   1589         map_location,
   1590         hparams_file,
   1591         strict,
   1592         **kwargs,
   1593     )
   1594     return cast(Self, loaded)

File /opt/conda/lib/python3.11/site-packages/lightning/pytorch/core/saving.py:91, in _load_from_checkpoint(cls, checkpoint_path, map_location, hparams_file, strict, **kwargs)
     89     return _load_state(cls, checkpoint, **kwargs)
     90 if issubclass(cls, pl.LightningModule):
---> 91     model = _load_state(cls, checkpoint, strict=strict, **kwargs)
     92     state_dict = checkpoint["state_dict"]
     93     if not state_dict:

File /opt/conda/lib/python3.11/site-packages/lightning/pytorch/core/saving.py:187, in _load_state(cls, checkpoint, strict, **cls_kwargs_new)
    184     obj.on_load_checkpoint(checkpoint)
    186 # load the state_dict on the model automatically
--> 187 keys = obj.load_state_dict(checkpoint["state_dict"], strict=strict)
    189 if not strict:
    190     if keys.missing_keys:

File /opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py:2189, in Module.load_state_dict(self, state_dict, strict, assign)
   2184         error_msgs.insert(
   2185             0, 'Missing key(s) in state_dict: {}. '.format(
   2186                 ', '.join(f'"{k}"' for k in missing_keys)))
   2188 if len(error_msgs) > 0:
-> 2189     raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
   2190                        self.__class__.__name__, "\n\t".join(error_msgs)))
   2191 return _IncompatibleKeys(missing_keys, unexpected_keys)

RuntimeError: Error(s) in loading state_dict for ClayMAEModule:
	Unexpected key(s) in state_dict: "model.decoder.transformer.layers.4.0.norm.weight", "model.decoder.transformer.layers.4.0.norm.bias", "model.decoder.transformer.layers.4.0.to_qkv.weight", "model.decoder.transformer.layers.4.0.to_out.weight", "model.decoder.transformer.layers.4.1.net.0.weight", "model.decoder.transformer.layers.4.1.net.0.bias", "model.decoder.transformer.layers.4.1.net.1.weight", "model.decoder.transformer.layers.4.1.net.1.bias", "model.decoder.transformer.layers.4.1.net.3.weight", "model.decoder.transformer.layers.4.1.net.3.bias", "model.decoder.transformer.layers.5.0.norm.weight", "model.decoder.transformer.layers.5.0.norm.bias", "model.decoder.transformer.layers.5.0.to_qkv.weight", "model.decoder.transformer.layers.5.0.to_out.weight", "model.decoder.transformer.layers.5.1.net.0.weight", "model.decoder.transformer.layers.5.1.net.0.bias", "model.decoder.transformer.layers.5.1.net.1.weight", "model.decoder.transformer.layers.5.1.net.1.bias", "model.decoder.transformer.layers.5.1.net.3.weight", "model.decoder.transformer.layers.5.1.net.3.bias". 
	size mismatch for model.decoder.transformer.layers.0.0.to_qkv.weight: copying a param with shape torch.Size([1152, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
	size mismatch for model.decoder.transformer.layers.0.0.to_out.weight: copying a param with shape torch.Size([512, 384]) from checkpoint, the shape in current model is torch.Size([512, 256]).
	size mismatch for model.decoder.transformer.layers.1.0.to_qkv.weight: copying a param with shape torch.Size([1152, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
	size mismatch for model.decoder.transformer.layers.1.0.to_out.weight: copying a param with shape torch.Size([512, 384]) from checkpoint, the shape in current model is torch.Size([512, 256]).
	size mismatch for model.decoder.transformer.layers.2.0.to_qkv.weight: copying a param with shape torch.Size([1152, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
	size mismatch for model.decoder.transformer.layers.2.0.to_out.weight: copying a param with shape torch.Size([512, 384]) from checkpoint, the shape in current model is torch.Size([512, 256]).
	size mismatch for model.decoder.transformer.layers.3.0.to_qkv.weight: copying a param with shape torch.Size([1152, 512]) from checkpoint, the shape in current model is torch.Size([768, 512]).
	size mismatch for model.decoder.transformer.layers.3.0.to_out.weight: copying a param with shape torch.Size([512, 384]) from checkpoint, the shape in current model is torch.Size([512, 256]).
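
The unexpected layers.4 and layers.5 keys mean the checkpoint's decoder has six transformer layers while the locally instantiated ClayMAEModule builds only four, and the 1152-vs-768 to_qkv widths point to a different attention width as well, so the installed Clay code and the checkpoint likely come from different releases. A quick way to check which configuration the checkpoint expects is to inspect the hyperparameters Lightning saves inside it (a sketch, assuming a standard Lightning checkpoint downloaded locally as clay-v1-base.ckpt):

# Sketch: inspect the hyperparameters stored inside the Lightning checkpoint
import torch

state = torch.load("clay-v1-base.ckpt", map_location="cpu")
print(list(state.keys()))             # Lightning checkpoints contain "state_dict", "hyper_parameters", ...
print(state.get("hyper_parameters"))  # compare against the defaults of your local ClayMAEModule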

@jxiongbayer jxiongbayer changed the title Access Denied in clay-v1-wall-to-wall.ipynb and kernel died after changed to huggingface RuntimeError when loading clay-v1-base.ckpt into ClayMAEModule Dec 3, 2024