Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model tf_cnn_histology - Federation run fails with - AttributeError: 'Adam' object has no attribute 'weights' #1127

Open
noopurintel opened this issue Nov 7, 2024 · 2 comments
Assignees
Labels
bug Something isn't working duplicate This issue or pull request already exists

Comments

@noopurintel
Copy link
Contributor

noopurintel commented Nov 7, 2024

Describe the bug
While trying the Quick Start Guide for model tf_cnn_histology, the federation run is failing.

Last 2-3 lines from the error message:

line 334, in get_tensor_dict
    opt_weights = self._get_weights_dict(self.model.optimizer, suffix)
  File "/home/azureuser/openfl-quickstart/lib/python3.10/site-packages/openfl/federated/task/runner_keras.py", line 301, in _get_weights_dict
    weight_names = [weight.name for weight in obj.weights]
AttributeError: 'Adam' object has no attribute 'weights'

To Reproduce
Steps to reproduce the behavior:

  1. Follow the steps mentioned in Quick Start replacing model torch_cnn_mnist with tf_cnn_histology
  2. Create workspace, certify it.
  3. Generate CSR request for aggregator with CA signing it.
  4. Initialise the plan
  5. Setup Collaborator1 & Collaborator2
  6. Run the federation.

Expected behavior
Experiment should complete without error.

Screenshots

Aggregator screen
image

Collaborators screen
image

Machine
Ubuntu 22.04

@kta-intel
Copy link
Collaborator

kta-intel commented Nov 8, 2024

Thanks for reporting this @noopurintel. This is actually a known issue with the tf workspaces and is being tracked in #973. However, with the release of keras 3.x (which is used by tensorflow v2.16+ by default), additional changes were introduced that further break our tf workspaces. Since tensorflow is now at v2.18, the original fix is stale and will likely require much more refactoring to get it working

Edit:
A quick fix is to use the legacy Adam optimizer (which gets deprecated in later TF versions) in place of L91

self.optimizer = tf.keras.optimizers.legacy.Adam()

In fact, as a temp fix, we can update all tf workspaces with the respective legacy optimizers to get them working on tf v2.13, but a long term solution would require a refactor of the workspace

@kta-intel kta-intel added bug Something isn't working duplicate This issue or pull request already exists labels Nov 8, 2024
@kta-intel
Copy link
Collaborator

Adding duplicate label for better tracking, but leaving it open since it directly calls out the bug. It'll be part of a bigger effort to update the workspace

@tanwarsh tanwarsh self-assigned this Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working duplicate This issue or pull request already exists
Projects
None yet
Development

No branches or pull requests

3 participants