
feat(baselines) Add FedRep Baseline #3790

Merged
merged 73 commits into adap:main on Oct 3, 2024

Conversation

KarhouTam
Contributor

@KarhouTam KarhouTam commented Jul 12, 2024

Issue

Description

Add the FedRep baseline to Flower Baselines.
Reproduce the experiments on CIFAR-10 and CIFAR-100.

Related issues/PRs

Related issue #3788

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

@jafermarq jafermarq self-assigned this Jul 26, 2024
@KarhouTam
Contributor Author

Hi, @jafermarq.
I've fixed the code and ran the cifar10_100_2 experiment.
Here are the results:

Results will be saved into: /root/codes/python/flower/baselines/fedrep/outputs/2024-09-29/20-57-44/results.pkl
(ClientAppActor pid=78925) <------- TEST RESULTS -------> : {'loss': 0.010720741094612494, 'accuracy': 0.7926829268292683}

[image: results plot for cifar10_100_2]

FedRep seems to be working normally again.

My laptop doesn't have enough resources to run all the experiments. I think if cifar10_100_2 looks normal, the others should be fine as well.

This PR is ready again.

@jafermarq
Contributor

Hi @KarhouTam, thanks for the update. I re-ran the experiments, but they perform similarly to when I did my last review (so they are worse than earlier in the PR thread). I looked into it and I'm happy to say that I found the issue 😆. Before using flower-datasets, clients would use as their test set 17% of their training set (derived from <config>.dataset.fraction=0.83) and (!!) also a 100th of the CIFAR-10/100 test set (right? that's what

self.data = torch.cat([train_data, test_data])

implies). However, after switching to flower-datasets, the test set of CIFAR-10/100 is not being used at all. It seems the only reason the accuracy of some experiments (mostly CIFAR-10 w/ FedRep) appears lower is that the trained model is not being exposed to a wider evaluation set... See below the results I got with the code as in the last commit and after applying the changes I propose below.

[screenshot: accuracy curves with the last-commit code vs. after the proposed changes]

Note

Top row is FedRep-CIFAR10-100-2. Bottom row is FedRep-CIFAR10-100-5. It doesn't seem to impact the CIFAR-100 results much, so I haven't included them here.

What to change?

  • In client.py you'll need to make use of the "test" partition of the dataset and concatenate it with the validation split you are already creating out of the training partition.
  • For this, the best approach is to create a new partitioner that follows the same scheme as the one you already use. This ensures that each client's test set also contains the same classes as those present in its training partition.

I've done it like this:

from datasets import concatenate_datasets
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import PathologicalPartitioner

def get_client_fn_simulation(...):
    ....

    # new partitioner for the test split (same scheme as the existing training one)
    partitioner_test = PathologicalPartitioner(<same-args-as-the-existing-one>)
    ....

    FEDERATED_DATASET = FederatedDataset(
        dataset=config.dataset.name.lower(),
        partitioners={"train": partitioner_train, "test": partitioner_test},
    )
    ...

    def client_fn(cid: str) -> Client:

        ....

        trainset = partition_train_test["train"].with_transform(apply_train_transforms)
        valset = partition_train_test["test"].with_transform(apply_test_transforms)  # just like before
        #! Now extract the test partition and fuse validation and test into a single set
        testset = FEDERATED_DATASET.load_partition(cid_use, split="test")
        testset = testset.with_transform(apply_test_transforms)
        testset = concatenate_datasets([valset, testset])

        # rest is the same
Finally, I also had to make a simple change to your evaluate_metrics_aggregation_fn(). I simplified it to this:

    def evaluate_metrics_aggregation_fn(...):
        weights, accuracies = [], []
        for num_examples, metric in eval_metrics:
            weights.append(num_examples)
            accuracies.append(metric["accuracy"] * num_examples)
        accuracy = sum(accuracies) / sum(weights)
        return {"accuracy": accuracy}

I thought of presenting my changes here in a comment instead of as comments on the code. Let me know what you think about this finding of now including the test set for evaluation. I didn't check super carefully how you were partitioning the dataset before switching to flower-datasets, but am I right in thinking that what I propose above is equivalent to what you were doing?

@KarhouTam
Contributor Author

Hi, @jafermarq. Thanks for running the other experiments. Your insight is really helpful 🚀. I've fixed the code and followed most of your suggestions. For the partitioner part, instead of creating a new one, I reused the existing partitioner: partitioners={"train": partitioner, "test": partitioner}. I think that should work too, because a partitioner is not tied to a specific dataset object.

Would you mind running the experiments again for testing? That would be a great help.

@jafermarq
Contributor

@KarhouTam, actually reusing the same partitioner isn't possible, since a partitioner can only be linked to one ._dataset. So passing partitioners={"train": ..., "test": ...} with both entries pointing to the same object will make the second one reuse the same dataset split as the first one. This is something I encountered yesterday for the first time while putting together my proposal. What do you think @adam-narozniak, should we support sharing a partitioner across multiple splits, or add a check that throws an error if the id(partitioner) is the same?

A super simplified test of this is the following:

from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner
partitioner = IidPartitioner(num_partitions=10)
fds = FederatedDataset(dataset="mnist", partitioners={"test": partitioner, "train": partitioner})

# Load the first partition
iid_partition = fds.load_partition(partition_id=0, split='test')
print(iid_partition)
iid_partition = fds.load_partition(partition_id=0, split='train')
print(iid_partition)

This gets printed:

Dataset({
    features: ['image', 'label'],
    num_rows: 1000
})
Dataset({
    features: ['image', 'label'],
    num_rows: 1000 # <------------- clearly this isn't coming from the training set (which has 60K images)
})

@KarhouTam, in the meantime, is it OK if we keep that "duplicated" partitioner? Maybe make a copy of it (e.g. with copy.deepcopy) if you prefer, in order to keep the code cleaner, and leave a comment?
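
For illustration only, a copy made before either partitioner is attached to a dataset could look like this (the IidPartitioner here is just a stand-in for whichever partitioner the baseline actually uses):

import copy

from flwr_datasets.partitioner import IidPartitioner

# build one partitioner, then duplicate it *before* it is linked to a dataset,
# so the "train" and "test" splits are partitioned by independent objects
partitioner_train = IidPartitioner(num_partitions=10)
partitioner_test = copy.deepcopy(partitioner_train)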

@KarhouTam
Contributor Author

@jafermarq, sorry, I didn't check the internals of the partitioner. As you suggested, I've now set different partitioners for the dataset splits.

@adam-narozniak
Contributor

@jafermarq I'll open a PR that adds a check for two identical partitioners specified in the partitioners dict. It's a bug (I hadn't thought about this problem). The partitioning requested later does not actually happen; it just reuses the already-assigned dataset.
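
Just to sketch the idea (not the actual flwr-datasets patch; the helper name is hypothetical):

from flwr_datasets.partitioner import Partitioner

def _check_partitioners_are_unique(partitioners: dict[str, Partitioner]) -> None:
    # hypothetical helper: reject dicts that reuse the same Partitioner object for
    # more than one split, since a partitioner can only be linked to one dataset
    seen_ids: set[int] = set()
    for split, partitioner in partitioners.items():
        if id(partitioner) in seen_ids:
            raise ValueError(
                f"The same Partitioner object was assigned to split '{split}' and "
                "another split. Create a separate Partitioner for each split."
            )
        seen_ids.add(id(partitioner))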

@adam-narozniak
Contributor

adam-narozniak commented Oct 1, 2024

@jafermarq @KarhouTam
When it comes to the results problem, I have two comments.

  1. If, in the original code, it was self.data = torch.cat([train_data, test_data]), then using two different partitioners does not necessarily give the same results (size-wise; the classes should be the same if the number of partitions and the seed are the same). I'd say the dataset should be merged in a similar manner, e.g.:
from flwr_datasets import FederatedDataset
from flwr_datasets.partitioner import IidPartitioner
from flwr_datasets.preprocessor import Merger
partitioner = IidPartitioner(num_partitions=10)
fds = FederatedDataset(
    dataset="mnist",
    partitioners={"train": partitioner},
    preprocessor=Merger({"train": ("train", "test")}),
)
fds.load_split("train")
# This gives the 70,000 samples which come from the original: 60,000 train + 10,000 test
# You can continue using the .load_partition normally 
fds.load_partition(0)
  2. The pathological assignment you performed in your code corresponds to the class_assignment_mode called "random" in PathologicalPartitioner. The behavior of the "pathological" assignment is inconsistent across papers (in the non-IID bench, it was something I call "deterministic-first", which is the default in flwr-datasets).
    Anyway, to match what is currently in your custom code, I'd say to use PathologicalPartitioner(..., class_assignment_mode="random").
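
To make that concrete, a possible construction (the num_partitions and num_classes_per_partition values below are only illustrative, and the argument names assume the current PathologicalPartitioner API):

from flwr_datasets.partitioner import PathologicalPartitioner

partitioner = PathologicalPartitioner(
    num_partitions=100,              # illustrative: 100 clients
    partition_by="label",
    num_classes_per_partition=2,     # illustrative: 2 classes per client
    class_assignment_mode="random",  # match the assignment used in the custom code
)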

@jafermarq
Contributor

Thanks @adam-narozniak !
This use of the Merger looks very clean. @KarhouTam, let's use the Merger option if we are right in thinking that in your original code you first merged both the train and test sets, then did the partitioning, and then further split each partition into train/test (potentially having some images from the original training set used for validation and vice versa). Is this the case?
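
Roughly, the combined setup could look like the sketch below (the 0.17 test fraction follows from the dataset.fraction=0.83 mentioned earlier; the partitioner is assumed to be built as in the previous comment):

from flwr_datasets import FederatedDataset
from flwr_datasets.preprocessor import Merger

fds = FederatedDataset(
    dataset="cifar10",
    partitioners={"train": partitioner},
    # fuse the original train and test splits before partitioning
    preprocessor=Merger({"train": ("train", "test")}),
)

# each client's partition is then split locally into train/test
partition = fds.load_partition(partition_id=0)
partition_train_test = partition.train_test_split(test_size=0.17, seed=42)
trainset = partition_train_test["train"]
testset = partition_train_test["test"]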

I'm seeing good results with the change proposed by Adam.

@KarhouTam
Contributor Author

KarhouTam commented Oct 1, 2024

@adam-narozniak Thanks for the clean Merger solution. I will make the change later.

first merged both the train and test sets, then did the partitioning, and then further split each partition into train/test (potentially having some images from the original training set used for validation and vice versa). Is this the case?

@jafermarq Well, I think it's fine to blend the data from the different splits (original train and test) together, because the feature distributions of train and test are identical, and the model we want to train has no prior knowledge of these data, right? So there is no data sample that has to be in the train or test split specifically.

Anyway, I agree with using the Merger option. That's elegant.

@KarhouTam
Contributor Author

Hi, @jafermarq. The change has been made. I've tested the algorithm with the cifar10_100_2 config again, and here is the curve.

[image: accuracy curve for cifar10_100_2]

@jafermarq
Contributor

Amazing job @KarhouTam! 💯

@jafermarq jafermarq changed the title feat(baselines) Add FedRep feat(baselines) Add FedRep Baseline Oct 3, 2024
@jafermarq jafermarq merged commit 1465048 into adap:main Oct 3, 2024
51 checks passed
@jafermarq jafermarq mentioned this pull request Oct 3, 2024
@KarhouTam KarhouTam deleted the baseline-fedrep branch November 27, 2024 15:40