Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use dataclasses for all objects returned by HfApi #1911

Closed
mariosasko opened this issue Dec 14, 2023 · 14 comments
Closed

Use dataclasses for all objects returned by HfApi #1911

mariosasko opened this issue Dec 14, 2023 · 14 comments
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@mariosasko
Copy link
Contributor

mariosasko commented Dec 14, 2023

Currently, we use TypedDict to represent some of the objects' attributes returned by the HfApi methods. To be consistent, we should only use dataclasses, as suggested in #1809 (comment):

I think we should open a separate PR to switch all of them (BlobLfsInfo, LastCommitInfo, BlobSecurityInfo, TransformersInfo, SafeTensorsInfo) to dataclasses, in a backward compatible way (and with a deprecation warning when dict-only method is used). This way we'll finally have a single type to represent data returned by the server (now that you've removed ReprMixin).

This work can be split in several PRs to make it easier to implement and review.

@mariosasko mariosasko added enhancement New feature or request good first issue Good for newcomers labels Dec 14, 2023
@NouamaneELGueddarii
Copy link
Contributor

Hello mario, i would like to do this task, is it possible ?

@Wauplin
Copy link
Contributor

Wauplin commented Jan 8, 2024

Hi @NouamaneELGueddarii! Thanks for proposing yourself! It is possible to take this task yes. Just to be sure we are aligned, the goal is to update a bunch of objects in hf_api.py from TypedDict to proper dataclasses. At first, typed dict were used only for static type checking. However now we want to convert everything to dataclasses that have more features. So for each object, you would need to:

  1. inherit from dict like in this one
  2. decorate with @dataclass like in this one
  3. update internal dict in __post_init__ method for backward compatibility:
    def __post_init__(self):
        # hack to make BlobLfsInfo backward compatible
        self.update(asdict(self))
  1. (optional) add from dataclasses import asdict at the top of the module, if not already present

What I would suggest you is to start working on just 1 or 2 classes and open a PR for them. This way we can iterate on it and answer your questions (if any). What do you think?

Thanks in advance!

EDIT: based on #1974 (review), I've updated the instructions above to fit better inside the current codebase

@Ahmedniz1
Copy link
Contributor

@Wauplin is it fine if I start working on it? I want to start making open source contribution to github.
Thanks

@Wauplin
Copy link
Contributor

Wauplin commented Jan 12, 2024

Hi @Ahmedniz1, yes it is! Thanks for proposing you. Please refer to #1911 (comment) on how to deal with it.
Since @NouamaneELGueddarii mentioned earlier they would like to work on it as well, could you share on which dataclass you would like to focus first before starting implementing something? A simple message on this thread is enough (just want to avoid duplicate work).

I want to start making open source contribution to github.

🚀 🚀 🚀

@Ahmedniz1
Copy link
Contributor

@Wauplin I'll start working at the BlobLfsInfo class first, will make a PR once I get it done. If it is approved I can replicate it for the remaining ones and make another PR.
Thanks

@Wauplin
Copy link
Contributor

Wauplin commented Jan 12, 2024

Sounds like a plan!

@NouamaneELGueddarii
Copy link
Contributor

@Wauplin i will be working on the following class "TransformersInfo". I will make a PR and wait for approval then proceed to work on the others.

@Wauplin
Copy link
Contributor

Wauplin commented Jan 15, 2024

Great! Thanks for confirming @NouamaneELGueddarii!

@Ahmedniz1
Copy link
Contributor

@Wauplin I've raised a PR request for the bloblfs class. I've updated it but haven't added any tests to it. Would like your opinion about whether it requires test cases or not.
PR #1974
Thanks

@Wauplin
Copy link
Contributor

Wauplin commented Jan 15, 2024

@Ahmedniz1, saw it and just reviewed it! Thanks!

@Wauplin
Copy link
Contributor

Wauplin commented Jan 17, 2024

@Ahmedniz1 @NouamaneELGueddarii I have updated the instructions on how to proceed with the issue. I'm sorry about this late change but I realized the previous recommendations were working correctly. I've commented on @Ahmedniz1's PR here (#1974 (review)) and updated the comment above (#1911 (comment)). Hopefully, this will make it easier to integrate new dataclasses while keeping backward compatibility.
Sorry about the change and thanks in advance for adapting!

Wauplin added a commit that referenced this issue Jan 17, 2024
* converted bloblfsinfo from typed_dict to dataclass

* updated bloblfs to dataclass

* adding post_init to bloblfs class

* Update src/huggingface_hub/hf_api.py

---------

Co-authored-by: ahmdayyy <ahmednizamani36.com>
Co-authored-by: Lucain <[email protected]>
@Ahmedniz1
Copy link
Contributor

@Wauplin I'm trying to update in the previous PR but it appears to be closed. I can push the updates for other classes(apart from transformerinfo) once the original PR is open again.
Thanks

@Wauplin
Copy link
Contributor

Wauplin commented Jan 18, 2024

I'm trying to update in the previous PR but it appears to be closed.

Yes this is expected! You have to checkout to main, pull the latest changes, create a new branch from main and start a separate PR from there. This way we have small PRs that can easily be reviewed. The work you've already done is now on the main branch.

@Wauplin
Copy link
Contributor

Wauplin commented Jan 19, 2024

Thanks to @NouamaneELGueddarii in #1993, TransformersInfo has also been completed which was the last one to update.

Thanks @Ahmedniz1 and @NouamaneELGueddarii for your contributions on this one! Issue can be closed 🎉

@Wauplin Wauplin closed this as completed Jan 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants