
Added extra option to control HF Token #898

Merged
16 commits merged into microsoft:main on Sep 30, 2024

Conversation

nmoeller
Contributor

fixes #830.

I've added a new option, hf_token, in the extra options; the user can pass a token, True, or False.
I also updated the deprecated use_auth_token argument to token.

@nmoeller nmoeller marked this pull request as ready for review September 18, 2024 06:47
@nmoeller nmoeller marked this pull request as draft September 18, 2024 06:48
@nmoeller nmoeller marked this pull request as ready for review September 18, 2024 06:51
@kunal-vaishnavi
Contributor

Thank you for your contribution!

There are many ways that users can authenticate with Hugging Face.

  1. Always logged in via Hugging Face CLI (comes installed with transformers): huggingface-cli login
  2. Global environment variable: export HF_TOKEN=<token>
  3. Per-run environment variable: HF_TOKEN=<token> python3 -m onnxruntime_genai.models.builder <...>
  4. Token in code: from_pretrained(token=<token>)
  5. Boolean in code: from_pretrained(token=True)
  6. Default in Hugging Face's code: from_pretrained(token=None)

There are also several scenarios for when the token is needed or not needed.

|  | Trust Remote Code = True | Trust Remote Code = False |
| --- | --- | --- |
| Use Auth Token = True | Gated model, modeling file not implemented in installed transformers package | Gated model |
| Use Auth Token = False | Modeling file not implemented in installed transformers package | Local model |

Given this information, I think we should change the behavior of how the Hugging Face token is obtained to the following.

```python
import os


def parse_hf_token(hf_token):
    """
    Returns the authentication token needed for Hugging Face.
    Token is obtained either from the user or the environment.
    """
    if hf_token.lower() in {"false", "0"}:
        # Default is `None` for disabling authentication
        return None

    if hf_token.lower() in {"true", "1"}:
        # Return the token stored in the environment
        return os.environ["HF_TOKEN"]

    # Return the user-provided token as a string
    return hf_token


# For the `Model` class
self.hf_token = parse_hf_token(extra_options.get("hf_token", "true"))

# For `create_model`
hf_token = parse_hf_token(extra_options.get("hf_token", "true"))
```

Then, the two options for hf_token would be false or the user-provided token. While not officially supported, the "0" and "1" flags are also checked because some users don't follow the documentation.

The hf_token flag would allow us to disable authentication with Hugging Face or provide a custom authentication token that differs from the one stored in the user's environment. When not using the flag, the default behavior would be to use the authentication token stored by huggingface-cli login in the HF_TOKEN environment variable.
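
For example, assuming the sketch above is adopted, invocations might look like this (mirroring the README's builder command; all paths and flags are placeholders):

```
# Disable authentication with Hugging Face:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options hf_token=false

# Use a custom token that differs from the one stored in the environment:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options hf_token=<token>
```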

What do you think?

@nmoeller
Contributor Author

nmoeller commented Sep 25, 2024

You are absolutely right, and your feedback is much appreciated! I implemented the changes you proposed.
Sorry I didn't have the idea on my own 😄

@kunal-vaishnavi, are we sure that huggingface-cli login sets the HF_TOKEN variable? I think the token is stored locally in a file, right?

I think we should return True from the method instead of the environment variable. When using from_pretrained(token=True), it will check for a local token set by huggingface-cli login, and if no token is set, it will use HF_TOKEN automatically, right?

```python
def parse_hf_token(hf_token):
    """
    Returns the authentication token needed for Hugging Face.
    Token is obtained either from the user or the environment.
    """
    if hf_token.lower() in {"false", "0"}:
        # Default is `None` for disabling authentication
        return None

    if hf_token.lower() in {"true", "1"}:
        # Return `True` so `from_pretrained` picks up the locally stored token
        return True

    # Return the user-provided token as a string
    return hf_token
```
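
As a quick illustration (a hypothetical sanity check, not part of the PR; the token string is a placeholder):

```python
assert parse_hf_token("false") is None       # authentication disabled
assert parse_hf_token("TRUE") is True        # case-insensitive; `from_pretrained(token=True)` finds the stored token
assert parse_hf_token("hf_example") == "hf_example"  # explicit token passed through unchanged
```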

@kunal-vaishnavi
Contributor

> You are absolutely right, and your feedback is much appreciated! I implemented the changes you proposed.
> Sorry I didn't have the idea on my own 😄

No worries and happy to review!

> @kunal-vaishnavi, are we sure that huggingface-cli login sets the HF_TOKEN variable? I think the token is stored locally in a file, right?
>
> I think we should return True from the method instead of the environment variable. When using from_pretrained(token=True), it will check for a local token set by huggingface-cli login, and if no token is set, it will use HF_TOKEN automatically, right?

Upon further testing, it appears that the environment variables are not always set. I agree that returning True instead makes sense since the from_pretrained(token=True) command should pick up the token saved locally by huggingface-cli login.

According to Hugging Face's documentation, the HF_TOKEN environment variable can override the token saved at $HF_HOME/token, which is where huggingface-cli login saves the token. If a user wants to use a different token and doesn't want to pass it via --extra_options hf_token=<token>, then the user can either log in with a different token via huggingface-cli login or use a per-run environment variable (e.g. HF_TOKEN=<token> python3 -m onnxruntime_genai.models.builder <...>).
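
Put differently, here is a simplified sketch of the resolution order (the token file path is the documented default when HF_HOME is unset):

```
# Where `huggingface-cli login` stores the token:
cat "${HF_HOME:-$HOME/.cache/huggingface}/token"

# HF_TOKEN overrides the stored token for a single run:
HF_TOKEN=<token> python3 -m onnxruntime_genai.models.builder <...>
```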

@kunal-vaishnavi
Contributor

In the README, can you add a new section in "Extra Options" after "Use 8 Bits Quantization in QMoE" to show how to use this new extra option?

#### Use 8 Bits Quantization in QMoE
This scenario is for when you want to use 8-bit quantization for MoE layers. Default is using 4-bit quantization.
```
# From wheel:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options use_8bits_moe=1
# From source:
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options use_8bits_moe=1
```
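
For reference, a possible draft of the new section (illustrative wording, not the final README text) could be:

#### Use Hugging Face Authentication Token
This scenario is for when you want to disable authentication with Hugging Face or provide a custom token. Default is using the token stored by `huggingface-cli login`.
```
# From wheel:
python3 -m onnxruntime_genai.models.builder -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options hf_token=false
# From source:
python3 builder.py -i path_to_local_folder_on_disk -o path_to_output_folder -p precision -e execution_provider -c cache_dir_to_store_temp_files --extra_options hf_token=false
```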

Once added, can you also add a reference to the new section in the table of contents after "Use 8 Bits Quantization in QMoE"?

- [Extra Options](#extra-options)
  - [Config Only](#config-only)
  - [Exclude Embedding Layer](#exclude-embedding-layer)
  - [Exclude Language Modeling Head](#exclude-language-modeling-head)
  - [Enable Cuda Graph](#enable-cuda-graph)
  - [Use 8 Bits Quantization in QMoE](#use-8-bits-quantization-in-qmoe)

@kunal-vaishnavi
Contributor

This PR looks good to me now! It appears you need to resolve two conflicting files as those files were recently updated by another PR. Once those conflicts are resolved, I can approve and merge your changes!

@nmoeller
Contributor Author

I merged the changes. I wasn't sure about your desired order for the chapters in the README.md.
If I should change the order, let me know 👍

kunal-vaishnavi merged commit 4332f98 into microsoft:main on Sep 30, 2024
13 checks passed
Successfully merging this pull request may close this issue: huggingface-cli login error when building quantized model (#830)