path in HF_ENDPOINT discarded in v0.28.0 #2806

Closed
anael-l opened this issue Jan 30, 2025 · 5 comments · Fixed by #2807
Labels
bug Something isn't working

Comments

@anael-l
Contributor

anael-l commented Jan 30, 2025

Describe the bug

Hi,
We are using Artifactory to proxy the download of models from HuggingFace.
We set the HF_ENDPOINT env var to https://<artifactory-host>/artifactory/api/huggingfaceml/huggingface-ml-external for the lib to connect directly to Artifactory.

With this config in version 0.27.1, the lib debug logs show that the correct endpoint is used for the download:

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Llama-3.1-8B-Instruct", revision="main"
)
Request dc69bb46-b7d7-40d4-9b7e-619916bb614a: HEAD https://<artifactory-host>/artifactory/api/huggingfaceml/huggingface-ml-external/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/model-00002-of-00004.safetensors (authenticated: False)
Request 4117cde9-71fa-423d-9056-3c8a92739623: HEAD https://<artifactory-host>/artifactory/api/huggingfaceml/huggingface-ml-external/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/model-00004-of-00004.safetensors (authenticated: False)
Request 4eff2085-7412-4081-89cc-0bc858088423: HEAD https://<artifactory-host>/artifactory/api/huggingfaceml/huggingface-ml-external/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/model-00001-of-00004.safetensors (authenticated: False)
Request 09ed572a-6ea4-4922-8385-be611dd7f046: HEAD https://<artifactory-host>/artifactory/api/huggingfaceml/huggingface-ml-external/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/original/consolidated.00.pth (authenticated: False)

However, since version 0.28.0, the required path /artifactory/api/huggingfaceml/huggingface-ml-external/ is removed from our request URL, which results in a 404:

Request 83c6aab3-961b-4d67-aa75-e48b5b492143: HEAD https://<artifactory-host>/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/model-00002-of-00004.safetensors (authenticated: False)
Request 5a096eec-300b-4cac-98c6-43b6802df61c: HEAD https://<artifactory-host>/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/model-00001-of-00004.safetensors (authenticated: False)
Request d2ebada7-233e-479f-83ba-6cf742a58458: HEAD https://<artifactory-host>/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/model-00003-of-00004.safetensors (authenticated: False)
[...]
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://<artifactory-host>/meta-llama/Llama-3.1-8B-Instruct/resolve/be9e14a5ab73b04d65b74f7dd64e44cc7f77d5cf/model-00001-of-00004.safetensors

I think this difference comes from this change: 438f2fb#diff-b9ea02465324089a58bcb914a78b6c50143dfa0aadf772cef27478581f1346bcR71
Instead of concatenating the model path to the HF_ENDPOINT var, it uses urljoin, which, as I tested, removes the path after the hostname:

from urllib.parse import urljoin

HF_ENDPOINT="https://artifactory-host/artifactory/api/huggingfaceml/huggingface-ml-external"
ENDPOINT = HF_ENDPOINT.rstrip("/")
HUGGINGFACE_CO_URL_TEMPLATE = urljoin(ENDPOINT, "/{repo_id}/resolve/{revision}/{filename}")

print(HUGGINGFACE_CO_URL_TEMPLATE)

Result: https://artifactory-host/{repo_id}/resolve/{revision}/{filename}
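The behaviour above follows from how urljoin resolves references: a second argument starting with "/" is treated as an absolute path and replaces the whole base path. The sketch below is only an illustration using the placeholder host from this issue. It compares that case with a relative reference and with plain concatenation; the variable names are made up for the example.

```python
from urllib.parse import urljoin

base = "https://artifactory-host/artifactory/api/huggingfaceml/huggingface-ml-external"
suffix = "/{repo_id}/resolve/{revision}/{filename}"

# A reference starting with "/" is resolved from the host root,
# so the path carried by the base URL is dropped.
joined_absolute = urljoin(base, suffix)

# Without the leading "/", urljoin still replaces the last segment
# of the base path, because the base does not end with "/".
joined_relative = urljoin(base, suffix.lstrip("/"))

# Only a base ending with "/" combined with a relative reference
# keeps the full base path.
joined_slash = urljoin(base + "/", suffix.lstrip("/"))

# Plain concatenation never touches the base path.
concatenated = base + suffix

for name, value in [
    ("absolute reference", joined_absolute),
    ("relative reference", joined_relative),
    ("trailing slash + relative", joined_slash),
    ("concatenation", concatenated),
]:
    print(f"{name}: {value}")
```

Only the last two variants keep /artifactory/api/huggingfaceml/huggingface-ml-external intact.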

Would it be possible to revert this change, or to make it work with HF_ENDPOINT values that contain a path?

Thank you in advance

Reproduction

Described above

Logs

Logs above

System info

Request 9e55649d-a4ca-4997-b5a6-f83642e9abd0: GET https://artifactory-host/artifactory/api/huggingfaceml/huggingface-ml-external/api/whoami-v2 (authenticated: False)

Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.28.0
- Platform: Windows-10-10.0.19045-SP0
- Python version: 3.11.9
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: C:\Users\<user>\.cache\huggingface\token
- Has saved token ?: True
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: 3.1.4
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: 2.2.2
- pydantic: N/A
- aiohttp: N/A
- ENDPOINT: https://artifactory-host/artifactory/api/huggingfaceml/huggingface-ml-external
- HF_HUB_CACHE: C:\Users\<user>\.cache\huggingface\hub
- HF_ASSETS_CACHE: C:\Users\<user>\.cache\huggingface\assets
- HF_TOKEN_PATH: C:\Users\<user>\.cache\huggingface\token
- HF_STORED_TOKENS_PATH: C:\Users\<user>\.cache\huggingface\stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 86400
- HF_HUB_DOWNLOAD_TIMEOUT: 86400
anael-l added the bug label on Jan 30, 2025
@Wauplin
Contributor

Wauplin commented Jan 30, 2025

Hi @anael-l, thanks for reporting. This is indeed a breaking change that should be fixed, and you've spotted the root cause. Would you mind opening a PR to replace the two urljoin calls in this file with f"{ENDPOINT}/{repo_id}/resolve/{revision}/{filename}"? No need for a .rstrip("/") since it's already done above in constants.py.

@anael-l
Contributor Author

anael-l commented Jan 30, 2025

Hello @Wauplin, thanks for the fast reply, I can do a PR.
However, when adding an f-string to the constants.py file, I get an error when running the tests:

HUGGINGFACE_CO_URL_TEMPLATE = f"{ENDPOINT}/{repo_id}/resolve/{revision}/{filename}"
E   NameError: name 'repo_id' is not defined

Indeed, the path is templated in:

url = HUGGINGFACE_CO_URL_TEMPLATE.format(

I think that's why string concatenation was used before urljoin:
HUGGINGFACE_CO_URL_TEMPLATE = ENDPOINT + "/{repo_id}/resolve/{revision}/{filename}"

Should I just replace the two urljoin with string concat again ?
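For illustration, here is a small sketch of why the f-string fails and two ways to keep the placeholders for a later .format() call. It assumes the placeholder endpoint from this issue and is not the library's actual code; TEMPLATE_CONCAT and TEMPLATE_FSTRING are names invented for the example.

```python
ENDPOINT = "https://artifactory-host/artifactory/api/huggingfaceml/huggingface-ml-external"

# Option 1: plain concatenation. The braces are ordinary characters here
# and are only interpreted later, when .format() is called.
TEMPLATE_CONCAT = ENDPOINT + "/{repo_id}/resolve/{revision}/{filename}"

# Option 2: an f-string with doubled braces. {ENDPOINT} is filled in
# immediately, while {{repo_id}} becomes the literal text {repo_id}.
TEMPLATE_FSTRING = f"{ENDPOINT}/{{repo_id}}/resolve/{{revision}}/{{filename}}"

# Both templates are the same string, so either can be formatted later.
url = TEMPLATE_CONCAT.format(
    repo_id="meta-llama/Llama-3.1-8B-Instruct",
    revision="main",
    filename="model-00001-of-00004.safetensors",
)
print(url)
```

A single-brace f-string evaluates {repo_id} at definition time, which is where the NameError comes from.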

@Wauplin
Contributor

Wauplin commented Jan 30, 2025

> Should I just replace the two urljoin with string concat again ?

Yes perfect! That was an oversight from me. I just approved the PR :)
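A regression check for this kind of oversight could look roughly like the sketch below. It is a guess at what such a check might cover, not the test from the actual PR. make_url_template is a hypothetical helper standing in for the constant definition, and the endpoints are the ones mentioned in this thread plus the default Hub URL.

```python
def make_url_template(endpoint: str) -> str:
    """Hypothetical helper: append the resolve path, keeping any endpoint path."""
    return endpoint.rstrip("/") + "/{repo_id}/resolve/{revision}/{filename}"


SUFFIX = "/{repo_id}/resolve/{revision}/{filename}"
ARTIFACTORY = "https://artifactory-host/artifactory/api/huggingfaceml/huggingface-ml-external"

# Map each endpoint value to the template it should produce.
CASES = {
    "https://huggingface.co": "https://huggingface.co" + SUFFIX,
    "https://huggingface.co/": "https://huggingface.co" + SUFFIX,
    ARTIFACTORY: ARTIFACTORY + SUFFIX,
    ARTIFACTORY + "/": ARTIFACTORY + SUFFIX,
}


def check_templates() -> list[str]:
    """Return the endpoints whose template does not match the expectation."""
    return [
        endpoint
        for endpoint, expected in CASES.items()
        if make_url_template(endpoint) != expected
    ]


failures = check_templates()
print("all endpoints preserved" if not failures else f"mismatches: {failures}")
```

Including an endpoint with a path in the cases is what would have caught the urljoin change.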

@Wauplin
Contributor

Wauplin commented Jan 30, 2025

@anael-l I just published a 0.28.1 release with your fix: https://github.com/huggingface/huggingface_hub/releases/tag/v0.28.1. Thanks again for your help!

@anael-l
Contributor Author

anael-l commented Jan 30, 2025

Thank you for being so responsive!
