
Allow kagglehub.dataset_download to download entire dataset to a specified directory #214

Open
Montekkundan opened this issue Jan 22, 2025 · 1 comment


@Montekkundan

When using kagglehub.dataset_download, the path parameter appears to select a specific file inside the dataset rather than a destination directory on the user's machine. This is confusing and limits functionality, especially when a user wants to download the entire dataset directly to their current working directory or to a specified folder.

For example, the following code:

import kagglehub

path = kagglehub.dataset_download("preritbhagat/stress-non-stress-images", path="./data")

print("Path to dataset files:", path)

This results in a 404 error.

I then tried:

import kagglehub

# Download a single file from the latest version
path = kagglehub.dataset_download("preritbhagat/stress-non-stress-images", path="FINAL_TFEID/FINALTFEID_NONSTRESS/f01_dfh_hx.jpg")

print("Path to dataset files:", path)

This downloads the single image, as described in the function's documentation, where FINAL_TFEID/FINALTFEID_NONSTRESS/f01_dfh_hx.jpg is the path of the file within that dataset.

  • Is there currently a way to download an entire dataset to a specific directory (e.g., ./data or the current working directory) using kagglehub.dataset_download?
  • If not, would it be possible to enhance the function to include a parameter (e.g., destination) for specifying a custom download location for the entire dataset?
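
Until something like that exists, one possible workaround is to let kagglehub download into its cache and then copy the files to the folder you want. The sketch below is not a kagglehub feature: copy_dataset and download_to are hypothetical helper names, and it assumes that calling dataset_download without path returns the directory holding the whole dataset.

```python
import shutil
from pathlib import Path


def copy_dataset(cached_path, destination):
    """Copy everything under cached_path into destination and return destination."""
    src = Path(cached_path)
    dest = Path(destination)
    if not src.is_dir():
        raise NotADirectoryError(f"{src} is not a directory")
    # dirs_exist_ok=True (Python 3.8+) allows copying into an existing folder.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest


def download_to(handle, destination):
    """Hypothetical wrapper: download the whole dataset, then copy it to destination."""
    import kagglehub  # imported lazily so copy_dataset works without kagglehub installed

    # Assumption: with no `path` argument, the return value is the dataset directory.
    cached_path = kagglehub.dataset_download(handle)
    return copy_dataset(cached_path, destination)
```

Usage would look like download_to("preritbhagat/stress-non-stress-images", "./data"). The files stay in the kagglehub cache as well, so this uses the disk space twice.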
@andreapisa9

I agree with @Montekkundan that the naming choice is confusing. @Montekkundan, you can define the download destination by changing the KAGGLEHUB_CACHE environment variable:

export KAGGLEHUB_CACHE=/path/to/your/preferred/directory
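
The same thing can be done from inside a Python script or notebook. This is a minimal sketch: set_kagglehub_cache is a hypothetical helper name, and setting the variable before importing kagglehub is a precaution on my part rather than documented behaviour.

```python
import os
from pathlib import Path


def set_kagglehub_cache(directory):
    """Point KAGGLEHUB_CACHE at directory, creating it if needed; return the resolved path."""
    target = Path(directory).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)
    # Affects the current process and any child processes only.
    os.environ["KAGGLEHUB_CACHE"] = str(target)
    return target


# Example (requires kagglehub and network access, so it is left commented out):
# set_kagglehub_cache("./data")
# import kagglehub
# path = kagglehub.dataset_download("preritbhagat/stress-non-stress-images")
# print("Path to dataset files:", path)
```

Note that this changes the cache root, so the returned path may be a nested subdirectory of ./data rather than ./data itself. Print it to see where the files ended up.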
