"TPU-speed data pipelines" (tpu_speed_data_pipelines.ipynb) flower speedup's writing code fails #2428

Open
MrCsabaToth opened this issue Sep 6, 2023 · 0 comments


No ordinary student will ever have write access to the public flowers dataset's location.

The first exercise deals with the MNIST dataset; the second exercise section then tries to show how to speed up reads of the flowers dataset by creating TFRecords in a cloud storage bucket.

GCS_PATTERN = 'gs://flowers-public/*/*.jpg'
GCS_OUTPUT = 'gs://flowers-public/tfrecords-jpeg-192x192-2/flowers'  # prefix for output file names

This fails with a permission error complaining that the project's compute service account doesn't have object creation rights. First I went into IAM, added the role, and reran. It failed again, at which point I realized that GCS_OUTPUT points to a public location I don't control and will probably never have write privileges to.
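A minimal sketch of a pre-flight check that would flag this before any write is attempted. It is not part of the notebook: the helper names `bucket_of` and `writes_to_source_bucket` are hypothetical, and the check only compares bucket names, so it assumes the source bucket is the one the student cannot write to.

```python
from urllib.parse import urlparse

GCS_PATTERN = 'gs://flowers-public/*/*.jpg'
GCS_OUTPUT = 'gs://flowers-public/tfrecords-jpeg-192x192-2/flowers'


def bucket_of(gcs_path):
    """Return the bucket name of a gs:// path (hypothetical helper)."""
    parsed = urlparse(gcs_path)
    if parsed.scheme != 'gs' or not parsed.netloc:
        raise ValueError('not a gs:// path: {!r}'.format(gcs_path))
    return parsed.netloc


def writes_to_source_bucket(pattern, output):
    """True when the output prefix is in the same bucket as the input pattern."""
    return bucket_of(pattern) == bucket_of(output)


if writes_to_source_bucket(GCS_PATTERN, GCS_OUTPUT):
    # Assumption: the input bucket is public and read-only for students.
    print('GCS_OUTPUT points into the public source bucket; '
          'use a bucket you own instead.')
```

With the notebook's current values the warning is printed, because both paths resolve to the `flowers-public` bucket.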

So IMHO the notebook should have extra steps to:

  1. Have the student create a storage bucket; the usual convention is to name it after the Project ID.
  2. Use that project ID in the GCS_OUTPUT variable.
  3. Then the code can run.

# TODO: substitute project ID
PROJECT_ID = ...

GCS_PATTERN = 'gs://flowers-public/*/*.jpg'
GCS_OUTPUT = 'gs://{}/tfrecords-jpeg-192x192-2/flowers'.format(PROJECT_ID)  # prefix for output file names

Note that I have seen 10 different ways in notebooks to fill in PROJECT_ID automatically. I have no strong preference for which one is best, so I leave that to whoever fixes the issue. I can craft a PR if needed.
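As one possible option among those, here is a sketch that reads the project ID from an environment variable and falls back to an explicit error. The function names `resolve_project_id` and `output_prefix` are hypothetical, and it is an assumption that `GOOGLE_CLOUD_PROJECT` (or the older `GCLOUD_PROJECT`) is set in the environment where the notebook runs; if neither is set, the student still has to fill in the ID by hand.

```python
import os


def resolve_project_id(env=None):
    """Return the project ID from the environment (hypothetical helper).

    Assumption: the runtime exports GOOGLE_CLOUD_PROJECT or GCLOUD_PROJECT.
    """
    env = os.environ if env is None else env
    for name in ('GOOGLE_CLOUD_PROJECT', 'GCLOUD_PROJECT'):
        value = env.get(name, '').strip()
        if value:
            return value
    raise RuntimeError('No project ID found; set PROJECT_ID manually.')


def output_prefix(project_id):
    """Build the TFRecord output prefix in a bucket named after the project."""
    return 'gs://{}/tfrecords-jpeg-192x192-2/flowers'.format(project_id)


# Usage in the notebook would look like:
#   PROJECT_ID = resolve_project_id()
#   GCS_OUTPUT = output_prefix(PROJECT_ID)
```

Raising an error instead of silently defaulting keeps the failure next to its cause, rather than surfacing later as a permission error on write.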
