[semester-upkeep] Enable YAML control of Kaggle's GPU setting #30

Closed
1 task done
jmuchovej opened this issue Dec 17, 2019 · 8 comments
Labels: 🎆 feature-request, 🔑 required (Tasks that **need** to be completed, ASAP.), 🚀 done (ship it!)

Comments

@jmuchovej
Member

jmuchovej commented Dec 17, 2019

Just a note, when referring to groups... groups := {intelligence, core, data-science, supplementary}

Feature Request, for autobot

Description

Not all Kaggle Kernels need GPUs. More importantly, Kaggle limits us to 2 active GPU Kernels, while we're allowed up to 10 CPU Kernels. Based on this, it makes sense to toggle GPUs from syllabus.yml. This also allows for toggling the meeting's info if there's a need to rapidly iterate on the Kaggle Kernel.

Needs

  • Follow the parameter settings and toggle GPU usage

Initial Comments

syllabus.yml specifies all the parameters needed for a given meeting's setup. Within syllabus.yml, one should find...

- required:
    # ... lots going on here
  optional:
    # ... lots going on here
    kaggle:
      datasets: []
      competitions: []
      kernels: []
      gpu: true  # <- pay attention to this

You'll want to make sure that this parameter is both parsed and later propagated to a meeting's kernel-metadata.json file – the template file is here:

{
    "id": "ucfaibot/{{ slug }}",
    "title": "{{ slug }}",
    "code_file": "{{ notebook }}.ipynb",
    "language": "python",
    "kernel_type": "notebook",
    "is_private": false,
    "enable_gpu": false,
    "enable_internet": true,
    "dataset_sources": {{ kaggle.datasets }},
    "competition_sources": {{ kaggle.competitions }},
    "kernel_sources": {{ kaggle.kernels }}
}
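
A minimal sketch of the "parse and propagate" step, under assumptions: the parsed syllabus.yml entry is a plain dict shaped like the snippet above, and `kaggle_context` is a hypothetical helper, not a function that exists in the bot. Values are passed through `json.dumps` so that Python's `True` becomes JSON's `true` once substituted into the template.

```python
import json


def kaggle_context(meeting: dict) -> dict:
    """Return JSON-encoded substitution values for kernel-metadata.json.

    `meeting` is assumed to be one parsed entry of syllabus.yml.
    """
    kaggle = (meeting.get("optional") or {}).get("kaggle") or {}
    return {
        # Falling back to CPU when `gpu` is absent is an assumption.
        "enable_gpu": json.dumps(bool(kaggle.get("gpu", False))),
        "datasets": json.dumps(kaggle.get("datasets") or []),
        "competitions": json.dumps(kaggle.get("competitions") or []),
        "kernels": json.dumps(kaggle.get("kernels") or []),
    }


entry = {"required": {}, "optional": {"kaggle": {"datasets": [], "gpu": True}}}
context = kaggle_context(entry)
print(context["enable_gpu"])  # true
```

Note that the YAML key is `gpu` while Kaggle's metadata key is `enable_gpu`, so some mapping between the two names is needed wherever the substitution happens.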
@jmuchovej jmuchovej added 📝 todo Items in still in ideation, discovery, or planning "mode." 🔑 required Tasks that **need** to be completed, ASAP. 🍜 nice to have Tasks that should be completed, but not necessarily ASAP. labels Dec 17, 2019
@jmuchovej jmuchovej added this to the Winter 2019 Upgrade milestone Dec 17, 2019
@bb912
Contributor

bb912 commented Dec 18, 2019

Would adding a GPU boolean parameter to meeting objects be a sound start to a solution?

and/or changing the parse_yaml and write_yaml functions to allow for this indication?

@bb912
Contributor

bb912 commented Dec 18, 2019

I believe all this requires is to add {{ kaggle.enable_gpu }} to the template.

{
    "id": "ucfaibot/{{ slug }}",
    "title": "{{ slug }}",
    "code_file": "{{ notebook }}.ipynb",
    "language": "python",
    "kernel_type": "notebook",
    "is_private": false,
    "enable_gpu": {{ kaggle.enable_gpu }},
    "enable_internet": true,
    "dataset_sources": {{ kaggle.datasets }},
    "competition_sources": {{ kaggle.competitions }},
    "kernel_sources": {{ kaggle.kernels }}
}
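
One thing worth checking with this template change, sketched here under assumptions: the rendered file has to be valid JSON, so a Python `True` must end up as `true`. The `render` function below is a toy stand-in for whatever templating engine the bot uses, the template is shortened, and the slug is made up.

```python
import json
import re

TEMPLATE = """{
    "id": "ucfaibot/{{ slug }}",
    "enable_gpu": {{ kaggle.enable_gpu }},
    "dataset_sources": {{ kaggle.datasets }}
}"""


def render(template: str, values: dict) -> str:
    """Replace each {{ name }} placeholder with values[name] (toy renderer)."""
    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lambda m: values[m.group(1)], template)


values = {
    "slug": "fa19-cnns",  # hypothetical slug
    "kaggle.enable_gpu": json.dumps(True),  # "true", not Python's "True"
    "kaggle.datasets": json.dumps(["owner/dataset"]),  # hypothetical dataset
}

# json.loads raises if the rendered text is not valid JSON.
metadata = json.loads(render(TEMPLATE, values))
print(metadata["enable_gpu"])  # True
```

Substituting `str(True)` instead of `json.dumps(True)` would produce `"enable_gpu": True`, which `json.loads` rejects.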

@bb912
Contributor

bb912 commented Dec 18, 2019

I'm not seeing in meetings.py line 166-167 ish

kernel_metadata_path = paths.repo_meeting_folder(meeting) / "kernel-metadata.json"
kernel_metadata = Template(open(kernel_metadata_path).read())

after we write to this metadata, (enable_gpu should be written with a changed template described above), where/when/how does the kernel-metadata.json actually help the new notebook get published using kaggle API?

@bb912
Contributor

bb912 commented Dec 18, 2019

I am also confused by the purpose of the write_yaml function in the meeting meta, where else is it called?

@bb912
Contributor

bb912 commented Dec 18, 2019

so, i believe we are going to be accessing the kernel metadata in the push_kernel function of autobot/lib/apis/kaggle.py. instead of doing a subprocess.call("kaggle k push", shell=True)

we do a

subprocess.call("kaggle k push -p templates/seed/meeting/kernel-metadata.json", shell=True)

if this is the case, we would be only pushing the kernel specified by this metadata (and pushing it WITH this metadata, so we can push it with enable_gpu as true).

What I don't fully understand yet: how was this kernel-metadata.json file getting pushed before, without this subprocess call edit?

@jmuchovej
Member Author

jmuchovej commented Dec 18, 2019

Would adding a GPU boolean parameter to meeting objects be a sound start to a solution?
...
I believe all this requires is to add {{ kaggle.enable_gpu }} to the template.

{
    "id": "ucfaibot/{{ slug }}",
    "title": "{{ slug }}",
    "code_file": "{{ notebook }}.ipynb",
    "language": "python",
    "kernel_type": "notebook",
    "is_private": false,
    "enable_gpu": {{ kaggle.enable_gpu }},
    "enable_internet": true,
    "dataset_sources": {{ kaggle.datasets }},
    "competition_sources": {{ kaggle.competitions }},
    "kernel_sources": {{ kaggle.kernels }}
}

looks good! (this is similar to the solution i was thinking of.) 😅 you'll also want to modify the syllabus.yml to make sure the parameter is present. i think it makes sense to set Kaggle GPUs to false by default.

@brandons209 (thoughts on Kaggle GPU setting to true/false by default?)


and/or changing the parse_yaml and write_yaml functions to allow for this indication?
...
I am also confused by the purpose of the write_yaml function in the meeting meta, where else is it called?

  • parse_yaml definitely needs to be changed, if memory serves.
  • write_yaml doesn't really get used – it was used in an older version of the bot, but hasn't been removed.

I'm not seeing in meetings.py line 166-167 ish

kernel_metadata_path = paths.repo_meeting_folder(meeting) / "kernel-metadata.json"
kernel_metadata = Template(open(kernel_metadata_path).read())

those lines are loading that template JSON. the actual substitutions are happening on line 173: https://github.com/ucfai/bot/blob/409902fc7fa352d0e619ceaf661dd837f91d160f/autobot/lib/utils/meetings.py#L173


after we write to this metadata, (enable_gpu should be written with a changed template described above), where/when/how does the kernel-metadata.json actually help the new notebook get published using kaggle API?

based on how the kaggle CLI works, kernel-metadata.json needs to be in the meeting's folder. an example: https://github.com/ucfai/core/tree/master/fa19/2019-10-16-cnns

i avoided the subprocess modification you proposed because putting kernel-metadata.json in the meeting's folder allows for two things:

  1. explicitly states the configuration of the Kaggle Kernel for the given meeting.
  2. allows for emergency edits/pushing without a need to use the bot, provided the person pushing has the correct key. (that's the decoded version of kaggle.json.gpg)
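
a rough sketch of what a push from a meeting's folder could look like, assuming the Kaggle CLI is installed and credentials are configured. `push_command` and `push_meeting_kernel` are hypothetical names, not the bot's actual functions; the CLI's `-p` option takes the folder containing kernel-metadata.json rather than the file itself.

```python
import subprocess
from pathlib import Path


def push_command(meeting_dir: Path) -> list:
    """Build the Kaggle CLI call that pushes the kernel in `meeting_dir`."""
    if not (meeting_dir / "kernel-metadata.json").is_file():
        raise FileNotFoundError(f"no kernel-metadata.json in {meeting_dir}")
    # `-p` names the folder holding kernel-metadata.json, not the file.
    return ["kaggle", "kernels", "push", "-p", str(meeting_dir)]


def push_meeting_kernel(meeting_dir: Path) -> None:
    """Run the push; requires the Kaggle CLI and valid credentials."""
    # An argument list avoids shell=True and any quoting issues in paths.
    subprocess.run(push_command(meeting_dir), check=True)
```

the same command works by hand from a terminal, which is what makes the emergency-push case possible without the bot.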

all told, looks like you're on the right track. i'll need to help you decode the JSON configuration file, DM on discord for that. There's also some playing around to be done with reacting to Kaggle (but i think this belongs in a separate issue (#39), since it's not required).

@jmuchovej jmuchovej removed the 🍜 nice to have Tasks that should be completed, but not necessarily ASAP. label Dec 18, 2019
@bb912 bb912 added the 🚧 in progress Moved from TODO-like state to actual development label Dec 18, 2019
@bb912
Contributor

bb912 commented Dec 20, 2019

I think I got this figured out, but unsure why you think changing parse_yaml is necessary. We are holding this data in the meeting object.optional.kaggle. It is only ever(?) obtained from the syllabus.yml file in line 90, and the syllabus.yml entry is used as the dictionary "meeting" parameter to make a meeting object in line 97 of ops.py.

@jmuchovej
Member Author

I think I got this figured out, but unsure why you think changing parse_yaml is necessary. ...

So, parse_yaml was a method I used to do some of the parameter enforcement that I believe now gets done in Meeting.__init__(..); so it's possible that we may not need this.
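
As a rough illustration of doing that enforcement at construction time: the `KaggleOptions` class below is hypothetical and is not the bot's `Meeting` class; it only shows the idea of defaulting `gpu` to false and rejecting non-boolean values.

```python
from dataclasses import dataclass, field


@dataclass
class KaggleOptions:
    """Hypothetical holder for a meeting's optional `kaggle` block."""

    datasets: list = field(default_factory=list)
    competitions: list = field(default_factory=list)
    kernels: list = field(default_factory=list)
    gpu: bool = False  # CPU unless the syllabus asks for a GPU

    def __post_init__(self):
        # Reject non-boolean values such as the quoted string "true".
        if not isinstance(self.gpu, bool):
            raise TypeError(f"kaggle.gpu must be true or false, got {self.gpu!r}")


options = KaggleOptions(**{"datasets": [], "gpu": True})
print(options.gpu)  # True
```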

We are holding this data in the meeting object.optional.kaggle. It is only ever(?) obtained from the syllabus.yml file in line 90, and the syllabus.yml entry is used as the dictionary "meeting" parameter to make a meeting object in line 97 of ops.py.

You're correct that all this data is stored in syllabus.yml and only needs to be accessed when updating meetings.

So, we could just migrate parse_yaml to "let's remove it" – but that's something that a code review can do, too. 😅

bb912 added a commit that referenced this issue Dec 23, 2019
issue #30 kaggleGPU .YML control. only change needed in template
@jmuchovej jmuchovej added 🚀 done ship it! and removed 🚧 in progress Moved from TODO-like state to actual development 📝 todo Items in still in ideation, discovery, or planning "mode." labels Jan 9, 2020

4 participants