Hello, I am a student in Korea working on a 6-week project.
I want to fine-tune a CodeLlama model using your paper's methodology for the Code Repair task. How would you estimate the GPU resources and time required for this project?
I also have two new ideas:
Can a static code analyzer's output improve the dataset?
Can an RLHF-based approach using DPO help the model generate better code?
Thank you for your time and guidance.
Best regards,
Won
How would you estimate the GPU resources and time required for this project?
If you go with the 7B model and also use LoRA like we did for OctoCoder, then I think a single A100 with 80GB, or even 40GB, for a few hours may easily suffice. Even for the 13B model that may be enough, though you may need a few memory-reduction techniques such as gradient checkpointing. You may even be able to fine-tune the 34B model on a single GPU using something like QLoRA.
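For reference, here is a minimal sketch of what that LoRA setup could look like with transformers + peft. This is not our exact OctoCoder recipe: the prompt format, LoRA hyperparameters, and training arguments below are illustrative assumptions.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.gradient_checkpointing_enable()  # trade compute for memory

# Attach low-rank adapters; only these small matrices are trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,  # assumed hyperparameters
    target_modules=["q_proj", "v_proj"],     # typical for Llama-style models
    task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # usually well under 1% of all weights

# Config and field names follow the bigcode/commitpackft dataset card.
ds = load_dataset("bigcode/commitpackft", "python", split="train")

def to_features(ex):
    # One possible repair-style prompt built from CommitPackFT fields.
    text = (f"Buggy code:\n{ex['old_contents']}\n"
            f"Instruction: {ex['message']}\n"
            f"Fixed code:\n{ex['new_contents']}{tokenizer.eos_token}")
    return tokenizer(text, truncation=True, max_length=2048)

ds = ds.map(to_features, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    train_dataset=ds,
    args=TrainingArguments(
        output_dir="codellama-repair-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,  # effective batch size of 16
        learning_rate=2e-4,
        num_train_epochs=1,
        bf16=True,
        logging_steps=50),
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False))
trainer.train()
```

For the 34B model on a single GPU, the same skeleton should work with the base model loaded in 4-bit via bitsandbytes (QLoRA) before attaching the adapters.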
Can a static code analyzer's output improve the dataset?
Can an RLHF-based approach using DPO help the model generate better code?
Yes, I think so too, check out this work doing something similar: https://arxiv.org/abs/2307.14936
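For the first idea, here is a rough sketch of how analyzer output could be folded into the dataset, using pylint purely as an example analyzer. The filtering heuristic and the "analyzer hints" prompt augmentation are my assumptions, not something from the paper.

```python
import json
import os
import subprocess
import tempfile

def lint_messages(code: str) -> list[dict]:
    """Run pylint on a snippet and return its JSON diagnostics."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        result = subprocess.run(
            ["pylint", "--output-format=json", path],
            capture_output=True, text=True)
        return json.loads(result.stdout or "[]")
    finally:
        os.unlink(path)

def enrich_example(ex: dict) -> dict | None:
    """Keep a pair only if the fix reduces analyzer complaints,
    and fold the remaining 'before' diagnostics into the instruction."""
    before = lint_messages(ex["old_contents"])
    after = lint_messages(ex["new_contents"])
    if len(after) >= len(before):  # per the analyzer, the commit didn't help
        return None
    hints = "; ".join(m["message"] for m in before[:5])
    ex["message"] = f"{ex['message']} (analyzer hints: {hints})"
    return ex
```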
How best to incorporate RLHF / code feedback is still an open & interesting research question!
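If you do experiment with DPO, trl's DPOTrainer is a natural starting point. A hedged sketch follows: argument names vary across trl versions, and repair_prefs.jsonl is a hypothetical preference file you would build yourself, e.g. human fixes as "chosen" and model drafts that still fail the analyzer or tests as "rejected".

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "codellama/CodeLlama-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Hypothetical preference file: one JSON object per line with
# "prompt", "chosen", and "rejected" fields (trl's expected format).
prefs = load_dataset("json", data_files="repair_prefs.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="codellama-repair-dpo", beta=0.1),
    train_dataset=prefs,
    processing_class=tokenizer,
    # With a peft_config, trl uses the frozen base weights
    # (adapters disabled) as the implicit reference model.
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```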