Hello authors,
I have some questions to ask about your general_dataset.json.
Why didn't you include models other than GPT-4 and GPT-3.5?
What are the specific versions of GPT-4 and GPT-3.5 you used?
Why do some questions appear repeatedly? For instance, the first 20 lines are the same question, "Who was the first person to climb Mount Everest?", repeated 10 times each for GPT-4 and GPT-3.5.
Hi @Taekyo-Lee, thanks for your interest in our work.
Why didn't you include models other than GPT-4 and GPT-3.5?
This JSON file is prepared specifically for instruction fine-tuning pretrained LLMs. Responses from our other models are available on GitHub. We include only GPT-4 and GPT-3.5 responses to ensure a higher-quality dataset: responses from smaller models often lack the depth and coherence needed for effective fine-tuning, which could compromise the dataset's overall quality. By focusing on these more capable models, we aim to provide more reliable data for downstream fine-tuning.
What are the specific versions of GPT-4 and GPT-3.5 you used?
We collected responses using gpt-4-1106-preview and gpt-3.5-turbo-1106.
Why do some questions appear repeatedly? For instance, the first 20 lines are the same question, "Who was the first person to climb Mount Everest?", repeated 10 times each for GPT-4 and GPT-3.5.
As mentioned, this file is designed for instruction tuning. By generating 10 responses per question from both GPT-4 and GPT-3.5, we aim to increase the dataset's scale, richness, and variability, which helps fine-tuned models handle a wider range of possible inputs and scenarios.
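To see this layout concretely, here is a minimal sketch of verifying the 10-per-model repetition described above. The field names (`question`, `model`, `response`) are assumptions for illustration; the actual schema of general_dataset.json may differ.

```python
import json
from collections import Counter

# Build a small in-memory sample mimicking the described layout:
# one question, 10 responses each from two models (schema assumed).
records = [
    {"question": "Who was the first person to climb Mount Everest?",
     "model": model,
     "response": f"sample response {i}"}
    for model in ("gpt-4-1106-preview", "gpt-3.5-turbo-1106")
    for i in range(10)
]

# In practice you would load the file instead:
# records = json.load(open("general_dataset.json"))

# Count entries per (question, model) pair; each should be 10.
counts = Counter((r["question"], r["model"]) for r in records)
for (question, model), n in counts.items():
    print(f"{model}: {n} responses")
```

A check like this makes the repetition easy to confirm before using the file for fine-tuning.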