
Questions about general_dataset.json #11

Open
Taekyo-Lee opened this issue Aug 20, 2024 · 1 comment

Comments

@Taekyo-Lee

Hello authors,
I have some questions about your general_dataset.json.

  1. Why didn't you include models other than GPT-4 and GPT-3.5?
  2. What are the specific versions of GPT-4 and GPT-3.5?
  3. Why do some questions appear repeatedly? For instance, the first 20 lines are all the same question, "Who was the first person to climb Mount Everest?", repeated 10 times each for GPT-4 and GPT-3.5.
@aidarmyrzakhan
Collaborator

Hi @Taekyo-Lee, thanks for your interest in our work.

  1. Why didn't you include models other than GPT-4 and GPT-3.5?

This JSON file is prepared specifically for instruction fine-tuning pretrained LLMs. Responses from other models are available on GitHub. We include only GPT-4 and GPT-3.5 responses to ensure a higher-quality dataset: responses from smaller models often lack the depth and coherence needed for effective fine-tuning, which could compromise the dataset's overall quality. By focusing on these more capable models, we aim to provide more reliable data for downstream fine-tuning.

  2. What are the specific versions of GPT-4 and GPT-3.5?

We collected responses using gpt-4-1106-preview and gpt-3.5-turbo-1106.
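
For reference, here is a minimal sketch of how responses at these versions could be collected with the standard OpenAI chat completions API. The record keys (`question`, `model`, `answer`) are illustrative assumptions, not the confirmed schema of general_dataset.json:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

MODELS = ["gpt-4-1106-preview", "gpt-3.5-turbo-1106"]
N_SAMPLES = 10  # 10 responses per question per model (see the next answer)

def collect_responses(question: str) -> list[dict]:
    """Sample each model N_SAMPLES times at the default temperature."""
    records = []
    for model in MODELS:
        for _ in range(N_SAMPLES):
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": question}],
            )
            # Record layout here is an assumption for illustration only.
            records.append({
                "question": question,
                "model": model,
                "answer": resp.choices[0].message.content,
            })
    return records
```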

  3. Why do some questions appear repeatedly? For instance, the first 20 lines are all the same question, "Who was the first person to climb Mount Everest?", repeated 10 times each for GPT-4 and GPT-3.5.

As mentioned, this file is designed for instruction tuning. By generating 10 responses per question from both GPT-4 and GPT-3.5, we aim to increase the dataset's scale, richness, and variability, so that models fine-tuned on it can handle a wide range of possible inputs and scenarios.
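
As a quick sanity check on that layout, here is a hedged sketch that counts responses per (question, model) pair. It assumes the file is a JSON array and guesses the key names; adjust to the actual schema:

```python
import json
from collections import Counter

# NOTE: the key names "question" and "model" are assumptions about the
# schema, not confirmed by the repo -- adjust to the actual fields.
with open("general_dataset.json") as f:
    data = json.load(f)

counts = Counter((rec["question"], rec["model"]) for rec in data)

q = "Who was the first person to climb Mount Everest?"
for (question, model), n in counts.items():
    if question == q:
        print(model, n)  # expected: 10 per model for this question
```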
