raymond-van/gpt-tinystories

gpt-tinystories

This repo contains an implementation of GPT-2 (1M / 8M / 28M parameter variants) and a script to train the model from scratch. Training was done on the TinyStories dataset.

Despite having only a few million parameters, these models produce coherent, fluent English, whereas larger models with around 125M parameters, such as GPT-Neo or GPT-2, can be seen struggling to produce text this coherent.

These small LLMs can also serve as a testbed for experimenting with different neural network architectures and techniques.
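To relate the parameter counts mentioned above to model dimensions, here is a rough sketch that counts the weights of a standard GPT-2 style decoder. The function name `gpt2_param_count` and its arguments are illustrative and are not taken from this repo's code. Whether the 1M / 8M / 28M figures include the embedding tables is not stated here, so the sketch is only checked against the publicly known size of the original GPT-2 small.

```python
def gpt2_param_count(n_layer, d_model, vocab_size, n_positions, tie_embeddings=True):
    """Count parameters of a standard GPT-2 style decoder (illustrative helper)."""
    # Attention: fused QKV projection plus output projection, both with biases.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # MLP: expand to 4 * d_model, then project back, both with biases.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two LayerNorms per block, each with a weight and a bias vector.
    norms = 2 * (2 * d_model)
    per_layer = attn + mlp + norms

    # Token and learned position embeddings, plus the final LayerNorm.
    embeddings = vocab_size * d_model + n_positions * d_model
    final_norm = 2 * d_model

    total = n_layer * per_layer + embeddings + final_norm
    if not tie_embeddings:
        # A separate output head adds another vocab_size x d_model matrix.
        total += vocab_size * d_model
    return total


# Original GPT-2 small: 12 layers, width 768, 50257-token vocabulary, 1024 positions.
print(gpt2_param_count(12, 768, 50257, 1024))  # 124439808
```

With a large vocabulary, the embedding table can dominate the total for very small widths, which is one reason reported sizes for tiny models sometimes exclude it.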

Dependencies

  • PyTorch
  • Hugging Face transformers and datasets
  • Weights & Biases for experiment tracking

To use:

python train.py --cfg_param {1M/8M/28M}

TinyStories

TinyStories is a synthetic dataset of short stories generated by GPT-3.5 and GPT-4 using only words that a 4-year-old could understand. While current state-of-the-art LLMs (1B+ params) are able to capture the essence of the massive text corpora they are trained on, small LLMs struggle to keep up with such large datasets. Limiting the vocabulary demonstrates that tiny LLMs with only a few million parameters can still produce fluent English text with proper grammar and some reasoning capability.

Sample text from the TinyStories dataset

One day, a little girl named Lily found a needle in her room. She knew it was difficult to play with it because it was sharp. Lily wanted to share the needle with her mom, so she could sew a button on her shirt. Lily went to her mom and said, "Mom, I found this needle. Can you share it with me and sew my shirt?" Her mom smiled and said, "Yes, Lily, we can share the needle and fix your shirt." Together, they shared the needle and sewed the button on Lily's shirt. It was not difficult for them because they were sharing and helping each other. After they finished, Lily thanked her mom for sharing the needle and fixing her shirt. They both felt happy because they had shared and worked together.

Training

Each model was trained on two 3090 GPUs. Better performance could likely be achieved with more training and better hyperparameter optimization, such as finding an optimal learning rate and adding a learning rate scheduler.
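As one possible form of the learning rate scheduler mentioned above, the following is a minimal sketch of linear warmup followed by cosine decay. It is not what train.py does. The function name `lr_multiplier`, the step counts, and the `min_ratio` floor are assumptions chosen for illustration.

```python
import math


def lr_multiplier(step, warmup_steps, total_steps, min_ratio=0.1):
    """Return the factor to multiply the base learning rate by at `step`.

    Hypothetical schedule: linear warmup, then cosine decay to `min_ratio`.
    """
    if step < warmup_steps:
        # Ramp linearly from 1/warmup_steps up to 1.0.
        return (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_ratio + (1.0 - min_ratio) * cosine


# Print the multiplier at a few points of a 1000-step run with 100 warmup steps.
for step in (0, 50, 100, 550, 1000):
    print(step, round(lr_multiplier(step, warmup_steps=100, total_steps=1000), 4))
```

In PyTorch, a function like this could be wrapped with `functools.partial` and passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which multiplies the optimizer's base learning rate by the returned factor.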

8M Loss curves

Achieves ~1.62 loss after training for 3 epochs (approx. 9 hours).

For more detailed charts, see this run on Weights & Biases.

28M Loss curves

Achieves ~1.32 loss after training for 2 epochs (approx. 15 hours).

For more detailed charts, see this run on Weights & Biases.
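The loss values reported above can be easier to interpret as perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which is what PyTorch's cross-entropy loss returns by default. The README does not state this, nor whether the figures are training or validation loss.

```python
import math


def perplexity(mean_loss_nats):
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(mean_loss_nats)


# Approximate final losses quoted in this README.
for name, loss in [("8M", 1.62), ("28M", 1.32)]:
    print(f"{name}: loss {loss:.2f} -> perplexity {perplexity(loss):.2f}")
# 8M: loss 1.62 -> perplexity 5.05
# 28M: loss 1.32 -> perplexity 3.74
```

Under that assumption, the 28M model is on average about as uncertain as a uniform choice among roughly 3.7 tokens, compared with roughly 5 for the 8M model.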

Language modeling capabilities

Prompt: One day, a little girl named Lily found a needle in her room

See inference.ipynb for more information.

8M

One day, a little girl named Lily found a needle in her room. It was shiny and shiny. She showed it to her mom. "Look, mom! I found a needle!" she said. Her mom smiled and said, "Good job, Lily! You keep this needle safe from the icy ground." Lily put the needle in her toy box. It was her friend. They played with the needle all day long. They took turns to wrap it around the room. Soon, it was time for bed. Lily was sleepy. She put the needle on her bed and closed the winter box. "Good night, Lily. Now it will be warm so." "Goodnight, Jack. I love we coats until it is warm." Lily fell asleep under the icy bed. She felt happy and cozy. She knew the needle two good listening to her mom. When she woke up, they went to play on her useful once more. She loved to play with her friends. And they all lived happily ever after.

28M

One day, a little girl named Lily found a needle in her room. She wanted to make a special necklace with it. Lily took the needle out of the box and tried to make a pretty necklace. But it was not easy to make a special necklace. Lily's mom saw her trying to make the necklace. She smiled and said, "I can help you, Lily! Let squeezing a half of the needle together. It will make you look perfect." Lily was happy and on the other hand. Together, they poked and made a beautiful necklace. Lily wore it to the park to play with her friends. She was very good at making pretty things. Lily knew that the special necklace would make everyone as happy as it was still for her. And whenever Lily wore her colorful necklace, she remembered how she just used the needle to make it and felt proud. And she remembered that it was just as fun to make something new. And that's a moral value for her family. The moral of the story is that using special tools can

Author's trained model

One day, a little girl named Lily found a needle in her room. She was very happy and wanted to help her mommy. Lily asked her mommy, "Can I help you, mommy?" Her mommy said, "Of course, Lily. Let's go to the store and buy some candy." Lily was so happy and said, "Thank you, mommy! I love candy!" Her mommy smiled and said, "You're welcome, Lily. You're welcome." Lily was happy to have a new friend and a new friend. She was happy to have a new friend and a new friend.

About

Reproducing GPT on the TinyStories dataset
