A minimal PyTorch implementation for training your own small LLM from scratch. Designed for educational purposes and simplicity, featuring efficient training, flash attention, and modern sampling techniques.
- Minimal Codebase: Pure PyTorch implementation with no abstraction overhead
- Modern Architecture: GPT model with:
  - Flash Attention (when available)
  - RMSNorm and SwiGLU
  - Efficient top-k/top-p/min-p sampling (see the sketch after this list)
  - Optional rotary position embeddings (RoPE)
- Training Features:
  - Mixed precision (bfloat16/float16)
  - Gradient accumulation
  - Learning rate decay with warmup
  - Weight decay & gradient clipping
- Dataset Support: Built-in TinyStories dataset processing
- Custom Tokenizer: SentencePiece tokenizer training integration
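The sampling options listed above are the usual logits filters applied before `torch.multinomial`. Below is a minimal, generic sketch of how top-k, top-p, and min-p filtering can be combined; it is illustrative only, and the actual implementation in sample.py may differ:

```python
import torch
import torch.nn.functional as F

def filter_logits(logits, top_k=None, top_p=None, min_p=None):
    """Mask unlikely tokens in a 1-D logits tensor before sampling.

    Generic illustration; sample.py's implementation may differ.
    """
    logits = logits.clone()

    if top_k is not None:
        # Keep only the k largest logits.
        kth = torch.topk(logits, min(top_k, logits.size(-1))).values[-1]
        logits[logits < kth] = float("-inf")

    if min_p is not None:
        # Drop tokens whose probability is below min_p times the top probability.
        probs = F.softmax(logits, dim=-1)
        logits[probs < min_p * probs.max()] = float("-inf")

    if top_p is not None:
        # Nucleus sampling: keep the smallest prefix of sorted tokens whose
        # cumulative probability reaches top_p.
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        sorted_probs = F.softmax(sorted_logits, dim=-1)
        mass_before = sorted_probs.cumsum(dim=-1) - sorted_probs
        logits[sorted_idx[mass_before > top_p]] = float("-inf")

    return logits

# Typical use inside a generation loop:
# probs = F.softmax(filter_logits(logits / temperature, top_k=50, min_p=0.05), dim=-1)
# next_id = torch.multinomial(probs, num_samples=1)
```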
pip install -r requirements.txt
Requirements:
- Python 3.8+
- PyTorch 2.0+ with CUDA
- Modern GPU (recommended)
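A quick way to confirm the environment meets these requirements, using only standard PyTorch calls:

```python
import torch
from torch.nn import functional as F

print("PyTorch:", torch.__version__)                      # should be >= 2.0
print("CUDA available:", torch.cuda.is_available())
print("Flash/SDPA attention available:", hasattr(F, "scaled_dot_product_attention"))
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("bfloat16 supported:", torch.cuda.is_bf16_supported())
```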
- Prepare Dataset
python preprocess.py prepare-dataset --vocab-size 4096
- Start Training
python train.py
- Generate Text
python sample.py \
--prompt "Once upon a time" \
--num_samples 3 \
--temperature 0.7 \
--max_new_tokens 500
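Under the hood, the prepare-dataset step trains the 4096-token SentencePiece vocabulary on the TinyStories text. Roughly, that boils down to a call like the one below; the input path here is an illustrative guess, see preprocess.py for the actual options:

```python
import sentencepiece as spm

# Illustrative only: the input path is an assumption, not preprocess.py's exact setting.
spm.SentencePieceTrainer.train(
    input="data/tinystories.txt",   # raw training text, one document per line
    model_prefix="data/tok4096",    # produces tok4096.model and tok4096.vocab
    vocab_size=4096,
)
```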
- Download Assets
# Download tokenizer
wget https://huggingface.co/OmAlve/TinyStories-SmolGPT/resolve/main/tok4096.model -P data/
# Download pre-trained checkpoint
wget https://huggingface.co/OmAlve/TinyStories-SmolGPT/resolve/main/ckpt.pt -P out/
- Run Inference
python sample.py \
--prompt "Once upon a time" \
--tokenizer_path data/tok4096.model \
--ckpt_path out/ckpt.pt \
--num_samples 3 \
--max_new_tokens 200 \
--temperature 0.7
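After downloading, both assets can be sanity-checked before running sample.py. The sketch below assumes the standard sentencepiece Python API; the checkpoint's contents depend on how train.py saves it:

```python
import sentencepiece as spm
import torch

# Tokenizer round-trip.
sp = spm.SentencePieceProcessor(model_file="data/tok4096.model")
ids = sp.encode("Once upon a time", out_type=int)
print(len(ids), "tokens ->", sp.decode(ids))     # should print the original prompt
print("vocab size:", sp.get_piece_size())        # expect 4096

# Peek at the checkpoint; the keys are whatever train.py chose to store.
ckpt = torch.load("out/ckpt.pt", map_location="cpu", weights_only=False)
if isinstance(ckpt, dict):
    print("checkpoint keys:", list(ckpt.keys()))
```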
The provided checkpoint was trained on the TinyStories dataset.
Architecture:
- 4096-token vocabulary
- 8 heads
- 8-layer transformer
- 512 embedding dimension
- Trained on ~4 billion tokens for around 18.5 hours
- Validation loss: 1.0491
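A back-of-the-envelope check on those numbers (approximate; the exact parameter count depends on the SwiGLU hidden width and whether embeddings are tied):

```python
# Rough estimate only; assumes a standard GPT block with SwiGLU and tied embeddings.
vocab, d, layers = 4096, 512, 8
hidden = int(8 * d / 3)                 # common SwiGLU sizing (~2/3 of 4*d)
attn   = 4 * d * d                      # Wq, Wk, Wv, Wo
mlp    = 3 * d * hidden                 # gate, up, and down projections
print(f"~{(layers * (attn + mlp) + vocab * d) / 1e6:.0f}M parameters")   # ~27M

tokens, hours = 4e9, 18.5
print(f"~{tokens / (hours * 3600) / 1e3:.0f}k tokens/sec")               # ~60k tokens/sec
```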
Prompt: One day, Lily met a unicorn
Output:
One day, Lily met a unicorn in the park. The unicorn had shiny fur and a pretty dress. Lily asked the unicorn, "Where did you come from?"
The unicorn replied, "I came from the forest and wanted to meet you, Lily. I am here to make sure you are safe."
Lily and the unicorn played together and had lots of fun. But then, the unicorn started to act funny. Lily didn't know what was happening. Suddenly, the unicorn turned into a little girl and said, "I was under a spell, and your kindness broke it. Thank you for breaking it."
Lily was surprised and happy. She gave the unicorn a big hug and they played together all day. From that day on, the unicorn would always come to play with Lily - her new friend.
Prompt: The dragon flew over the mountains
Output:
The dragon flew over the mountains, over the rivers and over the rivers. He was very brave and strong.
One day, the dragon saw something very strange. It was a big, shiny rock. He wanted to know what it was, so he flew down and touched it with his nose. Suddenly, the rock began to move!
The dragon was so surprised! He had never seen anything like it before. He looked around and saw that it was a little mouse! The mouse was very scared and started to run away.
The dragon was very sad. He wanted to help the mouse, so he decided to try and make friends. He flew around and around until he found the mouse. He said hello to the mouse and asked if he wanted to be friends.
The mouse was so happy! He said yes, and they played together all day long. From then on, the dragon and the mouse were the best of friends. They had lots of fun together and the dragon was never lonely again.
Key parameters (modify in config.py):
Model Architecture:
GPTConfig(
block_size=512, # Context length
n_layer=8, # Number of transformer layers
n_head=8, # Number of attention heads
n_embed=512, # Embedding dimension
dropout=0.2, # Dropout rate
bias=False, # Use bias in layers
use_rotary=False, # Toggle rotary embeddings
)
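A hedged sketch of how such a config is typically consumed; the class name GPT and its constructor here are assumptions about model.py, not a documented API:

```python
from config import GPTConfig
from model import GPT   # assumption: model.py exposes a GPT module built from a GPTConfig

config = GPTConfig(block_size=512, n_layer=8, n_head=8, n_embed=512,
                   dropout=0.2, bias=False, use_rotary=False)
model = GPT(config)
print(f"{sum(p.numel() for p in model.parameters()) / 1e6:.1f}M parameters")
```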
Training:
TrainingConfig(
batch_size=64,
max_iters=30000,
learning_rate=6e-4,
weight_decay=0.1,
grad_clip=1.0,
warmup_iters=1000
)
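The warmup-plus-decay schedule uses these values. Below is a minimal sketch of a linear-warmup, cosine-decay scheduler; the cosine shape and min_lr are illustrative assumptions, and train.py may decay differently:

```python
import math

def get_lr(it, learning_rate=6e-4, warmup_iters=1000, max_iters=30000, min_lr=6e-5):
    """Linear warmup to learning_rate, then cosine decay down to min_lr (illustrative)."""
    if it < warmup_iters:
        return learning_rate * (it + 1) / warmup_iters
    if it >= max_iters:
        return min_lr
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))   # goes 1 -> 0 over the decay window
    return min_lr + coeff * (learning_rate - min_lr)

# Applied each step in the training loop:
# for group in optimizer.param_groups:
#     group["lr"] = get_lr(iter_num)
```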
om-alve-smolgpt/
├── config.py - Model & training configuration
├── dataset.py - Data loading & preprocessing
├── model.py - GPT model implementation
├── preprocess.py - Dataset preparation scripts
├── sample.py - Text generation script
├── tokenizer.py - Tokenizer wrapper
└── train.py - Main training loop
Contributions welcome! Please open an issue or PR for:
- Bug fixes
- Performance improvements
- New features
- GPU: NVIDIA L4 Tensor Core (Optimized for AI workloads)
- vCPUs: 16
- RAM: 64 GB
- VRAM: 24 GB
Note: This implementation is inspired by modern LLM training practices and adapted for educational purposes. For production use, consider scaling up the model size and the dataset.