About Dumb GPT:
This is a learning project created by me.
It is a pre-trained-only Large Language Model, meaning it is not smart: there is no fine-tuning involved in the training process, and no pre-trained transformers are used in creating it.
It is created to generate text that looks similar to its prompt (the input we give).
The current model was trained for a whopping 100k iterations, taking 18 hours on my RTX 3050 graphics card.
You can train your own model by getting the data from Open Web Text.
Steps to run this LLM (on a Linux environment):
- Create your venv. Use this command:

  ```
  python3 -m venv env_name
  ```
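  Then activate it so everything below installs into the venv (standard venv usage on Linux):

  ```
  source env_name/bin/activate
  ```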
- Make sure you have PyTorch in your venv (PyTorch installation link). I would suggest going with the latest version of CUDA.
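  For example, a CUDA build can be installed with the index-URL style command from the official PyTorch site (the CUDA version in the URL changes between releases, so copy the current command from there):

  ```
  pip3 install torch --index-url https://download.pytorch.org/whl/cu121
  ```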
- Clone this repo.
- Download the data from Open Web Text.
- Extract the data using the Data-Extract.ipynb file.
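  The notebook does this for you, but roughly the idea is to decompress the OpenWebText archives and concatenate them into a training corpus. A minimal sketch, assuming the dump is a folder of .xz files (the folder and output names here are placeholders, not the repo's actual paths):

  ```python
  import lzma
  import os

  src_dir = "openwebtext"        # folder with the downloaded .xz archives (placeholder)
  out_path = "output_train.txt"  # combined corpus file (placeholder)

  with open(out_path, "w", encoding="utf-8") as out:
      for name in sorted(os.listdir(src_dir)):
          if name.endswith(".xz"):
              # lzma.open in text mode streams the decompressed contents
              with lzma.open(os.path.join(src_dir, name), "rt", encoding="utf-8") as f:
                  out.write(f.read())
  ```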
- After the extraction is done, go to chatbot.py and make sure the paths are correct.
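  In particular, check the paths that point at your extracted data and the downloaded checkpoint. Hypothetical example only; the actual variable names in chatbot.py may differ:

  ```python
  vocab_path = "vocab.txt"     # vocabulary produced during extraction (hypothetical name)
  model_path = "model-01.pkl"  # the pre-trained model you downloaded (hypothetical name)
  ```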
- After that, open your terminal and type the magical words (obviously these are for my model 😁, the one that comes pre-trained with the repo):

  ```
  python3 chatbot.py -batch_size 48
  ```
Don't forget to download the GPT model from here.
This is because the current model was trained with a batch size of 48; you can adjust this to suit your graphics card. 😬
Also make sure to change the batch size whenever you train your own model using the training.py file.
And don't forget to change the name of the training file when you run your custom model.
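For example, a training run might look like this (assuming training.py accepts the same -batch_size flag as chatbot.py; pick a size your GPU can handle):

```
python3 training.py -batch_size 32
```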
I would like to thank Elliot Arledge; he has been my mentor and guide throughout this project (check out the links below).
Below are the papers that were useful in creating this GPT:
- Attention Is All You Need (the basic transformer; its core operation is sketched below)
The paper below is not used in this project, as it is about fine-tuning:
- QLoRA: Efficient Finetuning of Quantized LLMs
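For reference, the core operation from Attention Is All You Need, scaled dot-product attention, fits in a few lines. A minimal PyTorch sketch of the formula, not this repo's actual implementation:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Causal masking keeps each position from attending to future tokens
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Tiny usage example: batch of 1, sequence of 8 tokens, head size 16
q = k = v = torch.randn(1, 8, 16)
out = scaled_dot_product_attention(q, k, v)  # shape (1, 8, 16)
```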
Connect with Me: LinkedIn
Links
=> Elliot Arledge's YouTube channel
=> freeCodeCamp course on LLMs
Thank you for viewing