GPT (Generative Pre-trained Transformer)

This repository contains a from-scratch PyTorch implementation of the GPT (Generative Pre-trained Transformer) model. GPT is a language generation model built on the Transformer architecture introduced by Vaswani et al. in the paper Attention Is All You Need.

Overview

GPT is a deep language model that is pre-trained on a large corpus of text with an unsupervised next-token-prediction objective and can then be fine-tuned on downstream tasks such as question answering, text completion, and translation. It is built on the Transformer architecture, which uses self-attention to capture dependencies between the tokens of a sequence.
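At the core of that self-attention mechanism is scaled dot-product attention with a causal mask, so each position can only attend to earlier positions. The following is a minimal single-head sketch in PyTorch; the class and parameter names (embed_dim, head_dim, block_size) are illustrative and are not taken from gpt.py.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttentionHead(nn.Module):
    """One attention head with a causal mask: token i attends only to tokens <= i."""
    def __init__(self, embed_dim, head_dim, block_size):
        super().__init__()
        self.key = nn.Linear(embed_dim, head_dim, bias=False)
        self.query = nn.Linear(embed_dim, head_dim, bias=False)
        self.value = nn.Linear(embed_dim, head_dim, bias=False)
        # lower-triangular mask over positions within the context window
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        k = self.key(x)                                        # (B, T, head_dim)
        q = self.query(x)                                      # (B, T, head_dim)
        # scaled dot-product attention scores
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5    # (B, T, T)
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = F.softmax(wei, dim=-1)
        v = self.value(x)                                      # (B, T, head_dim)
        return wei @ v                                         # (B, T, head_dim)
```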

I implemented the GPT model in gpt.py inside the gpt folder. The implementation includes all the necessary components: decoder-style Transformer blocks, positional encoding, masked (causal) self-attention, feed-forward neural networks, and the text generation pipeline; a sketch of how these pieces fit together follows below.
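As a rough sketch of how those components combine (the actual structure in gpt.py may differ in details such as pre- vs. post-norm, the activation function, or the feed-forward expansion factor), a single Transformer block wraps multi-head causal self-attention and a position-wise feed-forward network in residual connections, reusing the attention head and imports from the previous sketch:

```python
class TransformerBlock(nn.Module):
    """Pre-norm Transformer block: multi-head causal self-attention followed by a
    feed-forward MLP, each wrapped in a residual connection."""
    def __init__(self, embed_dim, num_heads, block_size):
        super().__init__()
        head_dim = embed_dim // num_heads
        self.heads = nn.ModuleList(
            [CausalSelfAttentionHead(embed_dim, head_dim, block_size) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(embed_dim, embed_dim)
        self.ffwd = nn.Sequential(
            nn.Linear(embed_dim, 4 * embed_dim),   # assumed 4x expansion, as in the original Transformer
            nn.ReLU(),
            nn.Linear(4 * embed_dim, embed_dim),
        )
        self.ln1 = nn.LayerNorm(embed_dim)
        self.ln2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        attn = torch.cat([h(self.ln1(x)) for h in self.heads], dim=-1)
        x = x + self.proj(attn)          # residual around attention
        x = x + self.ffwd(self.ln2(x))   # residual around feed-forward
        return x
```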

Architecture Diagram

GPT Architecture Diagram

The diagram above illustrates the high-level architecture of the GPT model. It consists of several stacked Transformer blocks, each comprising masked multi-head self-attention and a feed-forward neural network. The input text is tokenized, and positional encodings are added to preserve the order of the tokens. The model is trained to predict the next token of a sequence given the previous tokens, which lets it generate coherent and contextually relevant text one token at a time; a minimal sketch of this objective and the sampling loop is given below.
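As a minimal sketch of that next-token objective and of autoregressive sampling, assuming a model that maps a (batch, time) tensor of token ids to (batch, time, vocab_size) logits (the function names and the exact interface of the model in gpt.py may differ):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, targets):
    """Cross-entropy between the predicted distribution at every position
    and the actual next token at that position."""
    B, T, V = logits.shape
    return F.cross_entropy(logits.view(B * T, V), targets.view(B * T))

@torch.no_grad()
def generate(model, idx, max_new_tokens, block_size):
    """Autoregressive sampling: predict a distribution over the next token,
    sample from it, append the sample to the context, and repeat."""
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -block_size:]   # crop the context to the model's window
        logits = model(idx_cond)          # assumed to return (B, T, vocab_size) logits
        logits = logits[:, -1, :]         # keep only the prediction for the last position
        probs = F.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        idx = torch.cat([idx, next_token], dim=1)
    return idx
```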
