EEG-Transformer

A ViT-based transformer applied to multi-channel time-series EEG data for motor imagery classification. This repo is part of the final project for COGS 189: Brain Computer Interfaces at the University of California, San Diego, Winter 2022. The code repository and the project are managed and developed by Colin Wang; several possible directions for improving the baseline model were proposed by Xing Hong, Luning Yang, Annie Fan, Yunyi Huang, and Zixin Ma.

The repository contains code that is highly experimental. Many arguments are hardcoded and the data is not carefully pre-processed. Use with caution. If you are developing a research project inspired by this repo, please send me an email: [email protected]

Colin's notes as of 10/10/2022: I'm quite surprised that this repo (which was originally a course project) is getting 20+ stars. To make sure it works well and benefits those who need it, I will soon check that everything is working and update the code if needed. Thank you all for your support! We really appreciate your feedback and opinions.

Introduction

This is a naive baseline model that explores the possibility of using a ViT-based transformer to infer 3-class motor imagery from multi-channel time-series EEG data recorded at 1000 Hz for 8 seconds (of which 4 seconds, i.e. 4,000 samples per channel, are used). The model converges on the training data with very high accuracy (around 98%) but suffers from overfitting. Our contributions are:

Demonstrating that it is feasible, in terms of computational resources, to apply a ViT to multi-channel EEG data: each epoch over 1,000 training samples took about 1 minute on a 1080Ti, and about 6,000 MB of VRAM was used during training.
Showing that the model can learn from the data with this architecture. A learnable CLS token is concatenated with the other 59 tokens (one per EEG channel) before encoding; the latent of the CLS token, passed through an MLP head, predicts the training set with almost 100% accuracy and the validation set with above 55% accuracy.
Finding that the biggest problem for this model is its tendency to overfit. This could be mitigated by several approaches, such as using more training data or applying techniques from recent research on fine-tuning transformers in low-data settings. The model could also be improved by pre-training, in which some channels or time intervals are masked and the model learns to reconstruct them, similar to MAE. This mechanism has not been implemented in this baseline model.
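
The CLS-token mechanism and the proposed channel masking can be sketched in PyTorch as below. This is a minimal illustration under stated assumptions, not the code in model.py: the class name `TinyEEGViT`, the helper `mask_random_channels`, the use of `nn.TransformerEncoderLayer` as a stand-in for the encoder block, and zero-filling masked channels (MAE itself drops masked tokens instead) are all choices made for the example. Small dimensions are used so it runs quickly; the setting described above would correspond to 4,000 samples per channel.

```python
import torch
import torch.nn as nn


class TinyEEGViT(nn.Module):
    """Hypothetical sketch: one token per EEG channel plus a learnable CLS token."""

    def __init__(self, n_samples=64, n_heads=8, n_classes=3, hidden=512):
        super().__init__()
        # Learnable CLS token, shared across the batch.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, n_samples))
        # Single pre-norm encoder block; the token dimension is the number of time samples.
        self.block = nn.TransformerEncoderLayer(
            d_model=n_samples, nhead=n_heads, dim_feedforward=4 * n_samples,
            dropout=0.1, activation="gelu", batch_first=True, norm_first=True,
        )
        # MLP head applied to the CLS latent only.
        self.head = nn.Sequential(
            nn.Linear(n_samples, hidden), nn.ReLU(), nn.Linear(hidden, n_classes)
        )

    def forward(self, x):
        # x: (batch, channels, samples)
        cls = self.cls_token.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, x], dim=1)  # (batch, channels + 1, samples)
        encoded = self.block(tokens)
        return self.head(encoded[:, 0])  # logits from the CLS position


def mask_random_channels(x, mask_ratio=0.5):
    """Zero out a random subset of channels per sample (assumed masking scheme)."""
    batch, channels, _ = x.shape
    n_masked = int(channels * mask_ratio)
    # Rank random noise per sample; channels with the lowest ranks are masked.
    noise = torch.rand(batch, channels, device=x.device)
    ranks = noise.argsort(dim=1).argsort(dim=1)
    mask = ranks < n_masked  # (batch, channels), True = masked
    return x.masked_fill(mask.unsqueeze(-1), 0.0), mask


# Usage with toy sizes: 2 trials, 59 channels, 64 samples per channel.
eeg = torch.randn(2, 59, 64)
model = TinyEEGViT(n_samples=64, n_heads=8)
logits = model(eeg)
print(logits.shape)  # torch.Size([2, 3])
masked_eeg, mask = mask_random_channels(eeg, mask_ratio=0.5)
print(mask.sum(dim=1))  # number of masked channels per trial
```

For pre-training, a reconstruction head would be trained to predict the original values of the masked channels from `masked_eeg`; that part is left out here because the README does not specify it.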

Link to the project presentation

Model High-Level Architecture

Model Low-Level Architecture

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
         LayerNorm-1             [-1, 60, 4000]           8,000
            Linear-2            [-1, 60, 12000]      48,000,000
           Dropout-3            [-1, 8, 60, 60]               0
            Linear-4             [-1, 60, 4000]      16,004,000
           Dropout-5             [-1, 60, 4000]               0
         Attention-6             [-1, 60, 4000]               0
          Identity-7             [-1, 60, 4000]               0
         LayerNorm-8             [-1, 60, 4000]           8,000
            Linear-9            [-1, 60, 16000]      64,016,000
             GELU-10            [-1, 60, 16000]               0
          Dropout-11            [-1, 60, 16000]               0
           Linear-12             [-1, 60, 4000]      64,004,000
          Dropout-13             [-1, 60, 4000]               0
              Mlp-14             [-1, 60, 4000]               0
         Identity-15             [-1, 60, 4000]               0
            Block-16             [-1, 60, 4000]               0
           Linear-17                  [-1, 512]       2,048,512
             ReLU-18                  [-1, 512]               0
           Linear-19                    [-1, 3]           1,539
================================================================
Total params: 194,090,051
Trainable params: 194,090,051
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.90
Forward/backward pass size (MB): 47.83
Params size (MB): 740.39
Estimated Total Size (MB): 789.13
----------------------------------------------------------------
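
The parameter counts in the summary can be re-derived by hand, which is a quick check on where the roughly 194M parameters come from. The sketch below assumes an embedding dimension of 4,000 (one token per 4-second channel), an MLP expansion ratio of 4, and that Linear-2 is a bias-free query/key/value projection (its count is exactly 4,000 × 12,000). The helper `linear_params` and the layer labels are illustrative and not taken from the repository.

```python
# Re-derive the summary's parameter counts (illustrative names, float32 assumed).
dim, mlp_ratio, hidden, n_classes = 4000, 4, 512, 3


def linear_params(n_in, n_out, bias=True):
    """Weights plus optional bias of a fully connected layer."""
    return n_in * n_out + (n_out if bias else 0)


counts = {
    "LayerNorm-1": 2 * dim,  # scale and shift
    "Linear-2 (assumed qkv, no bias)": linear_params(dim, 3 * dim, bias=False),
    "Linear-4": linear_params(dim, dim),
    "LayerNorm-8": 2 * dim,
    "Linear-9": linear_params(dim, mlp_ratio * dim),
    "Linear-12": linear_params(mlp_ratio * dim, dim),
    "Linear-17": linear_params(dim, hidden),
    "Linear-19": linear_params(hidden, n_classes),
}

total = sum(counts.values())
params_mb = total * 4 / 1024**2      # 4 bytes per float32 parameter
input_mb = 59 * dim * 4 / 1024**2    # 59 channels x 4,000 samples, before the CLS token

print(total)                # 194090051
print(round(params_mb, 2))  # 740.39
print(round(input_mb, 2))   # 0.9
```

Almost all parameters sit in the four large linear layers of the single encoder block, because the token dimension equals the number of time samples per channel.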
