Skip to content

Latest commit

 

History

History
95 lines (82 loc) · 4.01 KB

README.md

File metadata and controls

95 lines (82 loc) · 4.01 KB

Siformer: Feature Isolated Transformer for Efficient Skeleton-based Sign Language Recognition

Installation

conda create -n siformer python==3.11
conda activate siformer

# Please install PyTorch according to your CUDA version.
pip install -r requirements.txt

Get Started

method

To train the model, just specify the hyperparameters and execute the following command:

python -m train
  --experiment_name [str; name of the experiment to name the output logs and plots]
  --epochs [int; number of epochs]
  --lr [float; learning rate]
  --num_classes [int; the number of classes to be recognised by the model]
  
  --attn_type [str; the attention mechanism used by the model]
  --num_enc_layers [int; determines the number of encoder layer]
  --num_dec_layers [int; determines the number of decoder layer]
  --FIM [boolean; determines whether feature-isolated mechanism will be applied]
  --PBEE_encoder [bool; determines whether patience-based encoder will be used for input-adaptive inference]
  --PBEE_decoder [bool; determines whether patience-based decoder will be used for input-adaptive inference]
  --patience [int; determines the patience for earlier exist]
  
  --training_set_path [str; the path to the CSV file with training set's skeletal data]
  --validation_set_path [str; the Path to the CSV file with validation set's skeletal data]
  --testing_set_path [str; the path to the CVS file with testing set's skeletal data]

If you leave the paths for either the validation or testing sets empty, the corresponding metrics will not be computed. Additionally, we offer a pre-defined parameter to automatically divide the validation set according to a desired split of the training set. You can find detailed descriptions of these and various other specific hyperparameters in the train.py file. Each of these hyperparameters has a default value that we have determined to work well in our experiments.

Here are some examples of usage:

python -m train --experiment_name LSA64 --training_set_path datasets/LSA64_60fps.csv --num_classes 64 --experimental_train_split 0.8 --validation_set split-from-train --validation_set_size 0.2 
python -m train --experiment_name WLASL100 --training_set_path datasets/WLASL100_train_25fps.csv --validation_set_path datasets/WLASL100_val_25fps.csv --validation_set from-file --num_classes 100

Dataset

We employed the WLASL100 and LSA64 datasets for our experiments. Their corresponding citations can be found below:

@inproceedings{li2020word,
    title={Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison},
    author={Li, Dongxu and Rodriguez, Cristian and Yu, Xin and Li, Hongdong},
    booktitle={The IEEE Winter Conference on Applications of Computer Vision},
    pages={1459--1469},
    year={2020}
}
@inproceedings{ronchetti2016lsa64,
    title={LSA64: an Argentinian sign language dataset},
    author={Ronchetti, Franco and Quiroga, Facundo and Estrebou, C{\'e}sar Armando and Lanzarini, Laura Cristina and Rosete, Alejandro},
    booktitle={XXII Congreso Argentino de Ciencias de la Computaci{\'o}n (CACIC 2016).},
    year={2016}
}

License

The code is released under the MIT license.

Citation

@inproceedings{10.1145/3581783.3611724,
  author = {Muxin Pu, Mei Kuan Lim, and Chun Yong Chong},
  title = {Siformer: Feature-isolated Transformer for Efficient Skeleton-based Sign Language Recognition},
  year = {2024},
  isbn = {979-8-4007-0686-8},
  publisher = {Association for Computing Machinery},
  address = {Melbourne, VIC, Australia},
  url = {https://doi.org/10.1145/3664647.3681578},
  doi = {10.1145/3664647.3681578},
  booktitle = {Proceedings of the 32st ACM International Conference on Multimedia},
  series = {MM '24}
}