NatSGD Comm2LTL

NatSGD Dataset Human Communication to Linear Temporal Logic (Comm2LTL) Benchmarking: This repository contains the implementation and models for translating human communication into Linear Temporal Logic (LTL).

1. Prerequisites

We validated the code with Python 3.9 on a Linux machine. Newer Python versions should also work, provided there is sufficient GPU memory to load the models. Install the prerequisites with:

python3 -m pip install -r requirements.txt
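After installing, a quick environment check can confirm the two conditions mentioned above: the Python version and the available GPU memory. The following is a minimal sketch, not part of the repository. It assumes the models run on PyTorch (suggested by the .pth file produced later), and the helper names `python_is_supported` and `describe_gpu` are hypothetical.

```python
import importlib.util
import sys

# Version the README reports as validated; newer versions are expected to work.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def describe_gpu():
    """Return a short description of the first CUDA device, if any.

    Assumption: the project uses PyTorch. If it is not installed,
    this reports that instead of failing.
    """
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "no CUDA device visible"
    props = torch.cuda.get_device_properties(0)
    return f"{props.name}: {props.total_memory / 1024**3:.1f} GiB"


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("GPU:", describe_gpu())
```

How much memory counts as sufficient depends on the model and batch size, so the sketch only reports the total and leaves the judgment to you.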

2. Download Dataset files

Download dataset files from this Google Drive link into the data folder.

3. Pre-Training Gesture Encoder

Before fine-tuning the models, we need to pre-train the gesture autoencoder. To prepare the LSTM encoder model used for gesture preprocessing, take the following steps:

Run the lstmencoder.py script.
This will generate a .pth file that we will use later as the gesture motion encoding for the downstream tasks.
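For readers unfamiliar with this kind of pre-training, the sketch below shows what an LSTM autoencoder that ends by writing a .pth file can look like. It is an illustration under stated assumptions, not the contents of lstmencoder.py: the class name `GestureAutoencoder`, the function `train_and_save`, the hidden size, the optimizer settings, and the choice to save a `state_dict` are all hypothetical.

```python
import torch
from torch import nn


class GestureAutoencoder(nn.Module):
    """Hypothetical LSTM autoencoder over gesture sequences.

    Input shape: (batch, time, n_features).
    """

    def __init__(self, n_features: int, hidden_size: int = 64):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.decoder = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.output = nn.Linear(hidden_size, n_features)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # The final hidden state of the last layer summarizes the sequence.
        _, (h_n, _) = self.encoder(x)
        return h_n[-1]  # (batch, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        code = self.encode(x)
        # Feed the same code at every time step and reconstruct the input.
        repeated = code.unsqueeze(1).repeat(1, x.size(1), 1)
        decoded, _ = self.decoder(repeated)
        return self.output(decoded)


def train_and_save(sequences: torch.Tensor, path: str, epochs: int = 5):
    """Train on reconstruction loss, then save the weights to `path`."""
    model = GestureAutoencoder(n_features=sequences.size(-1))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(sequences), sequences)
        loss.backward()
        optimizer.step()
    # Assumption: the .pth file holds a state_dict; the real script may differ.
    torch.save(model.state_dict(), path)
    return model
```

In a setup like this, downstream tasks would load the saved weights and call `encode` to obtain a fixed-size vector per gesture sequence.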

4. Fine-Tuning Speech and Gesture Models

We fine-tune two models, bart-base and t5-base, on our dataset across three modalities: speech only, gesture only, and speech and gesture combined.

The learning framework translates a pair of speech and gesture data into an LTL formula that can solve multi-modal human task understanding problems.
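Two models across three modalities gives six fine-tuning runs. The sketch below lays out that grid and pairs each run with a script file from the repository. The file names are copied as they appear in the repository listing, including the inconsistent capitalization. The pairing of each file with a model and modality is inferred from the names, and the modality keys and `script_for` helper are hypothetical.

```python
# (model, modality) -> script file, as listed in the repository.
# Assumption: each script's role is inferred from its file name.
SCRIPTS = {
    ("bart-base", "speech"): "speech_bart.py",
    ("bart-base", "gesture"): "gestures_bart.py",
    ("bart-base", "speech+gesture"): "speechgestures_bart.py",
    ("t5-base", "speech"): "speech_t5.py",
    ("t5-base", "gesture"): "gestures_t5.py",
    ("t5-base", "speech+gesture"): "speechGestures_t5.py",
}


def script_for(model: str, modality: str) -> str:
    """Look up the script for one fine-tuning run."""
    try:
        return SCRIPTS[(model, modality)]
    except KeyError:
        raise ValueError(f"unknown run: {model!r} with {modality!r}") from None


if __name__ == "__main__":
    for (model, modality), script in SCRIPTS.items():
        print(f"{model:10s} {modality:15s} {script}")
```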

4.1. Speech Only Models

For the Speech-only model, please see:

4.2. Gesture Only Models

For the Gesture-only model, please see:

4.3. Speech + Gesture Models

To explore the Speech and Gesture combined model, please see:

5. Spot Score Calculation

To calculate the Spot Score, use the calc_spot_score.py file and follow the steps below.

Update the input folder and file name in the code to match your dataset and file structure. For example, the following points to the Speech + Gestures T5 test predictions from the epoch 100 model:

input_folder = os.getcwd() + '/results/predictions/speechGesturesT5/test'
input_file = input_folder + '/test_data_epoch_100.txt'

Run calc_spot_score.py to compute the total score for this prediction set.
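If you evaluate several runs or epochs, building the path from its parts avoids editing string literals by hand. The following is a minimal sketch using `pathlib`. The helper `prediction_file` is hypothetical, and the file name pattern is generalized from the single example above, so other splits or epochs may be named differently in practice.

```python
from pathlib import Path


def prediction_file(run_name, split, epoch, root=None):
    """Build the path to a predictions file.

    Assumption: files follow the pattern seen in the example,
    results/predictions/<run>/<split>/<split>_data_epoch_<epoch>.txt
    """
    base = Path.cwd() if root is None else Path(root)
    return (
        base
        / "results"
        / "predictions"
        / run_name
        / split
        / f"{split}_data_epoch_{epoch}.txt"
    )


if __name__ == "__main__":
    path = prediction_file("speechGesturesT5", "test", 100)
    print(path, "(exists)" if path.exists() else "(missing)")
```

The resulting value can be converted with `str(path)` and assigned to `input_file` in calc_spot_score.py.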

6. Results

Model (using BART)	Jaccard Similarity (Jaq Sim) ↑	Spot Score ↑
Speech Only	0.934	0.434
Gestures Only	0.922	0.299
Speech + Gestures	0.944	0.588

Model (using T5)	Jaccard Similarity (Jaq Sim) ↑	Spot Score ↑
Speech Only	0.917	0.299
Gestures Only	0.948	0.244
Speech + Gestures	0.961	0.507
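To give a sense of what the first metric column measures, here is a minimal sketch of a token-level Jaccard similarity between a predicted and a reference formula. It assumes "Jaq Sim" refers to Jaccard similarity over whitespace-separated tokens; the repository's own metric may tokenize differently. The function name and the example formulas are hypothetical.

```python
def jaccard_similarity(predicted: str, reference: str) -> float:
    """Jaccard similarity |A & B| / |A | B| over whitespace-separated tokens."""
    pred_tokens = set(predicted.split())
    ref_tokens = set(reference.split())
    if not pred_tokens and not ref_tokens:
        return 1.0  # two empty formulas are treated as identical
    return len(pred_tokens & ref_tokens) / len(pred_tokens | ref_tokens)


if __name__ == "__main__":
    # Hypothetical formulas: 4 shared tokens out of 7 distinct tokens overall.
    predicted = "F ( pick_up & X place )"
    reference = "F ( pick_up )"
    print(round(jaccard_similarity(predicted, reference), 3))
```

A set-based overlap like this ignores token order and repetition, so two formulas can score highly while differing in meaning. A logic-aware score such as the Spot Score complements it for that reason.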

For additional information or inquiries, please feel free to contact us. Thank you for using the NatSGD dataset and the Comm2LTL benchmark!
