More documentation updates for the 2024-02-22 release (#567)
* Updates 2024-02-22.html with additional details on the differences in the processing pipeline between the NeurIPS 2023 paper and the current pipeline

* Added some styling for code blocks (for BibTeX citation)

* Removed commented out line in base.html

* Added BibTeX and APA citations for our paper in index.html

* Updated derived data products page to recommend using the data in the *_sync tables instead of the older CSVs
adarshp authored Feb 22, 2024
1 parent 76bb69a commit e9bacf9
Showing 6 changed files with 100 additions and 15 deletions.
14 changes: 7 additions & 7 deletions human_experiments/datasette_interface/metadata.yml
@@ -80,7 +80,7 @@ databases:

eeg_sync:
description: >
Table with EEG data synchronized to a main clock with fixed frequency. The start
Table with EEG data synchronized to a main clock with fixed frequency. The start
and end timestamps of the main clock span all the tasks in the experiment.
columns:
group_session: Group session ID
@@ -91,7 +91,7 @@ databases:

ekg_sync:
description: >
Table with EKG data synchronized to a main clock with fixed frequency. The start
Table with EKG data synchronized to a main clock with fixed frequency. The start
and end timestamps of the main clock span all the tasks in the experiment.
columns:
group_session: Group session ID
@@ -120,7 +120,7 @@ databases:

fnirs_sync:
description: >
Table with fNIRS data synchronized to a main clock with fixed frequency. The start
Table with fNIRS data synchronized to a main clock with fixed frequency. The start
and end timestamps of the main clock span all the tasks in the experiment.
columns:
group_session: Group session ID
@@ -178,7 +178,7 @@ databases:

gsr_sync:
description: >
Table with GSR data synchronized to a main clock with fixed frequency. The start
Table with GSR data synchronized to a main clock with fixed frequency. The start
and end timestamps of the main clock span all the tasks in the experiment.
columns:
group_session: Group session ID
@@ -312,9 +312,9 @@ databases:
timestamp_unix: The Unix timestamp of the event.
timestamp_iso8601: The ISO-8601 timestamp of the event.
filename: Name of the screenshot file.
timestamp_origin: >
The source of the timestamp. The priority is precision-driven: LSL time is
used if available; otherwise, creation time is used if available. Modification
timestamp_origin: >
The source of the timestamp. The priority is precision-driven: LSL time is
used if available; otherwise, creation time is used if available. Modification
time is used as a last resort.
task:
5 changes: 5 additions & 0 deletions human_experiments/datasette_interface/static/app.css
@@ -235,6 +235,11 @@ pre {
font-family: monospace;
}

.code {
border: 1px;
background-color: #F0F0F0;
}

a.not-underlined {
text-decoration: none;
}
1 change: 0 additions & 1 deletion human_experiments/datasette_interface/templates/base.html
@@ -2,7 +2,6 @@
<html>
<head>
<title>{% block title %}{% endblock %}</title>
<!--<link rel="stylesheet" href="{{ urls.static('app.css') }}?{{ app_css_hash }}">-->
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
{% for url in extra_css_urls %}
<link rel="stylesheet" href="{{ url.url }}"{% if url.get("sri") %} integrity="{{ url.sri }}" crossorigin="anonymous"{% endif %}>
49 changes: 49 additions & 0 deletions human_experiments/datasette_interface/templates/index.html
@@ -35,6 +35,55 @@ <h1>{{ metadata.title or "Datasette" }}{% if private %} 🔒{% endif %}</h1>
diagram](/assets/db_diagram.png).

Sign up for our [mailing list](/mailing-list) to get updates on the dataset!

## Citation

If you use this dataset, please cite our [NeurIPS 2023
paper](https://openreview.net/forum?id=ZJWQfgXQb6) that introduces
the dataset.

### BibTeX Format

""") }}

<div class="code">
<pre>
<code>
@inproceedings{
pyarelal2023the,
title={The To{MCAT} Dataset},
author={
Adarsh Pyarelal
and Eric Duong
and Caleb Jones Shibu
and Paulo Soares
and Savannah Boyd
and Payal Khosla
and Valeria Pfeifer
and Diheng Zhang
and Eric S Andrews
and Rick Champlin
and Vincent Paul Raymond
and Meghavarshini Krishnaswamy
and Clayton Morrison
and Emily Butler
and Kobus Barnard
},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2023},
url={https://openreview.net/forum?id=ZJWQfgXQb6}
}
</code>
</pre>
</div>

{{ render_markdown("""

### APA Format

Adarsh Pyarelal, Eric Duong, Caleb Jones Shibu, Paulo Soares, Savannah Boyd, Payal Khosla, Valeria Pfeifer, Diheng Zhang, Eric S Andrews, Rick Champlin, Vincent Paul Raymond, Meghavarshini Krishnaswamy, Clayton Morrison, Emily Butler, & Kobus Barnard (2023). The ToMCAT Dataset. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track.


""") }}
{% for database in databases %}
<h2 style="padding-left: 10px; border-left: 10px solid #{{ database.color }}">
@@ -11,8 +11,11 @@
Documentation for the derived data products is in sections 4 and 5 of the [data
products document](http://ivilab.cs.arizona.edu/data/tomcat/data_products.pdf).

- [Filtered synchronized data (generated 2023-08-28) (836
GB)](https://ivilab.cs.arizona.edu/ivilab/data/tomcat/derived/release_2023_08_28_17.tar.gz)
As of 2024-02-22, we recommend using the derived data in the `*_sync` tables in
the SQLite3 database rather than the CSVs in the older derived data releases
listed below, as the 2024-02-22 release fixes a number of issues with the
2023-08-28 version of the derived data (see [here](/updates/2024-02-22) for
details).
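
A minimal sketch of reading rows from one of the `*_sync` tables with Python's
built-in `sqlite3` module. The table names and the `group_session` column come
from the database metadata; the helper name `fetch_sync_rows` and the database
path in the usage comment are hypothetical, so adjust them to your local copy.

```python
import sqlite3

# Synchronized physio tables named in the 2024-02-22 release notes.
SYNC_TABLES = {"eeg_sync", "ekg_sync", "fnirs_sync", "gsr_sync"}


def fetch_sync_rows(conn, table, group_session, limit=5):
    """Return up to `limit` rows of a *_sync table for one group session.

    `fetch_sync_rows` is an illustrative helper, not part of the dataset tooling.
    """
    # Table names cannot be bound as SQL parameters, so check against a whitelist.
    if table not in SYNC_TABLES:
        raise ValueError(f"unknown sync table: {table}")
    query = f"SELECT * FROM {table} WHERE group_session = ? LIMIT ?"
    return conn.execute(query, (group_session, limit)).fetchall()


# Example usage (the file name below is an assumption about your local copy):
# conn = sqlite3.connect("tomcat.db")
# rows = fetch_sync_rows(conn, "gsr_sync", "<group session ID>")
```

Selecting all columns keeps the sketch independent of the exact signal column
names, which differ by modality.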

## Older versions

@@ -23,6 +26,9 @@
However, we will continue to provide the older versions for as long as
we feasibly can.

- [Filtered synchronized data (generated 2023-08-28) (836
GB)](https://ivilab.cs.arizona.edu/ivilab/data/tomcat/derived/release_2023_08_28_17.tar.gz)

- [Filtered synchronized data (1.1
TB)](https://ivilab.cs.arizona.edu/ivilab/data/tomcat/derived/release_2023_07_17_19.tar.gz).
Uploaded 2023-07-19. See [here](/updates/2023-07-19) for details.
@@ -11,16 +11,42 @@

### 1.1 Synchronized physio data tables

Added new tables (`fnirs_sync`, `gsr_sync`, `eeg_sync`, and `ekg_sync`)
to store preprocessed signals synchronized to a main clock for four modalities
(fNIRS, GSR, EEG, and EKG) across experiments, with filtering for artifact
removal. For each experiment, a main clock was defined with start time at 5
seconds before the beginning of the first task (rest state), end time at 5
Added new tables (`fnirs_sync`, `gsr_sync`, `eeg_sync`, and `ekg_sync`) to the
database to store preprocessed signals synchronized to a main clock for four
modalities (fNIRS, GSR, EEG, and EKG) across experiments, with filtering for
artifact removal. For each experiment, a main clock was defined with start time
at 5 seconds before the beginning of the first task (rest state), end time at 5
seconds after the end of the last task (Minecraft), and frequency at 200 Hz.

The changes are introduced in this PR:
https://github.com/ml4ai/tomcat/pull/558.

The pipeline for synchronization that produces the data in the `*_sync` tables
differs from the pipeline described in our [NeurIPS 2023
paper](https://openreview.net/forum?id=ZJWQfgXQb6) (that produced the [derived
data products from
2023-08-28](https://tomcat.ivilab.org/derived-data-products)) in the following
ways:

- In the 2023-08-28 version of the derived data products, we used the same
functions for filtering the GSR and EKG signals as we did for the EEG data.
However, this was an error on our part. We now use the proper GSR and EKG
filtering functions from the [neurokit](https://neuropsychology.github.io/NeuroKit/)
library.
- In the 2023-08-28 version of the derived data products, we used the earliest
timestamp for the modality among the participants as the starting point.
However, this creates a different start time for each modality, which makes it
challenging if a user wants to work with multiple modalities. So instead, we
defined a shared main clock for each group session which starts 30 seconds
before the first task and ends 30 seconds after the final task. All
modalities are now mapped to the ticks in this clock (in the `*_sync` tables in
the database).
- In the 2023-08-28 version of the derived data products, synchronized signals
were generated for 100 Hz, 500 Hz, and 1000 Hz. However, the new `*_sync`
tables in the database only contain synchronized signals at 200 Hz, based on
recent discussions with experts on brain data processing (100 Hz may lose
important information, and 500/1000 Hz is excessively computationally expensive).
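
As an illustration of the shared main clock described above, here is a minimal
sketch that builds fixed-frequency ticks around a session and maps one signal
onto them. The function names, the use of NumPy, and linear interpolation are
assumptions made for the example and are not necessarily what the project's
pipeline does. The padding is left as a parameter because these notes mention
both 5 and 30 seconds.

```python
import numpy as np


def main_clock_ticks(first_task_start, last_task_end, padding_s=5.0, freq_hz=200.0):
    """Return tick timestamps (in seconds) for a shared, fixed-frequency clock.

    The clock starts `padding_s` before the first task and ends `padding_s`
    after the last task. Both helper names here are hypothetical.
    """
    start = first_task_start - padding_s
    end = last_task_end + padding_s
    n_ticks = int(np.floor((end - start) * freq_hz)) + 1
    return start + np.arange(n_ticks) / freq_hz


def resample_to_clock(sample_times, sample_values, ticks):
    """Map a signal onto the clock ticks.

    Linear interpolation is an assumption chosen for simplicity.
    """
    return np.interp(ticks, sample_times, sample_values)


# Example: a 10-second session with 5 seconds of padding at 200 Hz.
ticks = main_clock_ticks(100.0, 110.0)
synced = resample_to_clock([95.0, 115.0], [0.0, 20.0], ticks)
```

Because every modality is resampled onto the same `ticks` array, rows from
different `*_sync` tables can be aligned by tick index rather than by each
device's own timestamps.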

### 1.2 Vocalic features table

Added a table called `audio_vocalics` that contains vocalic features extracted