Tables derived from the survey paper Promoting Fairness and Diversity in Speech Datasets (Mancini et al.)


AGalassi/ethical-survey-speech

 
 


Survey: Promoting Fairness and Diversity in Speech Datasets

Practices and Recommendations for the Cases of Mental and Neurological Health Research

This GitHub repository contains detailed tables derived from the paper Promoting Fairness and Diversity in Speech Datasets (Mancini et al.). These tables are organized into four subsections, each corresponding to specific aspects of our checklist for evaluating fairness and diversity in speech datasets. Unlike the tables in the original paper, we have augmented them with excerpts from the surveyed papers, providing additional context and insights into the discussed aspects.

Informed Consent (C1), Data Storage and Security (C2), Anonymity and De-Identification (C3), Accountability (C4)

| Dataset | Discourse Genre | Target Issue | Source of Content | C1 Informed Consent | C2 Data Storage and Security | C3 Anonymity and De-Identification | File Naming | C4 Accountability |
|---|---|---|---|---|---|---|---|---|
| Chinese Multimodal Depression Corpus (CMDC) | CI | DP | | Yes. “Written informed consent was obtained from each subject. Subjects were asked before the experiment if they would like to be video recorded, and the recorded video may be published in research papers without any personal information in the consent form.” | No | Yes. “The proposed Chinese Multimodal Depression Corpus (CMDC) was made publicly available after deidentification and annotation.” “All transcribed interviews were annotated to remove identifying information. Utterances that mention personal names, specific addresses, workplaces, and can be used to narrow the scope of the event were tagged and eliminated. Data annotated in the corpus does not contain protected health information.” | - | Yes. “The research was approved by the Independent Medical of Ethics Committee Board of Beijing Anding Hospital (2019 No. 53).” |
| EMU | SR, SS | AX, DP | CS | Yes. Figure 2 displays the initial screen of the application, where users are informed about the process. | Yes. “All data collected by the EMU app is TSL encrypted before being sent to a secure server for storage.” “The EMU data server is also responsible for session management. It stores the smartphone’s hardware identifier (phoneID) to easily distinguish unique participants. For this deployment, when a session is initiated, the phoneID is compared to those in completed sessions, thus allowing for a timely repetition notification.” | Yes | The EMU data server is also responsible for session management. It stores the smartphone’s hardware identifier (phoneID) to easily distinguish unique participants. | Yes. “[IRB number redacted for double-blind submission].” |
| Moodable | SR | DP | CS | Yes. In the section titled 'Data Collection Methodology for Study 2,' a detailed description is provided regarding all the information for which participants are required to express their willingness to share. It is noted that participants retain the right to refuse to contribute any specific data modality. The paper reports: “Participants could refuse to contribute any data modality, and at the minimum, only the results of the PHQ-9 survey were required to complete the study”. Figure 5.A illustrates the process of obtaining informed consent. | Yes. The paper states that data will be stored on a secure server, as depicted in Figure 5.A. | No | - | No |
| Jiang et al., Jiang et al., Jiang et al. | CI | DP | | No | No | No | - | Yes. “All of the data gathering and analytic procedures were reviewed and approved by the Emory University Institutional Review Board (IRB reference numbers 00024883 and 00066843).” |
| Lin et al. | CI | AX, DP | | Yes. We interpret participants' agreement as affirmative, even if it's not explicitly labeled as "informed consent" in the paper. The paper states: “All participants got debriefed of the main aim of the data collection at the end of the session.” | No | No | - | No |
| StudentSADD | CI | DP | CS | Yes. “The Android app also asked students to optionally share passive phone modalities.” “Android app participants were also informed that they could delete the app at any time.” “the first page of both the Android and web apps covered the IRB approved study details including the goal, procedure, privacy, and risk” “Specifically, all potential participants were informed of the following study details: Goal: The goal of this project is to build an AI tool to screen for mental health. The more information you share, the more effectively we can detect mental health conditions which could save lives. • Procedure: This survey will only take around 5 minutes to complete. You will be asked to answer questions, record samples of your voice, and share phone/social media data. No private messages will be collected. • Privacy: All information you give will be stored anonymously on a secure server and will not be tied to you. • Voluntary/Risk: You can share as much or little data as you would like and you may stop the study at any time.” “Between the demographic survey and the text prompt, the Android app asked for permission to collect text logs, call logs, contacts, and calendar entries stored on the phone. After the audio, the Android app participants could also share their GPS history stored by their Google account. The last modality of both apps asked students to share their Twitter usernames so we could collect publicly available tweets. While we only collected the willingness to share tweets, this approach allows us to determine the willingness of students to have this modality used for screening purposes. Students could indicate they did not have a Twitter account or simply decline to share their username. All these passive modalities were optional and participants were asked to give individual permission for each modality shared to guarantee informed consent.” “the student participants were informed that their data would be stored anonymously and that the actual text message content would not be collected and thus never shared with anyone.” | Yes. “All data was TSL encrypted before being sent to our secure server” | Yes. “All information you give will be stored anonymously on a secure server and will not be tied to you” “No private messages will be collected.” “all structured fields were one-way hashed and message content was removed to preserve student privacy.” | - | Yes. “IRB approved study” “WPI IRB 00007374 File 18-0031” |
| The Androids Corpus | CI, SR | DP | | Yes. “All participants were involved on a voluntary basis and they all signed an informed consent letter.” | No | Yes | “The file naming convention was designed to provide information available about a speaker”. Names consist of identifiers that do not reveal the identity of the patients. The authors do not state whether they checked the recordings in detail for sensitive information, but they declare the type of speech (non-spontaneous speech). | Yes. “The data collection was performed according to the ethical regulations of countries and institutions involved in the work.” |
| MODMA | CI | DP | | Yes. “Written informed consent was obtained from all participants prior to the experiment.” | Yes. The datasets discussed in this paper have been made available as safeguarded data on the UK Data Archive's data repository, ReShare. Individuals interested in downloading and utilizing these datasets are required to register with the UK Data Archive and adhere to their End User License conditions, which are outlined at https://ukdataservice.ac.uk/cd137-enduserlicence/. Furthermore, the raw experimental dataset is accessible for download from the publicly accessible repository at http://modma.lzu.edu.cn/data/index/ free of charge. However, all users interested in obtaining this dataset must agree to an End User License Agreement (EULA) prior to downloading. The raw datasets are categorized under "EEG" and "Recordings of spoken language." Each package contains an Excel file comprising demographic data and psychological assessment scores for all participants included in that dataset. | Yes | “Each experiment participant is given an identity number, and this participant id is unique across all the packages.” | Yes. “The local Ethics Committee approved consent forms and study design for Biomedical Research at the Lanzhou University Second Hospital according to the World Medical Association (Declaration of Helsinki).” |
| Japanese Daily Speech Dataset (JDSD) | SS | AX, DP | CS | No | No | No | - | No |
| DEPAC | ST | AX, DP | CS | No | No | No | - | Yes. “Data collection for the corpus was approved by the Institutional Review Board (IRB)” |
| DAIC-WoZ | CI | DP | | Yes. “Participants first completed a consent form (which included optional consent that allowed their data to be shared for research purposes).” | No | Yes. “All the transcribed interviews were annotated to remove identifying information. Utterances were tagged for mentions of personal names, specific dates, addresses, schools, places of employment, and locations that can be used to narrow down an event. Utterances were not considered to be personally identifying if they only included large locations (e.g. “I live in Santa Monica”), very large institutions (“I served in the Marines”), or non-specific dates such as age in years. De-identification was performed independently by two annotators, and differences were reconciled by a senior annotator. Utterances marked as personally identifying will not be shared in accordance with our institution’s ethical guidelines.” | - | No |
| EATD-Corpus | CI | DP | | Yes. “All the volunteers have signed informed consents and guarantee the authenticity of all the information provided.” | No | No | - | No |
| Depression Severity Interviews Database | CI | DP | | No | No | No | - | No |
| Guo et al. | CI | DP | | Yes. “All participants provided informed consent.” | No | No | - | Yes. “The studies involving human participants were reviewed and approved by Tianshui Third People's Hospital and Lanzhou University.” |
| Black Dog Institute | CI | DP | | Yes. “Informed consent was obtained from all participants” | No | No | - | Yes. “Informed consent was obtained from all participants and the study proceeded with approval from the local institutional Human Research Ethics committee in line with the guidelines for human research from the National Health and Medical Research Council.” |
| D-Vlog | SS | DP | YT | No. Not needed since the content is YouTube video. | No | Yes. “Considering privacy concerns, we only provide de-identified anonymized data.” | - | No |
| Kempler | SS, ST | DM | | No | No | No | - | No |
| VAS Corpus | ST | DM | | | | | | |
| Delaware-DementiaBank | SR | AL | | Yes. “After providing informed consent, participants completed the DementiaBank discourse protocol and cognitive–linguistic assessment battery.” | Yes. “In accordance with TalkBank policies, all participant data in the Delaware corpus are password protected and only available to DementiaBank consortium members. Interested users should request membership as per instructions on the main TalkBank webpage. Interested users should read the ground rules and then e-mail mailto:dev@null with their contact information, affiliation, and a brief statement about how they intend to use the data. Students who are interested in becoming members must ask their faculty advisor to join as DementiaBank members.” | No | - | Yes. DementiaBank is inside TalkBank, for which the Code of Ethics is public on the website: https://talkbank.org/share/ethics.html. Moreover, the paper states: “Data collection for the Delaware corpus was approved by the University of Delaware Institutional Review Board.” |
| ADReSS | SS | AL | | No | Yes. The description of the platform implies that all participant data within the Delaware corpus adhere to TalkBank policies, whereby they are password protected and solely accessible to members of the DementiaBank consortium. Those interested in accessing the data should follow the instructions provided on the main TalkBank webpage to request membership. Prior to membership approval, interested users are encouraged to review the ground rules and contact macw@cmu.edu with their contact information, affiliation, and a brief statement detailing their intended use of the data. Additionally, students seeking membership are required to have their faculty advisor join as DementiaBank members. | No | - | Yes. “DementiaBank is inside TalkBank for which the Code of Ethics is public on the website. https://talkbank.org/share/ethics.html.” |
| Ivanova | SR | AL | | Yes. “All participants, or their legal representatives, signed a written informed consent prior to participating in the research.” | No | No | - | Yes. “Before taking part in the reading experiment, all participants (or their legal representatives) are informed about the test and sign the consent form in accordance with the protocol approved by the Bioethics Committee of the University of Salamanca, where all recordings were conducted.” |
| Famous People dataset | SS | AL | | No | No | No | - | No |
| Pitt Corpus-DementiaBank | CI | AL | | Yes. “At the time of the first visit to the study site, the nature of the research was explained to both the patient and the family, and both the patient and the caregiver must have given informed consent to participate. If, in the opinion of the examining clinician, the patient did not appear to understand the nature of the evaluations to be undertaken, they were excluded from the research.” | No | No | - | No |
| Bipolar Disorder Corpus | CI | BD | | No | No | No | - | No |
| AMoSS Interview Dataset | CI | BD | | Yes. “all 62 participants had given consent for further analysis on the qualitative interviews.” | No | No | - | Yes. “The study protocol was approved by the NRES Committee East of England—Norfolk (13/EE/0288)” |
| Llamocca et al. | CI | BD | | Yes. “Informed consent was obtained from all subjects involved in the study.” | No | Yes. “The personal information of the patients was previously anonymized.” | “In this work, we referred to them as P01 to P17” | Yes. “The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the Ethics Committee of Project Bip4Cast Ref. A-83-4155904.” |
| The PRIORI Emotion Dataset | CI, SS | BD | | No | Yes. “The app runs in the background and turns on whenever a phone call is made, recording only the participant’s side of the dialog. The speech is encrypted in real-time, stored on the phone, and then uploaded to a HIPAA-compliant server.” | No | - | Yes. “The PRIORI Dataset is an ongoing collection of smartphone conversational data (reviewed and approved by the Institutional Review Board of the University of Michigan, HUM00052163).” |
| Shenzhen Somatisation Speech Corpus | ST | SSD | | Yes. “All the participants involved were informed that their voice data will be used only for research purposes. Their agreements for this study were recorded as one of the five” | Yes. “We used a Shinco RV-18 recording pen with 32 GB of storage to record all the participants’ voices” | No | - | Yes. “This study was approved by the ethic committee of the Shenzhen University General Hospital” |
| StressID | ST | S | | Yes. “they were requested to sign a consent form to participate. The participants could either consent to, Option A: research use and public release of all their recorded data, including identifying data (i.e. physiological, audio, and video). Option B: research use of all their recorded data, but no public release of identifying data (i.e. only physiological and audio data, but no video). Among the 65 participants, 62 opted for option A and 3 opted for option B (2 women and 1 man).” “The participants explicitly consent to the recording of their session, the dataset creation, and its release for research purposes following General Data Protection Rules (GDPR).” | Yes. “The dataset uses a proprietary license for research purposes and it is hosted on Inria servers using storage intended for long-term availability.” | Yes. “The personal information (sex, age, education), and the acquired physiological and audio signals are pseudonymized, and an alphanumeric code is given for each participant.” | “The personal information (sex, age, education), and the acquired physiological and audio signals are pseudonymized, and an alphanumeric code is given for each participant.” | Yes. “The StressID project is approved by the ethical committee of the Université Cote d’Azur (CER).” |
| MuSE | ST | S | | No | No | No | - | No |
| SUSAS | CI, ST | S | | No | No | No | - | No |
| I Feel Stressed Out | ST | S | | No | No | Yes. “Subjects’ personal information is well protected” | - | Yes. “Data collection procedures followed are approved by Ethics Committee, Peking University Health Science Center (IRB00001052-08043).” |
| Zhang et al. | SFT | P | | Yes. “Each participant signed a written informed consent before participating in this study.” | No | No | - | Yes. “This study was approved by the ethics committee of the Xuanwu Hospital according to the Declaration of Helsinki.” |
| Parkinson Speech Dataset | SR | P | | No | No | No | - | Yes. “The study has been approved by the Istanbul University Research Ethics Committee.” |
| Orozco-Arroyave et al. | SR, SS, ST | P | | Yes. “A written informed consent was signed by each participant.” | No | No | - | Yes. “This study is in compliance with the Helsinki Declaration and was approved by the Ethics Committee of the Clínica Noel, in Medellín, Colombia.” |
| WSM Corpus | SS | DP, P | YT | No | No | No | - | No |

Fairness, Bias and Diversity (C5)

| Dataset | Discourse Genre | Target Issue | Source of Content | C5‑a Speakers Distribution | C5‑b Age Distribution | C5‑c Gender Distribution | C5‑d Educational Level of the Speakers | C5‑e Language | C5‑f Ethnicity | C5‑g [PARKINSON'S] UPDRS | C5‑h Biases Awareness | C5‑i Diversity Collection Measures |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chinese Multimodal Depression Corpus (CMDC) | CI | DP | | Yes. 78: 26 MDD, 52 HC | Yes. 18 - 65 years. “MDD 24.1 +/- 5.04, HC 30.5 +/- 11.9” | Yes. “MDD 8 M & 18 F, HC 17 M & 35 F” | No | Yes. Chinese | No | No | Yes. “Due to the limited availability of MDD subjects, the HC group has more number of subjects and a broader range of ages. One can select a subset of the dataset if matched control were needed where the effect of gender and age biases are minimized to increase the statistical power.” | No |
| EMU | SR, SS | AX, DP | CS | No | No | No | No | No | No | No | No | Yes. People are asked in the app to fill in demographic information: “To prevent survey fatigue, we only ask four multiple-choice demographic questions regarding age, gender, education, and student status.” |
| Moodable | SR | DP | CS | No. In the paper the authors only mention 335 volunteer subjects. | No. The paper only states: “Participants had to be over 18 years of age”. | No | No | No | No | No | No | Yes. “a series of demographic questions asked for the participants’ gender, age, and employment. Participants were also asked if they had ever been treated in an emergency department. In the remaining questions, participants were asked to label their willingness to disclose different data modalities and PHQ-9 scores on a Likert scale ranging from Completely Unwilling, Somewhat Unwilling, Unsure, Somewhat Willing, to Completely Willing. In the survey, participants were first asked for their willingness to disclose data from social media accounts to medical personnel. These types of data included their Twitter username, Twitter tweets, Facebook Posts, and messages on applications such as GroupeMe, Discord, and WhatsApp. The second group of questions surveyed the willingness to disclose retrospective data stored on a typical smartphone such as GPS, gyroscope, accelerometer, browser history, call logs, and app usage logs. Finally, we surveyed their willingness to record a phrase and capture an image of their face.” |
| Jiang et al., Jiang et al., Jiang et al. | CI | DP | | Yes. 12, all depressed | Yes. Ages 35–68. | Yes. “female (with the exception of two male patients)” | No. Not mentioned | Yes. English | No | No | No | No |
| Lin et al. | CI | AX, DP | | Yes. 35: 18/17 for high and low distress | Yes. Table 1 reports the range and mean of age (18–52, 25.40). | Yes. Table 1 reports 16 M and 17 F. | No | Yes. English | No | No | Yes. “In the selection, we balanced the participants according to the public norm shown in Table 2 (e.g., for depression, above 6.63 is marked as high, otherwise low). Given potential gender differences in nonverbal communication [52], we also balanced the final sample with regards to gender within each distress group” | Yes. We can infer it from: “Non-binary/other was given as an option in the registration form. A number of people registered with this option. However, none of those people met the distress level criteria and were thus not selected for an interview.” |
| StudentSADD | CI | DP | CS | Yes. 302 students, of which 124 depressed (Table 2). | Yes. Ages are not reported explicitly; the paper only says that participants are university students, so the general age trend can be inferred. | No | Yes. Table 2 gives the distribution of graduate and undergraduate students. | Yes. English | Yes. Table 2 indicates the group with which participants identify themselves (e.g., White, Asian), suggesting that they are English speakers with diverse ethnic backgrounds, even if not explicitly stated. | No | Yes. “Notably, given that the data was collected anonymously, students did not have a reason to consciously or subconsciously fear consequences of a high screening score. This is thus expected to reduce bias in the responses.” | Yes. As depicted in Figure 1, demographic information is collected through a questionnaire that participants complete themselves. |
| The Androids Corpus | CI, SR | DP | | Yes. “The populations of depressed and non-depressed participants (64 and 54 individuals, respectively) have the same distribution in terms of age, gender and education level.” | Yes. Demographic details, including tabular representation, are provided, with the authors asserting that the populations exhibit identical distributions concerning age, gender, and education level. The distribution of control/non-control groups is evaluated using the chi-square test, while age distribution is analyzed through Student's t-test. | Yes. Demographic details, including tabular representation, are provided, with the authors asserting that the populations exhibit identical distributions concerning age, gender, and education level. The distribution of control/non-control groups is evaluated using the chi-square test, while age distribution is analyzed through Student's t-test. | Yes. Demographic details, including tabular representation, are provided, with the authors asserting that the populations exhibit identical distributions concerning age, gender, and education level. The distribution of control/non-control groups is evaluated using the chi-square test, while age distribution is analyzed through Student's t-test. | Yes. Italian | No | No | Yes. “The distribution shows that there is enough diversity to cover all aspects of depression and not just some of its forms.” “According to a χ2 there is no difference between depression and control participants in terms of gender and education level distribution” | No |
| MODMA | CI | DP | | Yes. 52: 23 MDD, 29 HC | Yes. “55 participants include a total of 26 outpatients (15 males and 11 females; 16–56-year-old) diagnosed with depression, as well as 29 healthy controls (19 males and 10 females; 18–55-year-old) were recruited;” | Yes. “55 participants include a total of 26 outpatients (15 males and 11 females; 16–56-year-old) diagnosed with depression, as well as 29 healthy controls (19 males and 10 females; 18–55-year-old) were recruited;” | No | Yes. Chinese | No | No | No | No |
| Japanese Daily Speech Dataset (JDSD) | SS | AX, DP | CS | Yes. 342 Japanese (male: 216, female: 126) subjects | Yes. The paper reports the mean age (41.29). | Yes. “(male: 216, female: 126)” | No | Yes. Japanese | No | No | No | Yes. “Before using the systems, participants submitted their demographic information (age, sex, and Body Mass Index) and baseline psychological symptoms (depressive and anxiety symptoms).” |
| DEPAC | ST | AX, DP | CS | Yes. “The dataset consists of 2,674 audio samples collected from 571 subjects (Table 1). 54.67% of the study subjects are female and 45.33% are male. The age of the subjects ranges between 18 and 76, and they received 1 to 26 years of formal education.” | Yes. “The age of the subjects ranges between 18 and 76.” | Yes. “54.67% of the study subjects are female and 45.33% are male.” | Yes. “they received 1 to 26 years of formal education.” | Yes. English. “participants were asked to indicate whether they are native English speakers (i.e., whether they learned the English language before the age of 5 years old).” | No | No | Yes. “The age distribution is shifted toward the left around its average value, which is equal with 36.85, indicating that most of the dataset is made up of young or middle-aged adults (Figure 2(a)). Moreover, it is witnessed in the education level distribution plot that the most of the participants received higher education, with on average around 15 years of formal education (Figure 2(b)). Figure 3 (Appendix A.2) demonstrates that the distribution of both GAD-7 and PHQ-9 scores are skewed-right, representing that the majority of the dataset is composed of either no or subthreshold level of the disorders. In addition, the number of samples with moderate to severe level of both disorders are higher among women compared with men.” | Yes. “Upon consenting, participants were asked to indicate whether they are native English speakers (i.e., whether they learned the English language before the age of 5 years old). They were also asked to indicate their age, gender, and education level.” |
| DAIC-WoZ | CI | DP | | No | No | No | No | Yes. English | No | No | No | No |
| EATD-Corpus | CI | DP | | Yes. 162: 30 MDD, 132 HC | No | No | No | Yes. Chinese | No | No | Yes. “Data imbalance heavily exists in depression datasets. Unbalanced datasets will introduce non-depressed preference to the trained classification models. Therefore, the sizes of the depressed and non-depressed classes need to be balanced before training. In this work, resampling is utilized to address the data imbalance issue.” “For EATD-Corpus, the method of rearranging volunteers’ responses is adopted to increase the size of the depressed class. The orders of three responses are rearranged and these rearranged responses are resampled to create new training samples. Because there are 6 ways of response rearrangement for each individual, the size of the depressed class can be enlarged 6 times.” | No |
| Depression Severity Interviews Database | CI | DP | | Yes. 57 depressed speakers: 34 women and 23 men. “They ranged in age from 19 to 65 years. At the time of the study, all met DSM-4 criteria for Major Depressive Disorder (MDD).” | Yes. “19 to 65 (mean 39.65)” | No. “34 W and 23 M” | No | Yes. English. “Participants are Euro- or African-American, 46 and 11, respectively)” | No | No | No | No |
| Guo et al. | CI | DP | | Yes. 208: 104 MDD, 104 HC | Yes. Table 1 provides details. | Yes. Table 1 provides details. | Yes. Table 1 provides details. | Yes. Chinese | No | No | No | No |
| Black Dog Institute | CI | DP | | Yes. 60: 30 MDD, 30 HC (30 healthy control subjects, 30 severely depressed patients) | Yes. “In this study, a gender balanced subset of 30 depressed subjects (19 Melancholia patients, 10 MDD patients, and 1 Bipolar patient) and 30 controls was used (age range 21-75yr, m38 14).” | No. The paper only states that the subset is gender balanced. | No | Yes. English. “Only native Australian English speaking participants were selected, to reduce the variability arising from different language acquisition.” | No | No | Yes. “In this study, the gender and age were matched in depressed and control groups to reduce the variability of gender bias and the age differences effect.” | No |
| D-Vlog | SS | DP | YT | Yes. 816: 555 MDD, 406 HC. | No | Yes. “twice more females than males” | No | No | No | No | No | No |
| Kempler | SS, ST | DM | | Yes. “Twenty education- and gender-matched normally aging persons were included as controls” “The data for the first investigation were drawn from a subgroup of 10 AD subjects and 10 age- and gender-matched controls for whom good tape recordings of spontaneous conversation were available” | Yes. AD group: “Age ranged from 62 to 87 years (M = 75; SD = 5.8).” Control: “their ages ranged from 64 to 84 years (M = 73; SD = 5.3).” “The data for the first investigation were drawn from a subgroup of 10 AD subjects and 10 age- and gender-matched controls for whom good tape recordings of spontaneous conversation were available. Each subgroup consisted of 6 men and 4 women. The age range for the AD subgroup was 62 to 84 (M = 74; SD = 8.2). The age range for the normal control subgroup was 62 to 84 (M = 75; SD = 6.1).” | No. Control: 12 women, 8 men. “The data for the first investigation were drawn from a subgroup of 10 AD subjects and 10 age- and gender-matched controls for whom good tape recordings of spontaneous conversation were available. Each subgroup consisted of 6 men and 4 women. The age range for the AD subgroup was 62 to 84 (M = 74; SD = 8.2). The age range for the normal control subgroup was 62 to 84 (M = 75; SD = 6.1).” | Yes. “Education ranged from 8th grade to college.” | No | No | No | No | No |
| VAS Corpus | ST | DM | | Yes. 30 HC, 30 MCI and 30 DM patients | Yes. Table 1 presents demographic information, detailing the exact number of males and females within each age range. | Yes. Table 1 reports demographic information (exact number of M and F per age range). | No | No | No | No | Yes. “The number of HC males is slightly more than HC females, while the numbers of females in both MCI and DM are slightly more than males. This difference was not statistically significant [χ2(1) = 1.16, p = .56].” | No |
| Delaware-DementiaBank | SR | AL | | Yes. Participant recruitment is ongoing, but at the time of this writing, there are 53 participants in the Delaware corpus. | Yes. “Recruitment efforts target older adults who either are neurotypical or meet the clinical criteria for MCI due to possible AD.” A table with demographic information, including age ranges, is provided. | Yes. Neurotypical: 15 females, 5 males. MCI: 17 females, 16 males. There is a table with demographic information including detailed sex information. | Yes. A table containing demographic information, including a detailed breakdown of educational levels, is provided. | Yes. English | Yes. Although not explicitly mentioned, the table of demographic information includes details about participants' race. | No | No | Yes. “To rule out other systemic or brain diseases that could cause cognitive decline and increase the likelihood that the underlying disease might be AD, we collected self-reported demographic/medical data using a questionnaire that was developed by the geriatric psychiatrist and clinical liaison for the Delaware Center for Cognitive Aging Research, in line with the guidance from the National Institute on Aging-Alzheimer's Association criteria (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10171844/#bib2)” |
| ADReSS | SS | AL | | Yes. 78 AD and 78 non-AD. Tables 1 and 2 in the paper offer comprehensive characteristics of the participants in each group. Each row in the tables represents a five-year age range, spanning from 50 to 80 years. | Yes. Tables 1 and 2 in the paper offer comprehensive characteristics of the participants in each group, with each row in the table representing a five-year age range spanning from 50 to 80 years. | Yes. Tables 1 and 2 in the paper offer comprehensive characteristics of the participants in each group, with each row in the table representing a five-year age range spanning from 50 to 80 years. | No | No | No | No | Yes. “As data scarcity and heterogeneity have hindered research into the relationship between speech and AD, the ADReSS Challenge provides researchers with the very first available benchmark, acoustically pre-processed and balanced in terms of age and gender.” “A dataset has been created for this challenge which is matched for age and gender, as shown in Table 1 and Table 2, so as to minimise risk of bias in the prediction tasks.” | No |
| Ivanova | SR | AL | | Yes. “Based on the results from their neuropsychological and cognitive evaluation, participants are divided into: (a.) healthy controls (n = 197); (b.) speakers with MCI (n = 91) and (c.) AD speakers (n = 74). In our last reported analysis (Meilán et al., 2020), the MCI group is divided into nondegenerative MCI (nodMCI, n = 73) and preclinical AD (preAD, n = 13)” Table 3 reports reading corpus metadata. | Yes. Table 3 reports reading corpus metadata (mean and std dev). | Yes. Table 3 reports reading corpus metadata (mean and std dev). | Yes. Table 3 reports reading corpus metadata (mean and std dev). | No. Spanish | No | No | No | No |
| Famous People dataset | SS | AL | | Yes. 30 celebrities, 18 healthy and 12 diagnosed with AD | No | No | No | No | No | No | No | No |
| Pitt Corpus-DementiaBank | CI | AL | | Yes. 204 with AD, 102 normal, 13 special cases | Yes. Table 3 reports details for AD and control patients. | Yes. Table 3 reports details for AD and control patients. | Yes. Table 3 reports details for AD and control patients. | Yes. The eligibility criteria mention “Able to read and write English fluently before dementia onset”; we therefore infer that the dataset is only in English. No other details are provided. | No | No | No | No |
| Bipolar Disorder Corpus | CI | BD | | Yes. 46 patients and 49 healthy controls. | Yes. Table 2 reports age distribution details. | Yes. “35 male and 16 female patients were recruited from the mental health service of a hospital” | Yes. Table 2 provides details. | Yes. Turkish | No | No | Yes. “The most remarkable result at a first glance is that the average response time for manic patients is longer than healthy controls (see Table II). When the data are subdivided into four groups as healthy, remission, hypomania, and mania according to the YMRS total score, it can be observed that average time increases gradually. However, due to increase in the standard deviation of hypomania and mania, this feature alone is not sufficient for discrimination of the disorder.” | No. “In order to gather sociodemographic and clinical information, all patients were assessed with semi-structured interviews based on the SKIP-TURK [11]. This form includes: identity, sociodemographic personal and family information, age at disease onset, severity, clinical presentation and used treatments.” |
| AMoSS Interview Dataset | CI | BD | | Yes. “Among the 139 participants enrolled in the study, 53 had a BD diagnosis, 33 had been diagnosed with BPD and 53 were healthy volunteers.” | Yes. Table 1 provides details. | Yes. Table 1 provides details. | No | No | No | No | No | No |
| Llamocca et al. | CI | BD | | Yes. “17 patients diagnosed with BDD” | No | No | No | Yes | No | No | No | No |
The PRIORI Emotion Dataset CI, SS BD Yes Paper reports the total number of observations of each mood class in Table 1. No - No - No - No - No - No - No - No -
Shenzhen Somatisation Speech Corpus ST SSD Yes Yes, Table 4 provides details. Yes Table 1 (only avg). Yes Table 1 provides details. No - Yes We can infer ‘Chinese’ from “we give examples of the recorded sentences which were spoken in Chinese”. No - No - Yes Yes, in “Current limitations and outlook” section. No -
StressID ST S No Just says 65 participants. Yes “Ages ranging between 21 and 55 years old (29y.o. ± 7)” Yes “18 women and 47 men” Yes Among the participants, 32% were master students and interns, 20% PhD students, and the remaining 48% represented diverse tertiary professions. Yes English, we can infer from “All subjects were required to have sufficient proficiency in English “ No - No - Yes Limitations section. Moreover, the paper states: “Lastly, systems that use the dataset for modeling and understanding the mechanisms of human stress conditions need to be aware of the potential imbalance in representation in the dataset. Participants for the data collection were included in our dataset without restrictions on gender, race, age, or education level – instead favoring sample size.” No
MuSE ST S No Just says 28 participants. No - Yes “9 female, 19 males” Yes They are college students. Yes English, reported in Table 1. No - No - No - No -
SUSAS CI, ST S No - Yes “Ages ranging from 22 to 76” Yes “13 female, 19 male” No - No - No - No - No - No -
I Feel Stressed Out ST S No “At the sample level, the proportion of the number of segments of class eustress is 42%, while class distress is 58%. At the subject level, the proportion of the number of the class eustress is 55%, while class distress is 45%.” No - No - No - Yes Mandarin, Table 1 and Title. No - No - No - No -
Zhang et al. SFT P Yes 86 PD patients and 88 healthy controls. Table 1 provides details. Yes Table 1 provides details (mean and std dev). Yes Table 1 provides details (mean and std dev). Yes Table 1 provides details (mean and std dev). Yes Mandarin Chinese No - Yes Table 1 provides details (mean and std dev). No - No -
Parkinson Speech Dataset SR P Yes 20 PWP and 20 healthy. Additional info about the test set: “After collecting the aforementioned dataset with multiple types of sound recordings and performing our experiments, in line with the obtained findings, we continued collecting an independent test set from PWP via the same physician’s examination process under the same conditions. During the collection of this dataset, 28 PD patients are asked to say only the sustained vowels “a” and “o” three times, respectively which makes a total of 168 recordings. Test group consists of patients who are suffering from PD for 0 to 13 years and individual ages vary between 39 and 79 (mean: 62.67, standard deviation: 10.96).” Yes “Individual ages vary between 43 and 77 (mean: 64.86, standard deviation: 8.97) along with 45 and 83 (mean: 62.55, standard deviation: 10.79) for test and control groups, respectively.” Yes “PWP: 6 F and 14 M, Healthy: 10 F and 10 M” No - No - No - No - No - No It seems that the data were recorded by a third person, because the authors say: “When the patient arrives at the hospital, his/her demographic information including gender, age [23], [24], profession, educational status, and a brief health history including the chronic diseases, smoking rate, permanently used drugs, and symptoms of diseases are recorded. Demographic and clinical history information collected in the context of this study are not used in the PD-diagnosis system but only to design a computer-aided data storage system in the hospital.”
Orozco-Arroyave et al. SR, SS, ST P Yes 50 with PD and 50 HC Yes “The age of the men with PD ranges from 33 to 77 years old (mean 62.2 ± 11.2), the age of the women with PD ranges from 44 to 75 years old (mean 60.1 ± 7.8). For the case of healthy controls, the age of the men ranges from 31 to 86 (mean 61.2 ± 11.3) and the age of the women ranges from 43 to 76 years old (mean 60.7 ± 7.7).” Yes “25 M and 25 F” No - Yes “All of the participants are Spanish native speakers” “Colombian Spanish native speakers” No - Yes Tables 1 and 2 present details about UPDRS values, H&Y values, age, and other relevant information for each speaker. Data for men and women are provided in separate tables, with the left side containing information from patients and the right side from the control group. Yes The authors say: “This database is built with speech recordings of 100 Spanish native speakers, 50 of them are diagnosed with PD and the rest are their age and gender matched healthy controls.” They therefore explicitly take into account the problem of balancing age and gender between the two classes. “Therefore, the database is well balanced in terms of age and gender.” No -
WSM Corpus SS DP, P YT No - No - No - No - Yes “The language of the videos was restricted to English” No - No - No - No -

Data Quality, Validation and Maintainability (C6)

Dataset Discourse Genre Target Issue Source of Content C6‑a Recording Setting C6‑b Maintainability C6‑c Domain Experts C6‑d Clinical Diagnosis C6‑e [ALZHEIMER] Cognitive-Linguistic Battery Details
Chinese Multimodal Depression Corpus (CMDC) CI DP Yes “Each interview was conducted by one of two research assistants (RAs) as interviewers. During the interview, only interviewers and interviewees were in the room (Fig. 3).” “Video and audio streams were captured separately. Based on the research literature on depression, we expect that the interpersonal nature of clinical interviews would improve the distinguishability across modalities [7].” “The audio recorder was placed on the desk at a distance of approximately 60cm. During the semi-structural interview, the interviewer and interviewee sat face to face. The audio was low pass filtered at 75 kHz, and the frame rate of video recording was 50 fps. All subjects were recorded using the same software and hardware apparatus. Though MDD and HC subjects were recorded at different sites, the room setting is the same (both are electromagnetic shielding rooms). Mobile phones were turned off or set to flight mode before recording. Participants were asked to avoid touching the recording stick and table during recording; prior to the key questions, the experimenter recorded for about 3 minutes before the official recording (did not involve the questions provided). All sessions were recorded during office hours. Due to the difficulty of MDD subject recruitment, the data were collected over a year.” Yes Webpage IEEEDataPort https://ieee-dataport.org/open-access/chinese-multimodal-depression-corpus It contains info like “Last Updated” Yes “we first carried out a comprehensive survey with psychiatrists from a renowned psychiatric hospital to identify key interview topics which are highly related to the diagnosis of depression. Then, a semi-structural interview study was conducted over a year with subjects who have undergone clinical diagnosis and professional assessment.” “The participated clinicians have a mean medical practice length of 9.4 years (SD = 7.0).” Yes “All their participants met DSM-IV criteria for MDD.” No -
EMU SR, SS AX, DP CS No Not possible, since the data were crowdsourced through mobile phones. Yes https://github.com/mltlachac/EMU/tree/master The README states that the audio content is released, but it is not present in the repository. No - No - No -
Moodable SR DP CS No - No - No - No - No -
Jiang et al., Jiang et al., Jiang et al. CI DP No - No - Yes The diagnostic manual used (e.g. DSM-5) is not specified, but there is written “Subjects in this study are evaluated weekly by study psychiatrists for 8 months, starting before DBS surgery and throughout the first 6 months of chronic stimulation.” No - No -
Lin et al. CI AX, DP No - No - No - No - No -
StudentSADD CI DP CS No Since the source is crowdsourcing through a mobile app it is only specified that data are collected through mobile sensors such as microphones, keyboard etc. Yes “Upon publication, other researchers may access our data analysis code and apply for access to the StudentSADD dataset at our project website: https://emutivo.wpi.edu/. Due to privacy concerns and compliance with our IRB, we share audio features, embeddings, and transcripts rather than the raw voice recordings. We include data for all 345 sessions as repeated sessions may still be useful for data balancing or data generation. This is not a static dataset as we will continue to add more audio features and embedding representations to aid in the development of urgently needed screening technologies.” No - No No, authors only say “Our app administered the PHQ-9 to provide depression and suicidal ideation screening labels” No -
The Androids Corpus CI, SR DP No Some details provided. “The data were collected with the microphone of a laptop in a the mental health centers where the depression patients are treated. Such an in-the-wild setting corresponds to the situation in which doctors and depressed participants meet for their therapeutic interactions. “ Yes GitHub Yes - Yes “The psychiatrists involved in the data collection diagnosed the patients using the Diagnostic and Statistical Manual of Mental Disorders 5 (DSM-5).” No -
MODMA CI DP Yes “The experiments were performed in a quiet, clean, soundproof, and no electromagnetic interference room. During the experiment, the ambient noise of the lab must be less than 60 dB. The devices we used for recording are Neumann TLM102 (microphones) and RME FIREFACE UCX (audio card), with a 44.1 kHz sampling rate and 24-bit sampling depth. All recording data were saved in uncompressed WAV format. The whole experiment lasted about 25 minutes for one participant. During recording, the participant was asked not to touch any equipment and keep the distance between mouth and microphone about 20 cm. Each participant is invited to complete all three experimental tasks on a comfortable chair. Ambient noise signals were required under 60 dB to prevent interference with the participant’s audio signals.” Yes https://reshare.ukdataservice.ac.uk/854301/ The ReShare platform provides maintainability information (such as the last-updated date). Yes - Yes Yes, DSM. “All MDD patients received a structured Mini-International Neuropsychiatric Interview (MINI)22 that met the diagnostic criteria for major depression of the Diagnostic and Statistical Manual of Mental Disorders (DSM) based on the DSM-IV” No -
Japanese Daily Speech Dataset (JDSD) SS AX, DP CS No - No - No - No No, they only say “We used the Depression and Anxiety Mood Scale (DAMS)” No -
DEPAC ST AX, DP CS No There are details about “Platform and Instrumentation”. Since the data were collected via crowdsourcing, we cannot assess the technical details of the participants' recording equipment: participants were not asked about it, and no such details are presented in the paper. No - Yes - Yes Yes, although for depression only. “Patient Health Questionnaire (PHQ-9): The PHQ-9 is a well established 3-point self-rated measure for depressive symptoms that has been validated against clinician rated measures (Kroenke et al., 2001). It contains 9 questions which correspond to the core criteria of the Diagnostic And Statistical Manual of Mental Disorders (DSM) for depression.” For anxiety this does not hold, since the authors only say “The GAD-7 is a popular self-rated measure of general anxiety symptoms that is scored from 0 to 21 (Spitzer et al., 2006). It has been validated against clinical diagnosis and has been shown to be robust as a screening tool and a continuous measure of symptom severity.” No -
DAIC-WoZ CI DP No - No We can only access the page to request the dataset, and that page does not provide any maintainability information. No - No - No -
EATD-Corpus CI DP No No details, because the data were collected through an app, which is a form of crowdsourcing. Yes GitHub No - No - No -
Depression Severity Interviews Database CI DP No “Interviews were recorded using three hardware-synchronized analogue cameras and two unidirectional microphones (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581737/figure/F1/). Two cameras were positioned approximately 15° to the participant's left and right. One camera recorded the participant's face and one camera recorded a full body view (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581737/figure/F1/ left). A third camera recorded the interviewer's shoulders and face from approximately 15° to the interviewer's right (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581737/figure/F1/ right). Audio-visual data from the camera and microphone to the participant's right were used in this study.” ”Since microphones were not calibrated for intensity, intensity measures were not considered.” No Platform where data are stored is “Page not found” http://www.pitt.edu/%E2%88%BCemotion/depression.htm. Yes “Symptom severity was evaluated on up to four occasions at 1, 7, 13, and 21 weeks post diagnosis and intake by ten clinical interviewers (all female).” Yes Yes, DSM-4 ”In contrast to previous work, all participants met DSM-4 or DSM-5 criteria for major depression as determined by diagnostic interview. Diagnostic criteria matter for at least two reasons. First, many non-depressive disorders are confusable with depression. Post-traumatic stress disorder (PTSD) and generalized anxiety disorder, for instance, share overlapping symptoms with depression. Second, people with history of depression may differ from those without depression in personality factors or in other non-specific ways [https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581737/#R40]. By using diagnostic criteria and focusing on change in depression severity, we were able to rule out other sources of influence.” No -
Guo et al. CI DP Yes “Fifteen monophonic speech recordings are made in the voice folder. A sampling rate of 44.1 kHz and a sampling depth of 24-bit are used for collecting speech signals. Speech recordings are saved in the uncompressed WAV format. Ambient noise should be lower than 60 dB.” No - Yes - Yes “These questions are designed based on DSM-IV and other depression scales such as the Hamilton depression rating scale (HDRS)” No -
Black Dog Institute CI DP Yes “Video and audio streams were captured in QuickTime Pro (running on a 17" Apple Macbook Pro) using a high-resolution Pike F-100 FireWire camera (Allied Vision Tech.), and broadcast-quality (Sony) lapel microphone.” “The microphone was attached to the participant’s lapel, at mid-chest level.” “Audio was digitised at 44.1 kHz, and the video frame rate was set at 30 fps (frame per second). Both depressed and control subjects were recorded using the same facility (same room setting, hardware equipment, and software). All sessions were recorded at the Black Dog Institute during office hours (8 am-5 pm). Moreover, the recordings were collected over three years.” No - Yes - No - No -
D-Vlog SS DP YT No - No The website where one can request the dataset does not provide any maintainability information: https://sites.google.com/view/jeewoo-yoon/dataset No - No - No -
Kempler SS, ST DM No - No - Yes “The subjects were individuals diagnosed as having probable Alzheimer's disease (AD) by physicians at the UCLA Geriatric Outpatient Clinic or the Neurobehavioral Unit of the West Los Angeles Veterans Administration Hospital.” No - No -
VAS Corpus ST DM No - No - No - No - No -
Delaware-DementiaBank SR AL Yes “Administration of the discourse protocol was audio-recorded following the guidelines for high-quality audio recording on the TalkBank website.” https://www.talkbank.org/info/da.html Yes DementiaBank platform No - No - Yes Provided, in section “Method” under subsection “Development of the DementiaBank Protocol”.
ADReSS SS AL No Few details about audio post processing. “Recordings were acoustically enhanced with stationary noise removal and audio volume normalisation was applied across all speech segments to control for variation caused by recording conditions such as microphone placement.” Yes DementiaBank Platform No - No - No No
Ivanova SR AL Yes “All participants read the text in 48-point size from a screen in a soundproof room. Microphone (20 Hz-20 kHz frequency range; 2.5 mV/Pa sensitivity; 600 ohms impedance) is placed between 8 and 14 cm from the mouth of the participant at an approximate angle of 45° in order to minimize breathe noise. Experimental terms are standardized to the maximum in order to assure equal conditions for data collection: isolated room; no background noise; same apparatus used with all participants; recording in mono; identical recording technique with all participants. This is particularly important in speech analysis, since some variables (like intensity, which values can depend on physical distance from the microphone) can be highly affected by variations in experimental (and even individual) conditions.” No In the paper the authors only mention “As a complementary material to this paper, we offer a free-access speech corpus of standardized reading samples produced by healthy elderly, speakers with MCI and speakers with mild AD.” The dataset is provided as supplementary material: https://www.sciencedirect.com/science/article/pii/S0885230821001340?ref=pdf_download&fr=RR-2&rr=85fb6f34f90fbae5#sec0019aa. Yes “All participants underwent previous neuropsychological evaluation including a complete anamnesis, evaluation of daily life activities, as well as psychological and cognitive assessment through Neuronorma Screening Test (Peña Casanova et al., 2009) and Goldberg Test for depression discrimination.” No - No Section “3.1. Participants and procedure” contains a long paragraph about the reading task performed by the participants. It opens by saying “The procedure for speech assessment included a reading task embedded within the battery of other language oriented evaluation tests.” but gives no further details about the battery; the remaining information concerns the reading task only.
Famous People dataset SS AL No - No - No - No - No -
Pitt Corpus-DementiaBank CI AL No - No - Yes “Each participant in the study, patient and control subject alike, received an extensive neuropsychiatric evaluation including medical history and physical examination, neurologic history and examination, semistructured psychiatric interview, and neuropsychological assessment. Each individual was interviewed by a psychiatric nurse to assess their physical and cognitive limitations, as well as the caregiving burden to their primary caregiver. Each evaluation was completed in approximately three sessions, generally within a 2-week period.” No - No -
Bipolar Disorder Corpus CI BD No - No Authors only say “The collected corpus will be publicly available.” Yes We can infer it from “In order to gather sociodemographic and clinical information, all patients were assessed with semi-structured interviews based on the SKIP-TURK [11]. This form includes: identity, sociodemographic personal and family information, age at disease onset, severity, clinical presentation and used treatments. During hospitalization, in every follow up day (0th, 3rd, 7th, 14th, 28th day) and after discharge on the 3rd month, presence of depressive and manic features were evaluated using Young Mania Rating Scale (YMRS) [12] and Montgomery-Asberg Depression Rating Scale (MADRS) [13].” “Inclusion criteria were as follows: (I) diagnosis of BD type I, manic episode according to DSM-5 [10] given by the following doctor, (II) being informed of the purpose of the study and having given signed consent before enrollment” Yes “Inclusion criteria were as follows: (I) diagnosis of BD type I, manic episode according to DSM-5 [10] given by the following doctor, (II) being informed of the purpose of the study and having given signed consent before enrollment” No
AMoSS Interview Dataset CI BD No - No - Yes “All the diagnoses had been confirmed prior to the study using the structured clinical interview for DSM-IV (the 4th edition of Diagnostic and Statistical Manual of Mental Disorders) and the International Personality Disorder Examination (IPDE)“ ”These semi-structured, one-on-one qualitative interviews took place either in person or by telephone, conducted by 2 clinicians and 2 psychology graduates who were involved in the roll out of the AMoSS study.” Yes “All the diagnoses had been confirmed prior to the study using the structured clinical interview for DSM-IV (the 4th edition of Diagnostic and Statistical Manual of Mental Disorders) and the International Personality Disorder Examination (IPDE) [14].” No -
Llamocca et al. CI BD No - No “Data Availability Statement - Not applicable.” Yes “Periodical interviews during medical consultation. In these sessions, the psychiatrist registers a total of 40 variables. HDRS (Hamilton Depression Rating Scale) and YMRS (Young Mania Rating Scale) items are included. Patients are diagnosed more than once monthly (usually every two weeks). Therefore, the variable “diagnosis” is also collected. According to the psychiatrist, a diagnosis should be maintained until changes in the patient’s symptoms occur, that is, until the next diagnosis. The diagnosis is directly related to the HDRS and YMRS tests. This source of data consists of subjective and objective variables.” No - No -
The PRIORI Emotion Dataset CI, SS BD Yes “The Samsung Galaxy series of phones, including the S3, S4, and S5 are used by participants. Only two of the participants were given S4s and their data are excluded from this study. The distribution of subjects with S3s and S5s can be seen in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4995442/table/T2/. The two models of phone include model-specific microphones and processing. One of the effects of this recording and processing is clipping. Clipping occurs most often in the S3, with an average of 2.74% of speech samples at maximum range. This sensitivity is also demonstrated by the average root mean square value of 0.397 for the S3. Additionally, the noise is much more pronounced, as seen in the lower signal to noise ratio of 21.2 dB for the S3.” “The two phones used in this study have different acoustic properties. The S3, compared to the S5, has more clipping, higher volume, and a sensitivity to background noise. Because of this, it is necessary to carefully preprocess the data before feature extraction using declipping, audio normalization, and noise-robust segmentation in order to make calls from different devices more comparable.” No - Yes “Participants take part in weekly calls with our study clinicians in which the HAMD and YMRS interviews are conducted.” No - No -
Shenzhen Somatisation Speech Corpus ST SSD Yes “all the participants’ voices which have a sample rate of 32 000 Hz and a bit rate of 16 bps.” No - No Only self-assessment.
