added corpora of disordered speech
kreetrapper committed Aug 22, 2024
1 parent afa2133 commit d6b530e
Showing 2 changed files with 34 additions and 0 deletions.
@@ -0,0 +1,18 @@
Corpus;Corpus_URL;Language;Size;Annotation;Licence;Description;Buttons;Buttons_URL;Publication;Publication_URL;Note
AphasiaBank;https://aphasia.talkbank.org/;Cantonese, Croatian, English, French, German, Greek, Hungarian, Italian, Japanese, Mandarin, Romanian, Spanish;380 MB transcripts, 827 GB media;CHAT and CA/CHAT;email request for access;This is a corpus of multimedia interactions for the study of communication in aphasia.#SEPAccess to the data in AphasiaBank is password protected and restricted to members of the AphasiaBank consortium group.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/aphasia;;;CLARIN
Croatian corpus of non-professional written language by typical speakers and speakers with language disorders RAPUT 1.0;http://hdl.handle.net/11356/1435;Croatian;6760 texts, 34469 sentences, 426187 tokens;MULTEXT-East tagset;CC-BY-SA 4.0;The corpus consists of texts produced by nonprofessional typical speakers and speakers with different language disorders (developmental language disorder, dyslexia, traumatic brain injury, aphasia, other).#SEPRoughly half of the corpus consists of texts of typical speakers, and the other half of speakers with language disorders.#SEPLanguage samples were elicited by six groups of tasks representing different writing styles (descriptive, expository, narrative, and letter) and different levels of formality.;Download;http://hdl.handle.net/11356/1435;Kuvač Kraljević et al. (2021);https://hrcak.srce.hr/file/370152;CLARIN
ADHD and SLI corpus UvA database;https://hdl.handle.net/1839/00-2766F32F-4305-4F13-A02C-F4A8F5216425;Dutch;4 GB (67 recordings) of 26 Dutch children with ADHD, 19 Dutch children with SLI, 22 Dutch control children;Transcriptions (CHAT-format);CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);This corpus aims to compare the language and executive functioning profiles of children with ADHD to children with Specific Language Impairment and children with Tourette’s Disorder.;Download;https://hdl.handle.net/1839/00-2766F32F-4305-4F13-A02C-F4A8F5216425;;;CLARIN
Bilingual deaf children RU-Kentalis database;https://hdl.handle.net/1839/00-F6BC06C4-B2AD-4ED8-8527-AB81F4EF4E8F;Dutch;4 GB complete video recordings. 1 GB selected parts video recordings. 0.1 GB selected parts transcripts. 0.5 GB test and background data of 11 deaf children, longitudinal, 104 recordings;CHAT-like format for 104 recordings;CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);The corpus is used for investigating the bilingual language and communication development of young deaf children in Sign Language of the Netherlands (SLN) and Dutch.;Download;https://hdl.handle.net/1839/00-F6BC06C4-B2AD-4ED8-8527-AB81F4EF4E8F;Klatter-Folmer et al. (2016);https://doi.org/10.1093/deafed/enj032;CLARIN
SLI RU-Kentalis database;https://hdl.handle.net/1839/00-97AF29EA-877D-422A-BAF7-25FA269351A6;Dutch;2 GB;Praat transcripts;CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);The corpus has been collected to investigate the expression of spatial relations by children with SLI and normally developing children in their spoken language production.;Download;https://hdl.handle.net/1839/00-712802F3-C245-4EF0-BE9D-D09714DEDE67;;;CLARIN
Dutch Corpus of Pathological and Normal Speech (COPAS);http://hdl.handle.net/10032/tm-a2-n3;Dutch (Flemish);319 speakers of which 122 normal controls and 197 with a speech disorder. Corpus size: 1.3 GB;Orthographic transcription;Academic, bespoke;This corpus has been constructed within the framework of the project Speech Algorithms for Clinical and Educational applications (SPACE).;Download;http://hdl.handle.net/10032/tm-a2-n3;Middag et al. (2010);http://hdl.handle.net/1854/LU-1053399;CLARIN
FluencyBank;https://fluency.talkbank.org/;Dutch, English, French, German;481 MB transcripts, 207 GB media;CHAT and CA/CHAT;email request for access;This corpus is intended for the study of fluency development.#SEPParticipants include typically-developing monolingual and bilingual children, children and adults who stutter (C/AWS) or who clutter (C/AWC), and second language learners.#SEPAccess to the research data in FluencyBank is password protected and restricted to members of the FluencyBank consortium group, although a subset of the corpus is publicly available.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/fluency;;;CLARIN
ASDBank;https://asd.talkbank.org/;Dutch, English, French, Greek, Mandarin, Spanish;42 MB transcripts, 401 MB media;CHAT and CA/CHAT;open access;This is a corpus of multimedia interactions for the study of communication in autism-spectrum disorder.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/asd;;;CLARIN
Deaf adults RU database;https://hdl.handle.net/1839/00-97AF29EA-877D-422A-BAF7-25FA269351A6;Dutch, Turkish, Moroccan;2 GB of 46 deaf Dutch adults, 38 hearing Turkish adults, 24 hearing Moroccan adults, 10 Dutch controls;;CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);This corpus aims at the investigation of the acquisition of Dutch by deaf Dutch adults (late L1/early L2) and comparison with hearing Turkish and Moroccan-Arabic adults.;Download;https://hdl.handle.net/1839/00-97AF29EA-877D-422A-BAF7-25FA269351A6;Parriger (2012);https://pure.uva.nl/ws/files/1840998/113644_thesis.pdf;CLARIN
TBIBank;https://tbi.talkbank.org/;English;63 MB transcripts, 98 GB media;CHAT and CA/CHAT;email request for access;This is a corpus of multimedia interactions for the study of communication in people with traumatic brain injury.#SEPAccess to the data in TBIBank is password protected and restricted to members of the TBIBank consortium group.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/tbi;;;CLARIN
PsychosisBank;https://psychosis.talkbank.org/;English (various dialects), Spanish;Not available;CHAT and CA/CHAT;email request for access;This is a corpus intended for the study of language in psychosis.#SEPThe site is noted as under construction.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;;;;;CLARIN
Alzheimer's Dementia Recognition through Spontaneous Speech (audio only): The ADReSSo Challenge;https://sla.talkbank.org/TBB/dementia;English, German, Mandarin, Spanish, Taiwanese;;CHAT and CA/CHAT;email request for access;This is a corpus of multimedia interactions for the study of communication in dementia.#SEPAccess to the data in DementiaBank is password protected and restricted to members of the DementiaBank consortium group.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/dementia;;;CLARIN
RHDBank;https://rhd.talkbank.org/;English, Spanish;30 MB transcripts, 28 GB media;CHAT and CA/CHAT;email request for access;This is a corpus of multimedia interactions for the study of communication in people with Right Hemisphere Damage (RHD).#SEPAccess to the data in RHDBank is password protected and restricted to members of the RHDBank consortium group.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/rhd;;;CLARIN
DemCorpus-Basilicata: Dementia Corpus;http://hdl.handle.net/20.500.11752/OPEN-989;Italian;08:50 hours;;Processed data available by request;This corpus consists of semi-spontaneous speech data produced by elderly residents of the Basilicata region in Italy.#SEPIn total, 40 individuals participated: the patient group consists of 20 participants with a diagnosis of dementia (9 cases of Alzheimer’s disease, 2 patients with mixed dementia, 5 patients with not-further-specified dementia, 3 patients with vascular dementia, and 1 patient with frontotemporal dementia).#SEPThe control group consists of 20 healthy individuals matched for age, gender, and geographical origin. Three linguistic tasks were administered to all participants: two narrative tasks (the first one was about an excursion or a trip, and the second was about Christmas festivities), and an image description task. This resulted in 8 hours and 50 minutes of recorded semi-spontaneous speech, which was then transcribed, segmented, and annotated using ELAN.;;;Martinelli et al. (2022);http://hdl.handle.net/20.500.11752/OPEN-989;CLARIN
ItaASD: Italian speech corpus Autism Spectrum Disorder;http://hdl.handle.net/20.500.11752/OPEN-990;Italian;04:19 hours;Orthographic;;This is a corpus of semi-spontaneous speech produced by 34 children between 6 and 13 years of age, residents in the Campania region of Italy.#SEPHalf of the participating children were diagnosed with high-functioning Autism Spectrum Disorder, and the other half were neurotypical children matched for age, gender, and geographical origin.#SEPAll participants were administered three tasks: a complex image description task, a story-telling task, and a story-retelling task. This resulted in 4 hours and 19 minutes of recorded speech, which were then transcribed and annotated using ELAN.;;;;;CLARIN
OPLON: Opportunities for active and healthy LONgevity;http://hdl.handle.net/20.500.11752/ILC-992;Italian;06:50 hours;;;This corpus consists of semi-spontaneous speech data collected from 96 elderly participants who were divided into two groups: the pathological and the control group.#SEPThe pathological group refers to three categories: (i) 16 participants with amnestic Mild Cognitive Impairment (MCI), (ii) 16 participants with multiple-domain MCI, and (iii) 16 participants with Early Dementia (probable Alzheimer Dementia, Fronto-Temporal Dementia, Mixed Dementia, and Lewy Body Dementia).#SEPThe control group includes 48 healthy individuals matched for gender, age, educational level, and geographical origin. The corpus was subjected to PoS Tagging and Dependency Parsing (CoNLL format). ;;;;;CLARIN
Polish Cued Speech Corpus of Hearing-Impaired Children;https://hdl.handle.net/1839/dbcd8568-d17d-4861-94bb-aa553e943399;Polish;20 children (11 girls and 9 boys);CHAT format;open access or through email request for access;This is a corpus of recordings of the DIA (Dutch Intelligibility Assessment).#SEPThe corpus also contains a variety of other samples like reading passages, isolated sentences and recordings of spontaneous speech.#SEPThe corpus contains samples of 187 speakers with a speech disorder and samples of 122 speakers without a speech disorder.;Download;https://hdl.handle.net/1839/dbcd8568-d17d-4861-94bb-aa553e943399;;;CLARIN
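
The table above is semicolon-delimited with twelve columns, and the Description column packs several paragraphs into one field using a `#SEP` marker. A minimal sketch of how such a file might be read follows. It assumes the marker is meant as a paragraph separator and matches it case-insensitively; the function name `read_corpora` and the inline sample are illustrative only and not part of the commit.

```python
import csv
import io
import re

# Assumption: "#SEP" separates paragraphs inside the Description field.
# Matched case-insensitively, with any surrounding whitespace dropped.
SEP = re.compile(r"\s*#SEP\s*", re.IGNORECASE)


def read_corpora(text):
    """Parse the semicolon-delimited table into a list of dicts.

    The Description value of each row is replaced by a list of paragraphs.
    """
    # QUOTE_NONE keeps quote characters in the data as ordinary text.
    reader = csv.DictReader(
        io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE
    )
    rows = []
    for row in reader:
        description = row.get("Description") or ""
        row["Description"] = [p for p in SEP.split(description) if p.strip()]
        rows.append(row)
    return rows


# Header and one row copied from the table above.
sample = (
    "Corpus;Corpus_URL;Language;Size;Annotation;Licence;Description;"
    "Buttons;Buttons_URL;Publication;Publication_URL;Note\n"
    "ASDBank;https://asd.talkbank.org/;"
    "Dutch, English, French, Greek, Mandarin, Spanish;"
    "42 MB transcripts, 401 MB media;CHAT and CA/CHAT;open access;"
    "This is a corpus of multimedia interactions for the study of "
    "communication in autism-spectrum disorder.#SEPData in TalkBank use a "
    "consistent XML-compatible representation called CHAT. All of the data "
    "is transcribed in CHAT and CA/CHAT formats.;"
    "Browse;https://sla.talkbank.org/TBB/asd;;;CLARIN\n"
)

rows = read_corpora(sample)
for paragraph in rows[0]["Description"]:
    print(paragraph)
```

Because a literal semicolon inside a field would be read as a column break, a check that every row yields exactly twelve fields is a cheap guard when adding new entries.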