Commit
1 parent
afa2133
commit d6b530e
Showing 2 changed files with 34 additions and 0 deletions.
18 changes: 18 additions & 0 deletions
...ered speech corpora/1-Corpora of disordered speech in the CLARIN infrastructure/1-CSD.csv
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
Corpus;Corpus_URL;Language;Size;Annotation;Licence;Description;Buttons;Buttons_URL;Publication;Publication_URL;Note | ||
AphasiaBank;https://aphasia.talkbank.org/;Cantonese, Croatian, English, French, German, Greek, Hungarian, Italian, Japanese, Mandarin, Romanian, Spanish;380 MB transcripts, 827 GB media;CHAT and CA/CHAT;email request for access;This is a corpus of multimedia interactions for the study of communication in aphasia.#SEP Access to the data in AphasiaBank is password protected and restricted to members of the AphasiaBank consortium group. #SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/aphasia;;;CLARIN | ||
Croatian corpus of non-professional written language by typical speakers and speakers with language disorders RAPUT 1.0;http://hdl.handle.net/11356/1435;Croatian;6760 texts, 34469 sentences, 426187 tokens;MULTEXT-East tagset;CC-BY-SA 4.0;The corpus consists of texts produced by nonprofessional typical speakers and speakers with different language disorders (developmental language disorder, dyslexia, traumatic brain injury, aphasia, other).#SEPRoughly half of the corpus consists of texts of typical speakers, and the other half of speakers with language disorders.#SEPLanguage samples were elicited by six groups of tasks representing different writing styles (descriptive, expository, narrative, and letter) and different levels of formality.;Download;http://hdl.handle.net/11356/1435;Kuvač Kraljević et al. (2021);https://hrcak.srce.hr/file/370152;CLARIN | ||
ADHD and SLI corpus UvA database;https://hdl.handle.net/1839/00-2766F32F-4305-4F13-A02C-F4A8F5216425;Dutch;4 GB (67 recordings) of 26 Dutch children with ADHD, 19 Dutch children with SLI, 22 children Dutch controls;Transcriptions (CHAT-format);CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);This corpus aims to compare the language and executive functioning profiles of children with ADHD to children with Specific Language Impairment and children with Tourette’s Disorder.;Download;https://hdl.handle.net/1839/00-2766F32F-4305-4F13-A02C-F4A8F5216425;;;CLARIN | ||
Bilingual deaf children RU-Kentalis database;https://hdl.handle.net/1839/00-F6BC06C4-B2AD-4ED8-8527-AB81F4EF4E8F;Dutch;4 GB complete video recordings. 1 GB selected parts video recordings. 0,1 GB selected parts transcripts. 0,5 GB test and background data of 11 deaf children, longitudinal, 104 recordings; CHAT-like format for 104 recordings;CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);The corpus is used for investigating the bilingual language and communication development of young deaf children in Sign Language of the Netherlands (SLN) and Dutch.;Download;https://hdl.handle.net/1839/00-F6BC06C4-B2AD-4ED8-8527-AB81F4EF4E8F;Klatter-Folmer et al. (2016);https://doi.org/10.1093/deafed/enj032;CLARIN | ||
SLI RU-Kentalis database;https://hdl.handle.net/1839/00-97AF29EA-877D-422A-BAF7-25FA269351A6;Dutch;2 GB;Praat transcripts;CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);The corpus has been collected to investigate the expression of spatial relations by children with SLI and normally developing children in their spoken language production. ;Download;https://hdl.handle.net/1839/00-712802F3-C245-4EF0-BE9D-D09714DEDE67;;;CLARIN | |
Dutch Corpus of Pathological and Normal Speech (COPAS) ;http://hdl.handle.net/10032/tm-a2-n3;Dutch (Flemish);319 speakers of which 122 normal controls and 197 with a speech disorder. Corpus size: 1.3 GB;Orthographic transcription;Academic, bespoke;This corpus has been constructed within the framework of the project Speech Algorithms for Clinical and Educational applications (SPACE).;Download;http://hdl.handle.net/10032/tm-a2-n3;Middag et al. (2010);http://hdl.handle.net/1854/LU-1053399;CLARIN | ||
FluencyBank;https://fluency.talkbank.org/;Dutch, English, French, German;481 MB transcripts, 207 GB media;CHAT and CA/CHAT;email request for access;This corpus is intended for the study of fluency development.#SEPParticipants include typically-developing monolingual and bilingual children, children and adults who stutter (C/AWS) or who clutter (C/AWC), and second language learners.#SEPAccess to the research data in FluencyBank is password protected and restricted to members of the FluencyBank consortium group, although a subset of the corpus is publicly available.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/fluency;;;CLARIN | ||
ASDBank;https://asd.talkbank.org/;Dutch, English, French, Greek, Mandarin, Spanish;42 MB transcripts, 401 MB media;CHAT and CA/CHAT;open access;This is a corpus of multimedia interactions for the study of communication in autism-spectrum disorder.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/asd;;;CLARIN | ||
Deaf adults RU database;https://hdl.handle.net/1839/00-97AF29EA-877D-422A-BAF7-25FA269351A6;Dutch, Turkish, Moroccan;2GB of 46 deaf Dutch adults, 38 hearing Turkish adults, 24 hearing Moroccan adults, 10 Dutch controls;;CLARIN PUB (Transcriptions), CLARIN RESTRICTED (Recordings);This corpus aims at the investigation of the acquisition of Dutch by deaf Dutch adults (late L1/early L2) and comparison to hearing Turkish and Moroccan-Arabic.;Download;https://hdl.handle.net/1839/00-97AF29EA-877D-422A-BAF7-25FA269351A6;Parriger (2012);https://pure.uva.nl/ws/files/1840998/113644_thesis.pdf;CLARIN | ||
TBIBank;https://tbi.talkbank.org/;English;63 MB transcripts, 98 GB media;CHAT and CA/CHAT;email request for access;This is a corpus of multimedia interactions for the study of communication in people with traumatic brain injury.#SEPAccess to the data in TBIBank is password protected and restricted to members of the TBIBank consortium group.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/tbi;;;CLARIN | ||
PsychosisBank;https://psychosis.talkbank.org/;English (various dialects), Spanish;Not available;CHAT and CA/CHAT;email request for access;This is a corpus intended for the study of language in psychosis.#SEPThe site is noted as under construction.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;;;;;CLARIN | ||
Alzheimer's Dementia Recognition through Spontaneous Speech (audio only): The ADReSSo Challenge;https://sla.talkbank.org/TBB/dementia;English, German, Mandarin, Spanish, Taiwanese;;CHAT and CA/CHAT ;email request for access;This is a corpus of multimedia interactions for the study of communication in dementia.#SEPAccess to the data in DementiaBank is password protected and restricted to members of the DementiaBank consortium group.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/dementia;;;CLARIN | ||
RHDBank;https://rhd.talkbank.org/;English, Spanish;30 MB transcripts, 28 GB media;CHAT and CA/CHAT;email request for access;This is a corpus of multimedia interactions for the study of communication in people with Right Hemisphere Damage (RHD).#SEPAccess to the data in RHDBank is password protected and restricted to members of the RHDBank consortium group.#SEPData in TalkBank use a consistent XML-compatible representation called CHAT. All of the data is transcribed in CHAT and CA/CHAT formats.;Browse;https://sla.talkbank.org/TBB/rhd;;;CLARIN | ||
DemCorpus-Basilicata: Dementia Corpus;http://hdl.handle.net/20.500.11752/OPEN-989;Italian;08:50 hours;;Processed data available by request;This corpus consists of semi-spontaneous speech data produced by elderly residents of the Basilicata region in Italy.#SEPIn total, 40 individuals participated: the patient group consists of 20 participants with a diagnosis of dementia (9 cases of Alzheimer’s disease, 2 patients with mixed dementia, 5 patients with not-further-specified dementia, 3 patients with vascular dementia, and 1 patient with frontotemporal dementia).#SEPThe control group consists of 20 healthy individuals matched for age, gender, and geographical origin. Three linguistic tasks were administered to all participants: two narrative tasks (the first one was about an excursion or a trip, and the second was about Christmas festivities), and an image description task. This resulted in 8 hours and 50 minutes of recorded semi-spontaneous speech, which was then transcribed, segmented, and annotated using ELAN. ;;;Martinelli et al. (2022);http://hdl.handle.net/20.500.11752/OPEN-989;CLARIN | |
ItaASD: Italian speech corpus Autism Spectrum Disorder;http://hdl.handle.net/20.500.11752/OPEN-990;Italian;04.19 hours;Orthographic;;This is a corpus of semi-spontaneous speech produced by 34 children between 6 and 13 years of age, residents in the Campania region of Italy.#SEPHalf of the participating children were diagnosed with high-functioning Autism Spectrum Disorder, and the other half were neurotypical children matched for age, gender, and geographical origin.#SEPAll participants were administered three tasks: a complex image description task, a story-telling task, and a story-retelling task. This resulted in 4 hours and 19 minutes of recorded speech, which were then transcribed and annotated using ELAN. ;;;;;CLARIN | |
OPLON: Opportunities for active and healthy LONgevity;http://hdl.handle.net/20.500.11752/ILC-992;Italian;06:50 hours;;;This corpus consists of semi-spontaneous speech data collected from 96 elderly participants who were divided into two groups: the pathological and the control group.#SEPThe pathological group refers to three categories: (i) 16 participants with amnestic Mild Cognitive Impairment (MCI), (ii) 16 participants with multiple-domain MCI, and (iii) 16 participants with Early Dementia (probable Alzheimer Dementia, Fronto-Temporal Dementia, Mixed Dementia, and Lewy Body Dementia).#SEPThe control group includes 48 healthy individuals matched for gender, age, educational level, and geographical origin. The corpus was subjected to PoS Tagging and Dependency Parsing (CoNLL format). ;;;;;CLARIN | ||
Polish Cued Speech Corpus of Hearing-Impaired Children;https://hdl.handle.net/1839/dbcd8568-d17d-4861-94bb-aa553e943399;Polish;20 children (11 girls and 9 boys);CHAT format;open access or through email request for access;This is a corpus of recordings of the DIA (Dutch Intelligibility Assessment).#SEPThe corpus also contains a variety of other samples like reading passages, isolated sentences and recordings of spontaneous speech.#SEPThe corpus contains samples of 187 speakers with a speech disorder and samples of 122 speakers without a speech disorder. ;Download;https://hdl.handle.net/1839/dbcd8568-d17d-4861-94bb-aa553e943399;;;CLARIN |
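The added file is semicolon-delimited, with 12 columns per row and `#SEP` used inside the Description field to mark paragraph breaks. The following is a minimal sketch of how such a file might be read, not the repository's own tooling: the function name `load_rows`, the abridged sample row, the case-insensitive match on `#SEP`, and the splitting of Language on commas are all assumptions made for illustration. It also assumes the raw CSV rather than the rendered diff table, so the trailing `| |` cells are not present.

```python
import csv
import io
import re

# One row abridged from the diff above; the Description text is shortened
# to placeholder paragraphs for brevity (illustrative, not the real text).
SAMPLE = (
    "Corpus;Corpus_URL;Language;Size;Annotation;Licence;Description;"
    "Buttons;Buttons_URL;Publication;Publication_URL;Note\n"
    "ASDBank;https://asd.talkbank.org/;Dutch, English, French, Greek, Mandarin, Spanish;"
    "42 MB transcripts, 401 MB media;CHAT and CA/CHAT;open access;"
    "First paragraph.#SEPSecond paragraph.;"
    "Browse;https://sla.talkbank.org/TBB/asd;;;CLARIN\n"
)

# Assumption: treat the paragraph marker case-insensitively and swallow
# whitespace around it, so stray variants such as "#sep" or " #SEP" still split.
SEP = re.compile(r"\s*#SEP\s*", re.IGNORECASE)


def load_rows(text):
    """Parse semicolon-delimited corpus rows into dictionaries (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    header = next(reader)
    rows = []
    for lineno, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            # An unescaped ";" inside a field would shift every later column.
            raise ValueError(
                f"line {lineno}: expected {len(header)} fields, got {len(fields)}"
            )
        row = dict(zip(header, (f.strip() for f in fields)))
        # Description becomes a list of paragraphs; Language a list of names.
        row["Description"] = [p for p in SEP.split(row["Description"]) if p]
        row["Language"] = [s.strip() for s in row["Language"].split(",") if s.strip()]
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
```

The field-count check is the main design choice here: because the file does not quote its fields, a stray semicolon in free text silently shifts columns, and failing loudly on a mismatch is one way to catch that early.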