Commit

Update analysis metadata
MartinHammarstedt committed Jan 20, 2025
1 parent 078e326 commit ad1ad4f
Showing 14 changed files with 79 additions and 67 deletions.
2 changes: 1 addition & 1 deletion sparv/modules/conll_export/metadata.yaml
@@ -1,4 +1,4 @@
-id: export-conllu
+id: sbx-swe-export-conllu
name:
swe: CoNLL-U-export
eng: CoNLL-U export
5 changes: 2 additions & 3 deletions sparv/modules/geo/metadata.yaml
@@ -6,15 +6,14 @@ language_codes:
keywords: []
standard_reference: ''
other_references: []
-tool: ''
model: "[GeoNames](https://www.geonames.org/)"
trained_on: ''
tagset: ''
evaluation_results: ''
created: 2018-05-28
updated: 2022-05-18
---
-id: swe-geotagcontext-sparv
+id: sbx-swe-geotagcontext-sparv
parent: geo-parent
name:
swe: Geotaggning av platsnamn från kontext
@@ -67,7 +66,7 @@ description:
the [GeoNames database](https://www.geonames.org/). This annotation can be applied to any text chunk, e.g. texts,
paragraphs, sentences or tokens.
---
-id: swe-geotagmetadata-sparv
+id: sbx-swe-geotagmetadata-sparv
parent: geo-parent
name:
swe: Geotagging av platsnamn från metadata
14 changes: 9 additions & 5 deletions sparv/modules/hunpos/metadata.yaml
@@ -5,12 +5,16 @@ language_codes:
standard_reference: ''
other_references:
- "Hunpos: https://code.google.com/archive/p/hunpos/"
-tool: "Hunpos"
+tool:
+  name: Hunpos
+  url: "https://code.google.com/archive/p/hunpos/"
+licences:
+  tool: BSD-3-Clause
trained_on: "[SUC3](https://spraakbanken.gu.se/resurser/suc3)"
tagset: "[SUC3](https://spraakbanken.gu.se/korp/markup/msdtags.html)"
evaluation_results: ''
---
-id: swe-pos-hunpos-suc3
+id: sbx-swe-pos-hunpos-suc3
parent: hunpos-parent
name:
swe: SUC-ordklasstaggning med Hunpos
@@ -42,7 +46,7 @@ description:
created: 2010-12-15
updated: 2018-05-28
---
-id: swe-msd-hunpos-suc3
+id: sbx-swe-msd-hunpos-suc3
parent: hunpos-parent
name:
swe: Morfosyntaktisk SUC-taggning med Hunpos
@@ -74,7 +78,7 @@ description:
created: 2010-12-15
updated: 2018-05-28
---
-id: swe-pos-hunpos-suc3-1800
+id: sbx-swe-pos-hunpos-suc3-1800
parent: hunpos-parent
name:
swe: SUC-ordklasstaggning med Hunpos för 1800-talssvenska
@@ -124,7 +128,7 @@ description:
created: 2012-10-23
updated: 2015-09-11
---
-id: swe-msd-hunpos-suc3-1800
+id: sbx-swe-msd-hunpos-suc3-1800
parent: hunpos-parent
name:
swe: Morfosyntaktisk SUC-taggning med Hunpos för 1800-talssvenska
9 changes: 4 additions & 5 deletions sparv/modules/lexical_classes/metadata.yaml
@@ -4,7 +4,6 @@ language_codes:
- swe
keywords: []
abstract: true
-tool: ''
trained_on: |-
Reference corpora for relative frequencies: [Göteborgsposten 2008](https://spraakbanken.gu.se/resurser/gp2008), [SUC
3.0](https://spraakbanken.gu.se/resurser/suc3), [Bonniersromaner I
@@ -21,7 +20,7 @@ other_references:
tagset: "[Blingbring](https://spraakbanken.gu.se/resurser/blingbring)"
model: "[Blingbring frequency model](https://github.com/spraakbanken/sparv-models/blob/master/lexical_classes/blingbring.freq.gp2008%2Bsuc3%2Bromi.pickle)"
---
-id: swe-lexical_classes_token-sparv-blingbring
+id: sbx-swe-lexical_classes_token-sparv-blingbring
parent: blingbring-parent
name:
swe: Lexikala klasser från Blingbring, tokennivå
@@ -65,7 +64,7 @@ description:
versions of Blingbring.
created: 2017-09-05
---
-id: swe-lexical_classes_text-sparv-blingbring
+id: sbx-swe-lexical_classes_text-sparv-blingbring
parent: blingbring-parent
name:
swe: Lexikala klasser från Blingbring, textnivå
@@ -148,7 +147,7 @@ other_references:
tagset: "[Swedish FrameNet (SweFN)](https://spraakbanken.gu.se/resurser/swefn)"
model: "[Frequency model](https://github.com/spraakbanken/sparv-models/blob/master/lexical_classes/swefn.freq.gp2008%2Bsuc3%2Bromi.pickle)"
---
-id: swe-lexical_classes_token-sparv-swefn
+id: sbx-swe-lexical_classes_token-sparv-swefn
parent: swefn-parent
name:
swe: Lexikala klasser från SweFN, tokennivå
@@ -186,7 +185,7 @@ description:
classes.
created: 2017-09-21
---
-id: swe-lexical_classes_text-sparv-swefn
+id: sbx-swe-lexical_classes_text-sparv-swefn
parent: swefn-parent
name:
swe: Lexikala klasser från SweFN, textnivå
6 changes: 4 additions & 2 deletions sparv/modules/malt/metadata.yaml
@@ -1,4 +1,4 @@
-id: swe-dependency-malt-treebank
+id: sbx-swe-dependency-malt-treebank
name:
swe: Dependensparsning med MaltParser
eng: Dependency parsing with MaltParser
@@ -35,7 +35,9 @@ standard_reference: |-
other_references:
- "Maltparser: https://www.maltparser.org/download.html"
- 'https://aclanthology.org/2021.nodalida-main.20/'
-tool: "Maltparser"
+tool:
+  name: Maltparser
+  url: "https://www.maltparser.org/"
model: "[Swemalt](https://www.maltparser.org/mco/swedish_parser/swemalt.html)"
trained_on: "[Svensk trädbank (the TalbankenSTB part)](https://spraakbanken.gu.se/resurser/sv-treebank)"
tagset: "[MambaDep](https://svn.spraakdata.gu.se/sb-arkiv/pub/mamba.html)"
3 changes: 1 addition & 2 deletions sparv/modules/phrase_structure/metadata.yaml
@@ -1,4 +1,4 @@
-id: swe-phrasestructure-sparv
+id: sbx-swe-phrasestructure-sparv
name:
swe: Svensk frasstrukturparsning
eng: Swedish phrase structure parsing
@@ -45,7 +45,6 @@ example_output: |-
```
standard_reference: ''
other_references: []
-tool: ''
model: "Method has no model"
trained_on: "[TalbankenSBX](https://spraakbanken.gu.se/resurser/talbanken)"
tagset: "See description below"
7 changes: 3 additions & 4 deletions sparv/modules/readability/metadata.yaml
@@ -5,15 +5,14 @@ language_codes:
- swe
keywords: []
other_references: []
-tool: ''
model: ''
trained_on: ''
tagset: ''
evaluation_results: ''
created: 2018-03-28
updated: 2018-03-28
---
-id: swe-readability-sparv-lix
+id: sbx-swe-readability-sparv-lix
parent: readability-parent
name:
swe: Annotering av läsbarhetsindex (LIX) för texter
@@ -76,7 +75,7 @@ description:
average word count per sentence and ratio of long words (exceeding six letters). The value is calculated as O / M +
L x 100 / O, where O = word count, M = sentence count and L = long word count.
---
-id: swe-readability-sparv-nk
+id: sbx-swe-readability-sparv-nk
parent: readability-parent
name:
swe: Annotering av Nominalkvot (NK) för texter
@@ -137,7 +136,7 @@ description:
dividing this by the number of verbs, adverbs and pronouns. A high nominal ratio suggests a high density of
information, which can also mean that the text is difficult to read.
---
-id: swe-readability-sparv-ovix
+id: sbx-swe-readability-sparv-ovix
parent: readability-parent
name:
swe: Annotering av Ordvariationsindex (OVIX) för texter
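
Note on the LIX description in `sparv/modules/readability/metadata.yaml`: the formula quoted there, O / M + L x 100 / O, can be sketched in a few lines of Python. This is only an illustration, not Sparv's implementation. The function names and the pre-tokenized input format are assumptions, and a "long" word is taken to be one with more than six characters, following the description.

```python
def lix(word_count: int, sentence_count: int, long_word_count: int) -> float:
    """Compute LIX as O / M + L * 100 / O.

    O = word count, M = sentence count, L = long word count.
    """
    if word_count <= 0 or sentence_count <= 0:
        raise ValueError("word_count and sentence_count must be positive")
    return word_count / sentence_count + long_word_count * 100 / word_count


def lix_from_sentences(sentences: list[list[str]]) -> float:
    """Compute LIX from sentences given as lists of word tokens.

    Assumes tokenization and sentence splitting have already been done,
    and that punctuation tokens have been filtered out by the caller.
    """
    words = [word for sentence in sentences for word in sentence]
    # Long words are those exceeding six letters.
    long_words = [word for word in words if len(word) > 6]
    return lix(len(words), len(sentences), len(long_words))


if __name__ == "__main__":
    # 100 words, 10 sentences, 20 long words: 100/10 + 20*100/100
    print(lix(100, 10, 20))  # 30.0
```
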
13 changes: 6 additions & 7 deletions sparv/modules/saldo/metadata.yaml
@@ -4,13 +4,12 @@ language_codes:
- swe
standard_reference: "[Borin/Forsberg/Lönngren 2013: SALDO: a touch of yin to WordNet's yang](http://dx.doi.org/10.1007/s10579-013-9233-4)"
other_references: []
-tool: "Sparv"
model: "[SALDO's morphology](https://spraakbanken.gu.se/resurser/saldom)"
trained_on: ''
tagset: ''
evaluation_results: ''
---
-id: swe-lemmatization-sparv-saldo
+id: sbx-swe-lemmatization-sparv-saldo
parent: saldo-parent
name:
swe: Annotering av SALDO-grundformer
@@ -44,7 +43,7 @@ description:
created: 2010-12-15
updated: 2018-03-28
---
-id: swe-lemgram-sparv-saldo
+id: sbx-swe-lemgram-sparv-saldo
parent: saldo-parent
name:
swe: Annotering av SALDO-lemgram
@@ -77,7 +76,7 @@ description:
created: 2010-12-15
updated: 2018-03-28
---
-id: swe-sense-sparv-saldo
+id: sbx-swe-sense-sparv-saldo
parent: saldo-parent
name:
swe: Annotering av SALDO-identifierare
@@ -107,7 +106,7 @@ description:
created: 2010-12-15
updated: 2018-03-28
---
-id: swe-compound-sparv-saldolemgram
+id: sbx-swe-compound-sparv-saldolemgram
parent: saldo-parent
name:
swe: Sammansättningsanalys med hjälp av SALDO-lemgram
@@ -148,7 +147,7 @@ description:
created: 2018-03-28
updated: 2020-07-09
---
-id: swe-compound-sparv-saldowords
+id: sbx-swe-compound-sparv-saldowords
parent: saldo-parent
name:
swe: Sammansättningsanalys med hjälp av SALDO-ordformer
@@ -189,7 +188,7 @@ description:
created: 2018-03-28
updated: 2020-07-09
---
-id: swe-lemmatization-sparv-saldo2
+id: sbx-swe-lemmatization-sparv-saldo2
parent: saldo-parent
name:
swe: Annotering av SALDO-grundformer (utökade)
28 changes: 15 additions & 13 deletions sparv/modules/segment/metadata.yaml
@@ -4,7 +4,9 @@ abstract: true
keywords: []
standard_reference: "Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc."
other_references: []
-tool: "NLTK"
+tool:
+  name: NLTK
+  url: "https://www.nltk.org/"
model: ''
trained_on: ''
tagset: ''
@@ -17,7 +19,7 @@ description:
created: 2010-12-15
updated: 2021-05-07
---
-id: tokenization-sparv-linebreaks
+id: sbx-mul-tokenization-sparv-linebreaks
parent: segment-nltk-parent
name:
swe: Radbrytningstokenisering
@@ -44,7 +46,7 @@ example_extra: |-
token_segmenter: linebreaks
```
---
-id: sentence-sparv-linebreaks
+id: sbx-mul-sentence-sparv-linebreaks
parent: segment-nltk-parent
name:
swe: Radbrytningssegmentering, meningar
@@ -80,7 +82,7 @@ example_extra: |-
sentence_segmenter: linebreaks
```
---
-id: paragraph-sparv-linebreaks
+id: sbx-mul-paragraph-sparv-linebreaks
parent: segment-nltk-parent
name:
swe: Radbrytningssegmentering, stycken
@@ -120,7 +122,7 @@ example_extra: |-
paragraph_segmenter: linebreaks
```
---
-id: tokenization-sparv-blanklines
+id: sbx-mul-tokenization-sparv-blanklines
parent: segment-nltk-parent
name:
swe: Tomradstokenisering
@@ -147,7 +149,7 @@ example_extra: |-
token_segmenter: blanklines
```
---
-id: sentence-sparv-blanklines
+id: sbx-mul-sentence-sparv-blanklines
parent: segment-nltk-parent
name:
swe: Tomradssegmentering, meningar
@@ -183,7 +185,7 @@ example_extra: |-
sentence_segmenter: blanklines
```
---
-id: paragraph-sparv-blanklines
+id: sbx-mul-paragraph-sparv-blanklines
parent: segment-nltk-parent
name:
swe: Tomradssegmentering, stycken
@@ -223,7 +225,7 @@ example_extra: |-
paragraph_segmenter: blanklines
```
---
-id: tokenization-sparv-whitespace
+id: sbx-mul-tokenization-sparv-whitespace
parent: segment-nltk-parent
name:
swe: Blankteckentokenisering
@@ -250,7 +252,7 @@ example_extra: |-
token_segmenter: whitespace
```
---
-id: sentence-sparv-whitespace
+id: sbx-mul-sentence-sparv-whitespace
parent: segment-nltk-parent
name:
swe: Blankteckensegmentering, meningar
@@ -286,7 +288,7 @@ example_extra: |-
sentence_segmenter: whitespace
```
---
-id: paragraph-sparv-whitespace
+id: sbx-mul-paragraph-sparv-whitespace
parent: segment-nltk-parent
name:
swe: Blankteckensegmentering, stycken
@@ -326,7 +328,7 @@ example_extra: |-
paragraph_segmenter: whitespace
```
---
-id: swe-tokenization-sparv-betterword
+id: sbx-swe-tokenization-sparv-betterword
parent: segment-nltk-parent
name:
swe: Svensk tokenisering
@@ -366,7 +368,7 @@ description:
tokenizer for other languages.
updated: 2021-05-07
---
-id: swe-sentence-punkt-storsuc
+id: sbx-swe-sentence-punkt-storsuc
parent: segment-nltk-parent
name:
swe: Svensk meningssegmentering
@@ -413,7 +415,7 @@ description:
is, however, possible to configure the sentence segmenter for other languages.
updated: 2021-09-02
---
-id: sentence-punkt
+id: sbx-mul-sentence-punctuation
parent: segment-nltk-parent
name:
swe: Meningssegmentering utifrån skiljetecken
3 changes: 1 addition & 2 deletions sparv/modules/sensaldo/metadata.yaml
@@ -1,4 +1,4 @@
-id: swe-sentiment-sparv-sensaldo
+id: sbx-swe-sentiment-sparv-sensaldo
name:
swe: Sentimentanalys per token med SenSALDO
eng: Sentiment analysis per token using SenSALDO
@@ -49,7 +49,6 @@ standard_reference: 'http://www.lrec-conf.org/proceedings/lrec2018/summaries/857
other_references:
- http://www.lrec-conf.org/proceedings/lrec2018/summaries/846.html
- https://gup.ub.gu.se/publication/264721?lang=sv
-tool: ''
model: "[Sensaldo](https://spraakbanken.gu.se/resurser/sensaldo)"
trained_on: ''
tagset: ''
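
Note on the identifier renames in this commit: every new `id` shown here has the shape `sbx-<code>-<rest>`, where the code is `swe` or `mul`. A minimal sketch of a check for that shape follows. The regular expression, the assumption that the code is always three lowercase letters, and the function name are illustrative guesses, not part of Sparv or its metadata schema.

```python
import re

# Hypothetical pattern inferred from the ids in this commit:
# "sbx-", a three-letter code (e.g. "swe", "mul"), then one or more
# hyphen-separated parts made of lowercase letters, digits or underscores.
ID_PATTERN = re.compile(r"sbx-[a-z]{3}-[a-z0-9_]+(?:-[a-z0-9_]+)*")


def has_new_id_shape(analysis_id: str) -> bool:
    """Return True if the id matches the assumed sbx-<code>-<rest> shape."""
    return ID_PATTERN.fullmatch(analysis_id) is not None


if __name__ == "__main__":
    for candidate in ("sbx-swe-pos-hunpos-suc3", "sentence-punkt"):
        print(candidate, has_new_id_shape(candidate))
```

Run against the old and new ids in the hunks above, such a check would accept the `sbx-` forms and reject the pre-rename ones.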