Update analysis metadata
MartinHammarstedt committed Feb 11, 2025
1 parent 41ce4e1 commit ec20beb
Showing 13 changed files with 111 additions and 80 deletions.
4 changes: 3 additions & 1 deletion sparv/modules/geo/metadata.yaml
@@ -6,7 +6,9 @@ language_codes:
keywords: []
standard_reference: ''
other_references: []
model: "[GeoNames](https://www.geonames.org/)"
models:
- name: GeoNames
url: "https://www.geonames.org/"
trained_on: ''
tagset: ''
evaluation_results: ''
47 changes: 28 additions & 19 deletions sparv/modules/hunpos/metadata.yaml
@@ -5,11 +5,10 @@ language_codes:
standard_reference: ''
other_references:
- "Hunpos: https://code.google.com/archive/p/hunpos/"
tool:
name: Hunpos
url: "https://code.google.com/archive/p/hunpos/"
licences:
tool: BSD-3-Clause
tools:
- name: Hunpos
url: "https://code.google.com/archive/p/hunpos/"
license: BSD-3-Clause
trained_on: "[SUC3](https://spraakbanken.gu.se/resurser/suc3)"
tagset: "[SUC3](https://spraakbanken.gu.se/korp/markup/msdtags.html)"
evaluation_results: ''
@@ -35,7 +34,9 @@ example_output: |-
<token pos="NN">korpus</token>
<token pos="MAD">.</token>
```
model: "[suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)"
models:
- name: suc3_suc-tags_default-setting_utf8.model
url: "https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true"
description:
swe: |-
Meningssegment analyseras och annoteras med ordklasstaggar. Ingår inte längre i
@@ -67,7 +68,9 @@ example_output: |-
<token msd="NN.UTR.SIN.IND.NOM">korpus</token>
<token msd="MAD">.</token>
```
model: "[suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)"
models:
- name: suc3_suc-tags_default-setting_utf8.model
url: "https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true"
description:
swe: |-
Meningssegment analyseras och annoteras med ordklasstaggar och morfosyntaktisk information. Ingår inte längre i
@@ -113,18 +116,21 @@ example_extra: |-
language: swe
variety: "1800"
```
model: |-
- [suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)
- a word list along with the words' morphosyntactic information generated from the [Dalin
morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg
morphology](https://spraakbanken.gu.se/resurser/swedbergm)
models:
- name: suc3_suc-tags_default-setting_utf8.model
url: "https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true"
- name: dalinm-swedberg_saldo_suc-tags.morphtable
description: |-
A word list along with the words' morphosyntactic information generated from the [Dalin
morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg
morphology](https://spraakbanken.gu.se/resurser/swedbergm)
description:
swe: |-
Meningssegment analyseras och annoteras med ordklasstaggar. Utöver ordklasstaggningsmodellen använder Hunpos listor
med böjningsformer för att kunna generera bättre ordklasstaggar för 1800-talssvenska.
eng: |-
Sentence segments are analysed to enrich tokens with part-of-speech tags. In addition to the pos model inflection
lists are provided to Hunpos to make more accuare part-of-speech predictions for Swedish from the 1800's.
lists are provided to Hunpos to make more accurate part-of-speech predictions for Swedish from the 1800's.
created: 2012-10-23
updated: 2015-09-11
---
@@ -163,19 +169,22 @@ example_extra: |-
language: swe
variety: "1800"
```
model: |-
- [suc3_suc-tags_default-setting_utf8.model](https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true)
- a word list along with the words' morphosyntactic information generated from the [Dalin
morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg
morphology](https://spraakbanken.gu.se/resurser/swedbergm)
models:
- name: suc3_suc-tags_default-setting_utf8.model
url: "https://github.com/spraakbanken/sparv-models/blob/master/hunpos/suc3_suc-tags_default-setting_utf8.model?raw=true"
- name: dalinm-swedberg_saldo_suc-tags.morphtable
description: |-
A word list along with the words' morphosyntactic information generated from the [Dalin
morphology](https://spraakbanken.gu.se/resurser/dalinm) and the [Swedberg
morphology](https://spraakbanken.gu.se/resurser/swedbergm)
description:
swe: |-
Meningssegment analyseras och annoteras med ordklasstaggar och morfosyntaktisk information. Utöver
ordklasstaggningsmodellen använder Hunpos listor med böjningsformer för att kunna generera bättre ordklasstaggar för
1800-talssvenska.
eng: |-
Sentence segments are analysed to enrich tokens with part-of-speech tags and morphosyntactic information. In
addition to the pos model inflection lists are provided to Hunpos to make more accuare part-of-speech predictions
addition to the pos model inflection lists are provided to Hunpos to make more accurate part-of-speech predictions
for Swedish from the 1800's.
created: 2012-10-23
updated: 2015-09-11
8 changes: 6 additions & 2 deletions sparv/modules/lexical_classes/metadata.yaml
@@ -18,7 +18,9 @@ standard_reference: "[Lars Borin, Luis Nieto Piña, Richard Johansson (2015): He
other_references:
- "[Lars Borin, Jens Allwood, Gerard de Melo (2014): Bring vs. MTRoget: Evaluating automatic thesaurus translation, in Proceedings of LREC 2014, May 26-31, 2014 Reykjavik, Iceland](https://gup.ub.gu.se/publication/198549)"
tagset: "[Blingbring](https://spraakbanken.gu.se/resurser/blingbring)"
model: "[Blingbring frequency model](https://github.com/spraakbanken/sparv-models/blob/master/lexical_classes/blingbring.freq.gp2008%2Bsuc3%2Bromi.pickle)"
models:
- name: Blingbring frequency model
url: "https://github.com/spraakbanken/sparv-models/blob/master/lexical_classes/blingbring.freq.gp2008%2Bsuc3%2Bromi.pickle"
---
id: sbx-swe-lexical_classes_token-sparv-blingbring
parent: blingbring-parent
@@ -145,7 +147,9 @@ standard_reference: "[Dana Dannélls, Lars Borin, Karin Friberg Heppin (2021): T
other_references:
- "Dana Dannélls, Lars Borin, Karin Friberg Heppin (2021): The Swedish FrameNet++ Harmonization, integration, method development and practical language technology applications. John Benjamins: Amsterdam, Philadelphia. ISBN 978 90 272 5848 9."
tagset: "[Swedish FrameNet (SweFN)](https://spraakbanken.gu.se/resurser/swefn)"
model: "[Frequency model](https://github.com/spraakbanken/sparv-models/blob/master/lexical_classes/swefn.freq.gp2008%2Bsuc3%2Bromi.pickle)"
models:
- name: Frequency model
url: "https://github.com/spraakbanken/sparv-models/blob/master/lexical_classes/swefn.freq.gp2008%2Bsuc3%2Bromi.pickle"
---
id: sbx-swe-lexical_classes_token-sparv-swefn
parent: swefn-parent
10 changes: 6 additions & 4 deletions sparv/modules/malt/metadata.yaml
@@ -35,10 +35,12 @@ standard_reference: |-
other_references:
- "Maltparser: https://www.maltparser.org/download.html"
- 'https://aclanthology.org/2021.nodalida-main.20/'
tool:
name: Maltparser
url: "https://www.maltparser.org/"
model: "[Swemalt](https://www.maltparser.org/mco/swedish_parser/swemalt.html)"
tools:
- name: Maltparser
url: "https://www.maltparser.org/"
models:
- name: Swemalt
url: "https://www.maltparser.org/mco/swedish_parser/swemalt.html"
trained_on: "[Svensk trädbank (the TalbankenSTB part)](https://spraakbanken.gu.se/resurser/sv-treebank)"
tagset: "[MambaDep](https://svn.spraakdata.gu.se/sb-arkiv/pub/mamba.html)"
evaluation_results: Labelled Attachment Score 0.78 (using the TalbankenSBX train-dev-test split)
1 change: 0 additions & 1 deletion sparv/modules/phrase_structure/metadata.yaml
@@ -45,7 +45,6 @@ example_output: |-
```
standard_reference: ''
other_references: []
model: "Method has no model"
trained_on: "[TalbankenSBX](https://spraakbanken.gu.se/resurser/talbanken)"
tagset: "See description below"
evaluation_results: ''
1 change: 0 additions & 1 deletion sparv/modules/readability/metadata.yaml
@@ -5,7 +5,6 @@ language_codes:
- swe
keywords: []
other_references: []
model: ''
trained_on: ''
tagset: ''
evaluation_results: ''
4 changes: 3 additions & 1 deletion sparv/modules/saldo/metadata.yaml
@@ -4,7 +4,9 @@ language_codes:
- swe
standard_reference: "[Borin/Forsberg/Lönngren 2013: SALDO: a touch of yin to WordNet's yang](http://dx.doi.org/10.1007/s10579-013-9233-4)"
other_references: []
model: "[SALDO's morphology](https://spraakbanken.gu.se/resurser/saldom)"
models:
- name: SALDO's morphology
url: "https://spraakbanken.gu.se/resurser/saldom"
trained_on: ''
tagset: ''
evaluation_results: ''
19 changes: 10 additions & 9 deletions sparv/modules/segment/metadata.yaml
@@ -1,13 +1,11 @@

id: segment-nltk-parent
abstract: true
keywords: []
standard_reference: "Bird, Steven, Edward Loper and Ewan Klein (2009), Natural Language Processing with Python. O’Reilly Media Inc."
other_references: []
tool:
name: NLTK
url: "https://www.nltk.org/"
model: ''
tools:
- name: NLTK
url: "https://www.nltk.org/"
trained_on: ''
tagset: ''
evaluation_results: ''
@@ -351,9 +349,10 @@ example_output: |-
<token>.</token>
```
standard_reference: ''
model: |-
- [bettertokenizer.sv](https://raw.githubusercontent.com/spraakbanken/sparv-models/master/segment/bettertokenizer.sv)
- bettertokenizer.sv.saldo-tokens
models:
- name: bettertokenizer.sv
url: "https://raw.githubusercontent.com/spraakbanken/sparv-models/master/segment/bettertokenizer.sv"
- name: bettertokenizer.sv.saldo-tokens
trained_on: "[SALDOs morphology](https://spraakbanken.gu.se/resurser/saldom)"
description:
swe: |-
@@ -399,7 +398,9 @@ example_output: |-
<token>.</token>
</sentence>
```
model: "[punkt-nltk-svenska.pickle](https://github.com/spraakbanken/sparv-models/blob/master/segment/punkt-nltk-svenska.pickle?raw=true)"
models:
- name: punkt-nltk-svenska.pickle
url: "https://github.com/spraakbanken/sparv-models/blob/master/segment/punkt-nltk-svenska.pickle?raw=true"
trained_on: "[StorSUC](https://spraakbanken.gu.se/resurser/storsuc)"
description:
swe: |-
4 changes: 3 additions & 1 deletion sparv/modules/sensaldo/metadata.yaml
@@ -49,7 +49,9 @@ standard_reference: 'http://www.lrec-conf.org/proceedings/lrec2018/summaries/857
other_references:
- http://www.lrec-conf.org/proceedings/lrec2018/summaries/846.html
- https://gup.ub.gu.se/publication/264721?lang=sv
model: "[Sensaldo](https://spraakbanken.gu.se/resurser/sensaldo)"
models:
- name: Sensaldo
url: "https://spraakbanken.gu.se/resurser/sensaldo"
trained_on: ''
tagset: ''
evaluation_results: ''
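
Note: the recurring change in these hunks replaces a single `model` string (usually a markdown link) with a structured `models` list of `name`/`url` entries. A minimal sketch of that mapping for the single-value case is given below. The function name `model_to_models` and the regular expression are illustrative assumptions, not part of the Sparv codebase, and the multi-line `model: |-` variants seen in the hunpos and segment files are not handled.

```python
import re

# Hypothetical helper, not taken from the Sparv repository.
# Matches a string that consists of exactly one markdown link,
# e.g. "[GeoNames](https://www.geonames.org/)".
_MD_LINK = re.compile(r"^\[(?P<name>[^\]]+)\]\((?P<url>[^)\s]+)\)$")


def model_to_models(value: str) -> list[dict[str, str]]:
    """Convert an old-style `model` string into a new-style `models` list."""
    value = value.strip()
    if not value:
        # An empty `model: ''` has no counterpart in the new format.
        return []
    match = _MD_LINK.match(value)
    if match:
        return [{"name": match["name"], "url": match["url"]}]
    # No link found: keep the text as a bare name without a URL.
    return [{"name": value}]
```

Under these assumptions, an empty `model` value yields an empty list, which mirrors how `model: ''` is dropped rather than converted in the readability and segment hunks.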