AUTOTYP v1.0.0

autotyp · Feb 10, 2022 · 88133f0 · 88133f0
1 parent 74aaf40
commit 88133f0
Show file tree

Hide file tree

Showing 329 changed files with 3,352,908 additions and 354,175 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1 @@
+.DS_Store
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/CHANGES-1.0.0.md b/CHANGES-1.0.0.md
@@ -0,0 +1,114 @@
+# Overview of changes in version 1.0.0
+
+AUTOTYP version 1.0.0 is a completely new release that focuses on usability,
+documentation and completeness. It has been radically overhauled compared to
+the earlier 0.1.x version. The sheer number of differences makes it
+impossible to provide a comprehensive list of changes. What follows is a
+quick summary of the most important of the new release as well as notes on
+migrating from the old database releases. 
+
+## Major new features in version 1.0.0:
+
+- New naming conventions for datasets and variables, focusing on usability 
+  and clarity. All names now consistently follow the CamelCase convention and 
+  are based on verbose descriptions that provide more context about the variable 
+  (e.g. `Position` -> `VerbInflectionMarkerPosition`). Hundreds of variables have 
+  been renamed to fit these criteria.
+
+- The datasets are now organized into thematic modules, rather than each dataset
+  constituting a module on its own. 
+
+- Published data now includes the raw exported database data, in addition to the 
+  previously published derived aggregated tables. All aggregation scripts used to
+  compute derived data are published as well (see 
+  [`aggregation-scripts`](aggregation-scripts)). Please feel free to inspect the 
+  scripts and modify them to suit your own needs.   
+
+- Many improvements to variable descriptions and metadata. The metadata YAML files 
+  are now simpler and more compact, which should make the documentation more 
+  accessible.
+
+- Overhauled the data architecture to allow nested and repeated table fields (see 
+  [Data Architecture](readme.md#data-architecture)). This allows many datasets to be
+  expressed in a more natural, conceptually simpler fashion.  
+
+- New R and JSON exports for users who want quick access to the data using their 
+  preferred data wrangling environment. 
+
+- Language name and glottocode is exported for every dataset in addition to the 
+  internal language ID
+
+## Major changes to individual datasets/modules:
+
+- `GrammaticalRelations` module now encompasses all data on grammatical relations 
+  and alignments. We now fully provide the underlying raw database data in addition 
+  to the aggregated alignment data and the scripts used to produce these aggregations.
+
+- `VerbSynthesis` has been overhauled to include detailed list of inflectional 
+  categories expressed on verbs
+
+- `LocusOfMarking` module now contains the raw database data in addition to the 
+  previously published aggregations. 
+
+- `GrammaticalMarkers` dataset has been overhauled to include a detailed list
+  of marker hosts and marked categories 
+
+- `MorphemeClasses` replaces the previous aggregated `Morpheme_types` dataset
+  and exposes the information about individual language-specific morpheme classes. 
+  The information previously available in `Morpheme_types` is now integrated into
+  the improved `MorphologyPerLanguage` aggregated dataset. 
+
+- New module `Categories` groups together datasets that provide information about 
+  selected grammatical categories
+
+- New module `Definitions` provides access to underlying definitions of categorical
+  variables used across AUTOTYP
+
+- New module `PerLanguageSummaries` groups together various per-language aggregated
+  summaries (code to generate these summaries is available under 
+  [`aggregation-scripts`](aggregation-scripts))
+
+
+## Notes on migration from older AUTOTYP release
+
+If you have been using the AUTOTYP version 0.1.x you will notice that many datasets
+have been moved or renamed. The following list should help you to find the new 
+location of the data:
+
+- **`Agreement`** is now exported as `Categories/Agreement`
+- **`Alienability`** is now exported as `Categories/Alienability`
+- **`Alignment`** is now exported as `GrammaticalRelations/Alignment`
+- **`Alignment_per_language`** is now `PerLanguageSummaries/AlignmentForDefaultPredicatesPerLanguage` 
+- **`Clause_linkage`** is now `Sentence/ClauseLinkage` 
+- **`Clause_word_order`** is now `Sentence/ClauseWordOrder` 
+- **`Clusivity`** is now exported as `Categories/Clusivity`
+- **`Gender`** is now exported as `Categories/Gender`
+- **`Grammatical_markers`** is now exported as `Morphology/GrammaticalMarkers`
+- **`GR_per_language`** has been superseded by `GrammaticalRelations/GrammaticalRelationCoverage`
+- **`Locus_per_language`** is now `PerLanguageSummaries/LocusOfMarkingPerLanguage` 
+- **`Locus_per_macrorelation`** has been superseded by `Morphology/DefaultLocusOfMarkingPerMacrorelation`
+- **`Locus_per_microrelation`** has been superseded by `Morphology/LocusOfMarkingPerMicrorelation`
+- **`Markers_per_language`** is now `PerLanguageSummaries/GrammaticalMarkersPerLanguage` 
+- **`Morpheme_types`** has been superseded by `Morphology/MorphemeClasses` and 
+  `PerLanguageSummaries/MorphologyPerLanguage` 
+- **`Morphology_per_language`** is now `PerLanguageSummaries/MorphologyPerLanguage` 
+- **`NP_per_language`** is now `PerLanguageSummaries/NPStructurePerLanguage` 
+- **`NP_structure`** is now `NP/NPStructure`
+- **`NP_structure_presence`** is now `PerLanguageSummaries/NPStructurePresence` 
+- **`Numeral_classifiers`** is now exported as `Categories/NumeralClassifiers`
+- **`Register`** is still `Register`
+- **`Synthesis`** is now `Morphology/VerbSynthesis`
+- **`Valence_classes`** is now `GrammaticalRelations/PredicateClasses`
+- **`Valence_classes_per_language`** is now `PerLanguageSummaries/PredicateClassesSemanticsPerLanguage` 	
+- **`VInfl_counts_per_position`** is now `PerLanguageSummaries/VerbInflectionAndAgreementCountsByPosition` 	
+- **`VInfl_cat_*`** is now `PerLanguageSummaries/VerbInflectionCategoriesAggregatedBy*` 	
+- **`VInfl_macrocat_*`** is now `PerLanguageSummaries/VerbInflectionMacrocategories*` 	
+- **`VAgr_*`** is now `PerLanguageSummaries/VerbAgreementAggregatedBy*` 	
+- **`Word_domains`** is now `Word/WordDomains` 	
+
+
+
+
+
+
+
diff --git a/LICENSE b/LICENSE
diff --git a/R/autotyp.utilities.R b/R/autotyp.utilities.R
diff --git a/VERSION b/VERSION