languages with more than one area #49

Open
HedvigS opened this issue Oct 4, 2022 · 1 comment

HedvigS commented Oct 4, 2022

In the table Register.csv, some languages share a glottocode but are assigned to more than one area.

For example, oira1263 is associated with both Inner Asia and Oceania. This seems to be because one of the entries should have the glottocode kalm1243, not oira1263 (LID = 1343).

There are 12 cases like this. I think each one should be checked, and the glottocode and ISO 639-3 code probably corrected.

 1 oira1263   
 2 toho1245   
 3 tibe1272   
 4 indo1316   
 5 kyer1238   
 6 balk1252   
 7 east2295   
 8 kati1270   
 9 mart1256   
10 noga1249   
11 taha1241   
12 peri1253

Here's a way of finding them in R.

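library(tidyverse)

# Read the register and keep one row per Glottocode/Area combination
AUTOTYP <- read_csv("data/csv/Register.csv", col_types = cols()) %>%
  distinct(Glottocode, Area, .keep_all = TRUE) %>%
  # Flag glottocodes that still occur more than once, i.e. in more than one area
  mutate(dup = duplicated(Glottocode) + duplicated(Glottocode, fromLast = TRUE)) %>%
  filter(dup > 0)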
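
As an alternative view, the same check can be sketched as a grouped summary that lists every area next to each affected glottocode, which may make the cases easier to review one by one. This is only a sketch: the data frame below is an invented stand-in for Register.csv (the glottocodes, areas and LIDs in it are made up), and it assumes the real table has columns named LID, Glottocode and Area. Unlike the snippet above, it also drops rows with a missing glottocode.

```r
library(dplyr)

# Toy stand-in for Register.csv; every row here is invented for illustration.
register <- data.frame(
  LID        = c(1, 2, 3, 4, 5),
  Glottocode = c("aaaa1111", "aaaa1111", "bbbb2222", "bbbb2222", NA),
  Area       = c("Area X", "Area Y", "Area X", "Area X", "Area Y")
)

multi_area <- register %>%
  filter(!is.na(Glottocode)) %>%      # ignore rows without a glottocode
  group_by(Glottocode) %>%
  summarise(
    n_areas = n_distinct(Area),                            # number of distinct areas
    areas   = paste(sort(unique(Area)), collapse = "; "),  # the areas themselves
    LIDs    = paste(LID, collapse = ", "),                 # entries involved
    .groups = "drop"
  ) %>%
  filter(n_areas > 1)                 # keep only glottocodes with more than one area

print(multi_area)
```

With the real file, `register` would instead come from `read_csv("data/csv/Register.csv")`.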

Some of them make sense, like Tuareg (Air) (LID = 1420) and Tuareg (Ghat) (LID = 1421). The longitude and latitude of the varieties probably justify the different areas.


HedvigS commented Oct 4, 2022

(There are also 132 entries with duplicated glottocodes but the same AUTOTYP area. They seem to represent different data-collection events. Is that right? For example, LID 148 & 579.)
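
A rough sketch of how these same-area duplicates could be listed for review. As before, the data frame is an invented stand-in for Register.csv (none of its rows are real), and the column names LID, Glottocode and Area are assumed.

```r
library(dplyr)

# Toy stand-in for Register.csv; every row here is invented for illustration.
register <- data.frame(
  LID        = c(1, 2, 3, 4, 5),
  Glottocode = c("aaaa1111", "aaaa1111", "bbbb2222", "bbbb2222", NA),
  Area       = c("Area X", "Area Y", "Area X", "Area X", "Area Y")
)

same_area_dups <- register %>%
  filter(!is.na(Glottocode)) %>%   # ignore rows without a glottocode
  group_by(Glottocode, Area) %>%
  filter(n() > 1) %>%              # same glottocode AND same area, more than once
  ungroup() %>%
  arrange(Glottocode, LID)

print(same_area_dups)
```

Each group in the result is a set of entries that share both glottocode and area, so they can be compared side by side to see whether they really are separate data-collection events.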
