Virgin islands (USA) country code missing (VIR) #333

Open
nellybiondi opened this issue Jul 3, 2023 · 3 comments

Comments

@nellybiondi

No description provided.

@cjyetman
Collaborator

cjyetman commented Jul 3, 2023

Which version are you using?

library(countrycode)
packageVersion("countrycode")
#> [1] '1.5.0'
countrycode::countrycode("VIR", "iso3c", "country.name")
#> [1] "U.S. Virgin Islands"

@nellybiondi
Author

Same version, 1.5.0:
summarise() has grouped output by 'Country', 'Code', 'Year', 'Sex'. You can override using the .groups argument.
Warning message:
There were 2 warnings in mutate().
The first warning was:
ℹ In argument: Code = countrycode(Country, origin = "country.name", destination = "iso3c").
Caused by warning:
! Some values were not matched unambiguously: Rodrigues, Virgin Islands (USA)
ℹ Run dplyr::last_dplyr_warnings() to see the 1 remaining warning.

@cjyetman
Collaborator

cjyetman commented Jul 3, 2023

"Virgin Islands (USA)" is a particularly hard string to match, because a pattern that catches it tends to interfere with matching plain "USA" to the United States ("USA").
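
To illustrate the kind of conflict described above, here is a minimal base R sketch. The two patterns and the helper `matching_codes()` are hypothetical and heavily simplified; they are not the regular expressions that countrycode actually uses, and only show how one string can satisfy the patterns of two countries at once.

```r
# Hypothetical, heavily simplified patterns for illustration only;
# these are NOT the regular expressions countrycode ships with.
patterns <- c(
  USA = "united.?states|u\\.?s\\.?a",
  VIR = "virgin.?islands"
)

# Return the codes whose pattern matches a given name (case-insensitive).
matching_codes <- function(name) {
  hits <- vapply(patterns, grepl, logical(1), x = name, ignore.case = TRUE)
  names(patterns)[hits]
}

matching_codes("USA")
#> [1] "USA"
matching_codes("Virgin Islands (USA)")
#> [1] "USA" "VIR"
```

In this toy setup, excluding "Virgin Islands (USA)" from the USA pattern would need extra logic that must not break the plain "USA" case, which is the trade-off mentioned above.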

These are the variations of names for Virgin Islands that will work and are tested:

"U.S. Virgin Islands" = c(
"United States Virgin Islands",
"US Virgin Islands",
"U.S. Virgin Islands",
"Virgin Islands, US",
"Virgin Islands, U.S.",
"Virgin Islands, (U.S.)",
"Virgin Islands, (US)",
"Virgin Islands US",
"Virgin Islands U.S.",
"Virgin Islands (U.S.)",
"Virgin Islands (US)"
),

for example...

library(countrycode)
packageVersion("countrycode")
#> [1] '1.5.0'

country_names <- c(
  "U.S. Virgin Islands",
  "United States Virgin Islands",
  "US Virgin Islands",
  "U.S. Virgin Islands",
  "Virgin Islands, US",
  "Virgin Islands, U.S.",
  "Virgin Islands, (U.S.)",
  "Virgin Islands, (US)",
  "Virgin Islands US",
  "Virgin Islands U.S.",
  "Virgin Islands (U.S.)",
  "Virgin Islands (US)"
)

countrycode(country_names, origin = "country.name", destination = "iso3c")
#>  [1] "VIR" "VIR" "VIR" "VIR" "VIR" "VIR" "VIR" "VIR" "VIR" "VIR" "VIR" "VIR"

If you have "Virgin Islands (USA)" in your source data, one way around this is to use the custom_match argument, like so. The call still emits a warning, because "Virgin Islands (USA)" matches two different countries, but the warning is misleading here: the result is "VIR", as requested...

library(countrycode)
countrycode(
  sourcevar = "Virgin Islands (USA)", 
  origin = "country.name", 
  destination = "iso3c",
  custom_match = c("Virgin Islands (USA)" = "VIR")
)
#> Warning: Some strings were matched more than once, and therefore set to <NA> in the result: Virgin Islands (USA),VIR,USA
#> [1] "VIR"

Otherwise, you could modify your source data first, with something like...

library(dplyr)
library(countrycode)

source_data <- data.frame(country = c("Virgin Islands (USA)", "Canada", "United States"))

source_data %>% 
  mutate(country = case_when(
    country == "Virgin Islands (USA)" ~ "U.S. Virgin Islands",
    .default = country
  )) %>% 
  mutate(iso3c = countrycode(country, "country.name", "iso3c"))
#>               country iso3c
#> 1 U.S. Virgin Islands   VIR
#> 2              Canada   CAN
#> 3       United States   USA
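
To find out which names need recoding before modifying the source data, something like the following sketch might help. It assumes that names which cannot be converted unambiguously come back as NA (as the warnings earlier in this thread suggest for version 1.5.0); the variable names are only illustrative.

```r
library(countrycode)

country_names <- c("Virgin Islands (USA)", "Canada", "United States")

# warn = FALSE silences the warning; failed conversions are expected
# to come back as NA (assumption based on the 1.5.0 behaviour above).
codes <- countrycode(
  country_names,
  origin = "country.name",
  destination = "iso3c",
  warn = FALSE
)

# Unique source names that did not convert and may need recoding
# or a custom_match entry.
unmatched <- unique(country_names[is.na(codes)])
unmatched
```

Each name listed in `unmatched` can then be handled either with `custom_match` or by recoding it in the source data, as in the examples above.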
