"Origin code not supported" error #256

geotheory · 2020-12-23T11:24:26Z

Some fields are supported in the default internal dictionary but not others?

require(countrycode)
#> Loading required package: countrycode
x =  c("Spain","Greece","Bulgaria","Romania","Albania","Malta","Italy","France","Netherlands","United Kingdom")

guess_field(x)
#>                                    code percent_of_unique_matched
#> country.name.en         country.name.en                       100
#> cow.name                       cow.name                       100
#> cldr.name.en               cldr.name.en                       100
#> cldr.name.en_001       cldr.name.en_001                       100
#> cldr.name.en_au         cldr.name.en_au                       100
#> cldr.name.fil             cldr.name.fil                       100
#> cldr.name.luo             cldr.name.luo                       100
#> cldr.name.sn               cldr.name.sn                       100
#> cldr.variant.en         cldr.variant.en                       100
#> cldr.variant.en_001 cldr.variant.en_001                       100
#> cldr.variant.en_au   cldr.variant.en_au                       100
#> cldr.variant.fil       cldr.variant.fil                       100
#> cldr.variant.luo       cldr.variant.luo                       100
#> cldr.variant.sn         cldr.variant.sn                       100
#> iso.name.en                 iso.name.en                        90
#> p4.name                         p4.name                        90
#> un.name.en                   un.name.en                        90
#> vdem.name                     vdem.name                        90
#> cldr.short.en             cldr.short.en                        90
#> cldr.short.en_001     cldr.short.en_001                        90
#> cldr.short.en_au       cldr.short.en_au                        90
#> cldr.short.fil           cldr.short.fil                        90
#> cldr.short.luo           cldr.short.luo                        90
#> cldr.short.sn             cldr.short.sn                        90

countrycode(x, 'country.name.en', 'iso2c')
#>  [1] "ES" "GR" "BG" "RO" "AL" "MT" "IT" "FR" "NL" "GB"

countrycode(x, 'cow.name', 'iso2c')
#> Error in countrycode(x, "cow.name", "iso2c"): Origin code not supported by countrycode or present in the user-supplied custom_dict.


cjyetman · 2020-12-23T11:36:54Z

see ?countrycode::codelist

'cow.name' is a "Destination only" code. That being said, that could be made clearer (if not completely removed) in the output of guess_field() possibly.

for some discussion of why it is a destination-only code, see for instance #179

cjyetman · 2020-12-23T11:40:49Z

btw... nice to see guess_field() being used "in the wild" 😄

vincentarelbundock · 2020-12-23T14:26:19Z

just to clarify, you should just use "country.name" as origin. It should work.

I'm now thinking that this confusion might be encouraged by guess_field. Users don't actually need to know what name format is used. We do all the guessing work for them using regular expressions.

vincentarelbundock · 2020-12-23T14:27:27Z

maybe we should issue a warning when names are near the top of candidates.

cjyetman · 2020-12-23T14:59:43Z

true... as I understood it, the original intention was to give likely candidates for an origin code that would cover most/all of a given input vector... that leads me to...

it shouldn't ever suggest a destination only code
"codes" that can use regex are largely irrelevant as suggestions here
country name "codes" seem pretty weird here (though we do have an example of using it that way in the help file?!?)

vincentarelbundock · 2020-12-23T15:15:31Z

Yeah, removing them from the results would probably make sense. Would be nice if we could return "country.name" (and "country.name.de") instead.

But I'm off to cook the Joulukinkku now!

geotheory · 2020-12-23T16:52:21Z

How does providing "country.name" work then? It's not a field in countrycode::codelist.

I thought the function was a sort of universal translator from any name/code into any other. I'm unclear what is gained by adding restrictions to that - but then nor have I been through the full thought process that you guys have..

vincentarelbundock · 2020-12-23T17:00:13Z

The problem with using a specific long form of country name is that these names are not "standardized". The organizations that publish them do not usually "stand by" them in the same way they do for numeric or alpha codes (the unicode org is an exception). So even the Correlates of War organization might spell the DRC differently in different publications. Even if a given vector of names really looks like cow.name, there is no guarantee that there won't be small variations in future (or longer) vectors. Moreover, slight differences in encoding can really mess things up, and so can a comma or an apostrophe.

I've been dealing with these codes for a while now, and IMO the better solution is to always merge based on shorter/standardized numeric or alpha codes.

When you call countrycode with country.name as "origin", the function will use regular expressions to detect the country name and assign it a unique identifier. Those regular expressions have been developed over the last 10 years, and they are thoroughly tested. They may not be perfect, but they are quite good and general, so they can detect variations of "North Korea", "People's republic of NK", etc.

So yeah, this is certainly an opinionated design choice, but I'm quite convinced that it is the right choice.

Of course, if you really want to use country.names as an "origin" code, it would be easy to select the two columns you want in the countrycode::codelist data.frame, and then use the merge command to insert the new column in your dataset.
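A minimal sketch of that merge approach, for illustration only. It assumes the input names match codelist$cow.name exactly (spelling, case, punctuation); the my_data data frame and its value column are made up for the example.

```r
library(countrycode)

# Hypothetical dataset; names are assumed to match codelist$cow.name exactly
my_data <- data.frame(
  cow.name = c("Spain", "Greece", "Bulgaria"),
  value = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Keep only the two columns needed for the lookup,
# dropping rows that have no COW name
lookup <- codelist[!is.na(codelist$cow.name), c("cow.name", "iso2c")]

# Exact-match join; all.x = TRUE keeps unmatched rows, with NA in iso2c
merged <- merge(my_data, lookup, by = "cow.name", all.x = TRUE)
merged
```

Unlike the regex-based "country.name" origin, a join like this gives NA for any spelling variation without raising a warning, so it is worth checking for NA in the result.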

vincentarelbundock · 2020-12-23T17:01:20Z

Example:

library(countrycode)
countrycode(
  c("Democratic republic of Congo", "Algeria", "USA"),
  "country.name",
  "iso3c")
#> [1] "COD" "DZA" "USA"

cjyetman · 2020-12-23T18:08:29Z

> How does providing "country.name" work then? It's not a field in countrycode::codelist.

If I understood your question correctly, it works because of a hard-coded switch within the function that changes origin from "country.name" to "country.name.en"

and further, there's a hard-coded switch within the function that sets origin_regex = TRUE if the origin code is "country.name.en" or "country.name.de", and then changes either of those to "country.name.en.regex" or "country.name.de.regex"

countrycode/R/countrycode.R

Lines 82 to 98 in 7454f75

    
           # Regex naming scheme 
        
           if (is.null(custom_dict)) { # only for default dictionary 
        
               # English regex is default 
        
               if (origin == 'country.name') { 
        
                   origin <- 'country.name.en' 
        
               } 
        
               if (destination == 'country.name') { 
        
                   destination <- 'country.name.en' 
        
               } 
        
               # .regex extension in dictionary colnames 
        
               if (origin %in% c('country.name.en', 'country.name.de')) { 
        
                   origin <- paste0(origin, '.regex') 
        
                   origin_regex <- TRUE 
        
               } else { 
        
                   origin_regex <- FALSE 
        
               } 
        
           }

If I remember correctly, the reasoning behind that was...

maintain backward compatibility (where "country.name" was the one and only regex column in codelist or the equivalent)
enable custom dictionaries to also use regex matching, but without disrupting the more standard usage
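As a rough illustration of the second point, here is a sketch of regex matching with a custom dictionary via the origin_regex argument. The dictionary, its column names (name.regex, my.code), and the code values are all invented for the example; they are not part of the package.

```r
library(countrycode)

# Hypothetical custom dictionary: one regex column, one made-up code column
my_dict <- data.frame(
  name.regex = c("^spain$", "nether|holland"),
  my.code = c("SP", "NL"),
  stringsAsFactors = FALSE
)

# origin_regex = TRUE asks countrycode to treat the origin column as
# regular expressions rather than as exact strings
res <- countrycode(
  c("Spain", "Holland", "The Netherlands"),
  origin = "name.regex",
  destination = "my.code",
  custom_dict = my_dict,
  origin_regex = TRUE
)
res
```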

cjyetman · 2020-12-23T18:42:00Z

> I thought the function was a sort of universal translator from any name/code into any other. I'm unclear what is gained by adding restrictions to that - but then nor have I been through the full thought process that you guys have..

If one would like to do exact matching on one of the name columns of codelist, they could technically achieve that by setting custom_dict = codelist. But since the built-in regexes have been maintained and improved for 10+ years, you're more likely to get a better result from them than from a snapshot of exact name matches taken at one point in that code's history.

library(countrycode)
cow_names <- na.omit(codelist$cow.name)
from_cow <- countrycode(cow_names, "cow.name", "iso2c", custom_dict = codelist)
#> Warning in countrycode(cow_names, "cow.name", "iso2c", custom_dict = codelist): Some values were not matched unambiguously: Austria-Hungary, Baden, Bavaria, Czechoslovakia, German Democratic Republic, Hanover, Hesse Electoral, Hesse Grand Ducal, Kosovo, Mecklenburg Schwerin, Modena, Parma, Republic of Vietnam, Saxony, Tuscany, Two Sicilies, Wuerttemburg, Yemen Arab Republic, Yemen People's Republic, Yugoslavia, Zanzibar
from_std <- countrycode(cow_names, "country.name", "iso2c")
#> Warning in countrycode(cow_names, "country.name", "iso2c"): Some values were not matched unambiguously: Austria-Hungary, Baden, Bavaria, Czechoslovakia, German Democratic Republic, Hanover, Hesse Electoral, Hesse Grand Ducal, Kosovo, Mecklenburg Schwerin, Modena, Parma, Republic of Vietnam, Saxony, Tuscany, Two Sicilies, Wuerttemburg, Yemen Arab Republic, Yemen People's Republic, Yugoslavia, Zanzibar
identical(from_cow, from_std)
#> [1] TRUE

geotheory · 2020-12-23T19:13:03Z

Thanks for the explainers @vincentarelbundock @cjyetman. I agree it feels sensible to exclude 'destination' schemes from the guess_field() results.

By the way you'd be surprised by how often I use guess_field(). I encounter new unspecified naming/coding systems almost every other day, and this function makes data joins much quicker. But until now I also didn't realise the complexity under the hood of the countrycode function. This is a pretty critical package :)

vincentarelbundock closed this as completed Mar 25, 2021
