Uniquely identify Russia vs. Soviet Union #180
Comments
Thanks for posting this! In your reply, you wrote:
I agree with that. Especially, since mapping different countries into the same iso can create all sorts of problems (it did for me). In effect, we're mapping 15 countries into one country here ;-) An "NA" is more likely to create immediate problems which the researcher can address. "Prussia" for example is not matched so this would be consistent:
returns an error. I also would have expected this code:
to produce the usual warning about no unique mapping instead of quietly creating c("RUS", "RUS") but maybe I'm wrong. If it is interesting: for me personally (don't know if this is feasible) the ideal behaviour of country.code would have been something like:
"Warning message: maybe such a warning could be extended to your Germany issue or all historical countries? |
I kind of agree that behavior should be stricter. Kind of like |
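To make the stricter behavior discussed above concrete, here is a minimal sketch in Python rather than the package's own R. It is not countrycode's implementation: `TOY_DICTIONARY`, `strict_convert`, and the warning text are all hypothetical. The idea is only that a name without a unique match yields a missing value plus a warning, instead of quietly reusing another country's code.

```python
import warnings

# Hypothetical toy dictionary: country name -> ISO3 code. None marks a name
# that is known but deliberately has no unique code in this code set.
TOY_DICTIONARY = {
    "Russia": "RUS",
    "Germany": "DEU",
    "Soviet Union": None,
}


def strict_convert(names, dictionary=TOY_DICTIONARY):
    """Return one code per name; unmatched names become None and trigger a warning."""
    codes = [dictionary.get(name) for name in names]
    unmatched = sorted({n for n, c in zip(names, codes) if c is None})
    if unmatched:
        warnings.warn("No unique match for: " + ", ".join(unmatched))
    return codes
```

With this sketch, "Soviet Union" and "Prussia" are treated the same way: both come back as missing and both are named in the warning, so the researcher sees the problem immediately.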
According to Wikipedia, the USSR has an "exceptionally reserved code". But this still leaves open the possibility that some code sets view a former country the same as a current country, as with West Germany being discussed in #179, so...
|
Maybe we could add a code-set-specific tie-breaker column for code sets that don't distinguish between some countries that other code sets do... and if an origin code has multiple matches, the tie-breaker column would determine which to use? (also relevant to #179) |
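A rough sketch of the tie-breaker idea, written in Python for illustration only. The row layout, the `cown_tiebreak` column name, the `convert` helper, and the code values are all assumptions made up for this example; they are not taken from the package's dictionary or build script.

```python
# Hypothetical dictionary rows. Two rows share the same origin code ("cown"),
# and the extra "cown_tiebreak" column ranks them (lower rank wins).
ROWS = [
    {"name": "Russia", "cown": 365, "iso3c": "RUS", "cown_tiebreak": 1},
    {"name": "Soviet Union", "cown": 365, "iso3c": "SUN", "cown_tiebreak": 2},
    {"name": "Germany", "cown": 255, "iso3c": "DEU", "cown_tiebreak": 1},
]


def convert(code, origin, destination, rows=ROWS):
    """Convert one origin code, using the origin's tie-breaker column on multiple matches."""
    matches = [row for row in rows if row[origin] == code]
    if not matches:
        return None  # no match at all: return a missing value
    # Pick the row with the lowest tie-breaker rank for this origin code set.
    best = min(matches, key=lambda row: row[origin + "_tiebreak"])
    return best[destination]
```

Here `convert(365, "cown", "iso3c")` resolves to the row ranked first for that origin code, so the choice is stated in the data rather than depending on row order.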
Yep. For example, CoW only uses Would having different rows with tie breaks mess with our dictionary building process? I'm worried we'll add hacks on top of hacks... |
I haven't totally thought that through, but... I would imagine something like there being a CSV/data.frame that everything begins from... that has a It would definitely require some adaptation of the current build script, but I'm unsure how serious or complicated it would be. It would also require a major review of all the current code sets to determine which ones would require it and for which countries, though I suppose code sets could be adapted as the need was realized. |
In a way, that's not so different than before, when the regex conversion would just run iteratively and arbitrarily end up using the last regex in the CSV. (though obviously, with some more warnings and such.) |
True, but a bit more explicit than depending on the order of the rows. It could also be based on a |
Unfortunately, my countrycode bandwidth is exhausted for a while (need to get papers ready for summer conferences), but I'll keep thinking about all this. |
This is an email I just got.
Also related to this issue: #179