COW codes Germany/Vietnam #179
These two countries are tricky, because CoW assigns different numerical identifiers depending on the year. In its cross-sectional incarnation,
Are these suboptimal, you think? If you are looking at panel data, a better option would be to use the country year dataset that is packaged with countrycode: |
Another alternative is to use the
|
I see your point going from names to COW codes, but why:
This should be
according to COW system membership file. |
I don't understand what you mean. Can you show me in both directions what you want? And please add iso3c for reference. For instance:
Obviously, this breaks the one-to-one mapping condition... |
According to COW http://www.correlatesofwar.org/data-sets/state-system-membership 260 = "German Federal Republic" and 255 = "Germany". I would expect that countrycode(260, "cown", "country.name") always outputs "German Federal Republic" since that name is assigned to the code 260 independent of any year. I understand that the output for countrycode("German Federal Republic", "country.name", "cown") depends on the year. |
The way Since the cown of "German Federal Republic" depends on the year, we need to choose either 260 or 255 in order to preserve that one-to-one symmetric mapping. To some extent, the choice is arbitrary. Currently, The proper way to deal with this issue is to use a panel conversion dictionary instead of a cross-sectional one. This is why |
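The one-to-one constraint described above can be illustrated with a toy dictionary in base R. This is only a sketch: `toy_dict`, its simplified regexes, `name_to_cown()`, and `cown_to_name()` are hypothetical and are not the package's actual dictionary or internals. Each regex row can carry a single `cown` value, so whichever of 255 or 260 is left out has no row to resolve to in the reverse direction.

```r
# Hypothetical toy dictionary: one row per unique regex, one cown per row.
# The regexes are simplified for illustration only.
toy_dict <- data.frame(
  regex = c("^(?!.*east).*germany", "east.*germany|german democratic"),
  country.name = c("Germany", "German Democratic Republic"),
  cown = c(255, 265),
  stringsAsFactors = FALSE
)

# Name -> code: return the code only if exactly one regex matches.
name_to_cown <- function(x) {
  hits <- vapply(
    toy_dict$regex,
    function(r) grepl(r, x, ignore.case = TRUE, perl = TRUE),
    logical(1)
  )
  if (sum(hits) == 1) toy_dict$cown[hits] else NA_real_
}

# Code -> name: a plain exact lookup on the cown column.
cown_to_name <- function(code) {
  toy_dict$country.name[match(code, toy_dict$cown)]
}

name_to_cown("Federal Republic of Germany")  # resolves to the single code stored for the Germany row
cown_to_name(260)                            # no row holds 260, so the lookup yields NA
```

In this toy setup, storing 260 instead of 255 in the Germany row would simply move the missing value to the other code.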
I think the problem mainly arises from the ambiguous name "Federal Republic of Germany" and its ambiguous variations. If we agreed to always refer to the former "Federal Republic of Germany" as "West Germany" only, in regex and country.name.en etc., then it might be workable. We do convert "East Germany" to 265. |
Right. But that's a problem with CoW's coding, which we can't do anything about.
|
as far as I can tell, these are always true in CoW... so these could always work, not conflict, and always be directly reversible... that only becomes a problem if we want to allow something like (as some coding schemes do)... if we had...
it seems like it should work if we don't allow "West Germany" to be converted into anything else that's not definitively, unambiguously, exclusively equivalent to "West Germany" |
That's exactly the problem. And it's not just iso3c, it's "all other destinations". My sense is that CoW codes are popular, but that they are definitely a minority use case relative to "West Germany" -> All other codes. I think we want to keep the latter, even if it costs us the former. |
But of course, I don't have any systematic data to back that up. Just my own practice. |
got it... so to clarify a bit, doing this would screw up some of the other (possibly more common) codes which prefer to essentially view "West Germany" and "current Germany" as the same thing |
Yes. Do you share the intuition that those use-cases are more common? |
in my research yes, but I think it's a shame because CoW is one of very few robust country code schemes one can use for time series that go back ~20+ years |
btw... there is a row in the current |
Good catch. I removed that line in this commit: 51f4a6f That whole discussion reinforces my belief that mostly, what we should be using is |
How is This is somewhat convoluted, and the result isn't great either?
library(tibble)
library(dplyr)
library(countrycode)
df <- tribble(
~country, ~year, ~var,
260, 1985, 1,
265, 1985, 2,
255, 2000, 3
)
df %>%
left_join(codelist_panel[, c("cown", "country.name.en", "year")],
by = c("country" = "cown", "year" = "year"))
# # A tibble: 3 x 4
# country year var country.name.en
# <dbl> <dbl> <dbl> <chr>
# 1 260 1985 1 Germany
# 2 265 1985 2 German Democratic Republic
# 3 255 2000 3 Germany
I think it would be good to think about...
|
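For comparison, here is a base-R sketch of the same kind of code-year join against a user-supplied panel table, where the label for each code-year pair is chosen explicitly. `toy_panel` and its `label` column are hypothetical; the labels follow the names quoted earlier in this thread, and the table is not `codelist_panel`.

```r
# Hypothetical code-year lookup table with explicitly chosen labels.
toy_panel <- data.frame(
  cown = c(260, 265, 255),
  year = c(1985, 1985, 2000),
  label = c("German Federal Republic", "German Democratic Republic", "Germany"),
  stringsAsFactors = FALSE
)

df <- data.frame(
  country = c(260, 265, 255),
  year = c(1985, 1985, 2000),
  var = 1:3
)

# Left join on (code, year); merge() sorts by the keys, so restore input order.
out <- merge(df, toy_panel,
             by.x = c("country", "year"),
             by.y = c("cown", "year"),
             all.x = TRUE)
out <- out[order(out$var), ]
out
```

Because the key is the code-year pair, 260 and 255 can each keep their own label without breaking uniqueness.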
The one-to-one matching logic seems to be a major departure from the previous package version as I recall it. May I ask, what's the reason for this departure? |
No departure. It was always like that. |
But in previous package versions I never got "NA" for a call to countrycode(260, "cown", "country.name"). Why now? Up to version 0.19 from February this year, I get "Federal Republic of Germany", but now with version 1.0.0 it's NA. |
I'm not sure why you find the
|
It just seems a bit awkward, especially for a package that generally makes things incredibly easy. I don't have a solution in mind, but I might start thinking of one. |
Sounds good. Let me know if you have some ideas. I'm very interested. W.r.t. the current issue, do you think we should include entries in The important thing would be to ensure none of the entries duplicate what's in other rows (but our test suite should catch that). |
@sumtxt I suppose it could be considered a bug...
devtools::install_github("vincentarelbundock/countrycode", ref = "0.19")
library(countrycode)
countrycode(260, "cown", "country.name")
# [1] "Federal Republic of Germany"
countrycode("Federal Republic of Germany", "country.name", "cown")
# [1] NA
# Warning messages:
# 1: In countrycode("Federal Republic of Germany", "country.name", "cown") :
# Some values were not matched unambiguously: Federal Republic of Germany
#
# 2: In countrycode("Federal Republic of Germany", "country.name", "cown") :
# Some strings were matched more than once, and therefore set to <NA> in the result: Federal Republic of Germany,260,255 |
arrgh, that can't really work, since now we're merging the dictionary based on unique regexes. Edit: And we want the regex for "Federal Republic of Germany" to map onto the other code. |
fyi... signing off for a bit... I'll come back to this in the coming days |
Thanks again for opening this issue. I still recognize that this is a problem, but unfortunately this problem can only be satisfactorily addressed by a fundamental re-write of We already have a (dormant but open) issue for this, which you can follow if there is interest: #186 I have also noted the problem in a new "Frequently Requested Feature" issue, so I will close this one here. I'm only closing to avoid duplication and to clean up the repo. If Many-to-One is implemented, the specific case of Germany CoW codes will automatically be solved. Thanks for your patience! |
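As a rough illustration of what a many-to-one lookup could look like, here is a base-R sketch. `toy_m2o` and both helper functions are hypothetical and do not reflect any planned design for the package. Several origin codes share one destination, so the forward direction resolves cleanly while the reverse direction returns more than one candidate and needs a tie-breaking rule.

```r
# Hypothetical many-to-one table: both CoW codes point to the same iso3c.
toy_m2o <- data.frame(
  cown = c(255, 260),
  iso3c = c("DEU", "DEU"),
  stringsAsFactors = FALSE
)

# Forward lookup: each cown has exactly one destination.
cown_to_iso3c <- function(code) {
  toy_m2o$iso3c[match(code, toy_m2o$cown)]
}

# Reverse lookup: may return several candidates, so the caller has to
# decide how to pick one (for example, by year).
iso3c_to_cown_candidates <- function(code) {
  toy_m2o$cown[toy_m2o$iso3c == code]
}

cown_to_iso3c(c(255, 260))
iso3c_to_cown_candidates("DEU")
```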
countrycode version 1.0.0 doesn't match COW codes for Germany (260) and Vietnam (816), i.e.
countrycode(c(260,816), "cown", "country.name") outputs (NA,NA)