Wikidata #85

Closed
wants to merge 5 commits into from
19 changes: 16 additions & 3 deletions .github/workflows/actions.yml
@@ -1,7 +1,7 @@
name: Run Makefile with Selenium

on:
# Schedule to run every 1st day of a month at 00 UTC
# Schedule to run on the 1st day of each month
schedule:
- cron: '0 0 1 * *'

@@ -48,7 +48,20 @@ jobs:
run: |
Xvfb :99 -screen 0 1024x768x24 &
export DISPLAY=:99
make update
source venv/bin/activate
make update

- name: Run Make Clean
run: make clean
run: |
source venv/bin/activate
make clean

- name: Push and Commit
env:
CI_COMMIT_NAME: "Automated commit"
CI_COMMIT_EMAIL: "[email protected]"
CI_COMMIT_MESSAGE: "Automated commit"
run: |
git config --global user.email "${{env.CI_COMMIT_EMAIL}}"
git config --global user.name "${{env.CI_COMMIT_NAME}}"
git diff --quiet && echo "No changes to commit" || (git add data/country-codes.csv && git commit -m "${{env.CI_COMMIT_MESSAGE}}" && git push)
8 changes: 6 additions & 2 deletions Makefile
@@ -13,6 +13,9 @@ all: data/country-codes.csv

.SECONDARY:

data/wd_countries.csv:
./scripts/wikidata.sh

data/iso3166.json:
python3 scripts/iso3166.py # Calls your custom iso3166 script
python3 scripts/csvtojson.py data/iso3166.csv data/iso3166-flat.json # Use your csvtojson function
@@ -54,11 +57,12 @@ data/country-codes.csv: data/country-codes.json data/geoname.csv data/cldr.csv d
python3 scripts/reorder_columns.py
python3 scripts/reorder_rows.py
cp data/country-codes-reordered-sorted.csv data/country-codes.csv
python3 scripts/cleanup.py # Ensure final column order
python3 scripts/wd_countries.py
python3 scripts/cleanup.py
cp data/country-codes.csv data/previous-country-codes.csv

clean:
# Delete all .csv files starting with 'country' except 'country-codes.csv'
# Delete all .csv files except 'country-codes.csv'
find data/ -name "*.csv" ! -name "country-codes.csv" -exec rm {} +

# Delete all .json files
4 changes: 2 additions & 2 deletions data/country-codes.csv
@@ -162,8 +162,8 @@ NIG,227,NER,ng,Yes,562,181.0,NG,NR,NE,NGR,NIG,RN,la República del Níger,1,11.0
NGA,234,NGA,nr,Yes,566,182.0,NI,NI,NG,NIG,NGR,WAN,la República Federal de Nigeria,1,11.0,Nigéria,Nigéria (le),Naira,Федеративная Республика Нигерия,Nigeria,NGN,,Nigeria,566,尼日利亚联邦共和国,la République fédérale du Nigéria,Нигерия,566,202.0,2.0,نيجيريا,2,جمهورية نيجيريا الاتحادية,尼日利亚,,Western Africa,Nigeria,the Federal Republic of Nigeria,尼日利亚,Nigeria,NIGERIA,,Africa,نيجيريا,Sub-Saharan Africa,Нигерия,World,Abuja,AF,.ng,"en-NG,ha,yo,ig,ff",2328926,Nigeria,Q5
NIU,683,NIU,xh,Associated with NZ,570,183.0,NE, ,NU,NIU,NIU,NZ,Niue **,1,,Nioué,Nioué **,New Zealand Dollar,Ниуэ **,Niue **,NZD,x,Niue **,554,纽埃 **,Nioué **,Ниуэ **,570,61.0,9.0,نيوي,2,نيوي **,纽埃 **,,,Niue,Niue **,纽埃,Niue,NIUE,,Oceania,نيوي **,Polynesia,Ниуэ,World,Alofi,OC,.nu,"niu,en-NU",4036232,Niue,Q6
NFK,672,NFK,nx,Territory of AU,574,184.0,NF,NF,NF,NFK,NFI,AUS,,1,,Île Norfolk,,Australian Dollar,,,AUD,,,036,,,,574,53.0,9.0,جزيرة نورفولك,2,,,,,Isla Norfolk,,诺福克岛,Norfolk Island,NORFOLK ISLAND,,Oceania,,Australia and New Zealand,Остров Норфолк,World,Kingston,OC,.nf,en-NF,2155115,Pulau Norfolk,Q7
MKD,389,MKD,xn,Yes,807,241.0,MK,MJ,MK,MKD,MKD,MK,la República de Macedonia del Norte,1,,Macédoine du Nord,Macédoine du Nord (la),Denar,Республика Северная Македония,North Macedonia,MKD,,Macedonia del Norte,807,北马其顿共和国,la République de Macédoine du Nord,Северная Македония,807,39.0,150.0,مقدونيا الشمالية,2,جمهورية مقدونيا الشمالية,北马其顿,x,,Macedonia del Norte,the Republic of North Macedonia,北马其顿,North Macedonia,NORTH MACEDONIA,,Europe,مقدونيا الشمالية,Southern Europe,Северная Македония,World,Skopje,EU,.mk,"mk,sq,tr,rmm,sr",718075,Macedonia Utara,
NMI,1-670,MNP,nw,Commonwealth of US,580,185.0,CQ,MY,MP,MRA,NMA,USA,,1,,Îles Mariannes du Nord,,US Dollar,,,USD,x,,840,,,,580,57.0,9.0,جزر ماريانا الشمالية,2,,,,,Islas Marianas Septentrionales,,北马里亚纳群岛,Northern Mariana Islands,NORTHERN MARIANA ISLANDS,,Oceania,,Micronesia,Северные Марианские острова,World,Saipan,OC,.mp,"fil,tl,zh,ch-MP,en-MP",4041468,Kepulauan Mariana Utara,1V
MKD,389,MKD,xn,Yes,807,241.0,MK,MJ,MK,MKD,MKD,MK,la República de Macedonia del Norte,1,,Macédoine du Nord,Macédoine du Nord (la),Denar,Республика Северная Македония,North Macedonia,MKD,,Macedonia del Norte,807,北马其顿共和国,la République de Macédoine du Nord,Северная Македония,807,39.0,150.0,مقدونيا الشمالية,2,جمهورية مقدونيا الشمالية,北马其顿,x,,Macedonia del Norte,the Republic of North Macedonia,北马其顿,North Macedonia,NORTH MACEDONIA,,Europe,مقدونيا الشمالية,Southern Europe,Северная Македония,World,Skopje,EU,.mk,"mk,sq,tr,rmm,sr",718075,Macedonia Utara,
NOR,47,NOR,no,Yes,578,186.0,NO,NO,NO,NOR,NOR,N,el Reino de Noruega,1,,Norvège,Norvège (la),Norwegian Krone,Королевство Норвегия,Norway,NOK,,Noruega,578,挪威王国,le Royaume de Norvège,Норвегия,578,154.0,150.0,النرويج,2,مملكة النرويج,挪威,,,Noruega,the Kingdom of Norway,挪威,Norway,NORWAY,,Europe,النرويج,Northern Europe,Норвегия,World,Oslo,EU,.no,"no,nb,nn,se,fi",3144096,Norway,Q8
OMA,968,OMN,mk,Yes,512,187.0,MU,OM,OM,OMA,OMA, ,la Sultanía de Omán,1,,Oman,Oman,Rial Omani,Султанат Оман,Oman,OMR,,Omán,512,阿曼苏丹国,le Sultanat d'Oman,Оман,512,145.0,142.0,عمان,3,سلطنة عمان,阿曼,,,Omán,the Sultanate of Oman,阿曼,Oman,OMAN,,Asia,عمان,Western Asia,Оман,World,Muscat,AS,.om,"ar-OM,en,bal,ur",286963,Oman,P4
PAK,92,PAK,pk,Yes,586,188.0,PK,PK,PK,PAK,PAK,PK,la República Islámica del Pakistán,1,,Pakistan,Pakistan (le),Pakistan Rupee,Исламская Республика Пакистан,Pakistan,PKR,,Pakistán (el),586,巴基斯坦伊斯兰共和国,la République islamique du Pakistan,Пакистан,586,34.0,142.0,باكستان,2,جمهورية باكستان الإسلامية,巴基斯坦,,,Pakistán,the Islamic Republic of Pakistan,巴基斯坦,Pakistan,PAKISTAN,,Asia,باكستان,Southern Asia,Пакистан,World,Islamabad,AS,.pk,"ur-PK,en-PK,pa,sd,ps,brh",1168579,Pakistan,R0
@@ -236,8 +236,8 @@ UAE,971,ARE,ts,Yes,784,255.0,AE,ER,AE,UAE,UAE, ,los Emiratos Árabes Unidos,1,,
1,44,GBR,xxk,Yes,826,256.0,UK,UK,GB,G,GBR,GB,el Reino Unido de Gran Bretaña e Irlanda del Norte,1,,Royaume-Uni de Grande-Bretagne et d’Irlande du Nord,Royaume-Uni de Grande-Bretagne et d'Irlande du Nord (le),Pound Sterling,Соединенное Королевство Великобритании и Северной Ирландии,United Kingdom of Great Britain and Northern Ireland (the),GBP,,Reino Unido de Gran Bretaña e Irlanda del Norte (el),826,大不列颠及北爱尔兰联合王国,le Royaume-Uni de Grande-Bretagne et d'Irlande du Nord,Соединенное Королевство Великобритании и Северной Ирландии,826,154.0,150.0,المملكة المتحدة لبريطانيا العظمى وآيرلندا الشمالية,2,المملكة المتحدة لبريطانيا العظمى وأيرلندا الشمالية,大不列颠及北爱尔兰联合王国,,,Reino Unido de Gran Bretaña e Irlanda del Norte,the United Kingdom of Great Britain and Northern Ireland,大不列颠及北爱尔兰联合王国,United Kingdom of Great Britain and Northern Ireland,UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND,,Europe,المملكة المتحدة لبريطانيا العظمى وأيرلندا الشمالية,Northern Europe,Соединенное Королевство Великобритании и Северной Ирландии,World,London,EU,.uk,"en-GB,cy-GB,gd",2635167,UK,
TAN,255,TZA,tz,Yes,834,257.0,TZ,TN,TZ,TZA,TAN,EAT,la República Unida de Tanzanía,1,14.0,République-Unie de Tanzanie,République-Unie de Tanzanie (la),Tanzanian Shilling,Объединенная Республика Танзания,United Republic of Tanzania (the),TZS,,República Unida de Tanzanía (la),834,坦桑尼亚联合共和国,la République-Unie de Tanzanie,Объединенная Республика Танзания,834,202.0,2.0,جمهورية تنزانيا المتحدة,2,جمهورية تنزانيا المتحدة,坦桑尼亚联合共和国,,Eastern Africa,República Unida de Tanzanía,the United Republic of Tanzania,坦桑尼亚联合共和国,United Republic of Tanzania,"TANZANIA, UNITED REPUBLIC OF",x,Africa,جمهورية تنزانيا المتحدة,Sub-Saharan Africa,Объединенная Республика Танзания,World,Dodoma,AF,.tz,"sw-TZ,en,ar",149590,Tanzania,W0
, ,UMI,b,Territories of US,581,,a, ,UM, , ,USA,,1,,Îles mineures éloignées des États-Unis,,US Dollar,,,USD,,,840,,,,581,57.0,9.0,نائية التابعة للولايات المتحدة,2,,,,,Islas menores alejadas de Estados Unidos,,美国本土外小岛屿,United States Minor Outlying Islands,UNITED STATES MINOR OUTLYING ISLANDS,,Oceania,,Micronesia,Внешние малые острова Соединенных Штатов,World,,OC,.um,en-UM,5854968,Kepulauan Terpencil A.S.,2J
VIR,1-340,VIR,vi,Territory of US,850,258.0,VQ,VI,VI,VIR,ISV,USA,,1,29.0,Îles Vierges américaines,,US Dollar,,,USD,x,,840,,,,850,419.0,19.0,جزر فرجن التابعة للولايات المتحدة,2,,,,Caribbean,Islas Vírgenes de los Estados Unidos,,美属维尔京群岛,United States Virgin Islands,VIRGIN ISLANDS (U.S.),,Americas,,Latin America and the Caribbean,Виргинские острова Соединенных Штатов,World,Charlotte Amalie,,.vi,en-VI,4796775,Kepulauan Virgin A.S.,
USA,1,USA,xxu,Yes,840,259.0,US,US,US,USA,USA,USA,los Estados Unidos de América,1,,États-Unis d’Amérique,États-Unis d'Amérique (les),US Dollar,Соединенные Штаты Америки,United States of America (the),USD,,Estados Unidos de América (los),840,美利坚合众国,les États-Unis d'Amérique,Соединенные Штаты Америки,840,21.0,19.0,الولايات المتحدة الأمريكية,2,الولايات المتحدة الأمريكية,美利坚合众国,,,Estados Unidos de América,the United States of America,美利坚合众国,United States of America,UNITED STATES OF AMERICA,,Americas,الولايات المتحدة الأمريكية,Northern America,Соединенные Штаты Америки,World,Washington,,.us,"en-US,es-US,haw,fr",6252001,A.S,
VIR,1-340,VIR,vi,Territory of US,850,258.0,VQ,VI,VI,VIR,ISV,USA,,1,29.0,Îles Vierges américaines,,US Dollar,,,USD,x,,840,,,,850,419.0,19.0,جزر فرجن التابعة للولايات المتحدة,2,,,,Caribbean,Islas Vírgenes de los Estados Unidos,,美属维尔京群岛,United States Virgin Islands,VIRGIN ISLANDS (U.S.),,Americas,,Latin America and the Caribbean,Виргинские острова Соединенных Штатов,World,Charlotte Amalie,,.vi,en-VI,4796775,Kepulauan Virgin A.S.,
URU,598,URY,uy,Yes,858,260.0,UY,UY,UY,URG,URU,ROU,la República Oriental del Uruguay,1,5.0,Uruguay,Uruguay (l'),"Peso Uruguayo,Unidad Previsional",Восточная Республика Уругвай,Uruguay,"UYU,UYW",,Uruguay (el),"858,927",乌拉圭东岸共和国,la République orientale de l'Uruguay,Уругвай,858,419.0,19.0,أوروغواي,"2,4",جمهورية أوروغواي الشرقية,乌拉圭,,South America,Uruguay,the Eastern Republic of Uruguay,乌拉圭,Uruguay,URUGUAY,,Americas,أوروغواي,Latin America and the Caribbean,Уругвай,World,Montevideo,SA,.uy,es-UY,3439705,Uruguay,X3
UZB,998,UZB,uz,Yes,860,261.0,UZ,UZ,UZ,UZB,UZB,UZ,la República de Uzbekistán,1,,Ouzbékistan,Ouzbékistan (l'),Uzbekistan Sum,Республика Узбекистан,Uzbekistan,UZS,,Uzbekistán,860,乌兹别克斯坦共和国,la République d'Ouzbékistan,Узбекистан,860,143.0,142.0,أوزبكستان,2,جمهورية أوزبكستان,乌兹别克斯坦,x,,Uzbekistán,the Republic of Uzbekistan,乌兹别克斯坦,Uzbekistan,UZBEKISTAN,,Asia,أوزبكستان,Central Asia,Узбекистан,World,Tashkent,AS,.uz,"uz,ru,tg",1512440,Uzbekistan,2K
VAN,678,VUT,nn,Yes,548,262.0,NH,NV,VU,VUT,VAN,VU,la República de Vanuatu,1,,Vanuatu,Vanuatu,Vatu,Республика Вануату,Vanuatu,VUV,x,Vanuatu,548,瓦努阿图共和国,la République de Vanuatu,Вануату,548,54.0,9.0,فانواتو,0,جمهورية فانواتو,瓦努阿图,,,Vanuatu,the Republic of Vanuatu,瓦努阿图,Vanuatu,VANUATU,,Oceania,فانواتو,Melanesia,Вануату,World,Port Vila,OC,.vu,"bi,en-VU,fr-VU",2134431,Vanuatu,2L
2 changes: 1 addition & 1 deletion scripts/cleanup.py
@@ -27,7 +27,7 @@ def cleanup():
"Intermediate Region Name", "official_name_es", "UNTERM English Formal", "official_name_cn",
"official_name_en", "ISO4217-currency_country_name", "Least Developed Countries (LDC)", "Region Name",
"UNTERM Arabic Short", "Sub-region Name", "official_name_ru", "Global Name", "Capital",
"Continent", "TLD", "Languages", "Geoname ID", "CLDR display name", "EDGAR"
"Continent", "TLD", "Languages", "Geoname ID", "CLDR display name", "EDGAR","wikidata_id"
]

# Only reorder the columns that exist in both the dataframe and the desired order
20 changes: 20 additions & 0 deletions scripts/wd_countries.py
@@ -0,0 +1,20 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import pandas as pd

def run():
    """
    Retrieve Wikidata IDs by ISO code and update the country-codes CSV file.
    """
    # Paths are relative to the repository root so the script also runs in CI.
    wd_countries = pd.read_csv('data/wd_countries.csv')
    country_codes = pd.read_csv('data/country-codes.csv')

    # Left join keeps every country row, even when no Wikidata match exists.
    merged_data = pd.merge(country_codes, wd_countries, left_on='ISO3166-1-Alpha-2', right_on='iso2_code', how='left')

    merged_data['wikidata_id'] = 'https://www.wikidata.org/wiki/' + merged_data['wd_id'].fillna('')

    merged_data.to_csv('data/country-codes.csv', index=False)

if __name__ == '__main__':
    run()
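
Review note: the left join in `run()` can be sketched with only the standard library, which makes the behaviour for unmatched rows easy to check. This is a minimal sketch, not the PR's code: the helper name `attach_wikidata`, the sample rows, and the choice to leave the cell empty when no match is found are assumptions for illustration.

```python
import csv
import io

WIKIDATA_PREFIX = "https://www.wikidata.org/wiki/"


def attach_wikidata(country_rows, wd_rows):
    """Left-join wd_rows onto country_rows by ISO 3166-1 alpha-2 code."""
    # Map each alpha-2 code to its Wikidata ID for constant-time lookup.
    lookup = {row["iso2_code"]: row["wd_id"] for row in wd_rows}
    merged = []
    for row in country_rows:
        wd_id = lookup.get(row["ISO3166-1-Alpha-2"], "")
        new_row = dict(row)
        # Assumption: unmatched rows get an empty cell, not a bare URL prefix.
        new_row["wikidata_id"] = WIKIDATA_PREFIX + wd_id if wd_id else ""
        merged.append(new_row)
    return merged


# Hypothetical sample data; "ZZ" stands in for a code with no Wikidata match.
countries_csv = "ISO3166-1-Alpha-2,official_name_en\nNO,Norway\nZZ,Nowhere\n"
wd_csv = "iso2_code,wd_id\nNO,Q20\n"

countries = list(csv.DictReader(io.StringIO(countries_csv)))
wd = list(csv.DictReader(io.StringIO(wd_csv)))
result = attach_wikidata(countries, wd)
```

One thing worth checking in the pandas version: `pandas.read_csv` treats the literal string `NA` as a missing value by default, and `NA` is Namibia's alpha-2 code, so passing `keep_default_na=False` may be needed for that row to join.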
29 changes: 29 additions & 0 deletions scripts/wd_countries.sh
@@ -0,0 +1,29 @@
#!/bin/bash

## Retrieve the Wikidata dataset via SPARQL:
curl -o data/wd_countries.csv -G 'https://query.wikidata.org/sparql' \
--header "Accept: text/csv" \
--data-urlencode query='
SELECT DISTINCT (?simple_value AS ?iso2_code) ?wd_id
WHERE {
?item p:P297 ?statement .
?statement ps:P297 ?simple_value .
OPTIONAL { ?statement pq:P582 ?qualifier . }
FILTER ( !bound(?qualifier) )
BIND ( strafter(str(?item), str(wd:)) AS ?wd_id ).
} ORDER BY ?iso2_code
'

# Eliminate duplicates (caused by kingdoms and territories sharing a code).
# In the future we could use "P31 Q417175" to eliminate the kingdom duplicates, but "territory vs. nation" cases need a manual check.
# For now, filter out the invalid duplicates and save under the same name:
grep -v 'Q756617\|Q29999\|Q407199\|Q240592\|Q83286\|Q1246' data/wd_countries.csv | sponge data/wd_countries.csv

# Use awk to modify the second column, write to a temporary file
awk -F, 'BEGIN {OFS = FS} {if (NR > 1) $2="https://www.wikidata.org/wiki/" $2; print}' "data/wd_countries.csv" > "data/wd_countries.tmp.csv"

# Replace original file with the updated file
mv data/wd_countries.tmp.csv data/wd_countries.csv

# The filter above also drops the last two IDs, which are not in use at ISO: Q83286 = old YU (Yugoslavia); Q1246 = XK (Kosovo).
# It also drops the wrongly duplicated Q240592 (Macedonia).
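
Review note: the `grep` / `sponge` / `awk` post-processing in this script could also be done in one pass in Python, which would remove the dependency on `sponge` (part of moreutils). A minimal sketch, assuming the two-column `iso2_code,wd_id` layout that the query returns; the function name `clean_rows` and the sample rows are hypothetical. Unlike the `grep -v` substring match, it compares whole IDs, so excluding `Q1246` does not also drop an ID such as `Q12460`.

```python
import csv
import io

# IDs filtered out by the shell script's grep -v pattern.
EXCLUDED = {"Q756617", "Q29999", "Q407199", "Q240592", "Q83286", "Q1246"}
PREFIX = "https://www.wikidata.org/wiki/"


def clean_rows(rows):
    """Drop excluded Wikidata IDs and turn the remaining IDs into full URLs."""
    header, *body = rows
    out = [header]
    for iso2, wd_id in body:
        if wd_id in EXCLUDED:  # exact match on the whole ID
            continue
        out.append([iso2, PREFIX + wd_id])
    return out


# Hypothetical sample input; the "ZZ" row exists only to show exact matching.
sample = "iso2_code,wd_id\nXK,Q1246\nNO,Q20\nZZ,Q12460\n"
cleaned = clean_rows(list(csv.reader(io.StringIO(sample))))
```

If `scripts/wd_countries.py` keeps adding the URL prefix itself, only one of the two steps should do so; otherwise the prefix would be applied twice.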