WIGI

Wikipedia Gender Index (WIGI) uses Wikidata to produce gender-related statistics on Wikipedia biographies.

##The Data

- For non-programmer researchers: a simple canonical version of our data, translated into English, is available on Figshare: https://figshare.com/articles/Wikidata_Human_Gender_Indicators/3100903
- Programmers may explore the language-agnostic, longitudinal data at http://wigi.wmflabs.org/snapshot_data/, along with some helper files that aggregate and map place of birth, ethnicity, and citizenship into "world cultures".

Data Munging Documentation IPython Notebooks

- Munge and plot the initial file and make the reindexes
- Look at world cultures by date of birth and gender over time, and how to aggregate the cultures
- Chi-squared testing of gender versus culture, and pretty plots of the same
- How to make data for and test the celebrity hypothesis
- Investigation into the Germanic nationality classification shift
- Aggregating sitelinks into a language-culture female percentage scatter plot
- Modelling female percentage of biographies for prediction
- Scraping out the Mechanical Turk disagreements for hand coding
- How to make the sitelinks scatter plots
- Comparing WIGI to the World Economic Forum

##The Writings

The paper so far is on Google Docs; please comment.
In-progress discussion on Meta.

##Notes

Because Wikidata is multilingual, its values are stored as identifiers, or "Q-IDs" in Wikidata terms, which can be translated into every language for which there is a Wikipedia language edition. To maintain fidelity and language neutrality we keep this standard and, as a design decision, do not translate the Q-IDs in our data. For example, Aung San Suu Kyi as represented in Wikidata in English is shown in Figure `fig:aung` of the paper, and in our dataset she is a row like:

`Q36740,1945,,Q6581072|,,Q836|,Q37995|,,Q82955|Q36180|Q1476215|`

We do, however, include functions to translate these Q-IDs into English (or any other language), which would render the above row as:

`Aung San Suu Kyi,1945,,female|,,Myanmar|,Yangon|,,politician|writer|human rights activist|`
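The translation step can be illustrated with a minimal sketch. It is not the repository's own helper code: the `EN_LABELS` table and the `translate_row` function are hypothetical names, and the seven Q-ID/label pairs are simply read off the example row above. A real label table would be built from Wikidata's labels for the target language.

```python
import re

# Hypothetical label table (assumption): the pairs are taken from the
# example row in the text, not fetched from Wikidata.
EN_LABELS = {
    "Q36740": "Aung San Suu Kyi",
    "Q6581072": "female",
    "Q836": "Myanmar",
    "Q37995": "Yangon",
    "Q82955": "politician",
    "Q36180": "writer",
    "Q1476215": "human rights activist",
}

# A Q-ID is the letter Q followed by digits, delimited by commas or pipes.
QID_PATTERN = re.compile(r"\bQ\d+\b")


def translate_row(row, labels):
    """Replace each Q-ID in a raw row with its label; unknown Q-IDs are kept."""
    return QID_PATTERN.sub(lambda m: labels.get(m.group(0), m.group(0)), row)


raw = "Q36740,1945,,Q6581072|,,Q836|,Q37995|,,Q82955|Q36180|Q1476215|"
print(translate_row(raw, EN_LABELS))
```

Leaving unknown Q-IDs untouched keeps the row lossless when a label is missing in the chosen language. Note that this plain substitution assumes labels contain no commas or pipes; a label that does would need quoting before being written back to the sheet.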

To represent Wikidata faithfully, the value of each property is actually a list, since Wikidata allows a property to have multiple values. This happens either because two sources disagree on a property or because, as in the case of Aung San Suu Kyi, a person has many occupations (see Figure `fig:aung` of the paper). We store each list inside the comma-separated sheet as "pipe"-separated (`|`) values.
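As a rough illustration of this storage format, the sketch below reads one row into a list of lists. The function name `parse_row` is hypothetical, and the fields are handled by position because the column names are not given here. It assumes every value list ends with a trailing pipe, as in the example row.

```python
import csv
import io


def parse_row(line):
    """Split one comma-separated row into fields, then each field into its
    pipe-separated values. Empty fields become empty lists."""
    fields = next(csv.reader(io.StringIO(line)))
    return [[value for value in field.split("|") if value] for field in fields]


raw = "Q36740,1945,,Q6581072|,,Q836|,Q37995|,,Q82955|Q36180|Q1476215|"
for position, values in enumerate(parse_row(raw)):
    print(position, values)
```

Treating every field uniformly means single-valued fields such as the item's own Q-ID come back as one-element lists, which keeps downstream code free of special cases.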

Of course, these multiple values introduce a design problem when aggregating on a property. Our method is to aggregate on the list as a whole, rather than on the individual items within the list. In the case of Aung San Suu Kyi, this means that her occupation is stored as politician, writer, and human rights activist, and she is aggregated with all the other humans who have those same three occupations. Since the dataset is open, interested researchers can use our raw data and aggregate it in any way they want.
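The difference between the two aggregation choices can be seen in a small sketch. The four occupation cells below are made-up illustrative data, not values from the dataset. Using the raw cell string as the grouping key is an assumption about how "aggregating on the list" might be implemented, and it treats lists with the same items in a different order as different groups.

```python
from collections import Counter

# Made-up occupation cells for four hypothetical people (assumption).
occupation_cells = [
    "politician|writer|human rights activist|",
    "politician|",
    "politician|writer|human rights activist|",
    "writer|",
]

# Aggregate on the list: the whole cell is the group key.
by_list = Counter(occupation_cells)

# Alternative: aggregate on individual items, so one person can be
# counted in several groups.
by_item = Counter(
    value
    for cell in occupation_cells
    for value in cell.split("|")
    if value
)

print(by_list)
print(by_item)
```

With list-level keys each person lands in exactly one group, so group counts sum to the number of people. Item-level counting gives per-occupation totals but the counts then sum to more than the number of people.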
