GitHub - jstratman/Historical-Populations: Historical US City populations

This repository contains a dataset, and the code used to build it, that merges three major sources of historical US population data.

It is part of the in-progress Creating Data digital monograph. If citing, please cite that project in addition to this repo, e.g.: "Schmidt, Benjamin. Creating Data: The Invention of Information in the nineteenth century American State. http://creatingdata.us".

License

This data is in the public domain and there are no legal restrictions on its use. If you're an academic, I'd recommend also citing the CESTA population set that this draws on, as well as Wikipedia if you can swing that.

Content

A fuller description of data and method is contained in the file extended_description.md and on the project page.

The sources are:

Every Wikipedia page with a population box.
A manually entered set of CSVs by Wikipedia editor Jacob Alperin-Sheriff (which is mostly, but not entirely, on Wikipedia).
A set of historical populations compiled by Stanford's CESTA: U.S. Census Bureau and Erik Steiner, Spatial History Project, Center for Spatial and Textual Analysis, Stanford University.

There are many process files here. The most useful files are likely:

merged.csv (The union dataset.)
The files in wikipedia_state_data/, which include the parsed contents of all Wikipedia population boxes in the United States.
The files in wiki_census/, which are the sources Alperin-Sheriff used to build the Wikipedia page.
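
As a starting point for exploring merged.csv, here is a minimal sketch that reads a CSV with Python's standard csv module and reports its column names and row count. It deliberately assumes nothing about the column names, since this README does not document them. The helper name `summarize_csv` is made up for illustration, and UTF-8 encoding is an assumption.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    # UTF-8 is an assumption; change the encoding if the file fails to decode.
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        row_count = sum(1 for _ in reader)
        # fieldnames is populated once the header has been read;
        # it stays None for an empty file.
        columns = list(reader.fieldnames or [])
    return columns, row_count


if __name__ == "__main__":
    target = Path("merged.csv")  # assumes the script runs from the repository root
    if target.exists():
        columns, row_count = summarize_csv(target)
        print(f"{row_count} rows")
        print("columns:", ", ".join(columns))
    else:
        print("merged.csv not found; run this from the repository root")
```

The same function should work on the per-state files in wikipedia_state_data/, provided they are CSVs with a header row.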

There are all sorts of errors here. Since this is built up programmatically, I'm not interested in corrections to individual data points, although I encourage you to correct the Wikipedia pages.

I have made many efforts to merge duplicate cities in the merged.csv file, but there are many cases of double-counting of various sorts, especially when the Wikipedia and CESTA populations diverge for a single city or when multiple levels of government each have an entry (for example, both Manhattan and New York City have entries).
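
One rough way to look for the first kind of double-counting is to group rows by place and year and list the groups that carry more than one distinct population. The sketch below is illustrative only: the column names (`name`, `state`, `year`, `population`) and the function `find_conflicts` are hypothetical, so check the actual header of merged.csv and adjust them. It will not catch overlapping jurisdictions such as Manhattan and New York City, because those entries have different names.

```python
from collections import defaultdict


def find_conflicts(rows, key_fields=("name", "state", "year"), value_field="population"):
    """Group rows by key_fields and return the keys with more than one distinct value."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        seen[key].add(row[value_field])
    return {key: sorted(values) for key, values in seen.items() if len(values) > 1}


# Made-up rows for illustration only; these are not real data points.
sample = [
    {"name": "Exampleville", "state": "XX", "year": "1900", "population": "1000"},
    {"name": "Exampleville", "state": "XX", "year": "1900", "population": "1200"},
    {"name": "Otherton", "state": "XX", "year": "1900", "population": "500"},
]
conflicts = find_conflicts(sample)
print(conflicts)
```

Rows read with `csv.DictReader` can be passed in directly, since each row is a dictionary keyed by column name.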

The Wikipedia set is about four times as large as the CESTA one. The following maps show roughly the original contributions of each dataset:

Also included is the code that extracts population data from a Wikipedia dump and the code that performs the merge (including a few examples of errors and differences between sets). These are mostly in IPython notebooks, with a little bit in R notebooks. Most of the operational Python code is broken out into the .py files, which the notebooks import.

