Nice collection: data for municipals? #1
Comments
Lots of information. I will try to address each issue:
Check in the mailing list
Other websites
@Deleetdk for the water maps, stats, and data, please check with hyances[AT]gmail. He is working with OpenStreetMap to solve various problems regarding water access for communities away from urban centers.
Just brainstorming here: for the money you are offering, your best option might be to approach someone studying sociology or social sciences at a Colombian university :[
@dav009 thanks for the replies. In my experience, the easiest way to find a lot of data fast is to use methods like these. Very briefly, use Google search with tricks. For instance, in this case we use something like:
This quickly locates lots of files with municipio-level data. For instance, the first file https://www.dane.gov.co/files/investigaciones/boletines/censo/DeficitViviendaCenso2005.xls has all the links between municipios and departments, the number of households, and some household quality data (not sure about the exact translation).
The next file has some unsatisfied basic needs data: https://www.dane.gov.co/files/censos/resultados/NBI_total_cab_resto_mpio_nal_31dic08.xls
This file is probably the most comprehensive:
So, with these, I have a bunch of socioeconomic variables and population. Still need SABER/ICFES: http://www2.icfes.gov.co/docman/instituciones-educativas-y-secretarias/saber-11/resultados-saber11/670-resultados-agregados-puntajes-promedio-saber-11-2014-2-por-institucion-educativa
The first has a lot of data for 2014. Rows ≈ 12k. Not sure what is what. It is perhaps better to use the SABER case-level datasets. These have n≈550k. The case-level SABER dataset from 2014 seems to have data from almost all municipios, n=1024. Same with the 2013 data. Good, we have that covered then.
Geographical data:
This has: elevation, latitude, longitude, precipitation, days with rain, temperature (mean, max, min), sun hours, humidity, and a few more. Great!
I was unable to find race/ethnicity data using keywords like "raza" and "etnia". But the SABER datasets have race/ethnicity. If we can assume the students are representative within each municipio, then we can aggregate within each municipio and estimate the race/ethnicity proportions.
This covers just about everything, except for some of those natural resources. Those are not a requirement, just nice to have.
The primary thing I will need help with, then, is translation. I will use Google Translate, but sometimes the translation is unclear and one needs a Spanish speaker.
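The aggregation step described above (estimating race/ethnicity proportions per municipio from student-level SABER records) could look roughly like the sketch below. It is only an illustration: the `(municipio, ethnicity)` record layout, the `min_n` cutoff, and the toy values are assumptions, not the actual SABER column names or data.

```python
from collections import Counter, defaultdict


def ethnicity_proportions(records, min_n=30):
    """Return {municipio: {group: proportion}} from student-level records.

    `records` is an iterable of (municipio, ethnicity) pairs. This layout is
    hypothetical; the real SABER files use their own column names.
    Municipios with fewer than `min_n` usable students are skipped because
    their proportions would be too noisy to trust.
    """
    counts = defaultdict(Counter)
    for municipio, ethnicity in records:
        if ethnicity is None or ethnicity == "":
            continue  # drop students with no reported ethnicity
        counts[municipio][ethnicity] += 1

    result = {}
    for municipio, counter in counts.items():
        total = sum(counter.values())
        if total < min_n:
            continue
        result[municipio] = {group: n / total for group, n in counter.items()}
    return result


# Toy records, invented purely for illustration.
toy = [("M1", "A")] * 3 + [("M1", "B")] + [("M2", "A")] * 2
props = ethnicity_proportions(toy, min_n=2)
print(props)
```

The estimates are only as good as the representativeness assumption: test takers are students of a particular age, so the proportions describe that cohort rather than the whole municipio population.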
You can find raw (student-level) data since 2005 for the SABER standardized tests in this R data package: https://github.com/nebulae-co/saber. On request, ICFES provides an FTP connection for researchers to access data, but it can take a while and can be messy, so we packaged it. Map polygons at the municipal level are also available at https://github.com/nebulae-co/colmaps.
Very nice. :)
Thanks for a nice collection of data!
I research social inequality in American countries (among other places) and have previously done a study on Colombia using department-level data. This raised issues of sample size because there are only 32 departments and a capital district.
However, there seem to be about 1,100 municipalities (municipios), thus enabling a much richer study.
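To put rough numbers on the sample-size point, here is a small sketch comparing the approximate 95% confidence interval for a correlation at n = 33 (32 departments plus the capital district) and at n = 1,100. The correlation of 0.30 is an arbitrary illustrative value, not a result, and the Fisher z approximation assumes independent units, which neighbouring municipios may not be.

```python
import math


def correlation_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson correlation via the Fisher z-transform."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


for n in (33, 1100):
    lower, upper = correlation_ci(0.3, n)
    print(f"n={n}: 95% CI for r=0.30 is roughly [{lower:.2f}, {upper:.2f}]")
```

With 33 units the interval includes zero, while with 1,100 units it is far narrower.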
I'm wondering if it would be possible to collect a dataset for municipalities with the following information:
For all variables, it is best to average over a few years to obtain more reliable estimates.
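A minimal sketch of that averaging step, assuming the data are in a long `(municipio, year, value)` layout. The layout, the function name, and the toy numbers are all hypothetical and only meant to show the idea.

```python
from collections import defaultdict
from statistics import mean


def average_over_years(rows, years):
    """Average one variable per municipio over the given years.

    `rows` is an iterable of (municipio, year, value) tuples (a hypothetical
    layout). Rows with a missing value (None) or a year outside `years`
    are ignored.
    """
    wanted = set(years)
    by_municipio = defaultdict(list)
    for municipio, year, value in rows:
        if year in wanted and value is not None:
            by_municipio[municipio].append(value)
    return {m: mean(values) for m, values in by_municipio.items()}


# Toy values, invented purely for illustration.
toy_rows = [
    ("M1", 2012, 40.0), ("M1", 2013, 44.0), ("M1", 2014, 48.0),
    ("M2", 2013, 50.0), ("M2", 2014, None),
]
averages = average_over_years(toy_rows, years=[2013, 2014])
print(averages)
```

A municipio with only one usable year still gets a value here; whether to keep or drop such cases is a judgment call for the analysis.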
My limitation is that I don't read Spanish well (I'm from Denmark!). If someone could help me put together such a dataset, I would be happy to send some money their way (something like 200 USD) and of course share the dataset publicly for free. :)