Nice collection: data for municipals? #1

Open
Deleetdk opened this issue Nov 16, 2016 · 6 comments

Comments

@Deleetdk

Deleetdk commented Nov 16, 2016

Thanks for a nice collection of data!

I research social inequality in American countries (among other places) and have previously done a study on Colombia using department-level data. This raised issues of sample size because there are only 32 departments and a capital district.

However, there seem to be about 1,100 municipalities (municipios), which would enable a much richer study.

I'm wondering if it would be possible to collect a dataset for municipalities with the following information:

  • A broad selection of social, economic, health, crime, education etc. variables.
  • Some kind of testing data, perhaps SABER?, for each municipality. Mean scores are best, but pass rates are also okay.
  • Demographic data: population count, race/ethnic/'color', age structure (e.g. mean age), languages spoken.
  • Some kind of measure of the natural-resource economy, e.g. wealth from minerals/petroleum. There seems to be a strong resource economy in Colombia. This might affect results in unexpected ways, so it is important to correct for it.
  • Department information: which department it is a part of.
  • Geographical/climate: latitude, longitude (perhaps based on the capitals), elevation, access to water (yes/no), area.

For all variables, it is best to average over a few years to obtain more reliable estimates.
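To make the averaging idea concrete, here is a minimal sketch in Python. It assumes the data has already been reshaped into (municipio code, year, value) rows. The codes and values below are made up for illustration and do not come from any of the real files.

```python
from collections import defaultdict
from statistics import mean


def average_over_years(rows):
    """Average one variable per municipio across the available years.

    rows: iterable of (municipio_code, year, value) tuples.
    Missing values (None) are skipped, so a municipio with one bad
    year still gets an estimate from the remaining years.
    """
    by_municipio = defaultdict(list)
    for code, _year, value in rows:
        if value is not None:
            by_municipio[code].append(value)
    return {code: mean(values) for code, values in by_municipio.items()}


# Hypothetical example rows (not real data).
rows = [
    ("05001", 2012, 10.0),
    ("05001", 2013, 14.0),
    ("05001", 2014, None),  # missing year, ignored
    ("11001", 2013, 8.0),
]
averages = average_over_years(rows)
```

A municipio with only one usable year still gets a value, so it may be worth also keeping the number of years behind each average.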

My limitation is that I don't read Spanish well (I'm from Denmark!). If someone could help me put together such a dataset, I would be happy to send some money their way (something like 200 USD) and of course share the dataset publicly for free. :)

@dav009
Member

dav009 commented Nov 16, 2016

Lots of information. I will try to address each point:

  • It is possible to get such data, but it requires a lot of waiting. With the help of journalists and other people who request data, I developed http://queremos.datosabiertos.co/, which helps you write a letter that you can send to the corresponding authorities to get the data in the format you want. Be aware that this takes A LONG TIME from the moment you send the letter until the moment the data arrives in your email.
  • SABER definitely provides the data by municipio. The data is already in a consumable format; all you have to do is ask for it on the official site. I will poke a friend who worked with this data to see if he can get a quick link for you. I think the SABER dataset is quite rich and comes with very specific fields (e.g. ethnicity, whether the family owns a TV...). Worth checking ICFES (http://www.icfes.gov.co/).
  • Natural economy/minerals: this is somewhat controversial, and the data is hard to get a grasp on. My bet would be to look for NGOs and ask them for data, or to ask the ministry using a letter generated with the URL I provided above.
  • Department to municipio: I think this data is available via datos.gov.co (I remember seeing it there). Worst case, you can get it from a Django project that implements locations for Colombia (you will get the mapping from municipio names to departamentos). Another option could be Wikidata or DBpedia.
  • The water data is, I'm pretty sure, easily available via IDEAM: http://www.ideam.gov.co/

Check in the mailing list

Other websites

@dav009
Member

dav009 commented Nov 16, 2016

@Deleetdk for the water maps, stats, and data, please check with hyances[AT]gmail. He is working with OpenStreetMap to solve various problems regarding water access for communities far from urban centers.

@dav009
Member

dav009 commented Nov 16, 2016

Just brainstorming here: your best bet for your money might be to approach someone studying sociology/social sciences at a Colombian university :[

@Deleetdk
Author

Deleetdk commented Nov 16, 2016

@dav009 thanks for the replies. In my experience, the easiest way to find a lot of data fast is to use methods like these. Very briefly, use Google search with operator tricks. For instance, in this case we use something like:

municipio site:co filetype:xls OR filetype:xlsx

This quickly locates lots of files with municipio level data. For instance, the first file

https://www.dane.gov.co/files/investigaciones/boletines/censo/DeficitViviendaCenso2005.xls

has all the links between municipios and departments, number of households and some household quality data (not sure about the exact translation).

The next file has some unsatisfied basic needs data.

https://www.dane.gov.co/files/censos/resultados/NBI_total_cab_resto_mpio_nal_31dic08.xls

This file is probably the most comprehensive:

https://colaboracion.dnp.gov.co/CDT/Desarrollo%20Social/IPM%20por%20municipio%20y%20dpto%202005%20(Incidencias%20y%20Privaciones_F).xls

So, with these, I have a bunch of socioeconomic variables plus population.
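Combining files like these means joining them on some shared municipio identifier. Below is a rough Python sketch of that step. It assumes each file has a numeric municipio code column and that the codes are 5 digits with a possible leading zero, which I have not checked against the actual spreadsheets. The function names and the sample values are hypothetical.

```python
def normalize_code(raw):
    """Turn a spreadsheet cell such as 5001 or 5001.0 into '05001'.

    Spreadsheets tend to store codes as numbers and drop leading zeros.
    The 5-digit width is an assumption about the code format.
    """
    return str(int(raw)).zfill(5)


def merge_by_code(*tables):
    """Inner-join several {municipio_code: {variable: value}} tables.

    Only municipios present in every table are kept, so the result has
    no gaps. Later tables overwrite earlier ones on name clashes.
    """
    if not tables:
        return {}
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    merged = {}
    for code in sorted(common):
        row = {}
        for table in tables:
            row.update(table[code])
        merged[code] = row
    return merged


# Hypothetical values standing in for two of the downloaded files.
housing = {
    normalize_code(5001): {"households": 100},
    normalize_code(11001): {"households": 200},
}
nbi = {"05001": {"nbi": 12.5}}
merged = merge_by_code(housing, nbi)
```

An inner join is the conservative choice here. A municipio missing from any one file is dropped, so it is worth comparing the merged count against the roughly 1,100 expected.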

Still need SABER/ICFES:

http://www2.icfes.gov.co/docman/instituciones-educativas-y-secretarias/saber-11/resultados-saber11/670-resultados-agregados-puntajes-promedio-saber-11-2014-2-por-institucion-educativa
http://www2.icfes.gov.co/docman/instituciones-educativas-y-secretarias/saber-11/resultados-saber11/671-resultados-agregados-puntajes-promedio-saber-11-2015-1-por-institucion-educativa

The first has a lot of data for 2014. Rows ≈ 12k. Not sure what is what.
The second has data for most municipios for 2015, rows ≈ 960.

Better perhaps to use the SABER case-level datasets. These have n≈550k. The case-level SABER dataset from 2014 seems to have data from almost all municipios, n=1024. Same with the 2013 data. Good, we have that covered then.

Geographical data:

http://www.ideam.gov.co/documents/21021/553571/Promedios+Climatol%C3%B3gicos++1981+-+2010.xlsx/f28d0b07-1208-4a46-8ccf-bddd70fb4128

This has: elevation, latitude, longitude, precipitation, days with rain, temperature (mean, max, min), sun hours, humidity, and a few more. Great!

I was unable to find race/ethnicity data using keywords like "raza" and "etnia". But SABER datasets have race/ethnicity. If we can assume the students are representative within each municipio, then we can aggregate within each municipio and estimate the race/ethnicity proportions.
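A minimal Python sketch of that aggregation, assuming the case-level data can be reduced to (municipio code, ethnicity label) pairs. The labels and codes below are placeholders, not the actual SABER field values, and the result is only as good as the representativeness assumption.

```python
from collections import Counter, defaultdict


def ethnicity_proportions(students):
    """Estimate ethnicity shares per municipio from student-level rows.

    students: iterable of (municipio_code, ethnicity_label) pairs.
    Returns {municipio_code: {label: share}}, with shares summing to 1
    within each municipio.
    """
    counts = defaultdict(Counter)
    for code, label in students:
        counts[code][label] += 1
    return {
        code: {label: n / sum(c.values()) for label, n in c.items()}
        for code, c in counts.items()
    }


# Placeholder rows (labels "A" and "B" are hypothetical).
students = [
    ("05001", "A"),
    ("05001", "A"),
    ("05001", "B"),
    ("11001", "B"),
]
shares = ethnicity_proportions(students)
```

Small municipios will have few test takers, so their estimated shares will be noisy. Keeping the per-municipio student count alongside the shares would help with that.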

This covers just about everything, except for some of those natural resources. It's not a requirement, just nice to have.

The primary thing I will need help with, then, is translation. I will use Google Translate, but sometimes the translation is unclear and one needs a Spanish speaker.

@demorenoc

You can find raw (student level) data since 2005 for the SABER standardized tests in this R data package: https://github.com/nebulae-co/saber. On request, ICFES provides an FTP connection for researchers to access data, but it can take a while and can be messy so we packaged it. Also, map polygons at the municipal level in https://github.com/nebulae-co/colmaps.

@Deleetdk
Author

Very nice. :)
