Standardize precinct identifier format #144
Comments
Those precinct IDs come from the Census, but only in cases where a state participated in the 2010 VTD program right? |
We invent them for state and local sources. We should be more consistent there...
And there should be a crosswalk with other precinct data providers / sources.
… On Aug 1, 2018, at 11:41, Michal Migurski ***@***.***> wrote:
Those precinct IDs come from the Census, but only in cases where a state participated in the 2010 VTD program right?
|
For PlanScore, I’ve been assigning them artisanal integers. Works really well internally but not something I’ve exposed generally. |
Please make them public!
… On Aug 1, 2018, at 12:51, Michal Migurski ***@***.***> wrote:
For PlanScore, I’ve been assigning them artisanal integers. Works really well internally but not something I’ve exposed generally.
|
First, let me say how thrilled I was when I came across this project. Because it contains precinct-level geodata for the whole country, I think it can be the hub for any GIS or map election data project. I know it's a great starting point for some work I plan to do! Regarding precincts, I reviewed precinct labels from a number of states and I was disappointed to find that there is little shared rhyme or reason among them. Some use numeric codes; some use physical location names, like "city hall"; some use a combination of the two; others seem not to include labels at all. If the goal is to standardize precinct labels in a way more general than "uppercase, split on non-alphanumeric and join with single whitespace," this project will have to come up with its own novel naming scheme. I'm not sure there is a "right" answer, on its face. However, I think we can optimize the labeling for some common use cases. The work I plan to do involves joining this data set to other precinct-level data sets, e.g. data sets from here. I think a good way of standardizing the labels would be:
For example, let's say we find 10 such data sets. It's likely they'll all be at least a little different. But if we find that they all use place names to identify precincts, then we'd want to make sure to preserve place names when they're available in this data set. Because all the data sets will be different, we won't find any scheme that's perfect, but we can at least find some objective measure for "better." I also think that what @nvkelso said about data crosswalks is really important. If this data set is going to become a hub, then it needs to be as easy as possible for other people to pick up and use for their own purposes. To that point, I think that encouraging people to publish any crosswalks they create would be A Good Thing. (For example, when I do the join to the data sets linked above, I'll be happy to share a "join table" that maps this data set to those data sets.) Those joins make this data more useful; the joined data more useful; and any data that joins to either can now be mapped to both. Here are some data sets that I think it could be useful to review when trying to decide on a standard. I'm sure there are others, but hopefully these are a good start:
Just a couple of thoughts I had while elsewhere in the data set, for whatever they're worth. Hopefully they make sense. |
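For reference, the baseline rule quoted in the comment above ("uppercase, split on non-alphanumeric and join with single whitespace") could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `normalize_label` is hypothetical, and treating only ASCII letters and digits as alphanumeric is an assumption.

```python
import re


def normalize_label(label: str) -> str:
    """Uppercase, split on runs of non-alphanumeric characters,
    and join the remaining tokens with single spaces.

    Assumption: only ASCII letters and digits count as alphanumeric.
    """
    tokens = re.split(r"[^0-9A-Za-z]+", label.upper())
    # Splitting can leave empty strings at the ends; drop them.
    return " ".join(token for token in tokens if token)


print(normalize_label("  city-hall / Ward 3 "))
```

A rule like this only tidies labels within one source; it does not make "CITY HALL" in one data set match "PRECINCT 12" in another, which is why the comment argues for a scheme informed by the other data sets.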
Hi @sigpwned, thanks for your kind words and thoughtful comments. I really like the idea of x-walk concordance "join" tables with other precinct datasets. I've been wondering if this project should allow both precinct "identifier" and precinct "name" columns when both those are available in the upstream sources to make this a little easier. |
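One possible shape for such a crosswalk "join" table is a plain mapping from this project's precinct identifier to another data set's identifier. The sketch below assumes a one-to-one mapping; the function names and the identifier values are made up for illustration and do not come from any real data set.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def build_crosswalk(pairs: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Build a lookup from our precinct ID to another data set's ID.

    Raises ValueError if one of our IDs maps to two different targets,
    since this sketch assumes a one-to-one crosswalk.
    """
    crosswalk: Dict[str, str] = {}
    for ours, theirs in pairs:
        if ours in crosswalk and crosswalk[ours] != theirs:
            raise ValueError(f"conflicting mappings for {ours!r}")
        crosswalk[ours] = theirs
    return crosswalk


def translate(ids: Iterable[str], crosswalk: Dict[str, str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each of our IDs with its counterpart, or None when unmapped."""
    return [(precinct_id, crosswalk.get(precinct_id)) for precinct_id in ids]


# Made-up identifiers, for illustration only.
pairs = [("060010001", "ALAMEDA 1"), ("060010002", "ALAMEDA 2")]
xwalk = build_crosswalk(pairs)
print(translate(["060010001", "060019999"], xwalk))
```

Keeping unmapped IDs in the output (as `None`) rather than dropping them makes it easy to see how complete a given crosswalk is.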
That's an interesting idea! And it's knocked some ideas loose for me. Let me try to dump my brain while the thoughts are fresh. Based on my understanding, the goal of this project is:
Here are a few thoughts on getting there:
4 and 5 above are potentially significant issues. Regarding 4, it's difficult to know if these "duplicate" rows represent one precinct with the region split into multiple geometries, or if the rows are actually mislabeled. The only way I can think of to make that determination is to compare this data to other precinct-level data. Once we know that:
Regarding 5, it's much like 4, except that all precincts should be treated as having the same label. Teasing these apart into "real" labels is going to be fairly manual work, unfortunately. We probably can't cheat by comparing to a "known good" precinct data set because if that data set existed, presumably we'd be using that instead of the data we have. At the very least, we should be able to use this map or one like it to do the assignments. We're free to assign any IDs to updated rows we like. I think it would be wise to make those IDs look as much like other precinct ID labels as possible, but the reality is that new IDs are completely at our discretion. Any crosswalks we publish are essentially a relabel anyway, so users can substitute new labels if they wish. Regarding keeping two In any case, I think the plan of attack here should be to finish out #135 since we're close, and then generate a report sizing up 4 and 5 above, per state. We won't really know how much work this step will be until we have that report. Just my two cents. How does that seem to everyone else? |
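The per-state report proposed above might look something like the sketch below. It assumes, based on the surrounding discussion, that item 4 refers to rows sharing the same label within a state and item 5 to rows with no usable label; the function name, the `(state, label)` input shape, and the output keys are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple


def label_report(rows: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, Dict[str, int]]:
    """Count, per state, rows whose label is shared with another row
    and rows whose label is missing or blank."""
    labels_by_state = defaultdict(Counter)
    unlabeled = Counter()
    for state, label in rows:
        if label is None or not label.strip():
            unlabeled[state] += 1
        else:
            labels_by_state[state][label] += 1

    report = {}
    for state in set(labels_by_state) | set(unlabeled):
        # A label seen n > 1 times contributes all n rows to the count.
        duplicated = sum(n for n in labels_by_state[state].values() if n > 1)
        report[state] = {
            "duplicated_rows": duplicated,
            "unlabeled_rows": unlabeled[state],
        }
    return report


# Made-up rows, for illustration only.
sample = [("CA", "1"), ("CA", "1"), ("CA", "2"), ("CA", ""), ("NV", None)]
print(label_report(sample))
```

A count like this cannot tell a split geometry from a mislabeled row; it only sizes the problem so the manual review can be prioritized by state.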
Once we have #135 closed and the Here's where I think we are:
I think fixing the last two above is the top priority. I'm not sure what the best way to approach that is, per the above, but I haven't thought too hard about it yet either. Once this is handled in whatever way is deemed best, I think the next priority would be building crosswalks everywhere. How easy that is will probably depend on how we do this work. Again, just my two cents. What does everyone else think? |
Should generally be state fips (AA) & county fips (AAA) & precinct id (AAAAAAA*).
Sometimes there is both a precinct name and an ID; perhaps we should include both variants? (Though extra columns inflate the DBF.)
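A minimal sketch of composing an identifier in the proposed format, assuming that "&" means plain concatenation, that the state and county FIPS codes are zero-padded to 2 and 3 digits, and that the precinct portion is kept exactly as the source provides it. The function name `make_precinct_id` is hypothetical.

```python
def make_precinct_id(state_fips, county_fips, precinct_id) -> str:
    """Concatenate state FIPS (2 digits), county FIPS (3 digits),
    and the source's precinct ID (left unchanged)."""
    state = str(state_fips).zfill(2)
    county = str(county_fips).zfill(3)
    if not (state.isdigit() and len(state) == 2):
        raise ValueError(f"bad state FIPS: {state_fips!r}")
    if not (county.isdigit() and len(county) == 3):
        raise ValueError(f"bad county FIPS: {county_fips!r}")
    precinct = str(precinct_id).strip()
    if not precinct:
        raise ValueError("precinct ID is empty")
    return f"{state}{county}{precinct}"


print(make_precinct_id(6, 1, "0012A"))
```

Because the first five characters are fixed-width, the state and county can be recovered by slicing, whatever length the precinct portion turns out to be.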