-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standardize use of name and FIPS code in state and county fields #135
Comments
I’m down with always using FIPS.
… On Mar 5, 2018, at 15:05, Jon Culver ***@***.***> wrote:
When I do a SELECT DISTINCT state FROM 'precincts-2163' I get a list that is a mix of full state names, and FIPS codes:
01
02
04
Arkansas
06
08
09
10
11
12
Georgia
Hawaii
16
17
18
Iowa
Kansas
Kentucky
22
23
24
25
26
Minnesota
28
(null)
Missouri
30
31
32
33
34
35
36
37
38
39
40
41
Pennsylvania
45
46
47
48
49
50
Virginia
Washington
West Virginia
55
Wyoming
72
The county field is similar. I feel the need to standardize, and certainly see the value in having, potentially, both forms (if we'd want, then — state, state_fips, county, and county_fips).
Any thoughts?
Potential solutions could include:
Hard coding state name or state FIPS code in each command
Doing a SQLite join to a static CSV file with name and FIPs codes; e.g. ogr2ogr -F CSV state_fips.csv render/cb_2013_us_state_20m-2163.shp -dialect SQLite -sql "SELECT name AS name, statefp AS fips FROM 'cb_2013_us_state_20m-2163'"
Similar approach but for county codes.
Happy to run with this and submit a PR, but would love to have more input, first.
Thanks!
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub, or mute the thread.
|
What I've done is have a hard-coded array of FIPS - abbreviation - state
name. This was for personal use. Gor largee distribution I think .csv
of the same would be more appropriate
On Mar 5, 2018 5:52 PM, "Nathaniel V. KELSO" <[email protected]>
wrote:
… I’m down with always using FIPS.
> On Mar 5, 2018, at 15:05, Jon Culver ***@***.***> wrote:
>
> When I do a SELECT DISTINCT state FROM 'precincts-2163' I get a list
that is a mix of full state names, and FIPS codes:
>
> 01
> 02
> 04
> Arkansas
> 06
> 08
> 09
> 10
> 11
> 12
> Georgia
> Hawaii
> 16
> 17
> 18
> Iowa
> Kansas
> Kentucky
> 22
> 23
> 24
> 25
> 26
> Minnesota
> 28
> (null)
> Missouri
> 30
> 31
> 32
> 33
> 34
> 35
> 36
> 37
> 38
> 39
> 40
> 41
> Pennsylvania
> 45
> 46
> 47
> 48
> 49
> 50
> Virginia
> Washington
> West Virginia
> 55
> Wyoming
> 72
> The county field is similar. I feel the need to standardize, and
certainly see the value in having, potentially, both forms (if we'd want,
then — state, state_fips, county, and county_fips).
>
> Any thoughts?
>
> Potential solutions could include:
>
> Hard coding state name or state FIPS code in each command
> Doing a SQLite join to a static CSV file with name and FIPs codes; e.g.
ogr2ogr -F CSV state_fips.csv render/cb_2013_us_state_20m-2163.shp
-dialect SQLite -sql "SELECT name AS name, statefp AS fips FROM
'cb_2013_us_state_20m-2163'"
> Similar approach but for county codes.
> Happy to run with this and submit a PR, but would love to have more
input, first.
>
> Thanks!
>
> —
> You are receiving this because you are subscribed to this thread.
> Reply to this email directly, view it on GitHub, or mute the thread.
>
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#135 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AB7NQe67xMtLEetxUhumkfSNPNizIu_xks5tbc-rgaJpZM4Sd0oH>
.
|
👍 to FIPS always. |
Let’s switch to always FIPS then.
… On Mar 5, 2018, at 19:53, Michal Migurski ***@***.***> wrote:
👍 to FIPS always.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
The state portion of this was taken care of in #146. |
I've done a little work on the data set and I believe I've got (a) all precincts assigned to a county; and (b) all counties standardized to FIPS and name. What is the best way to contribute this data back? One pull request overall, one pull request per state, ...? Most of the work was done in PostGIS, so I'm not 100% sure what the best way to give the data back will be. |
Nice! Can you submit a small PR for just one or 2 states to demonstrate the changes? Then we can talk about the rest...
For Postgres... is this a 1 time operation? Ideally it’s something that can be done each makefile build. Like with OGR or Mapshaper?
… On Sep 9, 2018, at 14:33, Andy Boothe ***@***.***> wrote:
I've done a little work on the data set and I believe I've got (a) all precincts assigned to a county; and (b) all counties standardized to FIPS and name. What is the best way to contribute this data back? One pull request overall, one pull request per state, ...? Most of the work was done in PostGIS, so I'm not 100% sure what the best way to give the data back will be.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Thanks for the quick response!
Absolutely. I think that's a great approach.
A lot of this work was by-hand, so I'm afraid it's going to be a bit of a mixed bag, at least the way I've done it. However, I'm much more of a generalist than a GIS expert, so it's also totally possible that someone with more GIS experience could take the work I've done and figure out a way to stitch it into a more build-style process. I'll try to open a couple of different PRs demonstrating different kind of work; link to those PRs here in a separate comment; and include in that comment the other work that I've done. Also, THANK YOU for providing a great foundation here! I think making this data generally available is really important work. I hope I can help. |
I was only able to open one ticket/PR this evening, but I did take the time to learn how to integrate the change into the Makefile. Hopefully that's at least a reasonable prototype for the other changes I have. The new ticket is #149. The new PR is #150. Here are the other changes I have. I did the work about a week ago, so it's possible that you've already fixed some of these. I think all this work can be considered part and parcel of this issue. I have organized them as a checklist to show the work I have yet to port over:
I also did some additional work that I think is probably beyond the scope of this issue that I will handle through another issue:
Hopefully those make sense! Are these changes valuable? If so, what do you think is the best way to get these changes into the data set? Ideally, via the Makefile, right? |
Changes to PA have been made per issue #155 / PR #156. Note that this change includes only county FIPS standardization and not precinct standardization. PA precincts do not always have names, and when names are present they are not unique within a county (e.g.: fid=274 and fid=298 both have county 107 and precinct Wd South). How should precinct IDs be generated in this case? FIDs look like a good candidate, but without knowing more about how they were generated it's hard to recommend them. |
Changes to KS have been made per issue #157 / PR #158. Note that this change includes only county FIPS standardization and not precinct standardization. KS precincts have many different labeling schemes, several of which have no obvious mapping back to a unique 4-digit numerical code (e.g., "Overland Park 1-01"). How should precinct IDs be handled in this case? |
Changes to WA have been made per issue #159 / PR #160. Note that this change includes only county FIPS standardization and not precinct standardization. WA precincts have a very consistent naming scheme and it would be easy to transform the existing names into any one of several good, standard, cross-compatible alternatives. However, it must first be decided what format to adopt. How should precinct IDs be handled in this case? It's important to settle this question for WA and the other states above. But you folks are way ahead of me, per #144. 🙂 |
@sigpwned Thanks for your progress on the county part of this issue! Note I changed your text in #135 (comment) to use Markdown for the checkbox state :) |
Thrilled to help! Very glad I can give the data back.
Hot damn! That's much better. I didn't know you could do that. TIL! Time to go review the markdown docs... I'll continue the work laid out above and then re-assess current state. Once I think the data is in a state appropriate to closing this ticket, I'll provide some PostGIS queries of the national file as evidence that state and county FIPS codes have been populated appropriately. |
Currently, this project's model is that each precinct is associated with exactly one county, which (generally) fully encompasses it. I like that model! It's simple and easy to understand. However, it turns out that Iowa's voter precincts are typically bigger than their counties. (In fact, precincts regularly fully encompass multiple counties!) How bizarre. How should this be handled? I can think of a few options that will or won't work:
Does anyone have an opinion on how to handle? I'm leaning towards 4 above. I'll check to see if the other states without counties called out above -- EDIT -- I'm an idiot. The Iowa county file doesn't actually contain counties; rather, it appears to contain "townships." Counties in Iowa appear to be made up of multiple townships. I am re-evaluating now. EDIT -- Update for Iowa has been pushed, per issue #161 / PR #162. All later geo-based updates will depend on this PR because it includes an update to the Dockerfile that enables spatial functions to be used when manipulating GPKG files. |
I'm fine with option 4 to skip it... or to put the "majority" county as a
connivence there.
(The same can also be said for postal codes...)
…On Fri, Sep 21, 2018 at 10:02 AM Andy Boothe ***@***.***> wrote:
Currently, this project's model is that each precinct is associated with
exactly one county, which (generally) fully encompasses it.
I like that model! It's simple and easy to understand. However, it turns
out that Iowa's voter precincts are typically bigger than their counties.
(In fact, precincts regularly fully encompass multiple counties!) How
bizarre.
How should this be handled? I can think of a few options that will or
won't work:
1. If SQLite supported arrays, I'd be inclined to suggest we change
the type of the county field from CHAR(3) to ARRAY(CHAR(3)), but
SQLite does not appear to support arrays.
2. Another (far hackier) option would be to make the county field a
comma-separated list of county FIPS codes, but SQLite does not appear to
support a SPLIT function either, so that would be problematic to work
with.
3. Another good option would be to force each state's GPKG to have two
tables -- state and (for example) precinctcounty -- which would encode
a one-to-many relationship between a precinct and any counties.
4. Another option would be just to leave Iowa's county field blank
because it doesn't fit the model, and incldue a precinctcounty table
in Iowa only.
Does anyone have an opinion on how to handle? I'm leaning towards 4 above.
I'll check to see if the other states without counties called out above --
CA, DE, and NM -- have the same problem now.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#135 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AA0EO_EVycBcWkFfIsNbFk_A9b7lTPLiks5udRungaJpZM4Sd0oH>
.
|
Fortunately, I was wrong about the Iowa regions! The file labeled "counties" actually contained "townships" that make up "counties" in Iowa. Once I merged the townships into counties, everything turned out fine. I just commented too early. :) |
OK, looks like all the PRs are open. Once they're all merged, I'll build the nation file and check for counties that aren't populated, or are populated with something other than a valid county FIPS code. I'll rinse and repeat until the counties are all done, or we've fixed as much as makes sense. I'll keep folks updated here. Once the counties are fully populated, I'll recommend closing this issue! EDIT: Looks like there are 4 more states with county FIPS issues. Fortunately, these issues can be handled with simple
EDIT: I've attempted to proactively check the county labels as if all the above PRs had been accepted and found the 4 issues above. All county labels are now numeric, which is great! However, some county labels are not exactly 3 chars or 5 chars, so there are some zero padding issues. I'll take a peek at those next. EDIT: There are several states that use un-padded FIPS codes:
|
Woot! Let’s see what Migurski says :)
I merged in Nevada data last night, and have Ohio data to add this week then I’m all caught up.
… On Sep 23, 2018, at 08:55, Andy Boothe ***@***.***> wrote:
OK, looks like all the PRs are open. Once they're all merged, I'll build the nation file and check for counties that aren't populated, or are populated with something other than a valid county FIPS code. I'll rinse and repeat until the counties are all done, or we've fixed as much as makes sense. I'll keep folks updated here. Once the counties are fully populated, I'll recommend closing this issue!
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
The above PRs are now open. Georgia FIPS codes did not need padding because the "unpadded" counties are in fact missing. IMO, this can be handled separately, as it's a different issue. Once the above PRs are pulled, I think this issue is ready to close! |
Party time! Thanks for all your work on this :)
… On Sep 23, 2018, at 15:54, Andy Boothe ***@***.***> wrote:
The above PRs are now open. Georgia FIPS codes did not need padding because the "unpadded" counties are in fact missing. IMO, this can be handled separately, as it's a different issue. Once the above PRs are pulled, I think this issue is ready to close!
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Thrilled to help! I think making this kind of data generally available is important. And I'm looking forward to doing some fun analyses with it, too. 😄 |
Having slept on it, there's one more thing we might want to handle in this issue. While all precincts now use state and county FIPS codes, some precincts use If we standardize to one of the above for county IDs, then I struggle to think of any other work that could be logically assigned to this issue. Regarding which way to lean -- Whichever way it goes, I think it makes sense to handle that in just one PR, rather than one per state. |
Whoo hoo! PR merge party last night! The last fix I can see that this issue may require is standardizing county FIPS codes as |
I'd suggest let's do it here, since most of the discussion is already here? 🤷♂️ Not at all strongly held, though. |
When I do a
SELECT DISTINCT state FROM 'precincts-2163'
I get a list that is a mix of full state names, and FIPS codes:The
county
field is similar. I feel the need to standardize, and certainly see the value in having, potentially, both forms (if we'd want, then —state
,state_fips
,county
, andcounty_fips
).Any thoughts?
Potential solutions could include:
ogr2ogr -F CSV state_fips.csv render/cb_2013_us_state_20m-2163.shp -dialect SQLite -sql "SELECT name AS name, statefp AS fips FROM 'cb_2013_us_state_20m-2163'"
Happy to run with this and submit a PR, but would love to have more input, first.
Thanks!
The text was updated successfully, but these errors were encountered: