Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardize use of name and FIPS code in state and county fields #135

Open
bahoo opened this issue Mar 5, 2018 · 28 comments
Open

Standardize use of name and FIPS code in state and county fields #135

bahoo opened this issue Mar 5, 2018 · 28 comments

Comments

@bahoo
Copy link

bahoo commented Mar 5, 2018

When I do a SELECT DISTINCT state FROM 'precincts-2163' I get a list that is a mix of full state names, and FIPS codes:

01
02
04
Arkansas
06
08
09
10
11
12
Georgia
Hawaii
16
17
18
Iowa
Kansas
Kentucky
22
23
24
25
26
Minnesota
28
(null)
Missouri
30
31
32
33
34
35
36
37
38
39
40
41
Pennsylvania
45
46
47
48
49
50
Virginia
Washington
West Virginia
55
Wyoming
72

The county field is similar. I feel the need to standardize, and certainly see the value in having, potentially, both forms (if we'd want, then — state, state_fips, county, and county_fips).

Any thoughts?

Potential solutions could include:

  1. Hard coding state name or state FIPS code in each command
  2. Doing a SQLite join to a static CSV file with name and FIPs codes; e.g. ogr2ogr -F CSV state_fips.csv render/cb_2013_us_state_20m-2163.shp -dialect SQLite -sql "SELECT name AS name, statefp AS fips FROM 'cb_2013_us_state_20m-2163'"
  3. Similar approach but for county codes.

Happy to run with this and submit a PR, but would love to have more input, first.

Thanks!

@nvkelso
Copy link
Owner

nvkelso commented Mar 5, 2018 via email

@happyjack27
Copy link

happyjack27 commented Mar 6, 2018 via email

@migurski
Copy link
Collaborator

migurski commented Mar 6, 2018

👍 to FIPS always.

@nvkelso
Copy link
Owner

nvkelso commented Mar 6, 2018 via email

@nvkelso
Copy link
Owner

nvkelso commented Sep 3, 2018

The state portion of this was taken care of in #146.

@sigpwned
Copy link
Contributor

sigpwned commented Sep 9, 2018

I've done a little work on the data set and I believe I've got (a) all precincts assigned to a county; and (b) all counties standardized to FIPS and name. What is the best way to contribute this data back? One pull request overall, one pull request per state, ...? Most of the work was done in PostGIS, so I'm not 100% sure what the best way to give the data back will be.

@nvkelso
Copy link
Owner

nvkelso commented Sep 9, 2018 via email

@sigpwned
Copy link
Contributor

Thanks for the quick response!

Can you submit a small PR for just one or 2 states to demonstrate the changes? Then we can talk about the rest...

Absolutely. I think that's a great approach.

For Postgres... is this a 1 time operation? Ideally it’s something that can be done each makefile build. Like with OGR or Mapshaper?

A lot of this work was by-hand, so I'm afraid it's going to be a bit of a mixed bag, at least the way I've done it. However, I'm much more of a generalist than a GIS expert, so it's also totally possible that someone with more GIS experience could take the work I've done and figure out a way to stitch it into a more build-style process.

I'll try to open a couple of different PRs demonstrating different kind of work; link to those PRs here in a separate comment; and include in that comment the other work that I've done.

Also, THANK YOU for providing a great foundation here! I think making this data generally available is really important work. I hope I can help.

@sigpwned
Copy link
Contributor

sigpwned commented Sep 10, 2018

I was only able to open one ticket/PR this evening, but I did take the time to learn how to integrate the change into the Makefile. Hopefully that's at least a reasonable prototype for the other changes I have.

The new ticket is #149. The new PR is #150.

Here are the other changes I have. I did the work about a week ago, so it's possible that you've already fixed some of these. I think all this work can be considered part and parcel of this issue. I have organized them as a checklist to show the work I have yet to port over:

I also did some additional work that I think is probably beyond the scope of this issue that I will handle through another issue:

  • I aggregated Washington precincts spread across multiple records into a single MultiPolygon record.
    • It looks like precincts in WA can consist of multiple non-contiguous regions
    • Each original record seemed to represent one contiguous region
    • This action puts multiple non-contiguous regions into one record

Hopefully those make sense! Are these changes valuable? If so, what do you think is the best way to get these changes into the data set? Ideally, via the Makefile, right?

@sigpwned
Copy link
Contributor

sigpwned commented Sep 15, 2018

Changes to OR have been made per issue #149 / PR #150.

@sigpwned
Copy link
Contributor

sigpwned commented Sep 15, 2018

Changes to HI have been made per issue #151 / PR #152.

@sigpwned
Copy link
Contributor

Changes to VA have been made per issue #153 / PR #154.

@sigpwned
Copy link
Contributor

sigpwned commented Sep 15, 2018

Changes to PA have been made per issue #155 / PR #156.

Note that this change includes only county FIPS standardization and not precinct standardization. PA precincts do not always have names, and when names are present they are not unique within a county (e.g.: fid=274 and fid=298 both have county 107 and precinct Wd South).

How should precinct IDs be generated in this case? FIDs look like a good candidate, but without knowing more about how they were generated it's hard to recommend them.

@sigpwned
Copy link
Contributor

Changes to KS have been made per issue #157 / PR #158.

Note that this change includes only county FIPS standardization and not precinct standardization. KS precincts have many different labeling schemes, several of which have no obvious mapping back to a unique 4-digit numerical code (e.g., "Overland Park 1-01").

How should precinct IDs be handled in this case?

@sigpwned
Copy link
Contributor

sigpwned commented Sep 15, 2018

Changes to WA have been made per issue #159 / PR #160.

Note that this change includes only county FIPS standardization and not precinct standardization. WA precincts have a very consistent naming scheme and it would be easy to transform the existing names into any one of several good, standard, cross-compatible alternatives. However, it must first be decided what format to adopt.

How should precinct IDs be handled in this case?

It's important to settle this question for WA and the other states above. But you folks are way ahead of me, per #144. 🙂

@nvkelso
Copy link
Owner

nvkelso commented Sep 17, 2018

@sigpwned Thanks for your progress on the county part of this issue! Note I changed your text in #135 (comment) to use Markdown for the checkbox state :)

@sigpwned
Copy link
Contributor

sigpwned commented Sep 17, 2018

Thanks for your progress on the county part of this issue!

Thrilled to help! Very glad I can give the data back.

Note I changed your text in #135 (comment) to use Markdown for the checkbox state :)

Hot damn! That's much better. I didn't know you could do that. TIL! Time to go review the markdown docs...

I'll continue the work laid out above and then re-assess current state. Once I think the data is in a state appropriate to closing this ticket, I'll provide some PostGIS queries of the national file as evidence that state and county FIPS codes have been populated appropriately.

@sigpwned
Copy link
Contributor

sigpwned commented Sep 21, 2018

Currently, this project's model is that each precinct is associated with exactly one county, which (generally) fully encompasses it.

I like that model! It's simple and easy to understand. However, it turns out that Iowa's voter precincts are typically bigger than their counties. (In fact, precincts regularly fully encompass multiple counties!) How bizarre.

How should this be handled? I can think of a few options that will or won't work:

  1. If SQLite supported arrays, I'd be inclined to suggest we change the type of the county field from CHAR(3) to ARRAY(CHAR(3)), but SQLite does not appear to support arrays.
  2. Another (far hackier) option would be to make the county field a comma-separated list of county FIPS codes, but SQLite does not appear to support a SPLIT function either, so that would be problematic to work with.
  3. Another good option would be to force each state's GPKG to have two tables -- state and (for example) precinctcounty -- which would encode a one-to-many relationship between a precinct and any counties.
  4. Another option would be just to leave Iowa's county field blank because it doesn't fit the model, and incldue a precinctcounty table in Iowa only.

Does anyone have an opinion on how to handle? I'm leaning towards 4 above.

I'll check to see if the other states without counties called out above -- CA, DE, and NM -- have the same problem now.

EDIT -- I'm an idiot. The Iowa county file doesn't actually contain counties; rather, it appears to contain "townships." Counties in Iowa appear to be made up of multiple townships. I am re-evaluating now.

EDIT -- Update for Iowa has been pushed, per issue #161 / PR #162. All later geo-based updates will depend on this PR because it includes an update to the Dockerfile that enables spatial functions to be used when manipulating GPKG files.

@nvkelso
Copy link
Owner

nvkelso commented Sep 21, 2018 via email

@sigpwned
Copy link
Contributor

I'm fine with option 4 to skip it... or to put the "majority" county as a
connivence there.

Fortunately, I was wrong about the Iowa regions! The file labeled "counties" actually contained "townships" that make up "counties" in Iowa. Once I merged the townships into counties, everything turned out fine. I just commented too early. :)

@sigpwned
Copy link
Contributor

sigpwned commented Sep 23, 2018

OK, looks like all the PRs are open. Once they're all merged, I'll build the nation file and check for counties that aren't populated, or are populated with something other than a valid county FIPS code. I'll rinse and repeat until the counties are all done, or we've fixed as much as makes sense. I'll keep folks updated here. Once the counties are fully populated, I'll recommend closing this issue!

EDIT: Looks like there are 4 more states with county FIPS issues. Fortunately, these issues can be handled with simple CASE statements, so it should be possible to integrate these changes without waiting on @migurski's thoughts on the Docker updates.

EDIT: I've attempted to proactively check the county labels as if all the above PRs had been accepted and found the 4 issues above. All county labels are now numeric, which is great! However, some county labels are not exactly 3 chars or 5 chars, so there are some zero padding issues. I'll take a peek at those next.

EDIT: There are several states that use un-padded FIPS codes:

@nvkelso
Copy link
Owner

nvkelso commented Sep 23, 2018 via email

@sigpwned
Copy link
Contributor

The above PRs are now open. Georgia FIPS codes did not need padding because the "unpadded" counties are in fact missing. IMO, this can be handled separately, as it's a different issue. Once the above PRs are pulled, I think this issue is ready to close!

@nvkelso
Copy link
Owner

nvkelso commented Sep 23, 2018 via email

@sigpwned
Copy link
Contributor

Thrilled to help! I think making this kind of data generally available is important. And I'm looking forward to doing some fun analyses with it, too. 😄

@sigpwned
Copy link
Contributor

sigpwned commented Sep 24, 2018

Having slept on it, there's one more thing we might want to handle in this issue. While all precincts now use state and county FIPS codes, some precincts use $state_fips$county_fips as their county label, and some use just $county_fips.

If we standardize to one of the above for county IDs, then I struggle to think of any other work that could be logically assigned to this issue.

Regarding which way to lean -- $state_fips$county_fips or $county_fips -- because of the lack of consistency on precinct labels, I've come to believe that precinct labels should be left standalone and unstandardized. (That should be handled in #144, not here.) If precinct labels are going to be unstandardized, then I think county labels should probably be standalone too -- just $county_fips. Just my two cents.

Whichever way it goes, I think it makes sense to handle that in just one PR, rather than one per state.

@sigpwned
Copy link
Contributor

sigpwned commented Oct 2, 2018

Whoo hoo! PR merge party last night!

The last fix I can see that this issue may require is standardizing county FIPS codes as $state_fips$county_fips vs $county_fips for every state. Currently we have a mix. We can handle that here, or in a separate issue. If we want to handle that in a separate issue, we can close this one!

@bahoo
Copy link
Author

bahoo commented Oct 22, 2018

I'd suggest let's do it here, since most of the discussion is already here? 🤷‍♂️ Not at all strongly held, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants