Decision on handling JSON fields in download #61

tomschenkjr · 2015-10-27T05:08:41Z

This is for discussion purposes...

The program development team is leaning toward using JSON as the default download format. This decision is based on JSON being a more reliable and faster download method, which the Socrata development team confirmed.

However, JSON and CSV differ in an important way. JSON files have three fields that are not available in the CSV and are not shown in the web interface:

location.latitude
location.longitude
location.need_geocoding

The first two items are parsimonious breakouts of the concatenated location column that Socrata generates. However, these columns may be highly redundant, since it's common practice to upload "latitude" and "longitude" columns, which are then used to create a "location" column.

If we keep location.latitude and location.longitude, we should convert them to numbers to serve their practical function.
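A minimal Python sketch of that conversion, assuming each row has already been flattened into a dict keyed by column name and that the coordinates arrive as strings (an assumption about the JSON payload, not something confirmed in this thread). `coerce_coordinates` is a hypothetical helper name.

```python
def coerce_coordinates(rows, columns=("location.latitude", "location.longitude")):
    """Return copies of rows with the coordinate columns converted to float.

    Empty strings and None become None; columns absent from a row stay absent.
    """
    converted = []
    for row in rows:
        fixed = dict(row)  # copy so the caller's rows are not mutated
        for col in columns:
            if col in fixed:
                value = fixed[col]
                fixed[col] = None if value in (None, "") else float(value)
        converted.append(fixed)
    return converted


# Hypothetical rows; the values are illustrative only.
rows = [
    {"id": "1", "location.latitude": "41.88", "location.longitude": "-87.63"},
    {"id": "2", "location.latitude": "", "location.longitude": None},
]
print(coerce_coordinates(rows))
```

Treating empty strings as missing keeps a blank coordinate from raising an error in the middle of a download.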

The location.needs_geocoding field is less useful. It's an internal flag Socrata uses for geocoding so it can display the data on its maps. It's pertinent when the location column holds an address rather than lat/long. I don't see much reason to keep this column except it's the easier thing to do.

Something to keep in mind is the consistency of what one sees in R versus the web browser. Any valid SoDA endpoint passed to the package (e.g., example.com/resource/four-four.csv or example.com/resource/four-four.json) can also be opened in the browser. In our case, it always chooses JSON which can be inconsistent with the CSV columns. We need to weigh this in the discussion.
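One way to picture the "always choose JSON" behavior is a sketch that rewrites whatever endpoint the user passes to its .json form while remembering which format was actually requested. This assumes the format is read from the extension of the last path segment; `to_json_endpoint` is a hypothetical helper, not part of the package, and it uses only Python's standard `urllib.parse`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_json_endpoint(url):
    """Rewrite a SoDA resource URL to its .json form.

    Returns (json_url, requested_format). A URL with no extension is treated
    as a JSON request; extensions other than .csv/.json raise ValueError.
    """
    parts = urlsplit(url)
    base, dot, ext = parts.path.rpartition(".")
    if dot and "/" not in ext:
        # The last path segment has an extension such as ".csv".
        requested = ext.lower()
        if requested not in ("csv", "json"):
            raise ValueError(f"unsupported format: {ext}")
    else:
        # No extension: fetch JSON and report JSON as the requested format.
        base, requested = parts.path, "json"
    # Keep scheme, host, query string and fragment; only the path changes.
    json_url = urlunsplit(parts._replace(path=base + ".json"))
    return json_url, requested


print(to_json_endpoint("https://data.edmonton.ca/resource/sy89-z97q.csv"))
```

Keeping the requested format around is what would later allow the column set to be adjusted for users who asked for CSV.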

Open for discussion on the best way to handle these three columns: keep them, drop (some of) them, or use trickier logic to align them to the request (e.g., drop them for CSV requests, keep them otherwise).
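The "trickier logic" option could look roughly like this Python sketch, assuming rows are flattened dicts and that the originally requested format is known. The column names are spelled exactly as in the list of three fields above; `align_columns` is a hypothetical name.

```python
# The three JSON-only columns, spelled as in the list of fields above.
JSON_ONLY_COLUMNS = ("location.latitude", "location.longitude", "location.need_geocoding")


def align_columns(rows, requested_format):
    """Drop the JSON-only columns for CSV requests; return rows unchanged otherwise."""
    if requested_format.lower() != "csv":
        return rows
    return [
        {name: value for name, value in row.items() if name not in JSON_ONLY_COLUMNS}
        for row in rows
    ]


# Hypothetical row fetched as JSON on behalf of a CSV request.
rows = [{"id": "1", "location.latitude": "41.88", "location.longitude": "-87.63",
         "location.need_geocoding": False}]
print(align_columns(rows, "csv"))
```

Under this design a CSV request yields the same columns as the portal's CSV export, while JSON requests keep everything.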

/cc @dmpe @geneorama

dmpe · 2015-10-27T10:50:59Z

Just for the record:

> The program development team is leaning toward using JSON as the default download format.

I still believe that JSON and CSV should both be default formats, and the user should choose between them. But, yes, for the sake of consensus, I agree to support JSON only and convert .csv requests to .json.

Others:

> If we keep location.latitude and location.longitude, we should convert them to numbers to serve their practical function.

Yes, I would keep them and convert them to numbers.

> I don't see much reason to keep this column except it's the easier thing to do.

Agreed, me too.

> Something to keep in mind is the consistency of what one sees in R versus the web browser.
> In our case, it always chooses JSON which can be inconsistent with the CSV columns.

That's easy to handle: just document it somewhere.

Finally:
'drop some of them' 👍

geneorama · 2016-02-25T15:23:15Z

I just had to work around this exact issue again, so it's fresh in my mind.

I agree with @dmpe on all points, and would add a little more.

Details about `location`

In every case that I've seen the location value, it's a duplicate of the Latitude and Longitude columns.

Names like location.latitude are just a side effect of nested data being flattened. In some sense, "latitude" is the real field name, and "location" is the name of its container.
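To make the naming concrete, here is a minimal Python sketch of that kind of flattening, assuming nested objects are joined to their parent's name with a dot. The sample record is made up for illustration; its `location` keys mirror the three fields listed at the top of the thread, and `flatten` is a hypothetical helper.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names.

    {"location": {"latitude": ...}} becomes {"location.latitude": ...};
    non-dict values (including lists) are kept as-is.
    """
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into the container, prefixing child keys with its name.
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical record; values are illustrative only.
row = {"id": "1", "location": {"latitude": "41.88", "longitude": "-87.63", "need_geocoding": False}}
print(flatten(row))
# {'id': '1', 'location.latitude': '41.88', 'location.longitude': '-87.63', 'location.need_geocoding': False}
```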

CSV vs JSON column names

I think it would be best to just use the CSV column names, which the metadata simply calls names. If the names are set after the data.frame is constructed, they can be converted with the function make.names to match the current data.frame output.
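For readers less familiar with R, here is a rough Python approximation of what `make.names` does to column names, based on its documented rules: prepend "X" when a name does not start with a letter or with a dot not followed by a digit, replace invalid characters with ".", and append "." to reserved words. It is an ASCII-only sketch that ignores the `unique` argument and NA handling; `make_names` is a hypothetical name, and R's own function remains the reference.

```python
import re

# R reserved words (see ?Reserved); make.names appends "." to these.
R_RESERVED = {
    "if", "else", "repeat", "while", "function", "for", "in", "next", "break",
    "TRUE", "FALSE", "NULL", "Inf", "NaN", "NA",
    "NA_integer_", "NA_real_", "NA_character_", "NA_complex_",
}


def make_names(names):
    """Approximate R's make.names() for ASCII column names."""
    result = []
    for name in names:
        # Valid names start with a letter, or a dot not followed by a digit.
        if not re.match(r"[A-Za-z]|\.(?!\d)", name):
            name = "X" + name
        # Anything other than letters, digits, "." and "_" becomes ".".
        name = re.sub(r"[^A-Za-z0-9._]", ".", name)
        if name in R_RESERVED:
            name += "."
        result.append(name)
    return result


print(make_names(["Campaign Website", "1st ward", "if", "ward_id"]))
# ['Campaign.Website', 'X1st.ward', 'if.', 'ward_id']
```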

JSON CONTENT

As far as I can tell nested data like location is always a duplicate of other fields, so it would be a lot cleaner to avoid nested content, and there'd be no loss of information.
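The "avoid nested content" option could look like the Python sketch below, assuming each JSON record is parsed into a dict. It keeps only scalar fields, so it drops every nested field, not only the ones that duplicate other columns. `drop_nested` and the sample record are hypothetical.

```python
def drop_nested(record):
    """Keep only scalar fields; drop any field whose value is a dict or list."""
    return {key: value for key, value in record.items() if not isinstance(value, (dict, list))}


# Hypothetical record where "location" duplicates the flat coordinate columns.
record = {
    "latitude": "41.88",
    "longitude": "-87.63",
    "location": {"latitude": "41.88", "longitude": "-87.63"},
}
print(drop_nested(record))
# {'latitude': '41.88', 'longitude': '-87.63'}
```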

tomschenkjr · 2016-02-25T16:09:51Z

The location field can take on other forms besides lat/long. By default (when uploading data), Socrata doesn't duplicate the fields. However, we've set our portal to show latitude, longitude, and location (even though it's duplicative). But duplication cannot be guaranteed.

The location field can also be something besides coordinates. For instance, the location field can be an address (e.g., 123 Main Street). We've avoided that on our portal (it's not useful in most mapping programs), but it happens occasionally. It's probably more frequent on other data portals; it depends on how each portal is set up.

geneorama · 2016-02-25T21:54:33Z

I found a situation where you would not get all the columns in JSON that you would in CSV (if you skipped the "nested" fields as I was proposing).
https://data.edmonton.ca/resource/sy89-z97q
https://data.edmonton.ca/resource/sy89-z97q.csv
https://data.edmonton.ca/resource/sy89-z97q.json

The campaign_website field doesn't need to have two elements (since the second one is always NULL), but it does.
