Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redesign default .json format #782

Closed
simonw opened this issue May 30, 2020 · 55 comments
Closed

Redesign default .json format #782

simonw opened this issue May 30, 2020 · 55 comments

Comments

@simonw
Copy link
Owner

simonw commented May 30, 2020

The default JSON just isn't right. I find myself using ?_shape=array for almost everything I build against the API.

@simonw simonw added this to the Datasette 1.0 milestone May 30, 2020
@simonw
Copy link
Owner Author

simonw commented May 30, 2020

Here's the default JSON at the moment: https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2

{
  "database": "fixtures",
  "table": "compound_three_primary_keys",
  "is_view": false,
  "human_description_en": "",
  "rows": [
    [
      "a",
      "a",
      "a",
      "a-a-a"
    ],
    [
      "a",
      "a",
      "b",
      "a-a-b"
    ]
  ],
  "truncated": false,
  "filtered_table_rows_count": 1001,
  "expanded_columns": [],
  "expandable_columns": [],
  "columns": [
    "pk1",
    "pk2",
    "pk3",
    "content"
  ],
  "primary_keys": [
    "pk1",
    "pk2",
    "pk3"
  ],
  "units": {},
  "query": {
    "sql": "select pk1, pk2, pk3, content from compound_three_primary_keys order by pk1, pk2, pk3 limit 3",
    "params": {}
  },
  "facet_results": {},
  "suggested_facets": [
    {
      "name": "pk1",
      "toggle_url": "http://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2&_facet=pk1"
    },
    {
      "name": "pk2",
      "toggle_url": "http://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2&_facet=pk2"
    },
    {
      "name": "pk3",
      "toggle_url": "http://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2&_facet=pk3"
    }
  ],
  "next": "a,a,b",
  "next_url": "http://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2&_next=a%2Ca%2Cb",
  "query_ms": 17.56119728088379,
  "source": "tests/fixtures.py",
  "source_url": "https://github.com/simonw/datasette/blob/master/tests/fixtures.py",
  "license": "Apache License 2.0",
  "license_url": "https://github.com/simonw/datasette/blob/master/LICENSE"
}

There's a lot of stuff in there. This increases the risk that future minor changes might break existing API consumers.

It returns rows as a list of lists of values, and expects you to correlate these with the list of columns. I originally designed it like this because I thought this was a more efficient representation than repeating the column names in a dictionary for every row. With hindsight this was a bad optimization - I always use ?shape=array because it's more convenient, and gzip encoding of the response means there's no bandwidth saving. Users who want that efficiency should request it using a custom ?_shape=.

@simonw
Copy link
Owner Author

simonw commented May 30, 2020

https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2&_shape=array returns this:

[
  {
    "pk1": "a",
    "pk2": "a",
    "pk3": "a",
    "content": "a-a-a"
  },
  {
    "pk1": "a",
    "pk2": "a",
    "pk3": "b",
    "content": "a-a-b"
  }
]

There's one big problem with this format: it doesn't provide any space for pagination information.

@simonw
Copy link
Owner Author

simonw commented Sep 11, 2020

Another idea: the default output could be the list of dicts:

[
  {
    "pk1": "a",
    "pk2": "a",
    "pk3": "a",
    "content": "a-a-a"
  },
  ...
]

BUT... I could include pagination information in the HTTP headers - as seen in the WordPress REST API or the GitHub API:

~ % curl -s -i 'https://api.github.com/repos/simonw/datasette/commits' | head -n 40
HTTP/1.1 200 OK
server: GitHub.com
date: Fri, 11 Sep 2020 21:37:46 GMT
content-type: application/json; charset=utf-8
status: 200 OK
cache-control: public, max-age=60, s-maxage=60
vary: Accept, Accept-Encoding, Accept, X-Requested-With
etag: W/"71c99379743513394e880c6306b66bf9"
last-modified: Fri, 11 Sep 2020 21:32:54 GMT
x-github-media-type: github.v3; format=json
link: <https://api.github.com/repositories/107914493/commits?page=2>; rel="next", <https://api.github.com/repositories/107914493/commits?page=44>; rel="last"
access-control-expose-headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, Deprecation, Sunset
access-control-allow-origin: *
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 1; mode=block
referrer-policy: origin-when-cross-origin, strict-origin-when-cross-origin
content-security-policy: default-src 'none'
X-Ratelimit-Limit: 60
X-Ratelimit-Remaining: 55
X-Ratelimit-Reset: 1599863850
X-Ratelimit-Used: 5
Accept-Ranges: bytes
Content-Length: 118240
X-GitHub-Request-Id: EC76:0EAD:313F40:5291A4:5F5BEE37

[
  {
    "sha": "d02f6151dae073135a22d0123e8abdc6cbef7c50",
    "node_id": "MDY6Q29tbWl0MTA3OTE0NDkzOmQwMmY2MTUxZGFlMDczMTM1YTIyZDAxMjNlOGFiZGM2Y2JlZjdjNTA=",
    "commit": {

Alternative shapes would provide the pagination information (and other extensions) in the JSON, e.g.:

/squirrels/squirrels.json?_shape=paginated

{
  "rows": [
  {
    "pk1": "a",
    "pk2": "a",
    "pk3": "a",
    "content": "a-a-a"
  }
  ],
  "pagination": {
    "next": "234",
    "count": 442
  }
}

@simonw
Copy link
Owner Author

simonw commented Sep 12, 2020

I'm going to hack together a preview of this in a branch and deploy it somewhere so people can see what I've got planned. Much easier to evaluate a working prototype than static examples.

@simonw
Copy link
Owner Author

simonw commented Sep 12, 2020

(I think I may have been over-thinking the details of this is for a couple of years now.)

@simonw
Copy link
Owner Author

simonw commented Sep 12, 2020

I'd like to revisit the idea of using ?_extra=x to opt-in to extra blocks of JSON, from #262

@simonw
Copy link
Owner Author

simonw commented Sep 12, 2020

Would it be so bad if the default format had a "rows" key containing the array of rows? Maybe it wouldn't. The reason I always use ?_shape=array is because I want an array of objects, rather than an array of arrays that I have to match up again with their columns.

A default format that's an object rather than array also gives something for the ?_extra= parameter to add its extras to.

@simonw
Copy link
Owner Author

simonw commented Sep 12, 2020

Plan: release a new release of Datasette (probably 0.49) with the new JSON API design, but provide a plugin called something like datasette-api-0-48 which runs as ASGI wrapping middleware and internally rewrites incoming requests to e.g. /db/table.json to behave if they have the ?_extra= params on them necessary to produce the 0.48 version of the JSON.

Anyone who has built applications against 0.48 can install that plugin.

@simonw
Copy link
Owner Author

simonw commented Oct 11, 2020

Since the total count can be expensive to calculate, I'm inclined to make that an opt-in extra - maybe ?_extra=count.

Based on that, the default JSON shape could look something like this:

{
  "rows": [{"id": 1}, {"id": 2}],
  "next": "2",
  "next_url": "/db/table?_next=2"
}

And with ?_extra=count:

{
  "rows": [{"id": 1}, {"id": 2}],
  "next": "2",
  "next_url": "/db/table?_next=2",
  "count": 31
}

@simonw
Copy link
Owner Author

simonw commented Oct 11, 2020

Should that default also include "columns" as a list of strings? That would be duplicate data of the keys in the "rows" list of objects, and I've never found myself wanting it in my own code - so I'm going to say no.

@simonw
Copy link
Owner Author

simonw commented Oct 11, 2020

Maybe .jsonfull becomes a new renderer that returns ALL of the defined ?_extra= blocks.

Or... ?_extra=all turns on ALL of the available information blocks (some of which can come from plugins).

@simonw
Copy link
Owner Author

simonw commented Oct 11, 2020

The core issue that I keep reconsidering is whether the default .json representation should be an object or a list.

Arguments in favour of a list:

  • It's what I always want. Almost all of the code that I've written against the API myself uses ?_shape=array.
  • It's really easy to use. You can pipe it to e.g. sqlite-utils insert, you can load it into JavaScript without thinking about it.

Arguments against:

  • Nowhere to put pagination or total counts. I added pagination to the link: HTTP header in Add Link: pagination HTTP headers #1014 (inspired by the WordPress and GitHub APIs) but I haven't solved for total count, and there's other stuff that's useful like "truncated": true to indicate that more than 1000 results were returned and they were truncated.
  • An array is inherently non-extensible: if the root item is an object it's easy to add new features to it in a backwards-compatible way in the future. An array is a fixed format.

But maybe that last point is a positive? It ensures the default .json format remains completely predictable forever.

If .json DID default to an array of objects, the ?_shape= argument could still be used to get back alternative formats.

Maybe .json?_extra=total changes the shape of that default to be this instead:

{
  "rows": [{"id": 1}, {"id": 2}],
  "total": 104
}

The thing I care about most though is next_url. That could be provided like so:

.json?_extra=total&_extra=next - alternative syntax .json?_extra=total,next:

{
  "rows": [{"id": 1}, {"id": 2}],
  "total": 104,
  "next": "2",
  "next_url": "/db/table.json?_extra=total&_extra=next&_next=2"
}

This is feeling a bit verbose for a common combination though.

@simonw simonw added this to the Datasette 1.0 milestone Oct 11, 2020
@simonw
Copy link
Owner Author

simonw commented Oct 11, 2020

I'm going to prototype what it would look like if the default shape was a list of objects and ?_extra= turns that into an object with a rows key, in a plugin. As a separate extension (maybe .json-preview).

@simonw
Copy link
Owner Author

simonw commented Oct 11, 2020

Building this plugin reminded me of an oddity of the register_output_renderer() plugin hook: one of the arguments that can be passed to it is data, which is the default internal data structure created by Datasette - but I deliberately avoided documenting that on https://docs.datasette.io/en/stable/plugin_hooks.html#register-output-renderer-datasette because it's not a stable interface.

That's not ideal. I'd like custom renderers to be able to access this data to get at things like suggested facets, on an opt-in basis.

So maybe that kind of stuff is re-implemented as "extras" which are awaitable callables - then renderer plugins can call the extras that they need to as part of their execution.

To illustrate the problem (in this case the need to access next_url) here's my first prototype of the plugin:

from datasette import hookimpl
from datasette.utils.asgi import Response


@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "json-preview",
        "render": json_preview,
    }


def json_preview(data, columns, rows):
    next_url = data.get("next_url")
    headers = {}
    if next_url:
        headers["link"] = '<{}>; rel="next"'.format(next_url)
    return Response.json([dict(zip(columns, row)) for row in rows], headers=headers)

@simonw
Copy link
Owner Author

simonw commented Oct 11, 2020

Here's the datasette-json-preview plugin I'll be using to experiment with different formats: https://github.com/simonw/datasette-json-preview

@earthboundkid
Copy link

earthboundkid commented Oct 20, 2020

I vote against headers. It has a lot of strikes against it: poor discoverability, new developers often don’t know how to use them, makes CORS harder, makes it hard to use eg with JQ, needs ad hoc specification for each bit of metadata, etc.

The only advantage of headers is that you don’t need to do .rows, but that’s actually good as a data validation step anyway—if .rows is missing assume there’s an error and do your error handling path instead of parsing the rest.

@simonw
Copy link
Owner Author

simonw commented Oct 20, 2020

Great point about CORS, I hadn't considered that.

I think I'm going to keep the Link: header (added in #1014) because I quite enjoy using it with GitHub and WordPress, but I'm not going to have it be the default way of doing pagination. For the default shape I'm now leaning towards this:

{
  "total": 36,
  "rows": [{"id": 1, "name": "Cleo"}],
  "next_url": "https://latest-with-plugins.datasette.io/fixtures/facetable.json?_next=5"
}

So three keys: total, rows and next_url. Then extra keys can be added using ?_extra= with various named bundles.

@simonw
Copy link
Owner Author

simonw commented Oct 20, 2020

I'll update datasette-json-preview with that now.

@simonw
Copy link
Owner Author

simonw commented Nov 16, 2021

I'm going to take another look at this:

@simonw simonw changed the title Redesign default JSON format in preparation for Datasette 1.0 Redesign default .json format Mar 15, 2022
@simonw simonw unpinned this issue Nov 13, 2022
@simonw
Copy link
Owner Author

simonw commented Dec 31, 2022

It turned out the most significant part of this change - switching from an array of arrays to an array of objects for the "rows" key - was really easy: Datasette already had a ?_shape=arrays v.s. ?_shape=objects mechanism, so I switched which one was the default in 234230e

@simonw
Copy link
Owner Author

simonw commented Dec 31, 2022

https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2 now shows the new default:

{
  "database": "fixtures",
  "table": "compound_three_primary_keys",
  "is_view": false,
  "human_description_en": "",
  "rows": [
    {
      "pk1": "a",
      "pk2": "a",
      "pk3": "a",
      "content": "a-a-a"
    },
    {
      "pk1": "a",
      "pk2": "a",
      "pk3": "b",
      "content": "a-a-b"
    }
  ],

The old format can be had like this: https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2&_shape=arrays

@simonw
Copy link
Owner Author

simonw commented Dec 31, 2022

New thinking on the trimmed-down default. Previously I was going to use "row" and "next_url" - I now want to do this instead:

{
  "ok": true,
  "rows": [
    {
      "pk1": "a",
      "pk2": "a",
      "pk3": "a",
      "content": "a-a-a"
    },
    {
      "pk1": "a",
      "pk2": "a",
      "pk3": "b",
      "content": "a-a-b"
    }
  ],
  "next": "a,a,b"
}

If there isn't a next page it will return "next": null.

This is even more succinct. I'm OK with people having to request next_url if they don't want to construct the new URL themselves.

The "ok": true is there so it can be false for errors, consistently.

@simonw
Copy link
Owner Author

simonw commented Dec 31, 2022

@simonw
Copy link
Owner Author

simonw commented Dec 31, 2022

I'm going to rename filtered_table_rows_count to count - to match the SQL count(*) function.

simonw added a commit that referenced this issue Dec 31, 2022
@simonw
Copy link
Owner Author

simonw commented Dec 31, 2022

https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_size=2 now returns count:

{
  "database": "fixtures",
  "table": "compound_three_primary_keys",
  "is_view": false,
  "human_description_en": "",
  "rows": [
    {
      "pk1": "a",
      "pk2": "a",
      "pk3": "a",
      "content": "a-a-a"
    },
    {
      "pk1": "a",
      "pk2": "a",
      "pk3": "b",
      "content": "a-a-b"
    }
  ],
  "truncated": false,
  "count": 1001,

@simonw
Copy link
Owner Author

simonw commented Aug 10, 2023

Most of this is shipped in https://docs.datasette.io/en/1.0a3/changelog.html#a3-2023-08-09

@simonw simonw closed this as completed Aug 10, 2023
@simonw simonw unpinned this issue Aug 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants