Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

?_extra= support (draft) #1999

Merged
merged 47 commits into from
Mar 22, 2023
Merged

?_extra= support (draft) #1999

merged 47 commits into from
Mar 22, 2023

Conversation

simonw
Copy link
Owner

@simonw simonw commented Jan 21, 2023

@simonw
Copy link
Owner Author

simonw commented Jan 21, 2023

I think I'm going to have to write a new view function from scratch which completely ignores the existing BaseView/DataView/TableView hierarchy.

Here's what I get on the incoming request:

(Pdb) request.url, request.full_path, request.host, request.url_vars
('http://127.0.0.1:8001/content/repos.json', '/content/repos.json', '127.0.0.1:8001',
  {'database': 'content', 'table': 'repos', 'format': 'json'})

@simonw
Copy link
Owner Author

simonw commented Jan 21, 2023

HTML mode needs a list of renderers so it can show links to .geojson etc - can do that as a hidden extra (maybe called renderers), repeating this code:

renderers = {}
for key, (_, can_render) in self.ds.renderers.items():
it_can_render = call_with_supported_arguments(
can_render,
datasette=self.ds,
columns=data.get("columns") or [],
rows=data.get("rows") or [],
sql=data.get("query", {}).get("sql", None),
query_name=data.get("query_name"),
database=database,
table=data.get("table"),
request=request,
view_name=self.name,
)
it_can_render = await await_me_maybe(it_can_render)
if it_can_render:
renderers[key] = self.ds.urls.path(
path_with_format(
request=request, format=key, extra_qs={**url_labels_extra}
)
)

@simonw
Copy link
Owner Author

simonw commented Feb 8, 2023

Next step: get ?_next=... working (it is ignored at the moment, even though the returned JSON includes the "next" key).

Then... figure out how to render HTML and other requested formats.

Then get the tests to pass!

simonw added a commit to simonw/til that referenced this pull request Feb 8, 2023
@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

Breaking this down into smaller steps:

  • Get ?_next= working
  • Implement extensions - so .json is needed again for the JSON version, and anything without an extension is passed through a new code path for HTML
  • That HTML view should only access JSON data, which can be seen by using .context - this will require a lot of updates to templates (it may be necessary to still provide access to some helper functions though). This will form the basis of the ambition to fully document the template context.
  • Get a bunch of the existing table HTML and JSON tests to pass
  • Use those tests to refactor the nasty _next code, see ?_extra= support (draft) #1999 (comment)
  • Figure out how the register_output_renderer(datasette) plugin hook should work

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

I'm trying to get http://127.0.0.1:8001/fixtures/compound_three_primary_keys?_next=a,d,v to return the correct results.

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

I'd really like to extract this ugly logic out into a helper function:

# Handle pagination driven by ?_next=
_next = request.args.get("_next")
offset = ""
if _next:
sort_value = None
if is_view:
# _next is an offset
offset = f" offset {int(_next)}"
else:
components = urlsafe_components(_next)
# If a sort order is applied and there are multiple components,
# the first of these is the sort value
if (sort or sort_desc) and (len(components) > 1):
sort_value = components[0]
# Special case for if non-urlencoded first token was $null
if _next.split(",")[0] == "$null":
sort_value = None
components = components[1:]
# Figure out the SQL for next-based-on-primary-key first
next_by_pk_clauses = []
if use_rowid:
next_by_pk_clauses.append(f"rowid > :p{len(params)}")
params[f"p{len(params)}"] = components[0]
else:
# Apply the tie-breaker based on primary keys
if len(components) == len(pks):
param_len = len(params)
next_by_pk_clauses.append(compound_keys_after_sql(pks, param_len))
for i, pk_value in enumerate(components):
params[f"p{param_len + i}"] = pk_value
# Now add the sort SQL, which may incorporate next_by_pk_clauses
if sort or sort_desc:
if sort_value is None:
if sort_desc:
# Just items where column is null ordered by pk
where_clauses.append(
"({column} is null and {next_clauses})".format(
column=escape_sqlite(sort_desc),
next_clauses=" and ".join(next_by_pk_clauses),
)
)
else:
where_clauses.append(
"({column} is not null or ({column} is null and {next_clauses}))".format(
column=escape_sqlite(sort),
next_clauses=" and ".join(next_by_pk_clauses),
)
)
else:
where_clauses.append(
"({column} {op} :p{p}{extra_desc_only} or ({column} = :p{p} and {next_clauses}))".format(
column=escape_sqlite(sort or sort_desc),
op=">" if sort else "<",
p=len(params),
extra_desc_only=""
if sort
else " or {column2} is null".format(
column2=escape_sqlite(sort or sort_desc)
),
next_clauses=" and ".join(next_by_pk_clauses),
)
)
params[f"p{len(params)}"] = sort_value
order_by = f"{order_by}, {order_by_pks}"
else:
where_clauses.extend(next_by_pk_clauses)

I copied it in and asked ChatGPT to "List all of the variable that are either modified or assigned to by the above code":

Modified variables:

offset
where_clauses
params
order_by

Assigned variables:

sort_value
next_by_pk_clauses
order_by (when sort or sort_desc is not None)

Then I asked which variables were used as inputs, and argued with it a bit about whether it should be counting functions. Eventually got to this:

My apologies for the oversight. Here are the variables needed as input by the above code, excluding any functions:

_next
is_view
sort
sort_desc
use_rowid
pks
order_by

Note that use_rowid is actually defined earlier in terms of two of those other variables:

use_rowid = not pks and not is_view

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

So maybe I can refactor it to look a bit more like this:

sort, sort_desc, order_by = await _sort_order(
table_metadata, sortable_columns, request, order_by
)

One thing that's useful here is that is_view is handled early, like this:

if _next:
sort_value = None
if is_view:
# _next is an offset
offset = f" offset {int(_next)}"
else:
components = urlsafe_components(_next)

So if I omit the is_view bit from the extracted function I can simplify more.

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

I'm going to hold off on that refactor until later, when I have tests to show me if the refactor works or not.

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

Just noticed that _json=colname is not working, and that's because it's handled by the renderer here:

def json_renderer(args, data, view_name):
"""Render a response as JSON"""
status_code = 200
# Handle the _json= parameter which may modify data["rows"]
json_cols = []
if "_json" in args:
json_cols = args.getlist("_json")
if json_cols and "rows" in data and "columns" in data:
data["rows"] = convert_specific_columns_to_json(
data["rows"], data["columns"], json_cols
)

But that's not currently being called by my new code.

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

The ease with which I added that ?_extra=query feature in 96e94f9 made me feel really confident that this architecture is going in the right direction.

diff --git a/datasette/views/table.py b/datasette/views/table.py
index 8d3bb2c930..3e1db9c85f 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -1913,6 +1913,13 @@ async def extra_request():
             "args": request.args._data,
         }
 
+    async def extra_query():
+        "Details of the underlying SQL query"
+        return {
+            "sql": sql,
+            "params": params,
+        }
+
     async def extra_extras():
         "Available ?_extra= blocks"
         return {
@@ -1938,6 +1945,7 @@ async def extra_extras():
         extra_primary_keys,
         extra_debug,
         extra_request,
+        extra_query,
         extra_extras,
     )

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

For the HTML version, I need to decide where all of the stuff that happens in async def extra_template() is going to live.

I think it's another one of those extra functions, triggered for ?_extra=context.

async def extra_template():
nonlocal sort
display_columns, display_rows = await display_columns_and_rows(
self.ds,
database_name,
table_name,
results.description,
rows,
link_column=not is_view,
truncate_cells=self.ds.setting("truncate_cells_html"),
sortable_columns=await self.sortable_columns_for_table(
database_name, table_name, use_rowid=True
),
)
metadata = (
(self.ds.metadata("databases") or {})
.get(database_name, {})
.get("tables", {})
.get(table_name, {})
)
self.ds.update_with_inherited_metadata(metadata)
form_hidden_args = []
for key in request.args:
if (
key.startswith("_")
and key not in ("_sort", "_sort_desc", "_search", "_next")
and "__" not in key
):
for value in request.args.getlist(key):
form_hidden_args.append((key, value))
# if no sort specified AND table has a single primary key,
# set sort to that so arrow is displayed
if not sort and not sort_desc:
if 1 == len(pks):
sort = pks[0]
elif use_rowid:
sort = "rowid"
async def table_actions():
links = []
for hook in pm.hook.table_actions(
datasette=self.ds,
table=table_name,
database=database_name,
actor=request.actor,
request=request,
):
extra_links = await await_me_maybe(hook)
if extra_links:
links.extend(extra_links)
return links
# filter_columns combine the columns we know are available
# in the table with any additional columns (such as rowid)
# which are available in the query
filter_columns = list(columns) + [
table_column
for table_column in table_columns
if table_column not in columns
]
d = {
"table_actions": table_actions,
"use_rowid": use_rowid,
"filters": filters,
"display_columns": display_columns,
"filter_columns": filter_columns,
"display_rows": display_rows,
"facets_timed_out": facets_timed_out,
"sorted_facet_results": sorted(
facet_results.values(),
key=lambda f: (len(f["results"]), f["name"]),
reverse=True,
),
"form_hidden_args": form_hidden_args,
"is_sortable": any(c["sortable"] for c in display_columns),
"fix_path": self.ds.urls.path,
"path_with_replaced_args": path_with_replaced_args,
"path_with_removed_args": path_with_removed_args,
"append_querystring": append_querystring,
"request": request,
"sort": sort,
"sort_desc": sort_desc,
"disable_sort": is_view,
"custom_table_templates": [
f"_table-{to_css_class(database_name)}-{to_css_class(table_name)}.html",
f"_table-table-{to_css_class(database_name)}-{to_css_class(table_name)}.html",
"_table.html",
],
"metadata": metadata,
"view_definition": await db.get_view_definition(table_name),
"table_definition": await db.get_table_definition(table_name),
"datasette_allow_facet": "true"
if self.ds.setting("allow_facet")
else "false",
}
d.update(extra_context_from_filters)
return d

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

Figuring out what to do with display_columns_and_rows() is hard. That returns rows as this special kind of object, which is designed to be accessed from the HTML templates:

class Row:
def __init__(self, cells):
self.cells = cells
def __iter__(self):
return iter(self.cells)
def __getitem__(self, key):
for cell in self.cells:
if cell["column"] == key:
return cell["raw"]
raise KeyError
def display(self, key):
for cell in self.cells:
if cell["column"] == key:
return cell["value"]
return None
def __str__(self):
d = {
key: self[key]
for key in [
c["column"] for c in self.cells if not c.get("is_special_link_column")
]
}
return json.dumps(d, default=repr, indent=2)

@simonw
Copy link
Owner Author

simonw commented Mar 8, 2023

Aside idea: it might be interesting if there were "lazy" template variables available in the context: things that are not actually executed unless a template author requests them.

Imagine if metadata was a lazy template reference, such that custom templates that don't display any metadata don't trigger it to be resolved (which might involve additional database queries some day).

@simonw
Copy link
Owner Author

simonw commented Mar 22, 2023

To replace this code:

async def render(self, templates, request, context=None):
context = context or {}
template = self.ds.jinja_env.select_template(templates)
template_context = {
**context,
**{
"database_color": self.database_color,
"select_templates": [
f"{'*' if template_name == template.name else ''}{template_name}"
for template_name in templates
],
},
}

Maybe datasette.render_template() should optionally accept a list of templates.

https://docs.datasette.io/en/stable/internals.html#await-render-template-template-context-none-request-none - turns out it does already:

If this is a list of template file names then the first one that exists will be loaded and rendered.

It doesn't have an easy way to populate that select_templates debug template variable though.

@simonw
Copy link
Owner Author

simonw commented Mar 22, 2023

Getting close now! Only 13 failures left, mostly relating to CSV.

FAILED tests/test_csv.py::test_table_csv - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_cors_headers - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_no_header - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_with_labels - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_with_nullable_labels - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_blob_columns - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_download - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_stream - assert 1 == 101
FAILED tests/test_plugins.py::test_hook_extra_css_urls[/fixtures/sortable-expected_decoded_object2] - AssertionError: assert {'added': 15,...ortable', ...} == {'added': 15,...ortable', ...}
FAILED tests/test_plugins.py::test_hook_register_facet_classes - KeyError: 'suggested_facets'
FAILED tests/test_csv.py::test_csv_trace - AttributeError: 'NoneType' object has no attribute 'text'
FAILED tests/test_plugins.py::test_hook_extra_body_script[/fixtures/sortable-expected_extra_body_script2] - AssertionError: assert {'added': 15,...ixtures', ...} == {'added': 15,...ixtures', ...}
FAILED tests/test_plugins.py::test_hook_register_output_renderer_all_parameters - assert {'1+1': 2, 'c... 0xXXX>', ...} == {'1+1': 2, 'c... 0xXXX>', ...}
=============== 13 failed, 1287 passed, 2 skipped, 1 xfailed in 61.57s (0:01:01) ================

@simonw
Copy link
Owner Author

simonw commented Mar 22, 2023

I rebased from main. Now:

FAILED tests/test_csv.py::test_table_csv - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_cors_headers - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_no_header - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_with_labels - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_with_nullable_labels - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_blob_columns - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_download - assert 500 == 200
FAILED tests/test_csv.py::test_table_csv_stream - assert 1 == 101
FAILED tests/test_csv.py::test_csv_trace - AttributeError: 'NoneType' object has no attribute 'text'
FAILED tests/test_plugins.py::test_hook_extra_css_urls[/fixtures/sortable-expected_decoded_object2] - AssertionError: assert {'added': 15,...ortable', ...} == {'added': 15,...ortable', ...}
FAILED tests/test_plugins.py::test_hook_render_cell_demo - AttributeError: 'NoneType' object has no attribute 'string'
FAILED tests/test_plugins.py::test_hook_render_cell_async[/fixtures/simple_primary_key] - assert b'RENDER_CELL_ASYNC_RESULT' in b'<!DOCTYPE html>\n<html>\n<head>\n    <title>Error 500</title>\n    <link rel="stylesheet" href="/-/static/app.css?d5...ils != detailsClickedWithin\n    ).forEach(details => details.open = false);\n});\n</script>\n\n\n\n\n</b...
FAILED tests/test_plugins.py::test_hook_extra_body_script[/fixtures/sortable-expected_extra_body_script2] - AssertionError: assert {'added': 15,...ixtures', ...} == {'added': 15,...ixtures', ...}
FAILED tests/test_plugins.py::test_hook_register_facet_classes - KeyError: 'suggested_facets'
FAILED tests/test_plugins.py::test_hook_register_output_renderer_all_parameters - assert {'1+1': 2, 'c... 0xXXX>', ...} == {'1+1': 2, 'c... 0xXXX>', ...}
FAILED tests/test_permissions.py::test_permissions_checked[/fixtures/simple_primary_key-permissions3] - assert 500 in (200, 403)
FAILED tests/test_table_html.py::test_table_csv_json_export_interface - assert 500 == 200
FAILED tests/test_table_html.py::test_table_metadata - assert 500 == 200
FAILED tests/test_html.py::test_css_classes_on_body[/fixtures/simple_primary_key-expected_classes3] - assert 500 == 200
FAILED tests/test_html.py::test_templates_considered[/fixtures/simple_primary_key-table-fixtures-simple_primary_key.html, *table.html] - assert 500 == 200
FAILED tests/test_plugins.py::test_hook_table_actions[simple_view] - AssertionError: assert [] == [{'href': '/'...simple_view'}]
ERROR tests/test_internals_database.py::test_execute_write_fn_connection_exception - Failed: Timeout >1.0s
ERROR tests/test_internals_database.py::test_mtime_ns - Failed: Timeout >1.0s
ERROR tests/test_internals_database.py::test_mtime_ns_is_none_for_memory - Failed: Timeout >1.0s
ERROR tests/test_internals_database.py::test_is_mutable - Failed: Timeout >1.0s
ERROR tests/test_internals_database.py::test_database_memory_name - Failed: Timeout >1.0s
ERROR tests/test_internals_database.py::test_in_memory_databases_forbid_writes - Failed: Timeout >1.0s
=============== 21 failed, 1275 passed, 2 skipped, 1 xfailed, 6 errors in 59.18s ================

@simonw
Copy link
Owner Author

simonw commented Mar 22, 2023

Oh this is a bit tricky.

I have a failing test because a plugin that uses the extra_css_urls hook can't see the columns for the page.

Turns out that bit comes from here:

datasette/datasette/app.py

Lines 1203 to 1217 in 56b0758

async def _asset_urls(self, key, template, context, request, view_name):
# Flatten list-of-lists from plugins:
seen_urls = set()
collected = []
for hook in getattr(pm.hook, key)(
template=template.name,
database=context.get("database"),
table=context.get("table"),
columns=context.get("columns"),
view_name=view_name,
request=request,
datasette=self,
):
hook = await await_me_maybe(hook)
collected.extend(hook)

Which assumes the context has "columns" - but that's only now available if ?_extra=columns was passed.

Actually I think I can cheat here, since it's still getting the HTML context in order to render the template.

@simonw
Copy link
Owner Author

simonw commented Mar 22, 2023

I hacked at the CSV stuff until it worked.

I need to clean it up though, but I can do that in this separate task:

@simonw simonw marked this pull request as ready for review March 22, 2023 22:32
@simonw
Copy link
Owner Author

simonw commented Mar 22, 2023

This still needs documentation, but now the tests are passing I'm going to merge it into main!

@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Patch coverage: 87.89% and project coverage change: -4.43 ⚠️

Comparison is base (56b0758) 92.15% compared to head (921faae) 87.73%.

❗ Current head 921faae differs from pull request most recent head 69a31cd. Consider uploading reports for the commit 69a31cd to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1999      +/-   ##
==========================================
- Coverage   92.15%   87.73%   -4.43%     
==========================================
  Files          38       38              
  Lines        5560     6066     +506     
==========================================
+ Hits         5124     5322     +198     
- Misses        436      744     +308     
Impacted Files Coverage Δ
datasette/views/database.py 96.61% <ø> (ø)
datasette/views/row.py 87.82% <ø> (ø)
datasette/views/table.py 69.11% <86.76%> (-23.46%) ⬇️
datasette/renderer.py 93.33% <90.90%> (-0.87%) ⬇️
datasette/views/base.py 92.78% <91.66%> (-2.39%) ⬇️
datasette/app.py 94.48% <100.00%> (-0.01%) ⬇️
datasette/cli.py 79.93% <100.00%> (ø)
datasette/hookspecs.py 100.00% <100.00%> (ø)
datasette/publish/cloudrun.py 97.29% <100.00%> (ø)
datasette/utils/__init__.py 94.59% <100.00%> (-0.27%) ⬇️

... and 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants