-
-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Advanced class-based conversions=
mechanism
#402
Comments
From that thread, two extra ideas which it may be possible to support in a single implementation: from sqlite_utils.conversions import LongitudeLatitude
db["places"].insert(
{
"name": "London",
"lng": -0.118092,
"lat": 51.509865,
},
conversions={"point": LongitudeLatitude("lng", "lat")},
) And db["places"].insert(
{
"name": "London",
"point": LongitudeLatitude(-0.118092, 51.509865)
}
) |
So the key idea here is to introduce a new abstract base class,
|
I like the idea that the contract for |
I think this is the code that needs to become aware of this system: sqlite-utils/sqlite_utils/db.py Lines 2453 to 2469 in fea8c9b
There's an earlier branch that runs for upserts which needs to be modified too: sqlite-utils/sqlite_utils/db.py Lines 2417 to 2440 in fea8c9b
|
I wonder if there's any overlap with the goals here and the I'm not sure that's exactly what we're talking about here, but it might be a parallel with some useful ideas to borrow. |
Hah, that's interesting - I've never used that mechanism before so it wasn't something that came to mind. They seem to be using a pretty surprising trick there that takes advantage of SQLite allowing you to define a column "type" using a made-up type name, which you can then introspect later. |
I've never used it either, but it's interesting, right? Feel like I should try it for something. I'm trying to get my head around how this conversions feature might work, because I really like the idea of it. |
I have an idea for how that third option could work - the one that creates a new column using values from the existing ones: db["places"].insert(
{
"name": "London",
"lng": -0.118092,
"lat": 51.509865,
},
conversions={"point": LongitudeLatitude("lng", "lat")},
) How about specifying that the values in that
Then you could do this: db["places"].insert(
{
"name": "London",
"lng": -0.118092,
"lat": 51.509865,
},
conversions={
"point": lambda row: LongitudeLatitude(
row["lng"], row["lat"]
)
}
) Something I really like about this is that it expands the abilities of |
I'm going to write the documentation for this first, before the implementation, so I can see if it explains cleanly enough that the design appears to be sound. |
What if you did something like this: class Conversion:
def __init__(self, *args, **kwargs):
"Put whatever settings you need here"
def python(self, row, column, value): # not sure on args here
"Python step to transform value"
return value
def sql(self, row, column, value):
"Return the actual sql that goes in the insert/update step, and maybe params"
# value is the return of self.python()
return value, [] This way, you're always passing an instance, which has methods that do the conversion. (Or you're passing a SQL string, as you would now.) The You'd then use it like this: # subclass might be unneeded here, if methods are present
class LngLatConversion(Conversion):
def __init__(self, x="longitude", y="latitude"):
self.x = x
self.y = y
def python(self, row, column, value):
x = row[self.x]
y = row[self.y]
return x, y
def sql(self, row, column, value):
# value is now a tuple, returned above
s = "GeomFromText(POINT(? ?))"
return s, value
table.insert_all(rows, conversions={"point": LngLatConversion("lng", "lat"))} I haven't thought through all the implementation details here, and it'll probably break in ways I haven't foreseen, but wanted to get this idea out of my head. Hope it helps. |
My hunch is that the case where you want to consider input from more than one column will actually be pretty rare - the only case I can think of where I would want to do that is for latitude/longitude columns - everything else that I'd want to use it for (which admittedly is still mostly SpatiaLite stuff) works against a single value. The reason I'm leaning towards using the constructor for the values is that I really like the look of this variant for common conversions: db["places"].insert(
{
"name": "London",
"boundary": GeometryFromGeoJSON({...})
}
) |
The CLI version of this could perhaps look like this:
This will treat the boundary key as GeoJSON. It's equivalent to passing The combined latitude/longitude case here can be handled by combining this with the existing Any |
Yeah, the CLI experience is probably where any kind of multi-column, configured setup is going to fall apart. Sticking with GIS examples, one way I might think about this is using the fiona CLI: # assuming a database is already created and has SpatiaLite
fio cat boundary.shp | sqlite-utils insert boundaries --conversion geometry GeometryGeoJSON - Anyway, very interested to see where you land here. |
Other possible pairs: unconventional date/datetime and timezone pairs eg |
For some discussion of converters, see: #612 |
The
conversions=
parameter works like this at the moment: https://sqlite-utils.datasette.io/en/3.23/python-api.html#converting-column-values-using-sql-functionsThis proposal is to support values in that dictionary that are objects, not strings, which can represent more complex conversions - spun out from #399.
New proposed mechanism:
Here
LongitudeLatitude
is a magical value which does TWO things: it sets up theGeomFromText(?, 4326)
SQL function, and it handles converting the(51.509865, -0.118092)
tuple into aPOINT({} {})
string.This would involve a change to the
conversions=
contract - where it usually expects a SQL string fragment, but it can also take an object which combines that SQL string fragment with a Python conversion function.Best of all... this resolves the
lat, lon
v.s.lon, lat
dilemma because you can usefrom sqlite_utils.utils import LongitudeLatitude
ORfrom sqlite_utils.utils import LatitudeLongitude
depending on which you prefer!Originally posted by @simonw in #399 (comment)
The text was updated successfully, but these errors were encountered: