Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating tables with custom datatypes #406

Open
psychemedia opened this issue Feb 9, 2022 · 6 comments
Open

Creating tables with custom datatypes #406

psychemedia opened this issue Feb 9, 2022 · 6 comments
Labels
enhancement New feature or request

Comments

@psychemedia
Copy link

Via https://stackoverflow.com/a/18622264/454773 I note the ability to register custom handlers for novel datatypes that can map into and out of things like sqlite BLOBs.

From a quick look and a quick play, I didn't spot a way to do this in sqlite_utils?

For example:

# Via https://stackoverflow.com/a/18622264/454773
import sqlite3
import numpy as np
import io

def adapt_array(arr):
    """
    http://stackoverflow.com/a/31312102/190597 (SoulNibbler)
    """
    out = io.BytesIO()
    np.save(out, arr)
    out.seek(0)
    return sqlite3.Binary(out.read())

def convert_array(text):
    out = io.BytesIO(text)
    out.seek(0)
    return np.load(out)


# Converts np.array to TEXT when inserting
sqlite3.register_adapter(np.ndarray, adapt_array)

# Converts TEXT to np.array when selecting
sqlite3.register_converter("array", convert_array)
from sqlite_utils import Database
db = Database('test.db')

# Reset the database connection to used the parsed datatype
# sqlite_utils doesn't seem to support eg:
#  Database('test.db', detect_types=sqlite3.PARSE_DECLTYPES)
db.conn = sqlite3.connect(db_name, detect_types=sqlite3.PARSE_DECLTYPES)

# Create a table the old fashioned way
# but using the new custom data type
vector_table_create = """
CREATE TABLE dummy 
    (title TEXT, vector array );
"""

cur = db.conn.cursor()
cur.execute(vector_table_create)


# sqlite_utils doesn't appear to support custom types (yet?!)
# The following errors on the "array" datatype
"""
db["dummy"].create({
    "title": str,
    "vector": "array",
})
"""

We can then add / retrieve records from the database where the datatype of the vector field is a custom registered array type (which is to say, a numpy array):

import numpy as np

db["dummy"].insert({'title':"test1", 'vector':np.array([1,2,3])})

for row in db.query("SELECT * FROM dummy"):
    print(row['title'], row['vector'], type(row['vector']))

"""
test1 [1 2 3] <class 'numpy.ndarray'>
"""

It would be handy to be able to do this idiomatically in sqlite_utils.

@simonw simonw added the enhancement New feature or request label Feb 16, 2022
@simonw
Copy link
Owner

simonw commented Feb 16, 2022

I had no idea this was possible! I guess SQLite will allow any text string as the column type, defaulting to TEXT as the underlying default representation if it doesn't recognize the type.

@simonw
Copy link
Owner

simonw commented Feb 16, 2022

Allowing custom strings in the create() method, as you suggest in your example, feels like a reasonable way to support this.

db["dummy"].create({
    "title": str,
    "vector": "array",
})

I'm slightly nervous about that just because people might accidentally use this without realizig what they are doing - passing "column-name": "string" for example when they should have used "column-name": str in order to get a TEXT column.

Alternatively, this could work:

db["dummy"].create({
    "title": str,
    "vector": CustomColumnType("array")
})

This would play better with mypy too I think.

@psychemedia
Copy link
Author

psychemedia commented Feb 16, 2022

The CustomColumnType() approach looks good. This pushes you into the mindspace that you are defining and working with a custom column type.

When creating the table, you could then error, or at least warn, if someone wasn't setting a column on a type or a custom column type, which I guess is where mypy comes in?

@psychemedia
Copy link
Author

psychemedia commented Feb 16, 2022

Wondering if this actually relates to #402 ?

I also wonder if this would be a sensible approach for eg registering pint based quantity conversions into and out of the db, perhaps storing the quantity as a serialised magnitude measurement single column string?

@psychemedia
Copy link
Author

I was wondering if you have any more thoughts on this? I have a tangible use case now: adding a "vector" column to a database to support semantic search using doc2vec embeddings (example; note that the vtfunc package may no longer be reliable...).

@rsyring
Copy link

rsyring commented Jan 22, 2024

See also: #612

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants