-
-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable FTS for attached dbs #609
base: main
Are you sure you want to change the base?
Conversation
sqlite_utils/db.py
Outdated
@property | ||
def is_attached(self) -> bool: | ||
return dbname(self.name) not in {'', 'main'} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is used nowhere but seems like a nice thing to provide direct access to.
sqlite_utils/db.py
Outdated
@property | ||
def _pragma_name(self) -> Tuple[str, str]: | ||
if "." in self.name: | ||
db, name = self.name.split(".") | ||
return db + ".", name | ||
return "", self.name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is maybe premature optimization for not changing too much of the code in an visually noisy way. It really just lets me modify all the places we do PRAGMA things without making big changes to what's there.
Also, it probably deserves a better name, because it is also useful (as seen below) in non-PRAGMA situations, but I didn't want to bikeshed names for something that might not be an approach you'd like in the first place.
sqlite_utils/db.py
Outdated
@@ -2386,6 +2444,7 @@ def enable_fts( | |||
:param tokenize: Custom SQLite tokenizer to use, for example ``"porter"`` to enable Porter stemming. | |||
:param replace: Should any existing FTS index for this table be replaced by the new one? | |||
""" | |||
table_name = tablename(self.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the end, the actual changes to enable FTS were, as expected, essentially nil outside the changes to support 'addressing' attached Databases and their Tables.
Hopefully this new ability can be an overall win for the library, and hopefully since you have a lot of experience with SQLite you can help advise on some of the pitfalls I am missing.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #609 +/- ##
==========================================
- Coverage 95.69% 95.56% -0.13%
==========================================
Files 8 8
Lines 2854 2886 +32
==========================================
+ Hits 2731 2758 +27
- Misses 123 128 +5 ☔ View full report in Codecov by Sentry. |
This is just a work-in-progress, although it does:
sqlite-utils
by loading them in explicitly:db.attach('./a1.db', alias='a1'); a1_foo = db.table('foo', schema='a1')
. This table should generally operate successfully, although this is far from fully tested and I expect there are some edge cases where it does not work as expected due to SQLite limitations on attached tables.main
database.All tests pass locally, as well as black and mypy.
You could call this a proof-of-concept; I think there probably would need to be discussions about some of my refactors around triggers, counts, etc, to decide whether we want to take a similar approach to those (modify the 'main' database rather than the attached ones, under the reasoning that someone attaching a database could presumably make separate connections to that database if they wanted to set up triggers, count tables, etc), do something else, or possibly try to just not support anything having to do with attached databases/tables in those cases.
This would close #608
2024-01-04: Only now am I realizing, based on the content of your tests, that you have use cases where you want to support dots being part of an actual table name. And that this means my approach is not workable, since the API would not allow us to distinguish between cases where the user intended to match an existing, attached database name, and cases where the user wanted to put that matching string in a table name inside
main
.Hopefully the idea of supporting this is compelling enough that we can figure out what a better API (and internal storage format) might be. Perhaps we could use tuples
("a1", "foo")
and force users to be explicit when creating/accessing the tables from theDatabase
object, and then store the database name in a separate field inside theQueryable
object?I really liked the simplicity of a simple string (it works well for the CLI use cases as well, although maybe a CLI would never attach a database in the first place, since that's a stateful operation) -- but I am sure you don't want to break backward compatibilty.
2024-01-09: I have reworked this so that the above section is no longer an issue. The
schema_name
is stored explicitly within the Table object, and the only real change to functionality is thattable_names()
and related methods onDatabase
will no longer operate across schemas - they instead operate on a single schema (attached database) at a time.