etag errors in ceph backed instance #10
Comments
I checked the ceph documentation and it looks like the Get Object method doesn't return the Another option may be to use Minio instead (if that suits your use-case). We have done tests with Minio and it works fine with Sneller. Disabling the
Hi, thanks so much for the info. I will try to get the changes made to ceph, but while waiting, what would be the best interim solution to this issue that doesn't cause any execution issues? Unfortunately, using AWS/Minio is not possible for our use case (we have an extremely write-heavy workload that makes these solutions infeasible to operate, but one that is seemingly well suited for Sneller). Thanks again for the help with resolving these issues!
The safest way would be to interleave sdb and Sneller query execution (if that's feasible). You may also want to check whether it's feasible to change the code to check timestamps instead of etags, but that's not something that is currently supported. I also don't know if that would be completely safe, because timestamps are often not very precise.
The ETag checking exists to protect the query engine against binary data
that wasn't produced by our ingestion engine.
In the cloud platform, this was important to help protect us against users
tampering with data that we put in their buckets.
The ETag checking does not exist to protect us against data races; we use
unique (randomly-generated) object names for everything written by the
ingestion engine, so we'll never deliberately overwrite a packfile.
If you are self-hosting and your threat model does not involve people
having write access to your S3 buckets, then it's fine to eliminate the
ETag verification.
Just keep in mind that the query engine assumes the data being read from S3
is trusted, and without the ETag verification (ultimately rooted in the MAC
of the index file) there is no cryptographic assurance that a properly
authenticated party wrote the packfiles.
…On Mon, Jan 8, 2024 at 11:30 AM Ramon de Klein ***@***.***> wrote:
The safest way would be to interleave sdb and Sneller query execution (if
that's feasible).
You also may want to check if it's feasible to change the code to check
timestamps instead of etags, but it's not a something that is currently
supported. Also don't know if that would be completely safe, because
timestamps are often not very precise.
Hi,
I was trying to get the package set up using ceph object storage (theoretically S3 compatible) as the backing store, but the etag system used to protect against concurrent writes (
sneller/db/build.go, line 67 in 86e9f11
rebuilding sdb with the if statement here:
commented out, queries appear to function normally and syncing continues to work (as long as no concurrent reading/writing to the table is occurring)
full outputs here:
(unmodified version of sdb)
(modified version of sdb)
(edited version -- worked with no errors)
note on concurrent read/write
in my (limited) testing with concurrent reads/writes occurring while the sync task is running, few to no issues occur with querying data that was synced during a write on my modified instance
(sync during write)
(sync after write)
(full rebuild of indexes and zion files)