Data validation #260
TL;DR At the root of this, because data is end-to-end encrypted, And in the future, in the worst case, we're planning to allow you to recover a database from an older point in time.
My original response before all the edits: The client does validate every DB write for you before returning data to you via the Will strengthen validation this weekend.
Edit 1: Because all data is end-to-end encrypted, we can't validate the data server-side like traditional databases do. But that doesn't mean data quality relies entirely on an honor code with the client software. Our client does offer some powerful (but not perfect) guarantees about your data. For example, if a malicious user inserts an item that is not encrypted with the correct encryption key, an honest user's client shouldn't crash, it should just ignore that malformed item and move on decrypting properly encrypted items before returning items back to the user via the Now, this would still leave honest clients vulnerable if a malicious client inserts a string, but in your app code you were expecting a nested object or something like that. To defend against that, like you mentioned, your honest clients could safely validate the data before using it to make sure it matches what you expect. So basically what you would normally expect from server-side validation is moved into the client (but it's not necessarily "unsafe client-side" validation in that it trusts all users to be honest).
Additionally, you may find userbase-sql.js useful if you're looking for custom SQL-like validation beyond what the default client offers (the default client can offer item uniqueness, versioning validation, correctly decrypted items). With Finally, if data is corrupted by a buggy client or a malicious client manages to sneak through, we can also offer the ability to rollback transactions. This is planned. Something like #206, but allowing you to retrieve the history of an entire database, in the order transactions were pushed to the server. And then recover the database from a particular point in time.
Edit 2: apologies for all the edits.
[1]: maybe validating data server-side can be done with more advanced cryptography, like Zero Knowledge proofs, but that's beyond the scope of Userbase today. If anyone has ideas on this, happy to hear them. |
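As a rough illustration of the "honest clients validate the data before using it" idea above, here is a minimal TypeScript sketch. The `Todo` shape, the `StoredItem` wrapper, and the function names are all hypothetical and exist only for this example; the assumption is that the client hands the app an array of decrypted entries, each carrying an `itemId` and an `item` of unknown shape.

```typescript
// Hypothetical item shape for a to-do app; not part of Userbase itself.
interface Todo {
  title: string;
  done: boolean;
}

// Assumed shape of an entry handed to the app after decryption.
interface StoredItem {
  itemId: string;
  item: unknown;
}

// Type guard: true only when the value has the fields the app relies on.
function isTodo(value: unknown): value is Todo {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.title === "string" && typeof v.done === "boolean";
}

// Keep well-formed items and skip malformed ones instead of crashing.
function validTodos(items: StoredItem[]): { itemId: string; item: Todo }[] {
  const result: { itemId: string; item: Todo }[] = [];
  for (const { itemId, item } of items) {
    if (isTodo(item)) result.push({ itemId, item });
  }
  return result;
}
```

An app could run something like `validTodos` on every batch of items it receives, so that a string inserted where an object was expected is dropped rather than rendered.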
While I like the idea of E2E data that the platform is blind to, a lot of the projects I'd like to use Userbase for require a server to interact with user data / perform actions on a user's behalf to enable some features, so the E2E bubble is already popped when the client automatically shares their DBs with the server's service user account. If my app is set for server-side encryption, it would be nice if Userbase could offer support for server-side validation (among other features like triggers, but that's out of scope for this issue). That could mean uploading a JS function to run, checking the status code returned by a webhook, or flagging a user as a service account and letting that service account open a DB with a change handler that approves/rejects changes, or something along those lines. Besides the convenience of schema enforcement, it would help with a lot of use cases like "freemium users have a quota" and "users can't pack the DB to the brim and use up all my storage space". If I'm going to expose user data to my systems anyway, it would be nice if Userbase could take advantage of that. userbase-sql is intriguing, and I'll bet I could put it to good use. Sharing userbase-sql DBs would be a hard requirement for a lot of my projects though (both sharing between users, and to my service accounts). |
True. Something we will keep in mind for the future.
We can offer this even in the end-to-end encryption mode, and it's something we've thought about offering in the past. Rules such as "user X can't insert more than Y items", "database X can't exceed Y items in total", "user X can't insert more than Y bytes", or "only approved users can insert more than Y bytes" are all possible. It's mainly a matter of getting the UX right and deciding how to offer it. Will think on this some more, and happy to hear suggestions on how you'd like to set these rules (e.g. in the Admin Panel, or via custom server-side code that you can run).
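To make the quota rules above concrete, here is a small TypeScript sketch of how such a check might look. Nothing here is an existing Userbase feature: `QuotaRule`, `Usage`, and `checkInsert` are hypothetical names, and the sketch assumes the caller already knows the current usage and the size of the item being inserted.

```typescript
// Hypothetical per-user (or per-database) limits.
interface QuotaRule {
  maxItems: number;
  maxBytes: number;
}

// Hypothetical running totals tracked for the same user or database.
interface Usage {
  items: number;
  bytes: number;
}

type QuotaResult = { ok: true } | { ok: false; reason: "items" | "bytes" };

// Decide whether one more item of the given size fits within the rule.
function checkInsert(rule: QuotaRule, usage: Usage, itemBytes: number): QuotaResult {
  if (usage.items + 1 > rule.maxItems) return { ok: false, reason: "items" };
  if (usage.bytes + itemBytes > rule.maxBytes) return { ok: false, reason: "bytes" };
  return { ok: true };
}
```

Because the check only needs item counts and sizes, not plaintext, it is compatible with the point made above that quotas could be enforced even in end-to-end encryption mode.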
Can you clarify these use cases a bit more? Does the coming
Got it, will get this on the roadmap :) |
userbase-js-node solves the problem that I need the ability for automated processes to interact with user data, but some projects also need to grant authority over how users interact with stored data. E.g., for one project on my plate, I'm doing realtime collaboration and need to prevent race conditions and eventual consistency problems with users editing the same data. So I can turn to CRDTs, where you store a separate record of every change a user makes, and all users can replay them to arrive at the same worldstate. But I can't allow users to delete CRDT records from the database, else users will arrive at incorrect worldstates (and may even encounter problems replaying at all, depending on the CRDT algorithm). I do need a server to be able to delete CRDTs, so I can squash and trim the history that gets sent to users each time they load the app. Or, I want an app to support RBAC, so anyone can create/edit a record, but only managers/admins can delete it. Or enforce that record changes get versioned (similar problem to the CRDTs above). Or something like HN comments, where a user can edit a comment for a little while, then the comment is locked in place, and where a moderator can edit/remove comments if they violate guidelines. I can sorta implement that by inserting an item with the service user included in Or one of the projects on my plate calls for different users to have different write access to specific, defined-at-runtime properties on DB records. Or I could see creating projects that have to implement a pathological combination of GDPR/Right to be Forgotten + audit trail regulations, where custom logic for what can/can't be deleted is a must-have. And from a practicality perspective, it'd simplify a lot of coding if clients could trust that reads contained good data without me having to think up all of the possible malformed data scenarios I have to defend against at read-time. 
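The CRDT concern above (every client replays the same change records, so a deleted record changes the outcome) can be illustrated with a deliberately tiny sketch. It uses a last-writer-wins rule per key, which is only one of many CRDT designs and is not taken from Userbase or from the project described; `SetOp`, `wins`, and `replay` are hypothetical names, and timestamps are assumed unique per client.

```typescript
// One recorded change: "client X set key K to value V at time T".
interface SetOp {
  key: string;
  value: string;
  timestamp: number;
  clientId: string;
}

// Deterministic ordering: later timestamp wins, client id breaks ties.
function wins(a: SetOp, b: SetOp): boolean {
  if (a.timestamp !== b.timestamp) return a.timestamp > b.timestamp;
  return a.clientId > b.clientId;
}

// Replay every record; the result does not depend on the order of `ops`.
function replay(ops: SetOp[]): Record<string, string> {
  const latest: Record<string, SetOp> = {};
  for (const op of ops) {
    const current = latest[op.key];
    if (current === undefined || wins(op, current)) latest[op.key] = op;
  }
  const state: Record<string, string> = {};
  for (const key of Object.keys(latest)) {
    const op = latest[key];
    if (op !== undefined) state[key] = op.value;
  }
  return state;
}
```

Replaying the same records in any order gives the same state, but removing a record can silently change it, which is why user-initiated deletes of change records are a problem here.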
I could come up with more, but I think the broader point is that the current design of Userbase leaves me very concerned that even if I can see how to make my app work today, at any time my requirements could change in ways Userbase flatly doesn't support because it doesn't have the notion of superuser data access or custom validation for writes. If whole categories of new requirements can only be implemented by upstreaming a patch to userbase server (like your above mention of implementing specific user restrictions), that's a significant risk for non-toy projects. Maybe you're fine with that. The more I get up to speed on Userbase, the more I get the impression that operating a server with insight/control over user-created data feels antithetical to the philosophy that drove Userbase to implement E2E in the first place. If that's not the direction Userbase wants to go, I'm fine with that. I'd just appreciate some official messaging to that effect so I can make decisions accordingly. |
It’s true, our focus is primarily on E2E apps. That being said, a lot of what you’re asking for is possible to do securely without relying on the server to have access to the data. It’s just a matter of building out the features, or approaching the problems with a different mental framework relative to traditional client-server architectures. But yes, to be clear, we will likely spend more time and energy enabling more powerful clients to do the things you’re asking, rather than turn more attention toward the server-side. That is, unless there are relatively easy things that we can enable for server-side focused apps (userbase-js-node being an example), which in some of these cases there may be.
I’m spending the day working on a sample offline-first collaborative app using CRDTs to show how this can be done (described in #255).
We can relatively easily expose userbase-js functions so you can do both of these things securely:
For reference, userbase-sql.js relies on just Inserts for SQL statements, and has some smooth logic to bundle databases in a way that clients experience no downtime. But would like to make it even easier to roll your own logic like that by just exposing the above internal userbase-js functions -- I'll probably need to do that to get sharing sql.js databases working.
Definitely possible with a new database permission, like suggested in #208. Also wouldn’t require server-side changes.
Our client implements versioning. All items are versioned under the hood, and we can expose those versions if you’d like. You’d then be able to include an item’s version in updateItem/deleteItem.
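As a sketch of what passing a version along with an update could buy you, here is a small in-memory TypeScript model of a version check. It is not the userbase-js API: `VersionedStore` and its methods are hypothetical, and the only assumption is that an update is rejected when the caller's version is stale.

```typescript
interface Versioned<T> {
  value: T;
  version: number;
}

// Hypothetical in-memory store that rejects updates based on a stale version.
class VersionedStore<T> {
  private items: Record<string, Versioned<T>> = {};

  // New items start at version 1.
  insert(id: string, value: T): number {
    this.items[id] = { value, version: 1 };
    return 1;
  }

  // Succeeds only if the caller saw the latest version of the item.
  update(id: string, value: T, expectedVersion: number): boolean {
    const current = this.items[id];
    if (current === undefined || current.version !== expectedVersion) return false;
    this.items[id] = { value, version: expectedVersion + 1 };
    return true;
  }

  get(id: string): Versioned<T> | undefined {
    return this.items[id];
  }
}
```

With a check like this, two clients editing the same item cannot silently overwrite each other: the second writer gets a rejection and has to re-read before retrying.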
Database owners have root privileges: all honest clients respect database owners’ edits/removals of any items, so a nefarious user can't do that. We can implement an "admin" privilege too, so owners could dole out admin privileges.
When I said "custom server-side code that you can run", I meant code you run on your server, not code that’s patched up to Userbase. This doesn’t seem too challenging to implement on our end. For example, before storing a transaction, we would pass it through your proxy server along with enough data for you to write custom logic telling us whether to store it or return an error to clients.
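The proxy idea above might look roughly like the following TypeScript sketch of the decision logic a developer could run on their own server. The payload format is entirely hypothetical, since no webhook format is specified in this thread, and the sketch assumes the proxy can actually read `item` (for example, in the server-side encryption mode mentioned earlier). `PendingTransaction`, `Decision`, and `decide` are illustrative names only.

```typescript
// Hypothetical payload handed to the developer's server before storing.
interface PendingTransaction {
  command: "Insert" | "Update" | "Delete";
  userId: string;
  databaseId: string;
  item?: unknown;
}

// Either "go ahead and store it" or "reject with this error".
type Decision = { store: true } | { store: false; error: string };

const MAX_TEXT_LENGTH = 280; // Arbitrary limit chosen for the example.

function decide(tx: PendingTransaction): Decision {
  // Deletes carry no item to validate in this sketch.
  if (tx.command === "Delete") return { store: true };
  const item = tx.item;
  if (typeof item !== "object" || item === null) {
    return { store: false, error: "item must be an object" };
  }
  const text = (item as Record<string, unknown>).text;
  if (typeof text !== "string") {
    return { store: false, error: "item.text must be a string" };
  }
  if (text.length > MAX_TEXT_LENGTH) {
    return { store: false, error: "item.text is too long" };
  }
  return { store: true };
}
```

Keeping the decision in a pure function like this makes it easy to unit test separately from whatever HTTP layer ends up receiving the request.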
Really you just have to take logic you’d place on the server, and place it in the client. It’s just a different way of thinking about the problem. All the above being said, you will likely find greater flexibility today in tools that rely on traditional client-server architectures. And that’s likely going to be the case in perpetuity, but we’d like to get Userbase as close to what you’d expect as possible :) (I know I missed some of your use cases above; happy to discuss any of them in more detail.) |
With userbase-js-node, you could also do something like this:
|
Thanks for the detailed response. Userbase excites me - I can spin up new projects without sinking hours into writing boilerplate user management code, CRUD API endpoints, secure sharing, etc., and E2E is there if it suits my use case, I get real-time updates, and it's OSS so I can feel good building OSS projects with it, and feel good that I'm not held hostage by a proprietary vendor's policies/pricing. I'm eager to use it, I just need to get my head around the possibilities and pitfalls so I can verify it's a good fit for my needs. I'm trying not to become a pathological customer, so feel free to stop me if it's clear Userbase isn't aligned with what I'd ask of it. Here are my thoughts:
This sounds like a stellar 80/20 solution that, on the surface, appears to solve most of my use cases with minimal complexity for either the Userbase server or my app. If the validator could enforce that my service user is made an admin on select new user-created databases, that solves a ton of other problems, too.
Reading through the Userbase architecture doc and your comment, it sounds like the idea for CRDTs would be piggybacking on userbase-js'
Customizing Userbase's squash+trim behavior to capture the resolved state of my CRDT stream at bundle-time sounds great! Is that something I can do today, or does that need updates to userbase-js?
Versioning is on my app's roadmap, so if the platform can give that to me for free, I'm delighted to take it. Could my app still see/request versions that had been bundled?
Hmmm, I was thinking eventual consistency and race conditions between clients would make this a bugbear vs a server linearizing the authorizations, but right now I'm having trouble constructing a scenario where it makes a difference. I suppose if honest clients can reject out-of-spec transactions, it works out the same.
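The "honest clients reject out-of-spec transactions" point can be sketched as follows. This is an illustration under assumptions, not how userbase-js applies transactions: `Tx` and `applyTransactions` are hypothetical, and the sketch assumes every client receives the same transactions in the same server-assigned order.

```typescript
// Hypothetical transaction log entries, already in server order.
type Tx =
  | { kind: "insert"; itemId: string; value: string; author: string }
  | { kind: "delete"; itemId: string; author: string };

// Apply the log, ignoring deletes from anyone who is not an admin.
function applyTransactions(txs: Tx[], admins: string[]): Record<string, string> {
  const state: Record<string, string> = {};
  for (const tx of txs) {
    if (tx.kind === "insert") {
      state[tx.itemId] = tx.value;
    } else if (admins.indexOf(tx.author) !== -1) {
      delete state[tx.itemId];
    }
    // Deletes from non-admins are out of spec and are skipped.
  }
  return state;
}
```

Since every honest client applies the same rule to the same ordered log, they all arrive at the same state without the server having to authorize anything.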
Interesting... It leaves me making the CRUD endpoints I'd hoped Userbase would free me from, but it lets me get going now, sounds like it has similar flexibility to traditional-style server apps, and still gets me the user management and real-time features. And if something like the webhook validator is on the roadmap I could eventually rip out my CRUD endpoints, and then it's just easy street for every new project which could skip the CRUD step and jump straight to the webhook validator.
The ideas you gave seem to solve every objection I have at this point once userbase-js/userbase-server expose the required behaviors. Webhook validator, webhook-enforced make-me-a-supplemental-admin, and custom client-side transaction validators seem to nail all of my schema enforcement, moderation, and administration concerns. At this moment, with those in place, I wouldn't see any big reason to hesitate to use Userbase for the projects I'm working on / have planned. |
You're fine, you're pushing us to expand Userbase's usefulness :)
Webhook added to the roadmap :)
When a user creates a database, they are by default the owner of that database, and have admin privileges on the database. Users can then give access to these databases to your admin server. But your admin server wouldn't really have admin privileges over the database in that case. You'd like for both users and your admin server to be granted admin privileges over some databases, right? Or are you specifically looking for custom complex logic to determine whether or not users are admins over some databases at runtime?
Yes, exactly! :D This is what I've been trying to explain but didn't find the right words to explain it.
Sweeeet. You can technically just rewrite the client source to do whatever you want today, but, I'd like to rewrite userbase-js to make it as simple as overwriting the
Yep, got it. Added to the roadmap. This one is pretty straightforward.
🎉 |
Yes, so users can write to their own little pool of data, but my server can do admin/moderation if necessary. I'm not sure, but maybe enforcing that a new database is shared
I'd appreciate that. I probably have a couple/few weeks before the CRDT project is far enough along that I'd be ready to implement squash+trim customization. |
Unless I'm missing something, it seems just having your clients provide your service user I get how this wouldn't protect you from malicious users modifying their client who slip away from your moderation (again, to be clear, honest users are still protected). Point for the webhook: I could see in this case how you don't want anyone using your app ID to store any data unless it passes through your webhook validations first. That would solve this problem. Any time a user inserts an item, they must grant rw access to your server, otherwise the webhook will prevent it from storing. The more I think about the webhook idea, the more I like it. Will keep you posted on that. This issue can serve as a catch-all for all the features discussed above.
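The rule described above (an insert is only stored if it grants read/write access to the app's server) could be expressed as a check like the one below. The request shape is hypothetical, as is the idea that the webhook would receive a list of users with read/write access; `InsertRequest` and `grantsServiceAccess` are illustrative names.

```typescript
// Hypothetical information a webhook might receive about an insert.
interface InsertRequest {
  userId: string;
  databaseId: string;
  readWriteUsers: string[]; // Users granted rw access to the database.
}

// True only when the app's service user has been granted rw access.
function grantsServiceAccess(req: InsertRequest, serviceUserId: string): boolean {
  return req.readWriteUsers.indexOf(serviceUserId) !== -1;
}
```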
Got it, now high priority :) |
Looking forward to webhooks as well! My main concern is that I wouldn’t want any open registration, or at least no db usage at all without prior payment. |
Is any kind of enforced data validation on the roadmap? If data quality is just an honor code with the client software, it seems like I'd have to defensively validate every DB read to prevent a buggy/malicious client from storing malformed data that crashes every client they share their DB with.