-
Notifications
You must be signed in to change notification settings - Fork 121
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data storage API & persistence #265
Comments
You can save the state to disk on demand:
My idea was that as soon as you got an update: grammers/lib/grammers-client/examples/echo.rs Lines 80 to 84 in d180f02
the library could assume the update has been processed by the application. Note that the state grammers saves is minimal: grammers/lib/grammers-session/build.rs Lines 23 to 31 in d180f02
No peers, no update contents. That's up to the application. AFAIK clients like TDesktop don't store these either (but it does cache files).
The idea behind making it easier to "pack" users and channels is that the application would store those, instead of just the IDs. I agree this needs some more work though.
Not sure I follow, as my only opinion on this is that it's messy as hell. Though, I guess that's the point of it being abstracted behind a library.
Happy to offer more knobs, or adjust the existing one to be more obvious / play better together. You're right my ideal may not have been implemented fully correctly. With that said, in my mind, even if an application only persists update state every minute, I would expect it to be able to replay those with idempotency.
Except for the internal peer cache, which I'm not entirely happy with but is AFAIK a necessary evil due to how tied it is with updates, all the rest is in the session, which can be accessed as shown above. I might be me something. I have not built a full application with the library. So again I'm happy to revisit this, if you have concrete ideas. |
Thanks for the quick feedback! My main pain point is that the restarts of the app loose all the state the library keeps internally, like access hash maps. I will do further work to figure out the optimal changes to extend the grammers code to allow for what I have in mind, submit a PR and then we can discuss some concrete code. |
But this should be fine. It only means the library may need to fetch more data to get it back, but that's it. I do not want to persist hashes in the session. I believe storing what users the application interacts with, is best left to the application, because it can choose how to persist that. |
What way does the application have to detect that a new chat was added to the chats map, without reimplementing the updates handling logic by itself? Maybe I'm missing something...
Yes, I'm not suggesting that...
How would that work though? I don't see a way for the library, running as a bot, to know what channels the bot was added to. Our use-case is a channel members maintenance, where the bot would be an admin in a lot of the chats/channels, and we'd need to accurately react to each member-related update event, but also support resuming (and going through the missed updates) after we have restarted, and doing the bookkeeping that is not triggered by the chat/channel update. That last part would be the issue - suppose the app restarts, and we know the channel id we want to, say, list participants at. We'd also need to have access hash to call Also, we need a way to use the channel id to lookup the channel in the chats map - that only supports doing lookups by the peer id currently. Anyhow, still planning to do what I suggested at #265 (comment) - this is just a bit more info about the issues we're facing as maybe there is no issue at all and I've just overlooked a solution... |
None. But I'm not sure why we need to detect when a chat is added. If the library makes sure all peers from updates can be fetched, the application can fetch whatever it needs, when it needs it.
Getting difference would fetch all updates from the last-known point. This should include all seen channels (the state for each channel should be persisted).
Note that this is a type, not a function. Getting information about a single participant needs a hash though, yes.
If the library caught up on missed updates, the updates about users joining should contain the hash, so it would be known again.
Not sure I understand. In the library I use the term "peer ID" to refer to the ID of any peer, including the IDs of channels. |
Well, to persist the id and access hash in the database - for later. Precisely for what you tell to to later - to persist the state of the channels. You say the application has to persist the channels state, but also that there are no ways to react to new channel appearing. Do you see the problem here?
Why should is highlighted here? Like, should, but no way to easily do it right now?
Yes, but note that it is used incorrectly in the chats map as a key without the qualification of the kind of entry this id belongs to. See https://core.telegram.org/api/peers#peer-id - the docs explicitly state that ids of different kinds of entries overlap. |
My thought was this would be done regardless for all incoming peers you could need, regardless of whether you knew that you had them or not.
Bugs or situations I haven't thought of. It's how I tend to speak (I just did it again).
Good catch. First time I see that. I suspect Daniil has continued to put great work in the documentation, so I can't say how recent that is. I have never seen it happen personally, so I had assumed it could never happen, and was hoping either Grammers or Telethon could prove me wrong. Now that there's an official statement, I don't need to anymore. All that said I'm still open to changes that could better suit your usecase. Building things is the best way to see what's missing, and I'm not building anything with the library at the moment, so I don't see the pain points you see. |
Another discussion about this and brief reading of the linked docs made me realize that |
Makes sense, if they suggest not exposing them. Just to clarify: you are planning to, effectively, include the peers database in the session? If that is the case - it makes sense. However, the session itself would probably then require an abstraction for the data storage layer - as we, for instance, wouldn't want to keep all the peers in memory all at once. I have been slicing and dicing the client in the meantime, and figured if we just ignore that it exposes the chats map and handle the raw updates ourselves as they arrive we can maintain our own, persistent peers db, and keep it together with the session (in the database). So, my code proposal is not ready jus yet, but it will likely be focused on extracting the layer of maintaining the chat maps into a separate entity so that if we (or someone else) build our own - it is possible to replace the stock one, and the client can actually skip this extra work it is not needed it; in our case we don't use any of the high-level APIs as they are too dependent on the other client features that we don't really want to pay for at runtime; still thinking about the proper API and namings to suggest - I would want to preserve the functionality of the high-level client for easy use for the users with less demanding / custom use cases - but at the same time it looks like the client currently contains a subset of operations that provides very simple and low-level operations (like maintaining the connections) that would be really beneficial to reuse. I wouldn't want to even attempt diving deep into a broad refactoring and subdividing of the client into smaller composable pieces on just my own - as I don't want to spend a lot of time on it and get it thrown away on review. That said, ultimately, transforming the client (and maybe session too) into a set of smaller composable modules would actually be my suggestion. How would you like to proceed? |
Yes.
Yes, a
It's a shame to duplicate the work though. I guess the library can still do the heavy lifting with update ordering and resolving gaps though.
The Lines 14 to 25 in 0dc27ee
To expand a bit,
So in my mind, if one doesn't want the high-level features, they would use the (Though of course, ideally the |
Yes, this is exactly what we're doing, however there are a lot of things that are implemented directly in the client that I could benefit from when building, effectively, our own client. Concretely, so far:
Also, I find it odd that client implements the connection the way it does, since it would be much more flexible to just have a Thus I'd suggest the following changes, as I see them as huge improvements:
Ultimately, we'd prefer using the client if it allowed us to avoid doing the extra work - which can be perfectly expressed via traits but with, probably, a bit too much complexity in the API surface for the high-level client. I suggest splitting the client into client core - the low-level thing that ties the connection, session, mtsender and updates loop together, but allow for generic interfaces for what the chats map is and how it is maintained - and the (just) client, that would be the same high-level interface that it is today, and would be built on top of the client core with specifying a particular implementation of the chats map (this is the rough idea, might also be more generics between the two). I'm currently working on essentially our own lightweight version of the client (i.e. client core) atm wit all that in mind, and will be happy to upstream it if it is approved when I showcase it. |
Fair enough. Those do seem hard to decouple even further.
Fully agree (connectors were also discussed). Just like in Telethon, proxy support was not added by me but someone else, and I kinda took it in because many people need it.
Unfortunately chat maps are pretty tied to update handling. Quoting Working with Updates - Recovering gaps:
So, even if we separate them, I think it would make sense to keep both in the
I eagerly await this change!
I do wonder how far you'll be able to take this. The password logic makes sense for Maybe it's enough to leave that as standalone functions in the
Seems reasonable. But probably should remain in the |
I was looking into MTProto and grammers implementation for some time now, and I really can't understand how the persistence in grammers is supposed to work.
To elaborate: there is a session that the client and load and store, and the session contains the starting point from which to read and apply the differences - so, when you reload you are supposed to load the session to be able to avoid loading all the state again from scratch; at the same time, the only way the state is handed is in-memory - it is not persisted anywhere, and grammers client does not expose an API to allow user to handle the updates and persist them either.
What this means, I think, is that on restart grammers will loose the data it gathered from the past updates (chats and users, both for the connecting user and all channels) but will not reload them, thus getting into a state of only having partial data actually available. This kind of defeats the purpose of the clever design of the Telegram updates - as all the data is just lost in the end.
Maybe grammers client should not keep the data internally, but, instead, provide the APIs to offload the data storage to the user?
Alternatively, grammers could expose fns for accessing the internal data tables such that they can be persisted together with the session state - this would, in effect, allow no data loss, since it would be possible to replay the data after the last snapshot; the downside of it is the API boundary of the snapshots is moved to the application level, while with the fully offloaded data storage can utilize database/fs-level snapshots for this, which can be much more efficient and needs less work from the implementor.
The text was updated successfully, but these errors were encountered: