Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add persistent unique identifiers for objects #14135

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sfence
Copy link
Contributor

@sfence sfence commented Dec 20, 2023

This is the adoption of #11050.

  • Goal of the PR: Assign each luaentity object a per-world-unique id that is persistent as long as the world exists, and make this id accessible in lua. Players also have an GUId, which is equal to their name.
    • How it works:
    • The guid (globally unique identifier) (a string) is generated by the GUIDGenerator (or created from the player name for players).
    • A GUId for luaentities looks like this: "@BASE64GUID".
    • Objects from old worlds get a new ID when loaded.
    • Lua entities can restore their GUID in the on_activate call by the method set_guid.
    • All objects added to the environment are also added to table core.objects_by_guid.
  • This should resolve Add uuid to ObjectRefs for tracking objects between loads #5012 at least
  • usecases: makes modding easier

To do

This PR is a Ready for Review

  • Check what should be done.
  • Get feedback
  • Apply feedback

How to test

Run unittests from devtest game.

@sfence
Copy link
Contributor Author

sfence commented Dec 20, 2023

If I get it well, APi of Guids should be updated, related to #11050 (comment) (@sorcerykid or anybody else)?

@Zughy Zughy added @ Script API Roadmap The change matches an item on the current roadmap Feature ✨ PRs that add or enhance a feature labels Dec 20, 2023
Copy link
Contributor

@appgurueu appgurueu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Unofficial conceptual review, I'm not a core dev)

Concept looks good to me. However the implementation of the GUID generator feels overengineered. Why not use random UIDs, say, 128 bit? These would be 16 bytes, or 32 hex digits.

If I'm not mistaken, the probability of a collision would be around 5e-20 with that. The probability of a hardware fault would be many orders of magnitude higher.

With random UIDs, you don't have to worry about persisting and parsing the largest UID. You basically need just one function: Get next UID, which can be very simple, because C++'s builtin random sources do the heavy lifting.

(About persistence: Are you sure that the if an entity is persisted, the corresponding max UID will always be persisted as well? Otherwise you could get collisions.)

Potential bug: v1guid<x> would also be a valid playername, no? Couldn't malicious players abuse this to cause GUID collisions? Perhaps you should use some character which can't appear in player names in GUIDs.

@sfence
Copy link
Contributor Author

sfence commented Dec 28, 2023

(Unofficial conceptual review, I'm not a core dev)

Concept looks good to me. However the implementation of the GUID generator feels overengineered. Why not use random UIDs, say, 128 bit? These would be 16 bytes, or 32 hex digits.

If I'm not mistaken, the probability of a collision would be around 5e-20 with that. The probability of a hardware fault would be many orders of magnitude higher.

With random UIDs, you don't have to worry about persisting and parsing the largest UID. You basically need just one function: Get next UID, which can be very simple, because C++'s builtin random sources do the heavy lifting.

(About persistence: Are you sure that the if an entity is persisted, the corresponding max UID will always be persisted as well? Otherwise you could get collisions.)

Potential bug: v1guid<x> would also be a valid playername, no? Couldn't malicious players abuse this to cause GUID collisions? Perhaps you should use some character which can't appear in player names in GUIDs.

Thanks for the feedback.

At this moment, this is only a rebased PR that has been abandoned. So no changes mentioned by @sorcerykid in the old PR have been applied. He required also removing GUID from the player, because get_player_name returns a unique identification of the player. So it is a change, I expect that I will be asked to do.

Yes, it looks like a really strong GUID generator. I have been thinking about changing it and using the value returned by minetest.get_gametime() and using some counter to make GUID unique for entities spawned in the same game time.
It is a good point for discussion.

@fluxionary
Copy link
Contributor

overengineered. Why not use random UIDs, say, 128 bit?

i've never been a fan of UUIDs. i feel like they are over-engineered for most purposes, unless you need ID generation that isn't controlled by a single thread on a single machine. they require way more storage space and creation time than just incrementing some integer, and are also conceptually more complicated. if you make sure the integer is incremented and written before the id is actually assigned to some entity, you don't have to worry about collisions due to most hardware faults in practice. sequential IDs have the additional benefit of giving you some vague idea of the age of the object in question.

with this proposal's built-in versioning scheme, using something like UUIDs would always be possible in the future if minetest's architecture gets much more distributed, or if the current scheme turns out to cause problems that i'm not correctly imagining.

i've been musing about writing a mob API for a couple years now, and adding unique IDs for persistent mobs has been part of the intention since nearly the beginning. but i'd much prefer that the game engine manage those IDs.

i'm don't feel like i'm totally fit to judge the functionality of this request, but i do se a couple possible problems - it seems to be assigning IDs to objects before incrementing the value, and seems to assign a single value ("v1guid1000") to multiple things if there's some sort of failure.

@appgurueu
Copy link
Contributor

and are also conceptually more complicated

Hard disagree. Most of the code in guid.cpp could be thrown away and replaced with like.. 5 lines of code? when generating random GUIDs: Generate 128 random bits, hex-encode them. That's it. You get completely rid of the need for (de)serialization and reliable persisting of the counter.

if you make sure the integer is incremented and written

If you persist the counter immediately you will have much worse performance than generating 128 random bits. If you don't, you have to worry about collisions.

@Desour
Copy link
Member

Desour commented Jan 3, 2024

If you persist the counter immediately

You could increment a counter by some big number, e.g. 1024, and persist the result, and allocate from this range until it runs out.

Potential bug: v1guid<x> would also be a valid playername, no? Couldn't malicious players abuse this to cause GUID collisions?

That player would then have id vpv1guid<x>.

(Idk what the right thing to do here is, btw..)

@rubenwardy
Copy link
Contributor

I suggest making the player id the name, and the entity id a guid in a form that isn't a valid player name (like shown with angle brackets)

@sfence
Copy link
Contributor Author

sfence commented Jan 3, 2024

So I have made some changes, based on the discussion here and under the previous PR.

get_guid returns the player name for the player object.

GUId for luaentities looks like @7-3a8 where @ is a character that is not allowed in player names, the first number represents the game time of GUId generation in seconds, and the last number represents 32-bit counter with the value stored in env_meta.txt.
So every running server has around 136 years of uptime to come into the risk of repetation of generated GUId.

There is also a new set_guid method for luaentities. This method can be effectively called only in on_activate callback. If GUId is not set in on_activate, or get_guid is called before GUId is set, GUId is generated.

@sfan5
Copy link
Collaborator

sfan5 commented Jan 3, 2024

Definitely agree with @appgurueu.

These can just be an UUIDv4. It's not the most minimal solution but that's exactly why it really just works.
"custom format" "save the counter" "you can allocate the counter in blocks" "mix in the gametime" "don't forget to flush to disk" why do all this?
Write a 10LOC function to generate an UUID and forget about the rest.
(* someone just needs to check that we seed the rng correctly.)

By the way for players I suggest supporting this function too and using an UUIDv5 (SHA1 of the name).
Modders can already start using it now.
And someday someone motivated can do the work to e.g. allow players to be renamed.

@sfence
Copy link
Contributor Author

sfence commented Jan 4, 2024

@fan5
Use of UUIDv4 will end with code similar to: https://github.com/crashoz/uuid_v4/blob/master/uuid_v4.h

So, it will be probably again over-engineered as @appgurueu mentioned before. And much worse performance can be expected from it in comparison with the solution with game time and incremental counter.

Do you think that the need to save the value into env_meta.txt is so big disadvantage that a much more complex UUIDv4 should be used?

@sfence
Copy link
Contributor Author

sfence commented Jan 4, 2024

So, last commit...

GUId generator state is no longer stored in env_meta.txt file. The return value from porting::getTimeS() function is used as the initial counter value after the server starts

Until someone manages to start the server with the same game_time and system time, the combination of these two values should be unique and make GUId unique as well.

If this solution is acceptable, I will prepare this for review.

@sfan5
Copy link
Collaborator

sfan5 commented Jan 4, 2024

Use of UUIDv4 will end with code similar to: https://github.com/crashoz/uuid_v4/blob/master/uuid_v4.h

Thankfully we have the liberty to write a simple UUID generation function that does exactly what we need, instead of pulling in SIMD-accelerated library that we can't even use.

And much worse performance can be expected from it in comparison with the solution with game time and incremental counter.

I don't see this claim supported with an argument.

Do you think that the need to save the value into env_meta.txt is so big disadvantage that a much more complex UUIDv4 should be used?

This question bases on an incorrect premise but let me say: yes, being able to generate an unique ID without needing to synchronize with the disk/database is an important property.

The return value from porting::getTimeS() function is used as the initial counter value after the server starts

Not a good idea btw.
getTime is usually the monotonic time, which is implementation-dependent. It could as well just start counting from zero on boot. With those conditions it's not as unlikely that you could have the same return value across runs.

@rubenwardy
Copy link
Contributor

This question bases on an incorrect premise but let me say: yes, being able to generate an unique ID without needing to synchronize with the disk/database is an important property.

I agree with this - also reduces the impact of env_meta.txt being lost

@Zughy Zughy added the Action / change needed Code still needs changes (PR) / more information requested (Issues) label Jan 4, 2024
@sfence
Copy link
Contributor Author

sfence commented Jan 4, 2024

So, I have done some research about this. It looks like it should be ok to use std::random_device when std::random_device::entropy returns a non-zero value.
But some go around should be found for situations when std::random_device::entropy returns zero, because it indicates that random_device can be based on weak pseudorandom source with sequence which can potentially be repetitive.

Or, if the encryption is planned to be added to UDP protocol, something like DTLS from openSSL, I can use some strong random number generator from it.

src/guid.cpp Outdated Show resolved Hide resolved
@sfence
Copy link
Contributor Author

sfence commented Jan 4, 2024

So a new variant is available. It uses std::random_device. Usage depends on the entropy of the random source to eliminate the risk of duplicates on systems with poor random_device.

@sfence sfence force-pushed the guids branch 2 times, most recently from 3f5645a to 0337e3f Compare January 22, 2024 17:42
@sfence
Copy link
Contributor Author

sfence commented Jan 22, 2024

PR prepared for review.
Added core.objects_by_guid table to Lua API.
The command /unittests should be called two times to verify if core.objects_by_guid is set and cleared.

@Zughy
Copy link
Contributor

Zughy commented Feb 21, 2024

@sfence rebase needed

@Zughy Zughy added Rebase needed The PR needs to be rebased by its author Android and removed Action / change needed Code still needs changes (PR) / more information requested (Issues) labels Feb 21, 2024
@grorp grorp removed the Android label Feb 21, 2024
@Zughy Zughy modified the milestones: 5.10.0, 5.11.0 Oct 10, 2024
@Zughy Zughy added the Adoption needed The pull request needs someone to adopt it. Adoption welcomed! label Nov 3, 2024
@Zughy Zughy closed this Nov 3, 2024
@sfence sfence reopened this Nov 4, 2024
@Zughy Zughy removed Rebase needed The PR needs to be rebased by its author Adoption needed The pull request needs someone to adopt it. Adoption welcomed! labels Nov 4, 2024
@Zughy Zughy added the Rebase needed The PR needs to be rebased by its author label Dec 29, 2024
@Zughy
Copy link
Contributor

Zughy commented Dec 29, 2024

@sfence rebase needed

doc/lua_api.md Outdated Show resolved Hide resolved
Copy link
Contributor

@MisterE123 MisterE123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a spelling error I noticed

doc/world_format.md Outdated Show resolved Hide resolved
src/serverenvironment.h Outdated Show resolved Hide resolved
src/serverenvironment.h Outdated Show resolved Hide resolved
src/guid.cpp Outdated Show resolved Hide resolved
@appgurueu appgurueu added the Action / change needed Code still needs changes (PR) / more information requested (Issues) label Jan 7, 2025
Copy link
Contributor

@appgurueu appgurueu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summarizing the major points:

  • Despite documentation in doc/world_format.md, the GUID does not currently seem to be deserialized in the LuaEntitySAO ctor. This needs to be implemented; the current set_guid Lua API method and everything pertaining to it need to be removed 1. For backwards compatibility this must use a new version2. Entities which do not have a GUID should be assigned one. Conversely, in LuaEntitySAO::getStaticData, the GUID needs to be persisted.
  • As sfan5 pointed out, the "objects by guid" table should contain the same object references. There should not be multiple references to the same object.
  • The current unit test seems unsuitable. It essentially primarily tests set_guid, which is to be removed. What I would like to see instead is something like (ideally we'd also test persistence, but we don't really have a good framework for that yet so I'm fine with testing that manually):
    1. Add an object, get its GUID
    2. Add a second object
    3. Check that we can retrieve both objects (the exact same references) by their GUIDs
    4. Remove the first object, check that we can no longer retrieve it

Footnotes

  1. It seems sfence's intention might have been to let modders manage this themselves, but that is not what we want, because it requires modder cooperation, greatly diminishing the value of the feature. We want the GUID to be separately stored in "engine" staticdata, not mixed into modder staticdata.

@Zughy Zughy modified the milestones: 5.11.0, 5.12.0 Jan 20, 2025
@sfence sfence removed the Rebase needed The PR needs to be rebased by its author label Feb 7, 2025
@sfence sfence removed the Action / change needed Code still needs changes (PR) / more information requested (Issues) label Feb 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature ✨ PRs that add or enhance a feature Roadmap The change matches an item on the current roadmap @ Script API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add uuid to ObjectRefs for tracking objects between loads
9 participants