Repo API: Node IDs #31

Closed
enikao opened this issue Oct 28, 2022 · 11 comments
@enikao
Contributor

enikao commented Oct 28, 2022

Each LIonWeb node has an internal node id. The node id represents the identity of that node. This means:

  • For the whole existence of the node, its id remains unchanged.
  • Two nodes with the same id are considered identical.
  • If a node would change its id, we would consider the before state and after state to be two different nodes (even if the node is the identical object in terms of memory location or similar implementation-language terms).

Valid characters

ids can only contain these symbols:

  • lowercase latin characters: a..z
  • uppercase latin characters: A..Z
  • arabic numerals: 0..9
  • underscore: _
  • hyphen: -

This is the same character set as the one used by the Base64url variant.

Representation

ids are represented by a string, containing only valid characters (as defined above).
An id string is NOT padded, not even with whitespace.
An id string does NOT contain any terminating symbols (in contrast to some Base64 variants); this does not affect the internal representation in a specific implementation language, e.g. C-style \0-terminated strings.
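
A minimal sketch of how an implementation might check these rules, not part of the decision itself. The function name is_valid_id is hypothetical, and rejecting the empty string is an assumption taken from the last comment in this thread rather than from the description above.

```python
import re

# Allowed symbols per the description: a-z, A-Z, 0-9, underscore, hyphen.
# The "+" quantifier also rejects the empty string (an assumption, see lead-in).
_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_id(candidate: str) -> bool:
    """Return True if the whole string consists only of valid id characters."""
    # fullmatch (instead of match with "$") also rejects a trailing newline,
    # surrounding whitespace, and "=" padding.
    return _ID_PATTERN.fullmatch(candidate) is not None
```

With this check, a string such as abc_123-XYZ passes, while java.lang.String (contains dots), a string with leading whitespace, or a string ending in = is rejected.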

Scope

Node ids MUST be unique within their id-space.

Id-space

An id-space is a realm that guarantees the uniqueness of all ids within.
Typically, this means one node repository instance.

An id-space has an id as defined above.
Uniqueness of id-space ids is out of scope of the LIonWeb specification.

In LIonWeb (the protocol), id-spaces are NOT hierarchical.
An implementation might choose to use hierarchical id-spaces internally.

Identification

A node can be identified relative to its id-space by the node's id.
To globally identify a node, we use the combination of the id-space id and the node id.

For now, we don't consider the global case (see #25).
Thus, we use only the node id in the LIonWeb protocol.
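
To illustrate the relation between id-space, uniqueness, and identification, here is a rough sketch of an in-memory id-space. The class IdSpace and its methods are hypothetical names for illustration only; LIonWeb does not prescribe such an API, and the global id is modelled as a plain pair because no separator is defined.

```python
from typing import Dict, Tuple


class IdSpace:
    """Hypothetical in-memory id-space that guarantees unique node ids."""

    def __init__(self, id_space_id: str) -> None:
        self.id = id_space_id
        self._nodes: Dict[str, object] = {}

    def add(self, node_id: str, node: object) -> None:
        # Node ids MUST be unique within their id-space.
        if node_id in self._nodes:
            raise ValueError(f"duplicate node id: {node_id!r}")
        self._nodes[node_id] = node

    def resolve(self, node_id: str) -> object:
        # Identification relative to this id-space: the node id is enough.
        return self._nodes[node_id]

    def global_id(self, node_id: str) -> Tuple[str, str]:
        # Global identification: combination of id-space id and node id.
        return (self.id, node_id)
```

Keeping the global id as a pair avoids committing to any string representation, which matches the choice to leave the global case out for now.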

(This issue description has been updated to reflect the consolidated decision on node ids.)

@enikao
Contributor Author

enikao commented Dec 2, 2022

I think they should be strings, because they ought to be serializable without fuss. Anything more complicated, and chances are that other concerns (such as namespace identification and such) are leaking into their purpose.

Originally posted by @dslmeinte in #46 (comment)

I strongly agree that ids should be strings; the open question is which limitations we set on them.
Some possible limitations are laid out in this issue, @dslmeinte mentioned some others:

  • Identifiers should be unique (within the namespace)
  • An identifier should be a non-empty, non-whitespace-only string

@enikao
Contributor Author

enikao commented Dec 9, 2022

Assumptions

  • id can be represented as one string
  • ids must be unique only per namespace

Allowed characters

Arguments for limiting to small set (e.g. ASCII):

  • Safe
  • Complex names can be carried in name field
  • As soon as ids are too readable, people will try to parse them

Arguments for big set (e.g. UTF-8):

  • Direct representation of FQN from e.g. programming languages as stable IDs
    • Especially for representing references by their FQN

Arguments for excluded characters:

  • If ids are unique only per namespace, we need to concatenate namespace and id to get a globally unique id, so the separator character must not be allowed inside ids

Proposed allowed characters

a-z
A-Z
0-9
_ (underscore)
- (hyphen)

Compatible with Base64url variant

Namespace separator character

. (dot):

Pro:

  • Used in lots of programming languages

Con:

  • Hard to see visually

/ (slash):

Pro:

  • Used in e.g. file systems, XPath

Con:

  • Might clash with URLs

@enikao
Contributor Author

enikao commented Dec 9, 2022

How to use fully qualified names as IDs

A fully qualified name (FQN) is often used in programming languages to uniquely identify an element, e.g. C# class System.String or Java method java.lang.String.toString().
Because the programming language guarantees an FQN's uniqueness, FQNs are well suited as ids.
However, they contain invalid characters (e.g. "." or "(" and ")").
There are at least three obvious ways to deal with this issue:

Base64url encoding

Base64 is a mechanism to encode arbitrary data in 64 characters that can be safely transmitted by its carrier (e.g. traditional e-mail). In the url variant, the 64 characters used are exactly the allowed characters for ids in LIonWeb, provided the optional "=" padding is omitted. Thus, we can encode and decode anything (including FQNs) to/from an id without loss.
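
A small sketch of this round trip using Python's standard base64 module. The function names fqn_to_id and id_to_fqn are hypothetical, and encoding the FQN as UTF-8 is an assumption; the padding is stripped on encoding and restored before decoding, since ids must not be padded.

```python
import base64


def fqn_to_id(fqn: str) -> str:
    """Encode an FQN as an unpadded Base64url string (assumes UTF-8)."""
    encoded = base64.urlsafe_b64encode(fqn.encode("utf-8"))
    # "=" is not a valid id character, so drop the padding.
    return encoded.rstrip(b"=").decode("ascii")


def id_to_fqn(node_id: str) -> str:
    """Decode an id produced by fqn_to_id back into the original FQN."""
    # Restore the padding that urlsafe_b64decode expects.
    padding = "=" * (-len(node_id) % 4)
    return base64.urlsafe_b64decode(node_id + padding).decode("utf-8")
```

For example, id_to_fqn(fqn_to_id("java.lang.String.toString()")) returns the original FQN, and the intermediate id contains only valid id characters.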

Mapping table

Keep a map (aka dictionary) between fqns and randomly created ids.
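
One possible shape of such a table, as a sketch only. The class FqnIdTable is a hypothetical name; it uses secrets.token_urlsafe from the Python standard library, which returns unpadded Base64url text and therefore only valid id characters. The choice of 16 random bytes is an arbitrary assumption.

```python
import secrets
from typing import Dict


class FqnIdTable:
    """Hypothetical bidirectional map between FQNs and random ids."""

    def __init__(self) -> None:
        self._id_by_fqn: Dict[str, str] = {}
        self._fqn_by_id: Dict[str, str] = {}

    def id_for(self, fqn: str) -> str:
        # Reuse the id if this FQN has been seen before.
        existing = self._id_by_fqn.get(fqn)
        if existing is not None:
            return existing
        # Draw random ids until one is unused (collisions are very unlikely).
        new_id = secrets.token_urlsafe(16)
        while new_id in self._fqn_by_id:
            new_id = secrets.token_urlsafe(16)
        self._id_by_fqn[fqn] = new_id
        self._fqn_by_id[new_id] = fqn
        return new_id

    def fqn_for(self, node_id: str) -> str:
        return self._fqn_by_id[node_id]
```

The ids are only stable as long as the table itself is persisted, which is the main cost of this approach.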

Hash function

Feed the fqn to a hash function, and use the output as id.
Cryptographic hash functions make collisions so unlikely that the result can be treated as unique in practice.
Additionally, the ids are typically shorter than the FQN.

We can combine a mapping table with hashed fqns to achieve stable ids (without additional storage) and bi-directional lookup.
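
A sketch of the combined approach, under some assumptions: SHA-256 as the hash function, UTF-8 for the FQN, and unpadded Base64url to turn the raw digest into valid id characters. None of these choices is prescribed by this issue, and the names hashed_id and HashedFqnIndex are hypothetical.

```python
import base64
import hashlib
from typing import Dict


def hashed_id(fqn: str) -> str:
    """Derive a stable id from an FQN (assumes SHA-256 and UTF-8)."""
    digest = hashlib.sha256(fqn.encode("utf-8")).digest()
    # The raw digest is binary, so encode it with valid id characters only.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


class HashedFqnIndex:
    """Hypothetical reverse-lookup table from hashed ids back to FQNs."""

    def __init__(self) -> None:
        self._fqn_by_id: Dict[str, str] = {}

    def register(self, fqn: str) -> str:
        node_id = hashed_id(fqn)
        self._fqn_by_id[node_id] = fqn
        return node_id

    def fqn_for(self, node_id: str) -> str:
        return self._fqn_by_id[node_id]
```

The forward direction (FQN to id) can always be recomputed and needs no storage; only the reverse direction (id to FQN) relies on the table.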

@enikao
Contributor Author

enikao commented Dec 23, 2022

Namespace vs. id-space

We prefer the term id-space. People might be less tempted to use names as ids if we avoid the term name at all in this context.

Also, namespaces are very often hierarchical, whereas our id-space is not.

@enikao
Contributor Author

enikao commented Dec 23, 2022

Do we need to define separator char?

No. If required, each application can use its own representation.

Examples:

  • Use a data structure such as a list of ids.
  • Use a character outside the valid character range for ids, as fitting for the application (e.g. slash (/) for REST services, dot (.) for programming languages).
  • Use a character inside the valid character range and introduce escaping (e.g. dash (-), and escape any dash that's part of the id with a double dash).
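
As an illustration of the second option, here is a sketch that joins a list of ids with a separator outside the valid character range. The function names join_ids and split_ids are hypothetical, and the default separator "/" is just one application-specific choice.

```python
import re
from typing import List

_ID_CHARS = re.compile(r"[A-Za-z0-9_-]+")


def join_ids(ids: List[str], separator: str = "/") -> str:
    """Join ids with a separator that cannot occur inside a valid id."""
    if _ID_CHARS.fullmatch(separator) is not None:
        raise ValueError("separator must lie outside the valid id characters")
    for part in ids:
        if _ID_CHARS.fullmatch(part) is None:
            raise ValueError(f"invalid id: {part!r}")
    return separator.join(ids)


def split_ids(qualified: str, separator: str = "/") -> List[str]:
    """Split a joined string back into its ids."""
    # Unambiguous, because the separator never appears inside an id.
    return qualified.split(separator)
```

For example, join_ids(["repoA", "node-1"]) yields repoA/node-1, and split_ids turns it back into the original list without any escaping.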

@enikao
Contributor Author

enikao commented Dec 23, 2022

Once we look into versioning/branching (#26), we need to amend this decision w.r.t. ids of nodes across branches.

@joswarmer
Contributor

If this is ready for closing, can we make the choices we are making explicit here?

@enikao
Contributor Author

enikao commented Jan 16, 2023

If this is ready for closing, can we make the choices we are making explicit here?

I updated the description of this issue; it should now reflect the choices.

@joswarmer
Contributor

Ok, tnx, I was looking at the last comment to find the final choices.

@enikao
Contributor Author

enikao commented Jan 20, 2023

Closing as accepted, because there's no objection.

@ftomassetti
Contributor

I would add that the id must not be empty.

3 participants