feat(LogUniqueId): Create unique ID for logs #41

Closed
rc10house opened this issue Jun 26, 2024 · 2 comments

rc10house commented Jun 26, 2024

Tied to #40

Problem

Our logs need unique IDs that we can use to index them efficiently in memory. Each ID should incorporate the log's timestamp so that IDs sort chronologically, and ID values must never collide. When we start building out segment functionality, we will want to operate on these logs efficiently in a data structure such as a binary tree or min-heap, so a timestamp-ordered ID is important.
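
To illustrate why a timestamp-ordered ID helps here, the following is a minimal sketch using only the standard library. The `make_id` helper is a hypothetical stand-in (zero-padded epoch seconds plus a random hex suffix), not the ID scheme proposed in this issue; it only shows that when IDs sort by time, a min-heap keyed on the ID yields logs oldest-first.

```python
import heapq
import secrets


def make_id(timestamp: int) -> str:
    """Hypothetical stand-in ID: zero-padded epoch seconds + random suffix.

    Zero-padding makes lexicographic order match chronological order.
    """
    return f"{timestamp:010d}{secrets.token_hex(8)}"


# Logs arrive out of order; each is keyed by its time-ordered ID.
timestamps = [1643510630, 1643510623, 1643510700]
heap: list[tuple[str, dict]] = []
for ts in timestamps:
    log = {"timestamp": ts}
    heapq.heappush(heap, (make_id(ts), log))

# Popping from the min-heap returns logs from oldest to newest.
ordered = [heapq.heappop(heap)[1]["timestamp"] for _ in range(len(heap))]
print(ordered)
```

Because the ID itself carries the ordering, no separate sort key has to be stored alongside each log.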

Approach

We should use Prefixed K-Sortable Unique IDentifiers (PKSUIDs) to accomplish this. A Python module, pksuid, already exists. The "prefix" part attaches a type identifier to the ID, for example log_1032HU2eZvKYlo2CEPtcnUvl. This will let us use PKSUIDs for other objects in the future, such as segments (e.g. seg_1032HU2eZvKYlo2CEPtcnUvl).
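
As a small illustration of how the prefix could let code tell object types apart, the sketch below splits an ID string at its first underscore. The `split_prefix` helper and the `KNOWN_PREFIXES` set are hypothetical names used only for illustration; in practice the module's own parsing would likely be the better choice.

```python
KNOWN_PREFIXES = {"log", "seg"}  # hypothetical: the prefixes mentioned in this issue


def split_prefix(identifier: str) -> tuple[str, str]:
    """Split 'log_1032HU...' into ('log', '1032HU...')."""
    prefix, sep, body = identifier.partition("_")
    if not sep or prefix not in KNOWN_PREFIXES:
        raise ValueError(f"unrecognized identifier: {identifier!r}")
    return prefix, body


print(split_prefix("log_1032HU2eZvKYlo2CEPtcnUvl"))
print(split_prefix("seg_1032HU2eZvKYlo2CEPtcnUvl"))
```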

Example

Below is an example of using the module. We need to set the timestamp ourselves, using each log's own timestamp value. Log timestamps should first be normalized to Coordinated Universal Time (UTC) (ticket: #42).

from pksuid import PKSUID

# generate a new unique identifier with the prefix usr
uid = PKSUID('usr')

# returns 'usr_24OnhzwMpa4sh0NQmTmICTYuFaD'
print(uid)

# returns: usr
print(uid.get_prefix())

# returns: 1643510623
print(uid.get_timestamp())

# returns: 2022-01-30 02:43:43
print(uid.get_datetime())

# returns: b'\x81>*\xccDJT\xf1\xbe\xa9\xf3&\xe8\xa5\xb2\xc1'
print(uid.get_payload())

# convert from a str representation back to PKSUID
uid_from_string = PKSUID.parse('usr_24OnhzwMpa4sh0NQmTmICTYuFaD')

# this can now be used as usual
# returns: 1643510623
print(uid_from_string.get_timestamp())

# conversion to and parsing from bytes is also possible
uid_as_bytes = uid.bytes()
uid_from_bytes = PKSUID.parse_bytes(uid_as_bytes)

# returns: 2022-01-30 02:43:43
print(uid_from_bytes.get_datetime())

# all the standard comparison operators are available
import time
ts = int(time.time())

# OUR USE CASE: set the timestamp explicitly instead of using the creation time
lesser_uid = PKSUID('usr', timestamp=ts)
greater_uid = PKSUID('usr', timestamp=ts + 5)

# returns True
print(lesser_uid < greater_uid)

# except for the case of equivalence operators (eq, ne), the prefix is not taken into account when comparing
prefixed_uid_1 = PKSUID('diff', timestamp=ts)
prefixed_uid_2 = PKSUID('prefix', timestamp=ts + 5)

# returns True
print(prefixed_uid_1 < prefixed_uid_2)
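
Because the timestamp passed to `PKSUID(..., timestamp=...)` should come from the log itself and be normalized to UTC, a conversion step is needed first. The following is a minimal sketch using only the standard library. The `to_utc_epoch_seconds` name and the assumption that log timestamps arrive as ISO 8601 strings are illustrative only; the actual normalization is tracked in #42.

```python
from datetime import datetime, timezone


def to_utc_epoch_seconds(iso_timestamp: str) -> int:
    """Convert an ISO 8601 timestamp to whole epoch seconds in UTC.

    Assumption: naive timestamps (no offset) are treated as already UTC.
    """
    parsed = datetime.fromisoformat(iso_timestamp)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.astimezone(timezone.utc).timestamp())


# Two spellings of the same instant normalize to the same value.
print(to_utc_epoch_seconds("2022-01-30T02:43:43+00:00"))  # 1643510623
print(to_utc_epoch_seconds("2022-01-29T21:43:43-05:00"))  # 1643510623
```

The resulting integer matches the form of the `ts` value used in the example above.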

Definition of Done

  1. Build functionality into the log class to create an ID when parsing JSON
  2. Normalize timestamps to UTC (tracked in #42: feat(NormalizeTimestamp): Normalize UserAle and arbitrary log timestamps to UTC)
  3. Create a PKSUID and store it as a field within the log object

Remember to use Python type hints when implementing
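
One possible shape for the steps above, offered as a sketch rather than the final design: a type-hinted `Log` dataclass whose `from_json` constructor derives the ID from the log's timestamp. The class layout, the `"timestamp"` key, and the `_stand_in_id` helper are all assumptions for illustration; the stand-in uses only the standard library so the sketch runs without the pksuid module.

```python
import json
import secrets
from dataclasses import dataclass
from typing import Any, Callable, Dict


def _stand_in_id(prefix: str, timestamp: int) -> str:
    """Hypothetical stand-in for a PKSUID: time-sortable with a random suffix."""
    return f"{prefix}_{timestamp:010d}{secrets.token_hex(8)}"


@dataclass(frozen=True)
class Log:
    id: str
    timestamp: int  # epoch seconds, assumed already normalized to UTC
    data: Dict[str, Any]

    @classmethod
    def from_json(
        cls,
        raw: str,
        id_factory: Callable[[str, int], str] = _stand_in_id,
    ) -> "Log":
        """Parse one JSON log and attach an ID derived from its timestamp."""
        data = json.loads(raw)
        # Assumption: the log carries its time under a "timestamp" key.
        timestamp = int(data["timestamp"])
        return cls(id=id_factory("log", timestamp), timestamp=timestamp, data=data)


log = Log.from_json('{"timestamp": 1643510623, "type": "click"}')
print(log.id.startswith("log_"), log.timestamp)
```

With the pksuid module installed, an `id_factory` along the lines of `lambda prefix, ts: str(PKSUID(prefix, timestamp=ts))` could replace the stand-in, following the constructor usage shown in the example. Storing the `PKSUID` object itself rather than its string form may be preferable if the comparison operators are needed.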

@jlhitzeman
Contributor

Commenting for access

@rc10house
Contributor Author

Closed via #47

This was referenced Jul 11, 2024