Implement encode/decode for ID columns #730

Open
wants to merge 4 commits into
base: development

Conversation


@calebj calebj commented Jan 7, 2025

This PR allows users to specify the special value of 'func' for p_epoch to use custom functions to encode/decode time-ordered integers other than the classic seconds, ms, us or ns since the UNIX epoch.

Resolves #729.

@@ -18,7 +18,6 @@ v_analyze boolean := FALSE;
v_check_subpart int;
v_child_timestamp timestamptz;
v_control_type text;
v_time_encoder text;
Author

I removed this variable since it isn't used anywhere.

Comment on lines +265 to +273
-- avoids the need for a functional index using to_timestamp or decoder on control column to quickly find the max
v_max_control_expression := CASE
    WHEN v_row.epoch = 'seconds' THEN format('to_timestamp(max(%I))', v_row.control)
    WHEN v_row.epoch = 'milliseconds' THEN format('to_timestamp((max(%I)/1000)::float)', v_row.control)
    WHEN v_row.epoch = 'microseconds' THEN format('to_timestamp((max(%I)/1000000)::float)', v_row.control)
    WHEN v_row.epoch = 'nanoseconds' THEN format('to_timestamp((max(%I)/1000000000)::float)', v_row.control)
    WHEN v_row.epoch = 'func' THEN format('%s(max(%I))', v_time_decoder, v_row.control)
    ELSE format('max(%I)', v_row.control)
END;
Author

This should remove the need for a functional index to find the maximum, provided there is an index on the underlying column. That should be a safe assumption, since PostgreSQL requires any unique index or primary key on a partitioned table to include the partition key.

If this works in all scenarios, it should also be ported to partition_data_time().
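
The rewrite is only safe when the decoder preserves ordering, because only then does decoding the largest raw value give the same result as taking the largest decoded value. Below is a minimal Python sketch of that property, run outside the database. The ID layout (milliseconds since a custom epoch in the upper bits, 22 low bits for other data) and the names `encode_id` and `decode_id` are assumptions for illustration, not functions from this PR.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical snowflake-style layout: high bits hold milliseconds since a
# custom epoch, the low 22 bits hold worker/sequence data.
CUSTOM_EPOCH = datetime(2020, 1, 1, tzinfo=timezone.utc)
SHIFT = 22


def encode_id(ts, low_bits=0):
    """Pack a timestamp (and optional low bits) into an integer ID."""
    ms = (ts - CUSTOM_EPOCH) // timedelta(milliseconds=1)
    return (ms << SHIFT) | low_bits


def decode_id(id_):
    """Recover the timestamp portion of an ID; the low bits are discarded."""
    return CUSTOM_EPOCH + timedelta(milliseconds=id_ >> SHIFT)


# IDs generated out of order, with varying low bits.
ids = [
    encode_id(CUSTOM_EPOCH + timedelta(hours=h), low_bits=h % 7)
    for h in (5, 1, 9, 3)
]

# Decoding the max of the raw column (what the new expression does) ...
max_then_decode = decode_id(max(ids))
# ... matches taking the max of the decoded values (the old approach).
decode_then_max = max(decode_id(i) for i in ids)
```

Both expressions land on the same timestamp here because `decode_id` never reorders values. A decoder without that property would make the two results diverge.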

@calebj calebj changed the title from "Implement encode/decode for ID columns + tests, docs" to "Implement encode/decode for ID columns" Jan 7, 2025
@calebj calebj marked this pull request as draft January 7, 2025 08:27
Author

calebj commented Jan 7, 2025

Marking as draft because this still needs tests for smaller integer types.

Collaborator

keithf4 commented Jan 7, 2025

Thank you! Will be a bit busy over the next month or so, so may not get to dig into this for a bit.

Just reading your notes, though, it is possible to have unique indexes on non-control columns for individual tables using the template table system in partman. May also want to keep in mind that it may be possible in core in the future as well. So not sure if that will affect what you've done.

@keithf4 keithf4 added this to the Future milestone Jan 7, 2025
Author

calebj commented Jan 7, 2025

Unique indexes on individual partitions or on non-control columns won't affect anything in this PR. The intent is to optimize the check by calling MAX() on the base column before converting the result to a timestamp; applying the conversion first requires a functional index. I added that note because, if there's no edge case where this would be undesirable, it makes sense to apply the same type of change to the MIN() and MAX() calls in this section:

v_partition_expression := CASE
    WHEN v_epoch = 'seconds' THEN format('to_timestamp(%I)', v_control)
    WHEN v_epoch = 'milliseconds' THEN format('to_timestamp((%I/1000)::float)', v_control)
    WHEN v_epoch = 'microseconds' THEN format('to_timestamp((%I/1000000)::float)', v_control)
    WHEN v_epoch = 'nanoseconds' THEN format('to_timestamp((%I/1000000000)::float)', v_control)
    ELSE format('%I', v_control)
END;
-- Generate column list to use in SELECT/INSERT statements below. Allows for exclusion of GENERATED (or any other desired) columns.
SELECT string_agg(quote_ident(attname), ',')
INTO v_column_list
FROM pg_catalog.pg_attribute a
JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
WHERE n.nspname = v_source_schemaname
AND c.relname = v_source_tablename
AND a.attnum > 0
AND a.attisdropped = false
AND attname <> ALL(COALESCE(p_ignored_columns, ARRAY[]::text[]));
FOR i IN 1..p_batch_count LOOP
    IF p_order = 'ASC' THEN
        EXECUTE format('SELECT min(%s) FROM ONLY %I.%I', v_partition_expression, v_source_schemaname, v_source_tablename) INTO v_start_control;
    ELSIF p_order = 'DESC' THEN
        EXECUTE format('SELECT max(%s) FROM ONLY %I.%I', v_partition_expression, v_source_schemaname, v_source_tablename) INTO v_start_control;
    ELSE
        RAISE EXCEPTION 'Invalid value for p_order. Must be ASC or DESC';
    END IF;

One other improvement I want to make is a round-trip and ordering sanity check for the passed functions, plus an assertion that the functions are at least marked STRICT (equivalently, RETURNS NULL ON NULL INPUT) and IMMUTABLE.
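
As a rough illustration of what the round-trip and ordering part of such a check could look like, here is a Python sketch that runs outside the database. `check_codec` and the millisecond encoder/decoder pair are hypothetical names invented for this example; the real check would presumably be written in PL/pgSQL against the user-supplied functions.

```python
from datetime import datetime, timedelta, timezone


def check_codec(encode, decode, samples):
    """Return a list of problems found; an empty list means the pair passed."""
    problems = []
    ordered = sorted(samples)
    encoded = [encode(ts) for ts in ordered]

    # Round trip: decoding an encoded timestamp must give the timestamp back.
    for ts, value in zip(ordered, encoded):
        if decode(value) != ts:
            problems.append(f"round trip failed for {ts.isoformat()}")

    # Ordering: later timestamps must never encode to smaller integers.
    for a, b in zip(encoded, encoded[1:]):
        if a > b:
            problems.append(f"ordering violated: {a} > {b}")

    return problems


UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode_ms(ts):
    """Example encoder: whole milliseconds since the UNIX epoch."""
    return (ts - UNIX_EPOCH) // timedelta(milliseconds=1)


def decode_ms(value):
    """Example decoder: inverse of encode_ms."""
    return UNIX_EPOCH + timedelta(milliseconds=value)


samples = [UNIX_EPOCH + timedelta(days=d, milliseconds=d) for d in (0, 1, 365, 20000)]

# A well-behaved pair passes.
good = check_codec(encode_ms, decode_ms, samples)

# A pair that round-trips correctly but reverses the ordering is rejected.
bad = check_codec(lambda ts: -encode_ms(ts), lambda v: decode_ms(-v), samples)
```

The second pair shows why the ordering test is needed on top of the round trip: negating the encoded value still decodes cleanly, yet it would break any logic that relies on MIN() or MAX() of the raw column. The STRICT and IMMUTABLE properties are not covered here; those are recorded in the `proisstrict` and `provolatile` columns of `pg_proc`.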

Also, I think Julian dates might make an interesting example for 32-bit IDs, so I will play with that for a daily test.
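
To give a sense of how that could work, here is a small Python sketch of a day-granularity encoder/decoder based on the Julian Day Number. The function names are made up for illustration, and the offset assumes the proleptic Gregorian calendar used by Python's `date.toordinal()`.

```python
from datetime import date

# date.toordinal() numbers 0001-01-01 as day 1, and the Julian Day Number of
# that date (proleptic Gregorian) is 1721426, so the two differ by a constant.
JDN_OFFSET = 1721425


def encode_julian(d):
    """Map a calendar date to its Julian Day Number (a small integer)."""
    return d.toordinal() + JDN_OFFSET


def decode_julian(jdn):
    """Map a Julian Day Number back to a calendar date."""
    return date.fromordinal(jdn - JDN_OFFSET)


jdn = encode_julian(date(2000, 1, 1))  # 2451545
round_trip = decode_julian(jdn)        # date(2000, 1, 1)

# Even the largest date Python can represent stays well inside a signed
# 32-bit integer, which is what makes this a candidate for int4 ID columns.
fits_int32 = encode_julian(date.max) < 2**31
```

Consecutive days map to consecutive integers, so the encoding is order-preserving and lines up naturally with daily partitions.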

calebj added 3 commits January 7, 2025 21:08
This PR allows users to specify the special value of 'func' for p_epoch
to use custom functions to encode/decode time-ordered integers other than
the classic seconds, ms, us or ns since the UNIX epoch.

Resolves pgpartman#729.
@calebj calebj force-pushed the feat-time-func-ids branch from 5d85138 to 9f8dfa2 Compare January 8, 2025 03:10
@calebj calebj marked this pull request as ready for review January 8, 2025 03:13
@calebj calebj force-pushed the feat-time-func-ids branch from 9f8dfa2 to c592b2b Compare January 8, 2025 06:27
Development

Successfully merging this pull request may close these issues.

Support for time-based integer IDs (e.g. snowflake) using encode/decode
2 participants