Implement encode/decode for ID columns #730

calebj · 2025-01-07T08:04:37Z

This PR allows users to specify the special value of 'func' for p_epoch to use custom functions to encode/decode time-ordered integers other than the classic seconds, ms, us or ns since the UNIX epoch.

Resolves #729.

calebj · 2025-01-07T08:06:20Z

sql/functions/run_maintenance.sql

@@ -18,7 +18,6 @@ v_analyze                       boolean := FALSE;
 v_check_subpart                 int;
 v_child_timestamp               timestamptz;
 v_control_type                  text;
-v_time_encoder                  text;


I removed this variable since it isn't used anywhere.

calebj · 2025-01-07T08:11:24Z

sql/functions/run_maintenance.sql

+        -- avoids the need for a functional index using to_timestamp or decoder on control column to quickly find the max
+        v_max_control_expression := CASE
+            WHEN v_row.epoch = 'seconds' THEN format('to_timestamp(max(%I))', v_row.control)
+            WHEN v_row.epoch = 'milliseconds' THEN format('to_timestamp((max(%I)/1000)::float)', v_row.control)
+            WHEN v_row.epoch = 'microseconds' THEN format('to_timestamp((max(%I)/1000000)::float)', v_row.control)
+            WHEN v_row.epoch = 'nanoseconds' THEN format('to_timestamp((max(%I)/1000000000)::float)', v_row.control)
+            WHEN v_row.epoch = 'func' THEN format('%s(max(%I))', v_time_decoder, v_row.control)
+            ELSE format('max(%I)', v_row.control)
+        END;


This should remove the need for a functional index to find the maximum, provided there is a plain index on the underlying control column. That should be a safe assumption, since PostgreSQL requires any unique index or primary key on a partitioned table to include the partition key.

If this works in all scenarios, it should also be ported to partition_data_time().
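
To make the difference concrete, here is a minimal sketch. It assumes a hypothetical table `events` with a millisecond-epoch `id` column and a plain btree index on it; neither name comes from this PR, and the planner behaviour noted in the comments is what I would expect rather than something measured here.

```sql
-- Hypothetical table, for illustration only; not part of pg_partman.
CREATE TABLE events (
    id      bigint NOT NULL,  -- milliseconds since the UNIX epoch
    payload text
);
CREATE INDEX events_id_idx ON events (id);

-- Aggregate over the converted value: there is no index on this expression,
-- so the min/max index optimization should not apply and every row is read
-- unless a matching expression index exists.
SELECT max(to_timestamp((id / 1000)::float)) FROM events;

-- Aggregate over the raw column, then convert the single result:
-- max(id) can be answered from events_id_idx.
SELECT to_timestamp((max(id) / 1000)::float) FROM events;
```

Both queries return the same value only because the conversion never decreases as `id` increases. A custom decoder used with 'func' would need the same property for this rewrite to be safe.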

calebj · 2025-01-07T08:28:29Z

Marking as draft because this still needs tests for smaller integer types.

keithf4 · 2025-01-07T14:40:46Z

Thank you! Will be a bit busy over the next month or so, so may not get to dig into this for a bit.

Just reading your notes, though, it is possible to have unique indexes on non-control columns for individual tables using the template table system in partman. May also want to keep in mind that it may be possible in core in the future as well. So not sure if that will affect what you've done.

calebj · 2025-01-07T17:06:22Z

Unique indexes on individual partitions or on non-control columns won't affect anything in this PR. The intent is to optimize the check by calling MAX() on the base column before converting it to a timestamp, since applying MAX() to the converted value needs a functional index. I added that note because, if there's no edge case where that would be undesirable, it makes sense to apply the same type of change to the MIN() and MAX() calls in this section:

pg_partman/sql/functions/partition_data_time.sql

Lines 131 to 159 in 1192e93

    
    v_partition_expression := CASE
        WHEN v_epoch = 'seconds' THEN format('to_timestamp(%I)', v_control)
        WHEN v_epoch = 'milliseconds' THEN format('to_timestamp((%I/1000)::float)', v_control)
        WHEN v_epoch = 'microseconds' THEN format('to_timestamp((%I/1000000)::float)', v_control)
        WHEN v_epoch = 'nanoseconds' THEN format('to_timestamp((%I/1000000000)::float)', v_control)
        ELSE format('%I', v_control)
    END;

    -- Generate column list to use in SELECT/INSERT statements below. Allows for exclusion of GENERATED (or any other desired) columns.
    SELECT string_agg(quote_ident(attname), ',')
    INTO v_column_list
    FROM pg_catalog.pg_attribute a
    JOIN pg_catalog.pg_class c ON a.attrelid = c.oid
    JOIN pg_catalog.pg_namespace n ON c.relnamespace = n.oid
    WHERE n.nspname = v_source_schemaname
    AND c.relname = v_source_tablename
    AND a.attnum > 0
    AND a.attisdropped = false
    AND attname <> ALL(COALESCE(p_ignored_columns, ARRAY[]::text[]));

    FOR i IN 1..p_batch_count LOOP

        IF p_order = 'ASC' THEN
            EXECUTE format('SELECT min(%s) FROM ONLY %I.%I', v_partition_expression, v_source_schemaname, v_source_tablename) INTO v_start_control;
        ELSIF p_order = 'DESC' THEN
            EXECUTE format('SELECT max(%s) FROM ONLY %I.%I', v_partition_expression, v_source_schemaname, v_source_tablename) INTO v_start_control;
        ELSE
            RAISE EXCEPTION 'Invalid value for p_order. Must be ASC or DESC';
        END IF;

One other improvement I want to make is a round-trip and ordering sanity check for the passed functions, plus an assertion that the functions are marked IMMUTABLE and either STRICT or RETURNS NULL ON NULL INPUT (the two are synonyms).
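
A rough sketch of what those checks could look like, using a hypothetical seconds-based pair (`demo_decode_s` and `demo_encode_s` are made-up names, not functions from this PR or from pg_partman):

```sql
-- Hypothetical decoder/encoder pair, defined only to demonstrate the checks.
CREATE FUNCTION demo_decode_s(p_id bigint) RETURNS timestamptz
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT to_timestamp(p_id::float) $$;

CREATE FUNCTION demo_encode_s(p_ts timestamptz) RETURNS bigint
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT floor(extract(epoch FROM p_ts))::bigint $$;

-- 1. Catalog check: both functions should report strict and immutable.
SELECT p.oid::regprocedure AS func,
       p.proisstrict       AS is_strict,
       p.provolatile = 'i' AS is_immutable
FROM pg_catalog.pg_proc p
WHERE p.oid IN ('demo_decode_s(bigint)'::regprocedure::oid,
                'demo_encode_s(timestamptz)'::regprocedure::oid);

-- 2. Round-trip and ordering check over a sample of consecutive IDs.
SELECT bool_and(demo_encode_s(demo_decode_s(id)) = id)     AS round_trip_ok,
       bool_and(demo_decode_s(id) < demo_decode_s(id + 1)) AS ordering_ok
FROM generate_series(1736236800::bigint, 1736236800::bigint + 999) AS s(id);
```

For this demo pair both flags in the second query should come back true. A sampled check like this can only catch obvious mistakes; it does not prove the functions are monotonic over their whole domain.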

Also, I think Julian dates might make an interesting example for 32-bit IDs, so I will play with that for a daily test.
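
As a sketch of the idea, a Julian day codec could look something like this. The function names and signatures are illustrative assumptions, not necessarily what the test in this PR ends up using; 2451545 is the Julian day number PostgreSQL assigns to 2000-01-01.

```sql
-- Hypothetical Julian-day codec; names and signatures are illustrative.
-- Day boundaries are taken in UTC so the functions do not depend on the
-- session time zone.
CREATE FUNCTION demo_julian_decode(p_id int) RETURNS timestamptz
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT (date '2000-01-01' + (p_id - 2451545))::timestamp AT TIME ZONE 'UTC' $$;

CREATE FUNCTION demo_julian_encode(p_ts timestamptz) RETURNS int
LANGUAGE sql IMMUTABLE STRICT
AS $$ SELECT ((p_ts AT TIME ZONE 'UTC')::date - date '2000-01-01') + 2451545 $$;

-- Cross-check against the built-in 'J' format and confirm the round trip.
SELECT demo_julian_encode(ts) = to_char(ts AT TIME ZONE 'UTC', 'J')::int AS matches_builtin,
       demo_julian_decode(demo_julian_encode(ts)) = ts                   AS round_trip_ok
FROM (VALUES (timestamptz '2025-01-07 00:00:00+00')) AS v(ts);
```

Current Julian day numbers are in the low millions, so they fit comfortably in a 32-bit integer, and one ID per day lines up naturally with daily partitions.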

calebj changed the title from "Implement encode/decode for ID columns + tests, docs" to "Implement encode/decode for ID columns" Jan 7, 2025

calebj marked this pull request as draft January 7, 2025 08:27

keithf4 assigned keithf4 and calebj Jan 7, 2025

keithf4 added the feature request label Jan 7, 2025

keithf4 added this to the Future milestone Jan 7, 2025

calebj added 3 commits January 7, 2025 21:08

Implement encode/decode for ID columns + tests, docs

1d643a2

This PR allows users to specify the special value of 'func' for p_epoch to use custom functions to encode/decode time-ordered integers other than the classic seconds, ms, us or ns since the UNIX epoch. Resolves pgpartman#729.

remove redundant code for func epoch in create_partition_time

c2f44bf

Add julian day func test and fix assumption in show_partitions()

e44919b

calebj force-pushed the feat-time-func-ids branch from 5d85138 to 9f8dfa2 Compare January 8, 2025 03:10

calebj marked this pull request as ready for review January 8, 2025 03:13

Add encoder/decoder checks and fix julian date test

c592b2b

calebj force-pushed the feat-time-func-ids branch from 9f8dfa2 to c592b2b Compare January 8, 2025 06:27
