Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: child rel 1 not found in append_rel_array when using any function in a select statement for some chunks but not all #7078

Closed
chris-morris-h2o opened this issue Jun 28, 2024 · 11 comments
Labels

Comments

@chris-morris-h2o
Copy link

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Query executor

What happened?

When querying a hyper table adding "to_timestamp(t_stamp/1000.0)" causes the following error but only for some chunks:

ERROR:  child rel 1 not found in append_rel_array 

SQL state: XX000

An example query that produces this message when executed from pgAdmin:

select *, to_timestamp(t_stamp/1000.0) from sqlth_1_data where tagid in (483426,
565356,
602444,
603707,
609154,
928947)
    AND t_stamp < 1719578853000
	AND t_stamp > 1704088800000

And another example that produces the error when executed form pgAdmin:

select *, to_timestamp(t_stamp/1000.0) from sqlth_1_data where tagid = 928947
    AND t_stamp < 1719578853000
	AND t_stamp > 1704088800000

I set up a Python script to see if I had a corrupted chunk or some such where it would loop through my chunks in descending order one day at a time:

central_tz = pytz.timezone('America/Chicago')

# Current date in Central Time
current_date = datetime.now(central_tz).replace(hour=0, minute=0, second=0, microsecond=0)
current_date = current_date - timedelta(days=10)
# Date 2 years ago in Central Time
two_years_ago = current_date - timedelta(days=2*365)

# Iterate over each day from now to 2 years ago
while current_date >= two_years_ago:
    # Calculate start and end timestamps for the current day in Central Time
    start_timestamp = int(current_date.timestamp() * 1000)
    end_date = current_date - timedelta(days=1)
    end_timestamp = int(end_date.timestamp() * 1000)

    # SQL query
    query = f"""
    SELECT tagid, intvalue, floatvalue, stringvalue, datevalue, dataintegrity, t_stamp
    FROM sqlth_1_data
    WHERE t_stamp < {start_timestamp} AND t_stamp > {end_timestamp} and 
       tagid in (483426,
565356,
602444,
603707,
609154,
928947)
    LIMIT 1;
    """
    try:
        with conn.cursor(name='chunk_checker_cursor') as cursor:
            cursor.execute(query)
            result = cursor.fetchone()
            if result:
                logging.info(f"Date range: {end_date.strftime('%Y-%m-%d')} to {current_date.strftime('%Y-%m-%d')}: Result: {result}")
            else:
                logging.info(f"Date range: {end_date.strftime('%Y-%m-%d')} to {current_date.strftime('%Y-%m-%d')}: No results")
        conn.commit()
    except Exception as e:
        logging.error(f"Date range: {end_date.strftime('%Y-%m-%d')} to {current_date.strftime('%Y-%m-%d')}: Error: {str(e)}")
        conn.rollback()

    # Move to the previous day
    current_date = end_date

# Close the connection
conn.close()

Sample log from the code above:

2024-06-28 08:47:18,419 - INFO - Date range: 2024-06-17 to 2024-06-18: Result: (609154, None, 0.0, None, None, 192, 1718671630372)
2024-06-28 08:47:18,543 - INFO - Date range: 2024-06-16 to 2024-06-17: Result: (609154, None, 0.0, None, None, 192, 1718575240435)
2024-06-28 08:47:18,676 - INFO - Date range: 2024-06-15 to 2024-06-16: Result: (609154, None, 0.0, None, None, 192, 1718491190342)
2024-06-28 08:47:18,796 - INFO - Date range: 2024-06-14 to 2024-06-15: Result: (609154, None, 0.0, None, None, 192, 1718403885422)

If I add ", to_timestamp(t_stamp/1000.0)" after the columns:


SELECT tagid, intvalue, floatvalue, stringvalue, datevalue, dataintegrity, t_stamp, to_timestamp(t_stamp/1000.0)
    FROM sqlth_1_data
    WHERE t_stamp < {start_timestamp} AND t_stamp > {end_timestamp} and 
       tagid in (483426,
565356,
602444,
603707,
609154,
928947)
    LIMIT 1;

I end up with the error message again but not in every chunk, but a large portion of them.

2024-06-28 08:49:04,706 - INFO - Date range: 2024-04-25 to 2024-04-26: Result: (609154, None, 0.0, None, None, 192, 1714081565466, datetime.datetime(2024, 4, 25, 21, 46, 5, 466000, tzinfo=datetime.timezone.utc))
2024-06-28 08:49:04,900 - ERROR - Date range: 2024-04-24 to 2024-04-25: Error: child rel 1 not found in append_rel_array
====ERROR MESSAGE CONTINUES FOR EVERY DAY UNTIL=====
2024-06-28 08:50:41,102 - ERROR - Date range: 2023-02-13 to 2023-02-14: Error: child rel 1 not found in append_rel_array
2024-06-28 08:50:41,203 - INFO - Date range: 2023-02-12 to 2023-02-13: No results
2024-06-28 08:50:41,395 - ERROR - Date range: 2023-02-11 to 2023-02-12: Error: child rel 1 not found in append_rel_array
2024-06-28 08:50:41,544 - ERROR - Date range: 2023-02-10 to 2023-02-11: Error: child rel 1 not found in append_rel_array
2024-06-28 08:50:41,659 - INFO - Date range: 2023-02-09 to 2023-02-10: No results
2024-06-28 08:50:41,787 - INFO - Date range: 2023-02-08 to 2023-02-09: No results

Removing to_timestamp(t_stamp/1000.0) from the two example queries at the start of the issue also cause them to not produce the error and instead return rows (or no rows if no rows found). I also tried variations on the where clause:

Using an asterisk instead:
*, to_timestamp(t_stamp/1000.0)

Getting rid of the decimal on the 1000:
*, to_timestamp(t_stamp/1000)

Also the to_timestamp call by itself still causes the error.

At first I thought maybe it was only happening on uncompressed chunks but I get a result from some compressed chunks and uncompressed chunks with my script:

2024-06-28 08:49:33,200 - INFO - Date range: 2024-04-25 to 2024-04-26: Result: (609154, None, 0.0, None, None, 192, 1714081565466, datetime.datetime(2024, 4, 25, 21, 46, 5, 466000, tzinfo=datetime.timezone.utc))

Is compressed according to my query:

select hypertable_name, chunk_name, to_timestamp(range_start_integer/1000.0),
to_timestamp(range_end_integer/1000.0) from timescaledb_information.chunks where  is_compressed = false AND
    hypertable_name = 'sqlth_1_data'
    order by range_start_integer

Which outputs this as the earliest two uncompressed chunks (got some data inserted with a bad timestamp):

"sqlth_1_data"	"_hyper_282_12123_chunk"	"1997-04-09 00:00:00+00"	"1997-04-10 00:00:00+00"
"sqlth_1_data"	"_hyper_282_6312_chunk"	"2024-05-18 00:00:00+00"	"2024-05-19 00:00:00+00"

Then I thought maybe somehow the t_stamp column got removed from some chunks, but looping through them day by day with every column specified in the select statement shows this to not be true.

I also tried running some of the variations above with SET client_min_messages TO DEBUG5; but the output didn't change.

Here is my table creation script from pgAdmin for my hypertable:

-- Table: public.sqlth_1_data

-- DROP TABLE IF EXISTS public.sqlth_1_data;

CREATE TABLE IF NOT EXISTS public.sqlth_1_data
(
    tagid integer NOT NULL,
    intvalue bigint,
    floatvalue real,
    stringvalue character varying(255) COLLATE pg_catalog."default",
    datevalue timestamp without time zone,
    dataintegrity integer,
    t_stamp bigint NOT NULL,
    CONSTRAINT "PK__sqlth_1___BE126DD12A1EC133" PRIMARY KEY (tagid, t_stamp)
)

TABLESPACE pg_default;

ALTER TABLE IF EXISTS public.sqlth_1_data
    OWNER to postgres;

ALTER TABLE IF EXISTS public.sqlth_1_data
    ALTER COLUMN tagid SET STATISTICS 2000;

ALTER TABLE IF EXISTS public.sqlth_1_data
    ALTER COLUMN intvalue SET STATISTICS 2000;

ALTER TABLE IF EXISTS public.sqlth_1_data
    ALTER COLUMN floatvalue SET STATISTICS 2000;

ALTER TABLE IF EXISTS public.sqlth_1_data
    ALTER COLUMN stringvalue SET STATISTICS 1500;

ALTER TABLE IF EXISTS public.sqlth_1_data
    ALTER COLUMN datevalue SET STATISTICS 2000;

ALTER TABLE IF EXISTS public.sqlth_1_data
    ALTER COLUMN dataintegrity SET STATISTICS 1000;

ALTER TABLE IF EXISTS public.sqlth_1_data
    ALTER COLUMN t_stamp SET STATISTICS 2000;
-- Index: idx_sqlth_optimized_new

-- DROP INDEX IF EXISTS public.idx_sqlth_optimized_new;

CREATE INDEX IF NOT EXISTS idx_sqlth_optimized_new
    ON public.sqlth_1_data USING btree
    (tagid ASC NULLS LAST, t_stamp DESC NULLS FIRST)
    INCLUDE(intvalue, floatvalue, stringvalue, datevalue, dataintegrity)
    TABLESPACE pg_default;

-- Trigger: ts_cagg_invalidation_trigger

-- DROP TRIGGER IF EXISTS ts_cagg_invalidation_trigger ON public.sqlth_1_data;

CREATE OR REPLACE TRIGGER ts_cagg_invalidation_trigger
    AFTER INSERT OR DELETE OR UPDATE 
    ON public.sqlth_1_data
    FOR EACH ROW
    EXECUTE FUNCTION _timescaledb_internal.continuous_agg_invalidation_trigger('282');

-- Trigger: ts_insert_blocker

-- DROP TRIGGER IF EXISTS ts_insert_blocker ON public.sqlth_1_data;

CREATE OR REPLACE TRIGGER ts_insert_blocker
    BEFORE INSERT
    ON public.sqlth_1_data
    FOR EACH ROW
    EXECUTE FUNCTION _timescaledb_internal.insert_blocker();

We are using 24 hour chunks on our hypertable. We compress after 45 days. We have around 77 terabytes uncompressed and 3.5 terabytes compressed across 3 databases running on this one self managed Postgres server. The server is on an Azure VM, Standard E48ds v5 with 48 vCPUs and 384 GB of RAM.

It is also running on top of OpenZFS. I can provide any other relevant information needed.

TimescaleDB version affected

2.11

PostgreSQL version used

14.12

What operating system did you use?

Ubuntu 22.04 x64

What installation method did you use?

Deb/Apt

What platform did you run on?

Microsoft Azure Cloud

Relevant log output and stack trace

From my postgresql log on the server itself I don't have any additional information. I can enable additional logging or provide additional information as needed.

 Example of it looking the same:

2024-06-28 14:14:30.136 UTC [2057310] postgres@history_sdb01 ERROR:  child rel 1 not found in append_rel_array
2024-06-28 14:14:30.136 UTC [2057310] postgres@history_sdb01 STATEMENT:  SET client_min_messages TO DEBUG5;
        select *, to_timestamp(t_stamp/1000.0) from sqlth_1_data where tagid = 928947
            AND t_stamp < 1719578853000
                AND t_stamp > 1704088800000
2024-06-28 14:15:17.057 UTC [2057310] postgres@history_sdb01 ERROR:  child rel 1 not found in append_rel_array
2024-06-28 14:15:17.057 UTC [2057310] postgres@history_sdb01 STATEMENT:  SET client_min_messages TO DEBUG5;
        select *, to_timestamp(t_stamp/1000.0) from sqlth_1_data where tagid = 928947
            AND t_stamp < 1719578853000
                AND t_stamp > 1704088800000

How can we reproduce the bug?

I am not sure how to reproduce this bug. I can reproduce it on my side on demand but I have no idea why it happens on some chunks and not others and only when to_timestamp() is called.
@chris-morris-h2o
Copy link
Author

chris-morris-h2o commented Jul 1, 2024

I saw another similar bug here (#6140) where it was suggested to run within psql with \set VERBOSITY verbose. I did this and received this output:

postgres=# \set VERBOSITY verbose
postgres=# \c history_sdb01
You are now connected to database "history_sdb01" as user "postgres".
history_sdb01=# select *, to_timestamp(t_stamp/1000.0) from sqlth_1_data where tagid in (483426,
565356,
602444,
603707,
609154,
928947)
    AND t_stamp < 1719578853000
        AND t_stamp > 1704088800000;
ERROR:  XX000: child rel 1 not found in append_rel_array
LOCATION:  find_appinfos_by_relids, appendinfo.c:730

@chris-morris-h2o
Copy link
Author

chris-morris-h2o commented Jul 1, 2024

I ran some more tests, it is not isolated to the to_timestamp() function, any operations or casting on the t_stamp column cause the bug:

history_sdb01=# SELECT *, t_stamp::numeric as t_stamp_numeric
FROM sqlth_1_data
WHERE tagid IN (483426, 565356, 602444, 603707, 609154, 928947)
    AND t_stamp < 1719578853000
    AND t_stamp > 1704088800000;
ERROR:  XX000: child rel 1 not found in append_rel_array
LOCATION:  find_appinfos_by_relids, appendinfo.c:730

But it doesn't seem it is just the t_stamp column, adding +1 to the tagid column also triggers the issue:

    query = f"""
    SELECT tagid, intvalue, floatvalue, stringvalue, datevalue, dataintegrity, t_stamp, tagid+1
    FROM sqlth_1_data
    WHERE t_stamp < {start_timestamp} AND t_stamp > {end_timestamp} and 
       tagid in (483426,
565356,
602444,
603707,
609154,
928947)
    LIMIT 1;
    """

2024-07-01 09:37:04,679 - INFO - Date range: 2024-04-27 to 2024-04-28: Result: (609154, None, 0.0, None, None, 192, 1714258210519, 609155)
2024-07-01 09:37:04,767 - INFO - Date range: 2024-04-26 to 2024-04-27: Result: (609154, None, 0.0, None, None, 192, 1714173390490, 609155)
2024-07-01 09:37:04,853 - INFO - Date range: 2024-04-25 to 2024-04-26: Result: (609154, None, 0.0, None, None, 192, 1714081565466, 609155)
2024-07-01 09:37:04,982 - ERROR - Date range: 2024-04-24 to 2024-04-25: Error: child rel 1 not found in append_rel_array

2024-07-01 09:37:05,103 - ERROR - Date range: 2024-04-23 to 2024-04-24: Error: child rel 1 not found in append_rel_array

2024-07-01 09:37:05,163 - ERROR - Date range: 2024-04-22 to 2024-04-23: Error: child rel 1 not found in append_rel_array

As well as if it is in a CTE:

history_sdb01=# WITH base_data AS (
  SELECT *
  FROM sqlth_1_data
  WHERE tagid IN (483426, 565356, 602444, 603707, 609154, 928947)
      AND t_stamp < 1719578853000
      AND t_stamp > 1704088800000
)
SELECT *, to_timestamp(t_stamp/1000.0) FROM base_data;
ERROR:  XX000: child rel 2 not found in append_rel_array
LOCATION:  find_appinfos_by_relids, appendinfo.c:730

@chris-morris-h2o chris-morris-h2o changed the title [Bug]: child rel 1 not found in append_rel_array when using to_timestamp() with bigint timestamp [Bug]: child rel 1 not found in append_rel_array when using any function in a select statement for some chunks but not all Jul 2, 2024
@moritz93
Copy link

moritz93 commented Jul 4, 2024

I see similar behavior on my side running PG15.3, ts2.10.0 also on Azure. Though, the issue disappears from time to time. Can it be related to accessing chunks, which are currently being compressed?

@chris-morris-h2o
Copy link
Author

I see similar behavior on my side running PG15.3, ts2.10.0 also on Azure. Though, the issue disappears from time to time. Can it be related to accessing chunks, which are currently being compressed?

Not for us, we had no chunks being compressed while the tests above were run.

@saurabhjawas
Copy link

saurabhjawas commented Jul 16, 2024

We are also facing the same issue after inserting record in a compressed chunk.
The below query on a hypertable (postgres with timescaledb) gives this error: ERROR: child rel 1 not found in append_rel_array.

select col_name, count (*) from table_name where col_name = 999 group by col_name;

This problem is observed when a function or group-by clause is used in select statement.

But if we re create this query with a sub-query (to provide row identification to main query), it works.

select col_name, count (*) from table_name where col_name in (select col_name from table_name where col_name = 999) group by col_name;

@chris-morris-h2o
Copy link
Author

We are also facing the same issue after inserting record in a compressed chunk. The below query on a hypertable (postgres with timescaledb) gives this error: ERROR: child rel 1 not found in append_rel_array.

Seeing this issue happen after a record is inserted in a compressed chunk gave me an idea. I decompressed and recompressed one of the chunks that was giving us this error:

SELECT decompress_chunk('_timescaledb_internal._hyper_282_5188_chunk');

SELECT compress_chunk('_timescaledb_internal._hyper_282_5188_chunk');

After doing this, the error went away and I received the expected result from a query with a function in the select statement.

So the fix is to decompress and then compress the impacted chunks as a work around to the bug. This works in theory but I have around 494 chunks that need this done and running (de)compress on them requires an exclusive lock which ties up database calls. This number will grow over time as the problem continues to happen if we insert records an a compressed chunk.

Is this issue resolve without having to recompress in later versions? Is the root cause of this solved in later versions so that the number of chunks I need to fix does not continue to grow over time?

Is there any additional information I can provide to help troubleshoot/diagnose/fix this problem?

@chris-morris-h2o
Copy link
Author

Unfortunately, this did not work for every chunk that was having the issue. Still apparent after decompress and compress for the chunk I tested today.

@fabriziomello
Copy link
Contributor

@chris-morris-h2o can u please try to update the extension for a more recent version. Looks like this issue as solved already in past releases (after 2.11.x of course).

@saurabhjawas
Copy link

saurabhjawas commented Jul 26, 2024

@chris-morris-h2o can u please try to update the extension for a more recent version. Looks like this issue as solved already in past releases (after 2.11.x of course).

I tried looking at the release history page. Not sure if this has been fixed yet. https://github.com/timescale/timescaledb/releases

@fabriziomello
Copy link
Contributor

@saurabhjawas Not sure if there are an specific release notes for it but looking at past issues seem 2.12.x solve it: #6141 (comment)

@chris-morris-h2o
Copy link
Author

Confirming that upgrading to 2.16.1 fixed the bug for us.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants