[Bug]: child rel 1 not found in append_rel_array when using any function in a select statement for some chunks but not all #7078
I saw another similar bug here (#6140) where a workaround setting to run within psql was suggested.
I ran some more tests. The bug is not isolated to the to_timestamp() function; any operation on, or cast of, the t_stamp column triggers it:
Nor is it just the t_stamp column; adding +1 to the tagid column also triggers the issue:
The same happens when the expression appears in a CTE:
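The failing statements themselves aren't preserved above; the following is a hedged sketch of the variations being described, where the table name `telemetry` and the exact expressions are assumptions (only the columns `t_stamp` and `tagid` appear in the report):

```sql
-- Hypothetical reconstructions of the failing variations described above.
-- Any expression on a column in the SELECT list triggered the error:
SELECT to_timestamp(t_stamp / 1000.0) FROM telemetry LIMIT 1;  -- original report
SELECT t_stamp + 0 FROM telemetry LIMIT 1;                     -- operation on t_stamp
SELECT tagid + 1 FROM telemetry LIMIT 1;                       -- arithmetic on tagid

-- The same expression inside a CTE failed as well:
WITH recent AS (
    SELECT tagid + 1 AS tag FROM telemetry LIMIT 1
)
SELECT * FROM recent;
```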
I see similar behavior on my side running PG15.3, ts2.10.0, also on Azure. The issue disappears from time to time, though. Could it be related to accessing chunks that are currently being compressed?
Not for us; no chunks were being compressed while the tests above were run.
We are also facing the same issue after inserting records into a compressed chunk.
The problem appears when a function or GROUP BY clause is used in the SELECT statement. But if we rewrite the query with a sub-query (to provide row identification to the main query), it works.
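The query pair isn't shown above; this is a minimal sketch of the shape being described, assuming a hypothetical hypertable `telemetry` with columns `tagid` and `floatvalue` (both names are assumptions, not from the report):

```sql
-- Fails with "child rel 1 not found in append_rel_array" on affected chunks:
SELECT tagid, max(floatvalue)
FROM telemetry
GROUP BY tagid;

-- Workaround described above: identify the rows in a sub-query first,
-- then let the outer query read them.
SELECT t.tagid, t.floatvalue
FROM telemetry t
WHERE t.ctid IN (SELECT ctid FROM telemetry WHERE tagid = 192);
```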
Seeing this issue happen after a record is inserted in a compressed chunk gave me an idea. I decompressed and recompressed one of the chunks that was giving us this error:
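The exact commands aren't shown above; TimescaleDB's `decompress_chunk` and `compress_chunk` functions perform this operation (the chunk name below is hypothetical):

```sql
-- Decompress and recompress a single affected chunk (name is hypothetical).
SELECT decompress_chunk('_timescaledb_internal._hyper_1_123_chunk');
SELECT compress_chunk('_timescaledb_internal._hyper_1_123_chunk');
```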
After doing this, the error went away and I received the expected result from a query with a function in the select statement. So the workaround is to decompress and then recompress the impacted chunks. This works in theory, but I have around 494 chunks that need it, and running (de)compress on them requires an exclusive lock that ties up database calls. This number will also grow over time, since the problem keeps happening whenever we insert records into a compressed chunk. Is this issue resolved in later versions without having to recompress? Is the root cause solved in later versions, so that the number of chunks I need to fix does not continue to grow over time? Is there any additional information I can provide to help troubleshoot/diagnose/fix this problem?
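Recompressing all ~494 affected chunks could be scripted with `show_chunks`; a sketch, assuming the hypertable is named `telemetry` and compression starts at 45 days as described (note each call still takes the exclusive lock mentioned above):

```sql
DO $$
DECLARE
    c regclass;
BEGIN
    -- 'telemetry' and the 45-day cutoff are assumptions from the report.
    FOR c IN SELECT show_chunks('telemetry', older_than => INTERVAL '45 days')
    LOOP
        PERFORM decompress_chunk(c, if_compressed => true);
        PERFORM compress_chunk(c, if_not_compressed => true);
    END LOOP;
END $$;
```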
Unfortunately, this did not work for every chunk that was having the issue. The error is still apparent after decompressing and compressing the chunk I tested today.
@chris-morris-h2o can you please try updating the extension to a more recent version? It looks like this issue was solved in past releases (after 2.11.x, of course).
I tried looking at the release history page. Not sure if this has been fixed yet. https://github.com/timescale/timescaledb/releases |
@saurabhjawas I'm not sure there are specific release notes for it, but looking at past issues it seems 2.12.x solved it: #6141 (comment)
Confirming that upgrading to 2.16.1 fixed the bug for us. |
What type of bug is this?
Unexpected error
What subsystems and features are affected?
Query executor
What happened?
When querying a hypertable, adding "to_timestamp(t_stamp/1000.0)" causes the following error, but only for some chunks:
An example query that produces this message when executed from pgAdmin:
And another example that produces the error when executed from pgAdmin:
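The original queries aren't preserved; a minimal sketch of the shape being described, where the table name `telemetry` and the literal values are assumptions (the columns `t_stamp` and `tagid` come from the report, and t_stamp appears to hold epoch milliseconds given the /1000.0 conversion):

```sql
-- Hypothetical example: fails on some chunks, succeeds on others.
SELECT tagid, to_timestamp(t_stamp / 1000.0)
FROM telemetry
WHERE tagid = 192
ORDER BY t_stamp DESC
LIMIT 10;
-- ERROR: child rel 1 not found in append_rel_array
```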
I set up a Python script to see if I had a corrupted chunk or some such; it loops through my chunks in descending order, one day at a time:
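The script itself isn't preserved; the per-day probe it ran presumably resembled the following query, executed once per date (the table name and dates are assumptions; t_stamp is treated as epoch milliseconds per the report):

```sql
-- Probe a single day's worth of data to see whether its chunk errors out.
SELECT *
FROM telemetry
WHERE t_stamp >= extract(epoch FROM DATE '2024-04-25') * 1000
  AND t_stamp <  extract(epoch FROM DATE '2024-04-26') * 1000
LIMIT 1;
```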
Sample log from the code above:
If I add ", to_timestamp(t_stamp/1000.0)" after the columns, I end up with the error message again, not in every chunk but in a large portion of them.
Removing to_timestamp(t_stamp/1000.0) from the two example queries at the start of the issue also causes them to stop producing the error and instead return rows (or no rows if none are found). I also tried variations on the WHERE clause:
Using an asterisk instead:
*, to_timestamp(t_stamp/1000.0)
Getting rid of the decimal on the 1000:
*, to_timestamp(t_stamp/1000)
The to_timestamp call by itself also still causes the error.
At first I thought maybe it was only happening on uncompressed chunks, but my script returns results from both some compressed and some uncompressed chunks:
2024-06-28 08:49:33,200 - INFO - Date range: 2024-04-25 to 2024-04-26: Result: (609154, None, 0.0, None, None, 192, 1714081565466, datetime.datetime(2024, 4, 25, 21, 46, 5, 466000, tzinfo=datetime.timezone.utc))
That chunk is compressed according to my query:
This outputs the following as the earliest two uncompressed chunks (some data was inserted with a bad timestamp):
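The compression-status query isn't shown; the `timescaledb_information.chunks` view exposes an `is_compressed` flag that supports this kind of check (the hypertable name below is an assumption):

```sql
-- List the earliest chunks and whether each is compressed.
SELECT chunk_schema, chunk_name, range_start, is_compressed
FROM timescaledb_information.chunks
WHERE hypertable_name = 'telemetry'
ORDER BY range_start
LIMIT 2;
```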
Then I thought maybe the t_stamp column had somehow been removed from some chunks, but looping through them day by day with every column specified in the SELECT statement shows this is not the case.
I also tried running some of the variations above with SET client_min_messages TO DEBUG5; but the output didn't change.
Here is my table creation script from pgAdmin for my hypertable:
We are using 24-hour chunks on our hypertable and compress after 45 days. We have around 77 terabytes uncompressed and 3.5 terabytes compressed across 3 databases running on this one self-managed Postgres server. The server is an Azure VM, Standard E48ds v5, with 48 vCPUs and 384 GB of RAM.
It is also running on top of OpenZFS. I can provide any other relevant information needed.
TimescaleDB version affected
2.11
PostgreSQL version used
14.12
What operating system did you use?
Ubuntu 22.04 x64
What installation method did you use?
Deb/Apt
What platform did you run on?
Microsoft Azure Cloud
Relevant log output and stack trace
How can we reproduce the bug?
I am not sure how to reproduce this bug. I can reproduce it on demand on my side, but I have no idea why it happens on some chunks and not others, and only when to_timestamp() is called.