panic: unsupported for aggregate min: *reads.stringMultiShardArrayCursor #26142

Open
ttftw opened this issue Mar 14, 2025 · 15 comments

Labels: area/flux, area/storage, kind/bug, team/edge

@ttftw commented Mar 14, 2025

I have asked for help in Slack and will follow up with customer support, but I thought I would also post this here.

I am currently using InfluxDB Cloud as our primary data store. All data comes in from an MQTT server. I recently spun up a local InfluxDB instance in a Docker container and am sending a copy of the data to this new local server for testing. The local server adds a few new tags, but besides that, everything else should be the same: same bucket name, same data structure, etc. I'm doing this so I can drop the new server into existing Grafana dashboards and test some things before we push changes to the production server.

When I run the same queries from the cloud against the local server, I have some that cause influxdb to panic.

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["serial"] == "xxx.xxx")
  |> filter(fn: (r) => r["channel"] == "Cellular")
  |> aggregateWindow(every: v.windowPeriod, fn: min, createEmpty: false)

This runs fine on the cloud server, but when I run it locally, I get

panic: unsupported for aggregate min: *reads.stringMultiShardArrayCursor

and if I comment out the aggregateWindow, it returns the data as I'd expect. One thing to note: when I run this query, the _value column's data type is a double in the cloud and a long on the local instance, but besides that, the tables look identical when I comment out the aggregateWindow.
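
For comparison, a query along these lines (a sketch, assuming Flux 0.141+ for the types package, using the same placeholder bucket and tag values as above) can label each row's _value type so the cloud and local results can be checked side by side:

import "types"

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["serial"] == "xxx.xxx")
  |> filter(fn: (r) => r["channel"] == "Cellular")
  // Add a valueType column describing the stored type of _value in each table.
  |> map(fn: (r) => ({r with valueType:
        if types.isType(v: r._value, type: "float") then "float"
        else if types.isType(v: r._value, type: "int") then "int"
        else if types.isType(v: r._value, type: "string") then "string"
        else "other"}))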

Oddly enough, I just noticed that if I change the query from

|> filter(fn: (r) => r["channel"] == "Cellular")

to

|> filter(fn: (r) => r["_field"] == "Status.Cellular")

which targets the same data rows in a different way, then I do not get the panic and the data returns as expected with the aggregateWindow.

Any insight as to why this query would be successful in the cloud version but panic in the OSS version will be much appreciated.

InfluxDB v2.7.11
Server: fbf5d4a

Logs:

influxdb-1 | ts=2025-03-14T15:39:37.538278Z lvl=info msg="Execute source panic" log_id=0vHGw81W000 service=storage-reads error="panic: unsupported for aggregate min: *reads.stringMultiShardArrayCursor" stacktrace="goroutine 207758 [running]:\nruntime/debug.Stack()\n\t/go/src/runtime/debug/stack.go:24 +0x5e\ngithub.com/influxdata/flux/execute.(*executionState).recover(0xc002241170)\n\t/go/pkg/mod/github.com/influxdata/[email protected]/execute/recover.go:32 +0x1fd\npanic({0x7f33d4a24b40?, 0xc0036f9650?})\n\t/go/src/runtime/panic.go:770 +0x132\ngithub.com/influxdata/influxdb/v2/storage/reads.newWindowMinArrayCursor({0x7f33d502de40?, 0xc00735d850?}, {{0x0, 0x2540be400, 0x0}, {0x0, 0x2540be400, 0x0}, 0x0, 0x0, ...})\n\t/root/project/storage/reads/array_cursor.gen.go:158 +0x40c\ngithub.com/influxdata/influxdb/v2/storage/reads.newWindowAggregateArrayCursor({0xc0069e5dc0?, 0x9?}, 0x9?, {{0x0, 0x2540be400, 0x0}, {0x0, 0x2540be400, 0x0}, 0x0, ...}, ...)\n\t/root/project/storage/reads/array_cursor.go:43 +0x9c\ngithub.com/influxdata/influxdb/v2/storage/reads.(*windowAggregateResultSet).createCursor(0xc0069e3770, {{0x0, 0x0, 0x0}, {0x7f3384608495, 0x6, 0x3f7b6b}, {0xc002071340, 0x7, 0x7}, ...})\n\t/root/project/storage/reads/aggregate_resultset.go:131 +0x2a6\ngithub.com/influxdata/influxdb/v2/storage/reads.(*windowAggregateResultSet).Next(0xc0069e3770)\n\t/root/project/storage/reads/aggregate_resultset.go:84 +0x138\ngithub.com/influxdata/influxdb/v2/storage/flux.(*windowAggregateIterator).handleRead(0xc00295a100, 0x7f33d5035218?, {0x7f33d5048520, 0xc0069e3770})\n\t/root/project/storage/flux/reader.go:742 +0x3aa\ngithub.com/influxdata/influxdb/v2/storage/flux.(*windowAggregateIterator).Do(0xc00295a100, 0xc0016b3ae0)\n\t/root/project/storage/flux/reader.go:688 +0x365\ngithub.com/influxdata/influxdb/v2/query/stdlib/influxdata/influxdb.(*Source).processTables(0xc002070f20, {0x7f33d5035218, 0xc0036fe090}, {0x7f33d5020118, 0xc00295a100}, 0x182cb58e09149c46)\n\t/root/project/query/stdlib/influxdata/influxdb/source.go:69 +0x9a\ngithub.com/influxdata/influxdb/v2/query/stdlib/influxdata/influxdb.(*readWindowAggregateSource).run(0xc002070f20, {0x7f33d5035218, 0xc0036fe090})\n\t/root/project/query/stdlib/influxdata/influxdb/source.go:303 +0x106\ngithub.com/influxdata/influxdb/v2/query/stdlib/influxdata/influxdb.(*Source).Run(0xc002070f20, {0x7f33d5035218, 0xc0036fe090})\n\t/root/project/query/stdlib/influxdata/influxdb/source.go:50 +0xa3\ngithub.com/influxdata/flux/execute.(*executionState).do.func2({0x7f33d5036d70, 0xc002070f20})\n\t/go/pkg/mod/github.com/influxdata/[email protected]/execute/executor.go:535 +0x375\ncreated by github.com/influxdata/flux/execute.(*executionState).do in goroutine 234\n\t/go/pkg/mod/github.com/influxdata/[email protected]/execute/executor.go:515 +0xf8\n"

@davidby-influx added the kind/bug, area/storage, area/flux, and team/edge labels on Mar 18, 2025
devanbenz added a commit that referenced this issue on Mar 19, 2025:
This PR alleviates the erroneous panic we are seeing in #26142. There should not be a panic; instead we should return an error.
@devanbenz

@ttftw I've opened a PR to remove the erroneous panic. That said, I'll also need to take a look at the query to see why it's being transformed to use the improper cursor type. Thank you for the detailed bug report. I'm going to continue looking into it today.

@devanbenz

Also, if possible, would you be willing to send over some line protocol for the schema you're using so I can write data to a local InfluxDB to help with debugging? In the meantime, I'll try my best to reproduce some mock data with the query you've provided.

@ttftw commented Mar 19, 2025

@devanbenz Here is a sample of data that generated the error above:

Sample,alarm=False,channel=Cellular,client=None,driver=Status,project=None,serial=PV112,site=Demo Status.Cellular=-17i 1742420358

@devanbenz

@ttftw thank you - another question: how exactly are you running the Flux query? Are you using the Chronograf UI or running it some other way?

@ttftw commented Mar 20, 2025

I've tried and confirmed the error from Grafana dashboards/Explore and from the Data Explorer tool in InfluxDB.

@devanbenz commented Mar 25, 2025

I've been attempting to reproduce this issue locally without much luck. I assume that this data was line protocol you dumped from a TSM file?

Sample,alarm=False,channel=Cellular,client=None,driver=Status,project=None,serial=PV112,site=Demo Status.Cellular=-17i 1742420358

Just ensuring I am reading it correctly. I assume that your tag set is the following:

alarm=False,channel=Cellular,client=None,driver=Status,project=None,serial=PV112,site=Demo

Can you verify that I have the tags right? If possible, I may request that you give me a clone of some mock data in the form of a TSM file from where you're seeing the issue. I would like to try and reproduce this issue myself.

@devanbenz commented Mar 25, 2025

Okay, so I was able to reproduce this issue. I basically wrote some points in line protocol using a script. I modified my points so that they had a string type for _value, which is just the field value as outlined in our documentation here: https://docs.influxdata.com/influxdb/cloud/reference/key-concepts/data-elements/#fields. Obviously I only have one field, matching channel, but I wonder if you have multiple fields and there's a conflict between them somehow... My suspicion is that you have some sneaky _values locally that are of string type?

Documents/InfluxData/issue_26142 via 🐍 v3.10.12 on ☁  [email protected] took 18s
❯ python3 repro_lp.py
Data saved to influxdb_data.txt

Sample of generated data:
Sample,alarm=true,channel=WiFi,client=Client1,driver=Status,project=Project3,serial=PV112,site=Production WiFi="bar" 1742324017
Sample,alarm=true,channel=Cellular,client=None,driver=Sensor,project=None,serial=PV112,site=Production Cellular="baz" 1742324077
Sample,alarm=false,channel=Cellular,client=Client3,driver=Monitor,project=None,serial=PV112,site=Staging Cellular="bar" 1742324137
Sample,alarm=false,channel=Ethernet,client=Client2,driver=Control,project=Project1,serial=PV112,site=Development Ethernet="baz" 1742324198
Sample,alarm=true,channel=Bluetooth,client=Client1,driver=Status,project=Project1,serial=PV112,site=Production Bluetooth="foo" 1742324258

I've gone ahead and added a PR for this case so that instead of panicking we return an error: #26165

On your local instance where you're seeing the issue, could you run your Flux query without the aggregation?

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["serial"] == "xxx.xxx")
  |> filter(fn: (r) => r["channel"] == "Cellular")

And send over a screen grab of the Simple Table graph, like so?

Image

I have a feeling that somehow the query is attempting to aggregate on string values instead of numerical values.
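
If it helps, a query along these lines (a sketch assuming Flux 0.141+ for the types package, with the bucket and tag values from your report) should surface any string-typed series hiding behind the channel == "Cellular" filter:

import "types"

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["channel"] == "Cellular")
  // Keep only rows whose _value is stored as a string.
  |> filter(fn: (r) => types.isType(v: r._value, type: "string"))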

@ttftw commented Mar 26, 2025

Image

Here are the columns that come back from that query. Without the aggregate window, the data comes back. With the aggregate window, I get the error.

If I change the query to:

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["serial"] == "xxx.xxx")
  |> filter(fn: (r) => r["_field"] == "Status.Cellular")
  |> aggregateWindow(every: v.windowPeriod, fn: min, createEmpty: false)

This targets the same rows and runs fine. For some reason, changing the filter to |> filter(fn: (r) => r["channel"] == "Cellular") causes this error.

This is the data schema that comes back when I use the _field query above with the aggregate window:

Image

@devanbenz commented Mar 26, 2025

And what if you send the query just like this?

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")

Image

I wonder if there is a mix of _value types. I see the following when I write the two points:

Sample,alarm=False,channel=Cellular,client=None,driver=Status,project=None,serial=PV112,site=Demo Status.Cellular=-17i 1742420358

and

Sample,alarm=False,channel=Wifi,client=None,driver=Status,project=None,serial=PV112,site=Demo Status.Wifi="foo" 1742420359

When querying this using the window aggregate, I see the following (with my fix to bubble up an error instead of a panic):

Image

Other than that, I cannot repro with the line protocol data you provided. It appears that adjusting your query works. I'll remove the panics and close the issue if we can't dig up anything more.

@ttftw commented Mar 27, 2025

> from(bucket: "bucket")
>   |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
>   |> filter(fn: (r) => r["_measurement"] == "Sample")

This works, but there are a handful of tables that have a different data type for _value. Some _value's are strings and some are longs.

> Other than that I cannot repro with the data in line protocol you provided. It appears that adjusting your query works. I'll remove the panics and close the issue if we can't dig up anything more.

I might be misunderstanding, but your last screenshot seems to be reproducing the bug I see?
In that bucket, I have data like you describe above, where the channel tag value changes and the field's _value type is a string for some records and a long for others. These different datasets have different field names, but the same names have the same data type. If I filter by channel == cellular, that table only returns field names whose _values are longs, but if I filter by channel == cellular_name, then the field _value type will be a string. When I target the data by channel == cellular and try to aggregate, I get this panic that says something about aggregating strings, even though the type for these fields' _values is long.

If I change the query from channel == "cellular" to field == "status.cellular", these two queries return the exact same dataset with the same data types, but I do not get the panic when aggregating and filtering by field name, while I do when filtering by the channel tag. The same panic happens when I filter on other channel tag values, too.

Something else I just noticed: if I run this query:

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["serial"] == "xxx")
  |> filter(fn: (r) => r["channel"] == "Cellular")
  |> group(columns: ["_field"])
  |> aggregateWindow(every: 1m, fn: min, createEmpty: false)

It works, but if I comment out |> group(columns: ["_field"]), then I get the panic.

Also, to reiterate from my original post, all of this works correctly with no panics or errors when I run these same queries against a duplicate set of this data in our InfluxDB Cloud database.

@devanbenz

Sorry - I meant with the single line:

Sample,alarm=False,channel=Cellular,client=None,driver=Status,project=None,serial=PV112,site=Demo Status.Cellular=-17i 1742420358

I could not reproduce. Adding additional lines with my own data and mixing strings with longs/ints caused the issue, which sounds like what you have: strings for some of the _fields in your bucket.

In the cloud version, do you have mixed types for the _field as well? I'm tempted to close this issue, as it appears you have found a workaround with the Flux query and we will be removing the panics in an upcoming release. I've also found that just calling |> group() resolves the issue as well.
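
For reference, that would look something like this (a sketch based on the query from the original report; note that a bare group() merges all matched series into a single table before the aggregate, so the output shape differs from the per-field grouping shown earlier):

from(bucket: "bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["serial"] == "xxx.xxx")
  |> filter(fn: (r) => r["channel"] == "Cellular")
  // Regroup before aggregating; group(columns: ["_field"]) keeps per-field tables instead.
  |> group()
  |> aggregateWindow(every: v.windowPeriod, fn: min, createEmpty: false)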

@ttftw commented Mar 27, 2025

Yes, in the cloud version the data mix is the same. The data that's coming in is going to both cloud and OSS versions, so it's basically a duplicate dataset.

I have found a workaround, yes, but this doesn't feel like a confidence-inspiring solution for continuing to use InfluxDB, whether Cloud or OSS. It seems from this thread that this is a bug, right? If so, shouldn't this be a valid issue that gets resolved rather than closed without a fix? This kind of data and query seems like a typical use case that should work without an error. Or is this not being fixed because 2.0 is being deprecated?

I'm trying to drop an InfluxDB OSS datasource into Grafana to test some queries against existing dashboards that already work with InfluxDB Cloud, but when I swap the datasource to InfluxDB OSS, it throws these errors. Even if it's not panicking, it's still going to error and not return the data, so this is something we will have to work around going forward: I'll have to edit dozens of queries across dozens of dashboards just to get dashboards that already work to work around this bug.

@devanbenz

I'm going to take a deeper dive into why it looks like the filter is pruning data in Cloud 2 whereas it's not in OSS for that specific query. I was able to set up a full reproduction case using Grafana + OSS running locally + Cloud 2 running locally. I'll update this ticket with more information as I work on it. Please stand by.

@ttftw commented Mar 28, 2025

Thank you. I appreciate the help.

@devanbenz

After taking a deeper dive into this, it appears to be a difference in how cursors work under the hood between our Cloud 2 and OSS 2.x codebases with Flux. Resolving it would require a larger change to Flux and OSS 2.x and, unfortunately, with Flux being in maintenance mode, we are no longer working toward extending support in a way that would require such a large change.
