panic: unsupported for aggregate min: *reads.stringMultiShardArrayCursor #26142
Comments
This PR is used to alleviate the erroneous panic we are seeing, corresponding with #26142. There should not be a panic; instead, we should be throwing an error.
@ttftw I've opened up a PR to remove the erroneous panic. That being said, I'll also need to take a look at the query to see why it's being transformed to use the improper cursor type. Thank you for the detailed bug report. Going to continue looking into it today.
Also, if possible, would you be willing to send me over some line protocol for the schema you're using so I can write data to a local InfluxDB to help with debugging? I'll try my best to reproduce some mock data with the query you've provided in the meantime.
@devanbenz Here is a sample of data that generated the error above: Sample,alarm=False,channel=Cellular,client=None,driver=Status,project=None,serial=PV112,site=Demo Status.Cellular=-17i 1742420358
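(A hedged reading of that sample point; the annotations below are inferred from the single line above and not otherwise confirmed:)

```
# Inferred structure of the sample point:
#   measurement  Sample
#   field        Status.Cellular = -17i  (an integer value)
#   timestamp    1742420358 (10 digits, so presumably second precision rather
#                than the default nanoseconds if it is re-written locally)
Sample,alarm=False,channel=Cellular,client=None,driver=Status,project=None,serial=PV112,site=Demo Status.Cellular=-17i 1742420358
```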
@ttftw thank you - another question: how exactly are you running the Flux query? Are you using the Chronograf UI or running it some other way?
I've tried and confirmed the error from Grafana dashboards/Explore and the Data Explorer tool in InfluxDB.
I've been attempting to reproduce this issue locally without much luck. I assume that this data was line protocol you dumped from a TSM file?
Just ensuring I am reading it correctly. I assume that your tag set is the following: alarm, channel, client, driver, project, serial, and site.
Can you verify that I have the tags right? If possible, I may ask you to give me a clone of some mock data, in the form of a TSM file, from the instance where you're seeing the issue. I would like to try and reproduce this issue myself.
Okay, so I was able to reproduce this issue. I basically wrote some points in line protocol using a script, then modified the points so that _value (which is just the field value, as outlined in our documentation here: https://docs.influxdata.com/influxdb/cloud/reference/key-concepts/data-elements/#fields) was a string type. Obviously I only have one field being written in my test data.
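(For concreteness, a hedged sketch of the kind of mixed-type data being described; these exact points and the Status.Text field name are illustrative, not the reporter's actual schema:)

```
# Hypothetical points: the same measurement/tag set carries both an integer
# field and a string field, so a filter on the channel tag alone pulls the
# string series into the min() aggregate as well.
Sample,channel=Cellular,site=Demo Status.Cellular=-17i 1742420358
Sample,channel=Cellular,site=Demo Status.Text="connected" 1742420358
```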
I've gone ahead and added a PR for this case so that, instead of panicking, we bubble up an error. On your local instance where you're seeing the issue, could you run your flux query without the aggregation?
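(Roughly like the following; a sketch only, with the bucket name and time range as placeholders rather than the reporter's actual values:)

```flux
// The same filters, with the aggregateWindow() step removed.
from(bucket: "my-bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["channel"] == "Cellular")
```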
And send over a screen grab from the I have a feeling that somehow the query is attempting to aggregate on string values instead of numerical values. |
Here are the columns that come back from that query. Without the aggregate window, the data comes back. With the aggregate window, I get the error. If I change the query to:
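(A hedged reconstruction of the changed query, based on the _field filter quoted in the original report below; bucket name, range, and window parameters are placeholders:)

```flux
// Filtering on the field key instead of the channel tag.
from(bucket: "my-bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["_field"] == "Status.Cellular")
  |> aggregateWindow(every: 1m, fn: min, createEmpty: false)
```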
This targets the same rows and runs fine. For some reason, changing the filter this way avoids the error. This is the data schema that comes back when I use the _field query above with the aggregate window:
And what if you send the query just like this?
I wonder if there is a mix of _value types. I see the following when I write the two points:
and
When querying this using the window aggregate, I see the error surfaced (with my fix to bubble up an error instead of a panic). Other than that, I cannot repro with the data in line protocol you provided. It appears that adjusting your query works. I'll remove the panics and close the issue if we can't dig up anything more.
This works, but there are a handful of tables that have a different data type for _value. Some _values are strings and some are longs.
I might be misunderstanding, but your last screenshot seems to be reproducing the bug I see? If I change the query from channel == "cellular" to field == "status.cellular", the two queries return the exact same dataset with the same data types, but I do not get the panic when aggregating and filtering by field name, while I do when filtering by the channel tag. This same panic happens when I filter on other channel tag values, too. Something else I just noticed: if I run this query:
It works, but if I comment that out, the panic comes back. Also, to reiterate from my original post, all of this works correctly with no panics or errors when I run these same queries on a duplicate set of this data in our InfluxDB Cloud database.
Sorry - I meant with the single line of line protocol you sent above: I could not reproduce. Adding additional lines with my own data and mixing strings with longs/ints caused the issue, and it sounds like you have strings for some of the _fields in your bucket. In the cloud version, do you have mixed types for the _value column as well?
Yes, in the cloud version the data mix is the same. The data that's coming in goes to both the Cloud and OSS versions, so it's basically a duplicate dataset. I have found a workaround, yes, but this doesn't feel like a confidence-inspiring solution for continuing to use InfluxDB at the cloud level or in OSS. It seems from this thread that this is a bug, right? If so, shouldn't this be a valid issue that gets resolved rather than closed without a fix? This kind of data and query seems like a typical use case that should work without an error. Or is this not being fixed because 2.0 is being deprecated? I'm trying to drop an InfluxDB OSS datasource into Grafana to test some queries against existing dashboards that already work with InfluxDB Cloud, but when I swap the datasource to InfluxDB OSS, it throws these errors. Even if it's not panicking, it's still going to error out and not return the data, which is something we will have to work around in the future; I'll have to edit dozens of queries across dozens of dashboards just to get dashboards that already work to work around this bug.
I'm going to take a deeper dive into why it looks like the filter is pruning data in Cloud 2 whereas it's not in OSS for that specific query. I was able to set up a full reproduction case using Grafana + OSS running locally + Cloud 2 running locally. I'll update this ticket with more information as I work on it. Please stand by.
Thank you. I appreciate the help.
After taking a deeper dive into this, it appears to be a difference in how cursors work under the hood between our Cloud 2 and OSS 2.x codebases with Flux. Resolving this would require a larger change to Flux and OSS 2.x, and unfortunately, with Flux being in maintenance mode, we are no longer working toward extending support in ways that would require such a large change.
I have asked for help in Slack and will follow up with customer support, but I thought I would also post this here.
I am currently using InfluxDB Cloud as our primary data store. All data is coming in from an MQTT server. I recently spun up a local InfluxDB instance in a Docker container and am sending a copy of the data to this new local server for testing. The local server adds a few new tags, but besides that, everything else should be the same: same bucket name, same data structure, etc. I'm doing this so I can just drop this new server into existing Grafana dashboards and test some things before we push changes that go to the production server.
When I run the same queries from the cloud against the local server, some of them cause InfluxDB to panic.
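(A hedged reconstruction of the kind of query involved, pieced together from the filters described below and the aggregate min named in the panic; bucket name, range, and window parameters are placeholders:)

```flux
// Tag-based filter plus an aggregateWindow(fn: min), the shape of query described in this report.
from(bucket: "my-bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r["_measurement"] == "Sample")
  |> filter(fn: (r) => r["channel"] == "Cellular")
  |> aggregateWindow(every: 1m, fn: min, createEmpty: false)
```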
This runs fine on the cloud server, but when I run it locally, I get:
panic: unsupported for aggregate min: *reads.stringMultiShardArrayCursor
and if I comment out the aggregateWindow, then it returns the data as I'd expect. One thing to note is that when I run this query, the _value column data type is a double in the cloud and a long on the local instance, but besides that, the tables look identical when I comment out the aggregateWindow. Oddly enough, I just noticed that if I change the query from
|> filter(fn: (r) => r["channel"] == "Cellular")
to
|> filter(fn: (r) => r["_field"] == "Status.Cellular")
which targets the same data rows, but in a different way, then I do not get this panic and the data returns as expected with the aggregateWindow.
Any insight as to why this query would be successful in the cloud version but panic in the OSS version would be much appreciated.
Logs: