Refactor/data pipe spec filter constants #2189
Looks good, and the tests are all passing (`yarn workspace shared test` and `yarn workspace scripts test`).
See minor clarification comment inline.
Happy to merge after #2187 has been merged (see latest comment on that PR)
private generateEvaluatorConstants(condition: string, row: Record<string, any>) {
  const constants: Record<string, any> = {};
  for (const [key, value] of Object.entries(row)) {
    if (condition.includes(key)) {
This could lead to some key-value pairs being erroneously included, for example if the condition refers to a "name" column and the row data includes a "first_name" column. Seeing as the purpose of this step is efficiency/reducing complexity of the later JS evaluation, I don't think that's a problem, as it would still be an improvement in such cases.
I think in that case, if the condition were `name==='Ada'` then `condition.includes('first_name')` would still return false, so that column would not be included.
Although yes, the opposite case would be true: `first_name==='Ada'` and `condition.includes('name')`.
Given that the variable can be used within any valid JavaScript, I think the regex gets quite tricky to pin down perfectly.
It would need a regex that allows any valid JavaScript syntax/operators either side of the column name, but not anything that could form part of a variable name. I'm assuming these are actually distinct sets given variable naming restrictions, but I'm less confident writing the regex to do so.
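To make the trade-off concrete, here is a minimal sketch comparing the substring check from the diff with a boundary-aware check. The function names `pickBySubstring`, `pickByIdentifier` and `escapeRegExp` are hypothetical, and the pattern assumes ASCII-only column names (letters, digits, `_`, `$`), which is narrower than the full set of valid JavaScript identifiers.

```typescript
type Row = Record<string, unknown>;

// Escape regex metacharacters so a column name is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Substring check, mirroring the condition.includes(key) test in the diff.
function pickBySubstring(condition: string, row: Row): Row {
  const constants: Row = {};
  for (const [key, value] of Object.entries(row)) {
    if (condition.includes(key)) constants[key] = value;
  }
  return constants;
}

// Boundary-aware check (assumption: ASCII identifiers only).
// The key must not be directly preceded or followed by an identifier character.
function pickByIdentifier(condition: string, row: Row): Row {
  const constants: Row = {};
  for (const [key, value] of Object.entries(row)) {
    const pattern = new RegExp(
      `(?<![A-Za-z0-9_$])${escapeRegExp(key)}(?![A-Za-z0-9_$])`
    );
    if (pattern.test(condition)) constants[key] = value;
  }
  return constants;
}

const sampleRow: Row = { name: "Ada", first_name: "Ada", status: "draft" };
console.log(Object.keys(pickBySubstring("first_name==='Ada'", sampleRow))); // ["name", "first_name"]
console.log(Object.keys(pickByIdentifier("first_name==='Ada'", sampleRow))); // ["first_name"]
```

Even the boundary-aware version would still match a column name that appears inside a string literal or after a `.` property access, so it only narrows the over-inclusion rather than eliminating it.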
A couple of sources suggest different ways of doing this:
https://mathiasbynens.be/demo/javascript-identifier-regex
https://regex101.com/r/7fjduD/1
However, given the likely complexity, I think I'd be much more in favor of accepting the limitation and, if desired, exposing a config option on operations like `filter` that allows the user to explicitly pass the names of columns to include for advanced use.
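As a rough illustration of that idea, the sketch below lets an explicit column list take precedence over the substring check. The `includeColumns` option, the `FilterConstantOptions` interface and the `selectConstants` function are all hypothetical names for discussion, not part of the existing `filter` operation.

```typescript
type Row = Record<string, unknown>;

// Hypothetical option shape; the real filter operation does not define this.
interface FilterConstantOptions {
  // When provided, only these columns are passed to the evaluator.
  includeColumns?: string[];
}

function selectConstants(
  condition: string,
  row: Row,
  options: FilterConstantOptions = {}
): Row {
  const explicit = options.includeColumns;
  const constants: Row = {};
  for (const [key, value] of Object.entries(row)) {
    // Explicit list wins; otherwise fall back to the substring check.
    const wanted = explicit ? explicit.includes(key) : condition.includes(key);
    if (wanted) constants[key] = value;
  }
  return constants;
}
```

With this shape the default behaviour is unchanged, and only authors who hit the over-inclusion case would need to pass the list.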
I tested this by updating debug_data_list to contain the issue flagged in #2193. I can confirm that this sheet is broken on `master` but works on the PR branch.
PR Checklist
Description
Limits the amount of data passed when dynamically evaluating data_pipe filter operations, so that the JSEvaluator is only passed row key-value pairs whose key explicitly appears in the condition string. E.g. for the condition `status==='draft'` it will only pass data from the `status` column of the row (previously it would pass the full row data).
Dev Notes
Builds on top of #2187 so should be merged after
Additional test added to shared, which can be run via `yarn workspace shared test`
Review Notes
I do not think this should cause any breaking changes, as the filter operation only shares information available to the row, so any columns must be accessed directly (e.g. `status==='draft'`) rather than by some dynamic means. `column_a_value[column_b_value]` will still have both column_a and column_b data and therefore anything derived from them.
This only impacts data_pipe filter. We also use the JSEvaluator in the frontend `AppDataVariableService`, which is used to evaluate row conditions at runtime, although if we wanted to implement this in a similar way there might not be as much optimisation to be gained. E.g. `@local.{@field.nameField}` would need all local and field variables, and would only omit global. I.e. there is a difference between trying to reduce a single set of row variables and trying to reduce variable namespaces (field, local, global). Could be a follow-up if useful to have.
In either case it would be good to test whether #2187 fixes all known issues and, if not, to record the exceptions and then check whether this PR fixes them. If this does provide additional fixes, it would be good to also check frontend code using a condition statement on one of the known erroring rows, to see if similar logic needs to be applied in the frontend or not.
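The point about dynamic lookups can be sketched as follows. This is an illustration only: `evaluate` is a stand-in built on `new Function`, not the project's JSEvaluator, and the row columns (`lookup`, `selected`, `notes`) are hypothetical.

```typescript
type Row = Record<string, unknown>;

// Keep only the row entries whose key appears in the condition string.
function generateEvaluatorConstants(condition: string, row: Row): Row {
  const constants: Row = {};
  for (const [key, value] of Object.entries(row)) {
    if (condition.includes(key)) constants[key] = value;
  }
  return constants;
}

// Stand-in evaluator (assumption): exposes each constant as a named argument.
function evaluate(expression: string, constants: Row): unknown {
  const names = Object.keys(constants);
  const values = names.map((name) => constants[name]);
  const fn = new Function(...names, `"use strict"; return (${expression});`);
  return fn(...values);
}

const row: Row = {
  lookup: { a: 1, b: 2 },
  selected: "b",
  notes: "unused by the condition",
};

// Both columns are named in the condition, so both are passed through
// and the dynamic lookup still resolves.
const condition = "lookup[selected] === 2";
const constants = generateEvaluatorConstants(condition, row);
const result = evaluate(condition, constants);

console.log(Object.keys(constants)); // ["lookup", "selected"]
console.log(result); // true
```

The `notes` column is dropped because its name never appears in the condition, which is the reduction this PR is aiming for.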
Git Issues
Closes #