Refactor/data pipe spec filter constants #2189

Merged
esmeetewinkel merged 3 commits into master from refactor/data-pipe-spec-filter-constants
Feb 20, 2024

Conversation

Member

@chrismclarke chrismclarke commented Jan 31, 2024

PR Checklist

  • PR title descriptive (can be used in release notes)

Description

Limits the amount of data passed when dynamically evaluating data_pipe filter operations, so that the JSEvaluator only receives row key-value pairs whose key explicitly appears in the condition string. E.g. for the condition status==='draft' it will only pass data from the status column of the row (previously it would pass the full row data).
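
As a rough illustration of the behaviour described here, the sketch below keeps only the row entries whose key appears in the condition string. The function name `pickReferencedColumns` and the sample row are hypothetical, and the substring check only mirrors the `condition.includes(key)` test visible in the review diff; it is not the full merged implementation.

```typescript
// Hypothetical helper: keep only row entries whose key appears in the condition.
function pickReferencedColumns(
  condition: string,
  row: Record<string, unknown>
): Record<string, unknown> {
  const constants: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    // Plain substring check: it may over-include a key that happens to be
    // part of a longer name used in the condition
    if (condition.includes(key)) {
      constants[key] = value;
    }
  }
  return constants;
}

// Sample row (made up for illustration)
const sampleRow = { id: "row_1", status: "draft", title: "Hello", notes: "..." };
const passedConstants = pickReferencedColumns("status==='draft'", sampleRow);
console.log(passedConstants); // { status: 'draft' }
```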

Dev Notes

Builds on top of #2187 so should be merged after
Additional test added to shared, can run via yarn workspace shared test

Review Notes

I do not think this should cause any breaking changes, as the filter operation only ever shared information available to the row, so any column already has to be accessed directly by name (e.g. status==='draft') rather than by some dynamic means. A derived lookup such as column_a_value[column_b_value] will still have both column_a and column_b data, and therefore anything derived from them.
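
To illustrate the point about derived access, the sketch below evaluates a condition against a reduced set of constants. `new Function` stands in for the project's JSEvaluator (an assumption made only to keep the example self-contained), and the column names `lookup`, `lang` and `unrelated` are made up.

```typescript
// Stand-in evaluator (assumption): the real code uses JSEvaluator instead.
function evaluateWithConstants(
  condition: string,
  constants: Record<string, unknown>
): unknown {
  const names = Object.keys(constants);
  const values = Object.values(constants);
  // Expose each constant as a named argument, then evaluate the condition
  const fn = new Function(...names, `return (${condition});`);
  return fn(...values);
}

// Hypothetical row where one column is indexed by the value of another
const exampleRow: Record<string, unknown> = {
  lookup: { en: "Hello", fr: "Bonjour" },
  lang: "fr",
  unrelated: "not needed",
};

// Both `lookup` and `lang` appear in the condition text, so both are kept
const dynamicCondition = "lookup[lang]==='Bonjour'";
const reducedConstants: Record<string, unknown> = {};
for (const [key, value] of Object.entries(exampleRow)) {
  if (dynamicCondition.includes(key)) reducedConstants[key] = value;
}

console.log(Object.keys(reducedConstants)); // [ 'lookup', 'lang' ]
console.log(evaluateWithConstants(dynamicCondition, reducedConstants)); // true
```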

This only impacts the data_pipe filter. We also use the JSEvaluator in the frontend AppDataVariableService, which evaluates row conditions at runtime. If we wanted to implement something similar there, there might not be as much optimisation to be gained: e.g. @local.{@field.nameField} would need all local and field variables and would only omit global. I.e. reducing a single set of row variables is different from reducing whole variable namespaces (field, local, global). Could be a follow-up if useful to have.
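
The namespace point can be sketched in the same spirit. The helper below is purely hypothetical and is not how AppDataVariableService works; it only shows that a coarse check for the `@local`, `@field` and `@global` prefixes in an expression could drop whole namespaces at best, not individual variables.

```typescript
type VariableNamespace = "local" | "field" | "global";

// Hypothetical helper: list which variable namespaces an expression mentions.
function referencedNamespaces(expression: string): VariableNamespace[] {
  const found = new Set<VariableNamespace>();
  // Matches @local, @field or @global when followed by a "."
  const pattern = /@(local|field|global)\./g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(expression)) !== null) {
    found.add(match[1] as VariableNamespace);
  }
  return Array.from(found);
}

// Both local and field are needed in full; only global could be omitted
console.log(referencedNamespaces("@local.{@field.nameField}")); // [ 'local', 'field' ]
```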

In either case it would be good to test whether #2187 fixes all known issues. If not, we should record the exceptions and then check whether this PR fixes them. If this PR does provide additional fixes, it would also be good to check the frontend code using a condition statement on one of the known erroring rows, to see whether similar logic needs to be applied in the frontend.

Git Issues

Closes #

Screenshots/Videos

If useful, provide screenshot or capture to highlight main changes

Collaborator

@jfmcquade jfmcquade left a comment

Looks good, and the tests are all passing (yarn workspace shared test and yarn workspace scripts test).

See minor clarification comment inline.

Happy to merge after #2187 has been merged (see latest comment on that PR)

private generateEvaluatorConstants(condition: string, row: Record<string, any>) {
  const constants: Record<string, any> = {};
  for (const [key, value] of Object.entries(row)) {
    if (condition.includes(key)) {
Collaborator

This could lead to some key-value pairs being erroneously included, for example if the condition refers to a "name" column and the row data includes a "first_name" column. Seeing as the purpose of this step is efficiency/reducing complexity of the later JS evaluation, I don't think that's a problem, as it would still be an improvement in such cases.

Member Author

@chrismclarke chrismclarke Jan 31, 2024

I think in that case, if the condition were name==='Ada' then condition.includes('first_name') would still return false, so first_name would not be included.

Although yes, the opposite case would hold: with first_name==='Ada', condition.includes('name') returns true.
Given that the variable can be used within any valid JavaScript, I think the regex gets quite tricky to pin down perfectly.

It would need a regex that allows any valid JavaScript syntax/operators either side of the key, but not anything that could form part of a variable name. I'm assuming these are actually distinct sets given variable naming restrictions, however I'm less confident writing the regex to do so.
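
The two cases in this comment can be checked directly, alongside one possible tightening. The `referencesColumn` helper is a hypothetical sketch, not something from the PR: it treats ASCII letters, digits, `_` and `$` as identifier characters, which simplifies JavaScript's real identifier rules (those also allow many Unicode characters), and it would still match a column name that appears inside a string literal or after a `.` property access.

```typescript
// Substring checks behave asymmetrically for overlapping column names
console.log("name==='Ada'".includes("first_name")); // false
console.log("first_name==='Ada'".includes("name")); // true (over-inclusion)

// Escape regex metacharacters so a column name is matched literally
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical stricter check: the key must not be flanked by identifier characters
function referencesColumn(condition: string, key: string): boolean {
  const pattern = new RegExp(
    `(?<![A-Za-z0-9_$])${escapeRegExp(key)}(?![A-Za-z0-9_$])`
  );
  return pattern.test(condition);
}

console.log(referencesColumn("first_name==='Ada'", "name")); // false
console.log(referencesColumn("first_name==='Ada'", "first_name")); // true
console.log(referencesColumn("name==='Ada' && age>3", "name")); // true
```

Even this version is only an approximation, which is consistent with the concern that a fully correct pattern is hard to pin down.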

Member Author

A couple of sources point in various directions for doing so:
https://mathiasbynens.be/demo/javascript-identifier-regex
https://regex101.com/r/7fjduD/1

However, given the likely complexity, I think I'd be much more in favor of accepting the limitation and, if desired, exposing a config option on operations like filter that allows the user to explicitly pass the names of columns to include for advanced use.
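
As a sketch of what such an option might look like: the `include_columns` field, the `FilterArgs` interface and the `selectConstants` function below are all hypothetical names invented for illustration, and nothing in this PR adds them.

```typescript
// Hypothetical shape for a filter operation with an explicit column list.
// Neither `include_columns` nor this interface exists in the project.
interface FilterArgs {
  condition: string;
  include_columns?: string[];
}

function selectConstants(
  args: FilterArgs,
  row: Record<string, unknown>
): Record<string, unknown> {
  const constants: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(row)) {
    // An explicit list, when given, takes precedence over the substring check
    const wanted = args.include_columns
      ? args.include_columns.includes(key)
      : args.condition.includes(key);
    if (wanted) constants[key] = value;
  }
  return constants;
}

// Made-up row with overlapping column names
const personRow = { name: "Ada Lovelace", first_name: "Ada", age: 36 };

// Default behaviour: substring check also picks up `name`
console.log(selectConstants({ condition: "first_name==='Ada'" }, personRow));

// Explicit list: only `first_name` is passed on
console.log(
  selectConstants(
    { condition: "first_name==='Ada'", include_columns: ["first_name"] },
    personRow
  )
);
```

The default path keeps the accepted limitation, while the explicit list gives advanced users a way around it without any identifier parsing.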

Collaborator

@esmeetewinkel esmeetewinkel left a comment

I tested this by updating debug_data_list to contain the issue flagged in #2193. I can confirm that this sheet is broken on master but works on the PR branch.

Error when syncing debug content on master:
[screenshot]

No error when syncing debug content on PR branch:
[screenshot]

@esmeetewinkel esmeetewinkel merged commit c4b0b14 into master Feb 20, 2024
8 checks passed
@esmeetewinkel esmeetewinkel deleted the refactor/data-pipe-spec-filter-constants branch February 20, 2024 12:24
3 participants