Extremely Limited Support for GROUPBY Function #4283

Merged (4 commits) on Jan 9, 2025

oleibman
Collaborator

This is a partial response to issue #4282. The actual logic to implement GROUPBY is probably very complicated. Worse, Excel has introduced a whole new way of (internally) specifying one of the arguments. That argument is a function name, expressed not as a mapped integer (as SUBTOTAL does), nor even as a string, but as the unquoted function name prefixed by `_xleta.`. And, unlike its `_xlfn.` and `_xlws.` predecessors, it is difficult to figure out when the new prefix needs to be added and when it needs to be ignored. I am not even going to attempt that task with this ticket.

So, what does this change do? Like earlier attempts to introduce limited functionality (such as with form controls), it allows GROUPBY to be passed through: you can load a spreadsheet that contains it and save it to a new spreadsheet, and the function and its results are preserved. Some cautionary notes. Dynamic arrays must be enabled (the function makes no sense otherwise). Changing any of the inputs used in the function may result in internal inconsistencies between PhpSpreadsheet and Excel; this is especially so if the dimensions of the returned array change as a result of changes to the input data. The programmer can avoid some of these problems by changing the formula attributes of the cell where the function is used; this may be difficult to do in practice. Also, using the GROUPBY cell as an argument in another formula will probably lead to problems. Finally, I confess that part of this solution looks awfully kludgey to me.
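
To illustrate the pass-through described above, here is a minimal sketch of a load-and-save round trip. The helper name `passThroughGroupBy` and the file paths are hypothetical, and the `setInstanceArrayReturnType` call is an assumption about how dynamic arrays are enabled in recent PhpSpreadsheet releases; check it against the version you have installed.

```php
<?php

use PhpOffice\PhpSpreadsheet\Calculation\Calculation;
use PhpOffice\PhpSpreadsheet\IOFactory;

// Hypothetical helper (not part of PhpSpreadsheet). It assumes Composer's
// autoloader has already been loaded by the calling script.
function passThroughGroupBy(string $inputFile, string $outputFile): void
{
    // Load a workbook that already contains a GROUPBY formula.
    $spreadsheet = IOFactory::load($inputFile);

    // Assumption: this is how dynamic arrays are enabled for this workbook.
    Calculation::getInstance($spreadsheet)
        ->setInstanceArrayReturnType(Calculation::RETURN_ARRAY_AS_ARRAY);

    // Save to a new file; the formula and its stored results should carry over.
    $writer = IOFactory::createWriter($spreadsheet, 'Xlsx');
    $writer->save($outputFile);

    // Release the workbook's worksheets once the copy is written.
    $spreadsheet->disconnectWorksheets();
}
```

The sketch deliberately leaves the GROUPBY inputs untouched, since changing them is where the inconsistencies mentioned above can arise.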

With its limitations and those cautions, is it worth proceeding with this change? My gut feel is that it is more useful to proceed than not. However, I will give others the opportunity to weigh in. I will wait at least a couple of weeks into the new year before proceeding with this.

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

Checklist:

  • Changes are covered by unit tests
    • Changes are covered by existing unit tests
    • New unit tests have been added
  • Code style is respected
  • Commit message explains why the change is made (see https://github.com/erlang/otp/wiki/Writing-good-commit-messages)
  • CHANGELOG.md contains a short summary of the change and a link to the pull request if applicable
  • Documentation is updated as necessary


oleibman commented Dec 19, 2024

Actually, adding a GROUPBY function to a worksheet in PhpSpreadsheet isn't quite as worthless as I had thought. It takes a bit of work: you have to manually add the formula attributes (`t` must be `array`, but `ref` can be omitted), and manually add the `_xleta.` prefix to the function-name argument. So, for example:

        $sheet->getCell('E3')->setValue('=GROUPBY(B2:B32, C2:C32, _xleta.SUM)');
        $sheet->getCell('E3')->setFormulaAttributes(['t' => 'array']);

PhpSpreadsheet will not be able to evaluate the cell, but if you save the spreadsheet after doing this, Excel will evaluate it correctly on open.
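
The manual prefixing step above could be wrapped in a small helper. This is only a sketch: `groupByFormula` is a hypothetical name, not a PhpSpreadsheet function, and it assumes the prefix is always wanted on the third argument, which matches the example above but has not been checked for other uses of GROUPBY.

```php
<?php

// Hypothetical helper, not part of PhpSpreadsheet: it only builds formula text.
function groupByFormula(string $rowFields, string $values, string $function): string
{
    $prefix = '_xleta.';

    // Add the prefix only when the caller has not already supplied it.
    if (strncmp($function, $prefix, strlen($prefix)) !== 0) {
        $function = $prefix . $function;
    }

    return sprintf('=GROUPBY(%s, %s, %s)', $rowFields, $values, $function);
}
```

Calling `groupByFormula('B2:B32', 'C2:C32', 'SUM')` returns the same formula string used in the example above; the cell still needs `setFormulaAttributes(['t' => 'array'])` afterwards.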

@oleibman oleibman added this pull request to the merge queue Jan 9, 2025
Merged via the queue into PHPOffice:master with commit 51b1d1c Jan 9, 2025
14 checks passed
@oleibman oleibman deleted the groupby branch January 9, 2025 21:22