You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
You may have followed the conversation around pull request #1351, which put us on a path of using the newest versions of three key packages: numpy, pandas, and bokeh. Before #1351 was merged, the versions of those packages were fixed; now they are open-ended so that we can benefit from bug fixes, efficiency improvements, and new features in those packages. Those are all benefits, but the discussion around #1351 recognized there might be costs associated with this new path forward. For example, a change in one of those packages might somehow break Tax-Calculator. We decided to try the new path forward and merged #1351, saying that we would monitor the costs and benefits going forward.
None of us had any idea that it would take only a day or so for us to experience a package upgrade that affected Tax-Calculator. This issue provides a description of what happened in the package upgrade and what the benefits and costs of that upgrade seem to be.
The pandas project announced on May 5, 2017, a major new release 0.20.0/0.20.1, which is described in detail in these pandas release notes. I became aware of this new pandas release yesterday when I submitted pull request #1356 (containing non-substantive coding-style changes to eliminate pylint warnings) and it failed the GitHub check-in pytest suite of tests. A little investigation revealed that three of the three hundred some tests failed because they delved into the internals of the pandas GroupBy object and those internals have been changed in the new pandas release. So, the focus of pull request #1356 quickly changed to understanding how those three tests needed to be rewritten in order to work with pandas 0.20.1. After a few false-starts, I think I have done that in the merged pull request #1356 (which also includes a number of non-substantive changes to test_utils.py code in order to reduce the number of pylint warnings).
As far as I can tell, those three tests were the only parts of Tax-Calculator that were adversely affected by the new pandas release. The cost of rewriting the three tests was not large.
The benefits from the pandas upgrade (which uses a more recent version of numpy and numba) are considerable. There seems to be a substantial reduction in the time required to execute the suite of pytest tests. For example, on my Mac I typically run 303 of the 305 tests (skipping the two notebook tests) and the execution time has been roughly four minutes or 240 seconds. Now, after upgrading to macOS Sierra (10.12.4), installing the newest Anaconda 4.3.1 distribution of Python 2.7, and upgrading to pandas 0.20.1, the same 303 tests execute in about 130 seconds. This is almost a doubling of the speed of Tax-Calculator execution. It will be interesting to see if there is a similar speed-up on the AWS instances that run Tax-Calculator for TaxBrain.
Both @andersonfrailey and I experienced the same thing with our pull requests #1357 and #1356: all the tests passed on our computer but they failed on GitHub, where the computers running the tests are using the newest packages. So, if you want to avoid this kind of unpleasant pull-request surprise and want Tax-Calculator to run noticeably faster on your computer, you should upgrade your Anaconda distribution and then upgrade to pandas 0.20.1.
There are detailed notes as part of #1351 that describe the steps I took to upgrade. Late yesterday I did this to get the pandas 0.20.1 release:
conda update pandas
This pandas update included updates for several other packages, include an update to numpy 1.12.1.
I also did conda update bokeh and that moved me from 0.12.4 to 0.12.5, but that seemed to have no effect on the operation of Tax-Calculator.
After pull request #1361 was merged into the master branch, the following command is an easy way to update your taxcalc-dev environment (for the reasons discussed in #1358):
You may have followed the conversation around pull request #1351, which put us on a path of using the newest versions of three key packages: numpy, pandas, and bokeh. Before #1351 was merged, the versions of those packages were fixed; now they are open-ended so that we can benefit from bug fixes, efficiency improvements, and new features in those packages. Those are all benefits, but the discussion around #1351 recognized there might be costs associated with this new path forward. For example, a change in one of those packages might somehow break Tax-Calculator. We decided to try the new path forward and merged #1351, saying that we would monitor the costs and benefits going forward.
None of us had any idea that it would take only a day or so for us to experience a package upgrade that affected Tax-Calculator. This issue provides a description of what happened in the package upgrade and what the benefits and costs of that upgrade seem to be.
The pandas project announced on May 5, 2017, a major new release 0.20.0/0.20.1, which is described in detail in these pandas release notes. I became aware of this new pandas release yesterday when I submitted pull request #1356 (containing non-substantive coding-style changes to eliminate pylint warnings) and it failed the GitHub check-in pytest suite of tests. A little investigation revealed that three of the three hundred some tests failed because they delved into the internals of the pandas GroupBy object and those internals have been changed in the new pandas release. So, the focus of pull request #1356 quickly changed to understanding how those three tests needed to be rewritten in order to work with pandas 0.20.1. After a few false-starts, I think I have done that in the merged pull request #1356 (which also includes a number of non-substantive changes to
test_utils.py
code in order to reduce the number of pylint warnings).As far as I can tell, those three tests were the only parts of Tax-Calculator that were adversely affected by the new pandas release. The cost of rewriting the three tests was not large.
The benefits from the pandas upgrade (which uses a more recent version of numpy and numba) are considerable. There seems to be a substantial reduction in the time required to execute the suite of pytest tests. For example, on my Mac I typically run 303 of the 305 tests (skipping the two notebook tests) and the execution time has been roughly four minutes or 240 seconds. Now, after upgrading to macOS Sierra (10.12.4), installing the newest Anaconda 4.3.1 distribution of Python 2.7, and upgrading to pandas 0.20.1, the same 303 tests execute in about 130 seconds. This is almost a doubling of the speed of Tax-Calculator execution. It will be interesting to see if there is a similar speed-up on the AWS instances that run Tax-Calculator for TaxBrain.
Both @andersonfrailey and I experienced the same thing with our pull requests #1357 and #1356: all the tests passed on our computer but they failed on GitHub, where the computers running the tests are using the newest packages. So, if you want to avoid this kind of unpleasant pull-request surprise and want Tax-Calculator to run noticeably faster on your computer, you should upgrade your Anaconda distribution and then upgrade to pandas 0.20.1.
There are detailed notes as part of #1351 that describe the steps I took to upgrade. Late yesterday I did this to get the pandas 0.20.1 release:
This pandas update included updates for several other packages, include an update to numpy 1.12.1.
I also did
conda update bokeh
and that moved me from 0.12.4 to 0.12.5, but that seemed to have no effect on the operation of Tax-Calculator.@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @codykallen @zrisher
@PeterDSteinberg @brittainhard @talumbau
The text was updated successfully, but these errors were encountered: