Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Notice to all Tax-Calculator users/developers about Anaconda package updates #1358

Closed
martinholmer opened this issue May 10, 2017 · 1 comment

Comments

@martinholmer
Copy link
Collaborator

You may have followed the conversation around pull request #1351, which put us on a path of using the newest versions of three key packages: numpy, pandas, and bokeh. Before #1351 was merged, the versions of those packages were fixed; now they are open-ended so that we can benefit from bug fixes, efficiency improvements, and new features in those packages. Those are all benefits, but the discussion around #1351 recognized there might be costs associated with this new path forward. For example, a change in one of those packages might somehow break Tax-Calculator. We decided to try the new path forward and merged #1351, saying that we would monitor the costs and benefits going forward.

None of us had any idea that it would take only a day or so for us to experience a package upgrade that affected Tax-Calculator. This issue provides a description of what happened in the package upgrade and what the benefits and costs of that upgrade seem to be.

The pandas project announced on May 5, 2017, a major new release 0.20.0/0.20.1, which is described in detail in these pandas release notes. I became aware of this new pandas release yesterday when I submitted pull request #1356 (containing non-substantive coding-style changes to eliminate pylint warnings) and it failed the GitHub check-in pytest suite of tests. A little investigation revealed that three of the three hundred some tests failed because they delved into the internals of the pandas GroupBy object and those internals have been changed in the new pandas release. So, the focus of pull request #1356 quickly changed to understanding how those three tests needed to be rewritten in order to work with pandas 0.20.1. After a few false-starts, I think I have done that in the merged pull request #1356 (which also includes a number of non-substantive changes to test_utils.py code in order to reduce the number of pylint warnings).

As far as I can tell, those three tests were the only parts of Tax-Calculator that were adversely affected by the new pandas release. The cost of rewriting the three tests was not large.

The benefits from the pandas upgrade (which uses a more recent version of numpy and numba) are considerable. There seems to be a substantial reduction in the time required to execute the suite of pytest tests. For example, on my Mac I typically run 303 of the 305 tests (skipping the two notebook tests) and the execution time has been roughly four minutes or 240 seconds. Now, after upgrading to macOS Sierra (10.12.4), installing the newest Anaconda 4.3.1 distribution of Python 2.7, and upgrading to pandas 0.20.1, the same 303 tests execute in about 130 seconds. This is almost a doubling of the speed of Tax-Calculator execution. It will be interesting to see if there is a similar speed-up on the AWS instances that run Tax-Calculator for TaxBrain.

Both @andersonfrailey and I experienced the same thing with our pull requests #1357 and #1356: all the tests passed on our computer but they failed on GitHub, where the computers running the tests are using the newest packages. So, if you want to avoid this kind of unpleasant pull-request surprise and want Tax-Calculator to run noticeably faster on your computer, you should upgrade your Anaconda distribution and then upgrade to pandas 0.20.1.

There are detailed notes as part of #1351 that describe the steps I took to upgrade. Late yesterday I did this to get the pandas 0.20.1 release:

conda update pandas

This pandas update included updates for several other packages, include an update to numpy 1.12.1.

I also did conda update bokeh and that moved me from 0.12.4 to 0.12.5, but that seemed to have no effect on the operation of Tax-Calculator.

@MattHJensen @feenberg @Amy-Xu @GoFroggyRun @codykallen @zrisher
@PeterDSteinberg @brittainhard @talumbau

@martinholmer
Copy link
Collaborator Author

After pull request #1361 was merged into the master branch, the following command is an easy way to update your taxcalc-dev environment (for the reasons discussed in #1358):

conda env update

@MattHJensen @feenberg @Amy-Xu @andersonfrailey @GoFroggyRun @codykallen @zrisher
@PeterDSteinberg @brittainhard

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant