Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New package version requirements #1351

Merged
merged 11 commits into from
May 9, 2017
Merged

New package version requirements #1351

merged 11 commits into from
May 9, 2017

Conversation

martinholmer
Copy link
Collaborator

@martinholmer martinholmer commented May 6, 2017

This pull request started out as an experiment to see how the tests run on GitHub when using less restrictive package versions in the environment.yml and conda.recipe/meta.yaml files.

See subsequent comments in the #1351 conversion for details on how this pull request has evolved.

@martinholmer martinholmer changed the title Experiment with new package version requirements [WIP] Experiment with new package version requirements May 6, 2017
@codecov-io
Copy link

codecov-io commented May 6, 2017

Codecov Report

Merging #1351 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1351      +/-   ##
==========================================
+ Coverage   99.64%   99.64%   +<.01%     
==========================================
  Files          38       38              
  Lines        2805     2812       +7     
==========================================
+ Hits         2795     2802       +7     
  Misses         10       10
Impacted Files Coverage Δ
taxcalc/utils.py 99.31% <100%> (ø) ⬆️
taxcalc/behavior.py 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b7a2bd5...d94ced8. Read the comment docs.

@talumbau
Copy link
Member

talumbau commented May 6, 2017

I see this PR as less of an "experiment" and more of a philosophical decision on how to move forward with the requirements for Tax-Calculator. This PR positions Tax-Calculator to be "forward compatible" with numpy and pandas. That is, by specifying a version that is greater than or equal to some static version, we can take advantage of new versions of these packages, and only insist that the environment have versions that are no older than some set version. This is a very reasonable path, since we can take advantage of bug fixes and performance improvements in those packages.

However, it is not without danger. The danger is that a new version of numpy or pandas will exhibit behavior that breaks the current Tax-Calculator. We have already seen this situation in the past. In issue #381, we had a situation where Tax-Calculator code worked with numpy 1.9, but not numpy 1.10, because there was a behavior change in numpy.

I don't think there is one "right" answer here. If we make the conda recipe such that Tax-Calculator packages are "forward compatible" with numpy and pandas, a user's environment will likely be modified less than if we insist on some set version. However, we increase the risk that, over time, the Tax-Calculator code will actually be incompatible with some future version of numpy or pandas (or some other package) and the package will still successfully install and run. This can lead to issues like #381 or perhaps more nefarious situations where the code does not give an error but executes incorrectly (an admittedly unlikely scenario).

@martinholmer
Copy link
Collaborator Author

Pull request #1351 has grown out of an assignment given to me several months ago by @MattHJensen. Matt asked me to develop user (not developer) documentation and a way to use Tax-Calculator that requires no Python programming knowledge. I told him I could write the HTML documentation, but that others with more web programming experience would have provide a CSS if he wanted the user documentation to look fancy.

When giving me that assignment, Matt recommended a book to guide my effort: Karl Fogel, Producing Open Source Software: How to Run a Successful Free Software Project, Second Edition.

Even though I have read only through page 36 of Fogel's book, it is clearly an excellent document with many useful observations. Before explaining the thinking behind pull request #1351, I want to quote extensively from page 11 of Fogel's book:

Chapter 2. Getting Started

Starting a free software project is a twofold task. The software needs to acquire users, and to acquire developers. These two needs are not necessarily in conflict, but the interaction between them adds some complexity to a project's initial presentation. Some information is useful for both audiences, some is useful only for one or the other. Both kinds of information should subscribe to the principle of scaled presentation: the degree of detail presented at each stage should correspond to the amount of time and effort put in by the reader at that stage. More effort should always result in more reward. When effort and reward do not correlate reliably, most people will lose faith and stop investing effort.

The corollary to this is that appearances matter. Programmers, in particular, often don't like to believe this. Their love of substance over form is almost a point of professional pride. It's no accident that so many programmers exhibit an antipathy for marketing and public relations work, nor that professional graphic designers are often horrified at the designs programmers come up with on their own.

This is a pity, because there are situations where form is substance, and project presentation is one of them. For example, the very first thing a visitor learns about a project is what its home page looks like. This information is absorbed before any of the actual content on the site is comprehended — before any of the text has been read or links clicked on. However unjust it may be, people cannot stop themselves from forming an immediate first impression. The site's appearance signals whether care was taken in organizing the project's presentation. Humans have extremely sensitive antennae for detecting the investment of care. Most of us can tell in one glance whether a home page was thrown together quickly or was given serious thought. This is the first piece of information your project puts out, and the impression it creates will carry over to the rest of the project by association.

Thus, while much of this chapter talks about the content your project should start out with, remember that its look and feel matter too. Because the project web site has to work for two different types of visitors — users and developers — special attention must be paid to clarity and directedness. Although this is not the place for a general treatise on web design, one principle is important enough to deserve mention, particularly when the site serves multiple (if overlapping) audiences: people should have a rough idea where a link goes before clicking on it. For example, it should be obvious from looking at the links to user documentation that they lead to user documentation, and not to, say, developer documentation. Running a project is partly about supplying information, but it's also about supplying comfort. The mere presence of certain standard offerings, in expected places, reassures users and developers who are deciding whether they want to get involved. It says that this project has its act together, has anticipated the questions people will ask, and has made an effort to answer them in a way that requires minimal exertion on the part of the asker. By giving off this aura of preparedness, the project sends out a message: "Your time will not be wasted if you get involved," which is exactly what people need to hear.

One of my take-aways from this passage is that the getting-started experience for new users should be as simple as possible and, as Fogel says, supply comfort and reassure users.

So, in order to get an idea about what the new-user experience would be like when following the installation directions, I created a new account on my Mac that had no Anaconda or Tax-Calculator and followed the installation directions.

One thing I found is that Continuum Analytics offers Anaconda distributions for Python 2.7 and for Python 3.6, but there is no taxcalc package for Python 3.6 (see issue #1339 for more details).

So I followed the instruction at the Continuum Analytics download page and I had the Anaconda2 (version 4.3.1) Python 2.7 distribution installed in no time as all. Everything worked smoothly.

Then I went to the next step and ran the conda install -c ospc taxcalc command. Here is what I got (which, in my opinion, does not supply comfort and reassure new users):

iMac2:~ tcuser$ conda install -c ospc taxcalc
Fetching package metadata ...........
Solving package specifications: .

Package plan for installation in environment /Users/tcuser/anaconda:

The following NEW packages will be INSTALLED:
  Do
    mock:         2.0.0-py27_0           
    pbr:          1.10.0-py27_0          
    taxcalc:      0.8.2-np111py27_0  ospc

The following packages will be UPDATED:

    anaconda:     4.3.1-np111py27_0       --> custom-py27_0     
    conda:        4.3.14-py27_0           --> 4.3.17-py27_0     

The following packages will be DOWNGRADED due to dependency conflicts:

    mkl:          2017.0.1-0              --> 11.3.3-0          
    numexpr:      2.6.1-np111py27_2       --> 2.6.1-np111py27_1 
    numpy:        1.11.3-py27_0           --> 1.11.2-py27_0     
    pandas:       0.19.2-np111py27_1      --> 0.18.0-np111py27_0
    scikit-learn: 0.18.1-np111py27_1      --> 0.18.1-np111py27_0
    scipy:        0.18.1-np111py27_1      --> 0.18.1-np111py27_0

The first thought new users will probably have is "Why doesn't Tax-Calculator work with the version of Anaconda being offered by Continuum Analytics?" or "Why do I have to downgrade my Anaconda distribution in order to use Tax-Calculator?" This is not a good initial impression to leave with potential new users.

The goal of this pull request is to eliminate this downgrading experience.

@feenberg @PeterDSteinberg @brittainhard @talumbau
@Amy-Xu @andersonfrailey @GoFroggyRun @codykallen @zrisher

@martinholmer
Copy link
Collaborator Author

Details of what was done in the development of pull request #1351

I followed Continuum Analytics instructions about how to uninstall my old Anaconda2 (version 2.3.0) and then installed Anaconda2 (version 4.3.1). I did this because I could not figure out how to use the conda update anaconda command (outside of any conda environment) to get from version 2.3.0 to the newest version 4.3.1.

Then I edited the environment.yml and conda.recipe/meta.yaml files as shown in this pull request #1351. The unit tests all passed on my computer after doing conda install mock and they all passed when I first submitted #1351 to GitHub (which showed that all the unit tests passed on Linux for Python 2.7, 3.4 and 3.5 and on Windows for Python 3.5).

However, I noticed two sets of warnings that were printed to the screen that I had never seen before: one set of warnings occurred when using the --graphs option of tc, the command-line interface to Tax-Calculator, and the other set of warnings occurred when executing the one reform in the python reform_results.py test that involves Behavior. These are just warning messages about floating-point division and Tax-Calculator results are unchanged from before the upgrade from Anaconda2 2.3.0 to Anaconda2 4.3.1.

It turns out that both sets of warnings are caused by improvements in numpy warning logic that are in numpy 1.11.3, but not in numpy 1.11.2. This pull request fixes the utils.py and behavior.py code so that there warnings are not generated. Doing this forced me to learn more about the operation of the numpy.where method and how to avoid some of its shortcomings. The changes I've made leave the code more streamlined and (probably) more efficient and I've learned more about how to use the numpy package efficiently. I consider this to have been a positive experience.

I also learned the the conda-build package is not part of the core Anaconda distribution, so I added code to the taxcalc/cli/install_local_taxcalc_package bash script to install that package if it has not already been installed. This is another small improvement from going through the Anaconda upgrade process.

And finally, my new appreciation for the numpy.seterr method has led me to strengthen the unit tests by converting all numpy warnings into errors so they can be detected in the tests. This is done in taxcalc/tests/config.py so this conversation happens only for the pytest unit tests.

I have experienced just one flaw with Anaconda2 version 4.3.1: the pylint utility does not work correctly without an additional conda install because the conda recipe for pylint is incorrect as described here. As mentioned in that Continuum issue, the work-around is as follows:

conda install --channel conda-forge backports.functools_lru_cache

@MattHJensen @feenberg @PeterDSteinberg @brittainhard @talumbau
@Amy-Xu @andersonfrailey @GoFroggyRun @codykallen @zrisher

@martinholmer martinholmer changed the title [WIP] Experiment with new package version requirements New package version requirements May 8, 2017
@MattHJensen
Copy link
Contributor

I see the advantage of allowing most users to work with taxcalc without needing to create a conda environment and without downgrading their other packages, but I'm still worried about the question I asked over at #1350:

If we move to >= in environment.yml, will we ever be able to guarantee that the taxcalc works, given that the packages on which we depend can push updates anytime w/o triggering our tests?

I think my problem is ignorance, because other projects also specify dependencies this way. For example, a dependency for pandas is "NumPy: 1.7.1 or higher"

@martinholmer
Copy link
Collaborator Author

martinholmer commented May 9, 2017

@MattHJensen said about pull request #1351:

I see the advantage of allowing most users to work with taxcalc without needing to create a conda environment and without downgrading their other packages, but I'm still worried about the question I asked over at #1350:

If we move to >= in environment.yml, will we ever be able to guarantee that the taxcalc works, given that the packages on which we depend can push updates anytime w/o triggering our tests?

That's a good question to pose. Here is what I think the answer is.

There are two ways to guard against the problem Matt identified above.

  1. Have all (most) developers of Tax-Calculator use the newest version of Anaconda. I've upgraded to Anaconda2 version 4.3.1 and will be happy to check each month to see when a newer version is available. When that happens I can raise an issue so that everybody knows about the newer Anaconda version.

  2. Have the GitHub check-in tests always use the newest version of Anaconda. This seems to be happening already. When I look at the tests run on pull request Add content to Input Variables and Output Variables sections #1300 (three weeks ago), I see that on Linux python 2.7.13 was used to run the tests even though at the time I had python 2.7.12 installed on my computer. I'm not one hundred percent sure about this because I can't find the code that sets up the GitHub tests anywhere in the Tax-Calculator repository.

If we do these two things (especially the second thing), any new version of numpy, pandas, bokeh, or any other package, that breaks Tax-Calculator will immediately be exposed.

This is the way I understand the situation. Have I missed something?

@feenberg @Amy-Xu @andersonfrailey @GoFroggyRun @codykallen @zrisher
@PeterDSteinberg @brittainhard @talumbau

@MattHJensen
Copy link
Contributor

If we do these two things (especially the second thing), any new version of numpy, pandas, bokeh, or any other package, that breaks Tax-Calculator will immediately be exposed.

Thanks for the explanation. My takeaway is that a dependency's update might break taxcalc, but we'll catch it very quickly if it does. That seems like an ok state of things for now, and we can always reconsider if it causes too many problems.

@martinholmer
Copy link
Collaborator Author

@MattHJensen said about pull request #1351:

My takeaway is that a dependency's update might break taxcalc, but we'll catch it very quickly if it does. That seems like an ok state of things for now, and we can always reconsider if it causes too many problems.

Yes, I think that is a reasonable way to think about it.

Shall we proceed and merge pull request #1351?

@MattHJensen
Copy link
Contributor

Shall we proceed and merge pull request #1351?

+1

@martinholmer martinholmer merged commit 08f5b8f into PSLmodels:master May 9, 2017
@martinholmer martinholmer deleted the version-experiment branch May 9, 2017 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants