Skip to content

Proposal: Pin only minimum versions of pip and setuptools #2

Open
@blag

Description

@blag

TL;DR

If we use our tools properly, we might be able to relax their version requirements a bit, take advantage of their new features, and allow our transitive dependencies to take advantage of the new features as well, all without breaking the build all of the time.

Background

PR #4895

This came up in PR StackStorm/st2#4895.

In that PR, pip and setuptools were pinned to specific versions:

$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "pip==19.3.1"
$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "setuptools==41.0.1"

The setuptools version being pinned that tightly caused a bug when a transitive dependency (read: a dependency of a dependency) was updated to only support Python 3. This happened twice in recent history:

First case

  • python-mistralclient depended on
    • osc-lib >= 1.8.0 (resolved to 2.0.0) which depended on
      • openstacksdk >= 0.15.0 (resolved to 0.44.0) which depended on
        • futurist >= 2.1.0 which does not support for Python 2

Second case

As noted in this PR to importlib-metadata, all of these dependencies properly support the Requires-Python metadata field.

Example

It's easier to give an example than describe it in prose. This means that in a Python project's setup.py file, they can specify the requires_python keyword argument to setup(), like so:

from setuptools import setup

setup(
    ...
    requires_python='>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*',
    ...
)

That means that pip version 9 or later, when installing packages in a Python 2.7 environment, will ignore any potential packages that do not specify they can run in Python 2.7.

Building ST2 in Travis

However, some of our "build" (eg: CI) commands involve installing packages, but do not involve pip. We run python setup.py develop and python setup.py install in a few different parts of the `Makefile:

All of these commands don't rely on pip to install packages, they rely on setuptools to do so itself. However, the problem comes in when...

The problem

setuptools < 42 doesn't support requires_python keyword argument

The setuptools project added support for requires_python in version 42.0.0.

This means that when we pin setuptools to a specific version, we are guaranteeing that, when transitive dependencies of ST2 start relying on features of more modern versions of our tools, we will miss out on those changes, and our transitive dependencies will break our build.

Problems with other solutions

There are some potential workarounds.

Pin transitive dependency versions

We could pin the versions of the futurist and configparser to versions that still support Python 2. This would ensure that both older versions of pip and setuptools do not break the Python 2.7 build.

However, this approach is not scalable. We would have to track layers upon layers of dependencies, even though we do not directly depend on or import those packages. Furthermore, this problem is only going to get worse and worse as more and more projects drop support for Python 2.7. Each one will cause a build failure for us, and we will have to troubleshoot, solve, test, review, merge, and test every time that happens. This is not a good use of anybody's time.

Use pip-compile (from pip-tools) to catch dependency conflicts

As implemented in StackStorm/st2#4895, we now use pip-compile to check for dependency version conflicts across our entire installation. The output of pip-compile is much more complete and explicit than the previous output of pipconflictchecker.

But this solution only checks that our dependencies don't conflict. Our setuptools commands will continue to try to install dependencies that are Python 3-only, even in our Python 2.7 tests.

The proposal

The only workable solution was to update setuptools to version 42.0.0 (or later). Now, both pip and setuptools respected project's requires_python field, and when run in Python 2.7 environments, only try to install packages that denote compatibility with that version of Python.

However, pinning setuptools, or pip as well, to a specific version just creates the same situation in the future. If there is some other feature that is introduced to pip or setuptools, and any of our transitive dependencies start using that feature before we can update our build to rely on up-to-date versions of pip and setuptools, those changes will again cause our builds to fail.

Instead of pinning pip and setuptools to specific versions during the build, I propose that we instead only specify their known minimum versions. Specifying pip >= 9.0and setuptools >= 42.0.0 would ensure that whatever version of pip is installed at least supports Requires-Python. As pip and setuptools release new versions that support new features, our builds would automatically incorporate those versions in, and our transitive dependencies could use those features as soon as they were released without breaking our build.

Potential drawbacks

So what if pip or setuptools release versions that break things?

That's a valid concern, and it's something that has happened in the past, and it has also - wait for it - broken our build.

But I think pip and setuptools are used by nearly every single project that they have a pretty good test infrastructure, and excellent canaries if they do inadvertently release anything that isn't backwards compatible.

This isn't the only way for updates to break our stuff. We used to rely on the internals of pip being available at build time to parse requirements.txt files:

GET_PIP = 'curl https://bootstrap.pypa.io/get-pip.py | python'

try:
    import pip
    from pip import __version__ as pip_version
except ImportError as e:
    print('Failed to import pip: %s' % (text_type(e)))
    print('')
    print('Download pip:\n%s' % (GET_PIP))
    sys.exit(1)

try:
    # pip < 10.0
    from pip.req import parse_requirements
except ImportError:
    # pip >= 10.0

    try:
        from pip._internal.req.req_file import parse_requirements
    except ImportError as e:
        print('Failed to import parse_requirements from pip: %s' % (text_type(e)))
        print('Using pip: %s' % (str(pip_version)))
        sys.exit(1)

Previous workaround

However, that must have broken when pip updated, because we I assume that's why we now parse requirements.txt files ourselves:

    with open(requirements_file_path, 'r') as fp:
        for line in fp.readlines():
            line = line.strip()

            if line.startswith('#') or not line:
                continue

            link, req_name = _get_link(line=line)

            if link:
                links.append(link)
            else:
                req_name = line

                if ';' in req_name:
                    req_name = req_name.split(';')[0].strip()  # <-- new code to support environment markers

            reqs.append(req_name)

Problems with the workaround

But note that parsing requirements.txt files ourselves also didn't work when I tried to use environment markers in requirements.txt files:

singledispatch ; python_version < 3.4

I think that this is one reason why we think we have to specify a specific version of pip: because we import from the private module pip._internals.

A better way

But we can actually refactor fetch_requirements to use the packaging library to parse requirements.txt lines. Doing so would have automatically helped us parse environment markers instead of doing it ourselves (which, even with my improvement, is still brittle code that probably still has bugs).

Not so fast...

However, the dist_utils.py file also comes with an ominous note:

# NOTE: This script can't rely on any 3rd party dependency so we need to use this code here

There is no further definitive explanation for that note that I have been able to find. However, my current working theory is that when that code is being run, the virtualenv has been created and activated, but third party dependencies have not yet been installed via pip or setuptools, so all imports from third party packages will fail. We need a way to install dependencies during "build" time...

Using setuptools properly

After a little digging, I found that setuptools supports specifying build-time dependencies in the setup_requires keyword argument to setup() (copying and pasting here since there isn't an anchor to that section on the linked page):

setup_requires

A string or list of strings specifying what other distributions need to be present in order for the setup script to run.
setuptools will attempt to obtain these (using pip if available) before processing the rest of the setup script or com-
mands. This argument is needed if you are using distutils extensions as part of your build process; for example, exten-
sions that process setup() arguments and turn them into EGG-INFO metadata files.

(Note: projects listed in setup_requires will NOT be automatically installed on the system where the setup script is being
run. They are simply downloaded to the ./.eggs directory if they’re not locally available already. If you want them to be in-
stalled, as well as being available when the setup script is run, you should add them to install_requires and
setup_requires.)

So, instead of writing all of this brittle NIH code, and instead of pinning pip to a specific version so we can import its private internals (and if that sounds gross to you - good, that's very much the intent!), I believe all we need to do is specify the setup_requires argument to setup() in our subpackages:

from setuptools import setup

setup(
    ...
    setup_requires=['packaging'],
    ...
)

and then specifying a minimum version constraint for both pip and setuptools.

Other Reasons

There are probably other reasons why we pin to specific versions, and I'd like a chance to investigate them fully and discuss them here. As demonstrated, some of our code is working around problems that have been solved in better ways by more recent versions of our tools, and we should use our tools as best we can instead of trying (and failing) to duplicate them.

I've realized that some of our inline comments and developer documentation are not as detailed as necessary to fully understand some of the constraints, so at the very least we need to get better at fully explaining ourselves in pull requests, developer documentation, but ideally in comments in the code. We cannot continue to carry around this information in our heads, we need to get better at communicating our rationale for major decisions in the code itself.

Possible compromise

Now, some of this might be a "bridge too far" for some people. Reproducible builds are becoming more and more important, and I think we should also support things like that.

To that end, one possible compromise would be to pin to specific versions of pip and setuptools only when we release a version of StackStorm. For example, ST2 v3.1 was build with pip == 19.3.1 and setuptools == 41.0.1. And then during development, we would only specify a minimum version so we continue to incorporate the most recent versions of our tools as they are released. So the git branch for ST2 v3.1.0 would pin to specific versions of pip and setuptools in the Makefile:

git checkout v3.1.0

pip install --upgrade "pip == 19.3.1"
pip install --upgrade "setuptools == 41.0.1"

and the git master branch would only specify minimum versions for pip and setuptools in the Makefile:

git checkout master

pip install --upgrade "pip >= 20"
pip install --upgrade "setuptools >= 42.0.1"

Updating and pinning the versions could be incorporated into the release process.

This would still allow us to achieve completely reproducible builds while preventing the development branch from breaking all the time.

Further Reading

  • What the heck is pyproject.toml - I need to read more up on this, but the Python ecosystem seems to be moving to use pyproject.toml instead of setup.py. This post goes through the problems with setuptools, the chicken-and-egg problem, and how pyproject.toml solves these problems. I'm not done reading it yet (tons of interrupts all day), but the first few paragraphs looked promising. If this is The Way™️ we should go, then we will need to update our dependency updating scripts and Makefile around pyproject.toml files.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions