## Description
### TL;DR
If we use our tools properly, we might be able to relax their version requirements a bit, take advantage of their new features, and allow our transitive dependencies to take advantage of the new features as well, all without breaking the build all of the time.
## Background

### PR #4895
This came up in PR StackStorm/st2#4895.
In that PR, pip and setuptools were pinned to specific versions:
```shell
$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "pip==19.3.1"
$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "setuptools==41.0.1"
```
Pinning setuptools that tightly caused a bug when a transitive dependency (read: a dependency of a dependency) was updated to support only Python 3. This happened twice in recent history:
#### First case

`python-mistralclient` depended on `osc-lib >= 1.8.0` (resolved to `2.0.0`), which depended on `openstacksdk >= 0.15.0` (resolved to `0.44.0`), which depended on `futurist >= 2.1.0`, which does not support Python 2.
#### Second case

- `kombu` requires `importlib-metadata >= 0.18`
- the most recent version of `importlib-metadata` is `1.5.2`
- `importlib-metadata` requires `configparser >= 3.5`
- the most recent version of `configparser` is `5.0`
- that version only supports Python 3
- the most recent version of `configparser` that supports Python 2 is version `4.0.2`
As noted in this PR to importlib-metadata, all of these dependencies properly support the `Requires-Python` metadata field.
### Example

It's easier to give an example than to describe this in prose. In a Python project's `setup.py` file, the project can specify the `python_requires` keyword argument to `setup()` (this is what produces the `Requires-Python` metadata), like so:
```python
from setuptools import setup

setup(
    # ...
    python_requires='>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*',
    # ...
)
```
That means that pip version 9 or later, when installing packages in a Python 2.7 environment, will skip any candidate package versions whose `Requires-Python` metadata excludes Python 2.7.
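To illustrate the mechanism, here is a minimal sketch of that check using the `packaging` library (which pip vendors internally for exactly this purpose); the specifier string matches the example above:

```python
# A minimal sketch of the filtering pip performs: a candidate package is
# only considered if the running interpreter's version satisfies the
# package's Requires-Python specifier.
from packaging.specifiers import SpecifierSet

requires_python = SpecifierSet('>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*')

print('2.7.18' in requires_python)  # True - pip will consider this candidate
print('3.4.10' in requires_python)  # False - pip will skip this candidate
```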
### Building ST2 in Travis

However, some of our "build" (e.g. CI) commands involve installing packages but do not involve pip. We run `python setup.py develop` and `python setup.py install` in a few different parts of the `Makefile`:

- to install runners
- to check packages
- to load `st2common` drivers
- to load `st2common` metrics drivers
- to check `st2client` installation
- to load `st2common` metrics drivers again
None of these commands rely on pip to install packages; they rely on setuptools to do so itself. And that is where the problem comes in...
## The problem

### setuptools < 42 doesn't honor Requires-Python

The setuptools project only started honoring the `Requires-Python` metadata of dependencies in version 42.0.0, when it began delegating dependency fetching to pip.
This means that when we pin setuptools to a specific version, we are guaranteeing that, when transitive dependencies of ST2 start relying on features of more modern versions of our tools, we will miss out on those changes, and our transitive dependencies will break our build.
## Problems with other solutions
There are some potential workarounds.
### Pin transitive dependency versions

We could pin `futurist` and `configparser` to versions that still support Python 2. This would ensure that neither older versions of pip nor older versions of setuptools break the Python 2.7 build.
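For illustration, the pins might look something like the following requirements-file lines (hypothetical; the exact boundary versions would need to be verified against each project's release history):

```
configparser<=4.0.2  # per above, the most recent version that supports Python 2
futurist<2.1.0       # hypothetical cap below the Python 3-only releases
```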
However, this approach is not scalable. We would have to track layers upon layers of dependencies, even though we do not directly depend on or import those packages. Furthermore, this problem is only going to get worse and worse as more and more projects drop support for Python 2.7. Each one will cause a build failure for us, and we will have to troubleshoot, solve, test, review, merge, and test every time that happens. This is not a good use of anybody's time.
### Use pip-compile (from pip-tools) to catch dependency conflicts

As implemented in StackStorm/st2#4895, we now use `pip-compile` to check for dependency version conflicts across our entire installation. The output of `pip-compile` is much more complete and explicit than the previous output of `pipconflictchecker`.
But this solution only checks that our dependencies don't conflict. Our setuptools commands will continue to try to install dependencies that are Python 3-only, even in our Python 2.7 tests.
## The proposal

The only workable solution was to update setuptools to version `42.0.0` (or later). With that change, both pip and setuptools respect a project's `Requires-Python` metadata and, when run in a Python 2.7 environment, only try to install packages that declare compatibility with that version of Python.
However, pinning setuptools (or pip) to a specific version just recreates the same situation in the future. If some other feature is introduced to pip or setuptools, and any of our transitive dependencies start using that feature before we update our build to rely on up-to-date versions of pip and setuptools, those changes will again cause our builds to fail.
Instead of pinning pip and setuptools to specific versions during the build, I propose that we only specify their known minimum versions. Specifying `pip >= 9.0` and `setuptools >= 42.0.0` would ensure that whatever versions of pip and setuptools are installed at least support `Requires-Python`. As pip and setuptools release new versions that support new features, our builds would automatically pull those versions in, and our transitive dependencies could use those features as soon as they were released without breaking our build.
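If we also wanted to fail fast when the build environment's tools are too old, a hypothetical guard (not existing ST2 code) near the top of `setup.py` might look like this:

```python
# Hypothetical guard (not existing ST2 code): refuse to run with a
# setuptools that predates Requires-Python support for dependencies.
import sys

import setuptools
from distutils.version import LooseVersion

MIN_SETUPTOOLS = '42.0.0'

if LooseVersion(setuptools.__version__) < LooseVersion(MIN_SETUPTOOLS):
    sys.exit('setuptools >= %s is required to respect Requires-Python '
             'metadata; found %s' % (MIN_SETUPTOOLS, setuptools.__version__))
```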
### Potential drawbacks
So what if pip or setuptools release versions that break things?
That's a valid concern: it has happened in the past, and it has also - wait for it - broken our build.
But pip and setuptools are used by nearly every single Python project, so they have a pretty good test infrastructure, and excellent canaries if they do inadvertently release anything that isn't backwards compatible.
This isn't the only way for updates to break our stuff. We used to rely on the internals of pip being available at build time to parse `requirements.txt` files:
```python
import sys

# NB: text_type (a py2/py3 string-type alias, presumably from six or a local
# compatibility shim) was defined elsewhere in the file.

GET_PIP = 'curl https://bootstrap.pypa.io/get-pip.py | python'

try:
    import pip
    from pip import __version__ as pip_version
except ImportError as e:
    print('Failed to import pip: %s' % (text_type(e)))
    print('')
    print('Download pip:\n%s' % (GET_PIP))
    sys.exit(1)

try:
    # pip < 10.0
    from pip.req import parse_requirements
except ImportError:
    # pip >= 10.0
    try:
        from pip._internal.req.req_file import parse_requirements
    except ImportError as e:
        print('Failed to import parse_requirements from pip: %s' % (text_type(e)))
        print('Using pip: %s' % (str(pip_version)))
        sys.exit(1)
```
#### Previous workaround

However, that must have broken when pip updated; I assume that's why we now parse `requirements.txt` files ourselves:
```python
# Excerpt from fetch_requirements(); `links`, `reqs`, and _get_link() are
# defined elsewhere in the same file.
with open(requirements_file_path, 'r') as fp:
    for line in fp.readlines():
        line = line.strip()

        if line.startswith('#') or not line:
            continue

        link, req_name = _get_link(line=line)

        if link:
            links.append(link)
        else:
            req_name = line

            if ';' in req_name:
                # <-- new code to support environment markers
                req_name = req_name.split(';')[0].strip()

        reqs.append(req_name)
```
#### Problems with the workaround

But note that parsing `requirements.txt` files ourselves also didn't work when I tried to use environment markers in `requirements.txt` files:

```
singledispatch ; python_version < "3.4"
```

I think this is one reason why we believe we have to pin pip to a specific version: because we import from the private module `pip._internal`.
#### A better way

But we can actually refactor `fetch_requirements` to use the `packaging` library to parse `requirements.txt` lines. Doing so would have automatically handled environment markers for us instead of us parsing them ourselves (which, even with my improvement, is still brittle code that probably still has bugs).
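A minimal sketch of that refactor, assuming `packaging` can be imported at that point (the catch discussed in the next section), and using a hypothetical helper name:

```python
# Sketch: parse a requirements.txt line with the packaging library instead
# of splitting on ';' by hand. parse_requirement_line() is a hypothetical
# helper, not existing ST2 code.
from packaging.requirements import Requirement

def parse_requirement_line(line):
    """Return the requirement name, or None if the line's environment
    marker rules out the running interpreter."""
    # NB: URL/-e link lines would still need separate handling, as in the
    # existing _get_link() code.
    req = Requirement(line)  # parses name, specifiers, and markers per PEP 508
    if req.marker is not None and not req.marker.evaluate():
        return None
    return req.name

# On Python 3.8 this prints None; on Python 2.7 it prints 'singledispatch'.
print(parse_requirement_line('singledispatch ; python_version < "3.4"'))
```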
#### Not so fast...

However, the `dist_utils.py` file also comes with an ominous note:

```python
# NOTE: This script can't rely on any 3rd party dependency so we need to use this code here
```

I have not been able to find any further definitive explanation for that note. My current working theory is that when that code runs, the virtualenv has been created and activated, but third-party dependencies have not yet been installed via pip or setuptools, so all imports of third-party packages will fail. We need a way to install dependencies at "build" time...
#### Using setuptools properly

After a little digging, I found that setuptools supports specifying build-time dependencies in the `setup_requires` keyword argument to `setup()` (copying and pasting here since there isn't an anchor to that section on the linked page):
> **setup_requires**
> A string or list of strings specifying what other distributions need to be present in order for the setup script to run. setuptools will attempt to obtain these (using pip if available) before processing the rest of the setup script or commands. This argument is needed if you are using distutils extensions as part of your build process; for example, extensions that process setup() arguments and turn them into EGG-INFO metadata files. (Note: projects listed in setup_requires will NOT be automatically installed on the system where the setup script is being run. They are simply downloaded to the ./.eggs directory if they're not locally available already. If you want them to be installed, as well as being available when the setup script is run, you should add them to install_requires and setup_requires.)
So, instead of writing all of this brittle NIH code, and instead of pinning pip to a specific version so we can import its private internals (and if that sounds gross to you - good, that's very much the intent!), I believe all we need to do is specify the `setup_requires` argument to `setup()` in our subpackages:
```python
from setuptools import setup

setup(
    # ...
    setup_requires=['packaging'],
    # ...
)
```
and then specify a minimum version constraint for both pip and setuptools.
## Other Reasons

There are probably other reasons why we pin to specific versions, and I'd like a chance to investigate them fully and discuss them here. As demonstrated, some of our code is working around problems that have been solved in better ways by more recent versions of our tools, and we should use our tools as best we can instead of trying (and failing) to duplicate them.

I've realized that some of our inline comments and developer documentation are not detailed enough to fully understand some of the constraints. At the very least, we need to get better at fully explaining ourselves in pull requests and developer documentation, but ideally in comments in the code. We cannot continue to carry this information around in our heads; we need to get better at communicating our rationale for major decisions in the code itself.
### Possible compromise

Now, some of this might be a "bridge too far" for some people. Reproducible builds are becoming more and more important, and I think we should support them too.

To that end, one possible compromise would be to pin to specific versions of pip and setuptools only when we release a version of StackStorm. For example, ST2 v3.1 was built with `pip == 19.3.1` and `setuptools == 41.0.1`. Then, during development, we would only specify minimum versions, so we continue to incorporate the most recent versions of our tools as they are released. So the git branch for ST2 `v3.1.0` would pin to specific versions of pip and setuptools in the `Makefile`:
```shell
git checkout v3.1.0
pip install --upgrade "pip == 19.3.1"
pip install --upgrade "setuptools == 41.0.1"
```
and the git `master` branch would only specify minimum versions for pip and setuptools in the `Makefile`:
```shell
git checkout master
pip install --upgrade "pip >= 20"
pip install --upgrade "setuptools >= 42.0.1"
```
Updating and pinning the versions could be incorporated into the release process.
This would still allow us to achieve completely reproducible builds while preventing the development branch from breaking all the time.
## Further Reading

- What the heck is `pyproject.toml`? - I need to read up more on this, but the Python ecosystem seems to be moving to use `pyproject.toml` instead of `setup.py`. This post goes through the problems with setuptools, the chicken-and-egg problem, and how `pyproject.toml` solves these problems. I'm not done reading it yet (tons of interrupts all day), but the first few paragraphs looked promising. If this is The Way™️ we should go, then we will need to update our dependency updating scripts and `Makefile` around `pyproject.toml` files.