## Description
### TL;DR
If we use our tools properly, we might be able to relax their version requirements a bit, take advantage of their new features, and allow our transitive dependencies to take advantage of the new features as well, all without breaking the build all of the time.
## Background

### PR #4895
This came up in PR StackStorm/st2#4895.
In that PR, pip and setuptools were pinned to specific versions:
```shell
$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "pip==19.3.1"
$(VIRTUALENV_ST2CLIENT_DIR)/bin/pip install --upgrade "setuptools==41.0.1"
```
Pinning setuptools that tightly caused a bug when a transitive dependency (read: a dependency of a dependency) was updated to support only Python 3. This happened twice in recent history:
#### First case

`python-mistralclient` depended on `osc-lib >= 1.8.0` (resolved to `2.0.0`), which depended on `openstacksdk >= 0.15.0` (resolved to `0.44.0`), which depended on `futurist >= 2.1.0`, which does not support Python 2.
#### Second case

- `kombu` requires `importlib-metadata >= 0.18`
- the most recent version of `importlib-metadata` is `1.5.2`
- `importlib-metadata` requires `configparser >= 3.5`
- the most recent version of `configparser` is `5.0`
- that version only supports Python 3
- the most recent version of `configparser` that supports Python 2 is version `4.0.2`
As noted in this PR to importlib-metadata, all of these dependencies properly support the `Requires-Python` metadata field.
### Example

It's easier to give an example than to describe this in prose. In a Python project's `setup.py` file, the project can specify the `python_requires` keyword argument to `setup()` (this is what produces the `Requires-Python` metadata), like so:
```python
from setuptools import setup

setup(
    # ...
    python_requires='>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*',
    # ...
)
```
That means that pip version 9 or later, when installing packages in a Python 2.7 environment, will skip any candidate package versions whose `Requires-Python` metadata excludes Python 2.7.
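To illustrate the mechanism, here is a minimal sketch of that check using the `packaging` library (which pip vendors internally for exactly this purpose); the specifier string matches the example above:

```python
# A minimal sketch of the filtering pip performs: a candidate package is
# only considered if the running interpreter's version satisfies the
# package's Requires-Python specifier.
from packaging.specifiers import SpecifierSet

requires_python = SpecifierSet('>=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*')

print('2.7.18' in requires_python)  # True - pip will consider this candidate
print('3.4.10' in requires_python)  # False - pip will skip this candidate
```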
### Building ST2 in Travis

However, some of our "build" (e.g. CI) commands involve installing packages but do not involve pip. We run `python setup.py develop` and `python setup.py install` in a few different parts of the `Makefile`:

- to install runners
- to check packages
- to load `st2common` drivers
- to load `st2common` metrics drivers
- to check `st2client` installation
- to load `st2common` metrics drivers again
None of these commands rely on pip to install packages; they rely on setuptools to do so itself. And that is where the problem comes in...
## The problem

### setuptools < 42 doesn't honor Requires-Python

The setuptools project only started honoring the `Requires-Python` metadata of dependencies in version 42.0.0, when it began delegating dependency fetching to pip.
This means that when we pin setuptools to a specific version, we are guaranteeing that, when transitive dependencies of ST2 start relying on features of more modern versions of our tools, we will miss out on those changes, and our transitive dependencies will break our build.
## Problems with other solutions
There are some potential workarounds.
### Pin transitive dependency versions

We could pin `futurist` and `configparser` to versions that still support Python 2. This would ensure that neither older versions of pip nor older versions of setuptools break the Python 2.7 build.
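For illustration, the pins might look something like the following requirements-file lines (hypothetical; the exact boundary versions would need to be verified against each project's release history):

```
configparser<=4.0.2  # per above, the most recent version that supports Python 2
futurist<2.1.0       # hypothetical cap below the Python 3-only releases
```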
However, this approach is not scalable. We would have to track layers upon layers of dependencies, even though we do not directly depend on or import those packages. Furthermore, this problem is only going to get worse and worse as more and more projects drop support for Python 2.7. Each one will cause a build failure for us, and we will have to troubleshoot, solve, test, review, merge, and test every time that happens. This is not a good use of anybody's time.
### Use pip-compile (from pip-tools) to catch dependency conflicts

As implemented in StackStorm/st2#4895, we now use `pip-compile` to check for dependency version conflicts across our entire installation. The output of `pip-compile` is much more complete and explicit than the previous output of `pipconflictchecker`.
But this solution only checks that our dependencies don't conflict. Our setuptools commands will continue to try to install dependencies that are Python 3-only, even in our Python 2.7 tests.
## The proposal

The only workable solution was to update setuptools to version `42.0.0` (or later). With that change, both pip and setuptools respect a project's `Requires-Python` metadata and, when run in a Python 2.7 environment, only try to install packages that declare compatibility with that version of Python.
However, pinning setuptools (or pip) to a specific version just recreates the same situation in the future. If some other feature is introduced to pip or setuptools, and any of our transitive dependencies start using that feature before we update our build to rely on up-to-date versions of pip and setuptools, those changes will again cause our builds to fail.
Instead of pinning pip and setuptools to specific versions during the build, I propose that we only specify their known minimum versions. Specifying `pip >= 9.0` and `setuptools >= 42.0.0` would ensure that whatever versions of pip and setuptools are installed at least support `Requires-Python`. As pip and setuptools release new versions that support new features, our builds would automatically pull those versions in, and our transitive dependencies could use those features as soon as they were released without breaking our build.
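If we also wanted to fail fast when the build environment's tools are too old, a hypothetical guard (not existing ST2 code) near the top of `setup.py` might look like this:

```python
# Hypothetical guard (not existing ST2 code): refuse to run with a
# setuptools that predates Requires-Python support for dependencies.
import sys

import setuptools
from distutils.version import LooseVersion

MIN_SETUPTOOLS = '42.0.0'

if LooseVersion(setuptools.__version__) < LooseVersion(MIN_SETUPTOOLS):
    sys.exit('setuptools >= %s is required to respect Requires-Python '
             'metadata; found %s' % (MIN_SETUPTOOLS, setuptools.__version__))
```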
### Potential drawbacks
So what if pip or setuptools release versions that break things?
That's a valid concern: it has happened in the past, and it has also - wait for it - broken our build.
But pip and setuptools are used by nearly every single Python project, so they have a pretty good test infrastructure, and excellent canaries if they do inadvertently release anything that isn't backwards compatible.
This isn't the only way for updates to break our stuff. We used to rely on the internals of pip being available at build time to parse `requirements.txt` files:
```python
import sys

# NB: text_type (a py2/py3 string-type alias, presumably from six or a local
# compatibility shim) was defined elsewhere in the file.

GET_PIP = 'curl https://bootstrap.pypa.io/get-pip.py | python'

try:
    import pip
    from pip import __version__ as pip_version
except ImportError as e:
    print('Failed to import pip: %s' % (text_type(e)))
    print('')
    print('Download pip:\n%s' % (GET_PIP))
    sys.exit(1)

try:
    # pip < 10.0
    from pip.req import parse_requirements
except ImportError:
    # pip >= 10.0
    try:
        from pip._internal.req.req_file import parse_requirements
    except ImportError as e:
        print('Failed to import parse_requirements from pip: %s' % (text_type(e)))
        print('Using pip: %s' % (str(pip_version)))
        sys.exit(1)
```
#### Previous workaround

However, that must have broken when pip updated; I assume that's why we now parse `requirements.txt` files ourselves:
```python
# Excerpt from fetch_requirements(); `links`, `reqs`, and _get_link() are
# defined elsewhere in the same file.
with open(requirements_file_path, 'r') as fp:
    for line in fp.readlines():
        line = line.strip()

        if line.startswith('#') or not line:
            continue

        link, req_name = _get_link(line=line)

        if link:
            links.append(link)
        else:
            req_name = line

            if ';' in req_name:
                # <-- new code to support environment markers
                req_name = req_name.split(';')[0].strip()

        reqs.append(req_name)
```
#### Problems with the workaround

But note that parsing `requirements.txt` files ourselves also didn't work when I tried to use environment markers in `requirements.txt` files:

```
singledispatch ; python_version < "3.4"
```

I think this is one reason why we believe we have to pin pip to a specific version: because we import from the private module `pip._internal`.
#### A better way

But we can actually refactor `fetch_requirements` to use the `packaging` library to parse `requirements.txt` lines. Doing so would have automatically handled environment markers for us instead of us parsing them ourselves (which, even with my improvement, is still brittle code that probably still has bugs).
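A minimal sketch of that refactor, assuming `packaging` can be imported at that point (the catch discussed in the next section), and using a hypothetical helper name:

```python
# Sketch: parse a requirements.txt line with the packaging library instead
# of splitting on ';' by hand. parse_requirement_line() is a hypothetical
# helper, not existing ST2 code.
from packaging.requirements import Requirement

def parse_requirement_line(line):
    """Return the requirement name, or None if the line's environment
    marker rules out the running interpreter."""
    # NB: URL/-e link lines would still need separate handling, as in the
    # existing _get_link() code.
    req = Requirement(line)  # parses name, specifiers, and markers per PEP 508
    if req.marker is not None and not req.marker.evaluate():
        return None
    return req.name

# On Python 3.8 this prints None; on Python 2.7 it prints 'singledispatch'.
print(parse_requirement_line('singledispatch ; python_version < "3.4"'))
```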
#### Not so fast...

However, the `dist_utils.py` file also comes with an ominous note:

```python
# NOTE: This script can't rely on any 3rd party dependency so we need to use this code here
```

I have not been able to find any further definitive explanation for that note. My current working theory is that when that code runs, the virtualenv has been created and activated, but third-party dependencies have not yet been installed via pip or setuptools, so all imports of third-party packages will fail. We need a way to install dependencies at "build" time...
#### Using setuptools properly

After a little digging, I found that setuptools supports specifying build-time dependencies in the `setup_requires` keyword argument to `setup()` (copying and pasting here since there isn't an anchor to that section on the linked page):
> **setup_requires**
> A string or list of strings specifying what other distributions need to be present in order for the setup script to run. setuptools will attempt to obtain these (using pip if available) before processing the rest of the setup script or commands. This argument is needed if you are using distutils extensions as part of your build process; for example, extensions that process setup() arguments and turn them into EGG-INFO metadata files. (Note: projects listed in setup_requires will NOT be automatically installed on the system where the setup script is being run. They are simply downloaded to the ./.eggs directory if they're not locally available already. If you want them to be installed, as well as being available when the setup script is run, you should add them to install_requires and setup_requires.)
So, instead of writing all of this brittle NIH code, and instead of pinning pip to a specific version so we can import its private internals (and if that sounds gross to you - good, that's very much the intent!), I believe all we need to do is specify the `setup_requires` argument to `setup()` in our subpackages:
```python
from setuptools import setup

setup(
    # ...
    setup_requires=['packaging'],
    # ...
)
```
and then specify a minimum version constraint for both pip and setuptools.
## Other Reasons

There are probably other reasons why we pin to specific versions, and I'd like a chance to investigate them fully and discuss them here. As demonstrated, some of our code is working around problems that have been solved in better ways by more recent versions of our tools, and we should use our tools as best we can instead of trying (and failing) to duplicate them.

I've realized that some of our inline comments and developer documentation are not detailed enough to fully understand some of the constraints. At the very least, we need to get better at fully explaining ourselves in pull requests and developer documentation, but ideally in comments in the code. We cannot continue to carry this information around in our heads; we need to get better at communicating our rationale for major decisions in the code itself.
### Possible compromise

Now, some of this might be a "bridge too far" for some people. Reproducible builds are becoming more and more important, and I think we should support them too.

To that end, one possible compromise would be to pin to specific versions of pip and setuptools only when we release a version of StackStorm. For example, ST2 v3.1 was built with `pip == 19.3.1` and `setuptools == 41.0.1`. Then, during development, we would only specify minimum versions, so we continue to incorporate the most recent versions of our tools as they are released. So the git branch for ST2 `v3.1.0` would pin to specific versions of pip and setuptools in the `Makefile`:
```shell
git checkout v3.1.0
pip install --upgrade "pip == 19.3.1"
pip install --upgrade "setuptools == 41.0.1"
```
and the git `master` branch would only specify minimum versions for pip and setuptools in the `Makefile`:
```shell
git checkout master
pip install --upgrade "pip >= 20"
pip install --upgrade "setuptools >= 42.0.1"
```
Updating and pinning the versions could be incorporated into the release process.
This would still allow us to achieve completely reproducible builds while preventing the development branch from breaking all the time.
## Further Reading

- What the heck is `pyproject.toml`? - I need to read up more on this, but the Python ecosystem seems to be moving to use `pyproject.toml` instead of `setup.py`. This post goes through the problems with setuptools, the chicken-and-egg problem, and how `pyproject.toml` solves these problems. I'm not done reading it yet (tons of interrupts all day), but the first few paragraphs looked promising. If this is The Way™️ we should go, then we will need to update our dependency updating scripts and `Makefile` around `pyproject.toml` files.