Bump to Python 3.9, Big typing updates #2963

Open: wants to merge 16 commits into main


@ashleysommer (Contributor) commented Oct 31, 2024

This is a very large PR. It represents the first steps on the path to RDFLib 8.

There is a lot to unpack in this one; here is a summary:

  1. Bumped the minimum Python version to 3.9, because Python 3.8 is no longer supported.
  2. Added Python 3.12 to the supported list, and added 3.12 to the tox and GitHub Actions test suites.
  3. Updated any/all dependencies that were stuck on older versions due to our support of Python 3.8. This allowed some dependencies and subdependencies to raise their minimum versions to their post-3.8 releases.
  4. Updated MyPy to the latest version (v1.13) to get the latest type-checking improvements, and changed its rules to Python 3.9-compatible rules.
  5. Updated Ruff to the latest version for its newer linting (with automatic fixing!), and changed its rules to Python 3.9-compatible rules.
  6. Bumped Black to the latest version (and pinned it, because we always keep Black pinned).
  7. Updated typing annotations in the codebase from pre-3.9 style to post-3.9 style.
  8. Re-ran Ruff to fix new linting errors, and re-ran Black to fix new formatting infractions.

It is step number 7 that makes this PR so huge.

As an example, pre-3.9 typing annotations look like this:

```python
from typing import Dict, Tuple, Union

my_var: Dict[str, Tuple[Union[str, bytes]]] = {}
```

and the preferred annotation style in Python 3.9+ looks like this:

```python
from __future__ import annotations  # needed on Python 3.9 for the `|` union syntax

my_var: dict[str, tuple[str | bytes]] = {}
```

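The main difference is that importing basic generic types such as `List`, `Set`, `Dict` and `Tuple` from the `typing` stdlib module is deprecated in Python 3.9+ (PEP 585); the built-in classes `list`, `set`, `dict`, `tuple`, etc. can be used directly as type annotations instead. The other obvious change is the move away from the `Union` helper in the `typing` module, because types can simply be unioned with the bar `|` operator (PEP 604). Note that native `X | Y` support only arrived in Python 3.10; on Python 3.9 it works in annotations only when `from __future__ import annotations` (PEP 563) is in effect, so that the annotations are never evaluated at runtime.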
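To make the equivalence concrete, here is a minimal sketch (the helper `shape` and the function `index_keys` are hypothetical, written only for illustration). It reduces both spellings of the annotation to their origin/argument structure with `typing.get_origin` and `typing.get_args`, and shows a 3.9-safe use of the `|` syntax inside an annotation.

```python
from __future__ import annotations  # PEP 563: annotations are kept as strings

from typing import Dict, Tuple, Union, get_args, get_origin

# Pre-3.9 style: generic aliases imported from the typing module.
OldStyle = Dict[str, Tuple[Union[str, bytes]]]

# 3.9+ style (PEP 585): builtin classes are subscriptable at runtime.
# Union is kept on this line because it is evaluated at runtime, and
# `str | bytes` outside an annotation needs Python 3.10+.
NewStyle = dict[str, tuple[Union[str, bytes]]]


def shape(tp: object) -> object:
    """Recursively reduce a type to (origin, args) so both spellings compare equal."""
    origin = get_origin(tp)
    if origin is None:
        return tp  # a plain class such as str or bytes
    return (origin, tuple(shape(arg) for arg in get_args(tp)))


def index_keys(table: dict[str, tuple[str | bytes]]) -> list[str]:
    # This annotation uses PEP 604 syntax; on Python 3.9 it is valid only
    # because the __future__ import stops it from being evaluated.
    return sorted(table)
```

Both `OldStyle` and `NewStyle` reduce to the same structure, while `index_keys.__annotations__` holds unevaluated strings, which is why the `|` form is safe there on 3.9.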

Applying these annotation-pattern updates across the whole codebase caused hundreds of files to be modified in this PR.

Applying all these changes, together with the bump to the latest MyPy and asking the MyPy type checker to operate in Python 3.9+ compatibility mode, caused some fallout: there were over 250 failing type checks after the above changes were applied, each of which needed to be investigated and fixed.

The main culprit was RDFLib's existing excessive use of `# type: ignore[...]` comments, which were either no longer needed (e.g. the underlying type had been fixed, so the ignore was redundant, which MyPy itself reports as an error when `warn_unused_ignores` is enabled), or were causing typing problems of their own. Most `# type: ignore[...]` comments were removed, and where type errors remained, the underlying cause was fixed instead of being commented over.
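The difference between commenting over an error and fixing its cause can be sketched with a small, hypothetical example (`first_word_ignored` and `first_word` are illustrative names, not RDFLib code). The first version silences MyPy's `union-attr` error; the second handles `None` explicitly so MyPy can narrow the type on its own.

```python
from typing import Optional


def first_word_ignored(text: Optional[str]) -> str:
    # Old pattern: silence MyPy's [union-attr] error. A None argument still
    # fails at runtime, just with an unhelpful AttributeError.
    return text.split()[0]  # type: ignore[union-attr]


def first_word(text: Optional[str]) -> str:
    # Preferred pattern: handle None explicitly, so MyPy narrows `text`
    # to `str` below and no ignore comment is needed.
    if text is None:
        raise ValueError("text must not be None")
    return text.split()[0]
```

If a `# type: ignore[union-attr]` were left on the last line of the second version, MyPy would report it as unused when `warn_unused_ignores` is enabled.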

Another big contributor to the failures was a set of incompatibilities in RDFLib's fundamental base types, which were defined like this:

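```python
from typing import Tuple

import rdflib.term
from typing_extensions import TypeAlias  # typing.TypeAlias needs Python 3.10+

_SubjectType: TypeAlias = rdflib.term.Node
_PredicateType: TypeAlias = rdflib.term.Node
_ObjectType: TypeAlias = rdflib.term.Node
_TripleType: TypeAlias = Tuple[_SubjectType, _PredicateType, _ObjectType]
```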

The problem is that `Node` is the base class for many different objects in RDFLib, some of which should not be used as the subject or object of a triple, and most of which cannot be used as the predicate of a triple.

So this PR tightens up the typing of `_SubjectType`, `_PredicateType` and `_ObjectType` to enforce, at the typing level, the correct usage of various Python objects in triples. In various locations in the codebase, instead of simply using `Node` or `Identifier` annotations for objects in a graph (or serializer or parser), we now use the correct `_SubjectType`, `_PredicateType` or `_ObjectType` where appropriate. This change caused a further 200+ type errors to be emitted by MyPy, all of which needed to be carefully checked and fixed. This exposed many instances where incorrect object types were being used, or where edge cases had not been considered and handled.

The remaining portions of this PR are the fixes for those errors. The test suite was used continually throughout the process to ensure the actual behaviour of the executing RDFLib code did not change, and the test suite still passes.

All files were reformatted with the latest Black, and the Ruff linter was run with auto-fix to ensure everything is compliant.

Update dependencies minimum package versions to post-3.9 versions.
Upgrade latest pyparsing
Update to latest MyPy
Update to latest Black
Re-issue new Poetry lockfile
@coveralls commented Nov 1, 2024

Coverage Status: coverage: 90.248% (-0.03%) from 90.28% when pulling 73400f3 on py39_typing into 4be4216 on main.

@ashleysommer (Contributor, Author) commented:

Finally, after a bunch of extra tweaks to satisfy tests, linters, and doc generation, this is ready to go!
