Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add optional orjson support for faster json reading and writing #2854

Merged
merged 11 commits into from
Jul 29, 2024

Conversation

ashleysommer
Copy link
Contributor

@ashleysommer ashleysommer commented Jul 27, 2024

This adds optional support for orjson, that can be enabled by installing rdflib with pip extras syntax like rdflib[orjson], or poetry extras syntax like --extras orjson, or finally it will be detected and used if you simply install orjson>=3.9.14 in your python environment.

This PR touches a lot of files, because JSON is surprisingly used in a whole lot of different places in rdflib.

  • JSON-LD Graph Serializer
  • JSON-LD Graph Parser
  • rdf:JSON literal support in JSON-LD documents
  • Hextuples Graph Serializer (that uses a special newline-delimited JSON)
  • Hextuples Graph Parser (parses JSON line-by line from a ND-JSON file)
  • sparql-results-json SPARQLResults Parser
  • sparql-results-json SPARQLResults Serializer
  • The (yet-to-be-released) GeoJSON-LD serializer

There are also some tangential non-JSON related changes to stream handling in a bunch of other SPARQLResult serializers. While implementing the orjson support for sparql-results-json serializer I found some errors in the way all of the different Sparql-Results-Serializers treat TextIO and BinaryIO streams. This was causing 7 errors to be thrown by the rdflib serializer tests, but they were marked as ignored in the test suite.

These additional changes include much better Typing to the Sparql-Results-Serializer subclasses, which exposed where the problems were (hooray for typed Python exposing actual errors). Fixes were made to all of the failing Sparql-Results-Serializer subclasses, and there are no skipped tests now. This also allowed the removal of a bunch of mypy type: ignore patches that were in place to silence the complaining type checker.

I know it would be great to move all those additional changes to a different PR, but there are two reasons I didnt:

  1. The addition of the orjson feature relies on those typing changes and BinaryIO stream fixes.
  2. The sparql-results-serializer fixes for specifically the sparql-results-json (SparqlResultsJson) subclass is too entangled with the orjson feature to be able to be extracted easily.

Fixes #2784

…t stages of json-ld parser.

This relies on merging of the BytesIOWrapper PR.
Add orjson to sparql-results-json parser, and sparql-results-json serializer
Tangential fixes to all the other non-json SPARQL-Results serializers
Adding better typing to all SPARQL-Results Serializers.
Got 7 ignored tests passing for SPARQL-Results Serializers.
@ashleysommer ashleysommer requested review from edmondchuc and nicholascar and removed request for edmondchuc July 27, 2024 04:47
@ashleysommer
Copy link
Contributor Author

Looks like I need to update this a bit to resolve some conflicts with the changes that were introduced with the "JSON-LD from HTML" feature that was recently merged. I'll get to that later today or tomorrow.

# Conflicts:
#	rdflib/plugins/shared/jsonld/util.py
…n is not installed throwing a mypy error, but they are needed when orjson is installed to prevent different mypy errors. So add additional unused-ignore suppressions.
… and stdlib json outputs are the same for docs and comparison purposes.
@coveralls
Copy link

coveralls commented Jul 29, 2024

Coverage Status

coverage: 90.633% (-0.1%) from 90.748%
when pulling 14e4f95 on orjson_support
into 563dfcc on main.

@ashleysommer
Copy link
Contributor Author

Finally, all tests and lints passing. @nicholascar I'm merging this now.

@ashleysommer ashleysommer merged commit 9b3d8a4 into main Jul 29, 2024
22 checks passed
@ashleysommer ashleysommer deleted the orjson_support branch July 29, 2024 06:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Using orjson instead of default json library
2 participants