Release v2.0.0 · projectglow/glow

What's Changed

Support Spark 3.4 and 3.5
Add functions for left and left semi joins with overlap criteria accelerated by Databricks' range join optimization
Register SQL functions via SQL extension service provider interface, so glow.register is no longer necessary if Glow is on the classpath when Spark is launched

Remove Hail integration
Remove features that frequently cause incompatibilities between versions (aggregate_by_index, CSV pipe transformer). Workarounds are provided in the documentation.

On a dataset with 1B left rows and 1M right rows and varying percentages of SNPs in the left table (tested with 1 4 core executor due to quota):

Inner range join + left join, all SNP percentages: 4h
Glow join, 0% SNPs: 4h
Glow join, 50% SNPs: 2h9m
Glow join, 90% SNPs: 0h42m

The Python source artifact is built from tag v2.0.0-conda in order to fix Glow's conda recipe.

Full Changelog: v1.2.1...v2.0.0