What's Changed
Breaking Changes 🛠
- feat!: support multivector type by @BubbleCal in #3190
New Features 🎉
- feat: cache btree sub-index pages by @westonpace in #3309
- feat(java): support Spark predicate pushdown to lance scan by @SaintBacchus in #3314
- feat(py): support count rows with filter in a fragment by @eddyxu in #3318
- feat(java): support take api for java module by @yanghua in #3316
- feat(java): support overwrite for spark connector by @SaintBacchus in #3313
- feat: add global counters for bytes_read & iops for benchmarking utility by @westonpace in #3321
- feat: vector search with distance range by @BubbleCal in #3326
- feat: add utility for reporting data stats by @westonpace in #3328
- feat: cache miniblock metadata by @westonpace in #3323
- feat(java): support statistics row num for lance scan by @SaintBacchus in #3304
- feat: support with_rowaddr for spark by @chenkovsky in #3336
- feat(java): support get real data size for lance spark statistics interface by @SaintBacchus in #3337
- feat(java): support add columns via sql expressions by @yanghua in #3287
- feat: move fsl handling to structural encodings and add support for miniblock by @westonpace in #3324
- feat: support lindera for Japanese and Korean tokenization by @chenkovsky in #3218
- feat: add support for repetition index to the full zip structural encoding by @westonpace in #3335
- feat: support IVF_FLAT and hamming in pylance by @BubbleCal in #3301
- feat: allow blob in write_fragments by @fecet in #3235
- feat: make it possible to build lance without protoc (except on Windows) by @westonpace in #3363
- feat: log the number of rows we were able to sample by @westonpace in #3367
- feat: upgrade datafusion to 44.0 by @westonpace in #3341
- feat: execute_uncommitted for merge insert by @wjones127 in #3233
Bug Fixes 🐛
- fix: fix pyproject.toml by @chenkovsky in #3299
- fix: is not false crash by @chenkovsky in #3298
- fix: default value is overwritten by @chenkovsky in #3319
- fix: lance ray sink crash when fields contain none by @Jay-ju in #3322
- fix: allow empty scalar indices and don't drop nulls on update by @westonpace in #3329
- fix: coerce scalar for between by @chenkovsky in #3327
- fix(java): replace org.json with gson to resolve the jar conflict with spark 3.5.1 by @SaintBacchus in #3340
- fix: avoid double-take in some scenarios by @westonpace in #3357
- fix: handle deletions in take by @wjones127 in #3360
- fix: fix ray lance sink error by @Jay-ju in #3230
- fix: scan out of range by @chenkovsky in #3339
- fix: cast null arrays to the appropriate type when coercing to a table by @andrijazz in #3362
- fix(python): correct type hint for write_fragments() by @chenkovsky in #3373
Performance Improvements 🚀
- perf: parallelize indexing partitions by @BubbleCal in #3303
Other Changes
- refactor(java): simplify fragment by @chenkovsky in #3307
New Contributors
- @fecet made their first contribution in #3235
- @andrijazz made their first contribution in #3362
- @kemingy made their first contribution in #3370
Full Changelog: v0.21.0...v0.22.0