Highlights
- Added LateChunker support! You can use LateChunker in the following manner:
    from chonkie import LateChunker

    chunker = LateChunker(
        embedding_model="jinaai/jina-embeddings-v3",
        mode="sentence",
        trust_remote_code=True
    )
- Added a Chonkie Discord badge to the repository README. Join now to connect with the community! Oh, btw, Chonkie is now on Twitter and Bluesky too!
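- A bunch of bug fixes to improve chunker stability, including WordChunker batching, SentenceChunker token counts, and a SemanticChunker UnboundLocalError.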
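
For readers new to the technique behind the LateChunker highlight: late chunking, as it is generally described, embeds the whole text at token level first and only then pools the token vectors that fall inside each chunk, so every chunk embedding carries context from the full document. The sketch below illustrates just that pooling step in plain Python. It is not Chonkie's implementation: `late_pool` is a hypothetical helper, the token embeddings are hand-written stand-ins for real model output, and mean pooling is an assumed choice.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def late_pool(
    token_embeddings: Sequence[Vector],
    spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Mean-pool token embeddings over each [start, end) token span.

    Hypothetical helper for illustration only; not part of Chonkie's API.
    """
    pooled: List[Vector] = []
    for start, end in spans:
        # Reject empty, reversed, or out-of-range spans.
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span: ({start}, {end})")
        window = token_embeddings[start:end]
        dim = len(window[0])
        # Average each dimension across the tokens in the span.
        pooled.append(
            [sum(vec[d] for vec in window) / len(window) for d in range(dim)]
        )
    return pooled


# Stand-in for per-token output of a long-context embedding model
# (4 tokens, 2 dimensions). Real values would come from the model.
token_embeddings = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]

# Token spans of two chunks, e.g. two sentences of two tokens each.
spans = [(0, 2), (2, 4)]

chunk_embeddings = late_pool(token_embeddings, spans)
print(chunk_embeddings)  # [[2.0, 1.0], [6.0, 5.0]]
```

The point of the sketch is the ordering: chunk boundaries are applied after the model has seen the full text, rather than embedding each chunk in isolation.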
What's Changed
- [Fix] #37: Incorrect indexing when repetition is present in the text by @bhavnicksm in #87
- [Fix] #88: SemanticChunker raises UnboundLocalError: local variable 'threshold' referenced before assignment by @arpesenti in #89
- [Fix] WordChunker chunk_batch fail by @sky-2002 in #90
- [FIX] MEGA Bug Fix PR: Fix WordChunker batching, Fix SentenceChunker token counts, Initialization + more by @bhavnicksm in #96
- Add initial support for Late Chunking by @bhavnicksm in #97
- [FEAT] Add LateChunker by @bhavnicksm in #98
- [FIX] Update outdated package versions + set max limit to numpy to v2.2 (buggy) by @bhavnicksm in #99
- Update version to 0.3.0 in pyproject.toml and __init__.py by @bhavnicksm in #100
- [fix] Add LateChunker support to chunker and module exports by @bhavnicksm in #101
- [fix] Docstrings in SemanticChunker should include **kwargs by @bhavnicksm in #102
- [Minor] Add Discord badge to README for community engagement by @bhavnicksm in #103
New Contributors
- @arpesenti made their first contribution in #89
Full Changelog: v0.2.2...v0.3.0