
v0.3.0

@bhavnicksm released this on 23 Dec 18:44 (commit 345ca6c)

Highlights

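  • Added LateChunker support! You can use LateChunker in the following manner:
```python
from chonkie import LateChunker

chunker = LateChunker(
    embedding_model="jinaai/jina-embeddings-v3",  # Hugging Face model used for the embeddings
    mode="sentence",                              # build chunks along sentence boundaries
    trust_remote_code=True,                       # jina-embeddings-v3 ships custom model code
)
```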
  • Added a Chonkie Discord link to the repository! Join now to connect with the community. Oh, and Chonkie is now on Twitter and Bluesky too!
  • A bunch of bug fixes to improve the chunkers' stability.
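In broad terms, late chunking runs the embedding model once over the whole document to get one vector per token, and only afterwards pools those token vectors into one embedding per chunk, so each chunk's vector carries context from the surrounding text. The sketch below covers only that pooling step in plain Python. It assumes the token embeddings and each chunk's `[start, end)` token span are already known, and it assumes mean pooling; `late_chunk_embeddings` is a hypothetical helper for illustration, not part of Chonkie's API.

```python
from typing import List, Sequence, Tuple


def late_chunk_embeddings(
    token_embeddings: Sequence[Sequence[float]],
    spans: Sequence[Tuple[int, int]],
) -> List[List[float]]:
    """Mean-pool the document-level token embeddings inside each [start, end) span.

    Hypothetical helper: token_embeddings would come from a single forward pass
    of a long-context embedding model over the full text.
    """
    if not token_embeddings:
        raise ValueError("token_embeddings must be non-empty")
    dim = len(token_embeddings[0])
    pooled: List[List[float]] = []
    for start, end in spans:
        if not 0 <= start < end <= len(token_embeddings):
            raise ValueError(f"invalid span ({start}, {end})")
        # Sum the token vectors that fall inside this chunk.
        sums = [0.0] * dim
        for vec in token_embeddings[start:end]:
            for i, x in enumerate(vec):
                sums[i] += x
        # Divide by the number of tokens to get the mean vector.
        n = end - start
        pooled.append([s / n for s in sums])
    return pooled


if __name__ == "__main__":
    # Six tokens with made-up 2-dimensional embeddings.
    tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [2.0, 2.0], [4.0, 4.0]]
    # Two chunks: tokens 0-1 and tokens 2-5.
    print(late_chunk_embeddings(tokens, [(0, 2), (2, 6)]))  # [[2.0, 0.0], [1.5, 3.0]]
```

Because every chunk is pooled from the same full-document pass, chunks that mention an entity only by pronoun can still pick up context from earlier tokens, which is the main motivation for chunking "late".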

What's Changed

  • [Fix] #37: Incorrect indexing when repetition is present in the text by @bhavnicksm in #87
  • [Fix] #88: SemanticChunker raises UnboundLocalError: local variable 'threshold' referenced before assignment by @arpesenti in #89
  • [Fix] WordChunker chunk_batch fail by @sky-2002 in #90
  • [FIX] MEGA Bug Fix PR: Fix WordChunker batching, Fix SentenceChunker token counts, Initialization + more by @bhavnicksm in #96
  • Add initial support for Late Chunking by @bhavnicksm in #97
  • [FEAT] Add LateChunker by @bhavnicksm in #98
  • [FIX] Update outdated package versions + set max limit to numpy to v2.2 (buggy) by @bhavnicksm in #99
  • Update version to 0.3.0 in pyproject.toml and __init__.py by @bhavnicksm in #100
  • [fix] Add LateChunker support to chunker and module exports by @bhavnicksm in #101
  • [fix] Docstrings in SemanticChunker should include **kwargs by @bhavnicksm in #102
  • [Minor] Add Discord badge to README for community engagement by @bhavnicksm in #103
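Fix #37 concerns chunk indices going wrong when the same text appears more than once. As a generic illustration (not the actual code from #87), one common way this class of bug arises is locating each chunk with `str.find`, which always returns the first match, so repeated chunks all map to the same offset. Searching from the end of the previous chunk avoids that, assuming chunks are non-overlapping and arrive in document order. Both helper names below are hypothetical.

```python
from typing import List, Tuple


def naive_offsets(text: str, chunks: List[str]) -> List[Tuple[int, int]]:
    """Buggy version: str.find returns the FIRST match, so repeated chunks collide."""
    return [(text.find(c), text.find(c) + len(c)) for c in chunks]


def tracked_offsets(text: str, chunks: List[str]) -> List[Tuple[int, int]]:
    """Search for each chunk starting where the previous chunk ended."""
    offsets: List[Tuple[int, int]] = []
    cursor = 0
    for c in chunks:
        start = text.find(c, cursor)
        if start == -1:
            raise ValueError(f"chunk {c!r} not found after offset {cursor}")
        end = start + len(c)
        offsets.append((start, end))
        cursor = end  # assumes non-overlapping chunks in document order
    return offsets


if __name__ == "__main__":
    text = "ab ab ab"
    chunks = ["ab ", "ab ", "ab"]
    print(naive_offsets(text, chunks))    # [(0, 3), (0, 3), (0, 2)]
    print(tracked_offsets(text, chunks))  # [(0, 3), (3, 6), (6, 8)]
```

If a chunker emits overlapping chunks, the cursor would need to advance by less than the full chunk length, since the next chunk legitimately starts before the previous one ends.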

New Contributors

Full Changelog: v0.2.2...v0.3.0