Highlights
- Batch chunking now shows a progress bar, so you can track progress when chunking many texts at once.
```python
from chonkie import RecursiveChunker

chunker = RecursiveChunker()
chunks = chunker([...], show_progress_bar=True)  # progress bar is enabled by default
# 🦛 choooooooooooooooooooonk 100% • 200/200 docs chunked [00:00<00:00, 229.65doc/s] 🌱
```
What's Changed
- Add CONTRIBUTING.md, update issue templates, CI, Codecov and more... by @bhavnicksm in #119
- [FEAT] Add TQDM to default installs + CONTRIBUTING.md + other minor updates by @bhavnicksm in #120
- [fix] CI: reports were not being uploaded to Codecov by @bhavnicksm in #121
- Update CONTRIBUTING.md with first issue hyperlink by @shreyashnigam in #122
- [FIX] Support class methods as `token_counter` objects for `CustomEmbeddings` (#92) by @bhavnicksm in #127
- [Fix] Add fix for #92: Support `class.method` as a Tokenizer for `CustomEmbedding` + minor changes by @bhavnicksm in #128
- [FIX] #116: Incorrect `start_index` when `chunk_overlap` is not 0 by @Udayk02 in #126
- [FIX] `start_index` incorrect when `chunk_overlap` is not 0 (#116) by @bhavnicksm in #132
- [FIX] Remove tests for Py3.8 - Incompatible for support by @bhavnicksm in #134
- [fix] High `chunk_overlap` causes last chunk to be entirely redundant by @bhavnicksm in #136
- [FIX] Handle edge case for RecursiveChunker (#131) by @bhavnicksm in #137
- [DOCS] Update readme intro to match docs. by @shreyashnigam in #135
- [FEAT] Add TQDM progress bars for `chunk_batch` + Update README.md by @bhavnicksm in #138
- Replace dead discord link with infinite lifetime by @shreyashnigam in #140
- [FIX] Minor fixes + Stylistic enhancements for TQDM and Multiprocessing by @bhavnicksm in #141
- [chore] Bump up the package version to v0.4.1 by @bhavnicksm in #143
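
Two of the fixes listed above concern `chunk_overlap`: `start_index` was reported incorrectly when the overlap was non-zero, and a high overlap could produce a final chunk that was entirely redundant. The sketch below is not Chonkie's implementation. It is a minimal character-window illustration, using the hypothetical names `Span` and `sliding_chunks`, of the two properties those fixes describe: each chunk's `start_index` points at where its text really begins in the source (overlap included), and chunking stops once a window reaches the end of the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical chunk record for illustration (not Chonkie's Chunk class)."""
    text: str
    start_index: int
    end_index: int


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[Span]:
    """Split text into character windows that overlap by chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    spans: List[Span] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # start_index is the true offset of the window, overlap included
        spans.append(Span(text[start:end], start, end))
        if end == len(text):
            # Stop here: a further window would only repeat covered text
            break
        start += step
    return spans


text = "abcdefghij"
spans = sliding_chunks(text, chunk_size=4, chunk_overlap=2)
print([s.text for s in spans])         # ['abcd', 'cdef', 'efgh', 'ghij']
print([s.start_index for s in spans])  # [0, 2, 4, 6]
```

A quick sanity check for any chunker with overlap is that `text[chunk.start_index:chunk.end_index] == chunk.text` holds for every chunk. Without the early `break`, the loop above would also emit a trailing `'ij'` window, which is fully contained in the previous chunk.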
New Contributors
- @shreyashnigam made their first contribution in #122
- @Udayk02 made their first contribution in #126
Full Changelog: v0.4.0...v0.4.1