Added chunking support to vectorize.table() #161
Added a chunk_text function to split text into smaller chunks and set max_length to 500.
@harshtech123 - what will the table structure look like in Postgres given chunking during the embedding transformation call as it is proposed in this PR?
@ChuckHend -
Thank you! Please add a test with an assertion that the table gets set up correctly.
Added tests that ensure the table gets set up correctly:
1. test_long_text_endpoint (tests long inputs)
2. test_small_input (tests small inputs)
3. test_empty_input (tests empty or None inputs)
4. test_boundary_chunking (checks behavior at the exact chunk size)
Also updated transform.py for better chunking and error handling.
@ChuckHend kindly review the updates I have made. I also added some more functionality to transform.py to handle empty inputs, plus tests with an assertion that the table gets set up correctly.
@harshtech123 I think this PR is a bit off the intention behind #142. The changes in this PR are made to
@ChuckHend thanks for the confirmation. I am working on adding this functionality, and I have a question about the SQL files in the extension: do I have to update them all as well if I embed this function?
There will likely need to be changes made to the code here. This feature perhaps needs further scoping.
This functionality needs review.
@ChuckHend I am closing this PR because I am working on #166.
/claim #142
Added a chunk_text function to split text into smaller chunks. You can adjust max_length in chunk_table to your desired chunk length; here, I've set it to 500 characters.
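For readers following along, here is a minimal sketch of what a fixed-width character chunker along these lines might look like. The PR's actual implementation is not shown in this thread, so the signature, the handling of empty or None input, and the plain character-count split are all assumptions made for illustration.

```python
from typing import List, Optional


def chunk_text(text: Optional[str], max_length: int = 500) -> List[str]:
    """Split text into consecutive chunks of at most max_length characters.

    Hypothetical sketch: the real function in the PR may differ, for
    example by splitting on word or sentence boundaries.
    """
    if max_length <= 0:
        raise ValueError("max_length must be a positive integer")
    # Assumption: empty or None input produces no chunks at all.
    if not text:
        return []
    # Step through the string in strides of max_length characters.
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]
```

Under these assumptions, a 1200-character input yields chunks of 500, 500, and 200 characters, an input of exactly 500 characters yields a single chunk, and empty or None input yields an empty list. Splitting purely by character count can cut words in half, so a boundary-aware split may be preferable for embedding quality.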