Added chunking support to vectorize.table() #161

Closed
wants to merge 6 commits into from


harshtech123

/claim #142
Added a chunk_text function that splits text into smaller chunks. You can adjust max_length in chunk_table to your desired chunk length; here, I've set it to 500 characters.
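
For readers without access to the diff, here is a minimal sketch of what fixed-size character chunking could look like. It assumes plain character-based splitting with no overlap; the signature and the empty-input behavior are assumptions, and the actual chunk_text in this PR may differ.

```python
def chunk_text(text, max_length=500):
    """Split text into consecutive chunks of at most max_length characters.

    Assumption: empty or None input yields an empty list rather than an error.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not text:
        return []
    # Step through the string in strides of max_length; the last chunk may be shorter.
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


# A 1200-character input yields two full chunks and one remainder.
print([len(c) for c in chunk_text("a" * 1200)])  # [500, 500, 200]
```

Splitting on a fixed character count can cut words or sentences in half; a splitter aware of word or sentence boundaries would be a natural refinement.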

Added chunk_text function to split text into smaller chunks and set max_length to 500.
@ChuckHend
Member

@harshtech123 - what will the table structure look like in Postgres given chunking during the embedding transformation call as it is proposed in this PR?

@harshtech123
Author

@ChuckHend -
After chunking, each document will be split into multiple chunks, and each chunk will be assigned its own row in the table.
The table structure might look like this:
Screenshot (4)
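
Since the screenshot is not reproduced here, the following sketch illustrates the "one row per chunk" idea described above. The column names (document_id, chunk_index, chunk) and the helper rows_for_document are hypothetical and chosen only for illustration; they are not taken from the PR or from the extension's schema.

```python
def rows_for_document(doc_id, text, max_length=500):
    """Expand one document into one row per chunk.

    Hypothetical layout: each row records the source document, the chunk's
    position within that document, and the chunk text itself.
    """
    chunks = [text[i:i + max_length] for i in range(0, len(text), max_length)]
    return [
        {"document_id": doc_id, "chunk_index": idx, "chunk": chunk}
        for idx, chunk in enumerate(chunks)
    ]


# One 1200-character document becomes three rows sharing the same document_id.
for row in rows_for_document(7, "x" * 1200):
    print(row["document_id"], row["chunk_index"], len(row["chunk"]))
```

Keeping the document id and chunk position on every row makes it possible to reassemble the original text or map a search hit back to its source document.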

@ChuckHend
Member

Thank you! Please add a test with an assertion that the table gets set up correctly.

Added tests that ensure that the table gets set up correctly:
1. test_long_text_endpoint (tests long inputs)
2. test_small_input (tests small inputs)
3. test_empty_input (tests empty or None inputs)
4. test_boundary_chunking (ensures that text of exactly the chunk size is chunked correctly)

Also updated transform.py for better chunking functionality and error handling.
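
The four cases named in this commit could be asserted roughly as follows. This is a sketch against a stand-in chunk_text defined inline; the PR's own tests exercise the vector-serve endpoint, so the function names, the exact assertions, and the interpretation of the boundary case here are assumptions.

```python
def chunk_text(text, max_length=500):
    # Stand-in for the function under test (assumed behavior, not the PR's code).
    if not text:
        return []
    return [text[i:i + max_length] for i in range(0, len(text), max_length)]


def test_long_text():
    chunks = chunk_text("a" * 1234)
    assert len(chunks) == 3
    assert all(len(c) <= 500 for c in chunks)
    assert "".join(chunks) == "a" * 1234  # nothing lost or duplicated


def test_small_input():
    assert chunk_text("short text") == ["short text"]


def test_empty_input():
    assert chunk_text("") == []
    assert chunk_text(None) == []


def test_boundary_chunking():
    # Input of exactly max_length should produce a single full chunk.
    assert chunk_text("b" * 500) == ["b" * 500]
    # One character over the boundary spills into a second chunk.
    assert [len(c) for c in chunk_text("b" * 501)] == [500, 1]
```

These functions follow the pytest naming convention, so they would be collected automatically if placed in a test module.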
@harshtech123
Author

@ChuckHend kindly review the updates I have made. I also added some more functionality to transform.py to handle empty inputs, plus tests with an assertion that the table gets set up correctly.
Screenshot (5)

@ChuckHend
Member

@harshtech123 I think this PR is a bit off from the intention behind #142. The changes in this PR are made to vector-serve, which is a Python service that runs outside of Postgres. We are looking to add the capability to vectorize.table(), which is a function call in the Postgres extension, and it is defined here.

@harshtech123
Author

@ChuckHend thanks for the clarification. I am working on adding this functionality, and I have a question about the SQL files in the extension: do I have to update them all as well if I embed this function?

@ChuckHend
Member

There will likely need to be changes made to the code here. This feature perhaps needs further scoping.

@harshtech123
Author

@ChuckHend I am closing this PR because I am working on #166.
