Feature description
I don't know whether helpers or default transformers are something being considered, or whether the core library is the right place for them, but here is the request.
LangChain and LlamaIndex are frameworks widely used to build RAG solutions. In particular, each defines abstract Embeddings and Splitter/Parser interfaces, with implementations for different technologies and approaches provided by the frameworks themselves or by partners (OpenAI, Hugging Face, Bedrock, etc.).
The idea is to provide transformers based on these abstract interfaces that take developer-supplied implementations.
This would make it easier to integrate dlt with these frameworks. Moreover, such transformers could be agnostic and compatible with all existing vector store destinations, so there would be no need to write a separate embedding implementation for every type of vector store.
This would also avoid limiting choices if a default embedding or splitter transformer ever needs to be provided in dlt core (e.g. implementing a default embedding with OpenAI only, or always using a SemanticChunker by default). We could still optionally provide a few additional transformers for convenience or as examples, e.g. one built on the LangChain Embeddings interface together with the LangChain OpenAI implementation.
Are you a dlt user?
Yes, I'm already a dlt user.
Use case
I want to easily use an embedding model to compute vectors, using an existing LangChain / LlamaIndex implementation.
I want to reuse existing LangChain / LlamaIndex code.
I want to be able to switch easily from one embedding model to another (e.g. for local testing or future architecture changes), or from one type of chunking to another; see the snippet below.
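To illustrate: because both models implement LangChain's abstract `Embeddings` interface, switching would be a one-line change. The model names here are only examples, not a recommendation:

```python
from langchain_core.embeddings import Embeddings
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import OpenAIEmbeddings

# local testing with a local sentence-transformers model
embeddings: Embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# later: swap to a hosted model without touching the rest of the pipeline
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

vectors = embeddings.embed_documents(["some chunk of text"])
```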
Proposed solution
Provide transformers based on the abstract Embeddings and Splitter/Parser interfaces of LangChain and LlamaIndex, taking developer-supplied implementations.
For chunking, transformers would take texts from the resource or a previous transformer, split them into chunks using the supplied implementation, and return the chunks.
For embedding, transformers would take texts from the resource or a previous transformer, call the embedding model through the supplied implementation to get vectors, and return the texts and vectors.
Then the vectors and chunks are handled by the destination (or the next transformer).
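A minimal sketch of how this could look on the LangChain side, assuming upstream items are dicts with a `text` field. The factory names `langchain_chunk` / `langchain_embed` and the `chunk_id` / `vector` fields are hypothetical, not an existing dlt API; `TextSplitter.split_text` and `Embeddings.embed_documents` are the real abstract LangChain interfaces referred to above:

```python
import dlt
from langchain_core.embeddings import Embeddings
from langchain_text_splitters import TextSplitter


def langchain_chunk(splitter: TextSplitter):
    """Build a dlt transformer that splits document text with any LangChain splitter."""

    @dlt.transformer
    def chunk(doc: dict):
        # split_text is defined on LangChain's abstract TextSplitter interface
        for idx, part in enumerate(splitter.split_text(doc["text"])):
            yield {**doc, "chunk_id": idx, "text": part}

    return chunk


def langchain_embed(embeddings: Embeddings):
    """Build a dlt transformer that attaches a vector using any LangChain Embeddings."""

    @dlt.transformer
    def embed(doc: dict):
        # embed_documents is defined on LangChain's abstract Embeddings interface;
        # a real implementation would batch texts instead of embedding one at a time
        yield {**doc, "vector": embeddings.embed_documents([doc["text"]])[0]}

    return embed
```

Continuing the sketch, the transformers only depend on the abstract interfaces, so the same pipeline would work with any splitter, embedding model, or vector store destination; the destination, model, and table names below are illustrative:

```python
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter


@dlt.resource
def documents():
    yield {"doc_id": 1, "text": "dlt loads data. LangChain and LlamaIndex build RAG apps."}


pipeline = dlt.pipeline(pipeline_name="rag_demo", destination="qdrant")
pipeline.run(
    documents()
    | langchain_chunk(RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50))
    | langchain_embed(OpenAIEmbeddings(model="text-embedding-3-small")),
    table_name="chunks",
)
```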
Related issues
#576
#1615