Update LangChain Support #2187
Awesome, thank you for the extensive description! I had hoped that LangChain would be stable for a little while longer, but unfortunately that does not seem to be the case. That said, if it's deprecated we should indeed replace this functionality. Let me address some things here before we continue in the PR:

This behavior is used throughout all LLMs integrated in BERTopic, so if we change it here it should be changed everywhere. That said, I'm actually a big fan of using tags like `[KEYWORDS]` and `[DOCUMENTS]`. Other than that (and looking at the PR), I'm wondering whether the changes make usability more complex for most users. Take a look at this piece of the documentation you shared:

```python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.chains.combine_documents import create_stuff_documents_chain

chat_model = ChatOpenAI(model=..., api_key=...)
prompt = ChatPromptTemplate.from_template("What are these documents about? {documents}. Please give a single label.")
chain = RunnablePassthrough.assign(representation=create_stuff_documents_chain(chat_model, prompt, document_variable_name="documents"))
```

That's quite a bit more involved than what it originally was:

```python
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

chain = load_qa_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), chain_type="stuff")
```

Now that the original approach needs some changes on the backend anyway (as you nicely shared in this issue), I'm wondering whether we can simplify accessing LangChain within BERTopic a bit more to make it easier for users. I generally prefer additional representations to take about 4 lines of code for a basic LLM and nothing more.
Hi, thanks for taking the time to reply 😊

I understand this, and I agree that it is a nice approach to formatting prompts when using an LLM (e.g. with OpenAI). However, in the case of LangChain, there is already a standard built-in way of formatting prompts using prompt templates:

```python
# Example: prompt with a `topic` placeholder replaced at runtime through the input of the chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate

chat_model = ChatOpenAI(model=..., api_key=...)
prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt_template | chat_model
chain.invoke({"topic": "cats"})
```

The current implementation uses a hybrid approach to formatting the prompt, using both LangChain prompt templates and string manipulation. The sequence looks like this (I'll assume that …
I think these steps illustrate how the complex internal workings of that specific deprecated LangChain approach, combined with the mix of LangChain prompt templates and string manipulation, make things very confusing for a user wanting to dig deeper into what is feasible in BERTopic using LangChain. It also doesn't make it easy to work with custom chains without reading the source code of the LangChain representation object to understand the expected input and output keys.
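To make the hybrid formatting concrete, here is a minimal stdlib-only sketch of the mechanism described above. The function and key names are illustrative, not BERTopic's actual source: the user-supplied "prompt" is really an instruction string in which `[KEYWORDS]` is substituted by plain string replacement before being handed to the chain under a hard-coded `question` key.

```python
# Illustrative sketch only -- names and keys are assumptions, not
# BERTopic's actual implementation.

def build_chain_input(documents, keywords, prompt):
    # Step 1: plain string replacement, outside any LangChain template
    question = prompt.replace("[KEYWORDS]", ", ".join(keywords))
    # Step 2: the chain receives the result under a hard-coded key
    return {"input_documents": documents, "question": question}

inputs = build_chain_input(
    documents=["doc a", "doc b"],
    keywords=["cats", "pets"],
    prompt="Label these documents. Keywords: [KEYWORDS].",
)
print(inputs["question"])  # Label these documents. Keywords: cats, pets.
```

The point of the sketch is that the keyword substitution happens outside LangChain's own template machinery, which is why a custom chain never sees a `keywords` variable.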
To your point, I can modify the approach to make it simpler in general:

```python
from bertopic.representation import LangChain
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

chain = load_qa_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), chain_type="stuff")
representation_model = LangChain(chain)
```

becomes

```python
from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.llms import OpenAI

prompt = ChatPromptTemplate.from_template("What are these documents about? {DOCUMENTS} Here are keywords related to them {KEYWORDS}.")
chain = create_stuff_documents_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), prompt, document_variable_name="DOCUMENTS")
representation_model = LangChain(chain)
```

Note that we can define a prompt in the representation, like it was done before (but this time as a LangChain prompt template), and the code would become:

```python
from bertopic.representation import LangChain, DEFAULT_PROMPT
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.llms import OpenAI

chain = create_stuff_documents_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), DEFAULT_PROMPT, document_variable_name="DOCUMENTS")
representation_model = LangChain(chain)
```

I made the necessary changes in the PR, let me know what you think! (I'll still need to tinker a bit to actually provide a good default prompt, and to make sure that this allows more fancy chains to work, but at least for the basic example it seems to work.)
**Feature request**

The provided examples that leverage LangChain to create a representation all make use of `langchain.chains.question_answering.load_qa_chain`, and the implementation is not very transparent to the user, leading to inconsistencies and difficulties in understanding how to provide custom chains.

**Motivation**
Some of the issues in detail:

- `langchain.chains.question_answering.load_qa_chain` is now deprecated and will be removed at some point.
- A `prompt` can be specified in the constructor of the `LangChain` class. However, this is not a prompt but rather a custom instruction that is passed to the provided chain through the `question` key.
- With `langchain.chains.question_answering.load_qa_chain` (which is the provided example), this `question` key is added as part of a larger, hard-coded (and not transparent to a casual user) prompt.
- A custom chain can be provided instead of the `langchain.chains.question_answering.load_qa_chain` chain to avoid this hard-coded prompt (this is currently not very clearly documented). In addition, if that specific chain is not used, the use of a `question` key can be confusing.
- The way keywords are injected (by detecting `"[KEYWORDS]"` in `self.prompt` and then performing some string manipulation) is confusing.

**Example of workarounds in current implementation**
With the current implementation, a user wanting to use a custom LangChain prompt in a custom LCEL chain and add keywords to that prompt would have to do something like the following (ignoring that documents are passed as `Document` objects and not formatted into a `str`).
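The original workaround code did not survive in this thread. As a stand-in, this stdlib-only sketch (all names hypothetical, not BERTopic's or LangChain's actual API) shows the shape of the workaround being described: the custom chain cannot declare its own `{keywords}` template variable, so the user must put `[KEYWORDS]` in the instruction string and accept the pre-substituted text through the `question` key.

```python
# Hypothetical stand-in for a custom LCEL chain -- all names are
# illustrative; this is not BERTopic's or LangChain's actual API.

def custom_chain(inputs):
    # The chain cannot use its own {keywords} template variable; it must
    # consume the already-substituted instruction from the "question" key.
    question = inputs["question"]
    documents = "\n".join(inputs["input_documents"])
    return {"output_text": f"{question}\n---\n{documents}"}

# What the representation effectively does before invoking the chain:
instruction = "Summarize these documents. Keywords: [KEYWORDS]"
question = instruction.replace("[KEYWORDS]", ", ".join(["cats", "pets"]))

result = custom_chain({"input_documents": ["doc a"], "question": question})
print(result["output_text"].splitlines()[0])
# Summarize these documents. Keywords: cats, pets
```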
Related issues:
**Your contribution**

I propose several changes, which I have started working on in a branch (made a PR to make the diff easy to see):

- `langchain.chains.question_answering.load_qa_chain` is replaced by `langchain.chains.combine_documents.stuff.create_stuff_documents_chain`, as recommended in the migration guide.
- The prompt is no longer hidden and hard-coded (as it was with `langchain.chains.question_answering.load_qa_chain`).
- The `prompt` argument is removed from `LangChain`, as the prompt must now be explicitly created with the chain object.
- The chain uses the input/output keys `documents`, `keywords`, and `representation` (note that `langchain.chains.combine_documents.stuff.create_stuff_documents_chain` does not have an `output_text` output key, and the `representation` key must thus be added).
- The `keywords` key is always provided to the chain (but it's up to the user to include a placeholder for it in their prompt).

**Questions:**
- Should we provide a `DEFAULT_PROMPT`? For example, one for `langchain.chains.combine_documents.stuff.create_stuff_documents_chain`, which takes care of formatting the documents.
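To see the proposed contract end to end, here is a stdlib-only sketch. The chain body is a hypothetical stand-in (a real chain would wrap `create_stuff_documents_chain` and call an LLM): the chain is invoked with explicit `documents` and `keywords` input keys and must return a `representation` output key.

```python
# Stand-in chain illustrating the proposed input/output keys; the body
# is hypothetical -- a real implementation would call an LLM via LangChain.

def proposed_chain(inputs):
    documents = inputs["documents"]
    keywords = inputs["keywords"]
    # A real chain would format the documents into the prompt and call
    # the model; here we just echo a deterministic label.
    label = f"{len(documents)} docs about {keywords[0]}"
    return {"representation": label}

out = proposed_chain({"documents": ["doc a", "doc b"], "keywords": ["cats", "pets"]})
print(out["representation"])  # 2 docs about cats
```

The key design point is that every input and output key is explicit, so a user writing a custom chain knows exactly what BERTopic will pass in and what it expects back.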