Update LangChain Support #2187
Comments
Awesome, thank you for the extensive description! I had hoped that LangChain would be stable for a little while longer but unfortunately that does not seem to be the case. That said, if it's deprecated we indeed should be replacing this functionality. Let me address some things here before we continue in the PR:
This behavior is used throughout all LLMs integrated in BERTopic, so if we change it here it should be changed everywhere. That said, I'm actually a big fan of using tags like [DOCUMENTS] and [KEYWORDS].
Other than that (and looking at the PR), I'm wondering whether the changes make the usability for most users more complex. Take a look at this piece of the documentation you shared:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain.chains.combine_documents import create_stuff_documents_chain
chat_model = ChatOpenAI(model=..., api_key=...)
prompt = ChatPromptTemplate.from_template("What are these documents about? {documents}. Please give a single label.")
chain = RunnablePassthrough.assign(representation=create_stuff_documents_chain(chat_model, prompt, document_variable_name="documents"))
That's quite a bit more involved than what it originally was:
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
chain = load_qa_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), chain_type="stuff")
Now, what it originally was needs some changes on the backend (as you nicely shared in this issue), but I'm wondering whether we can simplify access to LangChain within BERTopic a bit more to make it easier for users. I generally prefer additional representations to need four lines of code or so for a basic LLM and nothing more.
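For readers unfamiliar with the `RunnablePassthrough.assign(representation=...)` wrapper in the longer snippet, here is a dependency-free sketch of what it contributes: the input dictionary is passed through unchanged and one extra key is added. The names `assign` and `fake_stuff_chain` are hypothetical stand-ins written for illustration; they are not LangChain or BERTopic code, and no model is called.

```python
from typing import Any, Callable, Dict

# A "chain" here is simply a callable that maps an input dict to some output.
Chain = Callable[[Dict[str, Any]], Any]


def assign(**steps: Chain) -> Chain:
    """Return a chain that passes its input through and adds one key per step."""

    def run(inputs: Dict[str, Any]) -> Dict[str, Any]:
        outputs = dict(inputs)  # keep every input key untouched
        for key, step in steps.items():
            outputs[key] = step(inputs)  # each step sees the original inputs
        return outputs

    return run


def fake_stuff_chain(inputs: Dict[str, Any]) -> str:
    """Hypothetical stand-in for a stuff-documents chain plus an LLM call."""
    return f"label for {len(inputs['documents'])} documents"


chain = assign(representation=fake_stuff_chain)
result = chain({"documents": ["Cats purr.", "Dogs bark."], "keywords": ["cats", "dogs"]})
```

The wrapper exists only to give the chain's output a predictable name (`representation`) next to the inputs, which is the extra step that makes the longer snippet look more involved.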
Hi, Thanks for taking the time to reply 😊
I understand this, and I agree that it is a nice approach to format prompts when using an LLM (e.g. with OpenAI). However, in the case of LangChain, there is already a standard built-in way of formatting prompts using prompt templates.
# Example: prompt with a `topic` placeholder replaced at runtime through the input of the chain
from langchain_openai import ChatOpenAI
from langchain_core.prompts import PromptTemplate
chat_model = ChatOpenAI(model=..., api_key=...)
prompt_template = PromptTemplate.from_template("Tell me a joke about {topic}")
chain = prompt_template | chat_model
chain.invoke({"topic": "cats"})
The current implementation uses a hybrid approach to formatting the prompt, using both LangChain prompt templates and string manipulation. The sequence looks like this (I'll assume that
I think these steps illustrate the problem: the complex internals of that specific deprecated LangChain approach, combined with the mix of LangChain prompt templates and string manipulation, make things very confusing for a user who wants to dig deeper into what is feasible in BERTopic using LangChain. They also make it hard to work with custom chains without reading the source code of the LangChain representation object to understand the expected input and output keys.
To your point, I can modify the approach to make it simpler in general:
from bertopic.representation import LangChain
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
chain = load_qa_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), chain_type="stuff")
representation_model = LangChain(chain)
becomes
from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.llms import OpenAI
prompt = ChatPromptTemplate.from_template("What are these documents about? {DOCUMENTS} Here are keywords related to them {KEYWORDS}.")
chain = create_stuff_documents_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), prompt, document_variable_name="DOCUMENTS")
representation_model = LangChain(chain)
Note that we can define a prompt in the representation, like it was done before (but this time as a LangChain prompt template), and the code would become
from bertopic.representation import LangChain, DEFAULT_PROMPT
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.llms import OpenAI
chain = create_stuff_documents_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), DEFAULT_PROMPT, document_variable_name="DOCUMENTS")
representation_model = LangChain(chain)
I made the necessary changes in the PR, let me know what you think! (I'll still need to tinker a bit to actually provide a good default prompt, and to make sure that this allows fancier chains to work, but at least for the basic example it seems to work.)
Thanks for taking the time to go through this so thoroughly! I agree with the things that you mention, which kinda makes it difficult for BERTopic since all LLM-based representations revolve around using [DOCUMENTS] and [KEYWORDS], which I do intend to keep as that is something users are familiar with when interacting with different LLMs. That said, I'm wondering whether we can expose it a bit differently, assuming we always need
from bertopic.representation import LangChain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.llms import OpenAI
prompt = ChatPromptTemplate.from_template("What are these documents about? {DOCUMENTS} Here are keywords related to them {KEYWORDS}.")
chain = create_stuff_documents_chain(OpenAI(temperature=0, openai_api_key=my_openai_api_key), prompt, document_variable_name="DOCUMENTS")
representation_model = LangChain(chain)
to this:
from bertopic.representation import LangChain
from langchain.llms import OpenAI
prompt = "What are these documents about? [DOCUMENTS] Here are keywords related to them [KEYWORDS]."
llm = OpenAI(temperature=0, openai_api_key=my_openai_api_key)
representation_model = LangChain(llm, prompt)
where, inside the representation, we would do:
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
langchain_prompt = prompt.replace("[DOCUMENTS]", "{DOCUMENTS}").replace("[KEYWORDS]", "{KEYWORDS}")
langchain_prompt = ChatPromptTemplate.from_template(langchain_prompt)
chain = create_stuff_documents_chain(llm, langchain_prompt, document_variable_name="DOCUMENTS")
That makes it much easier for most users. If you instead want to use a chain, you can do so with your suggested approach, thereby exposing both the "easy" solution and the chain-based one. I think this might be the best of both worlds, but I would love to get your view on this.
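The tag conversion proposed here can be tried without LangChain installed. The sketch below is a minimal illustration, assuming f-string-style templates that follow `str.format` escaping rules; `to_langchain_template` is a hypothetical helper name, not part of BERTopic or LangChain. It also escapes literal braces first, a detail the two chained `replace` calls alone would not handle.

```python
def to_langchain_template(prompt: str) -> str:
    """Convert BERTopic-style [TAGS] into brace placeholders (hypothetical helper)."""
    # Escape literal braces first so they survive template formatting.
    escaped = prompt.replace("{", "{{").replace("}", "}}")
    # Then turn the two known tags into named placeholders.
    return (
        escaped.replace("[DOCUMENTS]", "{DOCUMENTS}")
        .replace("[KEYWORDS]", "{KEYWORDS}")
    )


prompt = (
    "What are these documents about? [DOCUMENTS] "
    "Here are keywords related to them [KEYWORDS]."
)
template = to_langchain_template(prompt)

# str.format stands in for the template engine in this sketch.
filled = template.format(DOCUMENTS="Cats purr.\n\nDogs bark.", KEYWORDS="cats, dogs")
```

Without the escaping step, a user prompt that contains literal braces (for instance a JSON example) would fail at formatting time, so a real implementation would likely need something similar.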
(disclaimer: I used ChatGPT to help generate this reply because I didn't have much time 😄)
I think I understand your point better now. You mean that other LLM representation objects share a similar interface because they take a client plus an optional prompt, and the prompt is formatted using
Could you elaborate on what you mean by "always" here? Strictly speaking, you don't have to use
I agree that this seems to be a very good approach to maintain the existing interface while addressing the deprecation issues and simplifying/clarifying the approach for most users as well as people wanting more control over the chain. Let me summarize it like this:
Proposed Solution
Supporting Both
FYI, I've updated the PR with all the changes discussed above (plus documentation). I'm not too sure about the logic around the vector output with a bunch of empty labels, so I kept it in the implementation (both when the output is a string and when it's a list); if it's not correct, please let me know. I'll probably add a code example of a custom chain that outputs a list if you confirm that it's appropriate (I have done it in the past, but I always concatenated the labels into a single label since only a single output was supported).
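The "vector output with a bunch of empty labels" can be pictured with a small sketch. It assumes that each topic is represented by a fixed number of `(label, weight)` pairs, with empty strings filling the unused slots; the width of 10, the weights of 1 and 0, and the function name `to_topic_representation` are assumptions made for illustration, not the PR's actual code.

```python
from typing import List, Tuple, Union


def to_topic_representation(
    output: Union[str, List[str]], width: int = 10
) -> List[Tuple[str, int]]:
    """Normalize a chain output (one label or several) to `width` pairs."""
    # A single string is treated as a one-element list of labels.
    labels = [output] if isinstance(output, str) else list(output)
    # Trim whitespace and drop anything beyond the fixed width.
    labels = [label.strip() for label in labels][:width]
    pairs = [(label, 1) for label in labels]
    # Pad the remaining slots with empty labels of weight 0.
    pairs += [("", 0)] * (width - len(pairs))
    return pairs


single = to_topic_representation(" Pets ")
several = to_topic_representation(["Pets", "Animals", "Cats and dogs"])
```

Handling both cases in one place keeps the string and list outputs consistent, which is the behaviour described in this comment.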
@Skar0 Awesome, thank you for taking the time! I'll move this over to the PR so we can further discuss the implementation itself.
Feature request
The provided examples that leverage LangChain to create a representation all make use of langchain.chains.question_answering.load_qa_chain, and the implementation is not very transparent to the user, leading to inconsistencies and difficulty understanding how to provide custom chains.
Motivation
Some of the issues in detail
- langchain.chains.question_answering.load_qa_chain is now deprecated and will be removed at some point.
- A prompt can be specified in the constructor of the LangChain class. However, this is not a prompt but rather a custom instruction that is passed to the provided chain through the question key.
- With langchain.chains.question_answering.load_qa_chain (which is the provided example), this question key is added as part of a larger, hard-coded (and not transparent to a casual user) prompt.
- [...] langchain.chains.question_answering.load_qa_chain chain to avoid this hard-coded prompt (this is currently not very clearly documented). In addition, if that specific chain is not used, the use of a question key can be confusing.
- [...] "[KEYWORDS]" in self.prompt and then performing some string manipulation) is confusing.
Example of workarounds in current implementation
With the current implementation, a user wanting to use a custom LangChain prompt in a custom LCEL chain and add keywords to that prompt would have to do something like (ignoring that documents are passed as Document objects and not formatted into a str).
Related issues:
Your contribution
I propose several changes, which I have started working on in a branch (made a PR to make the diff easy to see).
- langchain.chains.question_answering.load_qa_chain is replaced by langchain.chains.combine_documents.stuff.create_stuff_documents_chain, as recommended in the migration guide.
- [...] langchain.chains.question_answering.load_qa_chain).
- [...] LangChain, as the prompt must now be explicitly created with the chain object.
- [...] documents, keywords, and representation (note that langchain.chains.combine_documents.stuff.create_stuff_documents_chain does not have an output_text output key and the representation key must thus be added).
- The keywords key is always provided to the chain (but it's up to the user to include a placeholder for it in their prompt).
Questions:
- [...] DEFAULT_PROMPT? For example, langchain.chains.combine_documents.stuff.create_stuff_documents_chain, which takes care of formatting the documents.