-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decouple prepare from chat #263
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the initial PR @nenb! I have a bunch of inline comments. One overarching concern that I have is how we deal with the **params
handling. With only Chat
in the picture this was simple. Now that Chat
and Corpus
can request params, this is harder. Both need to perform their separate validation and we need to make sure that every parameter is used either by Corpus
or Chat
.
ragna/core/_rag.py
Outdated
@@ -28,6 +27,17 @@ | |||
C = TypeVar("C", bound=Component) | |||
|
|||
|
|||
async def _run( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably go into core._utils
together with run_gen
.
ragna/core/_rag.py
Outdated
self, | ||
documents=documents, | ||
source_storage=source_storage, | ||
corpus_name=corpus_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be just name
since rag.corpus(corpus_name=...)
is redundant.
ragna/core/_rag.py
Outdated
class SpecialChatParams(pydantic.BaseModel): | ||
user: str = pydantic.Field(default_factory=default_user) | ||
chat_id: uuid.UUID = pydantic.Field(default_factory=uuid.uuid4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should stay. Downstream components should still have access to it.
@@ -130,15 +196,13 @@ def __init__( | |||
self, | |||
rag: Rag, | |||
*, | |||
documents: Iterable[Any], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to be able to also pass in regular paths here. While Corpus
is a useful abstraction, it is unnecessary for the base case. I would move the _parse_documents
function that you moved to Corpus
back here and let it return a Corpus
in case we encounter a list of paths.
ragna/core/_rag.py
Outdated
self._messages: list[Message] = [] | ||
|
||
async def prepare(self) -> Message: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO the chat should keep the prepare
method so users that just have local documents do not ever interact with the Corpus
abstraction. Basically this would just be
async def prepare(self):
return await self.corpus.prepare()
Plus, by keeping this, we could also keep the functionality of adding the welcome message here.
ragna/core/_rag.py
Outdated
self.corpus.source_storage.retrieve, | ||
self.corpus.documents, | ||
prompt.content, | ||
corpus_name=self.corpus.corpus_name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to be part of the unpacked_params
.
ragna/core/_rag.py
Outdated
for (_, name), model in component._protocol_models().items() | ||
} | ||
|
||
ChatModel = merge_models( | ||
str(self.params["chat_id"]), | ||
str(self.params["chat_name"]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can roll this back if the we keep the "chat_id"
, correct?
ragna/core/_rag.py
Outdated
async def __aenter__(self) -> Chat: | ||
await self.prepare() | ||
return self | ||
# TODO: Do we need this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See explanation above why I want to keep the prepare
and in turn this as well.
ragna/core/_rag.py
Outdated
assistant=assistant, | ||
**params, | ||
) | ||
|
||
|
||
class Corpus: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Corpus
now effectively handles SourceStorage.store
it should probably has its own unpacked_params
.
ragna/core/_rag.py
Outdated
*, | ||
documents: Iterable[Any], | ||
source_storage: Union[Type[SourceStorage], SourceStorage], | ||
corpus_name: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least in the "user just passes document paths", this needs to have a default.
I'm ok with this an an initial implementation when my comments are addressed and CI is green. What I dislike here is that |
WIP