Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple prepare from chat #263

Closed
wants to merge 3 commits into from
Closed

Conversation

nenb
Copy link
Contributor

@nenb nenb commented Jan 10, 2024

WIP

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Copy link
Member

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the initial PR @nenb! I have a bunch of inline comments. One overarching concern that I have is how we deal with the **params handling. With only Chat in the picture this was simple. Now that Chat and Corpus can request params, this is harder. Both need to perform their separate validation and we need to make sure that every parameter is used either by Corpus or Chat.

@@ -28,6 +27,17 @@
C = TypeVar("C", bound=Component)


async def _run(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably go into core._utils together with run_gen.

self,
documents=documents,
source_storage=source_storage,
corpus_name=corpus_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be just name since rag.corpus(corpus_name=...) is redundant.

class SpecialChatParams(pydantic.BaseModel):
user: str = pydantic.Field(default_factory=default_user)
chat_id: uuid.UUID = pydantic.Field(default_factory=uuid.uuid4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should stay. Downstream components should still have access to it.

@@ -130,15 +196,13 @@ def __init__(
self,
rag: Rag,
*,
documents: Iterable[Any],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to be able to also pass in regular paths here. While Corpus is a useful abstraction, it is unnecessary for the base case. I would move the _parse_documents function that you moved to Corpus back here and let it return a Corpus in case we encounter a list of paths.

self._messages: list[Message] = []

async def prepare(self) -> Message:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the chat should keep the prepare method so users that just have local documents do not ever interact with the Corpus abstraction. Basically this would just be

async def prepare(self):
    return await self.corpus.prepare()

Plus, by keeping this, we could also keep the functionality of adding the welcome message here.

self.corpus.source_storage.retrieve,
self.corpus.documents,
prompt.content,
corpus_name=self.corpus.corpus_name,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be part of the unpacked_params.

for (_, name), model in component._protocol_models().items()
}

ChatModel = merge_models(
str(self.params["chat_id"]),
str(self.params["chat_name"]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can roll this back if the we keep the "chat_id", correct?

async def __aenter__(self) -> Chat:
await self.prepare()
return self
# TODO: Do we need this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See explanation above why I want to keep the prepare and in turn this as well.

assistant=assistant,
**params,
)


class Corpus:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Corpus now effectively handles SourceStorage.store it should probably has its own unpacked_params.

*,
documents: Iterable[Any],
source_storage: Union[Type[SourceStorage], SourceStorage],
corpus_name: str,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least in the "user just passes document paths", this needs to have a default.

ragna/__init__.py Show resolved Hide resolved
ragna/core/_document.py Show resolved Hide resolved
ragna/source_storages/_chroma.py Show resolved Hide resolved
@pmeier
Copy link
Member

pmeier commented Jan 14, 2024

I'm ok with this an an initial implementation when my comments are addressed and CI is green.

What I dislike here is that Corpus is a second class citizen. It is more like a dataclass and one can only interact with it through the Chat. However, and admin in the "managed Ragna" use case which we are targeting here, only wants to interact with the Corpus and not the Chat. Meaning, they need to have an assistant present and meet its requirements before they can even prepare a Corpus. In the presence of the demo assistant that is not terribly bad, but it is not sound design. I'm wondering if the source storage should be a component of the Corpus and the assistant a component of the Chat. That way, the Corpus can stand on its own, e.g. can be prepared, without the need to ever create a Chat.

@nenb
Copy link
Contributor Author

nenb commented Jan 16, 2024

This PR has become quite messy. I'm closing it (to preserve the history, if we need it for whatever reason in the future), and opening a clean version in #269. (Trying to add #269 on top of this made the history completely unreadable.)

@nenb nenb closed this Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants