
Can't summarise if the doc contains more than 16k tokens #88

Open
elisalimli opened this issue Mar 8, 2024 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@elisalimli
Contributor

We use the gpt-3.5-turbo-16k model for summarising, so if we try to summarise a relatively long doc that exceeds the 16k-token context window, the model will not be able to do it.

https://github.com/superagent-ai/super-rag/blob/main/utils/summarise.py#L33

Example file: https://github.com/datasciencedojo/datasets/blob/master/titanic.csv

@elisalimli elisalimli self-assigned this Mar 8, 2024
@homanp
Contributor

homanp commented Mar 8, 2024

We will have to split the input into chunks of at most 16k tokens with tiktoken, or use a model with a larger context window.
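A minimal sketch of the chunking approach suggested above: encode the document to tokens, slice into windows of at most `max_tokens`, and decode each slice back to text. The `encode`/`decode` callables here are stand-ins (a whitespace tokenizer for illustration only); in super-rag you would pass tiktoken's real encoder, e.g. `tiktoken.encoding_for_model("gpt-3.5-turbo-16k")`, and then summarise each chunk separately.

```python
def chunk_by_tokens(text, max_tokens, encode, decode):
    """Split text into pieces of at most max_tokens tokens each."""
    tokens = encode(text)
    return [
        decode(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]

# Stand-in tokenizer for illustration: whitespace-separated words as "tokens".
# With tiktoken this would be:
#   enc = tiktoken.encoding_for_model("gpt-3.5-turbo-16k")
#   chunks = chunk_by_tokens(doc, 16_000, enc.encode, enc.decode)
encode = str.split
decode = " ".join

chunks = chunk_by_tokens("a b c d e", 2, encode, decode)
print(chunks)  # ['a b', 'c d', 'e']
```

Each chunk could then be summarised independently and the partial summaries combined in a final pass (a map-reduce style summarisation), which keeps every individual request under the model's context limit.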

@homanp homanp added the bug Something isn't working label Mar 8, 2024