
Speed-up calls to LLM by parallelization of the topic categorization #11

Open
jucor opened this issue Jan 21, 2025 · 1 comment

jucor (Collaborator) commented Jan 21, 2025

Dear Jigsaw team

As discussed by email, it would be really helpful if the library could run faster. The topic learning is already very fast; it is the categorization step that could do with being faster. In our in-person conversations I remember you mentioned that you definitely had this in mind, so I'm just filing it here as a follow-up.

When looking at the categorization code, I suspect you were probably thinking of parallelizing the calls across the mini-batches, i.e. this loop:

```ts
for (
  let i = 0;
  i < comments.length;
  i += this.modelSettings.defaultModel.categorizationBatchSize
) {
  const uncategorizedBatch = comments.slice(
    i,
    i + this.modelSettings.defaultModel.categorizationBatchSize
  );
  const categorizedBatch = await categorizeWithRetry(
    this.modelSettings.defaultModel,
    instructions,
    uncategorizedBatch,
    includeSubtopics,
    topics,
    additionalInstructions
  );
  categorized.push(...categorizedBatch);
}
```

Parallelizing this loop seems like the highest-leverage option: the least amount of work needed for the maximum return.
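
For illustration, here is a minimal sketch of what that could look like, built on the same variables as the loop above (`comments`, `topics`, `instructions`, etc.); everything beyond those is an assumption rather than existing library code:

```ts
// Sketch: start one categorizeWithRetry call per mini-batch, then await them together.
// Uses the same surrounding context as the sequential loop above.
const batchSize = this.modelSettings.defaultModel.categorizationBatchSize;
const batchPromises: Array<ReturnType<typeof categorizeWithRetry>> = [];

for (let i = 0; i < comments.length; i += batchSize) {
  const uncategorizedBatch = comments.slice(i, i + batchSize);
  batchPromises.push(
    categorizeWithRetry(
      this.modelSettings.defaultModel,
      instructions,
      uncategorizedBatch,
      includeSubtopics,
      topics,
      additionalInstructions
    )
  );
}

// All batches are awaited at once; results still come back in batch order.
const categorizedBatches = await Promise.all(batchPromises);
const categorized = categorizedBatches.flat();
```

The obvious caveat is that this fires every request at once, which is exactly where the throttling question below comes in.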

Of course, as you also pointed out, there's the question of whether Vertex will throttle the requests. Does Vertex offer an async caller that automatically respects its throttling limits? That would be neat :)
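
If nothing like that exists, a small client-side cap on in-flight requests would probably do. A minimal sketch, purely as an assumption about how it could be done (nothing here is existing library or Vertex API surface):

```ts
// Sketch: run promise-returning tasks with at most `maxConcurrent` in flight,
// so quota limits can be respected without fully serializing the batches.
async function runWithConcurrencyLimit<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrent: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let nextIndex = 0;

  // Each worker repeatedly claims the next unclaimed task and awaits it.
  async function worker(): Promise<void> {
    while (nextIndex < tasks.length) {
      const current = nextIndex++;
      results[current] = await tasks[current]();
    }
  }

  const workerCount = Math.min(maxConcurrent, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Each mini-batch call would then be wrapped as a task, e.g. `() => categorizeWithRetry(...)`, with `maxConcurrent` chosen to stay under the Vertex quota.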

Thanks!


dborkan commented Jan 22, 2025

@jucor thanks for sharing this and pointing out the relevant code locations. We're definitely interested in speeding this up, and are looking to implement parallelization, likely after our current sprint tackling hallucinations.

There are Vertex quota limits, so we'll need to do some rate limiting ourselves or find a helper library. We did recently create a helper function resolvePromisesInParallel for summarization; this could be a good starting point for categorization.
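
For reference, a rough sketch of how the categorization batches might plug into such a helper; the actual signature of resolvePromisesInParallel may well differ, so treat the assumed one (an array of promises resolved with bounded parallelism, results in order) as hypothetical:

```ts
// Sketch only: assumes resolvePromisesInParallel takes an array of promises and
// resolves them with a bounded number in flight, returning results in order.
const batchSize = this.modelSettings.defaultModel.categorizationBatchSize;
const batches: (typeof comments)[] = [];
for (let i = 0; i < comments.length; i += batchSize) {
  batches.push(comments.slice(i, i + batchSize));
}

const categorizedBatches = await resolvePromisesInParallel(
  batches.map((uncategorizedBatch) =>
    categorizeWithRetry(
      this.modelSettings.defaultModel,
      instructions,
      uncategorizedBatch,
      includeSubtopics,
      topics,
      additionalInstructions
    )
  )
);
const categorized = categorizedBatches.flat();
```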

Labels: None yet
Projects: None yet
Development: No branches or pull requests

2 participants