
Implement batch processing #143

Open
dylanpieper opened this issue Nov 1, 2024 · 8 comments

Comments

@dylanpieper

dylanpieper commented Nov 1, 2024

Implement batch processing in ellmer to process lists of inputs through a single prompt. Batch processing enables running multiple inputs to get multiple responses while maintaining safety and the ability to leverage ellmer's other features. (A minimal sketch of the sequential case follows the feature lists below.)

Basic Features (likely in scope):

  • Sequential LLM API calls or chat responses
  • Prompt caching
  • Integration with ellmer's core functionality
    • Tooling and structured data extraction
    • Error and retry handling
    • Token and rate limit handling

Ambitious Features / Extensions (likely out of scope):

  • Asynchronous batching (polling process)
  • Parallel processing for multi-model/provider requests
  • Similarity scoring for multi-model/provider requests
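For illustration, a minimal sketch of the basic sequential case using ellmer's existing chat API (the system prompt and the prompts are placeholders; nothing here is a proposed ellmer function):

library(ellmer)
library(purrr)

chat <- chat_openai("You reply succinctly")

prompts <- list(
  "Summarise input A in one sentence.",
  "Summarise input B in one sentence."
)

# One independent response per prompt: clone the base chat so every
# conversation starts from the same system prompt with no shared history.
responses <- map(prompts, \(p) chat$clone()$chat(p))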
@hadley
Member

hadley commented Jan 23, 2025

Do you think this code could live in ellmer itself? Do you have any sense of what the UI might look like?

@hadley hadley mentioned this issue Jan 23, 2025
@hadley
Member

hadley commented Jan 24, 2025

I was thinking that in a batch scenario you still want to be able to do all the same things that you do in a regular chat, but you just want to do many of them at once. So maybe an interface like this would work:

library(ellmer)

chat <- chat_openai("You reply succinctly")
chat$register_tool(tool(function() Sys.Date(), "Return the current date"))
chat$chat_batch(list(
  "What's the date today?",
  "What's the date tomorrow?",
  "What's the date yesterday?",
  "What's the date next week?",
  "What's the date last week?",
))

I think the main challenge is what chat_batch() would return. I think it would make sense to return a list of text answers, but then how do you access the chat if there's a problem? Maybe we just have to accept that there's no easy way to get them, and if you want to debug you'd need to do it with individual requests.

Other thoughts:

  • Would also need a batched version of structured data extraction.
  • Will need error handling like req_perform_parallel(): https://httr2.r-lib.org/reference/req_perform_parallel.html#arg-on-error (a sketch of that error-handling model follows this list).
  • Unlike req_perform_parallel() probably needs an explicit argument for parallelism.
  • Likely to require work on req_perform_parallel() in order to support req_retry() (and possibly uploading OAuth tokens).
  • Will want to implement after "Think about prompt caching" (#107), since you'd want to cache the base chat.
  • This would probably not support asynchronous batch requests (e.g. https://www.anthropic.com/news/message-batches-api), since that needs a separate polling process. Although we could have an async = TRUE argument that then returns a batch object with a polling method? There's some argument that it would make sense for chat_batch() to always return such an object, since that could provide better interrupt support and would provide a way to access individual chat objects for debugging. But that would require rearchitecting req_perform_parallel() to run in the background.
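For reference, a sketch of the req_perform_parallel() error-handling model mentioned in the list above; the endpoint and request bodies here are made up, only the on_error behaviour is the point:

library(httr2)

prompts <- list("What's the date today?", "What's the date tomorrow?")

# Hypothetical endpoint; a real implementation would build provider-specific requests.
reqs <- lapply(prompts, function(p) {
  request("https://api.example.com/v1/chat") |>
    req_body_json(list(prompt = p))
})

# on_error = "continue" keeps going past individual failures and returns
# error objects in place of the failed responses.
resps <- req_perform_parallel(reqs, on_error = "continue")

ok  <- resps_successes(resps)
bad <- resps_failures(resps)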

@dylanpieper dylanpieper changed the title Proposal for helmer: Build LLM Chat Pipelines from a Data Frame or Tibble Build LLM Chat Pipelines (Batch Processing) Jan 24, 2025
@dylanpieper
Author

dylanpieper commented Jan 24, 2025

After revisiting this, I do think my original ideas were ambitious, and I would like to see this code in the ellmer package if possible. I revised my original post to simplify the concept.

I think your interface idea is a good start. As for the returned object, would it be possible to return not only a list of the text answers, but also a nested list with each chat object for diagnostics? Although that might be a beefy object if you have big batches (maybe only return the chat object if there was an issue?).
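One hypothetical shape for that return value (the names and structure are invented here for illustration, not an ellmer API): the text is always present, and the chat object is attached only when something went wrong.

# Each element carries the text answer; the chat object is only kept
# when the request failed, so successful batches stay lightweight.
batch_result <- list(
  list(text = "2024-11-01",  error = NULL,      chat = NULL),
  list(text = NA_character_, error = "timeout", chat = NULL)  # in a real run this would carry the failed Chat object
)

texts  <- vapply(batch_result, function(x) x$text, character(1))
errors <- Filter(function(x) !is.null(x$error), batch_result)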

@dylanpieper
Author

@t-emery - Adding Teal because I know he has experience using ellmer for batch processing.

@t-emery

t-emery commented Jan 24, 2025

FWIW, my use-case was batched structured data extraction.

I was reading text data from a tibble and doing structured data extraction (x 18,000). In my initial attempts, I found that I was using far more tokens than made sense given the inputs. Eventually I found that the workable answer was to clear the turns after each call. Then everything worked as expected.

I'm new to using LLM APIs, so I have great humility that I might be missing something simple. I haven't had time to think about this deeply yet, and I haven't figured out what part of this is a documentation issue (creating documentation about best practices for running a lot of data) versus a feature issue.

Here are the relevant functions. The key fix was:

# Clear all prior turns so we start fresh for *each* record
chat$set_turns(list())

# Packages used below: dplyr (mutate, select), purrr (map, map_chr),
# tidyr (unnest), rlang (%||%)
library(dplyr)
library(purrr)
library(tidyr)
library(rlang)

# 1. Core Classification Function ----
classify_single_project <- function(text, chat, type_spec) {
  result <- purrr::safely(
    ~ chat$extract_data(text, type = type_spec),
    otherwise = NULL,
    quiet = TRUE
  )()
  
  tibble::tibble(
    success = is.null(result$error),
    error_message = if(is.null(result$error)) NA_character_ else as.character(result$error),
    classification = list(result$result)
  )
}

# 2. Process a chunk of data (handles different provider types) ----
process_chunk <- function(chunk, chat, type_spec, provider_type = NULL) {
  # Pick the classification function based on the provider;
  # identical() also handles the NULL default safely.
  classify_fn <- if (identical(provider_type, "deepseek")) {
    classify_single_project_deepseek
  } else {
    classify_single_project
  }
  
  chunk |>
    mutate(
      classification_result = map(
        combined_text,
        ~{
          # Clear all prior turns so we start fresh for *each* record
          chat$set_turns(list())
          
          classify_fn(.x, chat, type_spec)
        }
      )
    ) |>
    unnest(classification_result) |>
    mutate(
      primary_class = map_chr(classification, ~.x$classification$primary %||% NA_character_),
      confidence = map_chr(classification, ~.x$classification$confidence %||% NA_character_),
      project_type = map_chr(classification, ~.x$classification$project_type %||% NA_character_),
      justification = map_chr(classification, ~.x$justification %||% NA_character_),
      evidence = map_chr(classification, ~.x$evidence %||% NA_character_)
    ) |>
    select(-classification)
}

@hadley hadley changed the title Build LLM Chat Pipelines (Batch Processing) Implement batch processing Jan 24, 2025
@hadley
Member

hadley commented Jan 27, 2025

@t-emery a slightly better way to ensure that the chats are standalone is to call chat$clone().
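Applied to the snippet above, that could look like this (a sketch reusing t-emery's combined_text, classify_fn, chat, and type_spec):

# Instead of resetting turns on the shared object, give each record an
# independent copy of the base chat (same system prompt, no carried-over history).
classification_result <- purrr::map(
  combined_text,
  \(txt) classify_fn(txt, chat$clone(), type_spec)
)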

@dylanpieper
Author

dylanpieper commented Jan 27, 2025

Beyond retries, it would be great to handle interrupted batches and resume when you re-call the function. For example:

library(ellmer)

chat <- chat_openai("You reply succinctly")

chat$register_tool(tool(
  function() Sys.Date(), 
  "Return the current date"
))

prompts <- list(
  "What's the date today?",
  "What's the date tomorrow?",
  "What's the date yesterday?",
  "What's the date next week?",
  "What's the date last week?"
)

#  Initial processing (interrupted and returns a partial object)
responses <- chat$chat_batch(prompts)

#  Resume processing
responses <- chat$chat_batch(prompts, responses)

Where chat_batch() may look something like:

chat_batch <- function(prompts, last_chat = NULL) {
  if (!is.null(last_chat) && !last_chat$complete) {
    chat$process_batch(prompts, start_at = last_chat$last_index + 1)
  } else {
    chat$process_batch(prompts)
  }
}

@hadley
Member

hadley commented Jan 27, 2025

@dylanpieper I think it's a bit better to return a richer object that lets you resume. Something like this maybe:

batched_chat <- chat$chat_batch(prompts)
batched_chat$process()
# ctrl + c
batched_chat$process() # resumes where the work left off

That object would also be the way you get either rich chat objects or simple text responses:

batched_chat$texts()
batched_chat$chats()

For the batch (not parallel) case, where it might take up to 24 hours, the function would also need to serialise something to disk, so if your R process completely dies, you can resume it in a fresh session:

batched_chat <- resume(some_path)

(All of these names are up for discussion, I just brain-dumped quickly.)
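For concreteness, a minimal sketch of what such an object could look like (all names are hypothetical and this version is sequential-only; saveRDS stands in for whatever serialisation ellmer would actually use):

library(R6)
library(ellmer)

BatchedChat <- R6Class("BatchedChat",
  public = list(
    prompts = NULL, results = NULL, base_chat = NULL, path = NULL,

    initialize = function(chat, prompts, path = tempfile(fileext = ".rds")) {
      self$base_chat <- chat
      self$prompts   <- prompts
      self$results   <- vector("list", length(prompts))
      self$path      <- path
    },

    # Works through unfinished prompts; safe to interrupt and call again.
    process = function() {
      for (i in seq_along(self$prompts)) {
        if (!is.null(self$results[[i]])) next   # already done, skip on resume
        ch <- self$base_chat$clone()
        self$results[[i]] <- list(text = ch$chat(self$prompts[[i]]), chat = ch)
        saveRDS(self, self$path)                # checkpoint after every prompt
      }
      invisible(self)
    },

    texts = function() lapply(self$results, function(r) r$text),
    chats = function() lapply(self$results, function(r) r$chat)
  )
)

# resume() could then be little more than readRDS(some_path).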
