
Implement batch processing #143

Open
dylanpieper opened this issue Nov 1, 2024 · 8 comments

Comments

@dylanpieper

dylanpieper commented Nov 1, 2024

Implement batch processing in ellmer to process lists of inputs through a single prompt. Batch processing enables running multiple inputs to get multiple responses while maintaining safety and the ability to leverage ellmer's other features. (A minimal sketch of the sequential case follows the feature lists below.)

Basic Features (likely in scope):

  • Sequential LLM API calls or chat responses
  • Prompt caching
  • Integration with ellmer's core functionality
    • Tooling and structured data extraction
    • Error and retry handling
    • Token and rate limit handling

Ambitious Features / Extensions (likely out of scope):

  • Asynchronous batching (polling process)
  • Parallel processing for multi-model/provider requests
  • Similarity scoring for multi-model/provider requests
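For illustration, a minimal sketch of the basic sequential case using ellmer's existing chat API (the system prompt and the prompts are placeholders; nothing here is a proposed ellmer function):

library(ellmer)
library(purrr)

chat <- chat_openai("You reply succinctly")

prompts <- list(
  "Summarise input A in one sentence.",
  "Summarise input B in one sentence."
)

# One independent response per prompt: clone the base chat so every
# conversation starts from the same system prompt with no shared history.
responses <- map(prompts, \(p) chat$clone()$chat(p))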
@hadley
Member

hadley commented Jan 23, 2025

Do you think this code could live in ellmer itself? Do you have any sense of what the UI might look like?

@hadley hadley mentioned this issue Jan 23, 2025
@hadley
Member

hadley commented Jan 24, 2025

I was thinking that in a batch scenario you still want to be able to do all the same things that you do in a regular chat, but you just want to do many of them at once. So maybe an interface like this would work:

library(ellmer)

chat <- chat_openai("You reply succinctly")
chat$register_tool(tool(function() Sys.Date(), "Return the current date"))
chat$chat_batch(list(
  "What's the date today?",
  "What's the date tomorrow?",
  "What's the date yesterday?",
  "What's the date next week?",
  "What's the date last week?",
))

I think the main challenge is what chat_batch() would return. I think it would make sense to return a list of text answers, but then how do you access the chat if there's a problem? Maybe we just have to accept that there's no easy way to get them, and if you want to debug you'd need to do it with individual requests.

Other thoughts:

  • Would also need a batched version of structured data extraction.
  • Will need error handling like req_perform_parallel(): https://httr2.r-lib.org/reference/req_perform_parallel.html#arg-on-error (a sketch of that error-handling model follows this list).
  • Unlike req_perform_parallel() probably needs an explicit argument for parallelism.
  • Likely to require work on req_perform_parallel() in order to support req_retry() (and possibly uploading OAuth tokens).
  • Will want to implement after "Think about prompt caching" (#107), since you'd want to cache the base chat.
  • This would probably not support asynchronous batch requests (e.g. https://www.anthropic.com/news/message-batches-api), since that needs a separate polling process. Although we could have an async = TRUE argument that then returns a batch object with a polling method? There's some argument that it would make sense for chat_batch() to always return such an object, since that could provide better interrupt support and would provide a way to access individual chat objects for debugging. But that would require rearchitecting req_perform_parallel() to run in the background.
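For reference, a sketch of the req_perform_parallel() error-handling model mentioned in the list above; the endpoint and request bodies here are made up, only the on_error behaviour is the point:

library(httr2)

prompts <- list("What's the date today?", "What's the date tomorrow?")

# Hypothetical endpoint; a real implementation would build provider-specific requests.
reqs <- lapply(prompts, function(p) {
  request("https://api.example.com/v1/chat") |>
    req_body_json(list(prompt = p))
})

# on_error = "continue" keeps going past individual failures and returns
# error objects in place of the failed responses.
resps <- req_perform_parallel(reqs, on_error = "continue")

ok  <- resps_successes(resps)
bad <- resps_failures(resps)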

@dylanpieper dylanpieper changed the title Proposal for helmer: Build LLM Chat Pipelines from a Data Frame or Tibble Build LLM Chat Pipelines (Batch Processing) Jan 24, 2025
@dylanpieper
Author

dylanpieper commented Jan 24, 2025

After revisiting this, I do think my original ideas were ambitious, and I would like to see this code in the ellmer package if possible. I revised my original post to simplify the concept.

I think your interface idea is a good start. As for the returned object, would it be possible to return not only a list of the text answers, but also a nested list with each chat object for diagnostics? Although that might be a beefy object if you have big batches (maybe only return the chat object if there was an issue?).
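One hypothetical shape for that return value (the names and structure are invented here for illustration, not an ellmer API): the text is always present, and the chat object is attached only when something went wrong.

# Each element carries the text answer; the chat object is only kept
# when the request failed, so successful batches stay lightweight.
batch_result <- list(
  list(text = "2024-11-01",  error = NULL,      chat = NULL),
  list(text = NA_character_, error = "timeout", chat = NULL)  # in a real run this would carry the failed Chat object
)

texts  <- vapply(batch_result, function(x) x$text, character(1))
errors <- Filter(function(x) !is.null(x$error), batch_result)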

@dylanpieper
Author

@t-emery - Adding Teal because I know he has experience using ellmer for batch processing.

@t-emery

t-emery commented Jan 24, 2025

FWIW, my use-case was batched structured data extraction.

I was reading text data from a tibble and doing structured data extraction (x 18,000). In my initial attempts, I found that I was using far more tokens than made sense given the inputs. Eventually I found that the workable answer was to clear the turns after each call. Then everything worked as expected.

I'm new to using LLM APIs, so I have great humility that I might be missing something simple. I haven't had time to think about this deeply yet, and I haven't figured out what part of this is a documentation issue (creating documentation about best practices for running a lot of data) versus a feature issue.

Here are the relevant functions. The key fix was:

# Clear all prior turns so we start fresh for *each* record
chat$set_turns(list())

# Packages used below: dplyr (mutate, select), purrr (map, map_chr),
# tidyr (unnest), rlang (%||%)
library(dplyr)
library(purrr)
library(tidyr)
library(rlang)

# 1. Core Classification Function ----
classify_single_project <- function(text, chat, type_spec) {
  result <- purrr::safely(
    ~ chat$extract_data(text, type = type_spec),
    otherwise = NULL,
    quiet = TRUE
  )()
  
  tibble::tibble(
    success = is.null(result$error),
    error_message = if(is.null(result$error)) NA_character_ else as.character(result$error),
    classification = list(result$result)
  )
}

# 2. Process a chunk of data (handles different provider types) ----
process_chunk <- function(chunk, chat, type_spec, provider_type = NULL) {
  # Pick the classification function based on the provider;
  # identical() also handles the NULL default safely.
  classify_fn <- if (identical(provider_type, "deepseek")) {
    classify_single_project_deepseek
  } else {
    classify_single_project
  }
  
  chunk |>
    mutate(
      classification_result = map(
        combined_text,
        ~{
          # Clear all prior turns so we start fresh for *each* record
          chat$set_turns(list())
          
          classify_fn(.x, chat, type_spec)
        }
      )
    ) |>
    unnest(classification_result) |>
    mutate(
      primary_class = map_chr(classification, ~.x$classification$primary %||% NA_character_),
      confidence = map_chr(classification, ~.x$classification$confidence %||% NA_character_),
      project_type = map_chr(classification, ~.x$classification$project_type %||% NA_character_),
      justification = map_chr(classification, ~.x$justification %||% NA_character_),
      evidence = map_chr(classification, ~.x$evidence %||% NA_character_)
    ) |>
    select(-classification)
}

@hadley hadley changed the title Build LLM Chat Pipelines (Batch Processing) Implement batch processing Jan 24, 2025
@hadley
Member

hadley commented Jan 27, 2025

@t-emery a slightly better way to ensure that the chats are standalone is to call chat$clone().
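Applied to the snippet above, that could look like this (a sketch reusing t-emery's combined_text, classify_fn, chat, and type_spec):

# Instead of resetting turns on the shared object, give each record an
# independent copy of the base chat (same system prompt, no carried-over history).
classification_result <- purrr::map(
  combined_text,
  \(txt) classify_fn(txt, chat$clone(), type_spec)
)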

@dylanpieper
Author

dylanpieper commented Jan 27, 2025

Beyond retries, it would be great to handle interrupted batches and resume when you re-call the function. For example:

library(ellmer)

chat <- chat_openai("You reply succinctly")

chat$register_tool(tool(
  function() Sys.Date(), 
  "Return the current date"
))

prompts <- list(
  "What's the date today?",
  "What's the date tomorrow?",
  "What's the date yesterday?",
  "What's the date next week?",
  "What's the date last week?"
)

#  Initial processing (interrupted and returns a partial object)
responses <- chat$chat_batch(prompts)

#  Resume processing
responses <- chat$chat_batch(prompts, responses)

Where chat_batch() may look something like:

chat_batch <- function(prompts, last_chat = NULL) {
  if (!is.null(last_chat) && !last_chat$complete) {
    chat$process_batch(prompts, start_at = last_chat$last_index + 1)
  } else {
    chat$process_batch(prompts)
  }
}

@hadley
Member

hadley commented Jan 27, 2025

@dylanpieper I think it's a bit better to return a richer object that lets you resume. Something like this maybe:

batched_chat <- chat$chat_batch(prompts)
batched_chat$process()
# ctrl + c
batched_chat$process() # resumes where the work left off

That object would also be the way you get either rich chat objects or simple text responses:

batched_chat$texts()
batched_chat$chats()

For the batch (not parallel) case, where it might take up to 24 hours, the function would also need to serialise something to disk, so if your R process completely dies, you can resume it in a fresh session:

batched_chat <- resume(some_path)

(All of these names are up for discussion, I just brain-dumped quickly.)
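For concreteness, a minimal sketch of what such an object could look like (all names are hypothetical and this version is sequential-only; saveRDS stands in for whatever serialisation ellmer would actually use):

library(R6)
library(ellmer)

BatchedChat <- R6Class("BatchedChat",
  public = list(
    prompts = NULL, results = NULL, base_chat = NULL, path = NULL,

    initialize = function(chat, prompts, path = tempfile(fileext = ".rds")) {
      self$base_chat <- chat
      self$prompts   <- prompts
      self$results   <- vector("list", length(prompts))
      self$path      <- path
    },

    # Works through unfinished prompts; safe to interrupt and call again.
    process = function() {
      for (i in seq_along(self$prompts)) {
        if (!is.null(self$results[[i]])) next   # already done, skip on resume
        ch <- self$base_chat$clone()
        self$results[[i]] <- list(text = ch$chat(self$prompts[[i]]), chat = ch)
        saveRDS(self, self$path)                # checkpoint after every prompt
      }
      invisible(self)
    },

    texts = function() lapply(self$results, function(r) r$text),
    chats = function() lapply(self$results, function(r) r$chat)
  )
)

# resume() could then be little more than readRDS(some_path).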
