Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enumerable method to find the first truthy block result and return that result #14879

Closed
jgaskins opened this issue Aug 7, 2024 · 11 comments · Fixed by #14893
Closed

Enumerable method to find the first truthy block result and return that result #14879

jgaskins opened this issue Aug 7, 2024 · 11 comments · Fixed by #14893

Comments

@jgaskins
Copy link
Contributor

jgaskins commented Aug 7, 2024

Feature Request

Is your feature request related to a problem? Please describe clearly and concisely what is it.

I have a frequent need for something that is similar to Enumerable#find, but rather than returning the element for which the block is truthy, it returns that truthy block result itself.

This comes up when I have an array of objects that may or may not have a property set. The most recent need for this came up when implementing tool use for LLM APIs. The API response has an array of choices, each with a message, and those messages may or may not have a tool_calls array.

record Response, choices : Array(Choice)
record Choice, message : Message
record Message, tool_calls : Array(ToolCall)?
record ToolCall

response = Response.new([Choice.new(Message.new([ToolCall.new]))])

Describe the feature you would like, optionally illustrated by examples, and how it will solve the above problem.

What I've been using in my code to make this more expressive is a monkeypatch on Enumerable called find_first:

if tool_calls = response.choices.find_first &.message.tool_calls
  # find the tools for these calls, execute them, and send the results back to the API
end

Describe considered alternative solutions, and the reasons why you have not proposed them as a solution here.

Currently, as far as I can tell, at least, the only ways to do this involve either executing that same criteria again:

if choice_with_tool_call = response.choices.find &.message.tool_calls
  # the method chain for choice.message.tool_calls has to be repeated
  tool_calls = choice_with_tool_call
    .message
    .tool_calls
end

if tool_calls
  # find the tools for these calls, execute them, and send the results back to the API
end

Or this format with the block expanded

# or this
response.choices.each do |choice|
  if tool_calls = choice.message.tool_calls
    # find the tools for these calls, execute them, and send the results back to the API
    break
  end
end

These are both a lot more verbose, especially when there are multiple layers of these arrays. Having a method that returns the truthy value lets us use the shorthand block syntax and have a single expression that does all of these things smoothly.

Does it break backward compatibility, if yes then what's the migration path?

Nope. Purely additive.

@jgaskins
Copy link
Contributor Author

jgaskins commented Aug 7, 2024

To be clear, I'm not attached to the find_first name. It's the name I've been using only because I haven't come up with anything better.

@Blacksmoke16
Copy link
Member

response = Response.new([Choice.new(Message.new(nil)), Choice.new(Message.new([ToolCall.new, ToolCall.new]))])

if tool_calls = response.choices.each { |c| tc = c.message.tool_calls; break tc if tc }
  pp tool_calls # => [ToolCall(), ToolCall()]
end

This is best I could come up with without introducing something new. As reference only.

@yxhuvud
Copy link
Contributor

yxhuvud commented Aug 7, 2024

It is still slightly clunky, but at least it doesn't involve break:

[1, 2, 3].each.map { |z| evaluate(z) }.find &.itself

If it is fine to find the first not-nil-value, then
[1, 2, 3].each.compact_map { |z| evaluate(z) }.first
is even better

@HertzDevil
Copy link
Contributor

Ruby doesn't have it, but in Elixir it is called Enum.find_value/3

@straight-shoota
Copy link
Member

straight-shoota commented Aug 13, 2024

Sounds good to me. I'd prefer the name find_value over find_first.

@jgaskins
Copy link
Contributor Author

In this comment, @Sija brought up an interesting point I hadn't considered. I normally use this for nil checks (and even then only use shorthand blocks), but are there any use cases for false triggering an early return?

Part of me wants to push back against it because I don't know of any other Enumerable methods where that is the case, but also I don't have any stake at all in how false is treated, so I thought I'd bring it here for discussion.

@Sija
Copy link
Contributor

Sija commented Aug 13, 2024

Elixir's find_value skips over both falsy values: nil and false, just FTR.

@yxhuvud
Copy link
Contributor

yxhuvud commented Aug 14, 2024

I would argue this method should have the same semantics as find wrt nil vs falsey.

@oprypin
Copy link
Member

oprypin commented Nov 2, 2024

I came here to share this observation, but @yxhuvud already did:

Yes, this is the same as:

[1, 2, 3].each.map { |x| evaluate(x) }.find &.itself

But the issue is that map materializes the whole array! If it didn't, then we could push back against adding this method.

And it is made even more clunky by the fact that .each. is necessary there. If you forget to add it then map would materialize the whole array.

@oprypin
Copy link
Member

oprypin commented Nov 2, 2024

Also fun fact:
These methods map and to_a are exact aliases of each other

(which makes me wish that map was lazy even more)

@jgaskins
Copy link
Contributor Author

jgaskins commented Nov 2, 2024

Not only is the method chain clunkier, but since it also involves a heap allocation (Iterators have state), it's also slower:

With tool calls
                find_value   2.09M (477.64ns) (± 1.45%)    0.0B/op        fastest
each.compact_map(&).first? 134.17k (  7.45µs) (± 0.85%)  31.2kB/op  15.60× slower
each.map(&).find(&.itself) 128.57k (  7.78µs) (± 1.45%)  31.2kB/op  16.28× slower

Without tool calls
                find_value   1.57M (638.94ns) (± 5.47%)    0.0B/op        fastest
each.compact_map(&).first? 132.08k (  7.57µs) (± 1.62%)  31.2kB/op  11.85× slower
each.map(&).find(&.itself) 111.00k (  9.01µs) (± 0.92%)  31.2kB/op  14.10× slower
Benchmark code
require "benchmark"

response_with_tool_calls = Response.new([Choice.new(Message.new([ToolCall.new]))])
response_without_tool_calls = Response.new([Choice.new(Message.new(nil))])

# Need to run the block multiple times because each iteration for find_value
# takes less time than the monotonic clock can measure
iterations = 1_000
[response_with_tool_calls, response_without_tool_calls].each do |response|
  tool_calls = response.choices.find_value &.message.tool_calls
  puts "#{tool_calls ? "With" : "Without"} tool calls"
  Benchmark.ips do |x|
    x.report "find_value" { iterations.times { tool_calls = response.choices.find_value &.message.tool_calls } }
    x.report "each.compact_map(&).first?" { iterations.times { tool_calls = response.choices.each.compact_map(&.message.tool_calls).first? } }
    x.report "each.map(&).find(&.itself)" { iterations.times { tool_calls = response.choices.each.map(&.message.tool_calls).find &.itself } }
  end
  puts

  # Side effect so LLVM won't optimize out the benchmark
  pp tool_calls unless tool_calls
end

record Response, choices : Array(Choice)
record Choice, message : Message
record Message, tool_calls : Array(ToolCall)?
record ToolCall

module Enumerable(T)
  # Same implementation as https://github.com/crystal-lang/crystal/pull/14893
  def find_value(if_none = nil, & : T ->)
    each do |i|
      if result = yield i
        return result
      end
    end
    if_none
  end
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants