Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: Cache GraphQL attributes #867

Conversation

elenatanasoiu
Copy link
Contributor

@elenatanasoiu elenatanasoiu commented Feb 15, 2024

Context

Backporting this to graphql-1.12.24.

Closes: #865

  • graphql.field.parent is now reporting the interface instead of the concrete class for execute_field. We've made this choice consciously in order to benefit from the very serious performance gains vs 100% accuracy. For graphql v2, this goes back to reporting the concrete class so we always have the choice of upgrading.
  • If the graphql.operation.name is nil, we are now setting the value to anonymous. This allows us to detect unowned queries.

Results

This will result in:

  • a 97% reduction in total allocated memory for this gem (130896 -> 3616)
  • a 96% reduction in total objects allocated for this gem (786 -> 32)
  • a 6% reduction overall in bytes used
- Total allocated: 2084536 bytes (18106 objects)
+ Total allocated: 1957424 bytes (17353 objects)
  Total retained:  0 bytes (0 objects)

allocated memory by gem
-----------------------------------
   1033984  opentelemetry-sdk-1.4.0
-   684008  graphql-1.12.24
+   684176  graphql-1.12.24
    194472  opentelemetry-api-1.2.4
-   130896  graphql/lib
     41176  other
+     3616  graphql/lib  ❗️

allocated objects by gem
-----------------------------------
      8624  opentelemetry-sdk-1.4.0
      5820  graphql-1.12.24
      2353  opentelemetry-api-1.2.4
-      786  graphql/lib
       523  other
+       32  graphql/lib ❗️


allocated objects by class
-----------------------------------
-      7465  Hash
+      6704  Hash
      2348  Array
      1594  String
-      1177  Proc
+      1185  Proc
otel_bench.rb
require "bundler/inline"

gemfile(false) do
  source "https://rubygems.org"
  gem "graphql", "1.12.24"
  gem "opentelemetry-sdk"
  gem "opentelemetry-api"
  gem "opentelemetry-instrumentation-graphql", path: "~/projects/opentelemetry-ruby-contrib/instrumentation/graphql"
  gem "benchmark-ips"
  gem "memory_profiler"
  gem "stackprof"
end

class MySchema < GraphQL::Schema
  DATA = {
    "5" => {
      name: "Beatrix Potter",
    },
    "6" => {
      name: "J.R.R. Tolkien"
    },
    "10" => {
      title: "The Tale of Squirrel Nutkin",
      author_id: "5",
      recommended_book_ids: ["11", "12"],
    },
    "11" => {
      title: "The Tale of Ginger and Pickles",
      author_id: "5",
      recommended_book_ids: ["10", "12"],
    },
    "12" => {
      title: "The Tale of Mr. Tod",
      author_id: "5",
      recommended_book_ids: ["11", "10"],
    },
    "21" => {
      title: "The Hobbit",
      author_id: "6",
      recommended_book_ids: ["22", "23", "24", "25", "26", "27"],
    },
    "22" => {
      title: "The Fellowship of the Ring",
      author_id: "6",
      recommended_book_ids: ["21", "23", "24", "25", "26", "27"],
    },
    "23" => {
      title: "The Two Towers",
      author_id: "6",
      recommended_book_ids: ["22", "21", "24", "25", "26", "27"],
    },
    "24" => {
      title: "The Return of the King",
      author_id: "6",
      recommended_book_ids: ["22", "23", "21", "25", "26", "27"],
    },
    "25" => {
      title: "The Silmarillion",
      author_id: "6",
      recommended_book_ids: ["22", "23", "24", "21", "26", "27"],
    },
    "26" => {
      title: "The Adventures of Tom Bombadil",
      author_id: "6",
      recommended_book_ids: ["21", "22", "23", "24", "25", "27"]
    },
    "27" => {
      title: "Leaf by Niggle",
      author_id: "6",
      recommended_book_ids: ["21", "22", "23", "24", "25", "26"]
    }
  }

  module Node
    include GraphQL::Schema::Interface
    field :id, ID, null: true, description: "The id."
  end

  class Author < GraphQL::Schema::Object
    def self.authorized?(obj, ctx)
      -> { true }
    end
    implements Node
    field :name, String, null: true
    field :books, ["MySchema::Book"], null: true

    def books
      author_id = DATA.find { |(id, auth)| auth == object }.first
      DATA.each_value.select { |d| d[:author_id] == author_id }
    end
  end

  class Book < GraphQL::Schema::Object
    implements Node
    field :title, String, null: true
    field :author, Author, null: true
    field :recommended_books, [Book], null: true

    def author
      DATA[object[:author_id]]
    end

    def recommended_books
      object[:recommended_book_ids].map { |id| -> { DATA[id] } }
    end
  end

  class Query < GraphQL::Schema::Object
    field :node, Node, null: true do
      argument :id, ID, required: false
    end

    def node(id:)
      DATA[id]
    end
  end

  query(Query)

  orphan_types(Book, Author)

  def self.resolve_type(type, obj, ctx)
    if obj.key?(:name)
      Author
    elsif obj.key?(:title)
      Book
    else
      raise "Unknown object: #{obj.inspect}"
    end
  end

  lazy_resolve(Proc, :call)
end

OpenTelemetry::SDK.configure do |c|
  c.use 'OpenTelemetry::Instrumentation::GraphQL', {
    schemas: [MySchema],
    # enable_platform_field maps to the execute_field and execute_field_lazy keys
    enable_platform_field: true,
    # enable_platform_authorized maps to the authorized and authorized_lazy keys
    enable_platform_authorized: true,
    # enable_platform_resolve_type maps to the resolve_type and resolve_type_lazy keys
    enable_platform_resolve_type: true,
    legacy_tracing:true,
  }
end


# puts MySchema.to_definition

small_query_str = <<-GRAPHQL
query {
  node(id: "10") {
    __typename
    ... on Book {
      title
      author {
        __typename
        name
      }
      recommendedBooks {
        __typename
        title
      }
    }
  }
}
GRAPHQL

large_query_str = <<-GRAPHQL
{
  node(id: "6") {
    ... on Author {
      name
      books {
        title
        recommendedBooks {
          title
          author { name }
          recommendedBooks { title }
        }
      }
    }
  }
}
GRAPHQL

# pp MySchema.execute(small_query_str).to_h
# pp MySchema.execute(large_query_str).to_h

Benchmark.ips do |x|
  x.report("small query") { MySchema.execute(small_query_str) }
  x.report("large query") { MySchema.execute(large_query_str) }
end

query_str = large_query_str

puts "\n\n\n"

result = StackProf.run(mode: :wall, interval: 1) do
  MySchema.execute(query_str)
end
report = StackProf::Report.new(result)
File.open("tmp/otel_query2.dump", "wb+") { |f| report.print_dump(f) }
report.print_text

report = MemoryProfiler.report do
  MySchema.execute(query_str)
end
puts "\n\n\n"
report.pretty_print

I've had to add legacy_tracing:true before comparing the two benchmarks.

I ran this on the main branch and then on my branch.

For the main branch, I got a lot of lines of error related to this issue I filed earlier in the week:

invalid span attribute value type NilClass for key 'graphql.field.parent' on span 'graphql.execute_field'
invalid span attribute value type NilClass for key 'graphql.field.parent' on span 'graphql.execute_field'
invalid span attribute value type NilClass for key 'graphql.field.parent' on span 'graphql.execute_field'
invalid span attribute value type NilClass for key 'graphql.field.parent' on span 'graphql.execute_field'
....

The fix was actually fairly straightforward:

attributes['graphql.field.parent'] = data[:owner]&.graphql_name # owner is the concrete type, not interface

becomes

attributes['graphql.field.parent'] = data[:owner]&.graphql_name || "" # owner is the concrete type, not interface

Copy link

linux-foundation-easycla bot commented Feb 15, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@elenatanasoiu elenatanasoiu force-pushed the elena/reduce-objects-for-fun-and-profit branch from cfc84f7 to 485f4b6 Compare February 15, 2024 03:12
@arielvalentin arielvalentin marked this pull request as ready for review February 15, 2024 15:55
@elenatanasoiu elenatanasoiu changed the title Cache GraphQL attributes perf: Cache GraphQL attributes Feb 15, 2024
@elenatanasoiu
Copy link
Contributor Author

👋 Hi @rmosolgo, I would really appreciate your opinion here since this is backporting the work you did previously to cache attribute hashes in v2. Thanks! 🙇‍♀️

@arielvalentin
Copy link
Collaborator

arielvalentin commented Feb 15, 2024

@rmosolgo we sure could use your help reviewing this PR.

We are not sure how to trigger an execute_field_lazy and it seems we do not have any test coverage.

We also want to make sure we are not doing anything that could get us in trouble with attribute caching based on the Derived Type names as opposed to using the field objects themselves. We just could not figure out how to make it work without making special keys bound to the Owner type as opposed to the Field type in the legacy tracer.

Your guidance would be greatly appreciated!!!

@elenatanasoiu
Copy link
Contributor Author

elenatanasoiu commented Feb 15, 2024

After adding ff6aa06 we can see the performance improvement is much smaller.

allocated objects by gem
-----------------------------------
      5820  graphql-1.12.24
      5488  opentelemetry-sdk-1.4.0
      2353  opentelemetry-api-1.2.4
-      786  graphql/lib
+      433  graphql/lib (it was 32 before we introduced the new commit)
       523  other

allocated memory by gem
-----------------------------------
    707840  opentelemetry-sdk-1.4.0
    684008  graphql-1.12.24
    194472  opentelemetry-api-1.2.4
-   130896  graphql/lib
+    38928  graphql/lib (it was 3616 before we introduced the new commit)
     41176  other

We added the commit to satisfy this first test which expects the graphql.parent.name to be the concrete class instead of the interface.

For graphql < 2.0.6, field.owner.graphql_name will return the Vehicle interface.
For graphql >= 2.0.6, field.owner.graphql_name will return the Car concrete class.

We went with getting the parent name off of data[:owner].graphql_name instead, like in the old version. But this breaks the caching by quite a bit since we're discarding the cache as soon as we get to the concrete class.

@arielvalentin
Copy link
Collaborator

@elenatanasoiu based on the results of the pairing session today I am seeing the value of your perspective, i.e. performance is more important than precision of graphql.field.name attribute.

Users who are on older versions of the GraphQL gem and tracer should not expect to have the same precision as a newer version. Not being able to see the derived types may incentivize folks to upgrade the gem or move to the most recent version of the tracer.

It also occurs to me that these attributes are disabled by default due to performance overhead so I imagine most users won't enable it by default as we have not ever received any bug reports related to this problem.

All that being said I am interested in hearing @rmosolgo perspective on the topic as well as the rest of the @open-telemetry/ruby-contrib-approvers

@rmosolgo
Copy link
Contributor

👋 This change looks good to me! It sounds like you discovered the difference in owner between the old tracing code and the new code, is that right? IIRC, the old code would pass the interface type as the owner, but the new code passes the object type.

As far as execute_field_lazy goes, you have to use lazy_resolve for it to be engaged: https://graphql-ruby.org/schema/lazy_execution.html

Here's a little example to demonstrate:

Trace#execute_field_lazy example

require "bundler/inline"

gemfile do
  gem "graphql", "2.2.8"
end

class MySchema < GraphQL::Schema
  class Box # IRL this would probably be a Promise, Concurrent::Future, etc
    def initialize(&get_value)
      @get_value = get_value
    end

    def value
      @get_value.call
    end
  end

  class Query < GraphQL::Schema::Object
    field :int, Integer

    def int
      puts "calling #int"
      Box.new {
        puts "resolving Box"
        5
      }
    end
  end

  module Trace
    def execute_field(**)
      puts "Trace#execute_field"
      super
    end

    def execute_field_lazy(**)
      puts "Trace#execute_field_lazy"
      super
    end
  end

  lazy_resolve Box, :value
  query(Query)
  trace_with(Trace)
end

pp MySchema.execute("{ int }").to_h
# Trace#execute_field
# calling #int
# Trace#execute_field_lazy
# resolving Box
# {"data"=>{"int"=>5}}

@arielvalentin
Copy link
Collaborator

Yes that's right. That is making it difficult for us to cache fields.

Do you have any recommendations on changes we can make to be able to build up a performant cache using the owner name?

@rmosolgo
Copy link
Contributor

In my experience, it ends up reporting different names for fields, but I think there's a case to be made either way:

  • if owner is the concrete type, then it reports what the client actually queried
  • but if owner is the abstract type where the field was actually defined, it reports the code that actually served the client's request

Since adding this new tracing code and migrating GraphQL-Ruby's other tracers to use it, I haven't had any issues raised because of the change. The only two suggestions I have are:

  • Use field.owner because it gives better performance and it's forward-compatible with the new GraphQL-Ruby tracing system. (The new system doesn't pass a separate owner: ....) Update the tests to validate this behavior for GraphQL-Ruby versions that do it (2.0.6+)
  • Or, keep using data[:owner], accept the reduced performance improvement (still a big improvement!), retain compatibility with previous behavior.

I don't know of any options besides those 🙈 !

@arielvalentin
Copy link
Collaborator

Thank you sir. @elenatanasoiu lets go the route of better perf.

@elenatanasoiu elenatanasoiu force-pushed the elena/reduce-objects-for-fun-and-profit branch from 7503899 to 0aae8ce Compare February 16, 2024 11:36
We're reverting open-telemetry@ff6aa06 so we can maximize caching benefits.
@elenatanasoiu elenatanasoiu force-pushed the elena/reduce-objects-for-fun-and-profit branch from 0aae8ce to c01c7e5 Compare February 16, 2024 11:37
@elenatanasoiu
Copy link
Contributor Author

elenatanasoiu commented Feb 16, 2024

@arielvalentin Good stuff! 👍

@rmosolgo Thank you for the careful considerations! 🙇‍♀️

I've pushed the changes up based on your recommendations: choosing performance. I've very much looking forward to trying this out once it's released.

I've also attempted to add another test for execute_field_lazy but I'm getting an error when trying to add the custom Tracer module that @rmosolgo suggested. It doesn't recognize the trace_with method.

I think we can add these tests separately from this PR which is now ready for re-review!

I've left some notes below.

Tracing execute_field_lazy
/Users/elenatanasoiu/projects/opentelemetry-ruby-contrib/instrumentation/graphql/test/test_helper.rb:152:in `<class:BoxGraphQLAppSchema>': undefined method `trace_with' for BoxGraphQLAppSchema:Class (NoMethodError)

  trace_with Trace
  ^^^^^^^^^^
Did you mean?  tracer
	from /Users/elenatanasoiu/projects/opentelemetry-ruby-contrib/instrumentation/graphql/test/test_helper.rb:137:in `<top (required)>'
	from /Users/elenatanasoiu/projects/opentelemetry-ruby-contrib/instrumentation/graphql/test/instrumentation/graphql/instrumentation_test.rb:7:in `require'
	from /Users/elenatanasoiu/projects/opentelemetry-ruby-contrib/instrumentation/graphql/test/instrumentation/graphql/instrumentation_test.rb:7:in `<top (required)>'
	from /Users/elenatanasoiu/.gem/ruby/3.1.0/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:21:in `require'
	from /Users/elenatanasoiu/.gem/ruby/3.1.0/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:21:in `block in <main>'
	from /Users/elenatanasoiu/.gem/ruby/3.1.0/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:6:in `select'
	from /Users/elenatanasoiu/.gem/ruby/3.1.0/gems/rake-13.1.0/lib/rake/rake_test_loader.rb:6:in `<main>'
rake aborted!
# test_helper.rb

class Box # IRL this would probably be a Promise, Concurrent::Future, etc
  def initialize(&get_value)
    @get_value = get_value
  end

  def value
    @get_value.call
  end
end

class BoxQuery < GraphQL::Schema::Object
  field :int, Integer

  def int
    puts "calling #int"
    Box.new {
      puts "resolving Box"
      5
    }
  end
end

class BoxGraphQLAppSchema < GraphQL::Schema
  module Trace
    def execute_field(**)
      puts "Trace#execute_field"
      super
    end

    def execute_field_lazy(**)
      puts "Trace#execute_field_lazy"
      super
    end
  end

  lazy_resolve Box, :value
  query ::BoxQuery
  trace_with Trace
end
# graphql_tracer_test.rb

it 'includes attributes using platform types - lazy' do
  skip if uses_platform_interfaces?
  expected_attributes = {
    'graphql.field.parent' => 'Box', # type name, not interface
    'graphql.field.name' => 'int',
    'graphql.lazy' => false
  }
  
  BoxGraphQLAppSchema.execute('{ int }')

  require 'pry'; binding.pry
  
  span = spans.find { |s| s.name == 'graphql.execute_field_lazy' && s.attributes['graphql.field.name'] == 'int' }
  _(span).wont_be_nil
  _(span.attributes.to_h).must_equal(expected_attributes)
end

If we remove the tracer module and just run the test, then this line will not find any execute_field_lazy spans:

span = spans.find { |s| s.name == 'graphql.execute_field_lazy' && s.attributes['graphql.field.name'] == 'int' }

The spans look like this:

spans.map{|s| [s.name, s[:attributes]]}
=> [["graphql.lex", {}],
 ["graphql.parse", {}],
 ["graphql.validate", {}],
 ["graphql.analyze_query", {}],
 ["graphql.analyze_multiplex", {}],
 ["graphql.execute_query", {"graphql.document"=>"{ int }", "graphql.operation.type"=>"query", "graphql.operation.name"=>"anonymous"}],
 ["graphql.execute_query_lazy", {}],
 ["graphql.execute_multiplex", {}]]

@elenatanasoiu
Copy link
Contributor Author

elenatanasoiu commented Feb 16, 2024

Final measurements - identical to the original reported in the PR description:

Warming up --------------------------------------
-         small query   125.000 i/100ms
+         small query   119.000 i/100ms
         large query     7.000 i/100ms

- Total allocated: 1758392 bytes (14970 objects)
+ Total allocated: 1631280 bytes (14217 objects) 
Total retained:  0 bytes (0 objects)

allocated memory by gem
-----------------------------------
     707840 opentelemetry-sdk-1.4.0
-    684008 graphql-1.12.24
+    684176 graphql-1.12.24
     194472 opentelemetry-api-1.2.4
-    130896 graphql/lib
      41176 other
+     3616  graphql/lib

allocated objects by gem
-----------------------------------
+      5821  graphql-1.12.24
-      5820  graphql-1.12.24
      5488  opentelemetry-sdk-1.4.0
      2353  opentelemetry-api-1.2.4
-      786  graphql/lib
       523  other
+       32  graphql/lib

@rmosolgo
Copy link
Contributor

Derp, sorry -- I forgot this was for GraphQL-Ruby 1.12.x. Here's a new script illustrating how that event might be called:

Getting the execute_field_lazy event with a .trace method

require "bundler/inline"

gemfile do
  gem "graphql", "~>1.12.0"
end

class MySchema < GraphQL::Schema
  class Box # IRL this would probably be a Promise, Concurrent::Future, etc
    def initialize(&get_value)
      @get_value = get_value
    end

    def value
      @get_value.call
    end
  end

  class Query < GraphQL::Schema::Object
    field :int, Integer, null: false

    def int
      puts "calling #int"
      Box.new {
        puts "resolving Box"
        5
      }
    end
  end

  module Tracer
    def self.trace(event, data)
      case event
      when "execute_field"
        puts "Tracer -> execute_field"
      when "execute_field_lazy"
        puts "Tracer -> execute_field_lazy"
      end

      yield
    end
  end

  lazy_resolve Box, :value
  query(Query)
  tracer(Tracer)
end

pp MySchema.execute("{ int }").to_h
# Tracer -> execute_field
# calling #int
# Tracer -> execute_field_lazy
# resolving Box
# {"data"=>{"int"=>5}}

@arielvalentin arielvalentin changed the title perf: Cache GraphQL attributes feat!: Cache GraphQL attributes Feb 16, 2024
@elenatanasoiu
Copy link
Contributor Author

elenatanasoiu commented Feb 16, 2024

Thanks @rmosolgo!

Looks like we need to search through events rather than spans as the span list doesn't contain execute_field_lazy:

graphql_tracer_test.rb
it 'includes attributes using platform types - lazy' do
  skip if uses_platform_interfaces?
  expected_attributes = {
    'graphql.field.parent' => 'Box', # type name, not interface
    'graphql.field.name' => 'int',
    'graphql.lazy' => false
  }

  BoxGraphQLAppSchema.execute('{ int }')

  require 'pry'; binding.pry

  span = spans.find { |s| s.name == 'graphql.execute_field_lazy' && s.attributes['graphql.field.name'] == 'int' }
  _(span).wont_be_nil
  _(span.attributes.to_h).must_equal(expected_attributes)
end

Screenshot 2024-02-16 at 15 33 35

But the tracer can see it:

Screenshot 2024-02-16 at 15 33 30

@elenatanasoiu
Copy link
Contributor Author

elenatanasoiu commented Feb 16, 2024

I've pushed a final commit to appease Rubocop, and will do the increased test coverage as a separate PR.

attrs['graphql.operation.name'] = data[:query].selected_operation_name || 'anonymous'
attrs.freeze
else
{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be a frozen hash and a constant. That will mean only allocating one hash for all "misses"

@elenatanasoiu elenatanasoiu force-pushed the elena/reduce-objects-for-fun-and-profit branch from a64f78f to 107bb1b Compare February 16, 2024 16:48
@arielvalentin arielvalentin merged commit c497728 into open-telemetry:main Feb 16, 2024
52 checks passed
@elenatanasoiu elenatanasoiu deleted the elena/reduce-objects-for-fun-and-profit branch February 16, 2024 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow graphql.field.parent and code.namespace to be nil
3 participants