-
Notifications
You must be signed in to change notification settings - Fork 181
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor!: Move shared sql behavior to helper gems #529
refactor!: Move shared sql behavior to helper gems #529
Conversation
I don't really care whatever is easier is fine, so happy to leave as is.
Setting aside the question of where this stuff lives, (i think Base is fine practically speaking but if ppl have opinions i probably wont die on this hill), few questions i have.
|
oh also this looks great so far! awesome stuff, ty for taking this on! |
Thanks for catching this! This was just an oversight on my part. I was so focused on the identical code in mysql2/trilogy that I didn't realize the code for PG, though structured differently, behaves the same. |
As far as configs go, I think I'd prefer to keep them defined in the instrumentation library rather than a helper, even though we'll need to make sure they're there for the methods to work. We could try to make a new class for the SQL instrumentation to inherit from and raise an error if the config, or a method that accesses a memoized config, isn't defined. If we go that approach, we might want to change the current refactor a bit. |
instrumentation/base/lib/opentelemetry/instrumentation/helpers/obfuscation_helper.rb
Outdated
Show resolved
Hide resolved
@ericmustin - This is ready for another look! I'll clean up the commits after feedback. I didn't make any changes to the configs, but I would be happy to. Maybe we can chat during the SIG tomorrow? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to extracting this code out to a common place, we've already ran into one instance where parity between trilogy and mysql2 fell out of sync because a change occurred in one place but not the other.
I'm not sure base is the correct location for it, but I'm also not convinced it is the wrong place either.
My general concern is that whenever base bumps it forces all other deps to bump with it. It is toily for releasing, but also for bumping a single instrumentation package as a consumer of these gems.
A slightly obnoxious solution might be gems for each domain? Mysql helpers, HTTP helpers, something_else helpers.
instrumentation/mysql2/lib/opentelemetry/instrumentation/mysql2/patches/client.rb
Outdated
Show resolved
Hide resolved
instrumentation/base/lib/opentelemetry/instrumentation/helpers/obfuscation_helper.rb
Outdated
Show resolved
Hide resolved
Agreed. |
multi_line_comments: %r{\/\*(?:[^\/]|\/[^*])*?(?:\*\/|\/\*.*)} | ||
}.freeze | ||
|
||
DIALECT_COMPONENTS = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the more i look at this code, the more i continue to think this represents what really ought to be a Specification effort to standardize OpenTelemetry's treatment of "sql obfuscation" as a feature. Common dialect component keys ought to be part of otel semconv, and the implementation of how to accomplish that obfuscation could vary under the hood (regex might not be the most performant approach in some languages, etc), but it would be possible to do a matrix-like comparison of language-sdk's + collector's "obfuscation abilities" on sql(or other things, mongodb redis etc), and the varying concerns wrt PII related compliance things that different end users of different language sdks should need to worry about, and possibly, later on in their telemetry pipeline, apply some additional redaction or more fine grained ingestion or indexing approach to their data.
not blocking or anything, just was thinking about this and wanted to socialize it.
...ntation/helpers-sql-obfuscation/lib/opentelemetry/instrumentation/helpers/sql_obfuscation.rb
Outdated
Show resolved
Hide resolved
}.freeze | ||
|
||
DIALECT_COMPONENTS = { | ||
fallback: COMPONENTS_REGEX_MAP.keys, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea that this would be something a future instrumentation could refer to as a sort've "catchall" or "default" adapter symbol without needing to update the sql_obfuscation helper gem? I'm fine with that as an idea, but I would maybe want to call it all
or default
, and then perhaps make self.obfuscate_sql
use that as it's default value unless specified. wdyt? Also fine with just dropping this if that's too convoluted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the idea. I love the name change suggestion (fallback
is the NR name) and using this as the default value.
Do you think I should add a default value for the config hash? Then someone could just pass SQL to the API, though their mileage may vary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default adapter issue fixed in: 8096551
Lmk about the config hash. I'm leaning toward no default there so an error is raised if the configs are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should look to avoid raising an error outside of initialization, but i think it would be good to raise an error at init time if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of raising an error, I decided to do a bit of refactoring so that the whole config hash wasn't required: fd2fa4b
This drops the need for :db_statement
and provides a default value for :obfuscation_limit
to make sure the .obfuscate_sql
method can run with only a SQL statement.
...ntation/helpers-sql-obfuscation/lib/opentelemetry/instrumentation/helpers/sql_obfuscation.rb
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay here @kaylareopelle . I had a few nits but largely speaking this looks really close.
The outstanding issues, unless i'm missing something:
- bundler / version constraint TODOs, I don't totally grok the error you're seeing but afaik we shouldn't need to drop the
opentelemetry-instrumentation-base
version constraints on mysql2/trilogy/pg. can try to dig in more here if it ends up being the only thing blocking. - had a few suggestions to make sure we dont have a performance regression on the obfuscation hot path. It might be good to get another set of eyes here too.
instrumentation/helpers-mysql/lib/opentelemetry/instrumentation/version.rb
Outdated
Show resolved
Hide resolved
instrumentation/helpers-mysql/lib/opentelemetry/instrumentation/helpers/mysql.rb
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think everything has been updated here and my previous comments are addressed, with the exception of where to put these gems and the bundler errors. will add a point to the agenda to ensure both these are surfaced and discussed at tuesday SIG, thank u for quick turnaround @kaylareopelle apologies again on delay
assert_equal(db_operation_and_name, operation) | ||
end | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When both database_name
and operation
are nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Added here: 3f0bcb4
'/*application:web,controller:blob,action:show,correlation_id:01EZVMR923313VV44ZJDJ7PMEZ,' \ | ||
'endpoint_id:Projects::BlobController#show*/ SELECT "routes".* FROM "routes" WHERE "routes"' \ | ||
'."source_id" = 75 AND "routes"."source_type" = \'Namespace\' LIMIT 1' | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I played around with this a bit. While the example above works as expected, the result is incorrect if one of the query names is in the Marginalia comment. E.g.:
irb(main):001:1* QUERY_NAMES = [
irb(main):002:1* 'set names',
irb(main):003:1* 'select',
irb(main):004:1* 'insert',
irb(main):005:1* 'update',
irb(main):006:1* 'delete',
irb(main):007:1* 'begin',
irb(main):008:1* 'commit',
irb(main):009:1* 'rollback',
irb(main):010:1* 'savepoint',
irb(main):011:1* 'release savepoint',
irb(main):012:1* 'explain',
irb(main):013:1* 'drop database',
irb(main):014:1* 'drop table',
irb(main):015:1* 'create database',
irb(main):016:1* 'create table'
irb(main):017:0> ].freeze
=>
["set names",
...
irb(main):018:0> QUERY_NAME_REGEX = Regexp.new("\\b(#{QUERY_NAMES.join('|')})\\b", Regexp::IGNORECASE)
=> /\b(set names|select|insert|update|delete|begin|commit|rollback|savepoint|release savepoint|explain|drop database|drop table|create da...
irb(main):019:0* sql = '/*application:web,controller:blob,action:show,correlation_id:01EZVMR923313VV44ZJDJ7PMEZ,' \
irb(main):020:0* 'endpoint_id:Projects::BlobController#show*/ SELECT "routes".* FROM "routes" WHERE "routes"' \
irb(main):021:0> '."source_id" = 75 AND "routes"."source_type" = \'Namespace\' LIMIT 1'
=> "/*application:web,controller:blob,action:show,correlation_id:01EZVMR923313VV44ZJDJ7PMEZ,endpoint_id:Projects::BlobController#show*/ S...
irb(main):022:0> QUERY_NAME_REGEX.match(sql) { |match| match[1].downcase }
=> "select"
irb(main):023:0* sql = '/*application:web,controller:insert,action:show,correlation_id:01EZVMR923313VV44ZJDJ7PMEZ,' \
irb(main):024:0* 'endpoint_id:Projects::BlobController#show*/ SELECT "routes".* FROM "routes" WHERE "routes"' \
irb(main):025:0> '."source_id" = 75 AND "routes"."source_type" = \'Namespace\' LIMIT 1'
=> "/*application:web,controller:insert,action:show,correlation_id:01EZVMR923313VV44ZJDJ7PMEZ,endpoint_id:Projects::BlobController#show*/...
irb(main):026:0> QUERY_NAME_REGEX.match(sql) { |match| match[1].downcase }
=> "insert"
irb(main):027:0>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch. I've updated the tests to include this case. It's fixed in: 82584ba
'create table' | ||
].freeze | ||
|
||
QUERY_NAME_REGEX = Regexp.new("\\b(#{QUERY_NAMES.join('|')})\\b", Regexp::IGNORECASE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we need to match and ignore (multi-line) comments at the start of the query. I.e. these components:
comments: /(?:#|--).*?(?=\r|\n|$)/i,
multi_line_comments: %r{\/\*(?:[^\/]|\/[^*])*?(?:\*\/|\/\*.*)}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Updated in: 82584ba
helpers/sql-obfuscation/lib/opentelemetry/helpers/sql_obfuscation.rb
Outdated
Show resolved
Hide resolved
helpers/sql-obfuscation/lib/opentelemetry/helpers/sql_obfuscation.rb
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM. Thanks for this @kaylareopelle. Sorry for the lengthy review process - the PR and scope was quite large!
@kaylareopelle I am going to do some exploratory testing before merging! Thanks everyone for the thorough reviews. |
Mysql2, trilogy, and pg instrumentation contained duplicated constants and methods. This refactor creates two new gems, opentelemetry-helpers-mysql and opentelemetry-helpers-sql-obfuscation, to hold the shared code. It also: * Improves SQL statement query finding for users of Marginalia/Active Record Query Logs * UTF-8 encodes MySQL statements before extraction * Adds fixtures from New Relic’s SQL obfuscation tests * Adjusts regex queries to support multiple lines * Adds obfuscation support for SQLite, Apache Cassandra, and Oracle
3a62451
to
32018f7
Compare
Mysql2, trilogy, and pg instrumentation contained duplicated constants and methods. This refactor creates two new gems,
opentelemetry-helpers-mysql
andopentelemetry-helpers-sql-obfuscation
, to hold the shared code.In addition, it also adds the ability to obfuscate SQLite, Cassandra, and Oracle databases, though there isn't any instrumentation available to automatically incorporate obfuscation for those database types.
It also improves MySQL statement extraction and SQL obfuscation behavior to better support comments prepended to SQL statements.
Resolves #32
Outdated content below this line