Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement session::get_last_query_context() #1164

Closed
wants to merge 17 commits into from

Conversation

Krzmbrzl
Copy link
Contributor

Fixes #678
Fixes #1027

@Krzmbrzl Krzmbrzl force-pushed the get-query-with-context branch 3 times, most recently from 6798c27 to c5d67dd Compare August 26, 2024 17:00
@Krzmbrzl Krzmbrzl marked this pull request as ready for review August 26, 2024 17:33
@Krzmbrzl
Copy link
Contributor Author

@vadz do you think we need a way to disable parameter logging? Compared to the overhead of the actual DB query the logging of parameters is likely insignificant but maybe you have a scenario in mind in which this might become important?

@Krzmbrzl Krzmbrzl force-pushed the get-query-with-context branch from 75848e5 to 5f5702e Compare August 26, 2024 17:53
@vadz
Copy link
Member

vadz commented Aug 26, 2024

@vadz do you think we need a way to disable parameter logging? Compared to the overhead of the actual DB query the logging of parameters is likely insignificant but maybe you have a scenario in mind in which this might become important?

The only 2 potential problems I can think about are:

  1. Logging sensitive information, e.g. if you store passwords or credit card numbers in the database.
  2. Logging too much data and flooding the logs with useless information.

But it looks like both could be addressed by overriding add_query_parameter()?

@Krzmbrzl
Copy link
Contributor Author

But it looks like both could be addressed by overriding add_query_parameter()?

Yes.

Also, parameters are not logged persistently. They only live to the next query execution. That is still not good for sensitive information of course. Just wanted to clarify.

@Krzmbrzl
Copy link
Contributor Author

Just to be explicit about it: from my POV this PR is done

Copy link
Member

@vadz vadz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your work on this, but after rereading it carefully, I've realized that, in addition to various small remarks below, I have a more global question (and I'm really sorry for completely missing your question about this originally): with this change we now impose some extra overhead for all queries, even if it's not needed at all. Granted, this overhead is not huge, but I'm not sure it's completely negligible neither, considering that there may be a lot of parameters in a query (I'm sure I have some with a few dozens in my own code). Have you tried to measure it by chance?

I don't know if we can avoid this by only returning the query context on demand, i.e. only when/if logger::get_last_query_context() is called, but it looks like it should be possible, or am I missing some reason it isn't?

In any case, after we decide on their final shape, the new APIs need to be mentioned in at least docs/api/client.md and docs/logging.md (we probably need to have a new section about logging parameter values in the latter).

Thanks again!

src/core/session.cpp Outdated Show resolved Hide resolved
Comment on lines 34 to 35
std::string name;
std::string value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that important, but I'd make this change

Suggested change
std::string name;
std::string value;
const std::string name;
const std::string value;

to be explicit about the fact that they never change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think here is the wrong place to do this. This should be encoded in the place that actually uses an instance of this struct by making that specific instance const.

Additionally, having const members can have unintended side-effects like a disabled (by default) assignment operator. That's why I usually don't like having const members 🤔

include/soci/logger.h Outdated Show resolved Hide resolved
src/core/session.cpp Outdated Show resolved Hide resolved
src/core/session.cpp Outdated Show resolved Hide resolved
tests/common-tests.h Outdated Show resolved Hide resolved
include/soci/logger.h Outdated Show resolved Hide resolved
include/soci/logger.h Outdated Show resolved Hide resolved
@Krzmbrzl
Copy link
Contributor Author

Krzmbrzl commented Aug 31, 2024

I don't know if we can avoid this by only returning the query context on demand, i.e. only when/if logger::get_last_query_context() is called, but it looks like it should be possible, or am I missing some reason it isn't?

I don't think we have access to the last executed statement, do we? And without access to that, how do you want to obtain the last used parameters?
Always saving them on use seemed like the only way to pull this off, but if I missed something, I'm open to suggestions.

Granted, this overhead is not huge, but I'm not sure it's completely negligible neither, considering that there may be a lot of parameters in a query (I'm sure I have some with a few dozens in my own code). Have you tried to measure it by chance?

I didn't measure it. My gut tells me that any DB call will take significantly longer than logging a tens, even hundreds of parameters but that's just guessing.

I was thinking of whether there might be a semantic such as

sql.enable_query_context_logging(true);

to have a way to explicitly toggle the behavior to opt in or out of this feature.
If we have that, we only have to decide on the default being on or off.

@Krzmbrzl Krzmbrzl changed the title Implement session::get_last_query_with_context() Implement session::get_last_query_context() Sep 1, 2024
@Krzmbrzl
Copy link
Contributor Author

@vadz what do you think of the above suggestion?

@vadz
Copy link
Member

vadz commented Sep 18, 2024

@vadz what do you think of the above suggestion?

Sorry for the delay, I still didn't have time to look at this properly, but maybe you can answer my question without spending any time on it:

Could we store a pointer to the statement being executed in the session? Or do we need to get the query context after the statement has been already destroyed?

@Krzmbrzl
Copy link
Contributor Author

Krzmbrzl commented Sep 19, 2024

do we need to get the query context after the statement has been already destroyed?

I think we do need this functionality. Imo the context is most useful when some sort of error handling (for the error message) and at this point the executed statement is usually destroyed already (as to my understanding the statement only lives until it has been executed).

@vadz
Copy link
Member

vadz commented Sep 20, 2024

But couldn't we fill the context in rethrow_current_exception_with_context(), just as we do now?

@Krzmbrzl
Copy link
Contributor Author

I don't think so. When we use the rethrow, we are still within the statement (which is therefore still alive). However, we need to be able to access the context even when the statement has already been destroyed (as we want to be able to access the context whenever we access the last query, which is independent of the statement object).

And also, we don't want to have the context available only when there is an error while executing the statement.

@vadz
Copy link
Member

vadz commented Sep 20, 2024

I thought we could fill the context stored in the session, but do it only when an exception happens. What prevents us from doing it?

This doesn't help if we want to have the context even without the exception, of course, but IME this is mostly useful when an error occurs, logging successful executions doesn't seem very interesting...

But, in principle, we could also pass some

enum class LogContext
{
  Never,
  Always,
  OnError
};

to enable_query_context_logging() to make this maximally configurable, with OnError only doing it in rethrow_current_exception_with_context().

@Krzmbrzl
Copy link
Contributor Author

 successful executions doesn't seem very interesting...

I disagree. There are soft errors that arise from the semantics which happen even though the query was successful. In fact, this is the primary use case for me as the exception message in case of a DB error already contains the parameters.
In my case the example is the execution of a query which I expect to produce data. So if the query produces the empty set, that is an error that I want to report even though the query itself ran fine and didn't throw an exception.

But, in principle, we could also pass some

enum class LogContext { Never, Always, OnError };

to enable_query_context_logging() to make this maximally configurable, with OnError only doing it in rethrow_current_exception_with_context()

Works for me 👍

@vadz
Copy link
Member

vadz commented Oct 10, 2024

Would you be willing to implement this LogContext proposal? TIA!

@Krzmbrzl
Copy link
Contributor Author

Yes absolutely! I just didn't find the time to do so yet.

@vadz
Copy link
Member

vadz commented Oct 10, 2024

Yes absolutely! I just didn't find the time to do so yet.

No problem at all, I just wanted to check if you were waiting for something from me.

@vadz vadz marked this pull request as draft November 17, 2024 19:14
@Krzmbrzl Krzmbrzl force-pushed the get-query-with-context branch from 39e7a10 to 2f3a067 Compare December 31, 2024 15:47
The default is to cache parameters for all queries. This can be changed
to either only cache in case of an error or to never cache. The latter
will also ensure that the parameters will not be part of exception
messages in order to not leak any potentially sensitive information.
@Krzmbrzl Krzmbrzl marked this pull request as ready for review January 1, 2025 13:12
@Krzmbrzl Krzmbrzl requested a review from vadz January 1, 2025 13:12
@Krzmbrzl
Copy link
Contributor Author

Krzmbrzl commented Jan 1, 2025

Note that when using sql.set_query_context_logging_mode(log_context::none), not only are parameters not cached with the new mechanism. They are then also excluded from exception messages. My line of thought has been that if the user believes that their query parameters shouldn't be cached even in case of a DB query error, then they probably don't want these parameters as part of an exception message either (as most likely the parameters contain sensitive information that should not leak).

Krzmbrzl and others added 4 commits January 1, 2025 15:19
@vadz
Copy link
Member

vadz commented Jan 21, 2025

Thanks, this looks good to me, I've added a few minor changes and if you don't have any objections to them, I'll (squash) merge this soon.

One unfortunate thing is that even this doesn't help when using vector parameters, as their data still doesn't appear in the output and even if it did appear, it wouldn't be helpful to just dump a 1000 element vector without knowing which of its elements resulted in the problem and I don't know how to do it for all backends (notably ODBC). If you, or anybody else, have any ideas about how to improve this, they would be very welcome!

vadz added 2 commits January 21, 2025 23:16
They do the same thing as the existing macros for gcc but for MSVC.
These warnings shouldn't happen and this compiler bug was supposed to be
fixed a long time ago, see

https://developercommunity.visualstudio.com/t/stringifying-raw-string-literal/67300

but they still happen in AppVeyor CI.

This should be reverted once they update their compilers or we don't use
it any more.
@Krzmbrzl
Copy link
Contributor Author

I've added a few minor changes and if you don't have any objections to them, I'll (squash) merge this soon.

Sure, LGTM.

One unfortunate thing is that even this doesn't help when using vector parameters

Ah, right. I completely forgot about those. Given that they (I think?) work by effectively sending all data at once to the DB in order to perform a mass query, I kinda tend to think that it is impossible to know which parameter has caused an issue. Unless there is a Backend specific API to query that kind of thing 🤔

@vadz vadz closed this in dc4815c Jan 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create session::get_last_query_with_context How to get actual query of statement with bound input values
2 participants