Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle errors in execution #176

Merged
merged 5 commits into from
Sep 19, 2024
Merged

Handle errors in execution #176

merged 5 commits into from
Sep 19, 2024

Conversation

mkaruza
Copy link
Collaborator

@mkaruza mkaruza commented Sep 13, 2024

  • Unify reporting of WARNING with log
  • Throw exception in functions that are used inside duckdb execution
  • Report WARNING for functions that are part of "normal" postgres execution
  • Cleanup duckdb_state before reporting error in DuckDBNode (clean up)

@mkaruza mkaruza requested a review from JelteF September 13, 2024 08:01
@mkaruza mkaruza force-pushed the error-handling branch 3 times, most recently from bca95d8 to 57ff4fd Compare September 13, 2024 08:57
@mkaruza
Copy link
Collaborator Author

mkaruza commented Sep 13, 2024

Should address #93

Copy link
Collaborator

@Tishj Tishj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for these efforts, I don't think #93 can be fully closed because it's something we have to be conscientiously aware of going forward.

Also I wonder if there are methods that are used both during duckdb execution and outside of it, which could be a problem if one or the other is assumed

@wuputah
Copy link
Collaborator

wuputah commented Sep 13, 2024

I certainly don't want #93 to be open forever 😂 but we can maybe add something to CONTRIBUTING.md as a note as well. We should try to move towards issues being specific rather than generic "fix all the things" e.g. with either a specific reproducible test case or ideally pointing to something specific in the code. Of course this can be hard with issues like memory leaks.

If there's still specific things you think we should add or change, I'd suggest we open new issue(s) for those.

src/pgduckdb_detoast.cpp Show resolved Hide resolved
std::optional<T>
PostgresFunctionGuard(FuncType postgres_function, FuncArgs... args) {
T return_value;
bool error = false;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs to be marked volatile

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it seems this is probably not necessary. I was thinking about the problem described here, but that does not apply since error is not changed in the PG_TRY section: https://github.com/postgres/postgres/blob/master/src/include/utils/elog.h#L369-L374

}
PG_CATCH();
{
error = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like it would probably be better to throw an exception here already. That way we can preserve the original Postgres error message for much better debugging.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure is it correct to throw from this. I would prefer that this function outputs error message but still caller needs to handle this.

Copy link
Collaborator

@JelteF JelteF Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean with "correct" in this case?

Do you mean is it allowed to throw an exception in PG_CATCH? In which case, I think it's fine, but indeed I'm also not 100% sure. But if you throw an exception after the PG_CATCH if error is true, that should definitely be fine.

Or do you mean is it what the caller should do? I think in at least 90% of the cases the caller will simply want to propagate the error message in the form of an exception. In the last 10% maybe they want more control, in which case returning the error message is maybe indeed what we would want. But I think we should make it as easy as possible to write correct code, so I think we should also have a helper that simply throws an exception. And honestly, maybe throwing an exception in all cases might be fine too, since a caller can always catch it if it needs to do something special.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JelteF guard function will now throw InternalException with error message constructed from postgresql log

CONTRIBUTING.md Outdated
Comment on lines 131 to 134
* *Exception* should be used in code that is executed inside duckdb engine.
* Use PostgreSQL *elog* API to report messages back to user. Using *ERROR* is strictly forbiden to use as it can brake execution and lead to
unexpected memory problems - *ERROR* is only used in duckdb node to early bail out execution in case of query preparation error or execution interruption.
* Calling PostgreSQL native functions needs to be used with *PostgresFunctionGuard*. *PostgresFunctionGuard* will handle correctly *ERROR* log messages that could be emmited from these functions.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* *Exception* should be used in code that is executed inside duckdb engine.
* Use PostgreSQL *elog* API to report messages back to user. Using *ERROR* is strictly forbiden to use as it can brake execution and lead to
unexpected memory problems - *ERROR* is only used in duckdb node to early bail out execution in case of query preparation error or execution interruption.
* Calling PostgreSQL native functions needs to be used with *PostgresFunctionGuard*. *PostgresFunctionGuard* will handle correctly *ERROR* log messages that could be emmited from these functions.
* There are two distinct parts of the code where error handling is done very differently: The code that executes before we enter DuckDB execution engine (e.g. initial part of the planner hook) and the part that gets executed inside the duckdb execution engine. Below are rules for how to handle errors in both parts of the code. Not following these guidelines can cause crashes, memory leaks and other unexpected problems.
* Before we enter the DuckDB exection engine no exceptions should ever be thrown here. In cases where you would want to throw an exception here, use `elog(ERROR, ...)`. Any C++ code that might throw an exception is also problematic. Since C++ throws exceptions on allocation failures, this covers lots of C++ APIs. So try to use Postgres datastructures instead of C++ ones whenever possible (e.g. use `List` instead of `Vec`)
* Inside the duckdb execution engine the opposite is true. `elog(ERROR, ...)` should never be used there, use exceptions instead.
* Use PostgreSQL *elog* API can be used to report non-fatal messages back to user. Using *ERROR* is strictly forbiden to use in code that is executed inside the duckdb engine as
* Calling PostgreSQL native functions from within DuckDB execution needs **extreme care**. Pretty much non of these functions are thread-safe, and they might throw errors using `elog(ERROR, ...)`. If you've solved the thread-safety issue by taking a lock (or by carefully asserting that the actual code is thread safe), then you can use *PostgresFunctionGuard* to solve the `elog(ERROR, ...) problem. *PostgresFunctionGuard* will correctly handle *ERROR* log messages that could be emmited from these functions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed according to suggestion.

@@ -19,4 +23,44 @@ TokenizeString(char *str, const char delimiter) {
return v;
};

template <typename T, typename FuncType, typename... FuncArgs>
std::optional<T>
PostgresFunctionGuard(FuncType postgres_function, FuncArgs... args) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have a similar function for the other way around, i.e. where we call DuckDB C++ APIs from Postgres C code.

toastrel = table_open(toast_pointer.va_toastrelid, AccessShareLock);
table_relation_fetch_toast_slice(toastrel, toast_pointer.va_valueid, attrsize, 0, attrsize, result);
table_close(toastrel, AccessShareLock);
toast_rel = try_table_open(toast_pointer.va_toastrelid, AccessShareLock);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try_table_open might allocate memory an throw an ERROR that way. So this needs a PostgresFunctionGuard too.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guarded.

error_fetch_toast = true;
}

table_close(toast_rel, AccessShareLock);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even table_close can throw some ERROR, see this quote from the PG source code where they consider memory context reset callbacks possibly throwing an error.

	/*
	 * It's not entirely clear whether 'tis better to do this before or after
	 * delinking the context; but an error in a callback will likely result in
	 * leaking the whole context (if it's not a root context) if we do it
	 * after, so let's do it before.
	 */
	MemoryContextCallResetCallbacks(context);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guarded.

@JelteF
Copy link
Collaborator

JelteF commented Sep 16, 2024

I certainly don't want #93 to be open forever 😂

Totally agreed. But I think there's two more things we need to do at minimum:

  1. Confirm that Resource leak on error, eventual crash on out of memory #120 is fixed
  2. Guarding (pretty much all) C++ API calls from postgres C code, because if those throw an exception (e.g. on OOM) they crash postgres.

I think neither necessarily needs to be part of this initial PR. I personally would prefer merging this PR as soon as we agree on nice to use Guard APIs. Because then we can use those Guard APIs for new code, without having to wait for the problems to be addressed in all existing code.

Finally, this is an issue where it's easy to miss one place. So I definitely want to take another full pass myself over the codebase, after the initial push, to try and find some missed callsites before closing #93.

@JelteF
Copy link
Collaborator

JelteF commented Sep 16, 2024

Finally, this is an issue where it's easy to miss one place.

As a side-note to this: It would be great if we could make it harder to make mistakes here. One idea I had was to not directly import duckdb C++ headers into postgres code. And just as well not directly import postgres headers into C++ code. But instead create dedicated files from where we create safe to use versions with the correct guards. That would make it easy to spot mistakes during review, because it's not expected that you import postgres.h/duckdb.hpp in certain files. And for the files where it is expected you take extra care during review (we could probably even have some automated check in CI for this if we standardize it enough).

Edit: To be clear, please don't do this in this PR. This is more a refactoring that I think could makes sense once we have the Guard APIs to find a few of the missed function calls, and make sure we keep following our own guidelines for this in future PRs.

@wuputah wuputah added this to the 0.1.0 milestone Sep 17, 2024
* Unify reporting of WARNING with log
* Throw exception in functions that are used inside duckdb execution
* Report WARNING for functions that are part of "normal" postgres
  execution
* Cleanup duckdb_state before reporting error in DuckDBNode (clean up)
* Added `PostgresFunctionGuard` and `PostgresVoidFunctionGuard` that
  handle call to Postgres function that can throw ERROR.
* Add few informations how we should handle errors inside code.
* Use same function name `PostgresFunctionGuard` for both void and
  non-void PG function calls.
PG_END_TRY();
// clang-format on
if (error) {
throw duckdb::InternalException("(PGDuckDB/PostgresFunctionGuard) %s", edata->message);
Copy link
Collaborator

@JelteF JelteF Sep 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense to add this prefix here (and in the other guard), the error didn't really originate here, we are simply transforming it, so having a reference to the guard won't help you find where it originated. Especially if we'll also add conversion the other way around I can see many prefixes being added to a simple error message.

Suggested change
throw duckdb::InternalException("(PGDuckDB/PostgresFunctionGuard) %s", edata->message);
throw duckdb::InternalException("%s", edata->message);

PG_END_TRY();
// clang-format on
if (error) {
throw duckdb::InternalException("(PGDuckDB/PostgresFunctionGuard) %s", edata->message);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think throwing an InternalException is acceptable for now, so that we can start using these guards in new code ASAP. But I think it would be great, if we would save not only the error message, but also the code/detail/context and maybe more in the extra_data of the Exception. Or probably easier: just store the whole edata (which we'd then probably have to allocate in TopMemoryContext).

That way we could convert such an Exception back to the equivalent ereport when surfacing the error to the user.

@Tishj
Copy link
Collaborator

Tishj commented Sep 18, 2024

In the spirit of this PR, I am going to add some generic "ScopedPostgresResource" class so we can make sure never to leak postgres resources like Relation in the event of a C++ exception

// Delete the scan state
CleanupDuckdbScanState(state);
// Process the interrupt on the Postgres side
ProcessInterrupts();
elog(ERROR, "Query cancelled");
elog(ERROR, "(PGDuckDB/ExecuteQuery) Query cancelled");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the prefix really adds anything useful here.

Suggested change
elog(ERROR, "(PGDuckDB/ExecuteQuery) Query cancelled");
elog(ERROR, "Query cancelled");

Side-note: We should be throwing the expected error code here.

Copy link
Collaborator

@JelteF JelteF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is pretty much in a merge-able state now.

I'm not sure I'm a fan of all the (PGDuckDB/ReplaceView) type prefixes, but that might just be a matter of me not being used to them. I left two comments on where I think we shouldn't add them for sure though.

Apart from that: Tests are failing. Which should obviously be fixed before merging.

* Guard function throw `duckdb::Exception(EXECUTOR,...)`
* Updated CONTRIBUTING.md according to review suggestion
* Guarded other suggested pg functions
@mkaruza
Copy link
Collaborator Author

mkaruza commented Sep 19, 2024

@JelteF will merge this PR to have some foundation to start with, but i would imagine that there is going to be changes in future until we are satisfied how to handle guarding / error logging / error handling in project.

@mkaruza mkaruza merged commit 51d3a30 into main Sep 19, 2024
3 checks passed
@mkaruza mkaruza deleted the error-handling branch September 19, 2024 08:23
@JelteF
Copy link
Collaborator

JelteF commented Sep 19, 2024

Yeah, sounds like a good plan. I think this is a good foundation to build on top of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants