-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29012: NPE in ExplainTask when protologging posthook is enabled #5872
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Based on the debugging, the Task#initialize() method creates conf object in HiveServer2-Handler-Pool thread and when the post hook for protologging is triggered, it created a new thread (Hive Hook Proto Log Writer) which get the query plan and at that time, ExplainTask#outputPlan() has conf as null which causes NPE.
|
@zabetak, can you please help with the review as the changes are around HIVE-22173's code. |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @Aggarwal-Raghav ! I left some questions/suggestions for the proposed fix.
For completeness, it would be nice to also add a unit test possibly somewhere in TestHiveProtoLoggingHook
or TestHiveHookEventProtoPartialBuilder
. Some fixes are trivial but writting also tests about them increases test coverage and guards against future regressions.
if (conf == null && this.work != null && this.work.getFetchTask() != null) { | ||
conf = this.work.getFetchTask().getConf(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initial NPE may be avoided but it is not guaranteed that this.work.getFetchTask().getConf()
will be not null so technically the code could still hit NPE. Furthermore, I see at least 9 references to conf
inside this class that do not account for the nullability of the field so fixing one of those seems somewhat incomplete.
Moreover, this kind of fallback strategy to obtain a conf object from the internal fetch task else does not seem to be a widespread pattern so not sure what's the reasoning behind it.
Since the problem seems to be triggered mainly by HiveProtoLoggingHook
I was wondering if we should rather ensure that the respective class passes a non-null configuration object even if that is the default one (new HiveConf()
).
Another alternative would be to do something like the following:
final int limit = conf == null ? ConfVars.HIVE_EXPLAIN_NODE_VISIT_LIMIT.defaultIntVal
: conf.getIntVar(ConfVars.HIVE_EXPLAIN_NODE_VISIT_LIMIT);
that definitely prevents the NPE no matter what, but it is still a localized fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review and suggestions @zabetak and will try to come up with a UT as well.
My understanding:
- I also though of passing the non-null config from
HiveProtoLoggingHook
class but it would require some method parameter/argument changes across classes and might need additional check if prologging is disabled. So didn't gave a try 🙁. - The reason behind using
this.work.getFetchTask().getConf()
is that I wanted to use the conf which was present in ExplainTask without propogating it from other classes. - As you said that there are 9 references of conf but while debugging only ExplainTask#outputPlan() having
conf.getIntVar(ConfVars.HIVE_EXPLAIN_NODE_VISIT_LIMIT)
is getting triggered. So I think handling, the localized place should be enough. - Using
ConfVars.HIVE_EXPLAIN_NODE_VISIT_LIMIT.defaultIntVal
, as you suggested, can work as using the default value shouldn't have any impact on OtherInfoType.CONF. Will test and confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding 1., I was thinking that changing just a bit HiveHookEventProtoPartialBuilder should work to avoid this NPE:
private JSONObject getExplainJSON(ExplainWork explainWork) throws Exception {
ExplainTask explain = (ExplainTask) TaskFactory.get(explainWork, new HiveConf());
return explain.getJSONPlan(null, explainWork, stageIdRearrange);
}
This approach seems better than 4, although it doesn't eliminate the potential NPE from other code paths.
The limit could also become part of the ExplainConfiguration
but I guess this change would have bigger impact.
Anyways, there are many more options, so I will leave the final decision up to you.
What changes were proposed in this pull request?
Check HIVE-29012 for the stacktrace.
Why are the changes needed?
When protologging hook is enabled, it stores the query plan for a query as well. The conf object in ExplainTask is null for the "Hive Hook Proto Log Writer" thread.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Will see CI output + local single node setup/cluster testing