HPCC-32248 Add tracing to rowservice #19314

Open
wants to merge 1 commit into
base: candidate-9.8.x

jpmcmu
Contributor

@jpmcmu jpmcmu commented Nov 25, 2024

  • Added opentelemetry tracing to rowservice

Signed-off-by: James McMullan [email protected]

Type of change:

  • This change is a bug fix (non-breaking change which fixes an issue).
  • This change is a new feature (non-breaking change which adds functionality).
  • This change improves the code (refactor or other change that does not change the functionality)
  • This change fixes warnings (the fix does not alter the functionality or the generated code)
  • This change is a breaking change (fix or feature that will cause existing behavior to change).
  • This change alters the query API (existing queries will have to be recompiled)

Checklist:

  • My code follows the code style of this project.
    • My code does not create any new warnings from compiler, build system, or lint.
  • The commit message is properly formatted and free of typos.
    • The commit message title makes sense in a changelog, by itself.
    • The commit is signed.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly, or...
    • I have created a JIRA ticket to update the documentation.
    • Any new interfaces or exported functions are appropriately commented.
  • I have read the CONTRIBUTORS document.
  • The change has been fully tested:
    • I have added tests to cover my changes.
    • All new and existing tests passed.
    • I have checked that this change does not introduce memory leaks.
    • I have used Valgrind or similar tools to check for potential issues.
  • I have given due consideration to all of the following potential concerns:
    • Scalability
    • Performance
    • Security
    • Thread-safety
    • Cloud-compatibility
    • Premature optimization
    • Existing deployed queries will not be broken
    • This change fixes the problem, not just the symptom
    • The target branch of this pull request is appropriate for such a change.
  • There are no similar instances of the same problem that should be addressed
    • I have addressed them here
    • I have raised JIRA issues to address them separately
  • This is a user interface / front-end modification
    • I have tested my changes in multiple modern browsers
    • The component(s) render as expected

Smoketest:

  • Send notifications about my Pull Request position in Smoketest queue.
  • Test my draft Pull Request.

Testing:


Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-32248

Jirabot Action Result:
Workflow Transition To: Merge Pending
Updated PR

@@ -162,6 +162,69 @@ static ISecureSocket *createSecureSocket(ISocket *sock, bool disableClientCertVe
}
#endif

//------------------------------------------------------------------------------
Contributor Author

@jpmcmu jpmcmu Nov 25, 2024


ActiveSpanScope is very similar to ThreadedSpanScope described by Gavin here: https://hpccsystems.atlassian.net/jira/software/c/projects/HPCC/issues/HPCC-32982. I liked the name ActiveSpanScope because I believe the class has utility outside of multithreaded contexts, e.g. time slicing. Would it be worthwhile to move this out of dafilesrv into jtrace?

Member

Yes. These utility changes should be in jlib, ideally as separate PRs/requests. It would be worth merging your other PR, then having a PR that implements an agreed solution to HPCC-32982, and then rebasing this PR on it.
I'm open to discussing what the different classes should be called.
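
For readers following the naming discussion, here is a minimal sketch of the general pattern being described: a scope object that makes a span the active one for the current thread and restores the previous one when it goes out of scope. `Span`, `threadActiveSpan` as declared here, and `SketchActiveSpanScope` are hypothetical stand-ins, not the jlib/jtrace types or the class in this PR, and restoring the previous span on destruction is an assumption about the intended behaviour.

```cpp
#include <string>

// Hypothetical stand-in for a tracing span; not the jlib ISpan interface.
struct Span
{
    std::string name;
};

// Per-thread pointer to the span currently considered "active" (assumed model).
thread_local Span *threadActiveSpan = nullptr;

// RAII helper: installs a span as the active one on construction and
// restores whatever was active before on destruction.
class SketchActiveSpanScope
{
    Span *previous;

public:
    explicit SketchActiveSpanScope(Span *span) : previous(threadActiveSpan)
    {
        threadActiveSpan = span;
    }
    ~SketchActiveSpanScope()
    {
        threadActiveSpan = previous;
    }
    // Copying would restore the same "previous" span twice, so disallow it.
    SketchActiveSpanScope(const SketchActiveSpanScope &) = delete;
    SketchActiveSpanScope &operator=(const SketchActiveSpanScope &) = delete;
};
```

Because the scope only saves and restores a pointer, it nests naturally, and it is not tied to thread creation, which is the point made above about it being useful for things like time slicing.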

@@ -366,13 +366,31 @@ version: 1.0
detail: 100
)!!";

IPropertyTree * loadConfigurationWithGlobalDefault(const char * defaultYaml, Owned<IPropertyTree>& globalConfig, const char * * argv, const char * componentTag, const char * envPrefix)
Contributor Author

This function is similar to work Jake has done in HPCC-32991. It might be worthwhile to retarget to master and call the overloaded doLoadConfiguration instead?

Member

Yes. This should reuse the function from HPCC-32991. If this change is wanted in 9.8, we could consider cherry-picking the changes in HPCC-32991 back to 9.8.

std::string traceParent = fullTraceContext ? fullTraceContext : "";
traceParent = traceParent.substr(0,traceParent.find_last_of("-"));

if (!traceParent.empty() && requestTraceParent != traceParent)
Contributor Author

Note: I am checking whether the traceParent has changed every time process is called here because the client side may use multiple spans during the lifetime of a single CRemoteRequest. See the screenshots below for an example.
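
As a rough illustration of the check described in this note, the sketch below tracks the last trace parent seen and reports when a new server-side span would be needed. `RequestTraceTracker` and its members are hypothetical names for illustration only; the real logic lives in the request's process path and also closes the previous span.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper: remembers the last trace parent seen for a request
// and signals when the client has switched to a different parent span.
class RequestTraceTracker
{
    std::string currentParent;
    unsigned spansStarted = 0;

public:
    // Returns true when the caller should close any existing span and start
    // a new one. An empty trace parent never triggers a new span, matching
    // the condition quoted in the diff above.
    bool update(std::string_view traceParent)
    {
        if (traceParent.empty() || traceParent == currentParent)
            return false;
        currentParent.assign(traceParent);
        ++spansStarted;
        return true;
    }

    unsigned spanCount() const { return spansStarted; }
};
```

Calling `update` on every `process` call keeps the comparison cheap in the common case where the client is still inside the same span.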

@jpmcmu
Contributor Author

jpmcmu commented Nov 25, 2024

Goal:
The goal of this PR is to add initial tracing support to the row service in dafilesrv, which will improve debuggability for downstream row service clients and reduce the time the platform team spends debugging issues.

Current Tracing Limitations:
There is limited support for intercepting errors and adding them to the tracing spans, and for adding annotations and/or statistics to spans. There are also no internal spans tracking work within the row service. These limitations are intentional to keep the initial PR as simple as possible, and will be addressed in future PRs.

Exported Tracing example:
Note that during the read the client side creates more than one span over the lifetime of the connection to the row service. The row service tracing supports this and correctly handles the batching the client side is doing.
[Image: Screenshot 2024-11-25 at 1 39 44 PM]

- Added opentelemetry tracing to rowservice

Signed-off-by: James McMullan [email protected]
Member

@ghalliday ghalliday left a comment


@jpmcmu looks good. A few minor comments. It would be good to rationalise the helper span scope classes so they cover all the options.

const char* fullTraceContext = requestTree->queryProp("_trace/traceparent");

// We only want to compare the trace-id & span-id, so remove the last "sampling" group
std::string traceParent = fullTraceContext ? fullTraceContext : "";
Member

minor: You can use strrchr on the const char * to avoid cloning the string. Alternatively use std::string_view and assign to a new string.
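
A minimal sketch of this suggestion, assuming the traceparent follows the W3C layout `version-traceid-parentid-flags`. `stripTraceFlags` is a hypothetical helper name, not an existing jlib function. It uses `strrchr` to find the last dash and returns a `std::string_view` over the original buffer, so nothing is copied until the caller decides to store the result.

```cpp
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: returns the traceparent without its trailing
// "-flags" (sampling) group. The view points into the caller's buffer,
// so it must not outlive fullTraceContext.
std::string_view stripTraceFlags(const char *fullTraceContext)
{
    if (!fullTraceContext)
        return {};
    const char *lastDash = std::strrchr(fullTraceContext, '-');
    if (!lastDash)
        return std::string_view(fullTraceContext); // no dash: keep the whole value
    return std::string_view(fullTraceContext, static_cast<std::size_t>(lastDash - fullTraceContext));
}
```

The no-dash case returns the whole string, which matches what `substr(0, find_last_of("-"))` does in the code under review. The view can be compared directly against a stored `std::string` and only assigned to it when the value has actually changed.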

{
// Check to see if we have an existing span that needs to be closed out, this can happen
// when the span parent changes on the client side
if (requestSpan != nullptr)
Member

This should be automatic when requestSpan is cleared. That would come if the changes in HPCC-32982 are implemented. It possibly requires a boolean to indicate success.

Owned<IProperties> traceHeaders = createProperties();
traceHeaders->setProp("traceparent", fullTraceContext);

std::string requestSpanName;
Member

minor efficiency: use a const char * and avoid a string being cloned.

if (traceParent != nullptr)
{
Owned<IProperties> traceHeaders = createProperties();
traceHeaders->setProp("traceparent", traceParent);
Member

Should this also have the sampling suffix removed?

Member

@jakesmith jakesmith left a comment


@jpmcmu - please see comments.

//------------------------------------------------------------------------------
// ActiveSpanScope Design Notes:
//------------------------------------------------------------------------------
// ActiveSpanScope updates the threadActiveSpan when it is intstantiated
Member

intstantiated -> instantiated

if (0 == cursorHandle)
throw createDafsException(DAFSERR_cmdstream_protocol_failure, "cursor handle not supplied to 'close' command");
{
IException* exception = createDafsException(DAFSERR_cmdstream_protocol_failure, "cursor handle not supplied to 'close' command");
Member

This should be changed to: IDAFS_Exception* exception = createDafsException..

Otherwise the thrown exception (an IException) will not be caught by catch (IDAFS_Exception *e) handlers.
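
The underlying C++ rule is that the type of the exception object comes from the static type of the throw expression, not from the dynamic type of the object pointed to. The sketch below shows this with hypothetical stand-in types (`BaseError`, `DerivedError`, `makeError`); they only loosely mirror the IException/IDAFS_Exception relationship and are not the jlib types.

```cpp
#include <string>

// Hypothetical stand-ins for an exception interface hierarchy.
struct BaseError
{
    virtual ~BaseError() = default;
};
struct DerivedError : BaseError
{
};

// Factory returning the derived pointer type.
DerivedError *makeError() { return new DerivedError; }

// Throws through a base-typed pointer and reports which handler ran.
std::string throwViaBasePointer()
{
    std::string caughtBy;
    try
    {
        BaseError *e = makeError(); // static type is BaseError*
        throw e;                    // exception object has type BaseError*
    }
    catch (DerivedError *e) { caughtBy = "derived"; delete e; }
    catch (BaseError *e) { caughtBy = "base"; delete e; }
    return caughtBy;
}

// Same object, but thrown through a derived-typed pointer.
std::string throwViaDerivedPointer()
{
    std::string caughtBy;
    try
    {
        DerivedError *e = makeError(); // static type is DerivedError*
        throw e;
    }
    catch (DerivedError *e) { caughtBy = "derived"; delete e; }
    catch (BaseError *e) { caughtBy = "base"; delete e; }
    return caughtBy;
}
```

In the first function the derived handler is skipped even though the object really is a `DerivedError`, which is the failure mode this comment warns about.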

Owned<IPropertyTree> env = getHPCCEnvironment();
IPropertyTree* globalTracing = env->queryPropTree("Software/tracing");
if (globalTracing != nullptr)
extractedGlobalConfig->addPropTree("tracing", globalTracing);
Member

I think this should be LINK(globalTracing) (or env->getPropTree)
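
To illustrate why the extra link matters, here is a toy intrusive reference-counting model. `Node` and its methods are hypothetical and are not the jlib IPropertyTree API. The sketch assumes the convention this comment implies: a `query`-style accessor returns a borrowed pointer, a `get`-style accessor returns a pointer with an extra reference, and adding a child hands one reference to the new parent.

```cpp
#include <cstddef>
#include <vector>

// Toy intrusive ref-counted tree node (assumed model, not jlib).
class Node
{
    int refs = 1;
    std::vector<Node *> children;

    // Private so nodes can only be destroyed through release().
    ~Node()
    {
        for (Node *child : children)
            child->release();
    }

public:
    void link() { ++refs; }
    void release()
    {
        if (--refs == 0)
            delete this;
    }
    int refCount() const { return refs; }

    // Takes ownership of one reference to child.
    void addChild(Node *child) { children.push_back(child); }

    // Borrowed pointer: the caller must link() it before handing it to another owner.
    Node *queryChild(std::size_t index) const { return children[index]; }

    // Linked pointer: safe to hand straight to another owner.
    Node *getChild(std::size_t index) const
    {
        Node *child = children[index];
        child->link();
        return child;
    }
};
```

In this model, passing the result of `queryChild` directly to a second parent's `addChild` would leave two owners sharing a single reference, and the second release would touch freed memory. Using `getChild` (or linking first) gives each owner its own reference.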

#endif

// NB: bare-metal dafilesrv does not have a component specific xml, extracting relevant global configuration instead
Owned<IPropertyTree> config = loadConfigurationWithGlobalDefault(defaultYaml, extractedGlobalConfig, argv, "dafilesrv", "DAFILESRV");
Member

Instead of adding a new function and adding to the global config, could you add 'tracing' to the component config?
e.g.:

#ifndef _CONTAINERIZED
    Owned<IPropertyTree> env = getHPCCEnvironment();
    IPropertyTree* tracing = env->getPropTree("Software/tracing");
    if (tracing)
        config->setPropTree("tracing", tracing);
#endif

(and combine with the #else // _CONTAINERIZED block below)
