Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Image Render: Support Tracing #586

Open
wants to merge 35 commits into
base: master
Choose a base branch
from
Open

Conversation

lucychen-grafana
Copy link

@lucychen-grafana lucychen-grafana commented Dec 10, 2024

Testing Instructions:

Test as service:

  1. Install tempo https://grafana.com/docs/tempo/latest/getting-started/docker-example/
  1. Checkout this branch of grafana and run grafana-enterprise
  • Set otlp address in defaults.ini
[tracing.opentelemetry.otlp]
address = localhost:4317
  1. Checkout this pr grafana-image-renderer branch. Run as remote server

  2. Open Localhost:3000; run a report or PDF creation, or any call to the image renderer service

  3. Check traces: Explore -> Select ‘gdev-tempo’ datasource -> run query type Search with resource.service.name="grafana-image-renderer"

  • If there is no Tempo datasource, create a new one with url set to http://localhost:3200
  • Select the TraceID related to the IR GET request and look into the spans
  1. Check logs in the console to confirm trace_id and span_id is included in the logs

To test the plugin on Mac:

  1. Build the plugin locally for Mac:
    1. Update your Makefile to set ARCH = darwin-arm64-unknown
    2. Run make build_package
    3. The plugin will be built in the dist folder
  2. Copy / paste the built plugin to your Grafana plugins folder and rename it to grafana-image-renderer
  3. In your Grafana .ini file:
    1. Comment your remote renderer settings in the [rendering] section
    2. Add the following config to enable traces and allow running the IR plugin unsigned:
[plugin.grafana-image-renderer]
rendering_tracing_url = http://localhost:4318/v1/traces

[plugins]
allow_loading_unsigned_plugins = grafana-image-renderer

Additional Pr to review:
Deployment_tools pr for config setup
Grafana pr

@CLAassistant
Copy link

CLAassistant commented Dec 10, 2024

CLA assistant check
All committers have signed the CLA.

@lucychen-grafana lucychen-grafana linked an issue Dec 10, 2024 that may be closed by this pull request
src/tracing.ts Outdated
@@ -0,0 +1,41 @@
import {NodeSDK} from '@opentelemetry/sdk-node';
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed OpenTelemetry Doc for Node.js to implement this

src/tracing.ts Outdated
Comment on lines 31 to 40
export function startTracing(log: Logger) {
sdk.start();
log.info('Starting tracing');

process.on('SIGTERM', () => {
sdk.shutdown()
.then(() => log.debug('Tracing terminated'))
.catch((error) => log.error('Error terminating tracing', error))
.finally(() => process.exit(0));
});
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Create startTracing separately from the rest of initialization steps b/c we want to only capture traces if it's enabled

@lucychen-grafana lucychen-grafana requested review from a team, juanicabanas and AgnesToulet and removed request for a team January 30, 2025 16:55
@lucychen-grafana lucychen-grafana marked this pull request as ready for review January 30, 2025 16:56
Copy link
Contributor

@AgnesToulet AgnesToulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I tested it and this worked. However, I don't find the current traces very useful as there is a big blank gap in the middle.

An easy place to add some manual spans would be in the withTimingMetrics function. Something like that but adding some config checks:

  async withTimingMetrics<T>(step: string, callback: () => Promise<T>): Promise<T> {
    if (this.config.timingMetrics) {
      const endTimer = this.metrics.durationHistogram.startTimer({ step });
      const res = this.tracer.startActiveSpan(step, async (span) => {
        const cbRes = await callback();
        span.end();
        return cbRes;
      });
      endTimer();

      return res;
    } else {
      return callback();
    }
  }

The Browser class could initialize its tracer in its constructor (this should work for all browser implementations, e.g. with mode: "clustered"):

export class Browser {
  tracer: Tracer;

  constructor(protected config: RenderingConfig, protected log: Logger, protected metrics: Metrics) {
    this.log.debug('Browser initialized', 'config', this.config);
    this.tracer = trace.getTracer('browser');
  }
  ...

Also, I looked into what is missing to have a link between Grafana traces and image renderer traces and this simple change is working (to enable traces in Grafana, you need to set:

[tracing.opentelemetry.otlp]
# otlp destination (ex localhost:4317)
address = localhost:4317

). I will test it out a bit more and check if this need to be initialized only when tracing is enabled or if this is a no-op when tracing is disabled.

src/tracing.ts Outdated
@@ -0,0 +1,41 @@
import {NodeSDK} from '@opentelemetry/sdk-node';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This whole file should be formatted using Prettier linter (we don't have a check for that in CI but we should do our best).

@AgnesToulet AgnesToulet mentioned this pull request Feb 10, 2025
@lucychen-grafana
Copy link
Author

[tracing.opentelemetry.otlp]

otlp destination (ex localhost:4317)

address = localhost:4317

@AgnesToulet setting this address is enable traces in Grafana in dev env right? Traces should already be enabled in Cloud via Faro

@AgnesToulet
Copy link
Contributor

Yes it's for your local environment. It's enabled by default on Cloud (Faro is for frontend monitoring, its configuration is in a different section of the ini file, I don't think both are related).

README.md Outdated
@@ -40,6 +39,33 @@ This plugin is not compatible with the current Grafana Docker image and requires

If you still want to install the plugin with the Grafana Docker image, refer to the instructions on building a custom Grafana image in [Grafana Docker documentation](https://grafana.com/docs/grafana/latest/setup-grafana/configure-docker/#build-a-custom-grafana-docker-image).

### Plugin for Mac ARM64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be here instead: https://github.com/grafana/grafana-image-renderer/blob/master/docs/building_from_source.md

This README file is the one that will be published on grafana.com and should stay simple I think (I just noticed that the "testing" section should also probably be elsewhere).

We could link to the "Build from source" doc in this file though, to help with discoverability. Like, in the "Supported operating systems" section, you could add:

For Mac ARM64, you need to [build the plugin from source](https://github.com/grafana/grafana-image-renderer/blob/master/docs/building_from_source.md) or use the [remote rendering installation](https://github.com/grafana/grafana-image-renderer?tab=readme-ov-file#remote-rendering-service-installation).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I can move the instruction to Build from source doc and also create a new doc for testing instructions

README.md Outdated
@@ -73,6 +99,11 @@ The following example shows how you can run Grafana and the remote HTTP renderin
ports:
- 8081
```
_Notes:_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This command is already in the "build from source" section.

"dumpio": false,

"tracing": {
"url": "http://host.docker.internal:4318/v1/traces"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works only on Windows and Mac, not Linux. I think the best would be to provide the whole setup so to copy/paste the content of the Tempo docker-compose file, to replace this with http://tempo:4318/v1/traces and to provision the Tempo datasource so people can easily set up tracing (I wonder if we should do all this in the custom-config folder or create a new tracing folder).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh i see. Maybe better to create separate tracing folder w tracing config.json and docker-compose file if end user don't need tracing enabled at all times

src/tracing.ts Outdated

return new NodeSDK({
resource: new Resource({
[SEMRESATTRS_SERVICE_NAME]: 'grafana-image-renderer',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned in a previous comment, I think it would be best to be able to configure this.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configure the [SEMRESATTRS_SERVICE_NAME]? Or the whole resource?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The [SEMRESATTRS_SERVICE_NAME], I think internally, we'd prefer to use render-service for example. But grafana-image-renderer is a better default.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to give that an option to configure it? If in the future we decide to rename it, then it'll make querying disjuncted and would have to take into account of both

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not to configure it multiple times, it's just to configure it once when you set up tracing so that it matches your own deployment convention and make the traces easier to discover. The thing is that the value you put makes the most sense as the default but doesn't match what we use on our Cloud.

@@ -118,6 +132,11 @@ const envConfig: Record<Mode, Keys<RenderingConfig>> = {
verboseLogging: 'GF_PLUGIN_RENDERING_VERBOSE_LOGGING',
dumpio: 'GF_PLUGIN_RENDERING_DUMPIO',
timingMetrics: 'GF_PLUGIN_RENDERING_TIMING_METRICS',
tracing: {
url: 'GF_PLUGIN_RENDERING_TRACING_URL',
serviceName: 'GF_PLUGIN_RENDERING_TRACING_SERVICE_NAME'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used by the populateRenderingConfigFromEnv function. This should either be used or removed. I think we can remove it as it's less likely to be an issue for people using the image renderer in plugin mode (usually, this means no Kubernetes setup where you already gave a name at your services). And we can always add it later if requested.

Copy link

@nmarrs nmarrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm tried to test this locally but ran into this error in my grafana local server logs:

ERROR[03-06|07:42:50] OpenTelemetry handler returned an error  logger=tracing err="traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [::1]:4317: connect: connection refused\""

Honestly though if both you and Agnes were able to run this locally we should unblock merging this and we can make any tweaks as necessary once we start getting data from dev / ops instances

One thing I was a little confused on is whether or not traces should work for image renderer in plugin mode or just http server mode (I think the latter). Your directions in PR description include instructions for mac for running as plugin mode but I'm not sure traces will be sent in that case 🤔

@@ -207,4 +224,8 @@ export function populateRenderingConfigFromEnv(config: RenderingConfig, env: Nod
if (env[envKeys.timingMetrics!]) {
config.timingMetrics = env[envKeys.timingMetrics!] === 'true';
}

if (env[envKeys.tracing?.url!]) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is kinda ew haha (using the ! here as it can cause things to break if this is not defined in the object) - given this is used right now in other places and things aren't breaking its probably fine for the time being

@@ -95,15 +99,15 @@ function getGrafanaEndpoint(domain: string) {
let envSettings = {
saveDiff: false,
updateGolden: false,
}
};
Copy link

@nmarrs nmarrs Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm guessing a lot of these extra changes are just running lint - maybe we should consider making a follow-up PR to image renderer to run lint across all of the files to clean this up some for the future

@@ -330,4 +318,20 @@ export class HttpServer {
return res.status(500).json({ error: e.message });
}
};

getHeaders(req: express.Request<any, any, any, RenderOptions, any>): HTTPHeaders {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we not define the objects here in place of "any" type?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Render: Support tracing
5 participants