-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image Render: Support Tracing #586
base: master
Are you sure you want to change the base?
Conversation
src/tracing.ts
Outdated
@@ -0,0 +1,41 @@ | |||
import {NodeSDK} from '@opentelemetry/sdk-node'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed OpenTelemetry Doc for Node.js to implement this
src/tracing.ts
Outdated
export function startTracing(log: Logger) { | ||
sdk.start(); | ||
log.info('Starting tracing'); | ||
|
||
process.on('SIGTERM', () => { | ||
sdk.shutdown() | ||
.then(() => log.debug('Tracing terminated')) | ||
.catch((error) => log.error('Error terminating tracing', error)) | ||
.finally(() => process.exit(0)); | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Create startTracing separately from the rest of initialization steps b/c we want to only capture traces if it's enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good! I tested it and this worked. However, I don't find the current traces very useful as there is a big blank gap in the middle.
An easy place to add some manual spans would be in the withTimingMetrics
function. Something like that but adding some config checks:
async withTimingMetrics<T>(step: string, callback: () => Promise<T>): Promise<T> {
if (this.config.timingMetrics) {
const endTimer = this.metrics.durationHistogram.startTimer({ step });
const res = this.tracer.startActiveSpan(step, async (span) => {
const cbRes = await callback();
span.end();
return cbRes;
});
endTimer();
return res;
} else {
return callback();
}
}
The Browser class could initialize its tracer in its constructor (this should work for all browser implementations, e.g. with mode: "clustered"
):
export class Browser {
tracer: Tracer;
constructor(protected config: RenderingConfig, protected log: Logger, protected metrics: Metrics) {
this.log.debug('Browser initialized', 'config', this.config);
this.tracer = trace.getTracer('browser');
}
...
Also, I looked into what is missing to have a link between Grafana traces and image renderer traces and this simple change is working (to enable traces in Grafana, you need to set:
[tracing.opentelemetry.otlp]
# otlp destination (ex localhost:4317)
address = localhost:4317
). I will test it out a bit more and check if this need to be initialized only when tracing is enabled or if this is a no-op when tracing is disabled.
src/tracing.ts
Outdated
@@ -0,0 +1,41 @@ | |||
import {NodeSDK} from '@opentelemetry/sdk-node'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole file should be formatted using Prettier linter (we don't have a check for that in CI but we should do our best).
Co-authored-by: Agnès Toulet <[email protected]>
@AgnesToulet setting this address is enable traces in Grafana in dev env right? Traces should already be enabled in Cloud via Faro |
Yes it's for your local environment. It's enabled by default on Cloud (Faro is for frontend monitoring, its configuration is in a different section of the ini file, I don't think both are related). |
…fana-image-renderer into lucychen/add_tracing
* add tracing to plugin mode * update dev.json * update imports * add tracing in CSV endpoints * fix tests * fix and simplify config * fix test
README.md
Outdated
@@ -40,6 +39,33 @@ This plugin is not compatible with the current Grafana Docker image and requires | |||
|
|||
If you still want to install the plugin with the Grafana Docker image, refer to the instructions on building a custom Grafana image in [Grafana Docker documentation](https://grafana.com/docs/grafana/latest/setup-grafana/configure-docker/#build-a-custom-grafana-docker-image). | |||
|
|||
### Plugin for Mac ARM64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be here instead: https://github.com/grafana/grafana-image-renderer/blob/master/docs/building_from_source.md
This README file is the one that will be published on grafana.com and should stay simple I think (I just noticed that the "testing" section should also probably be elsewhere).
We could link to the "Build from source" doc in this file though, to help with discoverability. Like, in the "Supported operating systems" section, you could add:
For Mac ARM64, you need to [build the plugin from source](https://github.com/grafana/grafana-image-renderer/blob/master/docs/building_from_source.md) or use the [remote rendering installation](https://github.com/grafana/grafana-image-renderer?tab=readme-ov-file#remote-rendering-service-installation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I can move the instruction to Build from source doc and also create a new doc for testing instructions
README.md
Outdated
@@ -73,6 +99,11 @@ The following example shows how you can run Grafana and the remote HTTP renderin | |||
ports: | |||
- 8081 | |||
``` | |||
_Notes:_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This command is already in the "build from source" section.
"dumpio": false, | ||
|
||
"tracing": { | ||
"url": "http://host.docker.internal:4318/v1/traces" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works only on Windows and Mac, not Linux. I think the best would be to provide the whole setup so to copy/paste the content of the Tempo docker-compose file, to replace this with http://tempo:4318/v1/traces
and to provision the Tempo datasource so people can easily set up tracing (I wonder if we should do all this in the custom-config
folder or create a new tracing
folder).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh i see. Maybe better to create separate tracing folder w tracing config.json and docker-compose file if end user don't need tracing enabled at all times
src/tracing.ts
Outdated
|
||
return new NodeSDK({ | ||
resource: new Resource({ | ||
[SEMRESATTRS_SERVICE_NAME]: 'grafana-image-renderer', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned in a previous comment, I think it would be best to be able to configure this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Configure the [SEMRESATTRS_SERVICE_NAME]? Or the whole resource?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The [SEMRESATTRS_SERVICE_NAME], I think internally, we'd prefer to use render-service
for example. But grafana-image-renderer
is a better default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to give that an option to configure it? If in the future we decide to rename it, then it'll make querying disjuncted and would have to take into account of both
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not to configure it multiple times, it's just to configure it once when you set up tracing so that it matches your own deployment convention and make the traces easier to discover. The thing is that the value you put makes the most sense as the default but doesn't match what we use on our Cloud.
Co-authored-by: Agnès Toulet <[email protected]>
src/config/rendering.ts
Outdated
@@ -118,6 +132,11 @@ const envConfig: Record<Mode, Keys<RenderingConfig>> = { | |||
verboseLogging: 'GF_PLUGIN_RENDERING_VERBOSE_LOGGING', | |||
dumpio: 'GF_PLUGIN_RENDERING_DUMPIO', | |||
timingMetrics: 'GF_PLUGIN_RENDERING_TIMING_METRICS', | |||
tracing: { | |||
url: 'GF_PLUGIN_RENDERING_TRACING_URL', | |||
serviceName: 'GF_PLUGIN_RENDERING_TRACING_SERVICE_NAME' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used by the populateRenderingConfigFromEnv
function. This should either be used or removed. I think we can remove it as it's less likely to be an issue for people using the image renderer in plugin mode (usually, this means no Kubernetes setup where you already gave a name at your services). And we can always add it later if requested.
Co-authored-by: Agnès Toulet <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm tried to test this locally but ran into this error in my grafana local server logs:
ERROR[03-06|07:42:50] OpenTelemetry handler returned an error logger=tracing err="traces export: context deadline exceeded: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [::1]:4317: connect: connection refused\""
Honestly though if both you and Agnes were able to run this locally we should unblock merging this and we can make any tweaks as necessary once we start getting data from dev / ops instances
One thing I was a little confused on is whether or not traces should work for image renderer in plugin mode or just http server mode (I think the latter). Your directions in PR description include instructions for mac for running as plugin mode but I'm not sure traces will be sent in that case 🤔
@@ -207,4 +224,8 @@ export function populateRenderingConfigFromEnv(config: RenderingConfig, env: Nod | |||
if (env[envKeys.timingMetrics!]) { | |||
config.timingMetrics = env[envKeys.timingMetrics!] === 'true'; | |||
} | |||
|
|||
if (env[envKeys.tracing?.url!]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is kinda ew haha (using the ! here as it can cause things to break if this is not defined in the object) - given this is used right now in other places and things aren't breaking its probably fine for the time being
@@ -95,15 +99,15 @@ function getGrafanaEndpoint(domain: string) { | |||
let envSettings = { | |||
saveDiff: false, | |||
updateGolden: false, | |||
} | |||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm guessing a lot of these extra changes are just running lint - maybe we should consider making a follow-up PR to image renderer to run lint across all of the files to clean this up some for the future
@@ -330,4 +318,20 @@ export class HttpServer { | |||
return res.status(500).json({ error: e.message }); | |||
} | |||
}; | |||
|
|||
getHeaders(req: express.Request<any, any, any, RenderOptions, any>): HTTPHeaders { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not define the objects here in place of "any" type?
Testing Instructions:
Test as service:
Checkout this pr grafana-image-renderer branch. Run as remote server
Open Localhost:3000; run a report or PDF creation, or any call to the image renderer service
Check traces: Explore -> Select ‘gdev-tempo’ datasource -> run query type Search with
resource.service.name="grafana-image-renderer"
http://localhost:3200
To test the plugin on Mac:
Makefile
to setARCH = darwin-arm64-unknown
make build_package
dist
foldergrafana-image-renderer
[rendering]
sectionAdditional Pr to review:
Deployment_tools pr for config setup
Grafana pr