Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: jaeger traces not persisted across container restarts #82

Merged
merged 6 commits into from
Oct 25, 2024

Conversation

rmrt1n
Copy link
Member

@rmrt1n rmrt1n commented Oct 21, 2024

Closes: WORLD-1208

Overview

Previously the Jaeger container only stores the spans in memory, which is lost when the container is restarted. This PR fixes this by configuring Jaeger to use Badger as the span storage engine with a Docker volume.

Brief Changelog

  • Updated the Jaeger container config to use badger

Testing and Verifying

Manually tested and verified.

Summary by CodeRabbit

  • New Features

    • Enhanced Jaeger service configuration with additional environment variables for local development.
    • Introduced volume mount configuration for improved service management.
    • Adjusted default settings for telemetry and metrics to be enabled by default, enhancing monitoring capabilities.
  • Chores

    • Updated .gitignore to refine binary path specifications.
    • Removed linting directive from EVM function without altering functionality.

Copy link

graphite-app bot commented Oct 21, 2024

Your org has enabled the Graphite merge queue for merging into main

Add the label “graphite/merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

Copy link
Member Author

rmrt1n commented Oct 21, 2024

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @rmrt1n and the rest of your teammates on Graphite Graphite

Copy link

codecov bot commented Oct 21, 2024

Codecov Report

Attention: Patch coverage is 95.65217% with 1 line in your changes missing coverage. Please review.

Project coverage is 48.09%. Comparing base (e1adc44) to head (8f15234).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
common/docker/service/evm.go 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #82      +/-   ##
==========================================
+ Coverage   45.72%   48.09%   +2.36%     
==========================================
  Files          56       56              
  Lines        3427     3541     +114     
==========================================
+ Hits         1567     1703     +136     
+ Misses       1655     1625      -30     
- Partials      205      213       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@rmrt1n rmrt1n marked this pull request as ready for review October 23, 2024 14:30
Copy link

coderabbitai bot commented Oct 23, 2024

Walkthrough

The changes in this pull request involve modifications to the .gitignore file and enhancements to the Jaeger, EVM, and Nakama functions in their respective files. The .gitignore adjustments update the path specification for the built binary. In the jaeger.go file, new environment variables related to Jaeger's storage backend are introduced, and the HostConfig struct is updated to include a volume mount. The EVM function removes a linting directive, while the Nakama function alters the default behavior of telemetry and metrics settings.

Changes

File Change Summary
.gitignore Updated path specification for the built binary from -world to +./world.
common/docker/service/jaeger.go Added new environment variables for Jaeger's storage backend and updated HostConfig struct to include a Mounts field.
common/docker/service/evm.go Removed linting directive comment from the EVM function; no functional changes made.
common/docker/service/nakama.go Altered default values for traceEnabled and metricsEnabled in the Nakama function.

Assessment against linked issues

Objective Addressed Explanation
Fix Nakama traces are not persisted after Jaeger container restart (WORLD-1208)

Possibly related PRs

  • feat: world engine telemetry setup and config #79: The changes in this PR involve enhancing telemetry capabilities, which includes modifying configurations related to Jaeger and Prometheus, aligning with the changes made in the main PR that updates the .gitignore for the world binary, indicating a focus on the world engine's configuration and management.

Suggested reviewers

  • smsunarto
  • zulkhair

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@rmrt1n rmrt1n requested a review from zulkhair October 23, 2024 14:30
@rmrt1n rmrt1n force-pushed the ryan/world-1208-persistent-spans branch from 15f5e42 to 238e710 Compare October 25, 2024 08:21
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range comments (1)
common/docker/service/nakama.go (1)

Line range hint 39-54: Consider revising the telemetry architecture.

While enabling traces by default helps ensure they are captured for persistence (addressing WORLD-1208), the current changes create an inconsistent architecture where telemetry sub-features (tracing, metrics) can be active even when the main telemetry feature is disabled. This could lead to:

  1. Confusion in configuration management
  2. Unexpected resource usage
  3. Security implications from always-on metrics endpoints

Consider implementing a clearer hierarchy:

  1. cfg.Telemetry as the master switch
  2. Individual features (tracing, metrics) configurable only when telemetry is enabled
  3. Document the architecture decisions in comments
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Files that changed from the base of the PR and between 15f5e42 and 40cd9ce.

📒 Files selected for processing (4)
  • .gitignore (1 hunks)
  • common/docker/service/evm.go (1 hunks)
  • common/docker/service/jaeger.go (2 hunks)
  • common/docker/service/nakama.go (1 hunks)
🧰 Additional context used
🪛 GitHub Check: codecov/patch
common/docker/service/evm.go

[warning] 31-31: common/docker/service/evm.go#L31
Added line #L31 was not covered by tests

🔇 Additional comments (7)
.gitignore (1)

37-37: LGTM! Verify build scripts compatibility.

The explicit relative path ./world is clearer than the implicit path. However, let's ensure this doesn't affect any build scripts.

✅ Verification successful

Let me check the content of README.md to understand how the binary is referenced there.


Let me check if there are any other gitignore patterns in the repository that might need standardization, and also verify if there are any build scripts that we might have missed in our previous search.


Change to ./world in .gitignore is safe and consistent

The change to use ./world is safe and follows good practices:

  • No build scripts found that could be affected
  • README.md only references the binary in documentation context
  • Other binary ignores in .gitignore (like world-cli) follow similar relative path patterns
  • The change makes the path resolution more explicit without changing functionality
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if any build scripts or documentation reference the binary path
# Test: Search for references to the binary. Expect: Consistent usage of the path.
rg -l 'world' --type sh
rg -l 'world' --type md

Length of output: 206


Script:

#!/bin/bash
# Check how the binary is referenced in README.md
rg 'world' README.md -C 2

Length of output: 1375


Script:

#!/bin/bash
# Check the content of .gitignore for similar patterns
cat .gitignore

# Search for build scripts in any directory
fd -e sh -e bash

Length of output: 618

common/docker/service/jaeger.go (2)

7-7: LGTM: Required import for volume mount configuration.

The added mount package import is necessary for implementing persistent storage.


23-33: LGTM: Proper configuration for persistent storage using Badger.

The environment variables correctly configure Badger as a persistent storage backend, addressing the issue of trace data loss during container restarts.

Security consideration: Verify CORS configuration.

The Access-Control-Allow-Origin:* header allows requests from any origin. While this might be acceptable for local development, it could pose security risks if accidentally used in production.

✅ Verification successful

The previous search might have been too specific. Let's broaden our search to understand the deployment context and verify if this configuration is indeed only used for local development as commented in the code.


Let's examine how this Jaeger configuration is used in the broader context and if there are any production-specific overrides.


We need to verify if this is strictly a development environment configuration or if it could potentially be used in production.


CORS configuration is safe as it's strictly for local development

The CORS setting is part of a local development configuration, as evidenced by:

  1. The code is explicitly commented as "not recommended to be used for production environment"
  2. This configuration is part of the development tooling (dev.go and start.go) used for local development and testing
  3. No production deployment configurations were found that could inherit this CORS setting

The permissive CORS header is acceptable in this context as it's isolated to local development environments.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if there are any production deployment configurations that might inherit this CORS setting
rg -l "QUERY_ADDITIONAL_HEADERS.*Access-Control-Allow-Origin" --type yaml --type json

Length of output: 87


Script:

#!/bin/bash
# Let's check for any deployment or configuration files that might contain Jaeger settings
fd -e yaml -e yml -e json | xargs rg -l "jaeger|JAEGER"

# Also check for any Docker Compose or Kubernetes deployment files
fd "docker-compose|deployment" -e yaml -e yml | xargs cat

Length of output: 181


Script:

#!/bin/bash
# Let's check the content of nakama.go to understand how Jaeger is used
cat common/docker/service/nakama.go

# Also check if there are any other files that might reference this Docker service configuration
rg -l "docker/service" --type go

Length of output: 3692


Script:

#!/bin/bash
# Let's check the dev and start commands to understand the deployment context
cat cmd/world/cardinal/dev.go
cat cmd/world/cardinal/start.go

# Also check if there are any production deployment related files
fd "prod|production" -e go -e yaml -e yml | xargs cat

Length of output: 13207

common/docker/service/evm.go (2)

31-31: Verify the relevance of this change to the PR objective.

This change (removing the linting directive and modifying the default value assignment) appears unrelated to the PR's objective of fixing Jaeger trace persistence. Please clarify if this is an intentional cleanup or should be moved to a separate PR.

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 31-31: common/docker/service/evm.go#L31
Added line #L31 was not covered by tests


31-31: 🧹 Nitpick (assertive)

Add test coverage for default faucet configuration.

This line sets a critical default configuration for the faucet feature but lacks test coverage. Consider adding tests to verify the default behavior.

Would you like me to help generate test cases for the default configuration handling?

🧰 Tools
🪛 GitHub Check: codecov/patch

[warning] 31-31: common/docker/service/evm.go#L31
Added line #L31 was not covered by tests

common/docker/service/nakama.go (2)

39-41: Review the telemetry and tracing logic consistency.

The change makes tracing enabled by default, even when telemetry is disabled (!cfg.Telemetry). This creates a potentially confusing situation where a disabled telemetry feature still produces traces.

Consider one of these approaches:

  1. Keep telemetry as the master switch:
if traceEnabled == "" || !cfg.Telemetry {
-    traceEnabled = "true"
+    traceEnabled = strconv.FormatBool(cfg.Telemetry)
}
  1. Document why traces should be collected independently of the telemetry setting

Line range hint 48-54: Review metrics configuration behavior and security implications.

The changes introduce several concerns:

  1. Metrics are now always enabled by default, potentially consuming resources unnecessarily
  2. The NAKAMA_METRICS_ENABLED environment variable is only checked when telemetry is enabled, creating an inconsistent configuration pattern
  3. The Prometheus endpoint (port 9100) will be exposed by default, which could be a security consideration

Consider this alternative implementation:

-    metricsEnabled := true
+    metricsEnabled := cfg.Telemetry
     if cfg.Telemetry {
         cfgMetricsEnabled, err := strconv.ParseBool(cfg.DockerEnv["NAKAMA_METRICS_ENABLED"])
         if err == nil {
             metricsEnabled = cfgMetricsEnabled
         }
     }

Let's verify the metrics configuration in other services:

.gitignore Show resolved Hide resolved
common/docker/service/jaeger.go Show resolved Hide resolved
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

📥 Commits

Files that changed from the base of the PR and between 40cd9ce and 8f15234.

📒 Files selected for processing (1)
  • common/docker/service/jaeger.go (2 hunks)
🔇 Additional comments (3)
common/docker/service/jaeger.go (3)

7-7: LGTM: Import added for Docker volume mount configuration.


45-46: Volume mount configuration implements persistence correctly.


23-33: Storage configuration aligns with persistence requirements.

The environment variables correctly configure Badger as a persistent storage backend, addressing the trace persistence issue. The comments effectively document the local development context.

Let's verify the Badger configuration matches Jaeger's recommendations:

✅ Verification successful

Storage configuration is appropriate for local development

The Badger storage configuration is used exclusively in the local development context, as evidenced by:

  • The configuration exists only in common/docker/service/jaeger.go
  • The code includes clear comments about its intended use for local development
  • No production configurations were found that would conflict with this setup
  • The configuration aligns with basic Badger persistence requirements

The environment variables are properly isolated to the development environment, with separate Jaeger configurations (JAEGER_ADDR and JAEGER_SAMPLE_RATE) being used in the Nakama service configuration.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify Badger configuration against Jaeger documentation
# Expected: Should find matching configuration in Jaeger's documentation or examples

rg -l "SPAN_STORAGE_TYPE=badger" .
rg -l "BADGER_EPHEMERAL=false" .

Length of output: 135


Script:

#!/bin/bash
# Let's search for any other Badger-related configurations or documentation in the codebase
rg -i "badger.*storage" .

# Also search for any comments or documentation about storage configuration
rg -i "storage.*configuration" .

# Check if there are any other environment variables related to Jaeger
rg "JAEGER_" .

Length of output: 399

common/docker/service/jaeger.go Show resolved Hide resolved
@rmrt1n rmrt1n merged commit 6abf03c into main Oct 25, 2024
12 checks passed
@rmrt1n rmrt1n deleted the ryan/world-1208-persistent-spans branch October 25, 2024 18:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants