-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: general service discovery for farm testnets #1335
base: main
Are you sure you want to change the base?
Conversation
MSD Diff No diff |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR introduces a general service discovery service for farm testnets that registers target machines and integrates with the vector stack for log scraping. Key changes include the implementation of a server module with endpoints for adding and retrieving targets, new storage implementations (in-memory and file), and updates to metrics, supervisor, and configuration builder modules to support the service discovery workflow.
Reviewed Changes
File | Description |
---|---|
src/server/mod.rs | Implements the server with axum routes and a graceful shutdown. |
src/storage/in_memory.rs and src/storage/file.rs | Introduces in-memory and file storage mechanisms for target persistence. |
src/metrics.rs | Adds metrics instrumentation for monitoring target statuses. |
src/supervisor.rs | Implements target supervision with target polling, error logging, and GC handling. |
src/server/add_targets.rs & get_targets.rs | Define HTTP endpoints for target management. |
Cargo.toml | Updates dependencies and workspace members. |
multiservice-discovery-shared/builders/general_exec.rs | Adds a configuration builder for general service discovery. |
multiservice-discovery-downloader | Updates configuration generation logic and target conversion functions. |
multiservice-discovery-shared/builders/exec_log_config_structure.rs | Exports builder types for command execution configurations. |
Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
rs/ic-observability/multiservice-discovery-downloader/src/downloader_loop.rs:107
- Using unwrap on cli.output_dir.parent() may panic if the output directory has no parent; add a proper check or use a fallback value.
fs_err::create_dir_all(cli.output_dir.parent().unwrap()).unwrap();
rs/ic-observability/multiservice-discovery-downloader/src/downloader_loop.rs:190
- Deserialization with unwrap in convert_and_filter_general_targets can lead to a runtime panic if the input does not match the expected schema; consider error handling to gracefully skip or report faulty entries.
values.into_iter().map(|value| serde_json::from_value(value).unwrap())
if SystemTime::now().duration_since(self.last_successful_sync).unwrap() > self.gc_timeout { | ||
self.info("GC elapsed, removing target..."); | ||
(self.remove_self)().await; | ||
self.metrics.remove_observed_value(&self.target.name); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using unwrap on duration_since can cause a panic if the system clock is adjusted or if an error occurs; consider handling the Err case explicitly.
if SystemTime::now().duration_since(self.last_successful_sync).unwrap() > self.gc_timeout { | |
self.info("GC elapsed, removing target..."); | |
(self.remove_self)().await; | |
self.metrics.remove_observed_value(&self.target.name); | |
break; | |
match SystemTime::now().duration_since(self.last_successful_sync) { | |
Ok(duration) => { | |
if duration > self.gc_timeout { | |
self.info("GC elapsed, removing target..."); | |
(self.remove_self)().await; | |
self.metrics.remove_observed_value(&self.target.name); | |
break; | |
} | |
} | |
Err(e) => { | |
self.warn(format!("Failed to calculate duration since last successful sync: {:?}", e)); | |
} |
Copilot is powered by AI, so mistakes are possible. Review output carefully before use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't make any sense to make the code uglier since the time is never set to anything but SystemTime::now
. If this happens ever I would want the app to completely crash since we have a bug and not to just print the error and try and recover.
This should never happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the system time is a problem, can you change it to use monotonic time? std::time::Instant. They are guaranteed to be increasing all the time. Plsfix.
#[derive(Parser, Debug)] | ||
struct CliArgs { | ||
/// Storage mode used for general service discovery. | ||
#[arg(default_value_t = StorageMode::InMemory, long, short)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the in-memory storage be the default? What happens with the list of testnets if the service is updated/restarted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where ever we deploy it it will be deployed with a file. This is the default just so that locally when testing something you can quickly do a cargo run
and be off to testing and not have to provide the files all the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love sane dev defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
avoid system clock, use std time instant. These can be compared and have no risk of falling backwards.
This service will be used for all sorts of farm testnets which will monitor arbitrary services. The idea is to use this to register testnet machines and then use our vector stack to scrape logs