ZTC-1478: track the number of times a logger has been replaced with a child logger #47
Conversation
```diff
 // NOTE: we intentionally use a lock without poisoning here to not
 // panic the threads if they just share telemetry with failed thread.
-pub(crate) type SharedLog = Arc<parking_lot::RwLock<Logger>>;
+pub(crate) type SharedLog = Arc<parking_lot::RwLock<LogContext>>;
```
The comment above explains why we cannot get away with just a single `Arc` and a single `RwLock`, with both the `Logger` and the `u32` living underneath the `RwLock`.
If we really wanted to, we could get rid of the outer `RwLock` and just have `SharedLog` be defined as an `Arc<LogContext>`. That would mean we would need interior mutability inside `LogContext` for updating the generation number, so that could be changed to an `AtomicU32`.

I initially did it that way, but I found myself writing voluminous comments everywhere in order to justify/excuse what I was doing. It felt like a code smell to me, so I added the extra `RwLock` layer. I feel like this way of doing it is more readily comprehensible to people who might be reading the code.
Personally, I think it would be better to use an `AtomicU32` for `generation`. It's weird to me to have a struct that has only partial interior mutability, as it forces what you did here - two layers of locks. Since we're forced into the `Arc<RwLock<Logger>>` for storing the logger itself, I think the next best choice is making the other field(s) on `LogContext` have their own locks.

One benefit is that it would reduce work on the hot path of actually writing logs, while it would probably be a similar amount of work (not sure how locking a `RwLock` compares to incrementing + reading an atomic) on the `add_fields!` path.
I've updated the PR accordingly. My gut says "no, it was better the way it was before" but I am curious what you (and others) think.
I think this is an improvement. As you note, it's not mission-critical that the generation count is exactly in sync - we don't really care if it triggers at 1005 rather than 1000. Maybe we could even use `Ordering::Relaxed`, but I'm no expert on what exactly that does and whether it's safe. I really like that we don't have an extra lock on the hot path of logging now. `overwrite_from` is a little gross, but I think it's worth it.
OK! I'm glad we did this, rather than just me making assumptions about what would be best! :-)
With the current implementation, the `generation` field is indeed always accessed under the logger lock, which creates the necessary happens-before relationship. Those atomic accesses could be replaced by relaxed operations without any impact.

As for what the orderings mean: basically, they determine how strong a memory barrier the compiler needs to issue, depending on the processor architecture. For example, on x86_64 a sequentially consistent access isn't really any more expensive than a relaxed one; but on ARM, sequential consistency is more expensive than relaxed, because the processor's memory model is weaker than x86_64's.

As for atomic increments vs. lock acquisition, the former may be cheaper on some architectures (e.g. x86_64 has fetch_and_inc, which is commutative and much more scalable in terms of concurrency than compare_and_swap, the latter being the primitive used for locks).
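One point worth making concrete: a relaxed `fetch_add` is still a single atomic read-modify-write, so no increments are ever lost; `Relaxed` only drops the ordering guarantees on *surrounding* memory accesses. A small std-only demonstration (not code from this PR):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

// Spawn several threads that each bump a shared counter with
// Ordering::Relaxed; every increment is still counted exactly once.
fn count_with_relaxed(threads: usize, per_thread: u32) -> u32 {
    let counter = Arc::new(AtomicU32::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let c = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    c.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```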
foundations/tests/logging.rs
```rust
// panics (with an error that gives you a hint as to where to go look in your
// code).
#[with_test_telemetry(test)]
fn test_limit_num_generations(_ctx: TestTelemetryContext) {
```
I think an important property of `MAX_LOG_GENERATION` that we want to verify (but that I don't think the current test does) is: "`MAX_LOG_GENERATION` is small enough to prevent a stack overflow on drop". It seems like this test is verifying "if we call `add_fields` more than the max, we get our own panic". Could we also have a test that verifies "if you call `add_fields` just under the max, it doesn't panic on drop"?
Unit test case is added!
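A sketch of what such a pair of checks could look like, with a hypothetical `MAX_LOG_GENERATION` value and a toy `bump_generation` standing in for the real `add_fields!` guard (none of these names are the crate's actual API):

```rust
// Hypothetical cap; the real constant lives in foundations.
const MAX_LOG_GENERATION: u32 = 1000;

// Toy stand-in for the add_fields! path: bump the counter and panic
// once the cap is exceeded, mirroring the guard described in this PR.
fn bump_generation(generation: &mut u32) {
    *generation += 1;
    if *generation > MAX_LOG_GENERATION {
        panic!("too many nested child loggers; check your add_fields! call sites");
    }
}

// Run n bumps and report whether the guard fired.
fn bump_n_times(n: u32) -> Result<(), Box<dyn std::any::Any + Send>> {
    std::panic::catch_unwind(move || {
        let mut generation = 0;
        for _ in 0..n {
            bump_generation(&mut generation);
        }
    })
}
```

The two checks: exactly `MAX_LOG_GENERATION` bumps must succeed, while one more must trip the guard's own panic (rather than a stack overflow).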
```diff
-let log = parent.read().new(slog::o!());
-
-Arc::new(parking_lot::RwLock::new(log))
+let log_lock = parent.read();
```
This doesn't strike me as being functionally the same... Unless I'm missing something, I think these changes will make it so that `fork_log` still writes to the same output (since it's just cloning the Arcs and locks, not creating a new log with a new output).
I'm not sure why it wouldn't be functionally the same. Calling `Logger::new()` with an empty KV will clone the drain and add zero fields to the KV, which is basically what `<LogContext as Clone>::clone` will do, too.

I'm open to being convinced otherwise! But I will leave this as is for now.
Ah - I mistakenly thought `o!` was making a new output or some such, not the KV. Looking at slog's source now, I agree: I think you're right, and what you're doing is functionally the same.
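To make the point concrete, here is a toy model of a slog-style logger - a shared drain plus an owned field list. This is not slog's real API, just an illustration of why a child built with an empty KV behaves like a plain clone: both share the same drain and carry the same fields.

```rust
use std::sync::{Arc, Mutex};

// Toy logger: `drain` is the shared output, `fields` the owned KV list.
#[derive(Clone)]
struct MiniLogger {
    drain: Arc<Mutex<Vec<String>>>,
    fields: Vec<(&'static str, &'static str)>,
}

impl MiniLogger {
    // Analogue of `Logger::new(o!(...))`: share the drain, append fields.
    fn new_child(&self, extra: Vec<(&'static str, &'static str)>) -> MiniLogger {
        let mut fields = self.fields.clone();
        fields.extend(extra);
        MiniLogger { drain: Arc::clone(&self.drain), fields }
    }

    // Write one line, tagged with this logger's fields, to the shared drain.
    fn log(&self, msg: &str) {
        let line = format!("{msg} {:?}", self.fields);
        self.drain.lock().unwrap().push(line);
    }
}
```

With an empty `extra`, `new_child` and `clone` produce loggers that are indistinguishable: same fields, same underlying drain.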
Nice, this looks way better than if I had done it.

My only comment is around documenting this new behaviour: maybe the documentation for `add_fields` and `set_verbosity` should warn the user about the potential panic lurking inside?

How does that look?

LGTM!
…child

Every time we call the add_fields! macro, it adds one to the depth of the nested structure of Arcs inside the logger object. If the structure gets too deeply nested, it causes a stack overflow on drop. Fixing this to make it keep the structure flat rather than nested would be difficult; so, this commit makes it keep track of how many times it has replaced a logger with a child logger, and it stops you (with a panic) before you reach a dangerous level of nesting. The panic string is intended to be informative enough so that you get a hint as to where to go look in your code.

…d with a child logger (cloudflare#47)"

This reverts commit 341befe. (This change is being reimplemented in a different way, see the next commit.)
Every time we call the add_fields! macro, it adds one to the depth of the nested structure of Arcs inside the logger object. If the structure gets too deeply nested, it causes a stack overflow on drop.
Fixing this to make it keep the structure flat rather than nested would be difficult; so, this commit makes it keep track of how many times it has replaced a logger with a child logger, and it stops you (with a panic) before you reach a dangerous level of nesting.
The panic string is intended to be informative enough so that you get a hint as to where to go look in your code.
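The nesting problem the commit message describes can be sketched with a toy parent chain using plain std `Arc`s (the real structure involves slog's internals, so this is only an analogy): each child keeps its parent alive, so dropping the last handle to a deep chain unwinds one nested `Drop` at a time, and a sufficiently deep chain can exhaust the stack.

```rust
use std::sync::Arc;

// Each node conceptually keeps its parent alive, like a child logger
// holding its parent's state.
struct Node {
    depth: u32,
    parent: Option<Arc<Node>>,
}

// Build a chain of the given depth; only the deepest node is returned,
// so dropping it cascades through every ancestor recursively.
fn build_chain(depth: u32) -> Arc<Node> {
    let mut node = Arc::new(Node { depth: 0, parent: None });
    for d in 1..=depth {
        node = Arc::new(Node { depth: d, parent: Some(node) });
    }
    node
}
```

A modest chain drops fine; the hazard is that drop depth grows linearly with the number of nestings, which is exactly what the generation counter caps.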