Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add status change event #3903
base: main
Are you sure you want to change the base?
Add status change event #3903
Changes from all commits
a04f52c
0b8ec64
6d61cbf
1029bb7
66237fa
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the part I find questionable in this PR. Instead of having a single source of truth for the status, we now have 2, some of the status comes from
st.machineClient.Status()
, but some of the status is hardcoded inmachine/sync.go
. If the 2 get out of sync, or if we want to improve status reporting in general, we'll have multiple places to change.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spending more time on it, I understand more the problems this is trying to solve.
One example is when the VM is stopped.
crc status
will succeed and report that crc is stopped, howevercrc stop
will return an error and say the machine is already stopped.Without
status.CrcStatus = changedEvent.State
, when invokingapi/stop
, we would not be reporting an error in the stream, but instead we'd only say the machine is stopped. This would not match what was reported to theapi/stop
call. There are most likely other state transitions where this is needed.The comment however seems to indicate sometimes
Status
could report a stale status, I haven't seen that in the little testing I did, but I'd be interested in understanding more when this happens? If we don't record this knowledge now when it's more or less fresh in our minds, it will be very difficult to find this back in 6 months from now if we need it :-/There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, there's more details in an older discussion:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion, this is an indication that these events are not fired in the right place, and this is papered over with even more complexity :-/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API matches the usecase. Stop should in my opinion not return an error when already stopped; as you succeed in doing what is requested. What more information do we actually give with saying this failed as it was already stopped?!
If you believe this should be fixed or recorded, make these states clear for the convergence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part you quoted is a description of the behaviour in this PR. These are some notes I took during review/testing when I was trying to understand the design choices which led to the code being reviewed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I try to replace this code, with firing event at more "proper" place, but face problem that
starting
state is "artificial", ie exist only insync.go
file, and VM doesn't have/reportsstarting
state. Alsostopping
state is the same.So I have no other ideas, than accept current PR and think on better state handling, like @cfergeau propose there #3903 (comment)
Or change state handling first, and then rebase this PR on top of it.
@gbraad @cfergeau WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can go forward with the current code, but we really need more details in the commit log, so that it records the design choices you made when writing the code, so that we know the current limitations, why this overwrites the state, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(commenting here, but that applies to most of the file).
Duplicating the state logic in this file ("after calling
Start()
, the VM is in theStart
state, unless there was an error, in which case it's in theError
state") will likely cause additional maintainance burden, as we have to make sure to keep in sync the state logic in this file, and the state logic in other parts of crc - the status events may be reporting a different state fromStatus()
.Can we emit a
StatusChangedEvent
to tell other code the status changed, but not specify the current status in the event payload? This way the receiver of the event can checkStatus()
if they need the status, and we don't have 2 source for the status to keep in sync?Regarding errors, it seems if
Start()
(or other commands) return an error, you consider that the cluster is in an error state? This makes sense for some commands, but I don't think this matches what is reported byStatus()
at the moment? Would be nice to fix the reported Status() when there are start failures for examples, but doing it here would be papering over the real issue in my opinion.And in some other cases, errors are less bad, for example in the power off case, maybe the command failed early, and the cluster is still nicely running?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure to understand how to resolve you concern, the problem is that we don't have a way to track state change(at least I wasn’t able to find it, it would be nice if you point me that place if if it exist)
I first do that, but I face a race condition with updating state, for example:
User call start, we fire state change event, before actuality call
Start
, and if listener is fast enough, it could get old state (basically we could loststarting
state in transition fromstoped
torunning
states)Possible solution to that, may be moving state change event firing from
Synchronized
client in toclient
implementation ofmachine.Client
interface. And fire events only after actually state is changed.It is matched, as we send with SSE status reported by
Status()
in addition we also send error message, which come from status change event.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is #3903 (comment) , and I'm under the impression that if you have a terminal showing the stream of events, and another running
crc status
every second, then the state between the 2 will not always match, in particular sometimes the event stream will containState: state.Error
whilecrc status
will not report an error?Guess I need to run both side by side and compare myself to understand better how this works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I understand the current code base is missing a centralized "setState" facility, and that this PR has to find a way of doing this, just trying to understand the limitations in the PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, output should be exactly the same, to get that I add
https://github.com/crc-org/crc/pull/3903/files#diff-e5710ade03da511d4845f380cd63319bf7329e960c1afbbba7fbed4060fcf8edR37
and
https://github.com/crc-org/crc/pull/3903/files#diff-e5710ade03da511d4845f380cd63319bf7329e960c1afbbba7fbed4060fcf8edR44-R50
Otherwise we need to move event triggering deeper in crc code, and fire event after execution of pice of code which change
crc status
output.And it is a question for all, should we integrate events in more depths of CRC core(which leads to spreading events over our codebase) or keep as is( in relative hi level Client interface implementation )?
@gbraad @cfergeau @praveenkumar WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say we can go with this for now, more details about the design choices, the implementation limitations, ...
One thing we can explore over time some centralized tracking of the cluster state, a
StateTracker
type, we'd add some calls toStateTracker.SetState(...)
in the appropriate places, theStateTracker
would fire events on status changes, it could have aCanChangeState(state)
helper if needed to know if callingStop
will result in an error or not, ...As long as all
crc
commands do not go through the daemon, it could do some polling to reconcile the internal state with changes through externalcrc
calls, ...