-
Notifications
You must be signed in to change notification settings - Fork 48
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
broken_config parsing wrong system.config.receive #626
Comments
Some additional information, It turns out that in my case I was able to pass the test by enforcing that logs are sent after the state messages instead of whatever order minimized context switching with my current implementation. Is it a requirement of the protocol to send state logs after the state message? I can't find any information that would indicate this is the case in the current documentation. |
Thank you for the detailed report! I'm starting to investigate... Short answer is "no, that is not a requirement" -- the messages are considered asynchronous in that there's no strict ordering required at that granularity. I don't have an answer for you yet, but I'm starting to dig through it. |
Ok, I have a preliminary fix on this working branch: https://github.com/grafnu/udmi/tree/logordering -- I was able to reproduce by introducing a before-state-send delay, and then the change I have in there clears it up, but if possible I'd like to get verification from you that it fixes the problem on your end. If that works, then I can cleanup and submit a PR proper. |
Thank you for the quick reply. The logordering branch you've provided gets past that hasInterestingSystemStatus(true) check, but it's throwing an error on the checkNotLogged(SYSTEM_CONFIG_APPLY, SYSTEM_CONFIG_APPLY_LEVEL) even though the last one sent was for the config that set the minLogLevel to 100. The system.config.apply message that throws the error has a timestamp of |
Ok, I think I figured out what's going on and just pushed a potential fix to the same branch. Basically, in many situations the timestamps are quantized to a second, but the config updates might happen faster than that so it was intending to reject log messages from before the update but was including them since the (quantized) timestamps were a match. The log handling is a bit tricky because they end up asynchronous and out-of-order sometimes. The fix was just to add a fixed delay before updating the config, which should force log messages to be quantized usefully. If this doesn't work then I might suggest we work out giving me access to the device entry in your cloud IoT instance (really just need the reflector private key you're using) and then I can run the tests locally on my end. But, hopefully this last iteration fixes the issue and there's no more! |
The latest change gets me to the end of the test, but fails at the seconds last line(the checkNotLogged call). It looks like the min_loglevel for the device is set to DEBUG(100), then the test checks that system.config.apply was sent, system.config.receive was not sent, and system.config.parse was not sent(both of which the device sends since it was just configured with a loglevel of DEBUG). Changing the two Thanks again for your help, let me know where I can pass the reflector key to you and I'll get it to you so you can run the sequence on your end if you want. |
I'm running the 1.4.1 branch and am trying to create a device that passes the
broken_config
sequence test, but I've been running into issues and haven't been able to pass the test while others pass fine(extra_config, system_min_loglevel)This is my understanding of what the sequencer tool is doing during the test(up to checking the response to the broken config):
The order of messages during the test from that my device is logging is:
TIME 1: <- config with testing.sequence_name of "reset_config" and min_loglevel of 200
TIME 2: -> state with last_config == TIME1
TIME 2: -> system.config.apply log of 300
TIME 3: <- config with testing.sequence_name of "reset_config", min_loglevel of 200, and system.extra_field of "reset_config"
TIME 4: -> state with last_config == TIME3
TIME 4: -> system.config.apply log of 300
TIME 5: <- config with testing.sequence_name of "reset_config", min_loglevel of 200, and system.extra_field of "reset_config"
TIME 6: -> state with last_config == TIME5
TIME 6: -> system.config.apply log of 300
TIME 7: <- config with testing.sequence_name of "broken_config" and min_loglevel of 200
TIME 8: -> state with last_config == TIME7
TIME 8: -> system.config.apply log of 300
TIME 9: <- config with testing.sequence_name of "broken_config" and min_loglevel of 100
TIME 10: -> state with last_config == TIME9
TIME 10: -> log event with system.config.apply log of 300, system.config.receive of 100, system.config.parse of 100
TIME 11: <- config received equal to b'{ broken because extra_field == break_json'
TIME 12: -> state with last_config == TIME9 and status message
TIME 12: -> log event with system.config.receive of 100, system.config.parse of 500
According to the verbose output of sequencer lastConfigUpdate gets set to TIME10 when system.config.receive from TIME10 is received by the sequencer, and TIME10 is the cutoff time for filtering out logs for the untilLogged call . What I think is happening is the system.config.receive message at TIME10 updates the "filter before" time, then since TIME10 >= TIME10, the call to untilLogged(SYSTEM_CONFIG_RECEIVE, SYSTEM_CONFIG_RECEIVE_LEVEL); after setExtraField("break_json") processes the system.config.receive message from TIME10 and calls checkThatHasInterestingStatus(true), which fails since the state hasn't been updates yet.
The example sequence log in validator/sequences/broken_config/sequencer.log shows that lastConfigUpdate was updated based on the system.config.receive message from T10(2023-03-07T16:40:01Z), the state with the error was sent at T12(2023-03-07T16:40:03Z), but the system.config.receive message that triggered the checkThatHasInterestingStatus call happened at T10:
2023-03-07T16:40:11Z DEBUG validation.feature.sequence stage finished waiting for log category
system.config.receive
levelDEBUG
was logged at 27s after 9s2023-03-07T16:40:11Z TRACE validation.feature.sequence stage resume executing test at 27s
2023-03-07T16:40:11Z DEBUG validation.feature.sequence matching 2023-03-07T16:40:01Z DEBUG system.config.receive: success
Any help with this would be appreciated, thanks.
The text was updated successfully, but these errors were encountered: