broken_config parsing wrong system.config.receive #626

Xelynega · 2023-03-28T22:09:18Z

I'm running the 1.4.1 branch and am trying to create a device that passes the broken_config sequence test, but I've been running into issues and haven't been able to pass the test while others pass fine(extra_config, system_min_loglevel)

This is my understanding of what the sequencer tool is doing during the test(up to checking the response to the broken config):

Sync state/config and check the state last_config matches the config timestamp
Set min_loglevel to 100
Wait for a system.config.apply log message, ignore log messages where timestamp < lastConfigUpdate
Send the broken config
Wait for a system.config.receive log message, ignore log messages where timestamp < lastConfigUpdate
Check that the state has a stored status message with a warning/error(checkThatHasInterestingStatus(true))

The order of messages during the test from that my device is logging is:
TIME 1: <- config with testing.sequence_name of "reset_config" and min_loglevel of 200
TIME 2: -> state with last_config == TIME1
TIME 2: -> system.config.apply log of 300
TIME 3: <- config with testing.sequence_name of "reset_config", min_loglevel of 200, and system.extra_field of "reset_config"
TIME 4: -> state with last_config == TIME3
TIME 4: -> system.config.apply log of 300
TIME 5: <- config with testing.sequence_name of "reset_config", min_loglevel of 200, and system.extra_field of "reset_config"
TIME 6: -> state with last_config == TIME5
TIME 6: -> system.config.apply log of 300
TIME 7: <- config with testing.sequence_name of "broken_config" and min_loglevel of 200
TIME 8: -> state with last_config == TIME7
TIME 8: -> system.config.apply log of 300
TIME 9: <- config with testing.sequence_name of "broken_config" and min_loglevel of 100
TIME 10: -> state with last_config == TIME9
TIME 10: -> log event with system.config.apply log of 300, system.config.receive of 100, system.config.parse of 100
TIME 11: <- config received equal to b'{ broken because extra_field == break_json'
TIME 12: -> state with last_config == TIME9 and status message
TIME 12: -> log event with system.config.receive of 100, system.config.parse of 500

According to the verbose output of sequencer lastConfigUpdate gets set to TIME10 when system.config.receive from TIME10 is received by the sequencer, and TIME10 is the cutoff time for filtering out logs for the untilLogged call . What I think is happening is the system.config.receive message at TIME10 updates the "filter before" time, then since TIME10 >= TIME10, the call to untilLogged(SYSTEM_CONFIG_RECEIVE, SYSTEM_CONFIG_RECEIVE_LEVEL); after setExtraField("break_json") processes the system.config.receive message from TIME10 and calls checkThatHasInterestingStatus(true), which fails since the state hasn't been updates yet.

The example sequence log in validator/sequences/broken_config/sequencer.log shows that lastConfigUpdate was updated based on the system.config.receive message from T10(2023-03-07T16:40:01Z), the state with the error was sent at T12(2023-03-07T16:40:03Z), but the system.config.receive message that triggered the checkThatHasInterestingStatus call happened at T10:

2023-03-07T16:40:11Z DEBUG validation.feature.sequence stage finished waiting for log category system.config.receive level DEBUG was logged at 27s after 9s
2023-03-07T16:40:11Z TRACE validation.feature.sequence stage resume executing test at 27s
2023-03-07T16:40:11Z DEBUG validation.feature.sequence matching 2023-03-07T16:40:01Z DEBUG system.config.receive: success

Any help with this would be appreciated, thanks.

The text was updated successfully, but these errors were encountered:

Xelynega · 2023-03-28T23:11:23Z

Some additional information, It turns out that in my case I was able to pass the test by enforcing that logs are sent after the state messages instead of whatever order minimized context switching with my current implementation. Is it a requirement of the protocol to send state logs after the state message? I can't find any information that would indicate this is the case in the current documentation.

grafnu · 2023-03-29T05:16:05Z

Thank you for the detailed report! I'm starting to investigate... Short answer is "no, that is not a requirement" -- the messages are considered asynchronous in that there's no strict ordering required at that granularity. I don't have an answer for you yet, but I'm starting to dig through it.

grafnu · 2023-03-29T06:00:27Z

Ok, I have a preliminary fix on this working branch: https://github.com/grafnu/udmi/tree/logordering -- I was able to reproduce by introducing a before-state-send delay, and then the change I have in there clears it up, but if possible I'd like to get verification from you that it fixes the problem on your end. If that works, then I can cleanup and submit a PR proper.

Xelynega · 2023-03-29T15:31:03Z

Thank you for the quick reply. The logordering branch you've provided gets past that hasInterestingSystemStatus(true) check, but it's throwing an error on the checkNotLogged(SYSTEM_CONFIG_APPLY, SYSTEM_CONFIG_APPLY_LEVEL) even though the last one sent was for the config that set the minLogLevel to 100.

The system.config.apply message that throws the error has a timestamp of 2023-03-29T15:26:35Z. The state message with the system error status has a timestamp 2023-03-29T15:26:37.310223Z. The state message before the one with the system error status(that sets the min_loglevel to 100) has a timestamp 2023-03-29T15:26:35.530494Z, and the last log message related to lastConfigUpdate set it to 2023-03-29T15:26:35Z. I've attached the full log of the test with -v. Thanks for your help.

udmi.log

grafnu · 2023-03-30T02:06:37Z

Ok, I think I figured out what's going on and just pushed a potential fix to the same branch. Basically, in many situations the timestamps are quantized to a second, but the config updates might happen faster than that so it was intending to reject log messages from before the update but was including them since the (quantized) timestamps were a match. The log handling is a bit tricky because they end up asynchronous and out-of-order sometimes. The fix was just to add a fixed delay before updating the config, which should force log messages to be quantized usefully.

If this doesn't work then I might suggest we work out giving me access to the device entry in your cloud IoT instance (really just need the reflector private key you're using) and then I can run the tests locally on my end. But, hopefully this last iteration fixes the issue and there's no more!

Xelynega · 2023-03-30T15:36:34Z

The latest change gets me to the end of the test, but fails at the seconds last line(the checkNotLogged call). It looks like the min_loglevel for the device is set to DEBUG(100), then the test checks that system.config.apply was sent, system.config.receive was not sent, and system.config.parse was not sent(both of which the device sends since it was just configured with a loglevel of DEBUG).

Changing the two checkNotLogged calls at the end of the test to untilLogged calls results in a consistent pass on my end, so I'm wondering if those are mistakenly swapped.

Thanks again for your help, let me know where I can pass the reflector key to you and I'll get it to you so you can run the sequence on your end if you want.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

broken_config parsing wrong system.config.receive #626

broken_config parsing wrong system.config.receive #626

Xelynega commented Mar 28, 2023

Xelynega commented Mar 28, 2023

grafnu commented Mar 29, 2023

grafnu commented Mar 29, 2023

Xelynega commented Mar 29, 2023 •

edited

Loading

grafnu commented Mar 30, 2023

Xelynega commented Mar 30, 2023

broken_config parsing wrong system.config.receive #626

broken_config parsing wrong system.config.receive #626

Comments

Xelynega commented Mar 28, 2023

Xelynega commented Mar 28, 2023

grafnu commented Mar 29, 2023

grafnu commented Mar 29, 2023

Xelynega commented Mar 29, 2023 • edited Loading

grafnu commented Mar 30, 2023

Xelynega commented Mar 30, 2023

Xelynega commented Mar 29, 2023 •

edited

Loading