-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Logbook full of "unavailable" entries; log shows "Timeout fetching resource status" #155
Comments
OK, I've now looked through the code and the MAC address appears to only be used to set a unique ID for HA, so my theory about it being an ARP issue is a non-starter. |
Unfortunately, this is "normal". I have looked into it several times and OS just does not consistently respond to all calls. I get it too, about 3-4 times a day. Since it is sporadic, it doesn't really effect the operation. The amount you get also seems to depend on how good your connection between the two devices are. One thing I might look at is to downgrade the log message to a warning instead of an error. Also note that I have not tried any firmware beyond 2.1.9 (4). I will have to find some time to upgrade and test. |
Thanks for the reply, Vincent! My OS is wired directly into my switch (as is my HA server), so I don't think it's the network. I had considered that the OS hardware is only 10/half, so I reset the port statistics a couple days ago to see how quickly errors accumulated. But the stats from don't indicate that many collisions or errors as a percentage of packets sent/received:
If this only happened "a few times a day" for me, I'd probably be OK ignoring it. But, looking at the last 61h (= 2d13h, the same time since I cleared the counters on the switch port) of my HA logs:
As you can see, it happens a lot (and today's not even over yet), and of course each one of those timeouts corresponds to a ton of Logbook entries, and it makes my History graphs look bad (as you know, this is only one of many associated with OS). Here is the last hour, as an example: I'm very interested in trying to at least improve the situation, whether that's through retries, caching responses, etc. The first thing I wonder: would reducing the stat query interval from every 5s to every 30s make a difference in how often a timeout occurs? Is this easily configurable to test? |
changing the interval should be as easy as changing this line:
However this will mean all switches, sensor etc take much long to update, which might not be the best experience. Another thing you could take a look at is try using the WIP MQTT branch and see if that works well for you. I haven't taken a look at it in a long time so it might need a bit of updating if you are comfortable with code, which seems like you are. This will make it a push rather than a poll system. There is a PR for this: #111 You will need to have MQTT setup with HASS and also set it in the settings in OpenSprinkler . I think minimum is 2.1.9 (4). |
Ok, I will look in to this a bit a more next week. Thanks again! I’m reluctant to upgrade the firmware, but maybe I can try 2.1.9(4) and see if it behaves any different than (7). Out of curiosity, what hardware version do you have? Wondering if that’s a factor… |
I am on hardware 3.0 |
I wonder how much difference hardware v3.0 makes... FWIW, I changed
It looked really good for a while there, but the last couple days it has spiked up again (even though I haven't made any changes). I will let this run for another few days, and then try changing the scan interval to 60s to see how that effects things. |
I have this issue as well. The frequency of timeouts seemed too precise. I enabled debug log and tried to find out more. It looked like every 25th request times out with the defaults (5s scan interval, 10s timeout). Like clockwork. Motivated by @eddyg, I tried querying the sprinkler using curl instead and see if I can reproduce this behavior. It turns out I can - it is important to add the timeout option ("-m 10" to mimic the integration). I then played with various request frequency and timeouts. I have concluded my opensprinkler fails to respond in a timely fashion for ~16s every ~125s. It does reply to ping requests, it does accept a connection and does accept the request, but it does not reply for the "black out" period of roughly 16s. What is required is not changing the scan interval, but the TIMEOUT in init.py. The opensprinkler integration is not at fault here, but changing the timeout does resolve the frequent unavailability of entities as well as huge number of entries generated in Home Assistant because of the state changes. I hope this helps. |
Wow, that's some very interesting data, @jandado! Thanks for following up here! 👍 I'm going to try changing the Out of curiosity, which hardware version are you using and which firmware? (I have hardware version 2.3, and continue to run the older 2.1.7 release.) I've noticed that my LCD backlight no longer "dims" even though I have the "Idle Brightness" setting set quite low... I know that used to work before I experimented with running 2.1.9(7)... (I see now that 2.1.9(9) is available...) |
Let me know how you guys go after running for a while, if it works I will add the change to the integration |
I set
I plan to try the 2.1.9(9) firmware and see if it makes a difference since the release notes mention "hopefully improving wired Ethernet reliability", but I had issues last time I upgraded from 2.1.7 where API calls would start returning "result:2" (unauthorized)... |
I have HW version 3.1 - AC, firmware 2.1.9(9). I have had zero failures since the modification (that's almost 5 days now). @eddyg, I would suggest you try the following (assuming the token is still valid): It echos the curl exit code, times out in 1s. The reason for the short timeout it to try to identify, if the sprinkler times out randomly or in larger "windows" as mine. This is what my output looks like after running it for something over 4 minutes: @vinteo, I would say the results are, at this point, inconclusive. I can always make the modification locally, if it only works for me. Let's see if Eddy will be able to find a pattern. I find it odd the change did initially nothing for him (on the 6th & 7th), only to get much worse later (8-10th). It seems like there is something else at play. Cheers, |
I have got an update - I was able to identify the source of my problem. I have long forgot I had MQTT configured on my sprinkler, before I had this integration installed. But I have renamed the host running the MQTT service since then. Fixing the hostname in the sprinkler MQTT configuration resolved the "blackouts". The name lookup was the culprit - I assume there must have been multiple attempts by the sprinkler before it gave up each request, but it blocked replying the http requests for the duration. @vinteo, I no longer need the timeout increased. @eddyg - what's the chance of something similar happening in your env? I believe your FW does not yet support MQTT, but if you use the weather service, IFTTT integration, or NTP - perhaps all of them can be a source of what you experience, if they have the potential to block the reply to the integration request and can occasionally take longer than expected. Just my last two cents. The issue is now resolved for my specific case. |
Wow. That’s some amazing debugging work! |
If the log message was downgraded to a warning would that stop it from showing up in the HA system logs? |
First, let me say that I was very excited to find this Home Assistant integration!
A couple pieces of background info:
I am running with OS hardware version 2.3, with firmware 2.1.7 (I had previously tried upgrading to 2.1.9(7) but ran in to weird issues where after a few minutes, API calls would starting returning "result:2" (unauthorized) which required me to run with the "Ignore Password" option. I also had multiple occurrences of the clock doing a "time warp" (even though I have NTP sync enabled) which resulted in a watering program running twice on the same day!! I eventually downgraded back to 2.1.7 and things are once again stable. I opened a ticket with OpenSprinkler about the weirdness I saw after upgrading to 2.1.9(7), but they had not heard of problems like what I described.)
I recently found this integration and successfully added it to my Home Assistant config. Everything seemed to be working great, but then I noticed my Logbook was full of "OpenSprinkler * became unavailable" followed 5 seconds later by "OpenSprinkler * turned [on|off]" entries.
I checked my Home Assistant log, and it's full of messages like this:
Hoping to reproduce the problem, I've been running this loop (the
pw
hash is for the default) polling the "all" endpoint every 5 seconds:but have not seen any failures.
Any idea what might be causing the timeouts in HA? My first thought is that it could be related to needing to supply the controller's MAC address when you're not running 2.1.9(4) or later, so maybe it's ARP-related. (I have not yet looked at the code to see why this is a requirement...)
The text was updated successfully, but these errors were encountered: