Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to debug issues? #24

Closed
PaulWieland opened this issue Dec 8, 2023 · 24 comments
Closed

How to debug issues? #24

PaulWieland opened this issue Dec 8, 2023 · 24 comments

Comments

@PaulWieland
Copy link
Contributor

Having an issue pairing with HomeKit and wonder the best way to trouble shoot / debug what's going on.

  • I flashed version 0.2.2
  • Vist Device
  • Add Accessory in iOS
  • Scan QR Code
  • Add Anyway

Then an error appears saying that the device could not be added.

I then started over/erased/repeated entire procedure. Failed a second time. But the third time it worked...

@thenewwazoo
Copy link
Collaborator

Unfortunately, the only way to get any feedback from the device right now is serial logs. I have a sneaking suspicion that the ESP8266 devices are running out of RAM when they get a whole bunch of connections from various HomeKit hubs and clients while it's also trying to do pairing. I've seen this fail a couple of times but haven't been able to reproduce it reliably.

@PaulWieland
Copy link
Contributor Author

PaulWieland commented Dec 8, 2023

Ok, FWIW I only have one hub and 3 clients.
I was actually watching the serial log but didn't see any activity. I'm not sure if thats because it wasn't refreshing or if its because the client wasn't actually reaching it. I'll do some more experimenting and report back if I find a common theme.

@thenewwazoo
Copy link
Collaborator

thenewwazoo commented Dec 8, 2023

@PaulWieland what're you using to power your homekit device?

edit: more specifically, I'm asking what the output power rating is.

@vponnusamy
Copy link

vponnusamy commented Dec 9, 2023

Running into the same thing--times out when adding the accessory. I'm about 0/10 so far. I do have LOTS of stuff in my HomeKit...which might be relevant to the hypothesis its lots of connections during the pairing.

Edit: try 11 worked!

@eric-ooi
Copy link

Wondering if this is related to the wifi instability that was mentioned in Discord. When I run a continuous ping on the HomeKit firmware I see significant variance in latency and packet loss / timeouts. When I reflash with the ESPHome firmware and run the same ping, there is minimal latency and no packet loss / timeouts.

@thenewwazoo thenewwazoo changed the title How to debug HomeKit Pairing issues? How to debug issues? Dec 13, 2023
@thenewwazoo
Copy link
Collaborator

Connecting a USB cable to collect serial logs is not ideal (especially when the log view is flaky for unknown reasons). And trying to collect logs after-the-fact is probably going to be less successful.

What I'd like is a rolling log file in flash that can be read back out after the fact. @PaulWieland, the ESP8266 part has 1 MB flash, right? What's the default LittleFS partition space, do you know? Even just alternately writing to a pair of log files and clobbering the older one, or something similarly simplistic, would suffice to give a bit of persistence without making log truncation a hard problem to solve, but perhaps there's something off-the-shelf that's available?

@OleSam85
Copy link

I'm having the same issues as reported in #30, specifically after setting up the device in HomeKit it will work properly for about 1-2 hours (need to confirm exactly how long this takes) but then the Home app will show that the ratgdo accessory is not responding. Adding the accessory to my home has been painless so far (though one time I did have to reboot ratgdo first). Even after it stops responding in HomeKit, I can still access the ratgdo web interface and use that to reset it. After that it seems to be responsive again with HomeKit, but only for a short time (<2 hrs).

@thenewwazoo
Copy link
Collaborator

@usaplanb if you could capture serial output from the device for an entire "cycle" that would be hugely useful. the web-based log viewer is flaky, so it'd require using pio monitor or screen(1) or the like, if that's within your abilities

@OleSam85
Copy link

OleSam85 commented Dec 13, 2023

@thenewwazoo I wouldn't say it's within my abilities but after enough searching I think I figured it out. My first attempt I used the default baud rate which meant I was staring at random characters for 5 minutes wondering if that was correct, and of course while that was happening it went unresponsive in my Home app, so I missed the thing I was trying to see! Then I rebooted it and used the correct baud rate for logging and I'm getting what appear to be correct logs (i.e. readable status updates). Of course since the reboot it's been running for a couple hours trouble-free with my laptop on top of the GDO 🙄. Hopefully the issue reoccurs soon and I can grab the file and share it here. I'm running version 0.2.2 FWIW, but I don't think any of the changes since then would be likely to resolve this issue.

In case it helps anyone else who has virtually no programming ability, here's what I did to log on my Windows machine:

  1. Install Python (just used the Microsoft store to do this quickly)
  2. Run the get-platformio.py script (available here: https://docs.platformio.org/en/latest/core/installation/methods/installer-script.html) to install PlatformIO Core.
  3. Connect PC to the ratgdo
  4. Open a command prompt and enter cd C:\Users\<UserName>\.platformio\penv\Scripts\ (replace with your user directory name). Then enter pio device monitor -b 115200 -f default -f log2file.

And that should be enough to get it to start saving the logs. Now I'm just waiting for something to break, but apparently it'll work reliably as long as you're watching it closely.

@wisteso
Copy link

wisteso commented Dec 14, 2023

Might be able to set up something similar to what exists for serial (PC-hosted log receiver) but use UDP instead. Seems browsers can't receive UDP yet, so would need a different approach on PC end. Easy enough to do on the ESP though...

https://arduino-esp8266.readthedocs.io/en/latest/esp8266wifi/udp-examples.html#complete-sketch

Alternatively, have a generic function that will use MQTT, UDP, or whatever alternative is configured. MQTT seems to rarely be the area needing debugging.

@V025
Copy link

V025 commented Dec 21, 2023

I'm having the same issues as reported in #30, specifically after setting up the device in HomeKit it will work properly for about 1-2 hours (need to confirm exactly how long this takes) but then the Home app will show that the ratgdo accessory is not responding. Adding the accessory to my home has been painless so far (though one time I did have to reboot ratgdo first). Even after it stops responding in HomeKit, I can still access the ratgdo web interface and use that to reset it. After that it seems to be responsive again with HomeKit, but only for a short time (<2 hrs).

+1 …same here., though I haven’t tracked the amount of time it takes for it to go unresponsive. Running v0.6.0, HomeKit shows no response. Can connect via WiFi to reboot ratGDO and it comes back on HomeKit within seconds.

update: it’s been over a week and I haven’t experienced the unresponsive thing again. I did realize I had a couple of UniFi AP FW updates which might have coincided with my couple of previous scenarios where ratGDO went unresponsive in HK (while all my other HK accessories remained responsive & properly functional)

update2: as mentioned after reading blakej's comment further below, #24 (comment) - seems like maybe the reason why I never experienced the unresponsive thing again was that I also locked my ratGDO to one AP and prevented it from roaming.

@nojkepop
Copy link

I'm having the same issues as reported in #30, specifically after setting up the device in HomeKit it will work properly for about 1-2 hours (need to confirm exactly how long this takes) but then the Home app will show that the ratgdo accessory is not responding. Adding the accessory to my home has been painless so far (though one time I did have to reboot ratgdo first). Even after it stops responding in HomeKit, I can still access the ratgdo web interface and use that to reset it. After that it seems to be responsive again with HomeKit, but only for a short time (<2 hrs).

I am also having this issue.

@Damien514
Copy link

Okay, adding my little +1 here.

Running food for some time and then, no answer. Nevertheless, I can't even ping or connect to the webpage to reboot. But I can see that my device is well connected to my WiFi network in my Unifi console. I have a mesh network, and I wonder if the ratGDO only try to connect to the same channel it has been set on, not searching only for the network name. Not sure if it is related, but could be a clue.

@edrikk
Copy link

edrikk commented Dec 22, 2023

@thenewwazoo

I'd love to help test, but as I have a sec v1 + 889LM and am hoping/wishing/waiting for sec v1 support :)

I followed Rachio's attempts to fix their sprinkler system's homekit going into no response state, and a couple of other projects. From what I had seen, this is generally caused by mDNS, although that service is not usually the root cause.

I tried looking at the homkit-ratgdo code, but the mdns / arduino portion being used isn't published unless I missed it somehow.
Regardless, here's a couple of potential areas to look at:

Mixiaoxiao/Arduino-HomeKit-ESP8266#139

Looks like any time there’s a topology change, HomeKit notifies the device to update its pairing info. This happens often now with the new HomeKit topology changes… the bad news is, if the device reboots, or fails to read from the flash for any reason (preemption, watchdog, etc…) it blanks out the “magic” bytes completely invalidating the config/pairing data and doesn’t fix it causing the boot process to think the device ID is invalid, and generate a new one. HomeKit looks for the old device ID, and can’t find it. The device then advertises itself as a brand new device.

I'm not sure if this is exactly our issue here (if it is, there's a patch for it in the above thread), but the root cause could be quite similar. i.e. Mesh/Multi-AP/Wifi Reconnect triggered.

@thenewwazoo
Copy link
Collaborator

thenewwazoo commented Dec 22, 2023

Original developer here.

I picked the Arduino-HomeKit-ESP8266 library basically because it was fast to get to market, but I'm quite unhappy with it (and the Arduino stack generally). I'm already basing this firmware off a fork because the developer has moved on to other projects, and it's looking more and more like I'd need to maintain it myself if I want to end up with a device that's suitably stable. Add to that the fact that the platformio project's support for ESP8266 is similarly lagging for whatever reasons. I don't regret basing these initial versions on the Arduino ecosystem, but this experience is confirming my belief that Arduino is great for prototyping but less great for a polished final product.

As such, I'm currently (literally this moment) working on a version based on Espressif's esp-homekit-sdk library. Espressif has also focused their attention away from the ESP8266 and on to the ESP32 series, but I have hopes that the result will be a bit more stable. I just think that's going to be a better use of my (limited, volunteer) time than trying to beat into shape an abandoned library built on an outdated dependency stack.

@blakej
Copy link

blakej commented Jan 3, 2024

Count me as another multiple Unifi AP user, and also have three ratgdos. After repeatedly encountering the "No response" issues for each device, I isolated it to occurring anytime one of the ratgdos would roam between APs. From there, I could easily repeat the issue on demand by telling an AP to reconnect the client device. The device stays "no response" until rebooted (ratgdo web interface continues to works normal and responds to pings).

Did some additional digging, including to the Arduino-HomeKit-ESP8266#139 issue @edrikk linked to above, and this issue is fixed in the main mrthiti/Arduino-HomeKit-ESP8266 master branch (merged from "fix-unable-reconnect-after-lost-connection" branch). The current platformio.ini file is referencing mrthiti/Arduino-HomeKit-ESP8266.git#fix-conflict-base64-file-name, but if the branch name is removed and the master branch used instead, everything works. The "fix-conflict-base64-file-name" branch as well as "fix-unable-reconnect-after-lost-connection" branch are both merged into master.

After recompiling using the master branch, all of my ratgdos are staying online in HomeKit regardless of AP roaming / reconnects.

@thenewwazoo unless there's an intentional reason for using the "fix-conflict-base64-file-name" branch specifically, I'd suggest swapping to master for now until the time intensive esp-homekit-sdk rewrite is completed.

@V025
Copy link

V025 commented Jan 3, 2024

Ah - no wonder I stopped seeing the issue - I forgot that one thing I did was that I locked my ratGDO to one specific AP in my Unifi settings. Nice find @blakej !

@jgstroud
Copy link
Collaborator

jgstroud commented Jan 3, 2024

Hah.... I had submitted a PR last night to pull in that commit. I closed it without merging though as there are still issues we need to debug. I guess it doesn't hurt to just go ahead and move to that for now as it will help
Reopened #76

@cefoster0
Copy link

Ah - no wonder I stopped seeing the issue - I forgot that one thing I did was that I locked my ratGDO to one specific AP in my Unifi settings. Nice find @blakej !

locking to a specific access point is a great workaround.

@lttlrck
Copy link

lttlrck commented Jan 4, 2024

Here's my datapoint, though it doesn't add much to the above. I have persistent "No response" with ping response times/loss all over the place. I'm using a Eero mesh network and I cannot pin a client to an AP or satellite, but I can see that they aren't moving around the mesh. I also disabled "client steering" to make sure the network isn't pushing them around unnecessarily.

Rebooting doesn't seem to help the situation. I'll wait for the release with the fix @blakej suggests or maybe try it myself. Unfortunately I discarded an old AP last week otherwise I would have set that up as a dedicated dumb network for them 😮‍💨

Congrats on this project, looking forward to getting my doors back up and running!

@emily-pesce
Copy link

I have persistent "No response" with ping response times/loss all over the place. I'm using a Eero mesh network and I cannot pin a client to an AP or satellite, but I can see that they aren't moving around the mesh. I also disabled "client steering" to make sure the network isn't pushing them around unnecessarily.

Same issue and symptoms w/ Eero mesh and my new ratgdo flashed with Homekit firmware. Frustrating.

@riq
Copy link

riq commented Jan 10, 2024

I'm seeing a similar issue with my single AP Ruckus network. Ping is steady and web interface is always available. After rebooting it works a little while before becoming non responsive again. I disabled direct multicast on the AP with no change

@jgstroud
Copy link
Collaborator

A lot of the issues reported here are either now fixed or tracked under more specific issues. So closing this for now.

@jgstroud
Copy link
Collaborator

jgstroud commented Apr 4, 2024

v1.2.0 now stores crash dumps to flash for debug purposes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

17 participants