Frequent "None" usage #39
Comments
The same problem is plaguing users of the Home Assistant integration as well, though this lower level report is extremely helpful. I'm pretty sure the None is exactly what the API is returning. You may be able to repoll and get a correct value, but I haven't tested whether that actually helps yet. The app does this too sometimes, so I'm pretty sure it's an underlying API issue.
So I altered my code to ignore "None" results (perhaps I'll add retries later), and that at least smooths out the data for the Smart Plugs. However, the Utility Connect is still reporting inconsistent values. Here is a comparison of the data returned by the Sense I have installed and the Utility Connect. Note that I have solar panels, which is why it goes negative during the day. You can see how the Sense data is smooth, but the Utility Connect bounces up and down minute to minute:
Well, I don't know what's going on, but here is some more data. At fifteen seconds after each minute, I call get_device_list_usage with Scale.MINUTE.value for a timestamp at the top of that minute (zero seconds). I print that out, along with the wattage (channel.usage * 60000.0). Then I call get_chart_usage for the previous minute with Scale.SECOND.value and look at the returned array of values.

When the data is good, I get 60 values back in that array, and if I sum them up the result is pretty close to the value I got from get_device_list_usage. Here's an example output:

get_device_list_usage: 0.027968704656530306 -> 1678 W

When the data goes bad, it gets interesting. I get a bad value from get_device_list_usage, and I also get fewer than 60 results from get_chart_usage. If I sum the get_chart_usage values and then scale the total up as if there were 60 of them, I get the right value:

get_device_list_usage: 0.01881660398338903 -> 1129 W (WRONG)

The get_chart_usage data can also go bad in a different way, where it has a number of zeros instead of real values. If I remove the zeros and scale back up to 60 values, I again get the right value:

get_device_list_usage: 0.02685500121166088 -> 1611 W

So I don't know what is going on with my Vue Utility Connect. Both the get_device_list_usage and get_chart_usage calls are returning invalid data: the chart data has entries missing or replaced with zeros, and the device_list_usage value is wrong whenever get_chart_usage returns missing data. Emporia support had me re-provision my Utility Connect with PG&E, but that didn't fix anything. Is anyone else seeing this, or is it just me?
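For anyone who wants to reproduce that comparison, it boils down to something like this (a sketch only; the pyemvue calls are written from memory, and the device/channel selection and credentials are placeholders):

```python
import datetime

import pyemvue
from pyemvue.enums import Scale, Unit

vue = pyemvue.PyEmVue()
vue.login(username='me@example.com', password='hunter2')  # placeholder credentials

device = vue.get_devices()[0]   # placeholder: pick the Utility Connect here
channel = device.channels[0]    # its combined usage channel
now = datetime.datetime.now(datetime.timezone.utc)
minute = now.replace(second=0, microsecond=0) - datetime.timedelta(minutes=1)

# Reported usage for that minute (kWh), and the same value as average watts.
usage = vue.get_device_list_usage([device.device_gid], minute,
                                  scale=Scale.MINUTE.value, unit=Unit.KWH.value)
minute_kwh = usage[device.device_gid].channels[channel.channel_num].usage
watts = None if minute_kwh is None else round(minute_kwh * 60000.0)
print(f'get_device_list_usage: {minute_kwh} -> {watts} W')

# Per-second data for the same minute; a healthy response has 60 entries.
seconds, _ = vue.get_chart_usage(channel, minute, minute + datetime.timedelta(minutes=1),
                                 scale=Scale.SECOND.value, unit=Unit.KWH.value)
good = [v for v in seconds if v is not None]
print(f'get_chart_usage: {len(seconds)} values ({len(good)} non-None), sum = {sum(good)}')
```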
I've had all sorts of bad data from the API on the Home Assistant side, and a number of people are reporting worse experiences than I've had personally. I'm led to believe it's something with their backend, with loading/retrieving values just sometimes being delayed. I'm guessing all the values from the device are thrown into a queue and loaded into the database as fast as the queue can process them, but sometimes it runs a bit slow and you can see a minute or more of delay, so when you query for that data it has defaulted values (None or 0) until the backend can catch up and put the correct values in there. I've seen this with their app, where it will report 0 for the live 1-second data on both the overview page and the charts, but the historical chart data a few seconds in the past is properly populated. I haven't had the opportunity to investigate the app when the delay is larger, but I also don't personally pull data more fine-grained than the last 1-minute usage.
So here is the code I'm using to get a more accurate value:
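In outline it looks like this (the snippet below is a sketch of it rather than my exact code; pyemvue signatures are from memory, and the drop-zeros-and-rescale step is the workaround described above, not a proper fix):

```python
from datetime import timedelta

from pyemvue.enums import Scale, Unit

def corrected_minute_kwh(vue, channel, minute):
    """Estimate one minute's usage (kWh) from the per-second chart data,
    ignoring the missing/zero entries the API sometimes returns."""
    seconds, _ = vue.get_chart_usage(channel, minute, minute + timedelta(minutes=1),
                                     scale=Scale.SECOND.value, unit=Unit.KWH.value)
    good = [v for v in seconds if v]  # drop None and the zero placeholders
    if not good:
        return None  # nothing usable this minute; skip it (or retry)
    # Scale the partial sum back up as if all 60 seconds were present.
    # Note: a circuit that is genuinely idle also reads zero, so this is a
    # workaround for the bad data, not something that is always safe.
    return sum(good) * (60.0 / len(good))
```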
Referenced commit: "…onse received from emporia (api internal bug)"
OK, I have performed more testing, and this morning I caught a "bad" API response with some test code (I don't have a good HA / Python dev setup). Same request: the response to DeviceListUsages, called on a minute basis. I had to run it for almost 24 hours yesterday to catch one bad message; it happened around 6:30 am Eastern time.

Second thing: I am not a Python expert (read: I am not good), but I added a very hacky, improper error check to a fork to test. So far my meter data for the last 24 hours has been clean. I know what I did isn't right, but it dumps the bad message instead of putting a zero in the log, and it will probably also dump correct messages for plugs and other devices. Since I did this, my meters in HA have been clean: no zero readings in the 24 hours run so far, where I used to get many a day. I think the data gaps you can see (around 6 and around 4 in meter 1) are probably the check working; I'm not sure.

I added the check to my PyEmVue fork for a quick test, around line 104. What I really wanted was: if len(channel usages) == len(requested channels from the device data) and the usage for the first meter != null. But I didn't think there was a global in PyEmVue holding the device config data, and I didn't want to mess with the HA code for a quick test, or make another REST call every time just to sanity-check the API return (and I'm pretty sure the device data is cached anyway).

I think it would be nice if the main developer could add a proper check in PyEmVue (where it could retry the API command with a retry limit) and then pass valid data back to HA, rather than showing a data gap. During certain times of day I've seen the same request have to be made as many as 5 times, though some of that could be because of additional issues with another third-party API I found first.

Neither here nor there, but I removed all my plugs from my Emporia account today. It seems the frequency of "bad API" returns goes up with the number of devices you pass into the getDeviceUsages API call. Also, I was thinking: is everyone (including Emporia) hitting their API server right at the change of the minute? If so, could we help things out, and hopefully get a lower percentage of bad messages, by requesting 30 seconds into the minute instead? (It may already do this.) "We" being the royal we, of course. :)

Cheers. Hope this helps get the issue properly resolved. I'll keep my test running and monitor it to see if any other "bad response" cases exist (JavaScript doing the API request through a proxy to get rid of CORS). Sorry for getting my commit tagged above; I put a comment with the issue number in my fork, and git does a lot of magic. :)
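The quick hack amounts to something like this (a sketch, not the exact lines from my fork; attribute names are as pyemvue exposes them on the dict returned by get_device_list_usage):

```python
def first_meter_usage_missing(device_usage_map):
    """Hacky check: treat the whole response as bad if the first device has
    no channels, or any of its usage values came back as None
    (a JSON null from the API)."""
    if not device_usage_map:
        return True
    first_device = next(iter(device_usage_map.values()))
    if not first_device.channels:
        return True
    return any(channel.usage is None for channel in first_device.channels.values())
```

If it comes back True, I just dump the message instead of logging a zero.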
If any of the Emporia smart guys are here... here are the headers from the message that failed. Would be nice if..
Thanks. (Worth a try.)
24 hours in... the data filter looks good. I want an email every time it blocks something :) — any easy way to do that in Python / HA? I'll add retries to my test code and work on better bad-packet logic until a better developer can help (I don't care about my HA data yet). Data gaps and "unknown" entries in the logs really bother me. The test to see if any other failure cases exist is still running as well. I have 6 plugs on my desk; I'll plug them in around the house, make a new account, and see if the plugs are more problematic with 6 devices. Once again, thanks!
Sorry all, my OCD is killing me... I added some ugly retry code and it's running now. Hopefully it works. I would like to sit down with a Python developer and ask why && and ++ weren't good enough for them. Lol. It isn't really helpful to pound a struggling server 10 times to get valid data when it's already struggling, but I tried; each instance of their app is already setting their server up for a stress test. I think "usage for the first meter in the response is null" is a valid check for all messages, which I can implement without help, but for this test the code as it is will do what it needs to do. It will cause a lot of retries if anyone tries it with plugs or single-channel devices on their account. Set it going at 9:10 pm Eastern; hopefully no zeros or data gaps :). Will update around this time tomorrow.
The fix looks good... 6 hours run. I changed the code to detect "null" in the JSON, which gets changed to None by the JSON parser in Python. It should work for plugs and single-channel devices as well.
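Conceptually the retry wrapper is just this (a sketch; the validity test is whatever check is in place at the time, and the retry count and delay are arbitrary):

```python
import time

def get_usage_with_retries(vue, device_gids, instant, scale, unit,
                           is_valid, attempts=3, delay=2):
    """Re-request DeviceListUsages until the response passes the validity
    check (i.e. no JSON null that the parser turned into None), up to a
    small retry limit."""
    usage = {}
    for _ in range(attempts):
        usage = vue.get_device_list_usage(device_gids, instant, scale=scale, unit=unit)
        if usage and is_valid(usage):
            return usage
        time.sleep(delay)  # give the backend a moment to catch up
    return usage  # still bad after all retries; the caller logs a gap
```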
If anyone wants to try it, replace your manifest.json requirements line with: "requirements": ["git+https://github.com/DabblerIOT/PyEmVue"]. I will put in a pull request in a couple of days. I see many issues logged under ha-emporia-vue that this would address. Moving on to look at why the timestamps and timing of updates are so wonky (I want to see it for myself), and thinking about what I would "want" to see given the issues. Cheers. Sorry to spam everyone; just glad to have clean data being logged now.
Caught the API server responding with HTTP error 500 for a couple of short periods today, short being 10-20 seconds, e.g. from about 16:37:23 to 16:37:50 GMT. The data shows up as "unavailable" in the log when this happens, and there is a gap in the history. This is a different issue than the 0 and None values in the log. Looking at the code and thinking about adding some retries for this as well. Sigh. Caught it with both pyemvue and my other API test code that was still running.
Running with retry code in place for failed HTTP responses. (401 Unauthorized is handled in the request logic; that's what happens when the token expires and a re-login needs to be done.) So only a 200 gets processed; anything else gets a short delay (I assume this is on its own thread, because it's all synchronous). I will probably change it to sleep(5) or accomplish the delay another, better way (may need help). Hopefully the API doesn't have more failure cases. I'll monitor it for a couple more days and leave my test code running (it's easier for me to see the log data and response particulars in the JavaScript console). The code is this now.. I may need to wrap all requests in some kind of retry mechanism, but from other work I've only ever seen this one API function fail multiple times a day.
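The shape of it is roughly this (a sketch, not the exact lines; in reality it wraps the request pyemvue already makes rather than being a standalone helper):

```python
import time

import requests

MAX_ATTEMPTS = 5
RETRY_DELAY_SECONDS = 5

def get_json_with_retries(session: requests.Session, url: str, headers: dict) -> dict:
    """Only process a 200; anything else (e.g. the occasional 500) gets a
    short delay and another try. 401s are handled by the re-login logic
    elsewhere, so they are not special-cased here."""
    response = None
    for _ in range(MAX_ATTEMPTS):
        response = session.get(url, headers=headers)
        if response.status_code == 200:
            return response.json()
        time.sleep(RETRY_DELAY_SECONDS)
    response.raise_for_status()  # give up and surface the last HTTP error
    return {}
```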
Good test data after midnight. The code above was generating an exception on a 500; I moved raising an exception on an HTTP error response until after all the retries, so it only throws if it gives up on an error code. Running since late yesterday, and the data is good now: no None, no Unknown, no holes. Now I'm starting to track down why the one-minute data isn't consistent, because the retry logic should only kick in every now and again, and in typical fashion a single retry should be enough to get valid data. I added a bunch of integrations, and my data is now only ever a couple of seconds off. Maybe not enough resources on my VM? Who knows :). Data attached. I also added a retry if we get a blank response (!response.text). Cheers!
#139 in the HA project is related to the odd frequency of requests and tracking; I think it's in the HA scheduler somehow. Data is still good. I think I have the failure cases and retries sorted out now, so if there are no internet issues the data is good. Data for today (the one missing data point was a router reboot here, which is what it should do). I'll give it until Monday and keep watching it while working on and off on the history timing issue; I need to learn more about how the integration works with HA. I'm pretty sure that issue isn't in this project, which is pretty simple. I'll hopefully post a clean weekend of readings on Monday before putting in the pull request. As always, cheers!!
Data still good (almost 2 days). I have seen some occasional errors where the sensor value comes back to the HA code as None and the HA code replaces it with the last value. Not sure how this is happening. I'm still sorting out how to log, and see the log from, a Python module in HA... the struggle is real. Also still sorting out how HA fits together with the Docker containers and all that; it sounds like a good idea but it makes things hard to work on for a newbie. Maybe my code that detects a 500 error isn't right, but even if it didn't detect a 500, I'm not sure how a None value could get back to HA with the error-check code in place (it shouldn't be possible). I'll continue to sort it out; it may take me longer (sigh). I'm getting frustrated, so I'll probably code my dashboards next and come back to this fresh in a week.

Neither here nor there, but the daily-sensor update needs to be called in sync with the 1-minute sensor update function. What happens now is that while the 1-minute update is waiting for the API response, the add-data-to-daily async task fires and does its update with minute-old data. Probably not a big deal (i.e. it works), but it doesn't really do what you would expect. My HA module is all hacked up in place now, lol. I guess I should fork it and put the changes in a pull request. Cheers.
I believe a device other than the first was having a problem getting data from their backend and returning "null". I changed the code to properly check for, and deal with, devices other than the first one returning 0, null, or None. A couple of hours of running so far without unexpected values being passed back to HA. I "think" that if no data comes back after all the retries, an "unknown" will be logged and there will be a gap in the historical data (I haven't seen this yet; it would only happen if the API were down completely). Cheers. Hopefully that is it and there isn't yet another dumb way the device API can fail. (Not much hair left!)
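The check now walks every device instead of only the first one, roughly like this (a sketch; attribute names as pyemvue exposes them, and the optional zero check is limited to devices you list explicitly, since a plug can legitimately read zero):

```python
def usage_response_is_valid(device_usage_map, expect_nonzero_gids=()):
    """Reject a response if any device has no channels or any channel's usage
    is None; optionally also reject a 0 reading on devices (like the mains)
    that should never legitimately read zero."""
    if not device_usage_map:
        return False
    for gid, device in device_usage_map.items():
        if not device.channels:
            return False
        for channel in device.channels.values():
            if channel.usage is None:
                return False
            if gid in expect_nonzero_gids and channel.usage == 0:
                return False
    return True
```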
Changes look good so far. I need to verify the data at times when I think the API servers are under more load and unhappy; I think mornings US/Eastern are worst for errors (backups maybe? Or load from other places?). Other than some very odd responses this morning, the API has been very stable. I think all the retries and logic will get us clean data now :). I have my life back. Lol. Cheers.
The error-check code didn't catch the case where the first device is bad and the second device is good. Fixed (it was missing a break). Also caught a second-device-bad error in my test code this morning, at about 10 am Eastern.
Put a pull request in for the code fixes. My fixes may need to be adjusted a little, but the logic is good and works. It's fixed for me, in the sense that I'm running off my changed repo right now.
Running 0.18.1 now. I'll watch the data closely this next week. The code looked good to me (pretty sure it will accomplish the goal). :)
Running it as well. My only concern is that it is spamming my output with "Attempt 1 to get device usage".
I am clean overnight running 0.18.1. Thanks magico13... you are going to have quite a few issues to close on your HA project (I think this issue is duplicated there many times). I'll open an issue on the HA project with a couple of things I found there (a units fix, and how minute data is added to the daily data).
Been running 0.18.2 and the latest HA release. All good. I'm going to stop checking here for now. Thanks again for the integration and the hard work. Cheers.
With the latest updates from both projects I'm seeing missing data again. I didn't look too closely, but unless the API is failing really badly right now, the last changes from 0.18.1 to 0.18.2 weren't helpful. I don't like the idea of taking half of a valid message and processing it; I think if you don't get a 100% complete, valid response back from the API you should discard it. I'll keep running for a while before falling back.
One device good, one device bad in the same minute. If you pass back only one device, this is what you will get. :( Data attached:
@magico13 are you going to revise the code, or should I fork, sort out a dabbler fix, and then put in a pull request? Let me know. I have had more occurrences; fewer than before, but still about 1 per day. 0.18.1 and my fixed fork were at 0.
Sorry, my notifications must have changed because I'm only now seeing this. Glad it's improved from before at least, but it sounds like additional retries for the usage call may be warranted; I don't think my code tries for as long as yours did. I don't think the changes between 0.18.1 and 0.18.2 would result in much difference: with .1 you'd get a fully empty data set back and it would set everything to None/unavailable, but with .2 only the one(s) still missing after all of the polling attempts will be None. The Home Assistant integration has some code to take the first None it sees for a sensor and just duplicate the previous data; the second None will be kept, since the device might be offline. With all the retries built in now I might change that and either throw out all of the data or at least ignore any None data within a partial response. You have to be careful about not accepting partial responses, because if someone unplugs a smart outlet (or it fails, like happened to one of mine) they don't want their main Vue to suddenly stop reporting. I likely won't get a chance to do more work on it until sometime this weekend; you're welcome to take a crack at it before then if you want.
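That None handling is essentially this (a sketch of the policy, not the actual integration code):

```python
def next_sensor_value(new_value, last_value, none_streak):
    """Carry the previous reading through a single missing poll, but let a
    second consecutive None through so a genuinely offline device ends up
    showing as unavailable. Returns (value_to_report, updated_streak)."""
    if new_value is not None:
        return new_value, 0
    none_streak += 1
    if none_streak == 1:
        return last_value, none_streak  # duplicate the previous data once
    return None, none_streak  # keep the None after repeated misses
```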
I didn't think of that use case... maybe treat panel meters like first-class citizens, always expecting valid data on the mains, and plugs as second-class citizens, where a data issue (switched outlet, wife unplugged a plug to turn something off, etc.) won't cause data loss on the mains? I was mostly talking to myself while sorting this out :) and assumed you were just waiting to see the pull request. Problems like this are a pain. I have a stack of plugs I can add to things if we need to test or debug; I've been using them when trying to figure out where usage goes on certain circuits. Any thoughts on making one request for panel meters with one set of rules, and another request for plugs? It may stress the API less if we request fewer meters at a time (a smaller request should reduce the chance of getting missing data in any one request?). Cheers, and thanks again for your hard work on this.
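For example, splitting the poll in two, something like this (a sketch; how you decide which devices count as mains is an open question, here it's just a hard-coded set of placeholder GIDs):

```python
from pyemvue.enums import Scale, Unit

MAIN_PANEL_GIDS = {12345}  # placeholder GIDs for the Vue 2 / Utility Connect mains

def poll_split(vue, all_gids, instant):
    """Poll panel meters and plugs in separate, smaller requests so a bad or
    missing plug reading can't invalidate (or hold up retries for) the mains."""
    mains = [gid for gid in all_gids if gid in MAIN_PANEL_GIDS]
    plugs = [gid for gid in all_gids if gid not in MAIN_PANEL_GIDS]
    mains_usage = vue.get_device_list_usage(mains, instant,
                                            scale=Scale.MINUTE.value, unit=Unit.KWH.value)
    plugs_usage = vue.get_device_list_usage(plugs, instant,
                                            scale=Scale.MINUTE.value, unit=Unit.KWH.value)
    # Be strict (validate and retry) with mains_usage; be lenient with
    # plugs_usage, since an unplugged or switched-off plug has no data.
    return mains_usage, plugs_usage
```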
@magico13 the 0.18.4 retries are good... I'm getting many (say 100 per day) bad responses on the plugs and some (4 per day) bad responses on the Vue 2, with no devices actually down. I can unplug the plug on my Peloton coat rack to test if needed. :) No data gaps or unknowns on the plugs (2 little loads) or the Vue 2s. Just would like to say that I walked at the gym today, and my wife is recovering from an injury and runs an electric heater while icing.. :(.
I have three different types of Emporia devices in my house, all on the same account: a Gen 2 Vue, a Utility Connect, and Smart Plugs.
I poll the devices once per minute and log the data.
The usage data for the Gen 2 Vue circuits is reported accurately.
However, for both the Utility Connect and the Smart Plugs, the usage data gets reported as None frequently - here is an example:
And here is the channel object, compared between a valid and an invalid result:
Based on my historical graphs, it looks like this started happening on August 11th, around 9am Pacific Daylight Time.
I upgraded from 0.16.0 to 0.16.1 and it didn't fix it.
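For reference, the polling script is essentially this (a sketch; pyemvue calls written from memory, credentials and logging simplified):

```python
import time
from datetime import datetime, timezone

import pyemvue
from pyemvue.enums import Scale, Unit

vue = pyemvue.PyEmVue()
vue.login(username='me@example.com', password='hunter2')  # placeholder credentials

devices = vue.get_devices()
gids = [d.device_gid for d in devices]

while True:
    now = datetime.now(timezone.utc)
    usage = vue.get_device_list_usage(gids, now, scale=Scale.MINUTE.value, unit=Unit.KWH.value)
    for gid, device in usage.items():
        for num, channel in device.channels.items():
            # channel.usage is the last minute's kWh; this is the value that
            # intermittently comes back as None for the Utility Connect and plugs.
            print(now.isoformat(), gid, num, channel.name, channel.usage)
    time.sleep(60)
```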