mqtt.local went down and didn't come back up cleanly after reboot #1210
The SSH server is also not accepting connections. |
My suspicion is that we finally filled up the uSD card with data. |
Don't wipe anything before backing up the grafana dashboard json! |
Do you hear the bell tolling, @MatthewCroughan? |
So, I think this is working OK. Dunno why we were having a problem; it might just have been the m-DNS. We DO urgently need to look at dealing with the contents of the InfluxDB, as we're at about 75% disk usage. So I'm changing the issue name for now. Change it back if it falls over again. Not sure whether we
|
Will set this up tonight on the NAS, Alex. Perhaps we should really be running a large Grafana server here. |
I said I'd reboot the electricity meter device but forgot, FYI.
On 8 Aug 2019, at 18:15, Adrian McEwen wrote:
I'd had the mqtt.local Node RED open in a tab all afternoon, and just spotted there'd been a bunch of the "failed to connect to host" errors and a few connection refused ones too. Plus the power usage messages aren't being generated and it had crashed, so I don't think it's just an "it might fill the disk" issue
|
I got sidetracked tonight; I'll be providing NFS storage for this tomorrow. You've also mentioned needing storage for other things you're doing. |
If it happens again @amcewen can you check if the IP is still working. |
What do you mean by "check if the IP is still working"? The Pi will still have been on the network when it was reporting the errors, as it'll have had a websocket connection to my browser for the debug output. Happy to check things if I spot the problem (hasn't happened so far today), but not sure what I'm checking ;-) |
Checking that the IP address is responding, rather than checking that mqtt.local is responding |
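One way to run that check is to separate the two questions: does the name resolve at all, and does the broker accept a TCP connection on the raw IP? A minimal Python sketch, assuming the broker listens on the standard MQTT port 1883; `resolve`, `tcp_reachable` and `diagnose` are hypothetical helpers, not part of any tool already in use here.

```python
import socket


def resolve(name: str) -> list[str]:
    """Return the addresses the system resolver gives for `name`, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        if sockaddr[0] not in addresses:  # keep order, drop per-socktype duplicates
            addresses.append(sockaddr[0])
    return addresses


def tcp_reachable(address: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to address:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def diagnose(name: str, ip: str, port: int = 1883) -> str:
    """Compare lookup-by-name with connect-by-IP (1883 is the standard MQTT port)."""
    name_ok = bool(resolve(name))
    ip_ok = tcp_reachable(ip, port)
    if not ip_ok:
        return "host or service not reachable by IP"
    if not name_ok:
        return "host up, but name lookup failing (suspect mDNS/DNS)"
    return "name resolves and IP reachable"
```

For example, `diagnose("mqtt.local", "10.0.100.1")` run at the moment the errors appear would distinguish a name-resolution failure from the Pi or broker itself being down.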
@ajlennon in 2 hours I'm free, so I'll be setting it up then. What is it exactly that you need? |
NFS, or Influx? Because I can probably make a 1TB influx container that's available network-wide if that suits the architecture. |
Dunno really. We need a policy on archiving data in the database... |
Well, it seems to me that it would be as simple as running a single process, making it available on the network, and giving Influx access to storage, wherever that may be. So this could be done one of two ways: NFS + Influx, where the Pis all run Influx servers and clients, and all talk to a single NFS
I don't know how realistic a concern timestamping would be if we were writing to the database through a FUSE filesystem vs. actually writing to Influx via HTTP as intended. I feel they're identical outcomes, but writing to Influx over the network is more the intended use than going through a networked filesystem; I'm sure the timestamps are preserved either way. But what if the NFS were to go down? The failure mode is probably catastrophic for Influx, since it wouldn't know how to handle the filesystem not responding or going missing, or its response would be generic and unhelpful for debugging, whereas the HTTP scenario probably has a sane response and is a documented scenario.
I think we should run Influx in a container on a machine with large storage, rather than providing arbitrary NFS to the Pis for now, to make the problem easy to tackle and to provide some serious reliability. I'll take responsibility for maintaining that storage and server, although I can hand out SSH access to the container to anyone at DoES.
If we want to mesh this setup, we could have a bunch of Pis running replicated instances and a few load-balancer Pis, along with some sort of notification/status reporting that lets us see the status of each server. I don't like the idea of this data going missing, so we do need to set up fail-over. We should be able to handle the load balancing and stuff like that with Balena easily, shouldn't we, @ajlennon? |
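For comparison with the NFS option, "writing to Influx via HTTP" means each Pi POSTs points to the server's `/write` endpoint as InfluxDB line protocol. A minimal Python sketch, assuming an InfluxDB 1.x server; the base URL `http://influx.example:8086`, the database name `energy` and the `power`/`meter` names are hypothetical placeholders, and only float fields are handled.

```python
import urllib.parse
import urllib.request


def escape_measurement(text: str) -> str:
    """Measurement names must escape commas and spaces."""
    return text.replace(",", r"\,").replace(" ", r"\ ")


def escape_key(text: str) -> str:
    """Tag keys, tag values and field keys must escape commas, equals signs and spaces."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(measurement: str, tags: dict[str, str], fields: dict[str, float], ts_ns: int) -> str:
    """Build one line-protocol point; tags are sorted by key, as InfluxDB recommends."""
    tag_part = "".join(f",{escape_key(k)}={escape_key(v)}" for k, v in sorted(tags.items()))
    # Floats only in this sketch: repr() gives e.g. "230.5"; NaN/inf are not valid field values.
    field_part = ",".join(f"{escape_key(k)}={float(v)!r}" for k, v in sorted(fields.items()))
    return f"{escape_measurement(measurement)}{tag_part} {field_part} {ts_ns}"


def build_write_request(base_url: str, db: str, lines: list[str]) -> urllib.request.Request:
    """POST /write?db=... with newline-separated points (timestamps default to nanoseconds)."""
    query = urllib.parse.urlencode({"db": db})
    body = "\n".join(lines).encode("utf-8")
    return urllib.request.Request(f"{base_url}/write?{query}", data=body, method="POST")
```

Sending it is then `urllib.request.urlopen(req)`; InfluxDB 1.x replies 204 No Content on a successful write, and an unreachable server surfaces as an explicit connection error the client can retry, rather than a stalled filesystem.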
First, we want to ask @goatchurchprime whether we want to retain all the data or bucket it up somehow for archival. |
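If the answer is to bucket the data, the basic operation is downsampling: replacing raw readings with one aggregate per time window. A minimal Python sketch, assuming readings arrive as `(unix_seconds, value)` pairs and that an hourly mean is an acceptable archive; `downsample` is a hypothetical helper. InfluxDB 1.x can do the same server-side with retention policies and continuous queries, which may be the better fit once a policy is agreed.

```python
from collections import defaultdict
from statistics import mean


def downsample(points: list[tuple[int, float]], bucket_seconds: int = 3600) -> list[tuple[int, float]]:
    """Average (unix_seconds, value) readings into fixed windows, keyed by window start."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in points:
        window_start = ts - (ts % bucket_seconds)  # align to the start of the window
        buckets[window_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

Keeping the min and max per window as well would preserve short spikes in power usage that a mean alone hides.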
The server in question is already mirrored across two 4TB drives, although we only have 4TB in that server at the moment, which may need to increase, and there's no offsite backup. I have a bunch of spare 1TB drives if we want to set up a decentralized 1TB setup across multiple nodes with Pis. @ajlennon How much data is currently in use, given that we're having trouble with it? I'm going to guess 32GB? What's the rate at which we were accumulating data? 16GB/month? In fact, the data rate should be a metric in Influx itself, if we could prevent that from becoming a feedback loop, so we can see how fast we're ballooning our storage. |
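The questions about how much data and at what rate boil down to a runway estimate. A minimal Python sketch; the helper names are hypothetical and the numbers in the usage below are illustrative, not measurements from this system. `shutil.disk_usage` is the standard-library call for reading a filesystem's total, used and free bytes.

```python
import shutil

SECONDS_PER_DAY = 86_400


def growth_per_day(samples: list[tuple[float, int]]) -> float:
    """Average growth in bytes/day from (unix_seconds, used_bytes) samples, oldest first."""
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need samples spanning a positive time interval")
    return (used1 - used0) / ((t1 - t0) / SECONDS_PER_DAY)


def days_until_full(used_bytes: int, capacity_bytes: int, bytes_per_day: float) -> float:
    """Days of headroom left at a constant growth rate (inf if usage is not growing)."""
    if bytes_per_day <= 0:
        return float("inf")
    return max(capacity_bytes - used_bytes, 0) / bytes_per_day


def current_runway(path: str, bytes_per_day: float) -> float:
    """Runway for the filesystem holding `path`, e.g. the InfluxDB data directory."""
    usage = shutil.disk_usage(path)
    # `free` excludes blocks reserved for root, so total - free is the conservative "used".
    return days_until_full(usage.total - usage.free, usage.total, bytes_per_day)
```

For instance, 24 units used of 32 at 0.5 units/day gives `days_until_full(24, 32, 0.5) == 16.0`; feeding periodic `shutil.disk_usage` samples into `growth_per_day` replaces the guessed rate with a measured one.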
But if mqtt.local is responding then presumably the IP address is also responding, no? |
In any sane world that would of course be true, but no. I’ve chatted to @goatchurchprime about this, as sometimes the m-DNS mapping somehow fails but the IP itself is reachable. So I’m interested to know if this might be happening here |
@ajlennon @amcewen I think that's because of the way the router is working. There's some sort of cache; I'm not familiar with why this happens, and it'll be a setting somewhere in the networking hardware we have. The gateway remembers the MAC address of the hardware unit, in this case the Pi, and responds on both hangspot.local (its previously recorded hostname) and mqtt.local. This is a caching feature, somewhere. |
Yes, the mDNS mapping doesn't always work, so there are times when the mDNS doesn't work but the IP address does. However, I've not encountered the reverse, where the mDNS works but the IP address doesn't. (Mostly because that's impossible :-D) When I see the errors, they're transmitted over the network to my browser, so the IP address must be responding, no? They show up in the debug window of Node RED. I'm not doing any mDNS lookups. |
Maybe I’m misunderstanding. Is your nodered flow talking to the IP address or MQTT.local? |
It's not my nodered flow, it's whatever you or @goatchurchprime set up. Have had a look and it seems to be talking to |
And.. on further discussion I've removed those IP addresses from the DHCP configuration, but we can say that those two IP addresses are allocated to this purpose, so should be manually assigned on the box itself (doesn't matter to me if you don't use both of them but I'll record them as being for this purpose on the network documentation). |
Things that previously used This device was previously accessible at
My resolv.conf now shows I have no idea how the router could have anything to do with this other than DROPPING the packets that are related to |
If we've put
This is definitely the problem I'm observing, as I've had to install Now, if this is true, we are in a situation where every device must install something equivalent to Somewhere in base networking protocols, without avahi or mDNS hostnames are transferred to the router. If our domain is set to If we set the domain to .local and have a device that's not running mDNS with a hostname of however if you try to look up Not using |
Now, what has occurred is that you cannot ping hostnames unless you have an mDNS daemon installed on your system, and vice versa. This is not the way it should be done, and it explains why all the devices that had .localdomain are no longer visible even to the router itself. All we have done is invalidate the utility of the router's DNS, as it can no longer report back a lookup of a hostname at all. When you run
@goatchurchprime This is why .localdomain is a thing, or exists at all. So that devices without |
I can have a look at this later today, but UniFi supports rebroadcasting mDNS responses so that they still work in this case |
"you cannot ping [local] hostnames unless you have an mDNS daemon" Given that's the whole point of mDNS I don't think there's anything particularly non-standard going on here. Having the router's internal hidden DNS proxy also happen to return results for things random people have told it on DHCP sounds a bit more non-standard but what do I know? I've turned back on the DHCP results showing in DNS and set the network's domain to |
@johnmckerrell What I meant to say is that you can't ping DNS hostnames (hostnames from DHCP leases, which were a thing before mDNS existed) if you use mDNS on your system. That's a problem, since whatever has been changed means you can't:
UNLESS you have an mDNS daemon on your computer. And that will only respect mDNS is not the only way of getting a hostname, it's fairly modern and it just makes things easier when it's added onto a network. By using A person was trying to use Alex's printer earlier but couldn't because Devices that do not have an mDNS daemon cannot share their hostnames on the network in this configuration. No mDNS daemon on your system = can't see anything
|
@MatthewCroughan given you were talking about ARP records yesterday it seems like this is new knowledge to you too. I have already made the changes to mostly re-enable what we had previously just with a network domain of |
@johnmckerrell I'm not trying to teach you anything. I've just been discussing it all night with a friend online and am coming to realise why localdomain is a thing. I'll curb the enthusiasm, sorry :) The ARP record comment yesterday was made before reading into any of this, or looking at my own pfSense and reading its documentation on how mDNS, caching options and more work. The Ubiquiti firmware looks like it has way more niche and non-standard features though, so there are probably a million things going on that I have no idea about. |
@johnmckerrell, you said:
Does that mean that
|
@amcewen I also said "And.. on further discussion I've removed those IP addresses from the DHCP configuration". It seemed like the box was statically configured, and to help with portability elsewhere we thought that would be best, but it seems that might not be the case. |
Not by me. @goatchurchprime? @MatthewCroughan ? |
@ajlennon @johnmckerrell Are we saying that there's a box somewhere, running an mDNS daemon as mqtt.local and statically configured, that is not @ajlennon's Balena Pi and that we're otherwise not aware of? |
No, I don't think so.
I think when we looked, the wired interface had 10.0.100.1, and the WiFi one was trying to get it and having issues, so we figured that the wired one was manually configured. It seems like that might not be the case? |
Historically mqtt.local has changed its IP address - I think you found this @amcewen My understanding is that it's changed its IP address again. My belief is that it is picking up an IP address from the DHCP server on the network unless somebody else has been in there and changed things around. I can double check this tomorrow. |
The 10.0.100.1 and .2 addresses have been allocated for this use (stuck on a wiki), so please do use them. Or I can put them back into the DHCP settings if we prefer.
|
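Reserving 10.0.100.1 and 10.0.100.2 for manual assignment only works if they sit inside the LAN subnet but outside the range the DHCP server hands out; otherwise the DHCP server may lease the same address to another device. A minimal Python sketch using the standard `ipaddress` module; the subnet `10.0.0.0/16` and the pool from `10.0.0.100` to `10.0.0.250` are hypothetical placeholders, not the real network settings.

```python
import ipaddress


def usable_as_static(address: str, subnet: str, pool_start: str, pool_end: str) -> bool:
    """True if `address` is a host in `subnet` and outside the DHCP pool [pool_start, pool_end]."""
    addr = ipaddress.ip_address(address)
    net = ipaddress.ip_network(subnet)
    # Must be inside the subnet and not its network or broadcast address.
    if addr not in net or addr in (net.network_address, net.broadcast_address):
        return False
    start, end = ipaddress.ip_address(pool_start), ipaddress.ip_address(pool_end)
    return not (start <= addr <= end)
```

Running the reserved addresses through a check like this against the router's actual subnet and pool settings would confirm they are safe to configure manually on the box.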
@johnmckerrell this caching issue is happening again. |
Despite the fact that the Pi is running I believe this is because whatever this feature is, it prevents mDNS discovery when a .localdomain addr is cached. I really hope this can get solved. Whatever the case, not providing |
An additional datapoint... I haven't had any problems talking to a number of Pis with my Museum in a Box stuff over the past week or two. They're all configured with a hostname of I haven't had any problems talking to them with I don't ever try connecting to them without the |
OK so I have restarted mqtt.local with only the wired interface supported. It appears to be responding to mqtt.local on the expected IP address |
I can ping ender3-octoprint.local, also .localdomain, I can't ping it without those because then it tries to resolve to my work vpn network. |
@johnmckerrell My understanding is that if you have an avahi-daemon running, /etc/resolv.conf is going to be pointing to some sort of private network which is the avahi-daemon. If that fails it'll then query the router DNS to see if the machine exists (the default if you don't have an avahi-daemon). The problem is that the ubiquiti feature I think is masking .local some of the time for the same reason it sometimes provides the wrong hostname. |
Well all I'm wondering is if the device is telling the router that it is Just to confirm, the router has its domain set to |
@johnmckerrell My understanding is that outside of mDNS the device requests an IP and gives a hostname. The hostname that is given is usually specified in /etc/hosts like so:
If I chose to request localhost.lan then My theory is that this is the first thing that the router's feature caches, in the same way that Sams-Iphone.localdomain was causing a problem, it is returning .localdomain some of the time rather than allowing mDNS responses all of the time if both parties have an mDNS daemon. This might still come down to your personal machine's configuration too. Since theoretically the mDNS daemon should be the first query, then the router's dns, but this may not be happening everywhere. |
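On the wire, the hostname a client gives the router travels as DHCP option 12 (Host Name), and the domain the router hands back travels as option 15 (Domain Name), both defined in RFC 2132. A minimal Python sketch of that type-length-value encoding; `encode_option` and `decode_options` are hypothetical helpers operating only on the options area of a DHCP message, not a full DHCP client. The router then combining option 12 with its configured domain to answer DNS queries is the vendor behaviour discussed in this thread, not something the protocol itself mandates.

```python
HOST_NAME = 12    # RFC 2132 option 12: client's host name
DOMAIN_NAME = 15  # RFC 2132 option 15: domain name for DNS resolution
PAD, END = 0, 255


def encode_option(code: int, value: str) -> bytes:
    """Encode one DHCP option as code byte, length byte, then data."""
    data = value.encode("ascii")
    if not 1 <= len(data) <= 255:
        raise ValueError("option data must be 1-255 bytes")
    return bytes([code, len(data)]) + data


def decode_options(buf: bytes) -> dict[int, str]:
    """Walk an options area until END, skipping PAD bytes.

    Treats every option as ASCII text, which holds for options 12 and 15
    but not for binary options such as 3 (router addresses).
    """
    options: dict[int, str] = {}
    i = 0
    while i < len(buf):
        code = buf[i]
        if code == PAD:
            i += 1
            continue
        if code == END:
            break
        length = buf[i + 1]
        options[code] = buf[i + 2 : i + 2 + length].decode("ascii")
        i += 2 + length
    return options
```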
@johnmckerrell After following this, I've got it working on my laptop. For some reason mqtt.local now returns an ipv6 address, whereas I believe I saw on @amcewen's machine it returns an ipv6 address. It all comes down to one's client configuration, which is actually really disappointing since it seems to vary so much between even two installations of Ubuntu. https://unix.stackexchange.com/questions/43762/how-do-i-get-to-use-local-hostnames-with-arch-linux The configuration in question is in Configuration before following the guide: |
mDNS also works just fine on the Vinyl Cutter PC, though there is some strange behaviour that I think is related to the WiFi. Discovery of mDNS on the vinyl cutter PC is strangely intermittent. I can't recreate it exactly, but I did observe it. If I execute This failure to resolve and massive resolve delay is not true of pinging the IP address of the machine directly, so it's definitely an mDNS-related issue, whether that's down to configuration or the WiFi hardware being slow. I do notice that the system has wildly varying ping response times when pinging local addresses: pinging the router takes anywhere from 10ms to 262ms. The configuration of and it returns ipv4 addresses for all https://askubuntu.com/questions/843943/how-to-replace-mdns4-minimal-with-bind This gives us all the details related to what the different possible configurations are. |
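The `mdns4_minimal [NOTFOUND=return]` entry covered in those links explains much of this thread. Below is a simplified Python model of how glibc walks the `hosts:` line in `/etc/nsswitch.conf`, assuming a line like the common Debian/Ubuntu default `hosts: files mdns4_minimal [NOTFOUND=return] dns`. In this model `mdns4_minimal` only answers names ending in `.local` and reports itself unavailable for anything else, so the chain continues to the router's unicast DNS. The lookup tables, including the `printer` entries standing in for a device that only registered its name with the router over DHCP, are hypothetical; real NSS modules have more states and actions than modelled here.

```python
import re
from typing import Callable, Optional

SUCCESS, NOTFOUND, UNAVAIL = "SUCCESS", "NOTFOUND", "UNAVAIL"
Source = Callable[[str], tuple[str, Optional[str]]]


def parse_hosts_line(line: str) -> list[tuple[str, dict[str, str]]]:
    """Turn 'hosts: files mdns4_minimal [NOTFOUND=return] dns' into [(source, actions)]."""
    chain: list[tuple[str, dict[str, str]]] = []
    for token in re.findall(r"\[[^\]]*\]|\S+", line.split(":", 1)[1]):
        if token.startswith("["):
            # [STATUS=action ...] applies to the source immediately before it.
            for rule in token[1:-1].split():
                status, action = rule.split("=")
                chain[-1][1][status.upper()] = action.lower()
        else:
            chain.append((token, {}))
    return chain


def table(records: dict[str, str]) -> Source:
    """A fake source backed by a plain name -> address mapping."""
    return lambda name: (SUCCESS, records[name]) if name in records else (NOTFOUND, None)


def make_sources(hosts_file: dict[str, str], mdns: dict[str, str],
                 router_dns: dict[str, str]) -> dict[str, Source]:
    def mdns4_minimal(name: str) -> tuple[str, Optional[str]]:
        if not name.endswith(".local"):
            return UNAVAIL, None  # the minimal variant never handles non-.local names
        return table(mdns)(name)

    return {"files": table(hosts_file), "mdns4_minimal": mdns4_minimal, "dns": table(router_dns)}


def lookup(name: str, chain, sources: dict[str, Source]) -> Optional[str]:
    """Walk the chain; default action is 'return' on SUCCESS and 'continue' otherwise."""
    for source, actions in chain:
        status, address = sources.get(source, lambda _name: (UNAVAIL, None))(name)
        default = "return" if status == SUCCESS else "continue"
        if actions.get(status, default) == "return":
            return address  # None when a non-success status says 'return'
    return None
```

In this model, a `.local` name that mDNS cannot find stops at `[NOTFOUND=return]` and is never sent to the router's DNS, while a `.localdomain` name skips mDNS and reaches it. Dropping `[NOTFOUND=return]` lets `.local` names fall through to the router, which is essentially the trade-off debated above.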
I've checked on Arthur's Win10 laptop, and it also seems to work. It returns IPv6 addresses. The same was not true, however, of my Win10 virtual machine until I enabled the avahi-daemon on the host machine, which is very interesting to me; I'm not sure I understand what's happening there. |
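To compare machines like-for-like (IPv4 vs IPv6 answers, and the slow, intermittent resolves on the vinyl cutter PC), a minimal Python sketch that times one lookup through the system resolver and splits the answers by address family. It calls `getaddrinfo`, so on Linux it goes through the same nsswitch configuration as most command-line tools; `resolve_timed` is a hypothetical helper.

```python
import socket
import time


def resolve_timed(name: str, family: int = socket.AF_UNSPEC) -> dict:
    """Resolve `name` once; return IPv4/IPv6 answers and the time taken in seconds."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(name, None, family, socket.SOCK_STREAM)
    except socket.gaierror:
        infos = []  # a failed lookup still reports how long it took
    elapsed = time.monotonic() - start
    ipv4 = sorted({sa[0] for fam, _t, _p, _c, sa in infos if fam == socket.AF_INET})
    ipv6 = sorted({sa[0] for fam, _t, _p, _c, sa in infos if fam == socket.AF_INET6})
    return {"ipv4": ipv4, "ipv6": ipv6, "seconds": elapsed}
```

Running `resolve_timed("mqtt.local")` and `resolve_timed("mqtt.local", socket.AF_INET)` on each machine in a loop would show which ones return IPv6 first and how often lookups stall.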
Since yesterday the Liverbird hasn't been showing our energy usage.
Doing a bit of poking into it, I found that mqtt.local was offline. @ajlennon power-cycled it, which has brought it back up, but it's failing to connect to its influxdb instance.