Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a method to re-run DNS lookup #125

Open
orinciog opened this issue Apr 22, 2019 · 5 comments
Open

Add a method to re-run DNS lookup #125

orinciog opened this issue Apr 22, 2019 · 5 comments
Milestone

Comments

@orinciog
Copy link

Hello!

In Kubernetes world, where an IP of a host can change often, catching all socket errors is not very helpful.

For example, if the IP of statsd host is changed, the UDP client does not reconnect to this new IP and the only solution to create a new connection with the new IP of the statsd host is to restart the client.

The line with catches all these errors is

except (socket.error, RuntimeError):

@jsocol
Copy link
Owner

jsocol commented Apr 22, 2019

Interesting @orinciog! Let's figure out how to handle this.

Catching all socket errors here is very intentional (the opinion is that it's better to have a false positive of no metrics but the service works, than have a hiccup in the stats actually take the service down) however there isn't much that can possibly get caught here.

Since UDP is fire-and-forget, sending to a non-existent IP address or a port that isn't listening doesn't actually throw anything here, so there's nothing to catch and handle. You can try this with

>>> s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
>>> s.sendto(b'foo', (b'www.xxx.yyy.zzz', 8181))  # some non-existent IP/port
3

We also can't to do a DNS lookup on each call, that's way too slow and expensive—it does something like triple the number of packets, at least, and DNS responses are a lot bigger, too.

We could have some sort of manual "look up the address again" method, though it's going to be up to your application to know when to call it (how do you signal to other running processes in k8s that something has changed? you could send some message to the container and use SIGUSR1 on the process, or have a thread that's listening on another socket, or...). Your application/process will also need to know where all the StatsClient instances are and call the method on all of them.

Alternatively, you could try to recreate StatsClients frequently (e.g. instead of creating one at module import time, create it at the start of a request/cron/something and throw it away at the end). You'd incur more DNS cost but reduce the risk of lost data to only those metrics emitted during a request when the statsd host also changes IPs.

I'm open to adding some kind of "look up the address" method, but anything else is going to be very application- and deployment-specific and out of scope for this library.

@orinciog
Copy link
Author

orinciog commented Apr 22, 2019

sending to a non-existent IP address or a port that isn't listening doesn't actually throw anything here

Ah, I didn't know that is not throwing anything.

Alternatively, you could try to recreate StatsClients frequently

Actually, this is what I'm doing right now. I'm recreating the client at each 2 minutes.

I'm open to adding some kind of "look up the address" method

I'll think about and if I will find a good approach, I will do a pull-request.

Thank you,

@jsocol
Copy link
Owner

jsocol commented Apr 22, 2019

Another option might be to allow you to opt into doing the DNS lookup on every packet (socket.sendto allows you to pass a hostname) maybe with a new constructor param. But that should come with some big bold warning, because it has dramatic performance—and other—implications. (For example, in AWS I've seen failure to cache DNS requests lead to burning through internal DNS quotas and effectively break EC2 hosts.)

@orinciog orinciog reopened this Apr 22, 2019
@jsocol
Copy link
Owner

jsocol commented Apr 22, 2019

Ah you deleted your comment but yeah, let's actually take DNS-each-time off the table. It's too dangerous a foot-gun to allow that volume of DNS requests, even with giant flashing red lights.

"Rerun the DNS lookup on demand" is much safer.

@jsocol jsocol changed the title Reconnect in UDP mode Add a method to re-run DNS lookup Jul 15, 2019
csestelo pushed a commit to Destygo/pystatsd that referenced this issue Jul 18, 2022
@jsocol jsocol added this to the 4.1 milestone Nov 6, 2022
@jsocol
Copy link
Owner

jsocol commented Nov 6, 2022

OK, I have an idea/plan here, basically going to wrap socket operations in a mutex, make reconnect() trigger DNS, and make sure that's true for both the UDP and the TCP client.

This means introducing some synchronization primitives into the code, which it's never really needed before. We're also going to need to wrap self._sock.send() calls in the lock. (Possibly using rwmutex or vendoring in something like it.)

I am not positive what the implications of this are for asyncio. I believe that threading.Lock should work across coroutines so long as we don't use threading.RLock, but I'm not sure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants