Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

not update lighthouse ips if dns lookup failed #1184

Closed
wants to merge 1 commit into from
Closed

Conversation

yylt
Copy link

@yylt yylt commented Jul 18, 2024

No description provided.

Copy link

Thanks for the contribution! Before we can merge this, we need @yylt to sign the Salesforce Inc. Contributor License Agreement.

@johnmaguire
Copy link
Collaborator

Can you please describe the purpose of this change? In #1183 you stated it resolved some issues. Can you describe the issues this change resolved?

My understanding of the code here is that if DNS resolution fails, it may be seen as a "different" list of IPs than resolved previously. However, the results of this function are prepended to the overall cache here:

nebula/lighthouse.go

Lines 565 to 569 in 16eaae3

case addrPort.Addr().Is4():
am.unlockedPrependV4(lh.myVpnNet.Addr(), NewIp4AndPortFromNetIP(addrPort.Addr(), addrPort.Port()))
case addrPort.Addr().Is6():
am.unlockedPrependV6(lh.myVpnNet.Addr(), NewIp6AndPortFromNetIP(addrPort.Addr(), addrPort.Port()))
}

In other words, I don't think a failure to resolve will end up removing any previously successfully resolved queries. Given that that is the case, I'm not sure I understand what this PR is attempting to fix.

netipAddrs := map[netip.AddrPort]struct{}{}
for _, hostPort := range r.hostnames {
timeoutCtx, timeoutCancel := context.WithTimeout(ctx, r.lookupTimeout)
addrs, err := net.DefaultResolver.LookupNetIP(timeoutCtx, r.network, hostPort.name)
timeoutCancel()
if err != nil {
l.WithFields(logrus.Fields{"hostname": hostPort.name, "network": r.network}).WithError(err).Error("DNS resolution failed for static_map host")
lookupSuccess = false
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for correctness, the default would need to be false and toggle this to true when err == nil. This is because if we try to resolve multiple hostnames for a given Nebula IP, and one fails and another succeeds, we'd like to use the results from the successful query.

@yylt
Copy link
Author

yylt commented Sep 13, 2024

Can you please describe the purpose of this change? In #1183 you stated it resolved some issues. Can you describe the issues this change resolved?您能否描述一下此更改的目的?在#1183 中,您表示它解决了一些问题。您能描述一下此更改解决的问题吗?

My understanding of the code here is that if DNS resolution fails, it may be seen as a "different" list of IPs than resolved previously. However, the results of this function are prepended to the overall cache here: 我对这里代码的理解是,如果 DNS 解析失败,它可能会被视为与之前解析的“不同”IP 列表。然而,该函数的结果被添加到此处的整体缓存中:

nebula/lighthouse.go

Lines 565 to 569 in 16eaae3

case addrPort.Addr().Is4():
am.unlockedPrependV4(lh.myVpnNet.Addr(), NewIp4AndPortFromNetIP(addrPort.Addr(), addrPort.Port()))
case addrPort.Addr().Is6():
am.unlockedPrependV6(lh.myVpnNet.Addr(), NewIp6AndPortFromNetIP(addrPort.Addr(), addrPort.Port()))
}

In other words, I don't think a failure to resolve will end up removing any previously successfully resolved queries. Given that that is the case, I'm not sure I understand what this PR is attempting to fix.换句话说,我认为解决失败不会最终删除任何以前成功解决的查询。鉴于这种情况,我不确定我是否理解此 PR 试图解决的问题。

I apologize, the PR and issue mentioned different situation, both related to DDNS, so they were linked together. Here is log section and why submit this change.

Jul 12 03:24:25 node nebula[2115492]: level=error msg="DNS resolution failed for static_map host" error="lookup foo.bar.com: i/o timeout" hostname=foo.bar.com network=ip4
Jul 12 03:24:25 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[]" origSet="&map[112.193.70.10:11010:{}]"
Jul 12 03:24:55 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[112.193.70.10:11010:{}]" origSet="&map[]"
Jul 12 03:38:55 node nebula[2115492]: level=error msg="DNS resolution failed for static_map host" error="lookup foo.bar.com: i/o timeout" hostname=foo.bar.com network=ip4
Jul 12 03:38:55 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[]" origSet="&map[112.193.70.10:11010:{}]"
Jul 12 03:39:25 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[112.193.70.10:11010:{}]" origSet="&map[]"
Jul 12 03:43:55 node nebula[2115492]: level=error msg="DNS resolution failed for static_map host" error="lookup foo.bar.com: i/o timeout" hostname=foo.bar.com network=ip4
Jul 12 03:43:55 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[]" origSet="&map[112.193.70.10:11010:{}]"
Jul 12 03:44:25 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[112.193.70.10:11010:{}]" origSet="&map[]"
Jul 12 04:05:55 node nebula[2115492]: level=error msg="DNS resolution failed for static_map host" error="lookup foo.bar.com: i/o timeout" hostname=foo.bar.com network=ip4
Jul 12 04:05:55 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[]" origSet="&map[112.193.70.10:11010:{}]"
Jul 12 04:06:25 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[112.193.70.10:11010:{}]" origSet="&map[]"
Jul 12 04:30:25 node nebula[2115492]: level=error msg="DNS resolution failed for static_map host" error="lookup foo.bar.com: i/o timeout" hostname=foo.bar.com network=ip4
Jul 12 04:30:25 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[]" origSet="&map[112.193.70.10:11010:{}]"
Jul 12 04:30:55 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[112.193.70.10:11010:{}]" origSet="&map[]"
Jul 12 04:31:55 node nebula[2115492]: level=error msg="DNS resolution failed for static_map host" error="lookup foo.bar.com: i/o timeout" hostname=foo.bar.com network=ip4
Jul 12 04:31:55 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[]" origSet="&map[112.193.70.10:11010:{}]"
Jul 12 04:32:25 node nebula[2115492]: level=info msg="DNS results changed for host list" newSet="map[112.193.70.10:11010:{}]" origSet="&map[]"

Based on the above, it appears that DNS queries are fluctuating—sometimes there is an ip, and other times times out.

This might be related to UDP traffic in the environment. However, since this is happening in virtual machine, I'm not sure how to investigate further, so handle the error if dns query failed.

@johnmaguire
Copy link
Collaborator

@yylt Thanks for explaining. Indeed it looks like half of the DNS queries in the log are failing. That being said, the failing lookups should not be overriding the successful lookups in the cache Nebula uses to find hosts (which is a separate list from the one referenced in origSet.)

One way to verify this is to enable the debug SSH server and then issue the list-lighthouse-addrmap following a failed DNS query. You should still see the IP address present.

@yylt
Copy link
Author

yylt commented Sep 13, 2024

Since the nebula at this node is already on the version that merged the PR, the issue has not recurred up to now, which has been around 50 days.

Additionally, the DDNS changes approximately once a week, requiring SSH to be opened and checked after address change. It is unlikely to be feasible in the short term.

@johnmaguire
Copy link
Collaborator

@yylt Understood that you're unable to test currently, but this PR doesn't appear to solve any issues and the code as written creates undesirable behaviors (such as a single DNS lookup failure causing successful queries to be discarded.) Thanks for the contribution, but I'm going to close the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants