DNS lookup should take into account available source IPv6 and IPv4 addresses (IDFGH-12201) #13255

Open
3 tasks done
sgryphon opened this issue Feb 25, 2024 · 13 comments
Assignees
Labels
Resolution: NA Issue resolution is unavailable Status: In Progress Work is in progress Type: Bug bugs in IDF

Comments

@sgryphon

sgryphon commented Feb 25, 2024

Answers checklist.

  • I have read the documentation ESP-IDF Programming Guide and the issue is not addressed there.
  • I have updated my IDF branch (master or release) to the latest version and checked that the issue is present there.
  • I have searched the issue tracker for a similar issue and not found a similar issue.

IDF version.

master

Espressif SoC revision.

ESP32

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

M5Stack Core 2

Power Supply used.

USB

What is the expected behavior?

With both IPv4 and IPv6 fully enabled, DNS lookup (via TLS) should work correctly in all network environments -- IPv4-only, IPv6-only, and dual-stack, for all reachable destinations, by taking into account available addresses.

For example when moving a device to an IPv6-only network only an IPv6 address is available (even though IPv4 is enabled), and so connections to a dual-stack host (that has both) should use the IPv6 address to make the TLS connection.

For this particular bug:

  • IPv6 is enabled
  • The network provides a public IPv6 address
  • The destination has an IPv6 address, and there is a valid route

But the connection fails.

What is the actual behavior?

TLS connections fail on an IPv6-only network when connecting to a dual-stack destination, because the preference order is statically configured to IPv4 first. Even though a local IPv4 address is not currently available, for a dual-stack destination the IPv4 address is returned by getaddrinfo() (instead of the IPv6), so the connection fails.

Note that if you disable IPv4, then the connection works; if you enable IPv4, then the connection fails. Enabling IPv4 should not make IPv6 fail (and vice versa).

Steps to reproduce.

  1. Use the updated common protocol example code that supports multiple network types, in PR branch Update protocol examples to support all network types #13249 (IDFGH-12196) #13250
  2. Make sure the https_request (TLS) example has a dual-stack destination; currently the address used in the example is IPv4-only, so either run in a network with DNS64 (so that it gets a NAT64 address), or change the code to use:
  3. Build the https_request (TLS) example and connect to a dual-stack network, and see that it works (it uses the IPv4 address, which you can see if you turn on logging in esp-tls).
  4. Change to an IPv6-only network; either reconfigure your network, or change the config value and rebuild. (A real-world device may set the network dynamically rather than have it compiled in.)
  5. Note the connection now fails, because the DNS lookup returns an IPv4 address but the application does not have one.

A kind of work around is to reconfigure the application to entirely turn off IPv4, however this then stops the application from being able to roam to different network types and connect to IPv4 only servers.

Debug Logs.

I (816) example_connect: Connecting to Wildspace...
I (816) example_connect: Waiting for IP(s)
I (3226) wifi:new:<11,0>, old:<1,0>, ap:<255,255>, sta:<11,0>, prof:1
I (3486) wifi:state: init -> auth (b0)
I (3496) wifi:state: auth -> assoc (0)
I (3516) wifi:state: assoc -> run (10)
I (3556) wifi:connected with Wildspace, aid = 1, channel 11, BW20, bssid = ea:63:da:bd:5a:09
I (3556) wifi:security: WPA2-PSK, phy: bgn, rssi: -70
I (3566) wifi:pm start, type: 1

I (3566) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us
I (3626) wifi:AP's beacon interval = 102400 us, DTIM period = 1
I (4616) example_connect: Got IPv6 event: Interface "example_netif_sta" address: fe80:0000:0000:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (5626) wifi:<ba-add>idx:0 (ifx:0, ea:63:da:bd:5a:09), tid:0, ssn:0, winSize:64
I (7616) example_connect: Got IPv6 event: Interface "example_netif_sta" address: 2407:8800:bc61:1300:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_GLOBAL
I (7616) example_connect: Got IPv6 event: Interface "example_netif_sta" address: fd7c:e25e:67e8:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_UNIQUE_LOCAL
I (7626) example_common: Connected to example_netif_sta
I (7636) example_common: - IPv4 address: 0.0.0.0,
I (7646) example_common: - IPv6 address: fe80:0000:0000:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (7656) example_common: - IPv6 address: 2407:8800:bc61:1300:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_GLOBAL
I (7666) example_common: - IPv6 address: fd7c:e25e:67e8:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_UNIQUE_LOCAL
I (7676) example: Updating time from NVS
I (7676) example: Start https_request example
I (7686) example: https_request using crt bundle
I (7696) main_task: Returned from app_main()
E (7766) esp-tls: [sock=54] Resolved IPv4 address: 51.75.78.103
E (7776) esp-tls: [sock=54] connect() error: Host is unreachable
E (7776) esp-tls: Failed to open new connection
E (7776) example: Connection failed...
I (7786) example: 10...
I (8786) example: 9...
I (9786) example: 8...
I (10786) example: 7...
I (11786) example: 6...

More Information.

Issue

The DNS lookup function getaddrinfo() does not take into account the available source addresses, and so does not work properly across all network types.

In particular it preferences IPv4 over IPv6, and completely fails in an IPv6-only network for a dual-stack destination, as the IPv4 address is unreachable.

You can work around this in some cases, by checking available addresses yourself and then calling getaddrinfo() multiple times -- this approach has been used in the updated http_request example.

However, HTTP is not secure and the same approach can't be used with HTTPS, as the host name is needed for TLS and is resolved internally in the TLS code.
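The workaround described above might look roughly like the following. This is only a sketch using the standard sockets API: the function name `resolve_by_availability` and the two availability flags are hypothetical, and how the application learns which addresses it currently holds (for example from its IP events) is left out.

```c
/* May be needed on some hosts for getaddrinfo() when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: call getaddrinfo() once per address family that the
 * device currently has a usable source address for, IPv6 first.
 * Returns 0 on success (caller frees *res with freeaddrinfo()), otherwise
 * the last getaddrinfo() error, or EAI_FAIL if no family is available. */
int resolve_by_availability(const char *host, const char *port,
                            int have_global_ipv6, int have_ipv4,
                            struct addrinfo **res)
{
    assert(host != NULL && res != NULL);

    int families[2];
    int n = 0;
    if (have_global_ipv6) families[n++] = AF_INET6;
    if (have_ipv4)        families[n++] = AF_INET;

    int rc = EAI_FAIL;
    for (int i = 0; i < n; i++) {
        struct addrinfo hints;
        memset(&hints, 0, sizeof(hints));
        hints.ai_family = families[i];   /* ask for exactly one family */
        hints.ai_socktype = SOCK_STREAM;
        rc = getaddrinfo(host, port, &hints, res);
        if (rc == 0) {
            return 0;                    /* first family that resolves wins */
        }
    }
    return rc;
}
```

Because the caller has to own the getaddrinfo() call, this only helps where the application resolves the name itself, which is exactly why it does not carry over to the TLS case.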

Technical details

For the https_request example the TLS code esp_tls.c eventually calls getaddrinfo() passing in AF_UNSPEC to get any address.

However the code in netdb.c then converts this into a fixed preference order (when both IPv4 and IPv6 are enabled) of NETCONN_DNS_IPV4_IPV6.

A client with both IPv4 and IPv6 enabled should work in any network (IPv4-only, IPv6-only, or dual-stack) and to any reachable destination.

  • The code works in an IPv4-only network.
  • In a dual-stack network it mostly works, because it returns the IPv4 address by preference (even if the destination is dual-stack), and only returns IPv6 if that fails (i.e. the destination is IPv6-only).
  • However it fails in an IPv6-only network where the destination is dual-stack because it returns the unreachable IPv4 address.

Changing to use a static order of NETCONN_DNS_IPV6_IPV4 wouldn't fully work either.

This other order allows IPv6-only to work, and means that dual-stack preferences IPv6 and still falls back for IPv4-only destinations.

But it has the reverse problem that in an IPv4-only network a dual stack destination will fail, as it returns the unreachable IPv6 address.
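The cases above can be checked with a small truth-table model. This is an illustration only: the enum and function names are made up for the sketch and are not lwIP identifiers, and "connects" simply means the network provides a source address of the family that was returned.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the two static preference orders. */
enum dns_order { ORDER_IPV4_IPV6, ORDER_IPV6_IPV4 };
enum family { FAMILY_NONE, FAMILY_V4, FAMILY_V6 };

/* Family a static-order lookup returns: the first family in the order
 * for which the destination actually has a record. */
enum family static_pick(enum dns_order order, bool dst_v4, bool dst_v6)
{
    assert(order == ORDER_IPV4_IPV6 || order == ORDER_IPV6_IPV4);
    if (order == ORDER_IPV4_IPV6) {
        if (dst_v4) return FAMILY_V4;
        if (dst_v6) return FAMILY_V6;
    } else {
        if (dst_v6) return FAMILY_V6;
        if (dst_v4) return FAMILY_V4;
    }
    return FAMILY_NONE;
}

/* The connection can only work if the network offers the chosen family. */
bool connects(enum family f, bool net_v4, bool net_v6)
{
    return (f == FAMILY_V4 && net_v4) || (f == FAMILY_V6 && net_v6);
}
```

For a dual-stack destination, `connects(static_pick(ORDER_IPV4_IPV6, true, true), false, true)` is false (IPv6-only network), and `connects(static_pick(ORDER_IPV6_IPV4, true, true), true, false)` is false as well (IPv4-only network), so neither static order covers every network type.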

Proposed solution

To be able to work across all networks, the address selection needs to be dynamic based on what is actually available. (Not static based on what is enabled in configuration)

For a dual-stack destination, if a global (including ULA) IPv6 address is available, then use IPv6; but if a global IPv6 address is not available (even though IPv6 is enabled it may not be provided, e.g. if currently on an IPv4-only network), then the IPv4 address needs to be used.

A full implementation of this approach is detailed in RFC 6724, taking into account not only what addresses are available, but their scopes and with special allowances for deprecated address ranges.

Available addresses should be sorted according to RFC 6724, with the application using the first address returned.

The standard linux function getaddrinfo() takes this approach "The sorting function used within getaddrinfo() is defined in RFC 3484" (RFC 3484 was replaced by RFC 6724). See https://man7.org/linux/man-pages/man3/getaddrinfo.3.html

This new DNS resolution, dynamically based on available addresses, could be guarded by a configuration flag so that the old behaviour (fixed preference of IPv4) remains available as an option.

@sgryphon
Author

sgryphon commented Feb 26, 2024

I now have a pull request up with a fix implementing the RFC 6724 algorithm to select the best destination address to return from getaddrinfo() based on the available source addresses: espressif/esp-lwip#66

The problematic example -- connecting TLS to a dual-stack host from an IPv6-only network -- now works.

The logs show both DNS results, and then the selected one; because the device is in an IPv6-only network, the IPv6 address is selected. If connected to an IPv4-only network then the IPv4 address is used.

dns_recv: "v4v6.ipv6-test.com": response = 2001:41d0:701:1100:0:0:0:29c8
dns_recv: "v4v6.ipv6-test.com": response = 51.75.78.103
E (7791) esp-tls: [sock=54] Resolved IPv6 address: 2001:41D0:701:1100::29C8

Full log on the same IPv6-only network as in the bug report. The new function's log lines are prefixed with dns_select, and you can see that RFC 6724 Destination Address Selection Rule 2 determines that the IPv6 destination address has a source address with a matching scope (both global) whereas IPv4 does not (while the DNS result is global, the device's only source address is localhost, which is link-local scope).

I (821) example_connect: Connecting to Wildspace...
I (821) example_connect: Waiting for IP(s)
I (3231) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (3491) wifi:state: init -> auth (b0)
I (3591) wifi:state: auth -> assoc (0)
I (3611) wifi:state: assoc -> run (10)
I (3621) wifi:connected with Wildspace, aid = 1, channel 1, BW20, bssid = 02:25:9c:13:92:ab
I (3621) wifi:security: WPA2-PSK, phy: bgn, rssi: -53
I (3631) wifi:pm start, type: 1

I (3631) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us
I (3721) wifi:dp: 2, bi: 102400, li: 4, scale listen interval from 307200 us to 409600 us
I (3721) wifi:AP's beacon interval = 102400 us, DTIM period = 2
I (5621) example_connect: Got IPv6 event: Interface "example_netif_sta" address: fe80:0000:0000:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (11621) example_connect: Got IPv6 event: Interface "example_netif_sta" address: 2407:8800:bc61:1300:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_GLOBAL
I (11621) example_connect: Got IPv6 event: Interface "example_netif_sta" address: fd7c:e25e:67e8:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_UNIQUE_LOCAL
I (11631) example_common: Connected to example_netif_sta
I (11641) example_common: - IPv4 address: 0.0.0.0,
I (11651) example_common: - IPv6 address: fe80:0000:0000:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (11661) example_common: - IPv6 address: 2407:8800:bc61:1300:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_GLOBAL
I (11671) example_common: - IPv6 address: fd7c:e25e:67e8:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_UNIQUE_LOCAL
I (11681) example: Updating time from NVS
I (11681) example: Start https_request example
I (11691) example: https_request using crt bundle
dns_enqueue: "v4v6.ipv6-test.com": use DNS entry 0
dns_enqueue: "v4v6.ipv6-test.com": use DNS pcb 0
dns_send: dns_servers[0] "v4v6.ipv6-test.com": request
sending DNS request ID 42552 for name "v4v6.ipv6-test.com" to server 0
I (11721) main_task: Returned from app_main()
dns_recv: "v4v6.ipv6-test.com": response = 2001:41d0:701:1100:0:0:0:29c8
dns_enqueue: "v4v6.ipv6-test.com": use DNS entry 1
dns_enqueue: "v4v6.ipv6-test.com": use DNS pcb 0
dns_send: dns_servers[0] "v4v6.ipv6-test.com": request
sending DNS request ID 14817 for name "v4v6.ipv6-test.com" to server 0
dns_recv: "v4v6.ipv6-test.com": response = 51.75.78.103
dns_select: selecting from 2 candidates
dns_select: precedence labels flags 0x2013, ipv6 scopes flags 0x4004, ipv4 scopes flags 0x0004
dns_select: rule 2, cand_0 scope (14) match 1, cand_1 scope (14) match 0
E (11811) esp-tls: [sock=54] Resolved IPv6 address: 2001:41D0:701:1100::29C8
dns_tmr: dns_check_entries
I (13161) esp-x509-crt-bundle: Certificate validated
dns_tmr: dns_check_entries
I (14691) example: Connection established...
I (14691) example: 113 bytes written
I (14691) example: Reading HTTP response...
dns_tmr: dns_check_entries
HTTP/1.1 200 OK
Date: Tue, 27 Feb 2024 21:33:20 GMT
Server: Apache/2.4.25 (Debian)
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

26
2407:8800:bc61:1300:a3a:f2ff:fe65:db28
0


dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
I (20211) example: connection closed
I (20211) example: 10...
dns_tmr: dns_check_entries
I (21211) example: 9...

The same configuration, running on an IPv4-only network, uses the IPv4 address:

I (821) example_connect: Connecting to Shadow...
I (821) example_connect: Waiting for IP(s)
I (3231) wifi:new:<1,0>, old:<1,0>, ap:<255,255>, sta:<1,0>, prof:1
I (3491) wifi:state: init -> auth (b0)
I (3571) wifi:state: auth -> assoc (0)
I (3641) wifi:state: assoc -> run (10)
I (3661) wifi:connected with Shadow, aid = 1, channel 1, BW20, bssid = 06:25:9c:13:92:ab
I (3661) wifi:security: WPA2-PSK, phy: bgn, rssi: -52
I (3661) wifi:pm start, type: 1

I (3661) wifi:dp: 1, bi: 102400, li: 3, scale listen interval from 307200 us to 307200 us
I (3751) wifi:dp: 2, bi: 102400, li: 4, scale listen interval from 307200 us to 409600 us
I (3751) wifi:AP's beacon interval = 102400 us, DTIM period = 2
I (5621) example_connect: Got IPv6 event: Interface "example_netif_sta" address: fe80:0000:0000:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (8671) esp_netif_handlers: example_netif_sta ip: 192.168.5.146, mask: 255.255.255.0, gw: 192.168.5.1
I (8671) example_connect: Got IPv4 event: Interface "example_netif_sta" address: 192.168.5.146
I (8681) example_common: Connected to example_netif_sta
I (8681) example_common: - IPv4 address: 192.168.5.146,
I (8691) example_common: - IPv6 address: fe80:0000:0000:0000:0a3a:f2ff:fe65:db28, type: ESP_IP6_ADDR_IS_LINK_LOCAL
I (8701) example: Updating time from NVS
I (8701) example: Start https_request example
I (8711) example: https_request using crt bundle
dns_enqueue: "v4v6.ipv6-test.com": use DNS entry 0
dns_enqueue: "v4v6.ipv6-test.com": use DNS pcb 0
dns_send: dns_servers[0] "v4v6.ipv6-test.com": request
sending DNS request ID 60548 for name "v4v6.ipv6-test.com" to server 0
I (8731) main_task: Returned from app_main()
dns_recv: "v4v6.ipv6-test.com": response = 2001:41d0:701:1100:0:0:0:29c8
dns_enqueue: "v4v6.ipv6-test.com": use DNS entry 1
dns_enqueue: "v4v6.ipv6-test.com": use DNS pcb 0
dns_send: dns_servers[0] "v4v6.ipv6-test.com": request
sending DNS request ID 17189 for name "v4v6.ipv6-test.com" to server 0
dns_recv: "v4v6.ipv6-test.com": response = 51.75.78.103
dns_select: selecting from 2 candidates
dns_select: precedence labels flags 0x0013, ipv6 scopes flags 0x0004, ipv4 scopes flags 0x4004
dns_select: rule 2, cand_0 scope (14) match 0, cand_1 scope (14) match 1
dns_tmr: dns_check_entries
I (10111) esp-x509-crt-bundle: Certificate validated
dns_tmr: dns_check_entries
I (11631) example: Connection established...
I (11631) example: 113 bytes written
I (11631) example: Reading HTTP response...
dns_tmr: dns_check_entries
HTTP/1.1 200 OK
Date: Tue, 27 Feb 2024 21:35:54 GMT
Server: Apache/2.4.25 (Debian)
Vary: Accept-Encoding
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

f
220.240.255.134
0


dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
dns_tmr: dns_check_entries
I (17171) example: connection closed
I (17171) example: 10...

@sgryphon sgryphon changed the title DNS lookup should take into account available source addresses (IDFGH-12201) DNS lookup should take into account available source IPv6 and IPv4 addresses (IDFGH-12201) Mar 5, 2024
@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Opened Issue is new labels Mar 6, 2024
@AxelLin
Contributor

AxelLin commented Aug 24, 2024

@espressif-abhikroy
It has been in "In Progress" status for 5 Months, how is the status now?

@ivancmz

ivancmz commented Sep 5, 2024

Hi. Does anyone have any information about this issue? I believe I'm having a related issue: on an IPv4 network, the way that LWIP does DNS result ordering seems to be returning an IPv6 address first, and when it does, the HTTPS connection to the server fails. It seems to happen more frequently when a server has 3 or more IPv6 addresses (e.g. securetoken.googleapis.com), as if there is some kind of limit on how many records LWIP reads, and if it doesn't find an IPv4 address in the first few records it returns an IPv6 one. I tried disabling IPv6 in menuconfig, but then it simply does not resolve the server.

I found this on https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-guides/lwip.html#ipv6-support :

The number of IP addresses returned by network database APIs such as getaddrinfo() and gethostbyname() is restricted by the macro DNS_MAX_HOST_IP. By default, the value of this macro is set to 1.

but I'm not sure if I can change this value or how. I tried adding it in "\components\lwip\port\esp32\include\lwipopts.h" but I see no change, getaddrinfo still responds with a single IP.

Looking at the code in netdb.c, and it seems DNS_MAX_HOST_IP is not used anywhere and there are no loops to fill the getaddrinfo or gethostbyname() results with more than 1 value. 🤔

@espressif-abhikroy
Collaborator

espressif-abhikroy commented Sep 5, 2024

@ivancmz Thanks for reaching out.

The LWIP DNS component does not perform any reordering of IP addresses; it returns them in the exact order they are received from the DNS server.

If you're specifically looking for an IPv4 address, please try calling getaddrinfo() with the AF_INET family, which should return an IPv4 address instead of IPv6.
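For reference, a minimal sketch of an IPv4-only lookup with the standard sockets API follows. The helper name `resolve_ipv4` is hypothetical; only getaddrinfo(), inet_ntop() and freeaddrinfo() are real calls.

```c
/* May be needed on some hosts for getaddrinfo() when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve `host` to its first IPv4 address and write it
 * to `out` as dotted-quad text. Returns 0 on success, otherwise a
 * getaddrinfo() error code. */
int resolve_ipv4(const char *host, char *out, size_t out_len)
{
    assert(host != NULL && out != NULL);

    struct addrinfo hints;
    struct addrinfo *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 results only */
    hints.ai_socktype = SOCK_STREAM;

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        return rc;
    }

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    const char *text = inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)out_len);
    freeaddrinfo(res);
    return text != NULL ? 0 : EAI_FAIL;
}
```

A caller would pass a buffer of at least INET_ADDRSTRLEN bytes, for example `char buf[INET_ADDRSTRLEN]; resolve_ipv4("example.com", buf, sizeof(buf));`. Note that this only helps when the application makes the getaddrinfo() call itself.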

By default, LWIP returns only one IP address per DNS query. If you expect to receive more than one IP address, you can adjust the DNS_MAX_HOST_IP value to a higher number.

To increase this setting, please follow these steps:

idf.py menuconfig

Navigate to:

Component config ---> LWIP ---> DNS ---> Maximum number of IP addresses per host

This setting determines the maximum number of IP addresses stored per host. If the server returns multiple IP addresses, the actual number stored will be the smaller of either the configured value or the number of addresses returned by the server.
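To see how many addresses actually come back after changing this setting, the application can walk the whole result list rather than looking only at the first entry. A minimal sketch, with a hypothetical helper name `print_addresses`:

```c
/* May be needed on some hosts for getaddrinfo() when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <assert.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: print every address returned for `host` to `out`
 * and return how many there were, or -1 if the lookup failed. */
int print_addresses(const char *host, FILE *out)
{
    assert(host != NULL && out != NULL);

    struct addrinfo hints;
    struct addrinfo *res = NULL;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &res) != 0) {
        return -1;
    }

    int count = 0;
    for (const struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        char text[INET6_ADDRSTRLEN] = "";
        if (ai->ai_family == AF_INET) {
            const struct sockaddr_in *sin = (const struct sockaddr_in *)ai->ai_addr;
            inet_ntop(AF_INET, &sin->sin_addr, text, sizeof(text));
        } else if (ai->ai_family == AF_INET6) {
            const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)ai->ai_addr;
            inet_ntop(AF_INET6, &sin6->sin6_addr, text, sizeof(text));
        }
        fprintf(out, "%d: %s\n", count, text);
        count++;
    }

    freeaddrinfo(res);
    return count;
}
```

If the count never exceeds 1 regardless of the configured maximum, the limit is probably not being applied by the resolver path in use.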

If you're unsure of the DNS server's behavior and need a tool to investigate, you may consider adding this managed component to your project:
https://components.espressif.com/components/espressif/console_cmd_ping/versions/1.1.0

This component provides a console interface, enabling you to run commands like getaddrinfo, get/setdnsserver, ping, and others. If connectivity is available, you can run:

getaddrinfo xyz.com

This will allow you to verify the addresses returned by the DNS server.

Alternatively you can simply use this project
https://github.com/espressif/esp-protocols/tree/master/components/console_cmd_ping/examples/ping-basic

This project demonstrates how to use the console commands effectively.

@espressif-abhikroy
Collaborator

@AxelLin @sgryphon
I sincerely apologize for the long delay. The modifications we made to the existing DNS implementation have caused this issue to take longer to address.

There is an issue with the LWIP getaddrinfo() call when using the AF_UNSPEC family. On Linux and macOS, calling getaddrinfo() with AF_UNSPEC returns both IPv4 and IPv6 addresses according to the Happy Eyeballs Algorithm. This algorithm sends queries for both IPv4 and IPv6 simultaneously and returns the results for both, leaving it to the application to use the first reachable address. Unfortunately, LWIP's DNS behavior does not currently align with this.

To address this, I am adding a function in examples/common_components/protocol_examples_common that will implement similar behavior. With this change, getaddrinfo() will be able to return both IPv4 and IPv6 addresses up to the DNS_MAX_HOST_IP limit.

I am in the process of getting this change reviewed and it should be available soon.
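The "application uses the first reachable address" pattern referred to above is usually a loop over the getaddrinfo() result list. The sketch below is an assumption about how an application might structure it, not the code under review: `first_usable`, `try_tcp_connect` and `try_ipv6_only` are hypothetical names, and the attempt is passed in as a callback so the loop can be exercised without a network.

```c
/* May be needed on some hosts for getaddrinfo() when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* An attempt returns 0 if the address is usable, non-zero otherwise. */
typedef int (*try_fn)(const struct addrinfo *ai);

/* Walk the list in the order returned and stop at the first entry that works.
 * Returns that entry, or NULL if none worked. */
const struct addrinfo *first_usable(const struct addrinfo *list, try_fn try_one)
{
    assert(try_one != NULL);
    for (const struct addrinfo *ai = list; ai != NULL; ai = ai->ai_next) {
        if (try_one(ai) == 0) {
            return ai;
        }
    }
    return NULL;
}

/* Real attempt: open a socket and connect. The socket is closed again here
 * to keep the sketch short; a real application would keep it open and use it. */
int try_tcp_connect(const struct addrinfo *ai)
{
    int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
    if (fd < 0) {
        return -1;
    }
    int rc = connect(fd, ai->ai_addr, ai->ai_addrlen);
    close(fd);
    return rc;
}

/* Stand-in attempt for illustration: behaves like an IPv6-only network,
 * where only IPv6 destinations are reachable. */
int try_ipv6_only(const struct addrinfo *ai)
{
    return ai->ai_family == AF_INET6 ? 0 : -1;
}
```

With a list ordered IPv4 then IPv6, `first_usable(list, try_ipv6_only)` skips the IPv4 entry and returns the IPv6 one. The cost is a failed attempt per unreachable address, which with a blocking connect() can mean waiting for a timeout.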

@ivancmz

ivancmz commented Sep 5, 2024

Hi @espressif-abhikroy thank you very much for such a complete answer!

Quick question regarding:

Component config ---> LWIP ---> DNS ---> Maximum number of IP addresses per host

Do you know from which version of the idf onwards do these settings exist? only on master?

@espressif-abhikroy
Collaborator

This setting is available on the master branch and has been backported to version 5.2. It will also be included in the upcoming 5.3 release.

@ivancmz

ivancmz commented Sep 9, 2024

Hi, I've been testing with "Maximum number of IP addresses per host" set to 10, and getaddrinfo(). On some IPv4 networks I get a result with 4 IPv6 addresses and 0 IPv4 addresses for a particular server (securetoken.googleapis.com), so I'm still not able to resolve a usable address. The reason I'm getting 4 results seems to be a timeout. I've been looking at the code on dns.c and netdb.c but I can't figure out where this timeout is set and if it might be changed. Is it possible to change this timeout?

@espressif-abhikroy
Collaborator

espressif-abhikroy commented Sep 11, 2024

@ivancmz Unfortunately, the timeout is not configurable at this time. However, you can modify the number of retries by adjusting the DNS_MAX_RETRIES macro in the components/lwip/lwip/src/include/lwip/opt.h file.

Please note that this change needs to be made directly in the header file, as it cannot be modified via menuconfig. You may try increasing the value, and if it resolves the issue, kindly inform us if you require this option to be configurable through menuconfig. We can then consider incorporating this into the LWIP DNS configuration.

Also make sure the DNS server can actually return an IPv4 address for the host (securetoken.googleapis.com).

@ivancmz

ivancmz commented Sep 23, 2024

Hi @espressif-abhikroy thanks for your help. I don't see much difference in behavior when adjusting the DNS_MAX_RETRIES macro, even with a value of 200. It would be nice that this setting could be tweaked in menuconfig, but since it appears to have little or no effect, I guess it's fine as it is. I'm thinking that the problem I'm experiencing is more on the DNS server side, I'll try to capture DNS traffic to find out more.

@espressif-bot espressif-bot added Status: Done Issue is done internally Resolution: NA Issue resolution is unavailable and removed Status: In Progress Work is in progress labels Nov 22, 2024
@espressif-abhikroy
Collaborator

The getaddrinfo() function in lwIP within ESP-IDF has a limitation when using AF_UNSPEC, as it defaults to returning only an IPv4 address in dual-stack mode.

This issue has been addressed in commit e2ae81a.
When enabled, the behavior is now consistent with Linux, supporting both IPv4 and IPv6 resolutions as expected.

However, the feature is currently disabled by default.
To enable it, you can turn on the CONFIG_LWIP_USE_ESP_GETADDRINFO option. This can be configured in the menuconfig under:
Component config -> LWIP -> DNS -> Enable esp_getaddrinfo() instead of lwip_getaddrinfo()

@sgryphon
Author

sgryphon commented Dec 4, 2024

On Linux and macOS, calling getaddrinfo() with AF_UNSPEC returns both IPv4 and IPv6 addresses according to the Happy Eyeballs Algorithm. This algorithm sends queries for both IPv4 and IPv6 simultaneously and returns the results for both, leaving it to the application to use the first reachable address.

I would have to check (e.g. on a Linux system), but Happy Eyeballs should only apply to dual-stack clients; the situation above was specified as an IPv4 network -- on an IPv4 only network, there is no point in trying IPv6 addresses at all (and vice-versa on IPv6 only network).

Also the implementation linked above appends the IPv6 addresses after the IPv4 addresses. Happy Eyeballs (RFC 6555) specifies to use them in host address preference order (which is usually IPv6 first), and if that is not available, then "implementations MUST prefer IPv6 over IPv4".

i.e. on a dual-stack client you should have the IPv6 addresses first, and then the IPv4 after.

(On a single stack client, then you only need the relevant stack).

For Linux at least in the docs, "The sorting function used within getaddrinfo() is defined in RFC 3484;", see https://man7.org/linux/man-pages/man3/getaddrinfo.3.html

This was later obsoleted by RFC 6724 Default Address Selection for IPv6.

The fix I proposed, in https://github.com/espressif/esp-idf/pull/13258/files, which relies on changes in ESP-LWIP espressif/esp-lwip#66 does follow the RFC 6724 algorithm, the same as Linux.

(i.e. on an IPv4 only network, it will return the IPv4 addresses first).

@espressif-abhikroy
Collaborator

The changes in e2ae81a do not fully implement the Happy Eyeballs algorithm. Instead, they introduce a mechanism to make separate queries for IPv4 and IPv6 addresses when the AF_UNSPEC flag is used in dual-stack mode. For single-stack mode and other flags, the original behavior remains unchanged.
Currently, the IPv4 address is returned first, followed by the IPv6 address. This order is hardcoded. If needed, I can add a configuration option to customize this order—please open a separate issue for that change.
The sorting function described in RFC 3484 is not implemented in this case. The addresses are returned in the same order as provided by the server.
As stated in the getaddrinfo manual:
"Normally, the application should try using the addresses in the order in which they are returned."
This code follows the expectation that the application will handle the order accordingly.

The changes suggested in PR #13258 and espressif/esp-lwip#66 were made at a time when lwIP returned only a single IP address for any query. Currently, the number of IP addresses that lwIP can return for a DNS query is configurable using CONFIG_LWIP_DNS_MAX_HOST_IP. Therefore, implementing a dynamic sorting algorithm in lwIP is not necessary.
Instead, such functionality can be added to esp_getaddrinfo(), provided CONFIG_LWIP_USE_ESP_GETADDRINFO is enabled. Also, if dynamic sorting is essential for your application, I recommend implementing it as part of a wrapper in your application, as it is not a widely used case for esp-idf users.
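An application-side wrapper of the kind suggested here could be as small as the following sketch. It is an assumption about one possible shape, with a hypothetical function name `order_by_family`: it does not implement RFC 6724, it only puts one address family ahead of the other, and choosing the preferred family (for example AF_INET6 when a global IPv6 source address is present, otherwise AF_INET) is left to the caller.

```c
/* May be needed on some hosts for struct addrinfo when compiling in strict ISO C mode. */
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper helper: fill `out` with up to `max` pointers into
 * `list`, entries of family `preferred` first and all others after, each
 * group kept in its original order. The list itself is not modified, so the
 * caller can still pass it to freeaddrinfo() afterwards.
 * Returns the number of pointers stored. */
size_t order_by_family(const struct addrinfo *list, int preferred,
                       const struct addrinfo **out, size_t max)
{
    assert(out != NULL || max == 0);

    size_t n = 0;
    /* Pass 0 collects the preferred family, pass 1 collects everything else. */
    for (int pass = 0; pass < 2; pass++) {
        for (const struct addrinfo *ai = list; ai != NULL && n < max; ai = ai->ai_next) {
            int is_preferred = (ai->ai_family == preferred);
            if ((pass == 0) == is_preferred) {
                out[n++] = ai;
            }
        }
    }
    return n;
}
```

The application would then try the entries of `out` in order. Keeping the original list untouched avoids relying on how a particular freeaddrinfo() implementation walks and frees its nodes.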

@espressif-bot espressif-bot added Status: In Progress Work is in progress and removed Status: Done Issue is done internally labels Dec 6, 2024