Someguy sometimes returns 404 for a routable peer #99

Open
2color opened this issue Jan 22, 2025 · 9 comments
Assignees
Labels
need/author-input Needs input from the original author

Comments

@2color
Member

2color commented Jan 22, 2025

Problem

delegated-ipfs.dev sometimes returns a 404 for a peer that is clearly routable. This can be observed with PeerID bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p, which belongs to a node that runs constantly with almost no downtime.

➜  interface git:(fix-typo-in-doc) curl -v -H 'accept: application/json' 'https://delegated-ipfs.dev/routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p'
* Host delegated-ipfs.dev:443 was resolved.
* IPv6: (none)
* IPv4: 209.94.90.3, 209.94.90.2
*   Trying 209.94.90.3:443...
* Connected to delegated-ipfs.dev (209.94.90.3) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=delegated-ipfs.dev
*  start date: Dec  7 06:59:47 2024 GMT
*  expire date: Mar  7 06:59:46 2025 GMT
*  subjectAltName: host "delegated-ipfs.dev" matched cert's "delegated-ipfs.dev"
*  issuer: C=US; O=Google Trust Services; CN=WE1
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://delegated-ipfs.dev/routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: delegated-ipfs.dev]
* [HTTP/2] [1] [:path: /routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: application/json]
> GET /routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p HTTP/2
> Host: delegated-ipfs.dev
> User-Agent: curl/8.7.1
> accept: application/json
>
* Request completely sent off



< HTTP/2 404
< date: Wed, 22 Jan 2025 08:53:44 GMT
< content-type: application/json
< content-length: 14
< cache-control: public, max-age=15, stale-while-revalidate=172800, stale-if-error=172800
< last-modified: Wed, 22 Jan 2025 08:53:44 GMT
< vary: Accept-Encoding
< vary: Origin
< vary: Accept
< x-ipfs-pop: someguy-am6
< cf-cache-status: MISS
< server: cloudflare
< cf-ray: 905e508b18eb895b-BKK
< alt-svc: h3=":443"; ma=86400
<
* Connection #0 to host delegated-ipfs.dev left intact
{"Peers":null}⏎


➜  interface git:(fix-typo-in-doc) curl -v  "https://delegated-ipfs.dev/routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p"
* Host delegated-ipfs.dev:443 was resolved.
* IPv6: (none)
* IPv4: 209.94.90.2, 209.94.90.3
*   Trying 209.94.90.2:443...
* Connected to delegated-ipfs.dev (209.94.90.2) port 443
* ALPN: curl offers h2,http/1.1
* (304) (OUT), TLS handshake, Client hello (1):
*  CAfile: /etc/ssl/cert.pem
*  CApath: none
* (304) (IN), TLS handshake, Server hello (2):
* (304) (IN), TLS handshake, Unknown (8):
* (304) (IN), TLS handshake, Certificate (11):
* (304) (IN), TLS handshake, CERT verify (15):
* (304) (IN), TLS handshake, Finished (20):
* (304) (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / AEAD-CHACHA20-POLY1305-SHA256 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: CN=delegated-ipfs.dev
*  start date: Dec  7 06:59:47 2024 GMT
*  expire date: Mar  7 06:59:46 2025 GMT
*  subjectAltName: host "delegated-ipfs.dev" matched cert's "delegated-ipfs.dev"
*  issuer: C=US; O=Google Trust Services; CN=WE1
*  SSL certificate verify ok.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://delegated-ipfs.dev/routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: delegated-ipfs.dev]
* [HTTP/2] [1] [:path: /routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p]
* [HTTP/2] [1] [user-agent: curl/8.7.1]
* [HTTP/2] [1] [accept: */*]
* [HTTP/2] [1] [cache-control: no-cache]
> GET /routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p HTTP/2
> Host: delegated-ipfs.dev
> User-Agent: curl/8.7.1
> Accept: */*
> Cache-control: no-cache
> Authorization: tr4c3
> Traceparent: 00-6f5551e7e5c6bd2df06cb1fa52d33481-00996738e23a59b9-01
>
* Request completely sent off
< HTTP/2 404
< date: Wed, 22 Jan 2025 09:04:26 GMT
< content-type: application/json
< content-length: 14
< cache-control: public, max-age=15, stale-while-revalidate=172800, stale-if-error=172800
< last-modified: Wed, 22 Jan 2025 09:04:26 GMT
< vary: Accept-Encoding
< vary: Origin
< vary: Accept
< x-ipfs-pop: someguy-am6
< cf-cache-status: EXPIRED
< server: cloudflare
< cf-ray: 905e605d9f0dd2ce-FRA
< alt-svc: h3=":443"; ma=86400
<
* Connection #0 to host delegated-ipfs.dev left intact
{"Peers":null}⏎

@2color
Member Author

2color commented Jan 22, 2025

Someguy doesn't use the cache at all for /v1/peers/ calls. It seems that in some instances the go-libp2p-kad-dht router fails to find the peer even though it's online and routable (perhaps because the peer being searched for has reached its connection limits). In such cases, we don't look up the cache; we just return NotFound:

if err == routing.ErrNotFound {
	// ErrNotFound will be returned if either dialing the peer failed or the peer was not found
	r.cachedAddrBook.RecordFailedConnection(pid) // record the failure used for probing/backoff purposes
	return nil, routing.ErrNotFound
}

Next steps:

  • Figure out why go-libp2p-kad-dht fails to resolve the peer
  • Consider looking up the cache when a peer can't be found by go-libp2p-kad-dht

@2color
Member Author

2color commented Jan 22, 2025

Another thing that's weird and somewhat related to this issue: sometimes someguy returns more than one webrtc-direct multiaddr with different cert hashes, even though some of those cert hashes are known to be older.

Example:

{
  "Peers": [
    {
      "Addrs": [
        "/ip4/147.28.186.157/udp/9095/webrtc-direct/certhash/uEiBoW5fyVSSAU90AwMvlGHQ6YIiGF4GjFMIsL1NM9ljIuA",
        "/ip4/147.28.186.157/udp/9095/quic-v1",
        "/ip4/147.28.186.157/udp/9095/quic-v1/webtransport/certhash/uEiDF8DU16dllhg6FWM3CMtqgZhNytNrt2CJ4d_sf-ThfHA/certhash/uEiAFmismVS4uGGz9zF8yLRC10wtqPciwcBD1BuAch4sX3A",
        "/ip6/2604:1380:4642:6600::3/udp/9095/quic-v1/webtransport/certhash/uEiDF8DU16dllhg6FWM3CMtqgZhNytNrt2CJ4d_sf-ThfHA/certhash/uEiAFmismVS4uGGz9zF8yLRC10wtqPciwcBD1BuAch4sX3A",
        "/ip6/2604:1380:4642:6600::3/udp/9095/quic-v1",
        "/ip4/147.28.186.157/udp/9095/webrtc-direct/certhash/uEiC6yY8kGKhTw9gr74_eDLWf08PNyAiSKgs22JHc_rD8qw"
      ],
      "ID": "12D3KooWFhXabKDwALpzqMbto94sB7rvmZ6M28hs9Y9xSopDKwQr",
      "Schema": "peer"
    }
  ]
}
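
A response like the one above can be checked mechanically. The sketch below flags a webrtc-direct base address that appears with more than one cert hash. It works on plain multiaddr strings using only the standard library, and `splitCerthash` and `conflictingWebRTCDirect` are hypothetical helper names, not part of someguy or go-multiaddr. webtransport addresses are deliberately ignored, since each of them already carries two cert hashes in the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// splitCerthash splits a multiaddr string into the part before the first
// /certhash/ component and the list of certhash values that follow it.
func splitCerthash(addr string) (base string, hashes []string) {
	parts := strings.Split(addr, "/certhash/")
	for _, p := range parts[1:] {
		// Drop any trailing components (for example /p2p/...) after the hash.
		if i := strings.Index(p, "/"); i >= 0 {
			p = p[:i]
		}
		hashes = append(hashes, p)
	}
	return parts[0], hashes
}

// conflictingWebRTCDirect returns, for every webrtc-direct base address that
// appears with two or more distinct certhashes, the sorted list of hashes.
func conflictingWebRTCDirect(addrs []string) map[string][]string {
	seen := map[string]map[string]struct{}{}
	for _, a := range addrs {
		base, hashes := splitCerthash(a)
		if !strings.HasSuffix(base, "/webrtc-direct") {
			continue
		}
		if seen[base] == nil {
			seen[base] = map[string]struct{}{}
		}
		for _, h := range hashes {
			seen[base][h] = struct{}{}
		}
	}
	out := map[string][]string{}
	for base, set := range seen {
		if len(set) < 2 {
			continue
		}
		hs := make([]string, 0, len(set))
		for h := range set {
			hs = append(hs, h)
		}
		sort.Strings(hs)
		out[base] = hs
	}
	return out
}

func main() {
	// Addresses copied from the example response above.
	addrs := []string{
		"/ip4/147.28.186.157/udp/9095/webrtc-direct/certhash/uEiBoW5fyVSSAU90AwMvlGHQ6YIiGF4GjFMIsL1NM9ljIuA",
		"/ip4/147.28.186.157/udp/9095/quic-v1",
		"/ip4/147.28.186.157/udp/9095/webrtc-direct/certhash/uEiC6yY8kGKhTw9gr74_eDLWf08PNyAiSKgs22JHc_rD8qw",
	}
	for base, hashes := range conflictingWebRTCDirect(addrs) {
		fmt.Printf("%s has %d certhashes: %v\n", base, len(hashes), hashes)
	}
}
```

The check only detects the conflict. Deciding which hash is current would need extra information, such as when each address was last observed.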

@guillaumemichel

Triage notes:

  • @2color is it still an issue or was it solved by the latest cache adjustment?
  • If not solved, please provide the latest details

@guillaumemichel guillaumemichel added the need/author-input Needs input from the original author label Jan 28, 2025

github-actions bot commented Feb 4, 2025

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

@2color
Member Author

2color commented Feb 4, 2025

Triage notes:

  • @2color is it still an issue or was it solved by the latest cache adjustment?
  • If not solved, please provide the latest details

Yes, though I've had trouble reproducing it. I'll give it another try.

@2color
Member Author

2color commented Feb 4, 2025

I haven't been able to reproduce. Will close for now.

@2color 2color closed this as completed Feb 4, 2025
@2color 2color reopened this Feb 14, 2025
@2color
Member Author

2color commented Feb 14, 2025

I was able to recreate this issue. Here's a trace:

Image

This happened when the go-libp2p peer was reaching its limits and was rejecting new connections. In the trace, you can see that some of the KademliaDHT.ProtocolMessenger.GetClosestPeers fail with different errors (some failed to dial, some failed to identify, and some due to context cancellation).


Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

@guillaumemichel guillaumemichel added need/analysis Needs further analysis before proceeding and removed kind/stale need/author-input Needs input from the original author labels Feb 21, 2025
@guillaumemichel guillaumemichel self-assigned this Feb 21, 2025
@guillaumemichel

Since all the context cancelled errors happen at approximately the same time, I suspect this is due to a bug in go-libp2p-kad-dht that was addressed in libp2p/go-libp2p-kad-dht#1017 and released in v0.29.1.

Could you try again with the updated go-libp2p-kad-dht?

@guillaumemichel guillaumemichel added need/author-input Needs input from the original author and removed need/analysis Needs further analysis before proceeding labels Feb 21, 2025