From abb58b8372d1f10a52784d7dea7a073007dd9021 Mon Sep 17 00:00:00 2001 From: Daniel Norman <1992255+2color@users.noreply.github.com> Date: Tue, 10 Dec 2024 03:57:35 +0100 Subject: [PATCH] feat: rework gateway page and add recursive gateway (#1966) * feat: rework gateway page and add recursive gateway * Apply suggestions from code review Co-authored-by: Marcin Rataj * Apply suggestions from code review --------- Co-authored-by: Daniel N <2color@users.noreply.github.com> Co-authored-by: Marcin Rataj --- docs/concepts/ipfs-gateway.md | 158 ++++++++++++++++++++-------------- 1 file changed, 91 insertions(+), 67 deletions(-) diff --git a/docs/concepts/ipfs-gateway.md b/docs/concepts/ipfs-gateway.md index ba3cf3e04..f79c0b5cf 100644 --- a/docs/concepts/ipfs-gateway.md +++ b/docs/concepts/ipfs-gateway.md @@ -3,103 +3,119 @@ title: IPFS Gateway description: Learn why gateways are an important part of using IPFS in conjunction with the legacy web. related: 'IPFS Docs: Address IPFS on the Web': /how-to/address-ipfs-on-web/ - 'IPFS public gateway checker': https://ipfs.github.io/public-gateway-checker/ + 'IPFS public gateway checker': https://ipfs.fyi/gateways 'Gateway specifications': https://specs.ipfs.tech/http-gateways/ --- # IPFS Gateway -An _IPFS gateway_ is a web-based service that gets content from an IPFS network (private, or the public swarm backed by Amino DHT), and makes it available via HTTP, allowing IPFS-incompatible browsers, tools and software to benefit from [content-addressing](https://docs.ipfs.tech/concepts/content-addressing/). For example, some browsers or tools like [Curl](https://curl.haxx.se/) or [Wget](https://www.gnu.org/software/wget/) don't support IPFS natively and cannot access to IPFS content using canonical addressing like `ipfs://{CID}/{optional path to resource}`. While tools like [IPFS Companion](https://github.com/ipfs-shipyard/ipfs-companion) add browser support for native IPFS URLs, this is not always an option. 
As such, there are multiple gateway types and gateway providers available so that applications of all kinds can interface with IPFS using HTTP. +An _IPFS gateway_ is a standardized HTTP API for getting content-addressed data from IPFS nodes and CID providers (in a private network or on the public IPFS Mainnet). It lets clients interact with IPFS using HTTP semantics. For example, some browsers and tools like [Curl](https://curl.haxx.se/) or [Wget](https://www.gnu.org/software/wget/) don't support IPFS natively and cannot access IPFS content using canonical addressing like `ipfs://{CID}/{optional path to resource}`. While tools like [IPFS Companion](../install/ipfs-companion.md) add browser support for native IPFS URLs, this is not always an option. As such, IPFS gateways enable a broad range of applications to interface with IPFS using HTTP. This page discusses: -- [Gateway request lifecycle](#gateway-request-lifecycle) - [Gateway providers](#gateway-providers) - [Gateway types](#gateway-types) + - [Recursive vs. non-recursive gateways](#recursive-vs-non-recursive-gateways) + - [Trusted vs. trustless gateways](#trusted-vs-trustless-gateways) + - [Read-only gateways](#read-only-gateways) + - [Authenticated gateways](#authenticated-gateways) +- [Gateway request lifecycle](#gateway-request-lifecycle) +- [Resolution styles](#resolution-styles) + - [Path](#path) + - [Subdomain](#subdomain) + - [DNSLink](#dnslink) +- [Gateway URL formats](#gateway-url-formats) - [Working with gateways](#working-with-gateways) - [Implementing gateways](#implementing-gateways) -- [FAQs](#frequently-asked-questions-faqs) +- [Learning more](#learning-more) -## Gateway request lifecycle +## Gateway providers -:::callout -This section uses the _default_ gateway request lifecycle of [IPFS Kubo](https://github.com/ipfs/kubo) to introduce the basic concepts in the lifecycle. However, some gateways only serve content that they have and/or want to provide.
For example, a Kubo gateway with `NoFetch` enabled will not attempt to retrieve content from the network. -::: +Regardless of who deploys a gateway and where, a recursive IPFS gateway can fetch the content for any requested IPFS [content identifier](content-addressing.md), as long as some peer on the network provides it. -When a client request for a CID reaches an IPFS gateway, the gateway first checks whether the CID is cached locally. At this point, one of the following occurs: +### Your local gateway -- **If the CID is cached locally**, the gateway responds with the content referred to by the CID, and the lifecycle is complete. +Your machine may host a gateway as a local service; e.g., at `localhost:8080`. You have a local gateway service if you installed [IPFS Desktop](../install/ipfs-desktop.md), [Kubo](../install/command-line.md), or another form of IPFS node. -- **If the CID is not in the local cache**, the gateway will attempt to retrieve it from the network. +### Public gateways -The CID retrieval process is composed of two parts, content discovery / routing and content retrieval: +Public ([recursive](#recursive-vs-non-recursive-gateways)) gateways are provided by various organizations, including the IPFS Foundation as a [public utility](./public-utilities.md#public-ipfs-gateways). -1. In the **content discovery / routing** step, the gateway will determine provider location; that is, _where_ the data specified by the CID can be found: +For a list of public gateways, see the [IPFS Gateways Checker](https://ipfs.fyi/gateways). - - Asking peers that it is directly connected to if they have the data specified by the CID. - - Query the DHT for the IDs and network addresses of peers that have the data specified by the CID. +## Gateway types -2. Next, the gateway performs **content retrieval**, which can be broken into the following steps: +There are multiple gateway types, each with its own use cases and security, performance, and functional implications. - 1. The gateway connects to the provider. - 1.
The gateway fetches the CIDs content. - 1. The gateway streams the content to the client. +- [Recursive vs. non-recursive gateways](#recursive-vs-non-recursive-gateways) +- [Trusted vs. trustless gateways](#trusted-vs-trustless-gateways) +- [Authentication support](#authenticated-gateways) +- [Read support](#read-only-gateways) -:::callout -- Learn more about content discovery, routing, retrieval and the subsystems involved in each part of the process in [How IPFS works](./how-ipfs-works.md). -- Dive into the technical specifications for gateways in the [IPFS HTTP Gateways specification](https://specs.ipfs.tech/http-gateways/) page. -::: +### Recursive vs. non-recursive gateways -## Gateway providers - -Regardless of who deploys a gateway and where, any IPFS gateway resolves access to any requested IPFS [content identifier](content-addressing.md). Therefore, for best performance, when you need the service of a gateway, you should use the one closest to you. +Recursive gateways attempt to retrieve content from other peers on the network if they do not have it locally. This is the default behavior of [Rainbow](https://github.com/ipfs/rainbow/#readme) and of [Kubo](../install/command-line.md) running with [`Gateway.NoFetch=false`](https://github.com/ipfs/kubo/blob/master/docs/config.md#gatewaynofetch). -### Your local gateway +Non-recursive gateways only serve content they already have locally. For example, [Kubo](../install/command-line.md) can be configured to act as a non-recursive gateway by setting the [`Gateway.NoFetch=true`](https://github.com/ipfs/kubo/blob/master/docs/config.md#gatewaynofetch) option. -Your machine may host a gateway as a local service; e.g., at `localhost:8080`. You have a local gateway service if you installed [IPFS Desktop](https://github.com/ipfs-shipyard/ipfs-desktop#ipfs-desktop) or another form of IPFS node.
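The two kinds of gateway behave differently only when the content is not stored locally. The Python sketch below is a toy model of that decision and is not Kubo or Rainbow code. `ToyGateway`, its `no_fetch` flag and the `fetch_from_network` callback are hypothetical names. `no_fetch` stands in for Kubo's `Gateway.NoFetch` option, which in Kubo itself is set with `ipfs config --json Gateway.NoFetch true`.

```python
from typing import Callable, Dict, Optional


class ToyGateway:
    """Hypothetical model of a gateway's serving decision (not Kubo or Rainbow code)."""

    def __init__(self, fetch_from_network: Callable[[str], Optional[bytes]], no_fetch: bool) -> None:
        self.local_blocks: Dict[str, bytes] = {}  # stands in for the node's local store
        self.fetch_from_network = fetch_from_network  # stands in for routing + retrieval
        self.no_fetch = no_fetch  # True models a non-recursive gateway

    def get(self, cid: str) -> bytes:
        # Both kinds of gateway serve content they already hold locally.
        if cid in self.local_blocks:
            return self.local_blocks[cid]
        # A non-recursive gateway stops here instead of asking other peers.
        if self.no_fetch:
            raise LookupError(f"{cid} is not available locally")
        # A recursive gateway tries to find and fetch the content from the network.
        data = self.fetch_from_network(cid)
        if data is None:
            raise LookupError(f"{cid} could not be found on the network")
        self.local_blocks[cid] = data  # keep a copy so later requests are served locally
        return data


if __name__ == "__main__":
    network = {"bafy-example": b"hello"}  # hypothetical CID string and content
    recursive = ToyGateway(network.get, no_fetch=False)
    non_recursive = ToyGateway(network.get, no_fetch=True)
    print(recursive.get("bafy-example"))  # b'hello'
    try:
        non_recursive.get("bafy-example")
    except LookupError as err:
        print("non-recursive:", err)
```

A real gateway would turn the `LookupError` into an HTTP error response. This sketch does not model which status code that would be.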
+In general, recursive gateways are more powerful for end-users because they abstract away all details of the peer-to-peer network. However, they are much more resource-intensive for operators and more prone to abuse. -### Private gateways +[Trustless, verifiable retrieval](../reference/http/gateway.md#trustless-verifiable-retrieval) from non-recursive gateways is becoming a popular way to provide IPFS content to the network (over [HTTP](https://docs.ipfs.tech/reference/http/gateway/#trustless-verifiable-retrieval), as an alternative or in addition to [Bitswap](../concepts/glossary.md#bitswap)). -_Private gateways_ are configured to limit access to requests from specific domains or parts of the public internet. +### Trusted vs. trustless gateways -They are frequently, but not exclusively, used behind firewalls. Running [IPFS Desktop](https://github.com/ipfs-shipyard/ipfs-desktop#ipfs-desktop) or another form of IPFS node triggers connection attempts to other IPFS peers. Private network administrators may treat such connection attempts as potential security vulnerabilities. Private IPFS gateway servers located inside the private network and running a trusted code base provide an alternative architecture for read/write access to externally-hosted IPFS content. +See [Trusted vs. Trustless Gateways](../reference/http/gateway.md#trusted-vs-trustless) for more information. -### Public gateways +### Read-only gateways -For more information about public gateways, see the [Public IPFS Gateways](./public-utilities.md#public-ipfs-gateways) +_Read-only gateways_ are the simplest kind of gateway. This gateway type provides a way to fetch IPFS content using the HTTP GET method. -## Gateway types +### Authenticated gateways -There are multiple gateway types, each with specific use case, security, performance, and functional implications.
+If a gateway provider wants to restrict access to authenticated requests, they may need to configure a reverse proxy, develop an IPFS plugin, or set up a caching layer in front of IPFS. -- [Read support](#read-only-gateways) -- [Authentication support](#authenticated-gateways) -- [Resolution style](#resolution-style) -- [Service](#gateway-services) +Configuring a reverse proxy is the most popular way for providers to handle authentication. A reverse proxy can also pass the original IPFS API calls through unchanged, which keeps the gateway compatible with existing IPFS SDKs and toolkits. -### Read-only gateways -_Read-only gateways_ are the simplest kind of gateway. This gateway type provides a way to fetch IPFS content using the HTTP GET method. +## Gateway request lifecycle -### Authenticated gateways +:::callout +This section uses the _default_ recursive gateway request lifecycle of [IPFS Kubo](https://github.com/ipfs/kubo) to introduce the basic concepts in the lifecycle. However, non-recursive gateways only serve content that they have and/or want to provide. For example, a Kubo gateway with [`Gateway.NoFetch=true`](https://github.com/ipfs/kubo/blob/master/docs/config.md#gatewaynofetch) will **not** attempt to retrieve content from the network. +::: -If a gateway provider wants to limit access to requests with authentication, they may need to configure a reverse proxy, develop an IPFS plugin, or set a cache-layer above IPFS. +When a client request for a CID reaches an IPFS gateway, the gateway first checks whether the CID is cached locally. At this point, one of the following occurs: -Configuring a reverse proxy is the most popular way for providers handling authentication. Reverse proxy can also keep the original IPFS API calls which makes gateway adaptable to all IPFS SDK and toolkits. +- **If the CID is cached locally**, the gateway responds with the content referred to by the CID, and the lifecycle is complete.
+ +- **If the CID is not in the local cache**, a non-recursive gateway returns an error, while a recursive gateway (the case described here) attempts to retrieve it from the network. + +The CID retrieval process is composed of two parts, content discovery / routing and content retrieval: + +1. In the **content discovery / routing** step, the gateway determines the provider location; that is, _where_ the data specified by the CID can be found: + + - Ask peers that it is directly connected to if they have the data specified by the CID. + - Query the DHT for the IDs and network addresses of peers that have the data specified by the CID. -![Auth with Reverse proxy](./images/ipfs-gateways/public-authed-gateway.png) +2. Next, the gateway performs **content retrieval**, which can be broken into the following steps: -Providers can design their own centralized authentication service like [Infura IPFS Auth](https://docs.infura.io/networks/ipfs/how-to/authenticate-requests), or a decentralized authentication service like [IPFS W3Auth](https://wiki.crust.network/docs/en/buildIPFSWeb3AuthGW)). + 1. The gateway connects to the provider. + 1. The gateway fetches the CID's content. + 1. The gateway streams the content to the client. -### Resolution style +:::callout +- Learn more about content discovery, routing, retrieval and the subsystems involved in each part of the process in [How IPFS works](./how-ipfs-works.md). +- Dive into the technical specifications for gateways in the [IPFS HTTP Gateways specification](https://specs.ipfs.tech/http-gateways/) page.
+::: + +## Resolution styles -Three resolution styles exist: +Gateways typically support three resolution styles: - [Path](#path) - [Subdomain](#subdomain) - [DNSLink](#dnslink) -#### Path +### Path The examples discussed above employed path resolution: @@ -115,17 +131,17 @@ This type of gateway does not provide origin isolation and should not be used fo Learn more at [Address IPFS on the web: Path Gateway](../how-to/address-ipfs-on-web.md#path-gateway) and [Path Gateway Specification](https://specs.ipfs.tech/http-gateways/path-gateway/). ::: -#### Subdomain +### Subdomain -Subdomain resolution style maintains compliance with the [single-origin policy](https://en.wikipedia.org/wiki/Same-origin_policy). The canonical form of access, `https://{CID}.ipfs.{gatewayURL}/{optional path to resource}`, causes the browser to interpret each returned file as being from a different origin. +Subdomain resolution style works with the browser's [same-origin policy](https://en.wikipedia.org/wiki/Same-origin_policy): the canonical form of access, `https://{CID}.ipfs.{gatewayURL}/{optional path to resource}`, gives each CID its own origin. ::: callout -This type of gateway does provide origin isolation and should be used for hosting web apps. +Subdomain gateways provide origin isolation and should be used for hosting web apps. Learn more at [Address IPFS on the web: Subdomain Gateway](../how-to/address-ipfs-on-web.md#subdomain-gateway) and [Subdomain Gateway Specification](https://specs.ipfs.tech/http-gateways/subdomain-gateway/). ::: -#### DNSlink +### DNSLink Whenever the content of data within IPFS changes, IPFS creates a new CID based on the content of that data. Many applications require access to the latest version of a file or website but will not know the exact CID for that latest version. The [InterPlanetary Name Service (IPNS)](ipns.md) allows a version-independent IPNS identifier to resolve into the current version's IPFS CID.
@@ -141,27 +157,35 @@ DNSLink resolution occurs when the gateway recognizes an IPNS identifier contain https://{gateway URL}/ipns/{example.com}/{optional path} ``` -2. The gateway searches the DNS TXT records of the requested domain `{example.com}` for a string of the form `dnslink=/ipfs/{CID}` or `_dnslink=/ipfs/{CID}`. If found, the gateway uses the specified CID to serve up `ipfs://{CID}/{optional path}`. As with path resolution, this form of DNSLink resolution violates the single-origin policy. The domain operator may ensure single-origin policy compliance — and the delivery of the current version of content — by adding an `Alias` record in the DNS that refers to a suitable IPFS gateway; e.g., `gateway.ipfs.io`. -3. The `Alias` record redirects any access to that `example.com` to the specified gateway. Hence the browser's request to `https://{example.com}/{optional path to resource}` redirects to the gateway specified in the `Alias`. -4. The gateway employs DNSLink resolution to return the current content version from IPFS. -5. The browser does not perceive the gateway as the origin of the content and therefore enforces the single-origin policy to protect `example.com`. +2. The gateway searches the DNS TXT records on the `_dnslink.` subdomain (`_dnslink.example.com`) for a string of the form `dnslink=/ipfs/{CID}`. If found, the gateway uses the specified content identifier to find and serve up `ipfs://{CID}/{optional path}`. + +An HTTP gateway can also serve content directly on the DNSLink domain itself: + +1. Point `example.com` at the IP address of your HTTP gateway: make sure the `A`/`AAAA`/`HTTPS` DNS records are set and that TLS termination is configured. +2. The client sends a request to: + + ```bash + https://{example.com}/{optional path} + ``` + +3. The gateway detects the `Host: example.com` HTTP header in the incoming request and looks up the DNSLink record the same way as in the previous example.
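The Python sketch below illustrates two parts of the steps above: the TXT lookup on `_dnslink.{domain}`, and the `Host` header handling used when the gateway serves the DNSLink domain itself. It is a simplified illustration under stated assumptions, not the DNSLink specification.

- `lookup_txt` is a hypothetical callback that returns the TXT strings for a DNS name. In practice it might wrap a DNS library such as dnspython's `dns.resolver.resolve(name, "TXT")`.
- When several `dnslink=` values exist, the sketch simply takes the first valid one. Consult the DNSLink specification for the real rules on multiple records.

```python
from typing import Callable, Iterable, List, Optional

DNSLINK_PREFIX = "dnslink="


def dnslink_query_name(domain: str) -> str:
    """DNSLink TXT records are looked up on the _dnslink subdomain."""
    return "_dnslink." + domain.rstrip(".")


def parse_dnslink(txt_records: Iterable[str]) -> Optional[str]:
    """Return the content path from the first usable dnslink= TXT value, if any."""
    for record in txt_records:
        if not record.startswith(DNSLINK_PREFIX):
            continue  # unrelated TXT records (SPF, verification tokens, ...)
        path = record[len(DNSLINK_PREFIX):].rstrip("/")
        if path.startswith(("/ipfs/", "/ipns/")):
            return path
    return None


def resolve_request(host: str, request_path: str,
                    lookup_txt: Callable[[str], List[str]]) -> Optional[str]:
    """Map a request for a DNSLink domain (taken from the Host header) to a content path."""
    domain = host.split(":", 1)[0].lower()  # drop an optional :port (domain names only)
    target = parse_dnslink(lookup_txt(dnslink_query_name(domain)))
    if target is None:
        return None  # no DNSLink record: a real gateway would answer with an error
    suffix = request_path.lstrip("/")
    return f"{target}/{suffix}" if suffix else target


if __name__ == "__main__":
    fake_dns = {  # hypothetical records standing in for a real DNS lookup
        "_dnslink.example.com": [
            "v=spf1 -all",
            "dnslink=/ipfs/bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi",
        ],
    }

    def lookup(name: str) -> List[str]:
        return fake_dns.get(name, [])

    print(resolve_request("example.com", "/wiki/", lookup))
```

The resulting `/ipfs/...` path is what the gateway would then serve, exactly as in path resolution.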
::: callout Learn more at [Address IPFS on the web: DNSLink Gateway](../how-to/address-ipfs-on-web.md#dnslink-gateway) and [DNSLink Gateway Specification](https://specs.ipfs.tech/http-gateways/dnslink-gateway/). ::: -### Gateway services +## Gateway URL formats -Currently HTTP gateways may access both IPFS and IPNS services: +HTTP gateways typically expose both immutable IPFS resources and mutable IPNS resources (IPNS names or DNSLink) using the following URL formats: -| Service | Style | Canonical form of access | -| ------- | --------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| IPFS | path | `https://{gateway URL}/ipfs/{CID}/{optional path to resource}` | -| IPFS | subdomain | `https://{CID}.ipfs.{gatewayURL}/{optional path to resource}` | -| IPFS | DNSLink | `https://{example.com}/{optional path to resource}` **preferred**, or<br>`https://{gateway URL}/ipns/{example.com}/{optional path to resource}` | -| IPNS | path | `https://{gateway URL}/ipns/{IPNS identifier}/{optional path to resource}` | -| IPNS | subdomain | `https://{IPNS identifier}.ipns.{gatewayURL}/{optional path to resource}` | -| IPNS | DNSLink | Useful when IPNS identifier is a domain:<br>`https://{example.com}/{optional path to resource}` **preferred**, or<br>`https://{gateway URL}/ipns/{example.com}/{optional path to resource}` | +| Service | Resolution style | Canonical form of access | +| ------- | ---------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| IPFS | path | `https://{gateway URL}/ipfs/{CID}/{optional path to resource}` | +| IPFS | subdomain | `https://{CID}.ipfs.{gatewayURL}/{optional path to resource}` | +| IPFS | DNSLink | `https://{example.com}/{optional path to resource}` **preferred**, or<br>`https://{gateway URL}/ipns/{example.com}/{optional path to resource}` | +| IPNS | path | `https://{gateway URL}/ipns/{IPNS identifier}/{optional path to resource}` | +| IPNS | subdomain | `https://{IPNS identifier}.ipns.{gatewayURL}/{optional path to resource}` | +| IPNS | DNSLink | Useful when IPNS identifier is a domain:<br>`https://{example.com}/{optional path to resource}` **preferred**, or<br>`https://{gateway URL}/ipns/{example.com}/{optional path to resource}` | ## Working with gateways
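The path and subdomain URL formats in the table above can be assembled programmatically. The Python sketch below covers only those two styles and rests on two facts about subdomain gateways:

- DNS names are case-insensitive, so a case-sensitive base58 CIDv0 (`Qm...`) has to be converted to a base32 CIDv1 before it can be used as a subdomain label.
- A dotted DNSLink name is inlined into a single label by replacing `-` with `--` and then `.` with `-`.

The function names are hypothetical. The converter handles only CIDv0 and lowercase base32 CIDv1 strings.

```python
import base64

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58decode(s: str) -> bytes:
    """Decode base58btc (the encoding of CIDv0 strings)."""
    n = 0
    for ch in s:
        n = n * 58 + B58_ALPHABET.index(ch)
    leading_zeros = len(s) - len(s.lstrip("1"))  # each leading '1' is a zero byte
    return b"\x00" * leading_zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def to_cidv1_base32(cid: str) -> str:
    """Return a DNS-safe CIDv1 in lowercase base32 ('b' multibase prefix)."""
    if cid.startswith("Qm") and len(cid) == 46:
        multihash = b58decode(cid)
        # CIDv1 = version 0x01 + dag-pb codec 0x70 + the same multihash
        return "b" + base64.b32encode(b"\x01\x70" + multihash).decode("ascii").lower().rstrip("=")
    if cid.startswith("b"):
        return cid  # assumed to already be a lowercase base32 CIDv1
    raise ValueError("this sketch only handles CIDv0 and base32 CIDv1 strings")


def inline_dns_name(name: str) -> str:
    """Fit a dotted DNSLink name into one DNS label: '-' -> '--', then '.' -> '-'."""
    return name.replace("-", "--").replace(".", "-")


def path_url(gateway: str, namespace: str, identifier: str, path: str = "") -> str:
    """Path style: https://{gateway}/{ipfs|ipns}/{identifier}/{path}"""
    return f"https://{gateway}/{namespace}/{identifier}/{path.lstrip('/')}"


def subdomain_url(gateway: str, namespace: str, identifier: str, path: str = "") -> str:
    """Subdomain style: https://{label}.{ipfs|ipns}.{gateway}/{path}"""
    if namespace == "ipfs":
        label = to_cidv1_base32(identifier)
    elif "." in identifier:
        label = inline_dns_name(identifier)  # a DNSLink name such as example.com
    else:
        label = identifier  # assumes the IPNS name is already DNS-safe
    return f"https://{label}.{namespace}.{gateway}/{path.lstrip('/')}"


if __name__ == "__main__":
    cid_v0 = "QmXoypizjW3WknFiJnKLwHCnL72vedxjQkDDP1mXWo6uco"
    print(path_url("ipfs.io", "ipfs", cid_v0, "/wiki/"))
    print(subdomain_url("dweb.link", "ipfs", cid_v0, "/wiki/"))
    print(subdomain_url("dweb.link", "ipns", "en.wikipedia-on-ipfs.org", "/wiki/"))
```

The DNSLink-on-its-own-domain form (`https://{example.com}/...`) needs no URL building. It depends on DNS and gateway configuration instead.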