index.html

<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8" />
  <title>ActivityPub Discovery</title>
  <script src="https://www.w3.org/Tools/respec/respec-w3c" class="remove" defer></script>
  <script class="remove">
    // All config options at https://respec.org/docs/
    var respecConfig = {
      specStatus: "CG-DRAFT",
      latestVersion: null,
      editors: [{ name: "Evan Prodromou", url: "https://socialwebfoundation.org/@evanp" }],
      github: "swicg/activitypub-html-discovery",
      shortName: "apdisco",
      xref: "web-platform",
      group: "socialcg",
      noRecTrack: true
    };
  </script>
</head>

<body>
  <section id="abstract">
    <p>ActivityPub is a standard for publishing structured social network data on the Web in JSON-LD format. This
      document describes various methods for discovering the ActivityPub object described by an HTML page, and
      conversely the HTML page for an ActivityPub object.</p>
  </section>
  <section id="sotd">
    <p>This is a draft of the Social Web Incubator Group (SocialCG) Discovery Task Force.</p>
  </section>
  <section class="informative">
    <h2>Introduction</h2>

    <p>ActivityPub is a standard for publishing structured social network data on the Web in JSON-LD format and sharing that data from client to server and from server to server. This document describes several methods for discovering the ActivityPub object described by an HTML page, and conversely the HTML page for an ActivityPub object.</p>

    <p>
      Social data in the ActivityPub model is a "resource" like a person, image, or place. That resource has an ActivityPub JSON-LD representation (if it doesn't, it's not covered by this document!) and may have an HTML representation. The ActivityPub representation and the HTML representations each have an URL -- possibly the same URL.
    </p>

    <p>
    "Forward discovery" is any process that, given the resource's HTML representation, will return the resource's ActivityPub representation.
    </p>

    <ul>
      <li>resource HTML &rarr; resource ActivityPub</li>
    </ul>

    <p>
      There are multiple techniques for forward discovery which will be documented in this report. Some discovery techniques require the full HTML document -- its markup and content. Others only need the URL of the HTML representation.
    </p>

    <ul>
      <li>resource HTML document &rarr; resource ActivityPub</li>
      <li>resource HTML URL &rarr; resource ActivityPub</li>
    </ul>

    <p>
      When a consumer is doing discovery, it can start with the input it has -- document or URL -- and convert to the other input if it needs to.Converting an HTML document to an HTML URL usually requires access to the document's context, like a browser environment. Converting an HTML URL to a document requires fetching the document and parsing it.
    </p>

    <ul>
      <li>resource HTML document &rarr; resource HTML URL &rarr; resource ActivityPub</li>
      <li>resource HTML URL &rarr; resource HTML document &rarr; resource ActivityPub</li>
    </ul>

    <p>
    "Reverse discovery" is any process that, given the resource's ActivityPub representation, will return the resource's HTML representation.
    </p>

    <ul>
      <li>resource ActivityPub &rarr; resource HTML</li>
    </ul>

    <p>
      As with forward discovery, some reverse discovery techniques require the full ActivityPub document -- its JSON properties and content. Others only need the URL of the ActivityPub representation.
    </p>

    <ul>
      <li>resource ActivityPub document &rarr; resource HTML</li>
      <li>resource ActivityPub URL &rarr; resource HTML</li>
    </ul>

    <p>
      When a consumer is doing reverse discovery, it can start with techniques that use the input it has -- document or URL -- and switch to techniques that use the other input only if needed. Converting an ActivityPub document to an ActivityPub URL requires extracting the <code>id</code> property of the object. Converting an ActivityPub URL to a document requires fetching the document and parsing it.
    </p>

    <ul>
      <li>resource ActivityPub document &rarr; resource ActivityPub URL &rarr; resource HTML</li>
      <li>resource ActivityPub URL &rarr; resource ActivityPub document &rarr; resource HTML</li>
    </ul>

    <p>
      Some resources have a relationship to another resource, which is its <dfn>author</dfn> or creator. The author resource has an ActivityPub JSON-LD representation (again, if it doesn't, it's not covered by this document!) and may have an HTML representation.
    </p>

    <p>
    "Author discovery" is any process that, given the resource's HTML representation, will return the author resource's ActivityPub representation. Author discovery is important because many ActivityPub processes require delivering activities to the author.
    </p>

    <p>
    There are a few paths for author discovery:
    </p>

    <ul>
      <li>resource HTML &rarr; resource ActivityPub &rarr; author ActivityPub</li>
      <li>resource HTML &rarr; author HTML &rarr; author ActivityPub</li>
      <li>resource HTML &rarr; author ActivityPub</li>
    </ul>

    <p>
      Note that a resource may have an author resource with an ActivityPub representation, but not have its own ActivityPub representation. An example is an article published in a content-management system (CMS) that is ascribed to an actor with an ActivityPub account who wants to receive credit and/or feedback for the work.
    </p>

    <p>As with forward and reverse discovery, the consumer may start with the URL of the resource HTML or the content of the resource HTML.</p>

    <p>
    For some kinds of resources, especially ActivityPub actors, a Webfinger is a more readable alternative to the ActivityPub representation's URL. In these cases, the forward discovery process and author discovery process might include the Webfinger discovery process as an intermediate step:
    </p>

    <ul>
      <li>resource HTML &rarr; resource Webfinger &rarr; resource ActivityPub</li>
      <li>resource HTML &rarr; author Webfinger &rarr; author ActivityPub</li>
    </ul>

    <p>
      In this document, the terms "<a href="https://www.w3.org/TR/activitystreams-core/#publishers">publisher</a>" and "<a href="https://www.w3.org/TR/activitystreams-core/#consumers">consumer</a>" are used as  in Activity Streams 2.0 Core. The terms are extended to include implementations that publish or consume HTML representations of resources with ActivityPub representations.
    </p>

    <p>
      In this document, we describe several methods of forward discovery, reverse discovery, and author discovery. Different methods are implemented by different publishers and consumers, and have different trade-offs in terms of complexity, performance, and reliability. A section on verification explains how to verify that the discovered information is accurate. The final sections define best practices for publishers and consumers to maximize interoperability and minimize development effort.
    </p>

    <section>
      <h3>Applicable resource types</h3>
      <p>Unless otherwise specified, the techniques described below can be used with any Activity Streams 2.0 types. The best-defined groups of AS2 types for HTML discovery are actor types:</p>
      <ul>
        <li><code>Person</code></li>
        <li><code>Application</code></li>
        <li><code>Service</code></li>
        <li><code>Group</code></li>
        <li><code>Organization</code></li>
      </ul>
      <p>
        and digital content types:
      </p>
      <ul>
        <li><code>Note</code></li>
        <li><code>Article</code></li>
        <li><code>Image</code></li>
        <li><code>Video</code></li>
        <li><code>Audio</code></li>
        <li><code>Document</code></li>
        <li><code>Page</code></li>
      </ul>
      <p>
        Other ActivityPub types are less likely to have their own HTML representations, such as activity types.
      </p>
      <p>
        <code>Collection</code> types are often better represented by an object they are closely related to. For example, an actor's <code>outbox</code>
        collection is often provided on the actor's profile page, which is a representation of the actor. Similarly, the <code>likes</code> or <code>replies</code> of an <code>Image</code> object are often provided on the object's page, and don't have an independent HTML representation. That said, this document does not preclude the possibility of HTML representations for collection types.
      </p>
    </section>
    <section>
      <h3>Motivating user stories</h3>
      <p>These are some of the user stories that motivate this work.</p>
      <p>
      <ul>
        <li><em>As a Web user, when viewing a Web page, I want to <code>Like</code> the contents, so that I can share it with my followers, let the author know I appreciated it, and save it to my <code>liked</code> collection.</em> A browser-based ActivityPub API client could submit a <code>Like</code> activity to the user's ActivityPub server, but it would need to know the ID of the ActivityPub equivalent of the page.</li>
        <li><em>As a Web user, when viewing a Web page, I want to <code>Announce</code> the contents, so that I can share it with my followers.</em> A browser-based ActivityPub API client could submit a <code>Announce</code> activity to the user's ActivityPub server, but it would need to know the ID of the ActivityPub equivalent of the page.</li>
        <li><em>As a Web user, when viewing an actor's profile, I want to <code>Follow</code> the actor, so that I can get updates about their activities in my inbox.</em> A browser-based ActivityPub API client could submit a <code>Follow</code> activity to the user's ActivityPub server, but it would need to know the actor ID of the actor whose profile is being viewed.</li>
        <li><em>As a Web user, when viewing an actor's profile, I want to send them a direct message, to share information or make a connection.</em> A browser-based ActivityPub API client could send a <code>Note</code> activity to the user's ActivityPub server, with the profile actor's ID in the <code>to</code> property, but the API client would need to know the actor ID of the actor whose profile is being viewed.</li>
        <li><em>As a Web user, when viewing an image page in my browser, I want to send a direct message to the author, to share information or give feedback.</em> A browser-based ActivityPub API client could send a <code>Note</code> activity to the user's ActivityPub server, with the image's author's actor ID in the <code>to</code> property, but the API client would need to know the actor ID of the author of the image.</li>
        <li><em>As a social network client user, when I see a link in the content of a social media post, I want to load the resource for that link into my social media client without launching an external browser, so that I have a smooth user experience and can fully interact with the content.</em> Given a link like <code>https://html.example/blog/page-1.html</code>, an ActivityPub API client could discover the related ActivityPub ID <code>https://ap.example/api/page-1.jsonld</code>, retrieve it with machine-readable metadata, and provide affordances for interacting with the object, such as liking, sharing, or replying.</li>
        <li><em>As a social network client user, when I see a Webfinger handle mentioned in a social media post, I want to follow a link to the actor's profile page, so I can learn more about them.</em> Converting a Webfinger address into an ActivityPub actor ID is well-covered in the ActivityPub and Webfinger report, but the next step of converting an actor ID to a profile page is not.</li>
        <li><em>As a social network client user, when I see a Webfinger handle mentioned in a social media post, I want to load the actor's profile into my social network client, so I can interact directly with it.</em> If the link in the <code>content</code> is to the actor's profile page, it's necessary to be able to turn that link into an actor ID to allow more inspection of the actor and affordances like following or blocking.</li>
      </ul>
      </p>
    </section>
    <section>
      <h3>URLs in examples</h3>
      <p>This document uses a consistent format for example URLs:
      <pre>
        <code>https://{name}.example/{path}/{type}-{ordinal}{?ext}</code>
      </pre>
      </p>
      <p>Where:</p>
      <ul>
        <li><code>{name}</code> is the domain name of the server. There are three default domain names used:
          <ul>
            <li><code>ap.example</code> - A server that primarily provides ActivityPub JSON-LD documents.</li>
            <li><code>html.example</code> - A server that primarily provides HTML documents.</li>
            <li><code>mixed.example</code> - A server that provides both ActivityPub JSON-LD and HTML documents.</li>
          </ul>
          Other names may be used when additional examples are required.
        </li>
        <li><code>{path}</code> is the path to the object. It should be opaque; none of the paths in this document have
          semantic meaning unless otherwise specified.</li>
        <li><code>{type}</code> is the content type of the resource. This will usually be the lowercase version of the
          ActivityPub object type, such as <code>Note</code>, <code>Person</code>, or <code>Image</code>.</li>
        </li>
        <li><code>{ordinal}</code> is an ordinal number, when multiple objects are being described in the same
          discussion.</li>
        <li><code>{ext}</code> is an optional "file extension" that indicates the Internet media type of the resource,
          including:
          <ul>
            <li><code>.jsonld</code> for JSON-LD objects</li>
            <li><code>.html</code> for HTML documents</li>
            <li><code>.png</code> for PNG images, <code>.jpg</code> for JPEG images, etc.</li>
          </ul>
          Extensions may be left off of the URL, especially if the same URL will be used for multiple media types. Note that including the <code>.jsonld</code> extension is not common practice for ActivityPub <code>id</code> values. It is used in this report to highlight that the URL is for a JSON-LD representation.
        </li>
      </ul>
      <p>The structure used in the examples is merely mnemonic and non-normative. None of the techniques described in this document depend on a particular URL structure, unless otherwise specified.</p>
    </section>
  </section>
  <section id="discovery">
    <h2>HTML to ActivityPub</h2>
    <p>This form of discovery, <dfn>forward discovery</dfn>, will identify an ActivityPub JSON-LD resource based on the HTML representation of the object.</p>
    <section>
      <h3>URL as input</h3>
      <p>These discovery techniques require an URL as input. Consumers may start with URLs if they are extracting links from RSS feeds or microblogging content, or when converting from other social networking platform content.</p>
      <section>
        <h4>Content negotiation</h4>
        <p><a href="https://en.wikipedia.org/wiki/Content_negotiation">Content negotiation</a> is a catch-all term for
          ways of negotiating the representation of a resource through the HTTP protocol. In this document, it will
          specifically cover <a href="https://datatracker.ietf.org/doc/html/rfc7231#section-3.4.1">proactive negotiation</a>
          using the <a href="https://datatracker.ietf.org/doc/html/rfc7231#section-5.3.2">Accept</a> header.
        </p>
        <p>
          Given the URL for an HTML document, such as <code>https://mixed.example/some/path/to/note-1</code>, a consumer
          could attempt to retrieve the corresponding ActivityPub JSON-LD object using this HTTP request:
        </p>
        <pre class="HTTP">
          GET /some/path/to/note-1 HTTP/1.1
          Host: mixed.example
          Accept: application/activity+json, application/ld+json, application/json
        </pre>
        <p>
          A compliant server may respond with the ActivityPub JSON-LD object in the body of the response:
        </p>
        <pre class="HTTP">
          HTTP/1.1 200 OK
          Content-Type: application/activity+json

          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://mixed.example/some/path/to/note-1",
            "type": "Article",
            "content": "This is a note."
          }
        </pre>
        <p>
          This is typically used when the ActivityPub server and the HTML server are implemented in the same software package.
          Because this has historically been the case for many implementations, some consumers expect this behavior to be the
          default.
          </p>
          <p>
          Alternately, the server may respond with a <code>308 Permanent Redirect</code> to indicate the location of the JSON-LD
          representation.
        </p>
        <pre class="HTTP">
          HTTP/1.1 308 Permanent Redirect
          Location: https://mixed.example/different/path/to/note-1.jsonld
        </pre>
        <section>
          <h5>Content negotiation failure</h5>

          <p>
            If the server does not support content negotiation, it may respond with a <code>406 Not Acceptable</code> status
            code.
          </p>
          <pre>
            HTTP/1.1 406 Not Acceptable
            Content-Type: text/plain

            No representation matching this request could be found.
          </pre>
          <p>Less compliant servers may ignore the <code>Accept</code> header altogether and return the HTML content regardless:
          </p>
          <pre class="HTTP">
            HTTP/1.1 200 OK
            Content-Type: text/html

            &lt;html&gt;
            &lt;head&gt;
            &lt;title&gt;Note 1&lt;/title&gt;
            &lt;/head&gt;
            &lt;body&gt;
            &lt;p&gt;This is a note.&lt;/p&gt;
          </pre>
          <p>
            A more difficult failure mode to detect arises when the server does not support ActivityPub, but does support content negotiation for another JSON format. Such a server returns a <code>200 OK</code> status code with a JSON
            object that does not use JSON-LD, or JSON-LD object that does not use the Activity Streams 2.0 vocabulary:
          </p>
          <pre class="HTTP">
            HTTP/1.1 200 OK
            Content-Type: application/json

            {
              "property": "value",
              "otherProperty": "otherValue"
            }
          </pre>
        </section>
      </section>
      <section>
        <h4>HTTP Link header</h4>
        <p>
          The <a href="https://datatracker.ietf.org/doc/html/rfc8288">HTTP Link header</a> can be used to indicate an
          alternative representation of a resource. A consumer can use this header to discover the ActivityPub JSON-LD object
          for an HTML page.
        </p>
        <p>
          Given the URL for an HTML document, such as <code>https://html.example/user/test1/article-1</code>, the consumer can
          use a HTTP <code>HEAD</code> request to get the headers for the resource, which will hopefully include the
          <code>Link</code> header:
        </p>
        <pre class="HTTP">
          HEAD /user/test1/article-1 HTTP/1.1
          Host: html.example
        </pre>
        <p>
          A compliant server will respond with the headers for the resource:
        </p>
        <pre class="HTTP">
        HTTP/1.1 200 OK
        Link: &lt;https://ap.example/api/articles/article-1.jsonld&gt;; rel="alternate"; type="application/activity+json"
        </pre>
        <p>
          The link header with the <code>alternate</code> relation type, and an ActivityPub-compatible media type, indicates
          that the ActivityPub JSON-LD object is available at the linked URL.
        </p>
        <p>
          This can be a very efficient method of discovery, since the consumer does not need to download the entire HTML
          document and parse its contents.
        </p>
        <p>
          Servers may also include the <code>Link</code> header in the response to a <code>GET</code> request for the HTML page.
        </p>
        <pre class="HTTP">
          GET /user/test1/article-1 HTTP/1.1
          Host: html.example
          </pre>
        <p>
          A compliant server will respond with the headers for the resource:
        </p>
        <pre class="HTTP">
        HTTP/1.1 200 OK
        Link: &lt;https://ap.example/api/articles/article-1.jsonld&gt;; rel="alternate"; type="application/activity+json"
        Content-type: text/html

        &lt;html&gt;
        &lt;head&gt;
        ...
        </pre>
        <section>
          <h5>Link header failure</h5>
          <p>
            Some servers may return the full body of the HTML document in response to a <code>HEAD</code> request, without
            including a <code>Link</code> header.
          </p>
          <pre class="HTTP">
          HTTP/1.1 200 OK
          Content-type: text/html

          &lt;html&gt;
          &lt;head&gt;
          ...
          </pre>
        </section>
      </section>
      <section>
        <h3>Webfinger</h3>
        <p><a href="https://datatracker.ietf.org/doc/html/rfc7033">Webfinger</a>
        is a standard for discovering metadata about a resource identified with an URL. Finding the ActivityPub URL for an actor identified with an <code>acct:</code> URL is well documented in the <a href="https://www.w3.org/community/reports/socialcg/CG-FINAL-apwf-20240608/">ActivityPub and Webfinger</a> report. However, Webfinger can be used to find metadata about other resources, including HTML pages with <code>https:</code> URLs.</p>
        <p>Given an URL for a document, like <code>https://html.example/group-1.html</code>, a GET request can be made to an URL in the <code>/.well-known/</code> path of the domain for the URL, as follows:</p>
        <pre class="HTTP">
          GET /.well-known/webfinger?resource=https%3A%2F%2Fhtml.example%2Fgroup-1.html HTTP/1.1
          Host: html.example
        </pre>
        <p>Note that the <code>/.well-known/webfinger</code> path is fixed and required for Webfinger.</p>
        <p>A compliant server will respond with the metadata for the resource:</p>
        <pre class="HTTP">
          HTTP/1.1 200 OK
          Content-Type: application/jrd+json

          {
            "subject": "https://html.example/group-1.html",
            "links": [
              {
                "rel": "alternate",
                "type": "application/activity+json",
                "href": "https://ap.example/api/groups/group-1.jsonld"
              }
            ]
          }
        </pre>
        <p>Note that unlike other URLs used in the examples in this report, the <code>/.well-known/webfinger</code> path is fixed and required for Webfinger.</p>
        <p>The JRD JSON format includes <a href="https://datatracker.ietf.org/doc/html/rfc7033#section-4.4">a number of properties</a>, as defined in the Webfinger RFC 7033. The relevant data structure in this example is the object in the <code>links</code> array with the <code>rel</code> property set to <code>alternate</code> and the <code>type</code> property set to <code>application/activity+json</code>, an ActivityPub-compatible media type. The <code>href</code> property of this link is URL of the ActivityPub equivalent for the HTML page.</p>
        <section>
          <h4>Webfinger failure</h4>
          <p>Not all Webfinger-aware servers return JRD documents for <code>https</code> URLs. Others might only return JRD documents for URLs that represent actors, such as registered users.</p>
          <p>As with other link-relation-based discovery mechanisms, like the HTTP Link header or the &lt;link&gt; element, a JSON or JSON-LD media in the link's <code>type</code> property might not indicate an ActivityPub URL, but some other JSON or JSON-LD object.</p>
        </section>
      </section>
    </section>
    <section>
      <h3>Document as input</h3>
      <p>Alternately, a consumer may start with the full contents of an HTML document, including markup and other content. For example, a browser-based application may have access to the HTML loaded in the browser window. It's also usually possible to extract the URL from the environment -- for example, using the <code>document.location</code> property in a JavaScript environment. But using the document content for discovery can return the ActivityPub equivalent without the HTTP requests that discovery by URL requires, saving some time and network traffic.</p>
      <section>
        <h4>HTML &lt;link&gt; element</h4>
        <p>The <a href="https://html.spec.whatwg.org/multipage/semantics.html#the-link-element">link</a> element is a metadata element used in the <code>&lt;head&gt;</code> section of an HTML document. It provides
        links for the whole document, using a number of different link relations.</p>
        <p>To indicate its equivalent ActivityPub object, the HTML page at <code>https://html.example/watch/video-1.html</code> could include the following link element:</p>
        <pre class="HTML">
          &lt;!doctype html&gt;
          &lt;html&gt;
            &lt;head&gt;
              &lt;title&gt;Video 1&lt;/title&gt;
              &lt;link
                rel="alternate"
                type="application/activity+json"
                href="https://ap.example/api/descriptors/video-1.jsonld" /&gt;
            &lt;/head&gt;
            &lt;body&gt;
              &lt;!-- rest of the page --&gt;
            &lt;/body&gt;
          &lt;/html&gt;
        </pre>
        <p>Consumers need to parse the HTML to find the <code>link</code> element with the <code>alternate</code> relation and an ActivityPub-compatible media type as <code>type</code>. This can be slow and complicated.</p>
        <section>
          <h5>Link element failure</h5>
          <p>Some servers may include a <code>link</code> element with an
            <code>alternate</code> relation and with a JSON type or JSON-LD type that does not link to an ActivityPub resource.</p>
          <pre class="HTML">
            &lt;!doctype html&gt;
            &lt;html&gt;
              &lt;head&gt;
                &lt;title&gt;Video 1&lt;/title&gt;
                &lt;link
                  rel="alternate"
                  type="application/json"
                  href="https://api.example/unrelated/videodescriptor.json" /&gt;
              &lt;/head&gt;
              &lt;body&gt;
                &lt;!-- rest of the page --&gt;
              &lt;/body&gt;
            &lt;/html&gt;
          </pre>
        </section>
      </section>
      <section>
        <h3>HTML &lt;a&gt; element</h3>
        <p>The <a href="https://html.spec.whatwg.org/multipage/text-level-semantics.html#the-a-element">a</a> element is an element used in the <code>&lt;body&gt;</code> section of an HTML document. It can be used to define relationships with other documents, with the benefit that the link is (usually) visible and clickable by a reader.</p>
        <p>To indicate its equivalent ActivityPub object, the HTML page at <code>https://html.example/profiles/person-1.html</code> could include the following <code>a</code> element:</p>
        <pre class="HTML">
          &lt;!doctype html&gt;
          &lt;html&gt;
            &lt;head&gt;
              &lt;title&gt;Person 1&lt;/title&gt;
            &lt;/head&gt;
            &lt;body&gt;
              &lt;a
                rel="alternate"
                type="application/activity+json"
                href="https://ap.example/users/person-1.jsonld" &gt;
                Actor data for Person 1
              &lt;/a&gt;
              &lt;!-- rest of the page --&gt;
            &lt;/body&gt;
          &lt;/html&gt;
        </pre>
        <p>Consumers will need to parse the HTML to find the <code>a</code> element with the <code>alternate</code> relation and an ActivityPub-compatible media type as <code>type</code>. This can be even more slow and complicated than with the <code>link</code> header. The <code>link</code> header is usually in the first few kilobytes of a document, and will usually be nested only 2 levels below the document in the DOM tree. An <code>a</code> element may be anywhere in the <code>body</code>, maybe nested very deep in the tree.</p>
        <section>
          <h4>a element failure</h4>
          <p>As with the <code>link</code> element, some servers may include an   <code>a</code> element with an
            <code>alternate</code> relation and with a JSON type or JSON-LD type that does not link to an ActivityPub resource.
          </p>
          <p>
            In addition, many content management systems allow end users to set
            <code>rel</code> and other properties on <code>a</code> elements, which may result in false matches. Even more than with other methods, using the <code>a</code> element for discovery requires reverse discovery for confirmation (see <a href="#best-practices-for-consumers">Best practices for consumers</a>).
          </p>
          </pre>
        </section>
      </section>

      <section>
        <h3>Embedded JSON-LD</h3>
        <p>HTML documents can include <a href="https://www.w3.org/TR/json-ld11/">JSON-LD</a> data in a <code>&lt;script&gt;</code> element in the <code>&lt;head&gt;</code> section of the document. This data can be used to provide metadata about the document, including its equivalent ActivityPub object.</p>
        <p>Given a page that shows an image at <code>https://html.example/gallery/image-17.html</code>, the HTML for the page could look like this:</p>
        <pre>
          &lt;!DOCTYPE html&gt;
          &lt;html lang=&quot;en&quot;&gt;
            &lt;head&gt;
              &lt;title&gt;Image 17&lt;/title&gt;
              &lt;script type=&quot;application/ld+json&quot;&gt;
              {
                &quot;@context&quot;: &quot;https://www.w3.org/ns/activitystreams&quot;,
                &quot;type&quot;: &quot;Image&quot;,
                &quot;id&quot;: &quot;https://ap.example/api/images/image-17.jsonld&quot;,
                &quot;url&quot;: [
                  {
                    &quot;type&quot;: &quot;Link&quot;,
                    &quot;mediaType&quot;: &quot;text/html&quot;,
                    &quot;href&quot;: &quot;https://html.example/gallery/image-17.html&quot;
                  }
                ]
              }
              &lt;/script&gt;
            &lt;/head&gt;
            &lt;body&gt;
                &lt;h1&gt;Image 17&lt;/h1&gt;
                &lt;p&gt;&lt;img src=&quot;https://html.example/images/image-17.png&quot;&gt;&lt;/p&gt;
            &lt;/body&gt;
          &lt;/html&gt;
        </pre>
        <p>This embedded JSON-LD specifies that an ActivityPub object with the ID <code>https://ap.example/api/images/image-17.jsonld</code> exists, and that it has an HTML page <code>url</code> at <code>https://html.example/gallery/image-17.html</code>, that is, the current page's URL. This is a roundabout, but clear, way to specify the ActivityPub ID of the current page.</p>
        <p>Consumers need to parse the HTML page, and the embedded JSON-LD, to extract the ActivityPub object ID. An advantage to this technique is that other properties of the ActivityPub object can be embedded as well; however, to confirm those properties, the consumer will need to fetch the object from its canonical URL, the ID, anyways.</p>
        <section>
          <h4>Embedded JSON-LD failure</h4>
          <p>
            Complicated structures for the <code>url</code> property may make it hard to confirm that the object's URL is the same as the current page's.
          </p>
          <p>
            Embedded JSON-LD is very popular for embedding <a href="https://schema.org/">Schema.org</a> metadata. This can lead to false positives when looking for ActivityPub objects.
          </p>
        </section>
      </section>
    </section>
  </section>
  <section id="reverse-discovery">
    <h2>ActivityPub to HTML</h2>
    <p>
        <dfn>Reverse discovery</dfn>, in this report, means identifying the HTML
        page that represents the same object as an ActivityPub JSON-LD object. This is necessary for user stories like creating a link to an actor in microsyntax.
    </p>
    <section>
      <h3>URL as input</h3>
      <p>These techniques require the ActivityPub object's URL as input. Often, the object's URL is obtained either as a property of another ActivityPub object, or from the <code>id</code> property of the ActivityPub JSON-LD document.</p>
      <p>If these techniques aren't successful, the consumer can use the URL to fetch the ActivityPub JSON-LD document, and then use a reverse discovery technique that takes a document as input.</p>
      <section>
        <h4>Content negotiation</h4>
        <p>As with forward discovery, it's possible for the HTML and JSON-LD representations of an object to be found at the same URL.</p>
        <section>
          <h5>Examples</h5>
          <p>
            Given the URL for an ActivityPub object, such as <code>https://mixed.example/some/path/to/note-1</code>, a consumer
            could attempt to retrieve the corresponding HTML resource using this HTTP request:
          </p>
          <pre class="HTTP">
            GET /some/path/to/note-1 HTTP/1.1
            Host: mixed.example
            Accept: text/html
          </pre>
          <p>
            A compliant server may respond with the HTML document in the body of the response:
          </p>
          <pre class="HTTP">
            HTTP/1.1 200 OK
            Content-Type: text/html

            &lt;html&gt;
            &lt;head&gt;
            &lt;title&gt;Note 1&lt;/title&gt;
            &lt;/head&gt;
            &lt;body&gt;
            &lt;p&gt;This is a note.&lt;/p&gt;
          </pre>
          <p>
            Alternately, the server may respond with a <code>308 Permanent Redirect</code> to indicate the location of the HTML
            representation.
          </p>
          <pre class="HTTP">
            HTTP/1.1 308 Permanent Redirect
            Location: https://mixed.example/different/path/to/note-1.html
          </pre>
        </section>
        <section>
          <h5>Content negotiation failure</h5>

          <p>
            If the server does not support content negotiation, it may respond with a <code>406 Not Acceptable</code> status
            code.
          </p>
          <pre>
            HTTP/1.1 406 Not Acceptable
            Content-Type: text/plain

            No representation matching this request could be found.
          </pre>
          <p>Less compliant servers may ignore the <code>Accept</code> header altogether and return the JSON-LD content regardless:
          </p>
          <pre class="HTTP">
            HTTP/1.1 200 OK
            Content-Type: application/activity+json

            {
              "@context": "https://www.w3.org/ns/activitystreams",
              "id": "https://mixed.example/some/path/to/note-1",
              "type": "Note",
              "content": "This is a note."
            }
          </pre>
          <p>
        </section>
      </section>
      <section>
        <h4>HTTP Link header</h4>
        <p>As with forward discovery, it is possible to use the <code>Link</code> header to identify an HTML page related to a given ActivityPub JSON-LD resource. A <code>Link</code> header with the <code>alternate</code> link relation and a <code>type</code> equal to <code>text/html</code> indicates an HTML page representing the same object.</p>
        <p>The advantage of this technique is that it does not require downloading and parsing the JSON-LD content of the ActivityPub object. The <code>Link</code> header has fewer options for formatting than other methods such as the <code>url</code> property, for example, making it slightly easier for consumers.</p>
        <section>
        <h5>Example</h5>
        <p>Given an ActivityPub JSON-LD object at <code>https://ap.example/some/path/person-1.jsonld</code>, a consumer could use a <code>HEAD</code> HTTP request to get the relevant headers for the resource:</p>
        <pre class="HTTP">
          HEAD /some/path/person-1.jsonld HTTP/1.1
          Host: ap.example
        </pre>
        <p>The publisher would respond with the HTTP headers, including a <code>Link</code> header:</p>
        <pre class="HTTP">
          HTTP/1.1 200 OK
          Content-Type: application/activity+json
          Link: &lt;https://html.example/profiles/person-1.html&gt;; rel="alternate"; type="text/html"
        </pre>
        </section>
        <section>
          <h5>HTTP Link header failure</h5>
          <p>Some non-compliant HTTP servers will send the full body of the resource in the response to the <code>HEAD</code> request.</p>
        </section>
      </section>
      <section>
        <h4>Webfinger</h4>
        <p>The Webfinger protocol can be used to find an HTML page related to an ActivityPub object in a number of ways.</p>
        <p>The consumer can identify the resource for the a Webfinger query in two ways. First, the <code>id</code> property, usually an <code>https</code> URL, can be passed as the <code>resource</code> parameter for the Webfinger query. Alternately, if the ActivityPub object is an <a href="https://www.w3.org/TR/activitypub/#actors">actor</a>, an <code>acct</code> URL in the format <code>acct:username@domain.example</code> can be constructed using the technique for <a href="https://www.w3.org/community/reports/socialcg/CG-FINAL-apwf-20240608/#reverse-discovery">Webfinger reverse discovery</a>. This <code>acct</code> URL can be used as the <code>resource</code> parameter for the Webfinger query.</p>
        <p>The publisher can provide a link to the HTML representation of the object in the JRD output of the Webfinger query in at least two ways.
        </p>
        <p>
        First, the <code>links</code> property of the output object can contain a link object with a <code>rel</code> property set to <code>alternate</code> and the <code>type</code> property set to <code>text/html</code>. If such a link exists, its <code>href</code> property is the URL of the related HTML page.
        </p>
        <p>Second, the <code>links</code> property of the JRD output object may include an object with a <code>rel</code> property set to <code>http://webfinger.net/rel/profile-page</code>. This is defined to be "the main home/profile page that a human should visit when getting info about that webfinger account." (<a href="https://webfinger.net/rel/">https://webfinger.net/rel/</a>) It is not guaranteed to be HTML, but a <code>type</code> property can further define that. Per the definition, "it&apos;s likely text/html if it&apos;s for users."</p>
        <p>An advantage of using Webfinger for discovery is that it is widely implemented by ActivityPub publishers to enable using <code>acct</code> URLs as identities.</p>
        <section>
          <h5>Examples</h5>
          <p>
            Given an ActivityPub <code>Place</code> object at <code>https://ap.example/geo/place-7.jsonld</code>, a consumer could use a Webfinger query to find the HTML page for the object:
          </p>
          <pre class="HTTP">
            GET /.well-known/webfinger?resource=https%3A%2F%2Fap.example%2Fgeo%2Fplace-7.jsonld HTTP/1.1
            Host: ap.example
          </pre>
          <p>Note that the <code>/.well-known/webfinger</code> path is fixed and required for Webfinger.</p>
          <p>The publisher could return the following JRD output:</p>
          <pre class="JSON">
            {
              "subject": "https://ap.example/geo/place-7.jsonld",
              "links": [
                {
                  "rel": "alternate",
                  "type": "text/html",
                  "href": "https://html.example/map/nl/ams/17921.html"
                }
              ]
            }
          </pre>
          <p>
            In this example, the <code>links</code> property of the JRD object contains a single object with a <code>rel</code> property set to <code>alternate</code> and a <code>type</code> property set to <code>text/html</code>. The <code>href</code> property of this object is the URL of the HTML page representing the object.
          </p>
          <p>
            Alternately, given an ActivityPub <code>Person</code> object at <code>https://ap.example/profiles/person-19.jsonld</code>, the consumer could construct an <code>acct</code> URL as <code>acct:person-19@ap.example</code> and use it as the <code>resource</code> parameter for the Webfinger query:
          </p>
          <pre class="HTTP">
            GET /.well-known/webfinger?resource=acct%3Aperson-19%40ap.example HTTP/1.1
            Host: ap.example
          </pre>
          <p>Note that the <code>/.well-known/webfinger</code> path is fixed and required for Webfinger discovery.</p>
          <p>The publisher could return the following JRD output:</p>
          <pre class="JSON">
            {
              "subject": "acct:person-19@ap.example",
              "links": [
                {
                  "rel": "http://webfinger.net/rel/profile-page",
                  "type": "text/html",
                  "href": "https://html.example/profiles/person-19.html"
                }
              ]
            }
          </pre>
          <p>In this output, the <code>http://webfinger.net/rel/profile-page</code> relationship identifies an HTML page for the <code>Person</code> object.</p>
        </section>
        <section>
          <h5>Webfinger failure</h5>
          <p>Some servers may not return JRD documents for <code>https</code> URLs. Others might only return JRD documents for URLs that represent actors, such as registered users.</p>
        </section>
      </section>
    </section>
    <section>
      <h3>Document as input</h3>
      <p>These techniques require the ActivityPub JSON-LD document as the input for the process. The document can be obtained through delivery via the ActivityPub protocol, or through the ActivityPub API, or by other means. </p>
      <p>If none of these techniques are successful, the consumer can obtain the URL of the object from the <code>id</code> property, and then try one or more of the techniques that require an URL.</p>
      <section>
        <h4><code>url</code> property</h4>
        <p>ActivityPub objects can have an optional <code>url</code> property, which "[i]dentifies one or more links to representations of the object." The property is the preferred way to indicate a corresponding HTML page for an ActivityPub object.
        </p>
        <p>As with many Activity Vocabulary properties, this can have several formats:
        </p>
        <ul>
          <li>A string. In this case, the string is the URL itself.</li>
          <li>A <code>Link</code> object. This structure is used to provide additional information about the link, including the <code>mediaType</code>. For an equivalent HTML representation, the <code>mediaType</code> property will be "text/html". The <code>href</code> property of the <code>Link</code> object is the URL.</li>
          <li>An array: One or more strings and/or <code>Link</code> objects.</li>
        </ul>
        <section>
          <h5>Examples</h5>
          <p>In this example, the <code>url</code> property is only a string.</p>
          <pre class="HTTP">
          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://ap.example/some/path/person-1.jsonld",
            "type": "Person",
            "name": "Person One",
            "url": "https://html.example/profile/person-1.html"
          }
          </pre>
          <p>In the next example, the <code>url</code> property is a full <code>Link</code>-type object with <code>mediaType</code> property
          equal to "text/html".</p>
          <pre class="HTTP">
          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://ap.example/geo/place-17.jsonld",
            "type": "Place",
            "nameMap": {
              "en": "Berlin"
            },
            "url": {
              "type": "Link",
              "mediaType": "text/html",
              "href": "https://html.example/map/de/ber/ber.html"
            }
          }
          </pre>
          <p>
            In this final example, the <code>url</code> property is an array of <code>Link</code>-type objects with different <code>mediaType</code> properties.
          </p>
          <pre class="HTTP">
          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://ap.example/photos/gallery/image-3.jsonld",
            "type": "Image",
            "summary": "Jason and Carol at the lake house",
            "url": [
              {
                "type": "Link",
                "mediaType": "text/html",
                "href": "https://html.example/gallery/3.html"
              },
              {
                "type": "Link",
                "mediaType": "image/webp",
                "href": "https://upload.example/files/08/17/2021/lakehouse.webp"
              }
            ]
          }
          </pre>
        </section>
        <section>
          <h5><code>url</code> property failure</h5>
          <p>
            When the <code>url</code> property is only a string, it may not
            represent an HTML page. Especially for objects with binary content types, like <code>Image</code>, <code>Video</code>, and <code>Audio</code>, the <code>url</code> property is often used for the URL of the respective binary representation of the object.
          </p>
          <p>
            The <code>mediaType</code> of <code>Link</code>-type objects in the <code>url</code> property is not always defined, and when it is defined, it is not always "text/html".
          </p>
          <p>
            The property is defined only for representations of the current object. However, the <code>Link</code>-type object can have a link relation property, <code>rel</code>. Publishers may misuse the <code>url</code> property to including links that aren't a representation of the object, but instead a related object, like "next" or "author".
          </p>
        </section>
      </section>
    </section>
  </section>
  <section id="author-discovery">
    <h2>HTML to author ActivityPub</h2>
    <p>This section describes various methods for determining the ActivityPub ID of the author of an object represented by an HTML page.</p>
    <section>
      <h3>Discover equivalent object</h3>
      <p>One way to discover the author of an object represented by an HTML page is to first discover the object's ActivityPub JSON-LD representation, then use that information to determine the author.</p>
      <p>Any of the methods in the <a href="#discovery">forward discovery</a> section can be used to find the URL of the ActivityPub JSON-LD resource. Fetching the resource will retrieve the full JSON-LD representation of the object.</p>
      <p>Three main properties are used, in ActivityPub, to identify the author of an object:</p>
      <ul>
        <li><a href="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-attributedto">attributedTo</a>. This is the primary property for defining the author of an object. It is primarily used for content-type objects, such as <code>Note</code>, <code>Image</code> or <code>Video</code>, but can also be used for other object types.</li>
        <li><a href="https://www.w3.org/TR/activitystreams-vocabulary/#dfn-actor">actor</a>. This identifies the actor who performed activity-type objects, such as a <code>Like</code> or <code>Question</code> activity.</li>
        <li><a href="https://w3c-ccg.github.io/security-vocab/#owner">owner</a>. This property is almost exclusively used for <a href="https://w3c-ccg.github.io/security-vocab/#publicKey">publicKey</a> objects.</li>
      </ul>
      <p>Each of these properties can have values that are a string, a JSON object, or a JSON array consisting of strings and/or JSON objects. If the value is a string, it is the URL of the author object. If the value is an object, its <code>id</code> property is the URL of the ActivityPub JSON-LD object. For an array, each item can be resolved as either a string or an object.</p>
      <p>The advantage of this method is that it does not require a separate discovery path for authors.</p>
      <section>
        <h4>Example</h4>
        <p>Given an HTML page at <code>https://html.example/note-1.html</code> that represents a <code>Note</code> object, the consumer can use an HTML &lt;link&gt; header to identify the URL of the ActivityPub note:</p>
        <pre class="HTML">
          &lt;link
            rel="alternate"
            type="application/activity+json"
            href="https://ap.example/api/notes/note-1.jsonld" /&gt;
        </pre>
        <p>Fetching the content at <code>https://ap.example/api/notes/note-1.jsonld</code> will return the JSON-LD representation of the note:</p>
        <pre class="JSON">
          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://ap.example/api/notes/note-1.jsonld",
            "type": "Note",
            "content": "This is a note.",
            "attributedTo": {
              "id": "https://ap.example/profiles/person-1.jsonld",
              "type": "Person",
              "name": "Person One"
            }
          }
        </pre>
        <p>
          In this example, the <code>attributedTo</code> property is an object with an <code>id</code> property that is the URL of the author object.
        </p>
      </section>
      <section>
        <h4>Object discovery failure</h4>
        <p>This discovery method can fail if the ActivityPub object requires authentication to be fetched -- which is often the case for content that was not made available to the public by the author. A simple consumer may not have the context necessary to use OAuth 2.0, HTTP Signature, or other authentication methods.</p>
        <p>Not all ActivityPub implementations include the <code>attributedTo</code> or <code>actor</code> property for an object in its default representation.</p>
      </section>
    </section>
    <section>
      <h3>URL as input</h3>
      <p>This section describes techniques for discovering an ActivityPub author object using the HTML representation's URL as an input. The HTML representation's URL may come from ActivityPub properties, links in other content, or by other means.</p>
      <p>If these techniques do not succeed, consumers can fetch the content of the HTML document and use one of the techniques that require HTML as input.</p>
      <section>
        <h4>HTTP Link header</h4>
        <p>The <code>Link</code> header can also be used to identify the author of an HTML page. The <a href="https://html.spec.whatwg.org/multipage/links.html#link-type-author">author</a> link type identifies the author of a resource. So, a <code>Link</code> header with a <code>rel</code> property equal to "author" and a <code>type</code> property equal to "application/activity+json" will link to the ActivityPub representation of the author of the object.</p>
        <p>This technique has the advantage of requiring a minimum amount of data transfer and complicated parsing of data structures. It can also be used with different media types than just HTML; for example, a JPEG image.</p>
        <section>
          <h5>Example</h5>
          <p>
            Given an HTML page at <code>https://html.example/files/video-33.html</code> that represents a <code>Video</code> object, the consumer can use an HTTP <code>HEAD</code> request to identify the URL of the ActivityPub video:
          </p>
          <pre class="HTTP">
            HEAD /files/video-33.html HTTP/1.1
            Host: html.example
          </pre>
          <p>The publisher would respond with the HTTP headers:</p>
          <pre class="HTTP">
            HTTP/1.1 200 OK
            Content-Type: text/html
            Link: &lt;https://ap.example/api/videos/video-33.jsonld&gt;; rel="alternate"; type="application/activity+json"
            Link: &lt;https://ap.example/profiles/person-7.jsonld&gt;; rel="author"; type="application/activity+json"
          </pre>
          <p>Note that there are two <code>Link</code> headers in the result; one represents the <code>Video</code>, and the other represents the <code>Person</code> that authored the video. The <code>rel</code> value of "author" distinguishes the author link.</p>
        </section>
        <section>
          <h5>Link header failure</h5>
          <p>Some non-compliant publishers will respond to a <code>HEAD</code> request with the full body of the HTML document.</p>
        </section>
      </section>
    </section>
    <section>
      <h3>Document as input</h3>
      <p>These techniques require the contents of the HTML representation. This could already be available in a browser environment, or through other means.</p>
      <p>If these techniques don't succeed, it's sometimes possible to obtain the URL of the document through the environment, for example, using the <code>document.location</code> property in a browser JavaScript program. Consumers can then try techniques that use the URL as input.</p>
      <section>
        <h4>HTML &lt;link&gt; element</h4>
        <p>The ActivityPub id of the author of an object represented by an HTML document can also be found using the HTML &lt;link&gt; element. As with the HTTP <code>Link</code> header, if a &lt;link&gt; element has a <code>rel</code> attribute equal to "author" and a <code>type</code> attribute equal to "application/activity+json", its <code>href</code> property is the URL of the ActivityPub object representing the author.</p>
        <p>This technique is useful when the HTML page is already downloaded and available, such as within a browser environment.</p>
        <section>
          <h5>Example</h5>
          <p>Given an HTML page at <code>https://html.example/files/document-40.html</code> that represents a <code>Document</code> object, the consumer can use the following HTML to identify the URL of the ActivityPub representation of the document:</p>
          <pre class="HTML">
            &lt;!doctype html&gt;
            &lt;html&gt;
              &lt;head&gt;
                &lt;title&gt;Document 33&lt;/title&gt;
                &lt;link
                  rel="author"
                  type="application/activity+json"
                  href="https://ap.example/profiles/person-7.jsonld" /&gt;
              &lt;/head&gt;
              &lt;body&gt;
                &lt;!-- rest of the page --&gt;
              &lt;/body&gt;
            &lt;/html&gt;
          </pre>
          <p>In this example, the &lt;link&gt; element with the <code>rel</code> attribute set to "author" and the <code>type</code> attribute set to "application/activity+json" is the author of the document.</p>
        </section>
        <section>
          <h5>Link element failure</h5>
          <p>Some servers may include a &lt;link&gt; element with an <code>author</code> relation and with a JSON type or JSON-LD type that does not link to an ActivityPub resource.</p>
        </section>
      </section>
      <section>
        <h4>OpenGraph Protocol</h4>
        <p>
          The <a href="https://ogp.me/">OpenGraph Protocol</a> is a set of metadata properties that can be included in an HTML document to provide information about the document. Several of these properties can be used to identify the author of the document; usually with the URL of an HTML page that has a <a href="https://ogp.me/#type_profile">profile</a> type.
        </p>
        <ul>
          <li><code>music:musician</code> identifies the creator of a song or album.</li>
          <li><code>music:creator</code> identifies the creator of a playlist or station.</li>
          <li><code>video:actor</code> identifies the actor in a video.</li>
          <li><code>video:director</code> identifies the director of a video.</li>
          <li><code>video:writer</code> identifies the writer of a video.</li>
          <li><code>article:author</code> identifies the author of an article.</li>
          <li><code>book:author</code> identifies the author of a book.</li>
        </ul>
        <p>
          For these properties, the most likely discovery path is to follow the property's URL value to an HTML page for the profile, and then use <a href="#discovery">forward discovery</a> to find the ActivityPub object corresponding to that profile.
        </p>
        <p>
          Another option is the <code>fediverse:creator</code> property, <a href="https://blog.joinmastodon.org/2024/07/highlighting-journalism-on-mastodon/">developed by Mastodon</a>. This property is used to identify the creator of a document. Its value is the Webfinger ID of the creator, prefixed with an <code>@</code> character, not an <code>acct:</code> URL. The consumer can use the Webfinger protocol to find the ActivityPub object corresponding to the creator.
        </p>
        <p>The advantage of using OGP for metadata is that it is widely implemented by Web publishing systems.</p>
        <section>
          <h5>Example</h5>
          <p>
            Given an HTML page at <code>https://html.example/files/article-40.html</code> that represents a <code>Article</code> object, the consumer can use the OpenGraph metadata to identify the URL of HTML profile for the author of the article:
          </p>
          <pre class="HTML">
            &lt;!doctype html&gt;
            &lt;html&gt;
              &lt;head&gt;
                &lt;title&gt;Article 40&lt;/title&gt;
                &lt;meta property="article:author" content="https://html.example/profiles/person-7.html" /&gt;
              &lt;/head&gt;
              &lt;body&gt;
                &lt;!-- rest of the page --&gt;
              &lt;/body&gt;
            &lt;/html&gt;
          </pre>
          <p>
            In this example, the <code>article:author</code> property is the author of the document. (Note that OpenGraph Protocol uses the non-standard <code>property</code> attribute for metadata.)
          </p>
          <p>
            Given an HTML page at <code>https://html.example/files/video-40.html</code> that represents a <code>Video</code> object, the consumer can use OpenGraph metadata to identify the Webfinger ID of the creator of the video:
          </p>
          <pre class="HTML">
            &lt;!doctype html&gt;
            &lt;html&gt;
              &lt;head&gt;
                &lt;title&gt;Video 40&lt;/title&gt;
                &lt;meta property="fediverse:creator" content="@person-22@ap.example" /&gt;
              &lt;/head&gt;
              &lt;body&gt;
                &lt;!-- rest of the page --&gt;
              &lt;/body&gt;
            &lt;/html&gt;
          </pre>
          <p>
            In this example, the <code>fediverse:creator</code> property is the Webfinger ID of the creator of the video, prefixed with the <code>@</code> symbol. The Webfinger ID can be used to find the ActivityPub object corresponding to the creator.
          </p>
        </section>
        <section>
          <h5>OpenGraph Protocol failure</h5>
          <p>
            The OpenGraph Protocol is not widely used for identifying authors of documents. When it is used, the profile page may not support discovery of an ActivityPub object.
          </p>
        </section>
      </section>
    </section>
    <section>
      <h3>Discovering author HTML</h3>
      <p>Another option for author discovery is to discover the HTML profile page for the author of the resource, and then discover the equivalent ActivityPub object for the HTML profile page.</p>
      <p>Techniques for discovering author HTML pages are out of scope for this report, but may include using HTTP <code>Link</code> headers, &lt;link&gt; elements, &lt;a&gt; elements, or Webfinger <code>link</code> objects with the <code>rel</code> property set to "author" and the <code>type</code> property set to "text/html" or omitted. Embedded JSON-LD, OpenGraph Protocol, and other mechanisms exist for discovering author profile HTML pages.</p>
      <section>
        <h4>Example</h4>
        <p>Given the contents of an HTML page at <code>https://html.example/files/video-40.html</code> that represents a <code>Video</code> object, the consumer can check the &lt;link&gt; elements in the HTML to find the URL of the author's profile page:</p>
        <pre class="HTML">
          &lt;!doctype html&gt;
          &lt;html&gt;
            &lt;head&gt;
              &lt;title&gt;Video 40&lt;/title&gt;
              &lt;link
                rel="author"
                type="text/html"
                href="https://html.example/profiles/person-22.html" /&gt;
            &lt;/head&gt;
            &lt;body&gt;
              &lt;!-- rest of the page --&gt;
            &lt;/body&gt;
          &lt;/html&gt;
        </pre>
        <p>It could then use a forward discovery technique, such as HTTP <code>Link</code> header discovery, to get the equivalent ActivityPub object.</p>
        <pre class="HTTP">
          HEAD /profiles/person-22.html HTTP/1.1
          Host: html.example
        </pre>
        <p>The response might include a <code>Link</code> header with the URL of the equivalent ActivityPub object:</p>
        <pre class="HTTP">
          HTTP/1.1 200 OK
          Content-Type: text/html
          Link: &lt;https://ap.example/profiles/person-22.jsonld&gt;; rel="alternate"; type="application/activity+json"
        </pre>
      </section>
      <section>
        <h4>Author HTML failure</h4>
        <p>Author HTML discovery can fail if the author's profile page is not available, or if the profile page does not link to an ActivityPub object.</p>
      </section>
    </section>
  </section>
  <section id="verification">
    <h2>Verification</h2>
    <p>Publishers of HTML representations and ActivityPub representations include data or metadata to help with discovery of related representations or resources. This data is a <em>claim</em> that the linked resource really has the relationship stated.</p>
    <p>Unfortunately, not all claims are true. Consumers need to verify the claims made by publishers, using the verification techniques described here. Some techniques are direct and can be used with confidence; others are heuristics that provide some level of support to the claims, but are not foolproof.</p>
    <section>
      <h3>Representation verification</h3>
      <p>Verification is a process of confirming that an HTML page and an ActivityPub object represent the same resource. This is necessary to ensure that the publisher of one representation is not falsely connecting two unrelated resources. These forms of verification are valid for both forward and reverse discovery.</p>
      <section>
        <h4>Two-way discovery</h4>
        <p>The most reliable verification method is two-way discovery. This consists of first doing discovery in one direction, and then doing discovery in the other direction with the results. For example, doing forward discovery from an HTML page to an ActivityPub object, and then doing reverse discovery from the ActivityPub object to, hopefully, the same HTML page.</p>
        <section>
          <h5>Example</h5>
          <p>Given an HTML page with the URL <code>https://html.example/downloads/image-14.html</code>, the consumer could discover the ActivityPub JSON-LD URL using the <code>Link</code> header method:</p>
          <pre class="HTTP">
            HEAD /downloads/image-14.html HTTP/1.1
            Host: html.example
          </pre>
          <p>The publisher would respond with the HTTP headers:</p>
          <pre class="HTTP">
            HTTP/1.1 200 OK
            Content-Type: text/html
            Link: &lt;https://ap.example/api/images/image-14.jsonld&gt;; rel="alternate"; type="application/activity+json"
          </pre>
          <p>Then, the consumer could fetch the ActivityPub object at <code>https://ap.example/api/images/image-14.jsonld</code> and look for the URL of the HTML page in the <code>url</code> property:</p>
          <pre class="JSON">
            {
              "@context": "https://www.w3.org/ns/activitystreams",
              "id": "https://ap.example/api/images/image-14.jsonld",
              "name": "Image 14",
              "type": "Image",
              "url": "https://html.example/downloads/image-14.html"
            }
          </pre>
          <p>By comparing the URL in the ActivityPub object with the original HTML page URL, the consumer can confirm that the two representations relate to the same resource.</p>
        </section>
        <section>
          <h5>Two-way discovery failure</h5>
          <p>Two-way discovery can fail if the discovered representation does not have a path back to the origin representation.</p>
          <p>Two-way discovery can become complicated if more than one discovered representation is found, which may connect back to more than one original representation. Verifying multiple relationships, and ignoring those that cannot be verified, complicates this process for the consumer.</p>
        </section>
      </section>
      <section>
        <h4>Same origin</h4>
        <p>
          Another mechanism for verifying a discovery process is to compare the <a href="https://url.spec.whatwg.org/#concept-url-origin">origins</a> of the original representation's URL and the discovered representation's URL. The origin is the combination of the scheme, host, and port of a URL. If the origins are the same, the two representations can be considered related.
        </p>
        <p>
          Content negotiation without a redirect will always have the same origin, as the URL of the HTML page and the URL of the JSON-LD representation are the same.
        </p>
        <section>
          <h5>Example</h5>
          <p>Given an HTML page with the URL <code>https://mixed.example/profiles/person-3</code>, the consumer could discover the ActivityPub JSON-LD URL using the &lt;link&gt; element method:</p>
          <pre class="HTML">
            &lt;link
              rel="alternate"
              type="application/activity+json"
              href="https://mixed.example/api/person/person-3" /&gt;
          </pre>
          <p>The <code>href</code> property of the &lt;link&gt; element is the URL of the ActivityPub JSON-LD representation. The origin of the URL of the HTML page is <code>https://mixed.example</code>, and the origin of the URL of the ActivityPub JSON-LD representation is <code>https://mixed.example</code>, so the origins match and the discovery is verified.</p>
        </section>
        <section>
          <h5>Same origin failure</h5>
          <p>Same origin verification assumes that a single publisher controls an entire domain. Although this is often true for machine-readable formats like JSON-LD, having multiple publishers in control of parts of a domain is more common for HTML documents. For example, documents with URLs starting with <code>https://html.example/home/user1/</code> might be created by one user, and those starting with <code>https://html.example/home/user2/</code> might be created by another. A carefully crafted &lt;link&gt; or other mechanism could be used by one user to link their HTML page to an ActivityPub object created by another.</p>
          <p>Same origin verification will give a false negative if the publisher is using different domains for HTML pages and ActivityPub JSON-LD objects. This can happen if ActivityPub features are added on to an existing published Web site, or if the publisher needs to keep the domains separate for implementation reasons. If same origin verification gives a negative result, other methods such as two-way verification should be used.</p>
        </section>
      </section>
      <section>
        <h4>Allowlist</h4>
        <p>Another means of verification, or more precisely an excuse for   skipping verification, is an allowlist. This is a list of origins or, possibly, other properties of the representation that can be used to confirm trust in the publisher and skip verification.</p>
        <p>Assuming that each origin is controlled by a single publisher, if the consumer trusts the publisher, they can skip verification of discovery when the original representation has an URL with that origin.</p>
        <section>
          <h5>Example</h5>
          <p>Given an HTML page with the URL <code>https://html.example/profiles/person-3</code>, the consumer could discover the ActivityPub JSON-LD URL using the Embedded JSON-LD method:</p>
          <pre class="HTML">
            &lt;script type="application/ld+json"&gt;
            {
              "@context": "https://www.w3.org/ns/activitystreams",
              "id": "https://ap.example/api/person/person-3",
              "type": "Person",
              "name": "Person Three"
              "url": "https://html.example/profiles/person-3"
            }
            &lt;/script&gt;
          </pre>
          <p>Given that the consumer trusts the publisher of <code>https://html.example</code>, they can skip verification of the discovery process, and accept <code>https://ap.example/api/person/person-3</code> as the ActivityPub JSON-LD representation's URL.</p>
        </section>
        <section>
          <h4>Allowlist failure</h4>
          <p>Maintaining an allowlist is time-consuming. The number of domains with Web sites is in the hundreds of millions; identifying even a tiny fraction to be trusted takes a lot of human effort.</p>
          <p>Depending on allowlists as the only means of verification severely limits the number of domains that can be interacted with.</p>
        </section>
      </section>
    </section>
    <section>
      <h3>Author verification</h3>
      <p>Verification of author discovery is necessary to ensure that attackers cannot maliciously ascribe content to an actor that did not create it.</p>
      <section>
        <h4>Outbox verification</h4>
        <p>Unfortunately, the only current way to fully verify the authorship of an HTML object is by scanning the <code>outbox</code> property of the actor object. With tens or hundreds of thousands of items in the outbox not unusual, this is a time-consuming process that is subject to possible errors.</p>
        <p><code>outbox</code> properties in ActivityPub are <code>OrderedCollection</code> objects, often with <code>OrderedCollectionPage</code> objects that represent pages of content. Scanning this collection from newest to oldest members, the consumer can look for <code>Create</code> activities with the <code>url</code> property set to the HTML representation URL being verified, or for activity objects with the <code>url</code> property set to the HTML representation being verified.</p>
        <section>
          <h5>Example</h5>
          <p>The consumer has discovered that <code>https://ap.example/user/person-6.jsonld</code> is the author of the resource represented by the HTML document at <code>https://html.example/blog/article-9.html</code>. The consumer retrieves the ActivityPub JSON-LD for the person:</p>
          <pre class="json">
          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://ap.example/user/person-6.jsonld",
            "type": "Person",
            "name": "Person Six",
            "inbox": "https://ap.example/user/person-6/inbox",
            "outbox": "https://ap.example/user/person-6/outbox",
            "following": "https://ap.example/user/person-6/following",
            "followers": "https://ap.example/user/person-6/followers",
            "liked": "https://ap.example/user/person-6/liked"
          }
          </pre>
          <p>The consumer then fetches the URL that is the value of the <code>outbox</code> property:</p>
          <pre class="JSON-LD">
          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://ap.example/user/person-6/outbox",
            "type": "OrderedCollection",
            "totalItems": 3803,
            "first": "https://ap.example/user/person-6/outbox/page/39"
          }
          </pre>
          <p>It fetches the first page of the collection:</p>
          <pre class="JSON-LD">
          {
            "@context": "https://www.w3.org/ns/activitystreams",
            "id": "https://ap.example/user/person-6/outbox/page/39",
            "type": "OrderedCollectionPage",
            "partOf": "https://ap.example/user/person-6/outbox",
            "next": "https://ap.example/user/person-6/outbox/page/40",
            "items": [
              {
                "type": "Create",
                "actor": "https://ap.example/user/person-6.jsonld",
                "object": {
                  "id": "https://ap.example/object/article-10.jsonld",
                  "type": "Article",
                  "name": "Article Ten",
                  "url": "https://html.example/blog/article-10.html"
                }
              },
              {
                "type": "Create",
                "actor": "https://ap.example/user/person-6.jsonld",
                "object": {
                  "id": "https://ap.example/object/article-9.jsonld",
                  "type": "Article",
                  "name": "Article Nine",
                  "url": "https://html.example/blog/article-9.html"
                }
              },
              {
                "type": "Create",
                "actor": "https://ap.example/user/person-6.jsonld",
                "object": {
                  "id": "https://ap.example/object/article-8.jsonld",
                  "type": "Article",
                  "name": "Article Eight",
                  "url": "https://html.example/blog/article-8.html"
                }
              }
            ]
          }
          </pre>
          <p>The second object in the <code>items</code> array is a <code>Create</code> activity with an <code>object</code> property with a value that includes an <code>url</code> property with same value as the HTML URL we are trying to verify. Since the claim from the HTML and the ActivityPub JSON-LD support each other, the relation is verified.</p>
        </section>
        <section>
          <h5>Outbox author verification failure</h5>
          <p>This method is error-prone and depends on fetching hundreds or thousands of pages. Not all pages will include the full representation of the objects in their <code>items</code> array; some may just include the <code>id</code> property of each, requiring even more requests.</p>
          <p>Some authors may not include a <code>Create</code> activity for every object they create on the Web. They may use an author discovery process to identify the author, but not include the object in the ActivityPub representation.</p>
        </section>
      </section>
      <section>
        <h4>Same origin verification</h4>
        <p>This method, or heuristic, assumes that the claims of authorship are mutually supporting if the URLs of the representations have the same origin. The origin of an URL includes its protocol, domain name, and port number. The method assumes that the same entity (like a person or organization) controls all URLs published with this origin, and therefore would not make claims to contradict itself.</p>
        <section>
          <h5>Example</h5>
          <p>Given an HTML page with the URL <code>https://mixed.example/profiles/person-3.html</code>, the consumer could discover the ActivityPub JSON-LD URL using the Embedded JSON-LD method with the value <code>https://mixed.example/api/person-3.jsonld</code>. Because both URLs have the origin <code>https://mixed.example</code>, the consumer will assume that the discovery is verified.</p>
        </section>
        <section>
          <h5>Same origin verification failure</h5>
          <p>Assuming that the same entity controls creation of all URLs on a server is somewhat risky. For HTML creation, especially, some servers divide up into per-user paths. Other servers allow user-uploaded data, including JSON and HTML.</p>
        </section>
      </section>
      <section>
        <h4>Allowlist verification</h3>
        <p>Consumers may include a list of origins or other properties of the representation that don't require verification. This assumes an externally-established trust relationship.</p>
        <section>
          <h5>Example</h5>
          <p>Given an HTML page with the URL <code>https://html.example/profiles/person-3.html</code>, the consumer could discover the ActivityPub JSON-LD URL using the Embedded JSON-LD method with the value <code>https://ap.example/api/person-3.jsonld</code>. If the consumer trusts the publisher of <code>https://html.example</code>, they can skip verification of the discovery process, and accept <code>https://ap.example/api/person-3.jsonld</code> as the ActivityPub JSON-LD representation's URL.</p>
        </section>
        <section>
          <h5>Allowlist verification failure</h5>
          <p>Establishing trust relationships out-of-band is labor intensive, and most consumers will only have a small number of trusted domains or other entities.</p>
        </section>
      </section>
    </section>
  </section>
  <section id="consumers">
    <h2>Best practices for consumers</h2>
    <p>This section describes some best practices for consumers of HTML and ActivityPub.</p>
    <section>
      <h3>Discovery techniques</h3>
      <section>
        <h4>Forward discovery</h4>
        <p>Consumers with the HTML representation's URL as input should try these techniques:</p>
        <ul>
          <li><strong>HTTP Link</strong>.This is a fast and easy way to identify a related ActivityPub object. It will work whether the HTML and ActivityPub representations are on the same server or on different ones. It will also work with binary media types like images or videos.</li>
          <li><strong>Content negotiation</strong>. For servers that implement both the HTML and ActivityPub representations of the object, this is an easy method to use. It will not work for ActivityPub implementations assembled from different server systems, so it should be supplemented with other methods.</li>
        </ul>
        <p>Consumers with the HTML document as input should try these techniques:</p>
        <ul>
          <li><strong>HTML &lt;link&gt;</strong>. This element is somewhat easier to fetch than others, and requires no further parsing. It's always at the second level of element hierarchy, below the <code>head</code> element.</li>
          <li><strong>HTML &lt;a&gt;</strong>. This element is often buried deep in the element tree of an HTML document.</li>
        </ul>
        <p>If discovery with the URL as input is unsuccessful, fetching the document may provided better information. Similarly, if discovery with the document does not succeed, obtaining the URL and using it for discovery may work.</p>
        <p>Consumers with rigourous discovery requirements, like indexers or search engines, can try additional discovery methods like Webfinger or embedded JSON-LD. However, it's unlikely that publishers that haven't implemented one of the above methods would implement more obscure methods.</p>
      </section>
      <section>
        <h4>Reverse discovery</h4>
        <p>Consumers with the ActivityPub representation's URL as input should try the following techniques:</p>
        <ul>
          <li><strong>HTTP Link</strong> This method is simple, doesn't require a lot of parsing, and works even if the HTML and ActivityPub representations are on different servers.</li>
          <li><strong>Content negotiation</strong>. Many servers that implement both the ActivityPub and HTML representations of a resource use this method.</li>
        </ul>
        <p>Consumers with the ActivityPub document as input should try these techniques:</p>
        <ul>
          <li><strong>url property</strong>. This is a straightforward method that is defined within the Activity Streams 2.0 specification.</li>
        </ul>
        <p>If these techniques are exhausted, using the other type of input is a good next step.</p>
        <p>Other discovery techniques are unlikely to be used by publishers if these are not.</p>
      </section>
      <section>
        <h4>Author discovery</h4>
        <p>Consumers with the URL of HTML representation as input should start with these techniques:</p>
        <ul>
          <li><strong>HTTP Link</strong>. Similar to forward discovery, this is a very quick and easy way to identify a related author when an URL is available. It will work for several types of server configuration.</li>
        </ul>
        <p>Consumers with the HTML document as input should try these techniques:</p>
        <ul>
          <li><strong>HTML &lt;link&gt;</strong>. There's a long tradition of using this property for linking metadata. It's also easy to parse with the DOM, and doesn't require using queries or deep traversal of element trees.</li>
          <li><strong>HTML &lt;a&gt;</strong>. This is an easy property for content management system users to set directly.</li>
          <li><strong>OpenGraph</strong>. The OpenGraph properties, like <code>fediverse:creator</code>, are easy to fetch from the <code>head</code> of the document.</li>
        </ul>
        <p>Failing these, the next best option is to discover the ActivityPub representation of the resource, and then use ActivityPub properties to discover the author.</p>
        <p>More demanding consumers may want to continue with discovery of the author's HTML profile page, but this can be a complicated process.</p>
      </section>
    </section>
    <section>
      <h3>Verification</h3>
      <p>Consumers should do verification of forward and reverse discovery.</p>
      <p>Two-way discovery is the most reliable way to verify results, but can have resource overhead.</p>
      <p>Same-origin verification can be reliable for reverse discovery, but is less so for forward discovery or author discovery.</p>
      <p>Allowlists are a last resort when other verification is not possible.</p>
    </section>
  </section>
  <section id="publishers">
    <h2>Best practices for publishers</h2>
    <p>Publishers that want consumers to be able to discover ActivityPub object and their authors should consider these methods.</p>
    <section>
      <h3>Publishing ActivityPub JSON-LD</h3>
      <p>When publishing an Activity Streams 2.0 JSON-LD object for ActivityPub, publishers should consider these best practices.</p>
      <ul>
        <li><strong>url property</strong>. Include either a <code>Link</code> object on its own, or an array of <code>Link</code> objects, as the <code>url</code> property of the ActivityPub object. Explicitly include the <code>type</code> property with a value of "text/html". Avoid using multiple links with the same <code>type</code>, since there's no easy way to distinguish which one is the correct HTML representation of the resource.</li>
        <li><strong>attributedTo property</strong>. For every ActivityPub object, include the <code>attributedTo</code> property, either as an URL or as a JSON object with at least the <code>id</code> property. For activities, include <code>actor</code>, and for public keys, include <code>owner</code>.</li>
        <li><strong>HTTP Link</strong>. Publishers that have control over HTTP headers for their ActivityPub objects should include the HTTP <code>Link</code> headers with both "alternate" and "author" relations.Add the "application/activity+json" media type explicitly. Don't include multiple links with the same relation and media type, since there's not a clear algorithm for choosing between them.</li>
        <li><strong>Content negotiation</strong>. Publishers that implement both HTML and ActivityPub representations on the same server should strongly consider giving them the same route and using content negotiation to navigate between them. Many ActivityPub implementations follow this pattern, and older or simpler consumers will assume that content negotiation is supported.</li>
        <li><strong>Webfinger</strong>. Implement Webfinger discovery at least for <code>https:</code> URLs for actor objects, and preferably for all objects. Include members of the <code>links</code> array both for the "alternate" relation, with media type "text/html", and the "author" relation, with media type "application/activity+json" and if possible with media type "text/html" as well. For actors, implement the <code>http://webfinger.net/rel/profile-page</code> relation.</li>
      </ul>
    </section>
    <section>
      <h3>Publishing HTML</h3>
      <p>When publishing HTML representations of an ActivityPub resource, include these discovery options:</p>
      <ul>
        <li><strong>HTML &lt;link&gt;</strong>. Add one <code>link</code> element with <code>rel</code> set to "alternate" and <code>type</code> set to "application/activity+json". If possible, also include a link element for the author resource, with <code>rel</code> set to "author" and <code>type</code> set to "application/activity+json". Avoid having multiple links with the same relation and media type, since there's not an easy way to determine which one is the best.</li>
        <li><strong>HTML &lt;a&gt;</strong>. If possible, add visible <code>a</code> elements to the page layout that users can find and click. Add at least one <code>a</code> element with <code>rel</code> equal to "alternate" and <code>type</code> equal to "application/activity+json". Add at least one <code>a</code> element with <code>rel</code> equal to "author" and <code>type</code> equal to "application/activity+json".Don't include multiple <code>a</code> links with the same <code>rel</code> and <code>type</code> values unless they also have the same <code>href</code> value.</li>
        <li><strong>HTTP Link</strong>. Publishers with control over HTTP headers should include <code>Link</code> headers for discovery. Add one <code>Link</code> header with <code>rel</code> set to "alternate" and <code>type</code> set to "application/activity+json". Add another with <code>rel</code> set to "author" and <code>type</code> set to "application/activity+json". Avoid having duplicate <code>Link</code> headers with the same <code>rel</code> and <code>type</code> values.</li>
        <li><strong>Content negotiation</strong>. Publishers that implement both HTML and ActivityPub representations on the same domain should strongly consider using the same route for both representations, and using content negotiation to differentiate between the two. Many ActivityPub single-server applications use this method, and some consumers will assume that it is used.</li>
        <li><strong>OpenGraph</strong>. For identifying authors, use the <code>meta</code> element with name <code>fediverse:creator</code> and value set to the Webfinger of the author preceded by an "@" symbol.</li>
        <li><strong>Embedded JSON-LD</strong>. This technique allows consumers not only to identify the URL for the ActivityPub representation, but to retrieve the full body of the object, without an additional HTTP request.</li>
      </ul>
    </section>
  </section>
</body>

</html>