From eb1572ce7f7a6e97ec44c27568286345c2a7748e Mon Sep 17 00:00:00 2001
From: Roland Shoemaker <roland@golang.org>
Date: Fri, 14 Apr 2023 12:13:47 -0700
Subject: [PATCH] html: another shot at security doc

Be clearer about the operation of the tokenizer and the parser (and
their differences), and be explicit about the need for re-serialization
when they are being used in security contexts.

Change-Id: Ieb8f2a9d4806fb7a8849a15671667396e81c53b9
Reviewed-on: https://go-review.googlesource.com/c/net/+/484795
Auto-Submit: Roland Shoemaker <roland@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
Run-TryBot: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
---
 html/doc.go | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/html/doc.go b/html/doc.go
index 5ff8480cf..2466ae3d9 100644
--- a/html/doc.go
+++ b/html/doc.go
@@ -99,14 +99,20 @@ Care should be taken when parsing and interpreting HTML, whether full documents
 or fragments, within the framework of the HTML specification, especially with
 regard to untrusted inputs.
 
-This package provides both a tokenizer and a parser. Only the parser constructs
-a DOM according to the HTML specification, resolving malformed and misplaced
-tags where appropriate. The tokenizer simply tokenizes the HTML presented to it,
-and as such does not resolve issues that may exist in the processed HTML,
-producing a literal interpretation of the input.
-
-If your use case requires semantically well-formed HTML, as defined by the
-WHATWG specification, the parser should be used rather than the tokenizer.
+This package provides both a tokenizer and a parser, which implement the
+tokenization, and tokenization and tree construction stages of the WHATWG HTML
+parsing specification respectively. While the tokenizer parses and normalizes
+individual HTML tokens, only the parser constructs the DOM tree from the
+tokenized HTML, as described in the tree construction stage of the
+specification, dynamically modifying or extending the document's DOM tree.
+
+If your use case requires semantically well-formed HTML documents, as defined by
+the WHATWG specification, the parser should be used rather than the tokenizer.
+
+In security contexts, if trust decisions are being made using the tokenized or
+parsed content, the input must be re-serialized (for instance by using Render or
+Token.String) in order for those trust decisions to hold, as the process of
+tokenization or parsing may alter the content.
 */
 package html // import "golang.org/x/net/html"