diff --git a/html/doc.go b/html/doc.go index 5ff8480cf..2466ae3d9 100644 --- a/html/doc.go +++ b/html/doc.go @@ -99,14 +99,20 @@ Care should be taken when parsing and interpreting HTML, whether full documents or fragments, within the framework of the HTML specification, especially with regard to untrusted inputs. -This package provides both a tokenizer and a parser. Only the parser constructs -a DOM according to the HTML specification, resolving malformed and misplaced -tags where appropriate. The tokenizer simply tokenizes the HTML presented to it, -and as such does not resolve issues that may exist in the processed HTML, -producing a literal interpretation of the input. - -If your use case requires semantically well-formed HTML, as defined by the -WHATWG specification, the parser should be used rather than the tokenizer. +This package provides both a tokenizer and a parser, which implement the +tokenization, and tokenization and tree construction stages of the WHATWG HTML +parsing specification respectively. While the tokenizer parses and normalizes +individual HTML tokens, only the parser constructs the DOM tree from the +tokenized HTML, as described in the tree construction stage of the +specification, dynamically modifying or extending the docuemnt's DOM tree. + +If your use case requires semantically well-formed HTML documents, as defined by +the WHATWG specification, the parser should be used rather than the tokenizer. + +In security contexts, if trust decisions are being made using the tokenized or +parsed content, the input must be re-serialized (for instance by using Render or +Token.String) in order for those trust decisions to hold, as the process of +tokenization or parsing may alter the content. */ package html // import "golang.org/x/net/html"