FSharp.HTML

A parser for HTML5 based on the official W3C specification.

Usage

Given the following HTML source text:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My test page</title>
  </head>
  <body>
    <img src="images/firefox-icon.png" alt="My test image">
  </body>
</html>

use this code to parse the HTML source into an HtmlNode list:

let sourceText = ...
let doctype,nodes = HtmlUtils.parseDoc sourceText

doctype is the string extracted from the <!DOCTYPE> declaration, and nodes is an HtmlNode list.
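For reference, the same example can be written as a self-contained F# script (.fsx). This is a sketch that assumes the package is published on NuGet as FSharp.HTML and that its modules live in the FSharp.HTML namespace; adjust the #r and open lines if your setup differs.

```fsharp
// Assumption: the NuGet package id and the namespace are both "FSharp.HTML".
#r "nuget: FSharp.HTML"
open FSharp.HTML

// The sample document from above, as a triple-quoted string.
let sourceText = """<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>My test page</title>
  </head>
  <body>
    <img src="images/firefox-icon.png" alt="My test image">
  </body>
</html>"""

// doctype: string taken from the DOCTYPE declaration
// nodes:   HtmlNode list for the rest of the document
let doctype, nodes = HtmlUtils.parseDoc sourceText

printfn "doctype: %s" doctype
nodes |> List.iter (printfn "%A")
```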

Every stage of the parsing pipeline in the package is public, so you can compose the stages to fit your own requirements. The parser is highly configurable; see the source code of HtmlUtils for details.

To parse only the HTML structure without changing the content, use HtmldocCompiler.compile. HtmlUtils.parseDoc is in fact defined as follows:

let parseDoc (txt:string) = 
    let doctype,nodes =
        txt
        |> HtmldocCompiler.compile
    let nodes =
        nodes
        |> List.map Whitespace.removeWS
        |> Whitespace.trimWhitespace
        |> List.map HtmlCharRefs.unescapseNode
    doctype,nodes

With this definition in mind, you can assemble the stages yourself to get exactly the parse result you need.
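As an illustration, the hypothetical parseDocKeepWS below reuses the same stages as parseDoc but skips the two Whitespace steps, so whitespace is left as HtmldocCompiler.compile produced it while character references are still decoded. It is a sketch that relies on the signatures implied by the parseDoc definition above (compile returns a doctype and a node list; unescapseNode maps one node to another) and assumes the package and namespace are both named FSharp.HTML.

```fsharp
// Assumption: the NuGet package id and the namespace are both "FSharp.HTML".
#r "nuget: FSharp.HTML"
open FSharp.HTML

/// Hypothetical variant of HtmlUtils.parseDoc that skips the
/// Whitespace stages but still decodes character references.
let parseDocKeepWS (txt: string) =
    // structural parse only; content is not changed
    let doctype, nodes = HtmldocCompiler.compile txt
    // decode character references, node by node
    let nodes = nodes |> List.map HtmlCharRefs.unescapseNode
    doctype, nodes

let doctype, nodes =
    parseDocKeepWS "<!DOCTYPE html><html><body><p>Fish &amp; chips</p></body></html>"

nodes |> List.iter (printfn "%A")
```

Any other combination of the public stages can be built the same way.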

To generate HTML source text, use:

Render.stringifyNode
Render.stringifyDoc

HtmlUtils.stringifyNode
HtmlUtils.stringifyDoc

Some transforms are also provided:

BrRemover.splitByBr
HrRemover.splitByHr

API

You can parse a string with the functions in the HtmlUtils module.

HtmlUtils

You can also use the tokenizer to get a sequence of tokens:

let tokens = HtmlTokenizer.tokenize txt
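A minimal sketch of running the tokenizer and printing each token. It assumes HtmlTokenizer.tokenize takes the source string and returns something enumerable (Seq.iter accepts a list, array, or seq), and that the package and namespace are both named FSharp.HTML.

```fsharp
// Assumption: the NuGet package id and the namespace are both "FSharp.HTML".
#r "nuget: FSharp.HTML"
open FSharp.HTML

let txt = """<!DOCTYPE html><html><body><p class="intro">Hello</p></body></html>"""

// tokenize the source without building a node tree
let tokens = HtmlTokenizer.tokenize txt

// print each HtmlToken with F#'s structural formatter
tokens |> Seq.iter (printfn "%A")
```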

The main data types are defined in the source:

For the HtmlNode type, see HtmlNode.
For the HtmlToken type, see HtmlToken.

xp44mm/FSharp.HTML

Folders and files

Latest commit

History

Repository files navigation

FSharp.HTML

Usage

API

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages