t-webber/html-parser

Html Parser

This is a Rust library that parses HTML source files and lets you search and filter the parsed HTML with a specific set of rules.

Do not use this parser to check the syntax of your HTML code. Many HTML files with syntax mistakes are parsed without any errors, because the sole objective is to produce a parsed version. Only breaking syntax errors raise errors.

Obviously, all valid HTML files work fine.

Standard

This is a simple, lightweight HTML parser that converts an HTML file (passed as a str) into a tree representing the HTML tags and text.
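
To make the idea of a tree of tags and text concrete, here is a minimal, std-only sketch. The `Node` type, its fields, and the `sample_tree` helper are hypothetical and are not this crate's API; the real `Html` type may be shaped differently (for instance, it also has to carry attributes and doctypes, which this sketch ignores).

```rust
use std::fmt;

// Hypothetical, simplified node type for illustration only.
// It is NOT the crate's `Html` type.
#[derive(Debug, PartialEq)]
enum Node {
    /// Raw text found between tags.
    Text(String),
    /// An element with a tag name and its children, in document order.
    Tag { name: String, children: Vec<Node> },
}

impl fmt::Display for Node {
    // Writes the node back out as HTML-like text by walking the tree.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Node::Text(text) => write!(f, "{}", text),
            Node::Tag { name, children } => {
                write!(f, "<{}>", name)?;
                for child in children {
                    write!(f, "{}", child)?;
                }
                write!(f, "</{}>", name)
            }
        }
    }
}

// Builds the tree for `<body><p>This is an html sample.</p></body>`.
fn sample_tree() -> Node {
    Node::Tag {
        name: "body".to_string(),
        children: vec![Node::Tag {
            name: "p".to_string(),
            children: vec![Node::Text("This is an html sample.".to_string())],
        }],
    }
}

fn main() {
    let tree = sample_tree();
    // Formatting the tree gives back the markup it represents.
    assert_eq!(
        tree.to_string(),
        "<body><p>This is an html sample.</p></body>"
    );
}
```

Under these assumptions, a tag owns its children and text is a leaf, so formatting the root reproduces the markup by a depth-first walk.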

Getting started

You can install it with

cargo add html_parser

then use it like this:

use html_parser::prelude::*;

let html: &str = r#"
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>Html sample</title>
    </head>
    <body>
        <p>This is an html sample.</p>
    </body>
</html>
"#;

// Parse your html
let tree: Html = parse_html(html).expect("Invalid HTML");

// Now you can use it!
assert!(format!("{tree}") == html);

Find & filter

You can also use the find and filter methods to query the parsed HTML. To do this, first describe what you are looking for with the Filter type.

Filter

use html_parser::prelude::*;

let html: &str = r##"
  <section>
    <h1>Welcome to My Random Page</h1>
    <nav>
      <ul>
        <li><a href="/home">Home</a></li>
        <li><a href="/about">About</a></li>
        <li><a href="/services">Services</a></li>
        <li><a href="/contact">Contact</a></li>
      </ul>
    </nav>
  </section>
"##;

// Create your filter
let filter = Filter::new().tag_name("li");

// Parse your html
let filtered_tree: Html = parse_html(html).expect("Invalid HTML").filter(&filter);

// Check the result: filtered_tree contains the 4 lis from the above html string
if let Html::Vec(links) = filtered_tree {
    assert!(links.len() == 4)
} else {
    unreachable!()
}

Find

The finder returns the first element that matches the filter:

use html_parser::prelude::*;

let html: &str = r##"
  <section>
    <h1>Welcome to My Random Page</h1>
    <nav>
      <ul>
        <li><a href="/home">Home</a></li>
        <li><a href="/about">About</a></li>
        <li><a href="/services">Services</a></li>
        <li><a href="/contact">Contact</a></li>
      </ul>
    </nav>
  </section>
"##;

// Create your filter
let filter = Filter::new().tag_name("a");

// Parse your html
let link: Html = parse_html(html).expect("Invalid HTML").find(&filter);

// Check the result: link contains `<a href="/home">Home</a>`
if let Html::Tag { tag, child, .. } = link {
    if let Html::Text(text) = *child {
        assert!(tag.as_name() == "a" && text == "Home");
    } else {
        unreachable!()
    }
} else {
    unreachable!()
}
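
The difference between the two operations (filter keeps every match, find stops at the first one) can be sketched on a toy tree. Everything below is hypothetical and std-only: `Node`, `filter_by_tag`, `find_by_tag`, and the `tag`, `text` and `nav` helpers are illustration names, not this crate's API, and the depth-first, document-order traversal is an assumption about how matches are ordered.

```rust
// Hypothetical, simplified node type for illustration only.
// It is NOT the crate's `Html` type.
#[derive(Debug, PartialEq)]
enum Node {
    Text(String),
    Tag { name: String, children: Vec<Node> },
}

// Small constructors to keep the example tree readable.
fn tag(name: &str, children: Vec<Node>) -> Node {
    Node::Tag { name: name.to_string(), children }
}

fn text(content: &str) -> Node {
    Node::Text(content.to_string())
}

/// Collects every element named `wanted`, depth-first, in document order.
fn filter_by_tag<'a>(node: &'a Node, wanted: &str, out: &mut Vec<&'a Node>) {
    if let Node::Tag { name, children } = node {
        if name.as_str() == wanted {
            out.push(node);
        }
        for child in children {
            filter_by_tag(child, wanted, out);
        }
    }
}

/// Returns the first element named `wanted`, stopping as soon as one is found.
fn find_by_tag<'a>(node: &'a Node, wanted: &str) -> Option<&'a Node> {
    match node {
        Node::Text(_) => None,
        Node::Tag { name, .. } if name.as_str() == wanted => Some(node),
        Node::Tag { children, .. } => children
            .iter()
            .find_map(|child| find_by_tag(child, wanted)),
    }
}

// Builds a list of four links, mirroring the navigation menu above.
fn nav() -> Node {
    let items: Vec<Node> = ["Home", "About", "Services", "Contact"]
        .iter()
        .map(|label| tag("li", vec![tag("a", vec![text(label)])]))
        .collect();
    tag("ul", items)
}

fn main() {
    let tree = nav();

    // Filter: all four list items are kept.
    let mut items = Vec::new();
    filter_by_tag(&tree, "li", &mut items);
    assert_eq!(items.len(), 4);

    // Find: only the first link is returned.
    let first = find_by_tag(&tree, "a");
    assert_eq!(first, Some(&tag("a", vec![text("Home")])));
}
```

In this sketch find returns an `Option`, which makes the "no match" case explicit; how the crate itself reports a missing match is not shown here.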
