
Releases: spider-rs/spider

v2.5.3

14 Sep 17:08

What's Changed

  1. Add string interning for visited links, speeding up lookups and reducing memory use when storing them. #204
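The idea behind #204 can be illustrated with a minimal std-only sketch, assuming a simple interner that maps each distinct URL to a `u32` id. The `Interner` and `VisitedLinks` types are hypothetical and are not spider's internal implementation. The visited set hashes and stores 4-byte ids, while each URL string is allocated once behind an `Rc<str>` shared by the lookup map and the id table.

```rust
use std::collections::{HashMap, HashSet};
use std::rc::Rc;

/// Hypothetical interner: each distinct string is stored once and shared
/// between the lookup map and the id table via `Rc<str>`.
#[derive(Default)]
struct Interner {
    ids: HashMap<Rc<str>, u32>,
    strings: Vec<Rc<str>>,
}

impl Interner {
    /// Returns the id for `s`, allocating only the first time `s` is seen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        let shared: Rc<str> = Rc::from(s);
        self.strings.push(Rc::clone(&shared));
        self.ids.insert(shared, id);
        id
    }

    /// Maps an id back to its string.
    fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(|s| &**s)
    }
}

/// Hypothetical visited-links set that stores interned ids instead of Strings.
#[derive(Default)]
struct VisitedLinks {
    interner: Interner,
    visited: HashSet<u32>,
}

impl VisitedLinks {
    /// Marks `url` as visited; returns true if it was not visited before.
    fn insert(&mut self, url: &str) -> bool {
        let id = self.interner.intern(url);
        self.visited.insert(id)
    }

    /// Checks membership without allocating for URLs never seen before.
    fn contains(&self, url: &str) -> bool {
        self.interner
            .ids
            .get(url)
            .map_or(false, |id| self.visited.contains(id))
    }
}

fn main() {
    let mut links = VisitedLinks::default();
    links.insert("https://example.com/a");
    let id = links.interner.intern("https://example.com/a");
    println!("{:?}", links.interner.resolve(id));
}
```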

Full Changelog: v2.4.1...v2.5.3

v2.4.1

09 Sep 16:39

What's Changed

Screenshot performance has increased drastically by taking advantage of Chrome's parameters for handling full_screen without re-adjusting the layout, along with the optimize_for_speed parameter. This works well with the concurrent interception handling to avoid stalling on re-layout. If you use the crawler to take screenshots, upgrading is recommended.

  • perf(chrome): add major screenshot performance custom command
  • chore(utils): add trie match all base path
  • chore(examples): add css scraping example
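The note "trie match all base path" is terse. A minimal sketch of one plausible reading, assuming it means a value registered at a base path (such as `/`) applies to every path beneath it, with the most specific registered prefix winning. `PathTrie` and `match_base` are hypothetical names for illustration, not spider's utils API.

```rust
use std::collections::HashMap;

/// Hypothetical path trie keyed by '/'-separated segments.
struct PathTrie<V> {
    children: HashMap<String, PathTrie<V>>,
    value: Option<V>,
}

// Manual Default so V itself does not need to implement Default.
impl<V> Default for PathTrie<V> {
    fn default() -> Self {
        PathTrie { children: HashMap::new(), value: None }
    }
}

impl<V> PathTrie<V> {
    /// Registers `value` at `path`; "/" registers it at the root.
    fn insert(&mut self, path: &str, value: V) {
        let mut node = self;
        for seg in path.split('/').filter(|s| !s.is_empty()) {
            node = node.children.entry(seg.to_string()).or_default();
        }
        node.value = Some(value);
    }

    /// Returns the value of the deepest registered base path that prefixes `path`.
    fn match_base(&self, path: &str) -> Option<&V> {
        let mut node = self;
        let mut best = node.value.as_ref();
        for seg in path.split('/').filter(|s| !s.is_empty()) {
            match node.children.get(seg) {
                Some(child) => {
                    node = child;
                    if node.value.is_some() {
                        best = node.value.as_ref();
                    }
                }
                None => break,
            }
        }
        best
    }
}

fn main() {
    let mut trie = PathTrie::default();
    trie.insert("/", "all pages");
    trie.insert("/en/blog", "blog pages");
    println!("{:?}", trie.match_base("/en/blog/post-1"));
}
```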

Full Changelog: v2.3.5...v2.4.1

v2.3.5

08 Sep 07:20

What's Changed

Major performance improvement for chrome by enabling concurrent request interception, which helps on resource-heavy pages.

  • add response headers when chrome is used
  • add hybrid cache response and headers for chrome
  • fix chrome sub pages setup
  • perf(chrome): add concurrent request interception
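Why concurrent interception helps can be shown with a small std-only sketch: each intercepted request is decided on its own scoped thread instead of one after another, so a slow decision does not hold up the rest. Threads stand in for the async tasks a Chrome integration would likely use, and `Decision`, `decide`, and `handle_concurrently` are hypothetical names, not spider's API.

```rust
use std::sync::mpsc;
use std::thread;

/// What to do with an intercepted request.
#[derive(Debug, PartialEq)]
enum Decision {
    Continue,
    Block,
}

/// Hypothetical policy: block images and fonts, let everything else through.
fn decide(url: &str) -> Decision {
    if url.ends_with(".png") || url.ends_with(".woff2") {
        Decision::Block
    } else {
        Decision::Continue
    }
}

/// Decides every intercepted request on its own scoped thread;
/// results are sorted by URL so the output order is deterministic.
fn handle_concurrently(requests: Vec<String>) -> Vec<(String, Decision)> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        for url in requests {
            let tx = tx.clone();
            s.spawn(move || {
                let decision = decide(&url);
                tx.send((url, decision)).expect("receiver is alive");
            });
        }
    });
    // All senders must be dropped before the receiver iterator can finish.
    drop(tx);
    let mut results: Vec<(String, Decision)> = rx.into_iter().collect();
    results.sort_by(|a, b| a.0.cmp(&b.0));
    results
}

fn main() {
    let requests = vec![
        "https://example.com/app.js".to_string(),
        "https://example.com/logo.png".to_string(),
    ];
    for (url, decision) in handle_concurrently(requests) {
        println!("{url}: {decision:?}");
    }
}
```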

Full Changelog: v2.2.18...v2.3.5

v2.2.18

29 Aug 02:03

What's Changed

Locales can now be auto-detected without losing performance. The encoding feature flag is now enabled by default for this change!

  • get_html now properly encodes the HTML instead of defaulting to UTF-8
  • bump [email protected]
  • fix chrome hang on ws connections handler
  • fix fetch stream infinite loop on error
  • fix chrome frame setting url (this temporarily prevents hybrid caching from having the req/res for the page)
let mut website: Website = Website::new("https://tenki.jp");
website.crawl().await;
// all of the content output (e.g. page.get_html()) has the proper encoding automatically
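Encoding detection typically starts by finding the declared charset. The sketch below shows only that first step, assuming the charset comes from the HTTP Content-Type header or, failing that, a `charset=` declaration near the top of the HTML. `detect_charset` and `charset_param` are hypothetical helpers, not spider's code; actually decoding the bytes would need an encoding library such as encoding_rs.

```rust
/// Extracts the `charset` parameter from a Content-Type value, lowercased.
fn charset_param(content_type: &str) -> Option<String> {
    content_type.split(';').map(str::trim).find_map(|part| {
        let (key, value) = part.split_once('=')?;
        if key.trim().eq_ignore_ascii_case("charset") {
            Some(value.trim().trim_matches('"').to_ascii_lowercase())
        } else {
            None
        }
    })
}

/// Hypothetical helper: prefer the header, then fall back to a
/// `charset=` declaration (e.g. <meta charset>) in the first 1024 chars.
fn detect_charset(content_type: Option<&str>, html: &str) -> Option<String> {
    if let Some(cs) = content_type.and_then(charset_param) {
        return Some(cs);
    }
    let head: String = html.chars().take(1024).collect::<String>().to_ascii_lowercase();
    let idx = head.find("charset=")?;
    let rest = &head[idx + "charset=".len()..];
    let value: String = rest
        .trim_start_matches(|c: char| c == '"' || c == '\'')
        .chars()
        .take_while(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect();
    if value.is_empty() { None } else { Some(value) }
}

fn main() {
    let html = r#"<html><head><meta charset="Shift_JIS"></head></html>"#;
    println!("{:?}", detect_charset(None, html));
}
```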

Full Changelog: v2.1.9...v2.2.18

v2.1.9

26 Aug 17:07

What's New

This release fixes hangs caused by chrome opening pages. The builder method website.with_return_page_links can be used to attach the links found on a web page to its page object.

  • chore(chrome): fix instances being left open from ignorable handler errors
  • chore(scrape): add sitemap and smart [#206]
  • feat(page): add return page links configuration
  • chore(config): fix budget reset on crawl end
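The effect of with_return_page_links can be pictured with a std-only sketch, assuming the page object holds an optional link set that is filled only when the option is on. `Page`, `extract_links`, and `build_page` are hypothetical stand-ins, not spider's types, and the naive `href` scan is only for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical page type: `page_links` is only filled when link
/// collection is switched on.
struct Page {
    url: String,
    page_links: Option<HashSet<String>>,
}

/// Naive `href="..."` extraction for illustration; a real crawler parses the HTML.
fn extract_links(html: &str) -> HashSet<String> {
    let mut links = HashSet::new();
    let mut rest = html;
    while let Some(start) = rest.find("href=\"") {
        rest = &rest[start + "href=\"".len()..];
        match rest.find('"') {
            Some(end) => {
                links.insert(rest[..end].to_string());
                rest = &rest[end + 1..];
            }
            None => break,
        }
    }
    links
}

/// Builds a page, collecting its links only when `return_page_links` is true.
fn build_page(url: &str, html: &str, return_page_links: bool) -> Page {
    Page {
        url: url.to_string(),
        page_links: return_page_links.then(|| extract_links(html)),
    }
}

fn main() {
    let page = build_page("https://example.com", r#"<a href="/docs">Docs</a>"#, true);
    println!("{} -> {:?}", page.url, page.page_links);
}
```

Keeping the set optional means crawls that do not need per-page links skip the extra allocation.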

Thanks @DimitriTimoz

Full Changelog: v2.0.6...v2.1.9

v2.0.6

20 Aug 20:51

What's Changed

  • add http response cookies map
  • fix chrome fs feature flag build
  • Update README.md by @James4Ever0 in #203
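A cookies map built from an HTTP response can be sketched as follows, assuming it collects the `name=value` pair from each Set-Cookie header and ignores attributes such as Path or HttpOnly. `cookies_map` is a hypothetical helper, not spider's API.

```rust
use std::collections::HashMap;

/// Hypothetical helper: collect `name=value` pairs from Set-Cookie header
/// values into a map, skipping attributes and malformed entries.
fn cookies_map<'a>(set_cookie_headers: impl IntoIterator<Item = &'a str>) -> HashMap<String, String> {
    set_cookie_headers
        .into_iter()
        .filter_map(|header| {
            // The cookie pair is everything before the first ';'.
            let pair = header.split(';').next()?;
            let (name, value) = pair.split_once('=')?;
            Some((name.trim().to_string(), value.trim().to_string()))
        })
        .collect()
}

fn main() {
    println!("{:?}", cookies_map(["session=abc123; Path=/; HttpOnly"]));
}
```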

New Contributors

Full Changelog: v2.0.3...v2.0.6

v2.0.3

14 Aug 11:49

What's Changed

  1. Scrape and Crawl now behave identically, since scrape reuses crawl underneath.
  2. Scrape API cleanup
  3. Add get_chrome_page for a reference to the chrome page
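The "scrape reuses crawl" design can be sketched in a few lines, assuming crawl drives the traversal and reports each page to a callback while scrape is crawl plus a callback that keeps the pages. `crawl`, `scrape`, `Page`, and the `fetch` closure are hypothetical simplifications, not spider's signatures.

```rust
/// Hypothetical page record produced by the crawl loop.
#[derive(Debug, Clone, PartialEq)]
struct Page {
    url: String,
    html: String,
}

/// Hypothetical crawl loop: fetches each URL and hands every page to `on_page`.
fn crawl<F: FnMut(Page)>(urls: &[&str], fetch: impl Fn(&str) -> String, mut on_page: F) {
    for &url in urls {
        let html = fetch(url);
        on_page(Page { url: url.to_string(), html });
    }
}

/// Scrape built on top of crawl: same traversal, but the pages are kept.
fn scrape(urls: &[&str], fetch: impl Fn(&str) -> String) -> Vec<Page> {
    let mut pages = Vec::new();
    crawl(urls, fetch, |page| pages.push(page));
    pages
}

fn main() {
    let fetch = |url: &str| format!("<html>{}</html>", url);
    for page in scrape(&["https://example.com/", "https://example.com/about"], fetch) {
        println!("{} {}", page.url, page.html);
    }
}
```

Because there is a single traversal path, any fix to crawl automatically applies to scrape.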

Full Changelog: v1.99.30...v2.0.3

v1.99.30

07 Aug 20:33

What's Changed

  • feat: web automation steps by target URL or path.
  • add internal ViewPort for chrome handling.
  • add partial eq configuration
    // Automation steps keyed by target path (requires the chrome feature).
    let mut automation_scripts = HashMap::new();

    automation_scripts.insert(
        "/en/blog".into(),
        Vec::from([
            WebAutomation::Evaluate(r#"document.body.style.background = "blue";"#.into()),
            WebAutomation::ScrollY(2000),
            WebAutomation::Click("article a".into()),
            WebAutomation::Wait(5000),
            WebAutomation::Screenshot {
                output: "example.png".into(),
                full_page: true,
                omit_background: true,
            },
        ]),
    );

    let mut website: Website = Website::new("https://rsseau.fr/en/blog")
        .with_chrome_intercept(true, true)
        .with_wait_for_idle_network(Some(WaitForIdleNetwork::new(Some(Duration::from_secs(30)))))
        .with_caching(cfg!(feature = "cache"))
        .with_limit(1)
        .with_automation_scripts(Some(automation_scripts))
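        .build()
        .unwrap();

    // Start the crawl; the steps registered for "/en/blog" apply when that page is visited.
    website.crawl().await;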
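Selecting steps "by target url or path" can be sketched with plain std types, assuming a lookup that tries the exact URL first and then falls back to the URL's path with any query or fragment removed. `Step`, `url_path`, and `steps_for` are hypothetical stand-ins, not spider's WebAutomation API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for automation steps.
#[derive(Debug, Clone, PartialEq)]
enum Step {
    ScrollY(i32),
    Click(String),
    Wait(u64),
}

/// Extracts the path from an absolute URL, dropping any query or fragment.
fn url_path(url: &str) -> &str {
    let after_scheme = url.split_once("://").map_or(url, |(_, rest)| rest);
    let path = after_scheme.find('/').map_or("/", |i| &after_scheme[i..]);
    path.split(|c: char| c == '?' || c == '#').next().unwrap_or(path)
}

/// Looks up steps by exact target URL first, then by path.
fn steps_for<'a>(scripts: &'a HashMap<String, Vec<Step>>, url: &str) -> Option<&'a [Step]> {
    scripts
        .get(url)
        .or_else(|| scripts.get(url_path(url)))
        .map(Vec::as_slice)
}

fn main() {
    let mut scripts = HashMap::new();
    scripts.insert(
        "/en/blog".to_string(),
        vec![Step::ScrollY(2000), Step::Click("article a".into()), Step::Wait(5000)],
    );
    println!("{:?}", steps_for(&scripts, "https://rsseau.fr/en/blog?page=2"));
}
```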

Full Changelog: v1.99.21...v1.99.30

v1.99.21

07 Aug 15:40

What's Changed

You can now block ads at the network level when using chrome with chrome_intercept by enabling the adblock feature flag.
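Network-level blocking means intercepted requests are checked before they are sent. A deliberately simplified sketch, assuming a plain domain list where a request is blocked when its host is a listed domain or a subdomain of one. `is_blocked` is hypothetical; the adblock feature presumably relies on full filter-list rules rather than this check.

```rust
/// Hypothetical network-level filter: block a request when its host equals
/// a listed domain or is a subdomain of one.
fn is_blocked(request_url: &str, blocked_domains: &[&str]) -> bool {
    let host = request_url
        .split_once("://")
        .map_or(request_url, |(_, rest)| rest)
        .split(|c: char| c == '/' || c == ':' || c == '?' || c == '#')
        .next()
        .unwrap_or("");
    blocked_domains
        .iter()
        .any(|d| host == *d || host.ends_with(&format!(".{}", d)))
}

fn main() {
    let blocked = ["doubleclick.net"];
    println!("{}", is_blocked("https://securepubads.g.doubleclick.net/tag/js/gpt.js", &blocked));
}
```

Matching on a leading dot keeps look-alike hosts such as `notdoubleclick.net` from being blocked by accident.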

Full Changelog: v1.99.18...v1.99.21

v1.99.18

05 Aug 19:19

What's Changed

  1. chore(fs,chrome): fix chrome fs storing [#198]

Thanks for the help @haijd

Full Changelog: v1.99.16...v1.99.18