- feat(transform): HTML transformation crate with spider_transformations
- feat(css_scraping): CSS scraping with spider_utils
- chore(chrome): stabilize concurrent screenshot handling
- feat(whitelist): whitelist routes so that only those routes are crawled
- feat(openai): use OpenAI to dynamically drive the browser
- feat(chrome): add chrome_headless_new flag
- chore(chrome): add wait_for events
- feat(smart): add smart mode feature flag (HTTP until JS Rendering is needed per page)
- feat(cron): add cron feature flag [#153]
- feat(sync): subscribe to page updates to perform async handling of data
- feat(js): add initial script parsing
- feat(worker): add tls support
- chore(request): add custom domain redirect policy
- chore(glob): fix glob crawl establish
- chore(crawl): fix crawl asset detection and trailing start
- feat(fs): add temp storage resource handling (#112)
- feat(url-glob): URL globbing (#113, thanks to @roniemartinez)
- chore(request): fix resource success handling
- feat(proxies): add proxy support
- feat(decentralization): add workload split
- perf(crawl): add join handle task management
- perf(links): add fast pre-serialized URL anchor link extraction and reduce memory usage
- perf(links): fix case sensitivity handling
- perf(crawl): reduce memory usage on link gathering
- chore(crawl): remove `Website.reset` method and improve crawl handling resource usage (`reset` no longer needed)
- chore(crawl): add heap usage of links visited
- perf(crawl): massive scans capability to utilize more cpu
- feat(timeout): add optional `configuration.request_timeout` duration
- build(tokio): remove unused `net` feature
- chore(docs): add missing scrape section
- perf(req): enable brotli
- chore(tls): add ALPN tls defaults
- chore(statics): add initial static media ignore
- chore(robots): add shared client handling across parsers
- feat(crawl): add subdomain and tld crawling
- perf(links): filter dup links after async batch
- chore(delay): fix crawl delay thread groups
- perf(page): slim channel page sending required props
- feat(regex): add optional regex blacklisting
- chore(bin): fix bin executable #17
- feat(cli): add cli separation binary #17
- feat(robots): add robots crawl delay respect and ua assign #24
- feat(async): add async page body gathering
- perf(latency): add connection re-use across requests #25
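The "filter dup links after async batch" entry describes deduplicating links only once a concurrently gathered batch has been collected. spider's actual code is not shown in this changelog, so the following is only a standalone, std-only sketch of the idea; the helper name `filter_new_links` is hypothetical. It relies on `HashSet::insert` returning `false` for values already present, which filters both links seen on earlier pages and repeats within the same batch.

```rust
use std::collections::HashSet;

/// Hypothetical helper (not spider's API): merge a batch of links gathered
/// concurrently into the visited set and return only the links not seen
/// before, preserving batch order.
fn filter_new_links(visited: &mut HashSet<String>, batch: Vec<String>) -> Vec<String> {
    let mut fresh = Vec::new();
    for link in batch {
        // `insert` returns false if the link is already in the set, which
        // covers duplicates from earlier batches and within this batch.
        if visited.insert(link.clone()) {
            fresh.push(link);
        }
    }
    fresh
}

fn main() {
    let mut visited: HashSet<String> = HashSet::new();
    visited.insert("https://example.com/".to_string());

    let batch = vec![
        "https://example.com/".to_string(),
        "https://example.com/about".to_string(),
        "https://example.com/about".to_string(),
        "https://example.com/blog".to_string(),
    ];

    let fresh = filter_new_links(&mut visited, batch);
    println!("new links: {:?}", fresh);
    println!("visited count: {}", visited.len());
}
```

Deduplicating once per batch, rather than locking a shared set on every discovered link, keeps the concurrent gathering phase free of contention; that trade-off is an assumption about the motivation, not something the changelog states.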
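The robots entry (#24) mentions respecting `Crawl-delay`. How spider parses robots.txt is not described here, so this is a simplified, std-only sketch under stated assumptions: the function name `crawl_delay` is hypothetical, consecutive `User-agent` lines form one group, agents are matched by exact case-insensitive name, and a group for the specific agent takes precedence over `*`.

```rust
use std::time::Duration;

/// Hypothetical helper (not spider's API): return the `Crawl-delay` that
/// applies to `user_agent`, preferring an exact agent match over `*`.
fn crawl_delay(robots: &str, user_agent: &str) -> Option<Duration> {
    let ua = user_agent.to_ascii_lowercase();
    let mut group_agents: Vec<String> = Vec::new();
    // True while reading consecutive User-agent lines of one group.
    let mut in_agent_block = false;
    let mut specific: Option<Duration> = None;
    let mut wildcard: Option<Duration> = None;

    for raw in robots.lines() {
        // Drop comments and surrounding whitespace; skip lines without a key.
        let line = raw.split('#').next().unwrap_or("").trim();
        let Some((key, value)) = line.split_once(':') else { continue };
        let key = key.trim().to_ascii_lowercase();
        let value = value.trim();

        match key.as_str() {
            "user-agent" => {
                // A User-agent line after other rules starts a new group.
                if !in_agent_block {
                    group_agents.clear();
                    in_agent_block = true;
                }
                group_agents.push(value.to_ascii_lowercase());
            }
            "crawl-delay" => {
                in_agent_block = false;
                if let Ok(secs) = value.parse::<f64>() {
                    // Ignore negative or non-finite values instead of panicking.
                    if !secs.is_finite() || secs < 0.0 {
                        continue;
                    }
                    let delay = Duration::from_secs_f64(secs);
                    for agent in &group_agents {
                        if agent.as_str() == ua.as_str() && specific.is_none() {
                            specific = Some(delay);
                        } else if agent.as_str() == "*" && wildcard.is_none() {
                            wildcard = Some(delay);
                        }
                    }
                }
            }
            // Any other rule (Disallow, Allow, ...) ends the User-agent run
            // but keeps the current group active for later Crawl-delay lines.
            _ => in_agent_block = false,
        }
    }
    specific.or(wildcard)
}

fn main() {
    let robots = "User-agent: *\nCrawl-delay: 2\n\nUser-agent: spider\nCrawl-delay: 0.5\n";
    println!("{:?}", crawl_delay(robots, "spider"));
    println!("{:?}", crawl_delay(robots, "other-bot"));
}
```

A crawler would then sleep for the returned duration between requests to the same host; real robots.txt matching (for example, product-token prefix rules) is more involved than this exact-name comparison.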