⬆️ 🛠️(deps): update dependency crawl4ai to v0.3.72 #701

Merged: 1 commit merged into dev from renovate/crawl4ai-0.x on Oct 27, 2024

@renovate renovate bot commented Oct 14, 2024

This PR contains the following updates:

Package    Change
crawl4ai   0.3.5 -> 0.3.72

Release Notes

unclecode/crawl4ai (crawl4ai)

v0.3.6

1. Improved Crawling Control
  • New Hook: Added before_retrieve_html hook in AsyncPlaywrightCrawlerStrategy.
  • Delayed HTML Retrieval: Introduced delay_before_return_html parameter to allow waiting before retrieving HTML content.
    • Useful for pages with delayed content loading.
  • Flexible Timeout: smart_wait function now uses page_timeout (default 60 seconds, i.e. 60000 ms) instead of a fixed 30-second timeout.
    • Provides better handling for slow-loading pages.
  • How to use: Set page_timeout=your_desired_timeout (in milliseconds) when calling crawler.arun().
2. Browser Type Selection
  • Added support for different browser types (Chromium, Firefox, WebKit).
  • Users can now specify the browser type when initializing AsyncWebCrawler.
  • How to use: Set browser_type="firefox" or browser_type="webkit" when initializing AsyncWebCrawler.
3. Screenshot Capture
  • Added ability to capture screenshots during crawling.
  • Useful for debugging and content verification.
  • How to use: Set screenshot=True when calling crawler.arun().
4. Enhanced LLM Extraction Strategy
  • Added support for multiple LLM providers (OpenAI, Hugging Face, Ollama).
  • Custom Arguments: Added support for passing extra arguments to LLM providers via extra_args parameter.
  • Custom Headers: Users can now pass custom headers to the extraction strategy.
  • How to use: Specify the desired provider and custom arguments when using LLMExtractionStrategy.
5. iframe Content Extraction
  • New feature to process and extract content from iframes.
  • How to use: Set process_iframes=True in the crawl method.
6. Delayed Content Retrieval
  • Introduced get_delayed_content method in AsyncCrawlResponse.
  • Allows retrieval of content after a specified delay, useful for dynamically loaded content.
  • How to use: Access result.get_delayed_content(delay_in_seconds) after crawling.
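
A minimal sketch of how the options above might be combined, based only on the parameter names given in these release notes (page_timeout, delay_before_return_html, screenshot, process_iframes, browser_type). The helper names build_arun_options and crawl are hypothetical and not part of crawl4ai. The sketch assumes that delay_before_return_html is given in seconds and that crawler.arun() forwards process_iframes to the crawl method; neither is stated in the notes. get_delayed_content is left out because the notes do not show how it is awaited.

```python
def build_arun_options(page_timeout_s=60.0, delay_s=0.0,
                       screenshot=False, process_iframes=False):
    """Collect the crawler.arun() keyword arguments named in the release notes.

    Hypothetical helper, not part of crawl4ai.
    """
    if page_timeout_s <= 0:
        raise ValueError("page_timeout_s must be positive")
    options = {
        # The notes say page_timeout is passed in milliseconds.
        "page_timeout": int(page_timeout_s * 1000),
        "screenshot": screenshot,
        "process_iframes": process_iframes,
    }
    if delay_s > 0:
        # Assumption: the delay is expressed in seconds.
        options["delay_before_return_html"] = delay_s
    return options


async def crawl(url, browser_type="chromium", **option_overrides):
    """Crawl one URL with the chosen browser type (hypothetical wrapper)."""
    # Imported lazily so build_arun_options() works without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler(browser_type=browser_type) as crawler:
        return await crawler.arun(url=url, **build_arun_options(**option_overrides))
```

With crawl4ai installed, something like asyncio.run(crawl("https://example.com", browser_type="firefox", page_timeout_s=90, screenshot=True)) would be the expected entry point. Keeping the option assembly in a plain function makes the unit conversion easy to check without launching a browser.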

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

@renovate renovate bot added the "dependencies" label (Pull requests that update a dependency file) on Oct 14, 2024
@sourcery-ai sourcery-ai bot left a comment

We have skipped reviewing this pull request. Here's why:

  • It seems to have been created by a bot (hey, renovate[bot]!). We assume it knows what it's doing!
  • We don't review packaging changes - Let us know if you'd like us to change this.

@renovate renovate bot force-pushed the renovate/crawl4ai-0.x branch 2 times, most recently from 81a058e to 4cea47a, on October 18, 2024 08:05
@renovate renovate bot changed the title from "⬆️ 🛠️(deps): update dependency crawl4ai to v0.3.6" to "⬆️ 🛠️(deps): update dependency crawl4ai to v0.3.71" on Oct 18, 2024
@renovate renovate bot force-pushed the renovate/crawl4ai-0.x branch 4 times, most recently from 7e0d646 to 186a6a1, on October 24, 2024 13:52
@renovate renovate bot changed the title from "⬆️ 🛠️(deps): update dependency crawl4ai to v0.3.71" to "⬆️ 🛠️(deps): update dependency crawl4ai to v0.3.72" on Oct 24, 2024

sweep-ai bot commented Oct 27, 2024

Hey @renovate[bot], here is an example of how you can ask me to improve this pull request:

@Sweep Add unit tests for the new `before_retrieve_html` hook in `AsyncPlaywrightCrawlerStrategy` to verify it gets called at the correct time and with the expected parameters.

📖 For more information on how to use Sweep, please read our documentation.

codecov bot commented Oct 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.33%. Comparing base (681f036) to head (3bbf634).
Report is 2 commits behind head on dev.

Additional details and impacted files
@@           Coverage Diff           @@
##              dev     #701   +/-   ##
=======================================
  Coverage   81.33%   81.33%           
=======================================
  Files           7        7           
  Lines         209      209           
=======================================
  Hits          170      170           
  Misses         39       39           


@github-actions github-actions bot merged commit 389ef5e into dev Oct 27, 2024
7 checks passed
@github-actions github-actions bot deleted the renovate/crawl4ai-0.x branch October 27, 2024 13:14