Update headless browser and planet_n_spider v0.0.3 #123

Merged · 4 commits merged on Dec 11, 2024


Criamos (Contributor) commented Dec 11, 2024

This PR includes the following changes:

  • CI/CD and dependency updates:
    • headless browser:
      • browserless/chromium to v2.24 (from v2.14)
    • dependencies:
      • playwright to v1.49 (from v1.48)
      • trafilatura to v2.0.0 (from v1.12)
  • planet_n_spider v0.0.3:
    • implemented feedback from Jan (and the editorial staff) regarding thumbnail extraction, metadata mappings (new_lrt / discipline), and license information
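The class_list-based metadata mapping could be sketched roughly as follows. This is a minimal sketch, not the spider's actual code: it assumes `class_list` arrives as a plain list of strings, the class names (`category-biologie` etc.) and vocabulary labels are hypothetical placeholders, and it assumes the "Unterrichtsbaustein" default is always added alongside any mapped learning resource types.

```python
# Minimal sketch of mapping Planet-N's WP-JSON "class_list" to "new_lrt" / "discipline".
# All class names and vocabulary labels below are HYPOTHETICAL placeholders.
CLASS_TO_DISCIPLINE: dict[str, str] = {
    "category-biologie": "Biologie",
    "category-geografie": "Geografie",
}
CLASS_TO_NEW_LRT: dict[str, str] = {
    "category-arbeitsblatt": "Arbeitsblatt",
}
# Per the PR: every item is treated as a teaching module by default.
DEFAULT_NEW_LRT = "Unterrichtsbaustein"


def map_class_list(class_list: list[str]) -> dict[str, list[str]]:
    """Translate class_list entries into sorted, de-duplicated vocab values."""
    disciplines = {CLASS_TO_DISCIPLINE[c] for c in class_list if c in CLASS_TO_DISCIPLINE}
    new_lrt = {CLASS_TO_NEW_LRT[c] for c in class_list if c in CLASS_TO_NEW_LRT}
    # Assumption: the default is added in addition to any mapped types.
    new_lrt.add(DEFAULT_NEW_LRT)
    return {"discipline": sorted(disciplines), "new_lrt": sorted(new_lrt)}


# Unknown classes such as "post-42" are ignored.
print(map_class_list(["post-42", "type-post", "category-biologie"]))
# prints {'discipline': ['Biologie'], 'new_lrt': ['Unterrichtsbaustein']}
```

Ignoring unknown classes keeps the crawl robust when the site adds new WordPress classes; they simply contribute nothing until a mapping entry is added.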

- change: ignore "og:image" thumbnails and take a website screenshot for each item instead (as discussed with Jan on 2024-12-11)
- feat: map Planet-N's "class_list" WP-JSON property to our "new_lrt" and "discipline" vocabularies
- change: all items are considered teaching modules ("Unterrichtsbaustein") by default
- feat: set the default license to CUSTOM, using Planet-N's license description text
  - the license description can be found at https://www.planet-n.de/info/
- LomBase used the default root logger, which made it hard to tell where log messages came from
- with loguru, individual log messages are much easier to trace back to the line of code that emitted them
Criamos changed the title from "Update headless Browser and planet_n_spider v0.0.3" to "Update headless browser and planet_n_spider v0.0.3" on Dec 11, 2024
Criamos self-assigned this on Dec 11, 2024
Criamos added the enhancement and dependencies labels on Dec 11, 2024
Criamos (Contributor, Author) commented Dec 11, 2024

After confirming these changes with several test crawls of planet_n_spider and rpi_virtuell_spider (against Staging), I'm merging this PR into develop.

Criamos marked this pull request as ready for review on December 11, 2024, 15:52
Criamos merged commit 8c1965b into develop on Dec 11, 2024 (3 checks passed)
Criamos deleted the headless_browser_update branch on December 11, 2024, 15:54
Labels: dependencies (Pull requests that update a dependency file), enhancement (New feature or request)