Releases: openeduhub/oeh-search-etl

v2024.12.18

18 Dec 08:39
7318bd9

What's Changed

  • Fix "httpx"-related ReadErrors in es_connector by @Criamos in #113
  • Merge recent HTTPX-related fixes into master by @Criamos in #114
  • Improved Exception Handling during website-screenshot fallback and several fixes for pydantic ValidationErrors by @Criamos in #115
  • Merge fixes from PR 115 into master by @Criamos in #116
  • Feat: Planet-N crawler // update GitHub workflows by @Criamos in #117
  • feat: (optional) OER-Filter Pipeline by @Criamos in #118
  • Portal Globales Lernen & updated DocStrings by @Criamos in #119
  • feat: parse robots.txt for AI usage indicators ("ccm:ai_allow_usage") by @Criamos in #120
  • Upgrade to Python 3.13 and Scrapy v2.12 / feat: robots.txt parsing for "ccm:ai_allow_usage" by @Criamos in #121
  • Merge develop into master by @Criamos in #122
  • Update headless browser and planet_n_spider v0.0.3 by @Criamos in #123
  • Merge PR 123 from develop into master by @Criamos in #124

Full Changelog: v2024.09.04...v2024.12.18

v2024.09.04

04 Sep 10:22
a8013ff

What's Changed

  • Python 3.12 Migration and Dependency Updates by @Criamos in #104
  • Python 3.12 Migration / Dependency Upgrades / OERSI: BIRD-related metadata enrichment (iMoox / vhb) by @Criamos in #105
  • Crawler for BNE-Portal.de (+ more flexible playwright controls for cookies / ad-blocking) by @Criamos in #106
  • Merge oersi_spider v0.2.7 and bne_portal_spider v0.0.3 into develop by @Criamos in #107
  • feat: support edu-sharing v9.x API (+ dependency updates) by @Criamos in #109
  • merge develop into master (2024-09-04) by @Criamos in #111

Full Changelog: v2024.04.11...v2024.09.04

v2024.04.11

11 Apr 09:43
79fae39

This release reflects the state of oeh-search-etl as of 2024-04-11, which includes several crawler updates.
Highlight: A completely rewritten bpb_spider for bpb.de.


What's Changed

  • ITSJOINTLY-1323 - add new channels and support "YouTube Handle" URLs by @Criamos in #101
  • Crawler Updates (Q1 2024) - KMap, DiLerTube, BpB, Tutory, YouTube by @Criamos in #102
  • Merge changes between 2024-01 and 2024-04-10 into master by @Criamos in #103

Full Changelog: v2024.02.06...v2024.04.11

v2024.02.06

06 Feb 12:31
e9aa79a

This release reflects the state of oeh-search-etl as of 2024-02-06 and contains the changes made on develop since 2023-08-15. For details, please check the individual pull requests listed in the auto-generated summary below. (Highlights: PR #95 and #99)


What's Changed

  • 2023-08-15 feature requests and OER Sommercamp 2023 updates by @Criamos in #92
  • optimize SODIX program flow, increase crawler performance and reliability by @Criamos in #93
  • feat: language string normalization pipeline (ISO 639-1) and serlo_spider v0.3.2 by @Criamos in #94
  • oersi_spider v0.1.8 (feat: "offline"-import) and improved thumbnail handling by @Criamos in #96
  • First batch of "WLO-BIRD-Connector v2"-related changes by @Criamos in #97
  • Improve error-handling for broken image files // rpi_virtuell_spider v0.0.9 by @Criamos in #98
  • feat: attach whitelisted edu-sharing "source template" metadata properties to scraped items ("Quellen-Datensatz"-Template) by @Criamos in #99
  • sync to async performance improvements, Python 3.11.6, Scrapy 2.11, browserless v2, "Quellen-Datensatz"-templates and crawler updates by @Criamos in #95
  • merge changes between 2023-08-15 and 2024-02-06 into master by @Criamos in #100

Full Changelog: v2023.08.15...v2024.02.06

v2023.08.15

06 Feb 12:03
94c6f4d

This release reflects the state of oeh-search-etl as of 2023-08-15 and is mainly intended for version-pinning purposes.

Later releases will substantially change our thumbnail generation pipeline and add async performance optimizations (see the async-related pull request #95), which might introduce breaking changes.


What's Changed

  • Introduce "poetry" for dependency management; Updates for Serlo & Science in School crawlers by @Criamos in #84
  • lehreronline_spider v0.0.7 and slightly reduced es_connector log-spam by @Criamos in #85
  • serlo_spider performance improvements by @Criamos in #87
  • change pyCharm run/debug configurations to enable profiling by @Criamos in #88
  • feat: Lehrer-Online - optional '.env'-setting to choose which sub-portals should be crawled by @Criamos in #89
  • 2023-08-11 (Team4 feature requests and fixes, implement languageLevel-Vocab) by @Criamos in #90
  • 2023-08-15 by @Criamos in #91

Full Changelog: v2023.07.21...v2023.08.15

2023-07-21

21 Jul 09:21
cc304c6

This release reflects the current state of oeh-search-etl as of 2023-07-21.

(The main purpose of this release is to version-pin the current project state for LISUM-related crawlers.)

What's Changed

  • es_connector: improve error handling / follow PEP8 code style guidelines by @Criamos in #82

Full Changelog: v2023.07.20...v2023.07.21

2023-07-18

18 Jul 13:51
8056882

This release reflects the current state of oeh-search-etl as of 2023-07-18 (for an overview of all commits, please see: PR #81).

(The main purpose of this release is to version-pin the current project state for LISUM-related crawlers.)