Releases: openeduhub/oeh-search-etl
v2024.12.18
What's Changed
- Fix "httpx"-related ReadErrors in es_connector by @Criamos in #113
- Merge recent HTTPX-related fixes into master by @Criamos in #114
- Improved Exception Handling during website-screenshot fallback and several fixes for pydantic ValidationErrors by @Criamos in #115
- Merge fixes from PR 115 into master by @Criamos in #116
- Feat: Planet-N crawler // update GitHub workflows by @Criamos in #117
- feat: (optional) OER-Filter Pipeline by @Criamos in #118
- Portal Globales Lernen & updated DocStrings by @Criamos in #119
- feat: parse robots.txt for AI usage indicators ("ccm:ai_allow_usage") by @Criamos in #120
- Upgrade to Python 3.13 and Scrapy v2.12 / feat: robots.txt parsing for "ccm:ai_allow_usage" by @Criamos in #121
- Merge develop into master by @Criamos in #122
- Update headless browser and planet_n_spider v0.0.3 by @Criamos in #123
- Merge PR 123 from develop into master by @Criamos in #124
Full Changelog: v2024.09.04...v2024.12.18
v2024.09.04
What's Changed
- Python 3.12 Migration and Dependency Updates by @Criamos in #104
- Python 3.12 Migration / Dependency Upgrades / OERSI: BIRD-related metadata enrichment (iMoox / vhb) by @Criamos in #105
- Crawler for BNE-Portal.de (+ more flexible playwright controls for cookies / ad-blocking) by @Criamos in #106
- Merge oersi_spider v0.2.7 and bne_portal_spider v0.0.3 into develop by @Criamos in #107
- feat: support edu-sharing v9.x API (+ dependency updates) by @Criamos in #109
- merge develop into master (2024-09-04) by @Criamos in #111
Full Changelog: v2024.04.11...v2024.09.04
v2024.04.11
This release reflects the state of oeh-search-etl as of 2024-04-11, which includes several crawler updates.
Highlight: A completely rewritten bpb_spider for bpb.de.
What's Changed
- ITSJOINTLY-1323 - add new channels and support "YouTube Handle" URLs by @Criamos in #101
- Crawler Updates (Q1 2024) - KMap, DiLerTube, BpB, Tutory, YouTube by @Criamos in #102
- Merge changes between 2024-01 and 2024-04-10 into master by @Criamos in #103
Full Changelog: v2024.02.06...v2024.04.11
v2024.02.06
This release reflects the state of oeh-search-etl as of 2024-02-06, which contains the most recent changes from develop made since 2023-08-15. For details, please check the individual pull requests listed in the auto-generated summary below. (Highlights: PR #95 and #99)
What's Changed
- 2023-08-15 feature requests and OER Sommercamp 2023 updates by @Criamos in #92
- optimize SODIX program flow, increase crawler performance and reliability by @Criamos in #93
- feat: language string normalization pipeline (ISO 639-1) and serlo_spider v0.3.2 by @Criamos in #94
- oersi_spider v0.1.8 (feat: "offline"-import) and improved thumbnail handling by @Criamos in #96
- First batch of "WLO-BIRD-Connector v2"-related changes by @Criamos in #97
- Improve error-handling for broken image files // rpi_virtuell_spider v0.0.9 by @Criamos in #98
- feat: attach whitelisted edu-sharing "source template" metadata properties to scraped items ("Quellen-Datensatz"-Template) by @Criamos in #99
- sync to async performance improvements, Python 3.11.6, Scrapy 2.11, browserless v2, "Quellen-Datensatz"-templates and crawler updates by @Criamos in #95
- merge changes between 2023-08-15 and 2024-02-06 into master by @Criamos in #100
Full Changelog: v2023.08.15...v2024.02.06
v2023.08.15
This release reflects the state of oeh-search-etl as of 2023-08-15 and is mainly intended for version-pinning purposes.
Later changes will substantially rework our thumbnail generation pipeline and add async performance optimizations (see the async-related pull request #95), which might introduce breaking changes.
What's Changed
- Introduce "poetry" for dependency management; Updates for Serlo & Science in School crawlers by @Criamos in #84
- lehreronline_spider v0.0.7 and slightly reduced es_connector log-spam by @Criamos in #85
- serlo_spider performance improvements by @Criamos in #87
- change pyCharm run/debug configurations to enable profiling by @Criamos in #88
- feat: Lehrer-Online - optional '.env'-setting to choose which sub-portals should be crawled by @Criamos in #89
- 2023-08-11 (Team4 feature requests and fixes, implement languageLevel-Vocab) by @Criamos in #90
- 2023-08-15 by @Criamos in #91
Full Changelog: v2023.07.21...v2023.08.15
2023-07-21
This release reflects the current state of oeh-search-etl as of 2023-07-21.
(The main purpose of this release is to version-pin the current project state for LISUM-related crawlers.)
What's Changed
Full Changelog: v2023.07.20...v2023.07.21
2023-07-18
This release reflects the current state of oeh-search-etl as of 2023-07-18 (for an overview of all commits, please see: PR #81).
(The main purpose of this release is to version-pin the current project state for LISUM-related crawlers.)