Releases: dginev/CorTeX
ar5iv 04-2024 release
Tagging the version of CorTeX used for the full arXiv conversion run up to 04.2024, which was also used to produce the ar5iv 04-2024 dataset.
Recent changes mostly track package updates and add small refinements to the report pages.
arXMLiv 2022 release
arxmliv 2020 release
Minor release capturing the latest CorTeX state for generating and bundling the arXMLiv 2020 dataset.
arxmliv 08.2019 release
This version of CorTeX was used to convert and bundle the 08.2019 version of the arXMLiv dataset.
History Feature polish and feedback
Minor polish of the newly released history reports, and related patches
History Feature
CorTeX now has an automatic "historical runs" reporting capacity.
It provides insight into incremental changes between successive runs of a service over a corpus, helping to track both improvements and regressions at a coarse-grained severity level.
See #41 for additional details.
Update to Rocket 0.4
Minor hygiene release: update to the latest Rust nightly (1.33) and Rocket (0.4).
arxmliv 08.2018 upgrades
Frontend upgrades, as well as stability fixes, for the successful conversion run of arXiv up to 08.2018 with the tex_to_html service.
Improved Report Interfaces
Combined release changes up to 0.3.0 include:
- pagination and dedicated preview URLs for task list reports
- worker metadata tracking, as well as per-service worker reports
- breaking changes to dispatcher API, as sink (zmq::PULL) replies are now also required to include an identity message, for better tracking
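To illustrate the breaking change to sink replies, here is a minimal sketch of validating a multipart reply whose first frame is the worker identity. The frame order (identity, service, task id, payload), the `SinkReply` and `ReplyError` types, and the `parse_sink_reply` function are all assumptions made for this example, not the dispatcher's actual wire format; the frames are modelled as plain byte vectors so the sketch does not depend on a zmq binding.

```rust
/// Hypothetical decoded form of a reply arriving at the sink (zmq::PULL) socket.
#[derive(Debug, PartialEq, Eq)]
pub struct SinkReply {
    pub identity: String, // which worker sent the reply
    pub service: String,  // e.g. "tex_to_html"
    pub task_id: i64,
    pub payload: Vec<u8>,
}

/// Hypothetical reasons for rejecting a reply.
#[derive(Debug, PartialEq, Eq)]
pub enum ReplyError {
    MissingFrames(usize),
    EmptyIdentity,
    BadUtf8(&'static str),
    BadTaskId,
}

/// Parses a multipart message, assuming the (illustrative) frame order
/// identity, service, task id, payload. Replies that lack the leading
/// identity frame are rejected rather than silently accepted.
pub fn parse_sink_reply(frames: &[Vec<u8>]) -> Result<SinkReply, ReplyError> {
    if frames.len() < 4 {
        return Err(ReplyError::MissingFrames(frames.len()));
    }
    // Decode one frame as UTF-8 text, naming the frame in the error.
    let text = |idx: usize, name: &'static str| -> Result<String, ReplyError> {
        std::str::from_utf8(&frames[idx])
            .map(str::to_owned)
            .map_err(|_| ReplyError::BadUtf8(name))
    };
    let identity = text(0, "identity")?;
    if identity.is_empty() {
        return Err(ReplyError::EmptyIdentity);
    }
    let service = text(1, "service")?;
    let task_id = text(2, "task id")?
        .parse::<i64>()
        .map_err(|_| ReplyError::BadTaskId)?;
    // Any remaining frames are treated as the payload and joined together.
    let payload = frames[3..].concat();
    Ok(SinkReply { identity, service, task_id, payload })
}
```

With this layout, a reply sent in the older style without an identity frame would be one frame short and fail with `MissingFrames`, which is one way a dispatcher could surface the API break to outdated workers.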
Diesel.rs Backend
This release, detailed in PR #24, is a major backend rewrite that ensures a more solid and maintainable foundation. This includes:
- The postgresql backend is now realized entirely using the diesel ORM
- The log messaging table has now been split into 5 tables - one per (latexml-convention) severity, in an effort to keep the final table sizes for billions of messages reasonable.
- The new LogRecord trait makes the split tables usable from Rust with a moderate amount of boilerplate, which I find acceptable.
- The implications for the code base are more significant - there are large refactors in the backend APIs and coding style.
- The code quality has been boosted by a more disciplined use of rustfmt and clippy.
The release has undergone a stress test of converting 1000 arXiv articles and using the respective reports, as a basic sanity check.
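The idea of one table per severity behind a shared trait can be sketched roughly as follows. This is not the actual CorTeX code: the table names, the struct fields, and every method on this `LogRecord` are assumptions chosen for illustration, and the diesel integration is left out so that the sketch stays self-contained.

```rust
/// The five severities, one log table each (variant and table names are assumed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    Info,
    Warning,
    Error,
    Fatal,
    Invalid,
}

impl Severity {
    /// Hypothetical name of the table that stores messages of this severity.
    pub fn table_name(self) -> &'static str {
        match self {
            Severity::Info => "log_infos",
            Severity::Warning => "log_warnings",
            Severity::Error => "log_errors",
            Severity::Fatal => "log_fatals",
            Severity::Invalid => "log_invalids",
        }
    }
}

/// Shared interface over the per-severity row types, so that reporting
/// code does not need to know which table a message came from.
pub trait LogRecord {
    fn task_id(&self) -> i64;
    fn category(&self) -> &str;
    fn what(&self) -> &str;
    fn severity(&self) -> Severity;

    /// Default methods keep the per-type boilerplate down to the accessors.
    fn table_name(&self) -> &'static str {
        self.severity().table_name()
    }
    fn summary(&self) -> String {
        format!(
            "[{}] task {}: {}:{}",
            self.table_name(),
            self.task_id(),
            self.category(),
            self.what()
        )
    }
}

/// One row type per table; only two of the five are shown here.
pub struct LogWarning {
    pub task_id: i64,
    pub category: String,
    pub what: String,
}

pub struct LogError {
    pub task_id: i64,
    pub category: String,
    pub what: String,
}

impl LogRecord for LogWarning {
    fn task_id(&self) -> i64 { self.task_id }
    fn category(&self) -> &str { &self.category }
    fn what(&self) -> &str { &self.what }
    fn severity(&self) -> Severity { Severity::Warning }
}

impl LogRecord for LogError {
    fn task_id(&self) -> i64 { self.task_id }
    fn category(&self) -> &str { &self.category }
    fn what(&self) -> &str { &self.what }
    fn severity(&self) -> Severity { Severity::Error }
}

/// Report code can treat rows from different tables uniformly.
pub fn summarize(records: &[Box<dyn LogRecord>]) -> Vec<String> {
    records.iter().map(|r| r.summary()).collect()
}
```

The repeated `impl` blocks are the "moderate boilerplate" cost of this design: each severity gets its own small row type, in exchange for keeping every individual table at a manageable size.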