We are pleased to announce version 0.15.0 of ACHE Crawler!
This version includes several dependency updates and fixes a robots.txt serialization bug that occurs only when the robots.txt feature is enabled. This fix may break backward compatibility with data from previous crawls that used robots.txt. We also plan to upgrade Elasticsearch support in the next version, so this may be the last version to support legacy Elasticsearch versions (e.g., <6.x).
The following is a detailed log of the changes since the last version:
- Bump okhttp from 3.14.0 to 4.9.3
- Bump jackson-* libraries from 2.13.1 to 2.13.3
- Bump logback-classic from 1.2.9 to 1.2.11
- Bump slf4j-api from 1.7.32 to 1.7.36
- Bump RoaringBitmap from 0.9.23 to 0.9.27
- Bump metrics-* libraries from 4.2.7 to 4.2.17
- Bump aws-java-sdk-s3 from 1.12.131 to 1.12.225
- Remove aws-java-sdk-s3 dependency from main project
- Add support for Elasticsearch 7.x and 8.x indexing (#282)
- Bump jetty-server from 9.4.44.v20210927 to 9.4.48.v20220622
- Bump kryo-serializers from 0.42 to 0.43
- Bump RoaringBitmap from 0.9.27 to 0.9.39
- Bump tika-parsers from 1.18 to 1.28.4
- Bump gradle-node-plugin to version 3.5.1 and node.js to 18.14.2
- Migrate tests from JUnit 4 to 5
- Migrate test assertions from Hamcrest to AssertJ
- Bump org.apache.httpcomponents:httpclient from 4.5.13 to 4.5.14
- Bump ch.qos.logback:logback-classic from 1.2.+ to 1.4.5
- Fix robots.txt serialization bug
- Bump jackson-* libraries from 2.13.3 to 2.14.2
- Bump org.apache.commons:commons-lang3 from 3.4 to 3.12.0
- Bump org.apache.commons:commons-compress from 1.21 to 1.22
- Bump org.apache.kafka:kafka-clients from 3.2.0 to 3.4.0
- Bump com.squareup.okhttp3:okhttp from 4.9.3 to 4.10.0