Releases: html-extract/hext

Hext v1.0.0

12 Dec 23:05

Changes

  • New syntax: Nested rules (#22)
    # match <div> elements that have a descendant <a> at any depth
    <div> { <a/> } </div>
    
  • Abort extraction after a specified number of searches (4dff797): Added a new parameter max_searches to Rule::extract and htmlext. It is disabled by default (value 0). When running untrusted Hext templates, I recommend setting max_searches to a high value, such as 10000, to protect against resource exhaustion. Nested rules can cause very poor runtime performance (see #22 for an example).
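
Why a search budget helps can be illustrated with a toy model. The sketch below does not use Hext or its API: Node, Budget, count_matches, nested_divs and SearchLimitExceeded are hypothetical names made up for this illustration, and the way searches are counted here is an assumption, not necessarily how libhext counts them. It only mirrors the idea that a limit of 0 means "no limit" and that a nested descendant match can cost a quadratic number of searches on deeply nested input.

```python
from dataclasses import dataclass, field
from typing import List


class SearchLimitExceeded(Exception):
    """Raised when the search budget is used up (hypothetical name)."""


@dataclass
class Node:
    tag: str
    children: List["Node"] = field(default_factory=list)


class Budget:
    """Counts searches; a limit of 0 means unlimited, like the default above."""

    def __init__(self, max_searches: int = 0) -> None:
        self.max_searches = max_searches
        self.used = 0

    def spend(self) -> None:
        self.used += 1
        if self.max_searches and self.used > self.max_searches:
            raise SearchLimitExceeded(
                f"gave up after {self.max_searches} searches"
            )


def has_descendant(node: Node, tag: str, budget: Budget) -> bool:
    """Depth-first search for a descendant with the given tag."""
    for child in node.children:
        budget.spend()
        if child.tag == tag or has_descendant(child, tag, budget):
            return True
    return False


def count_matches(root: Node, outer: str, inner: str,
                  max_searches: int = 0) -> int:
    """Count `outer` elements that contain an `inner` descendant at any depth."""
    budget = Budget(max_searches)
    matches = 0
    stack = [root]
    while stack:
        node = stack.pop()
        budget.spend()
        if node.tag == outer and has_descendant(node, inner, budget):
            matches += 1
        stack.extend(node.children)
    return matches


def nested_divs(depth: int) -> Node:
    """Build a chain of `depth` nested <div> elements with no <a> anywhere."""
    node = Node("div")
    for _ in range(depth - 1):
        node = Node("div", [node])
    return node
```

In this model, every <div> in a chain of nested <div> elements re-scans all of its descendants before failing, so the work grows quadratically with nesting depth. With a budget set, count_matches(nested_divs(100), "div", "a", max_searches=1000) raises SearchLimitExceeded instead of finishing the scan, while a small document completes well within the limit.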

Hext v0.8.3

09 Nov 15:32

Static binary releases

Install the htmlext command-line utility and Hext for Python (v3.8 or earlier):

pip install hext

Install Hext for Node (v12 or earlier):

npm install hext

Both are compatible with Linux (x86_64) and Mac OS X ≥ 10.11.

Hext for WebAssembly

See https://github.com/html-extract/hext-emscripten/releases

Changes

  • Rules can now match custom-tags
  • Custom-tags are matched in a case-insensitive manner

Releases can be verified with the public key at https://thomastrapp.com/public_key.asc, which has the key ID 086653AA8CC7270E and the fingerprint E6EA EFD0 2CBB 0EFF C010 1324 0866 53AA 8CC7 270E.

Hext v0.8.2

23 Jul 20:40

Notable but minor changes:

  • make install now uses CMake's GNUInstallDirs. This allows finer control over where files are installed when configuring the project (67afed7).
  • htmlext's Version.cpp and libhext's Version.cpp are generated out of the source tree (9e8125e).
  • Removed libhext's custom Doxygen theme (93dab11).

Releases can be verified with the public key at http://thomastrapp.com/public_key.asc, which has the key ID 086653AA8CC7270E and the fingerprint E6EA EFD0 2CBB 0EFF C010 1324 0866 53AA 8CC7 270E.

Hext v0.8.0

10 Jun 22:05

Static binary releases

Install the htmlext command-line utility and Hext for Python:

pip install hext

Install Hext for Node:

npm install hext

Both are compatible with Linux (x86_64) and Mac OS X ≥ 10.11. Visit https://hext.thomastrapp.com/download.

Major changes

  • Syntax: Added greedy rules. Existing Hext snippets do not need to be changed. See the documentation.
  • CMake and FindPackage Hext: When using libhext in a CMake project, the target hext::hext must be used. The variables Hext_LIBRARY and Hext_INCLUDE_DIRS no longer exist. See https://hext.thomastrapp.com/download#using-libhext
  • New dependency: RapidJSON. Previously, RapidJSON (which generates the JSON output of htmlext) was downloaded automatically if it was not found. This is no longer the case. RapidJSON is installable through most package managers (rapidjson-dev). See https://github.com/Tencent/rapidjson
  • Mac OS X support: Hext can now be built and installed on Mac OS X.

Minor changes

  • Modernized CMake usage. CMake 3.8 or later required.
  • Boost 1.70 support (#8)
  • Compatibility with Node v10, v11 and v12 (#6)
  • Added a man page for htmlext
  • Added build scripts for releases
  • Travis CI tests
  • libhext examples are no longer built automatically
  • libhext/test: Google Test is now a hard dependency
  • Python2 unicode string support
  • PHP bindings now require PHP7

Acknowledgments

I'd like to thank the following people for their feedback and help:

👍

Releases can be verified with the public key at http://thomastrapp.com/public_key.asc, which has the key ID 086653AA8CC7270E and the fingerprint E6EA EFD0 2CBB 0EFF C010 1324 0866 53AA 8CC7 270E.

Hext v0.7.0

23 Sep 21:19

Notable changes:

  • A C++17 capable compiler is now required to build from source
  • RapidJSON was updated to v1.1.0
  • Improved Python 3 support, courtesy of @brandonrobertz

Attached to this release you will find four experimental builds:

SHA256SUM can be verified with SHA256SUM.asc using my public key at http://thomastrapp.com/public_key.asc, which has the key ID 086653AA8CC7270E and the fingerprint E6EA EFD0 2CBB 0EFF C010 1324 0866 53AA 8CC7 270E.

The binaries were built using docker, Ubuntu 18.04 and build-hext-binary-releases.sh.

Hext v0.6.0

07 Apr 12:56

Notable changes:

  • libhext: reduce the number of exported symbols

This will be the last release before switching to C++17.

Hext v0.5.2

26 Jul 14:38

Notable changes:

  • libhext/NodeUtil: add proper serialization of HTML for InnerHtml
  • libhext: use versioned soname

Hext v0.5.1

20 Jul 15:57

Notable changes:

  • htmlext and libhext can now be compiled with Visual Studio 2015
  • libhext/bindings: htmlext.{js,php,py,rb}: Handle hext syntax error

Hext v0.5.0

06 Jul 16:15

Public API changes:

  • libhext/bindings/ruby: convert hext::Result to native types

Hext v0.4.0

04 Jul 14:13

Public API changes:

  • libhext/bindings/python: convert hext::Result to native types