Releases: sparklemotion/nokogiri

1.11.4 / 2021-05-14

14 May 23:32
9d69b44

Security

[CRuby] Vendored libxml2 upgraded to v2.9.12 which addresses:

Note that two additional CVEs were addressed upstream but are not relevant to this release. CVE-2021-3516 via xmllint is not present in Nokogiri, and CVE-2020-7595 has been patched in Nokogiri since v1.10.8 (see #1992).

Please see nokogiri/GHSA-7rrm-v45f-jp64 or #2233 for a more complete analysis of these CVEs and patches.

Dependencies

  • [CRuby] vendored libxml2 is updated from 2.9.10 to 2.9.12. (Note that 2.9.11 was skipped because it was superseded by 2.9.12 a few hours after its release.)

1.11.3 / 2021-04-07

07 Apr 20:34
d244fb8

Fixed

  • [CRuby] Passing non-Node objects to Document#root= now raises an ArgumentError exception. Previously this likely segfaulted. [#1900]
  • [JRuby] Passing non-Node objects to Document#root= now raises an ArgumentError exception. Previously this raised a TypeError exception.
  • [CRuby] arm64/aarch64 systems (like Apple's M1) can now compile libxml2 and libxslt from source (though we continue to strongly advise users to install the native gems for the best possible experience)

1.11.2 / 2021-03-11

11 Mar 15:57
2975cb4

Fixed

  • [CRuby] NodeSet may now safely contain Node objects from multiple documents. Previously the GC lifecycle of the parent Document objects could lead to nodes being GCed while still in scope. [#1952]
  • [CRuby] Patch libxml2 to avoid "huge input lookup" errors on large CDATA elements. (See upstream GNOME/libxml2#200 and GNOME/libxml2!100.) [#2132].
  • [CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and link against nokogiri.so by including LDFLAGS in Nokogiri::VERSION_INFO. [#2167]
  • [CRuby] {XML,HTML}::Document.parse now invokes #initialize exactly once. Previously #initialize was invoked twice on each object.
  • [JRuby] {XML,HTML}::Document.parse now invokes #initialize exactly once. Previously #initialize was not called, which was a problem for subclassing such as done by Loofah.

Improved

  • Reduce the number of object allocations needed when parsing an HTML::DocumentFragment. [#2087] (Thanks, @ashmaroli!)
  • [JRuby] Update the algorithm used to calculate Node#line so that it is wrong less often. The underlying parser, Xerces, does not track line numbers, so we've always used a hacky solution for this method. [#1223, #2177]
  • Introduce --enable-system-libraries and --disable-system-libraries flags to extconf.rb. These flags provide the same functionality as --use-system-libraries and the NOKOGIRI_USE_SYSTEM_LIBRARIES environment variable, but are more idiomatic. [#2193] (Thanks, @eregon!)
  • [TruffleRuby] --disable-static is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [#2191, #2193] (Thanks, @eregon!)

Changed

  • Nokogiri::XML::XPath is now a Module (previously it was a Class). It has been acting solely as a Module since v1.0.0. See 8461c74.

v1.11.1 / 2021-01-06

06 Jan 05:32
7be6f04

Fixed

  • [CRuby] If libxml-ruby is loaded before nokogiri, the SAX and Push parsers no longer call libxml-ruby's handlers. Instead, they defensively override the libxml2 global handler before parsing. [#2168]

SHA-256 Checksums of published gems

a41091292992cb99be1b53927e1de4abe5912742ded956b0ba3383ce4f29711c  nokogiri-1.11.1-arm64-darwin.gem
d44fccb8475394eb71f29dfa7bb3ac32ee50795972c4557ffe54122ce486479d  nokogiri-1.11.1-java.gem
f760285e3db732ee0d6e06370f89407f656d5181a55329271760e82658b4c3fc  nokogiri-1.11.1-x64-mingw32.gem
dd48343bc4628936d371ba7256c4f74513b6fa642e553ad7401ce0d9b8d26e1f  nokogiri-1.11.1-x86-linux.gem
7f49138821d714fe2c5d040dda4af24199ae207960bf6aad4a61483f896bb046  nokogiri-1.11.1-x86-mingw32.gem
5c26111f7f26831508cc5234e273afd93f43fbbfd0dcae5394490038b88d28e7  nokogiri-1.11.1-x86_64-darwin.gem
c3617c0680af1dd9fda5c0fd7d72a0da68b422c0c0b4cebcd7c45ff5082ea6d2  nokogiri-1.11.1-x86_64-linux.gem
42c2a54dd3ef03ef2543177bee3b5308313214e99f0d1aa85f984324329e5caa  nokogiri-1.11.1.gem
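One way to check a downloaded gem against the checksums above, as a minimal sketch using only Ruby's standard library. The helper name gem_checksum_matches? is hypothetical, and the gem file is assumed to sit in the current directory.

```ruby
require "digest"

# Hypothetical helper: true when the file's SHA-256 hex digest equals the
# published checksum (compared case-insensitively, ignoring whitespace).
def gem_checksum_matches?(path, expected_sha256)
  Digest::SHA256.file(path).hexdigest == expected_sha256.strip.downcase
end

gem_file = "nokogiri-1.11.1-x86_64-linux.gem" # assumed local download
published = "c3617c0680af1dd9fda5c0fd7d72a0da68b422c0c0b4cebcd7c45ff5082ea6d2"

if File.exist?(gem_file)
  puts gem_checksum_matches?(gem_file, published) ? "checksum OK" : "checksum MISMATCH"
else
  puts "#{gem_file} not found; download it before verifying"
end
```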

v1.11.0 / 2021-01-03

04 Jan 04:25
1c1fba5

Notes

Faster, more reliable installation: Native Gems for Linux and OSX/Darwin

"Native gems" contain pre-compiled libraries for a specific machine architecture. On supported platforms, this removes the need to compile the C extension and the packaged libraries. The result is much faster and more reliable installation; as you probably know, slow and fragile installs are the biggest headaches for Nokogiri users.

We've been shipping native Windows gems since 2009, but starting in v1.11.0 we are also shipping native gems for these platforms:

  • Linux: x86-linux and x86_64-linux -- including musl platforms like alpine
  • OSX/Darwin: x86_64-darwin and arm64-darwin

We'd appreciate your thoughts and feedback on this work at #2075.

Dependencies

Ruby

This release introduces support for Ruby 2.7 and 3.0 in the precompiled native gems.

This release ends support for:

Gems

  • Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)
  • [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)

Security

See the note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema input is now "untrusted" by default".

Added

  • Add Node methods for manipulating "keyword attributes" (for example, class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]
  • Add support for CSS queries a:has(> b), a:has(~ b), and a:has(+ b). [#688] (Thanks, @jonathanhefner!)
  • Add Node#value? to better match expected semantics of a Hash-like object. [#1838, #1840] (Thanks, @MatzFan!)
  • [CRuby] Add Nokogiri::XML::Node#line= for use by downstream libs like nokogumbo. [#1918] (Thanks, @stevecheckoway!)
  • nokogiri.gemspec is back after a 10-year hiatus. We still prefer you use the official releases, but master is pretty stable these days, and YOLO.

Performance

  • [CRuby] The CSS ~= operator and class selector . are about 2x faster. [#2137, #2135]
  • [CRuby] Patch libxml2 to call strlen from xmlStrlen rather than the naive implementation, because strlen is generally optimized for the architecture. [#2144] (Thanks, @ilyazub!)
  • Improve performance of some namespace operations. [#1916] (Thanks, @ashmaroli!)
  • Remove unnecessary array allocations from Node serialization methods [#1911] (Thanks, @ashmaroli!)
  • Avoid creation of unnecessary zero-length String objects. [#1970] (Thanks, @ashmaroli!)
  • Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks, @ilyazub!)
  • [JRuby] Lots of code cleanup and performance improvements. [#1934] (Thanks, @kares!)
  • [CRuby] RelaxNG.from_document no longer leaks memory. [#2114]

Improved

  • [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
  • {HTML,XML}::Document#parse now accept Pathname objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the read method would be repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
  • [CRuby] Nokogumbo builds faster because it can now use header files provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
  • Add frozen_string_literal: true magic comment to all lib files. [#1745] (Thanks, @oniofchaos!)
  • [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)

Fixed

  • HTML parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises an XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
  • The CSS ~= operator now correctly handles non-space whitespace in the class attribute. commit e45dedd
  • The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [#1935]
  • The Node methods add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [nokogumbo#160]
  • [JRuby] XML::Schema XSD validation errors are captured in XML::Schema#errors. These errors were previously ignored.
  • [JRuby] Standardize reading from IO like objects, including StringIO. [#1888, #1897]
  • [JRuby] Fix how custom XPath function namespaces are inferred to be less naive. [#1890, #2148]
  • [JRuby] Clarify exception message when custom XPath functions can't be resolved.
  • [JRuby] Comparison of Node to Document with Node#<=> now matches CRuby/libxml2 behavior.
  • [CRuby] Syntax errors are now correctly captured in Document#errors for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
  • [CRuby] Fixed installation on AIX with respect to vasprintf. [#1908]
  • [CRuby] On some platforms, avoid symbol name collision with glibc's canonicalize. [#2105]
  • [Windows Visual C++] Fixed compiler warnings and errors. [#2061, #2068]
  • [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release candidates. [#1788] (Thanks, @stevecheckoway!)
  • [JRuby] Fixed document encoding regression in v1.11.0 release candidates. [#2080, #2083] (Thanks, @thbar!)

Removed

  • The internal method Nokogiri::CSS::Parser.cache_on= has been removed. Use .set_cache if you need to muck with the cache internals.
  • The class method Nokogiri::CSS::Parser.parse has been removed. This was originally deprecated in 2009 in 13db61b. Use Nokogiri::CSS.parse instead.

Changed

XML::Schema input is now "untrusted" by default

Address CVE-2020-26247.

In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.

This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.

Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".

More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later is available at the [publi...


v1.11.0.rc4 / 2020-12-29

29 Dec 16:47
f7bc31f
Pre-release

The latest release candidate is v1.11.0.rc4 (2020-12-29). To try out release candidates, use gem install nokogiri --prerelease or gem install nokogiri -v1.11.0.rc4

If you're using bundler, try updating your Gemfile with:

gem "nokogiri", "~> 1.11.0.rc4"

Delta since v1.11.0.rc3:

Notes

  • Added precompiled native gem support for Darwin (OSX) platform arm64-darwin

Dependencies

Ruby

Gems

  • Explicitly add racc as a runtime dependency. [#1988] (Thanks, @voxik!)

Security

See the note below about CVE-2020-26247 in the "Changed" subsection entitled "XML::Schema input is now "untrusted" by default".

Performance

  • [CRuby] The CSS ~= operator and class selector . are about 2x faster. [#2137, #2135]
  • [CRuby] Patch libxml2 to call strlen from xmlStrlen rather than the naive implementation, because strlen is generally optimized for the architecture. [#2144] (Thanks, @ilyazub!)
  • Always compile libxml2 and libxslt with '-O2' [#2022, #2100] (Thanks, @ilyazub!)
  • [CRuby] RelaxNG.from_document no longer leaks memory. [#2114]

Improved

  • [CRuby] Handle incorrectly-closed HTML comments as WHATWG recommends for browsers. [#2058] (Thanks to HackerOne user mayflower for reporting this!)
  • {HTML,XML}::Document#parse now accept Pathname objects. Previously this worked only if the referenced file was less than 4096 bytes long; longer files resulted in undefined behavior because the read method would be repeatedly invoked. [#1821, #2110] (Thanks, @doriantaylor and @phokz!)
  • [CRuby] Nokogumbo builds faster because it can now use header files provided by Nokogiri. [#1788] (Thanks, @stevecheckoway!)
  • [JRuby] Clean up deprecated calls into JRuby. [#2027] (Thanks, @headius!)

Fixed

  • HTML parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises an XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby. [#2130]
  • The CSS ~= operator now correctly handles non-space whitespace in the class attribute. commit e45dedd
  • The Node methods add_previous_sibling, previous=, before, add_next_sibling, next=, after, replace, and swap now correctly use their parent as the context node for parsing markup. These methods now also raise a RuntimeError if they are called on a node with no parent. [nokogumbo#160]
  • [JRuby] XML::Schema XSD validation errors are captured in XML::Schema#errors. These errors were previously ignored.
  • [JRuby] Fix how custom XPath function namespaces are inferred to be less naive. [#1890, #2148]
  • [JRuby] Clarify exception message when custom XPath functions can't be resolved.
  • [JRuby] Comparison of Node to Document with Node#<=> now matches CRuby/libxml2 behavior.
  • [CRuby] Syntax errors are now correctly captured in Document#errors for short HTML documents. Previously the SAX parser used for encoding detection was clobbering libxml2's global error handler.
  • [CRuby] On some platforms, avoid symbol name collision with glibc's canonicalize. [#2105]
  • [CRuby] Fixed Nokogumbo integration which broke in the v1.11.0 release candidates. [#1788] (Thanks, @stevecheckoway!)
  • [JRuby] Fixed document encoding regression in v1.11.0 release candidates. [#2080, #2083] (Thanks, @thbar!)

Changed

XML::Schema input is now "untrusted" by default

Address CVE-2020-26247.

In Nokogiri versions <= 1.11.0.rc3, XML Schemas parsed by Nokogiri::XML::Schema were trusted by default, allowing external resources to be accessed over the network, potentially enabling XXE or SSRF attacks.

This behavior is counter to the security policy intended by Nokogiri maintainers, which is to treat all input as untrusted by default whenever possible.

Please note that this security fix was pushed into a new minor version, 1.11.x, rather than a patch release to the 1.10.x branch, because it is a breaking change for some schemas and the risk was assessed to be "Low Severity".

More information and instructions for enabling "trusted input" behavior in v1.11.0.rc4 and later are available in the public advisory.

HTML parser now obeys the strict or norecover parsing option

(Also noted above in the "Fixed" section.) HTML parsing in "strict" mode (i.e., the RECOVER parse option not set) now correctly raises an XML::SyntaxError exception. Previously the value of the RECOVER bit was being ignored by CRuby and was misinterpreted by JRuby.

If you're using the default parser options, you will be unaffected by this fix. If you're passing strict or norecover to your HTML parser call, you may be surprised to see that the parser now fails to recover and raises an XML::SyntaxError exception. Given the number of HTML documents on the internet that libxml2 would consider to be ill-formed, this is probably not what you want, and you can omit that parse option to restore the behavior you have been relying upon.

Apologies to anyone inconvenienced by this breaking bugfix being present in a minor release, but I felt it was appropriate to introduce this fix because it's straightforward to fix any code that has been relying on this buggy behavior.

VersionInfo, the output of nokogiri -v, and related constants

This release changes the metadata provided in Nokogiri::VersionInfo which also affects the output of nokogiri -v. Some related constants have also been changed. If you're using VersionInfo programmatically, or relying on constants related to underlying library versions, please read the detailed changes for Nokogiri::VersionInfo at #2139 and accept our apologies for the inconvenience.

v1.11.0.rc3 / 2020-09-08

08 Sep 13:18
959db1d
Pre-release

To try out release candidates, use gem install nokogiri --prerelease or gem install nokogiri -v1.11.0.rc3

If you're using bundler, try updating your Gemfile with:

gem "nokogiri", "~> 1.11.0.rc3"

Delta since v1.11.0.rc2:

Notes

Added precompiled native gem support for OSX/Darwin platform x86_64-darwin19.

Fixed

  • [Windows Visual C++] Fixed compiler warnings and errors. [#2061, #2068]

1.10.10 / 2020-07-06

06 Jul 13:42
a9a3717

Features

  • [MRI] Cross-built Windows gems now support Ruby 2.7 [#2029]. Note that prior to this release, the v1.11.x prereleases provided this support.

v1.11.0.rc2 / 2020-04-01

01 Apr 19:21
v1.11.0.rc2
a762738
Pre-release

To try out release candidates, use gem install nokogiri --prerelease. The latest is v1.11.0.rc2.

Delta since v1.11.0.rc1:

Notes

Note that the linux-native gems for v1.11.0.rc2 and later support musl systems (e.g., alpine).

Dependencies

  • [MRI] Upgrade mini_portile2 dependency from ~> 2.4.0 to ~> 2.5.0 [#2005] (Thanks, @alejandroperea!)

Added

  • Add Node methods for manipulating keyword attributes (like class and rel): #kwattr_values, #kwattr_add, #kwattr_append, and #kwattr_remove. [#2000]

Fixed

  • The switch to turn off the CSS-to-XPath cache is now thread-local, rather than being shared mutable state. [#1935]

Removed

  • The internal method Nokogiri::CSS::Parser.cache_on= has been removed. Use .set_cache if you need to muck with the cache internals.
  • The method Nokogiri::CSS::Parser.parse has been removed. This was originally deprecated in 2009 in 13db61b.

1.10.9 / 2020-03-01

01 Mar 19:06
e2e191d

Fixed

  • [MRI] Raise an exception when Nokogiri detects a specific libxml2 edge case involving blank Schema nodes wrapped by Ruby objects that would cause a segfault. Currently no fix is available upstream, so we're preventing a dangerous operation and informing users to code around it if possible. [#1985, #2001]
  • [JRuby] Change NodeSet#to_a to return a RubyArray instead of Object, for compilation under JRuby 9.2.9 and later. [#1968, #1969] (Thanks, @headius!)