Update `org.jsoup:jsoup` to `v1.18.3` (was `1.17.2`) #798

renovate · 2025-01-01T04:15:08Z

This PR contains the following updates:

Package	Change	Age	Adoption	Passing	Confidence
org.jsoup:jsoup (source)	`1.17.2` -> `1.18.3`

Release Notes

jhy/jsoup (org.jsoup:jsoup)

`v1.18.3`

Bug Fixes

When serializing to XML, attribute names containing -, ., or digits were incorrectly marked as invalid and
removed. 2235
If an element has a ; in an attribute name, it could not be converted to a W3C DOM element, and so subsequent XPath
queries could miss that element. Now, the attribute name is more completely
normalized. 2244

`v1.18.2`

Improvements

Optimized the throughput and memory use throughout the input read and parse flows, with heap allocations and GC
down by between 6% and 89%, and throughput improved by up to 143% for small inputs. Most input sizes will see
throughput increases of ~20%. These performance improvements come through recycling the backing byte[] and char[]
arrays used to read and parse the input. 2186
Speed optimized html() and Entities.escape() when the input contains UTF characters in a supplementary plane, by
around 49%. 2183
The form-associated elements returned by FormElement.elements() now reflect changes made to the DOM
after the original parse. 2140
In the TreeBuilder, the onNodeInserted() and onNodeClosed() events are now also fired for the outermost /
root Document node. This enables source position tracking on the Document node (which was previously unset), and
also enables the node traversor to see the outer Document node. 2182
Selected Elements can now be position swapped inline using
Elements#set(). 2212
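
The buffer recycling credited for the 1.18.2 performance gains (2186) can be pictured with a small pool. This is a minimal sketch of the general technique only, not jsoup's internal implementation: the class `CharBufferPoolSketch` and its methods are hypothetical names, and the sketch is single-threaded for brevity.

```java
import java.util.ArrayDeque;

// Hypothetical illustration of array recycling; not jsoup's actual code.
final class CharBufferPoolSketch {
    private final ArrayDeque<char[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private int allocations = 0;

    CharBufferPoolSketch(int bufferSize) {
        this.bufferSize = bufferSize;
    }

    /** Hands out a previously released buffer if one is available, otherwise allocates a new one. */
    char[] borrow() {
        char[] buffer = free.poll();
        if (buffer == null) {
            allocations++;
            buffer = new char[bufferSize];
        }
        return buffer;
    }

    /** Returns a buffer to the pool. Contents are not cleared, so callers must track how much they filled. */
    void release(char[] buffer) {
        if (buffer.length == bufferSize) free.push(buffer);
    }

    /** Number of arrays actually allocated so far. */
    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        CharBufferPoolSketch pool = new CharBufferPoolSketch(8192);
        for (int i = 0; i < 1000; i++) {
            char[] buffer = pool.borrow();
            // ... read and parse one input using the buffer ...
            pool.release(buffer);
        }
        System.out.println(pool.allocations()); // prints 1
    }
}
```

Parsing many small inputs in sequence then reuses one array instead of allocating a new one per parse, which is consistent with small inputs seeing the largest gains.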

Bug Fixes

Element.cssSelector() would fail if the element's class contained a *
character. 2169
When tracking source ranges, a text node following an invalid self-closing element may be left
untracked. 2175
When a document has no doctype, or a doctype not named html, it should be parsed in Quirks
Mode. 2197
With a selector like div:has(span + a), the has() component was not working correctly, as the inner combining
query caused the evaluator to match those against the outer's siblings, not
children. 2187
A selector query that included multiple :has() components in a nested :has() might incorrectly
execute. 2131
When cookie names in a response are duplicated, the simple view of cookies available via
Connection.Response#cookies() will provide the last one set. Generally it is better to use
the Jsoup.newSession method to maintain a cookie jar, as that
applies appropriate path selection on cookies when making requests. 1831
When parsing named HTML entities, base entities should resolve if they are a prefix of the input token (and not in an
attribute). 2207
Fixed incorrect tracking of source ranges for attributes merged from late-occurring elements that were implicitly
created (html or body). 2204
Follow the current HTML specification in the tokenizer to allow < as part of a tag name, instead of emitting it as a
character node. 2230
Similarly, allow a < as the start of an attribute name, vs creating a new element. The previous behavior was
intended to parse closer to what we anticipated the author's intent to be, but that does not align to the spec or to
how browsers behave. 1483
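
The duplicate-cookie behaviour noted above (1831) follows from flattening cookies into a name-to-value map. The sketch below shows that idea with only the standard library; `CookieViewSketch` and `simpleView` are hypothetical names, and the header parsing is deliberately simplified rather than a model of jsoup's own code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; jsoup's real cookie handling is more involved.
final class CookieViewSketch {
    /** Collapses Set-Cookie header values into a name -> value map; a later duplicate overwrites an earlier one. */
    static Map<String, String> simpleView(List<String> setCookieHeaders) {
        Map<String, String> view = new LinkedHashMap<>();
        for (String header : setCookieHeaders) {
            String pair = header.split(";", 2)[0]; // drop attributes such as Path
            int eq = pair.indexOf('=');
            if (eq <= 0) continue;                 // skip malformed entries
            view.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return view;
    }

    public static void main(String[] args) {
        List<String> headers = List.of(
            "token=first; Path=/app",
            "theme=dark",
            "token=second; Path=/");
        System.out.println(simpleView(headers)); // prints {token=second, theme=dark}
    }
}
```

Because the flat map discards the Path attribute, it cannot tell which `token` applies to which request, which is why the notes recommend a session-level cookie jar instead.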

`v1.18.1`

Improvements

Stream Parser: A StreamParser provides a progressive parse of its input. As each Element is completed, it is
emitted via a Stream or Iterator interface. Elements returned will be complete with all their children, and an
(empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse,
e.g. to conserve memory, providing a mechanism to parse an input document that would otherwise be too large to fit
into memory, yet still providing a DOM interface to the document and its elements. Additionally, the parser provides
a selectFirst(String query) / selectNext(String query), which will run the parser until a hit is found, at which
point the parse is suspended. It can be resumed via another select() call, or via the stream() or iterator()
methods. 2096
Download Progress: added a Response Progress event interface, which reports progress as URLs are downloaded (and
parsed). Supported on both a session and a single connection
level. 2164, 656
Added Path-accepting parse methods: Jsoup.parse(Path), Jsoup.parse(path, charsetName, baseUri, parser),
etc. 2055
Updated the button tag configuration to include a space between multiple button elements in the Element.text()
method. 2105
Added support for the ns|* all elements in namespace Selector. 1811
When normalising attribute names during serialization, invalid characters are now replaced with _, vs being
stripped. This should make the process clearer, and generally prevent an invalid attribute name being coerced
unexpectedly. 2143
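
The difference between stripping and replacing invalid attribute-name characters (2143) is easiest to see side by side. This is a minimal sketch, assuming a simplified ASCII-only set of valid name characters; `AttributeNameSketch` and both methods are hypothetical and are not jsoup's API.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; the character class below is an assumption, not jsoup's exact rule.
final class AttributeNameSketch {
    // Assumed valid characters: ASCII letters, digits, '-', '_', ':' and '.'.
    private static final Pattern INVALID_RUN = Pattern.compile("[^-a-zA-Z0-9_:.]+");

    /** Earlier behaviour as described in the notes: invalid characters are removed. */
    static String strip(String name) {
        return INVALID_RUN.matcher(name).replaceAll("");
    }

    /** Behaviour described for 1.18.1: each run of invalid characters becomes a single "_". */
    static String replace(String name) {
        return INVALID_RUN.matcher(name).replaceAll("_");
    }

    public static void main(String[] args) {
        System.out.println(strip("data@id"));    // prints dataid
        System.out.println(replace("data@id"));  // prints data_id
        System.out.println(replace("data-x.1")); // prints data-x.1
    }
}
```

Stripping turns `data@id` into `dataid`, which could collide with a genuine `dataid` attribute, while replacement leaves a visible trace of the change. The last call also reflects the 1.18.3 fix (2235): names containing -, ., or digits pass through unchanged.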

Changes

Removed previously deprecated internal classes and methods. 2094
Build change: the built jar's OSGi manifest no longer imports itself. 2158

Bug Fixes

When tracking source positions, if the first node was a TextNode, its position was incorrectly set
to -1. 2106
When connecting (or redirecting) to URLs with characters such as {, } in the path, a Malformed URL exception would
be thrown (if in development), or the URL might otherwise not be escaped correctly (if in
production). The URL encoding process has been improved to handle these characters
correctly. 2142
When using W3CDom with a custom output Document, a Null Pointer Exception would be
thrown. 2114
The :has() selector did not match correctly when using sibling combinators
(e.g. h1:has(+h2)). 2137
The :empty selector incorrectly matched elements that started with a blank text node and were followed by
non-empty nodes, due to an incorrect short-circuit. 2130
Element.cssSelector() would fail with "Did not find balanced marker" when building a selector for elements that had
a ( or [ in their class names. And selectors with those characters escaped would not match as
expected. 2146
Updated Entities.escape(string) to make the escaped text suitable for both text nodes and attributes (previously it was
only suitable for text nodes). This does not impact the output of Element.html(), which correctly applies a minimal escape
depending on whether the output is used as text data or in a quoted
attribute. 1278
Fuzz: a Stack Overflow exception could occur when resolving a crafted <base href> URL, in the normalizing regex.
2165

Configuration

📅 Schedule: Branch creation - "monthly" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

If you want to rebase/retry this PR, check this box

This PR was generated by Mend Renovate. View the repository job log.

Update org.jsoup:jsoup to v1.18.3 (was 1.17.2)

bd8ac3f

renovate bot added the type:updates label Jan 1, 2025

ihostage approved these changes Jan 9, 2025

ihostage merged commit ecc4805 into main Jan 9, 2025
50 checks passed

mergify bot deleted the renovate/jsoup branch January 9, 2025 19:16
