Skip to content

feat: improved XMLArgs processing #3363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
May 4, 2025

Conversation

airween
Copy link
Member

@airween airween commented Apr 20, 2025

what

This PR adds a new feature within XML processing. It's same as the #3358 but for v3.

why

See v2 patch.

references

See #3178 and #3358.

@airween airween added the 3.x Related to ModSecurity version 3.x label Apr 20, 2025
@airween airween requested review from theseion and fzipi April 26, 2025 09:40
Copy link
Contributor

@fzipi fzipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this being the same patch as v2. Here you implemented the extra ctl:parseXMLintoArgs: it is not in v2. My bad, this is indeed implemented.

@airween
Copy link
Member Author

airween commented Apr 26, 2025

I don't see this being the same patch as v2. Here you implemented the extra ctl:parseXMLintoArgs: it is not in v2.

It's there.

airween and others added 16 commits April 27, 2025 20:18
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
airween and others added 2 commits April 27, 2025 20:28
Co-authored-by: Max Leske <[email protected]>
Co-authored-by: Max Leske <[email protected]>
@airween
Copy link
Member Author

airween commented Apr 27, 2025

A few tests were failed after some commit - I have to check the reason.

@airween
Copy link
Member Author

airween commented Apr 28, 2025

I added a few new commits here as at #3358:

  • bf707de - changed the directive SecXMLintoArgs to SecXmlIntoArgs
  • e8dc60e - there is a small change in the node value's parsing: if the node value contains a multi-byte character the SAX calls the function multiple times, so we need to concatenate those sub-strings, not creating a new string every time
  • 89442ed - changed the directives in tests and add a new test with a multibyte character

@airween airween requested review from theseion and fzipi April 28, 2025 20:55
@fzipi
Copy link
Contributor

fzipi commented Apr 29, 2025

Just out of curiosity, have you tried enabling this and sending a big xml file?

@airween
Copy link
Member Author

airween commented Apr 29, 2025

Just out of curiosity, have you tried enabling this and sending a big xml file?

What do you mean "a big xml file"?

Btw the size of the file is just one thing... There are couple of limits in the engine when you're sending an XML payload and want to process it as ARGS, like:

  • SecRequestBodyNoFilesLimit - the default value is 128kB
  • SecArgumentsLimit - default value is 1000

I sent an XML with size > 400kB (I increased the SecRequestBodyNoFilesLimit), then I get:

ModSecurity: XML parsing error: More than 1000 ARGS (GET + XML) [hostname "localhost"] [uri "/post"] [
ModSecurity: Access denied with code 400 (phase 2). Match of "eq 0" against "REQBODY_ERROR" required. [file "/etc/modsecurity/modsecurity.conf"] [line "78"] [id "200002"] [msg "Failed to parse request body."] [data "XML parsing error: More than 1000 ARGS (GET + XML)"] [severity "CRITICAL"] [hostname "localhost"] [uri "/post"]

Actually, the behavior of this function is exactly the same as that used for JSON. There is no difference. You could ask "have you tried enabling this and sending a big JSON file?". But the JSON works as is, and this feature is optional.

If you tell me what you're curious about, I'll try it - with an XML and a JSON too.

@fzipi
Copy link
Contributor

fzipi commented Apr 29, 2025

Just out of curiosity, have you tried enabling this and sending a big xml file?

The sentence starts with Just out of curiosity. So yes, just curious.

@airween
Copy link
Member Author

airween commented May 1, 2025

The sentence starts with Just out of curiosity. So yes, just curious.

I saw, but honestly: your question is absolutely legitimate. That's why I'm asking you: what should we expect with what size of XML, and/or how many nodes of XML?

I wrote a small script which helps to generate different payloads in XML and JSON - but with same content. I tried them, and the problem is not in the conversion (JSON to ARGS and XML to ARGS), but in the rules' processing. With the same structure in XML and JSON, I got the same ARGS list in the engine, but above 1000 of built ARGS performance deteriorates sharply, and rules which check ARGS and ARGS_NAMES will be very slow. Now I want to add some other features to that script, eg. generate x-www-form-urlencoded payload, and try that too.

Which is clearly visible: the critical value is not the size of the file, but number of nodes (both in XML and in JSON).

And of course: without any rules there is no any performance issue even with a huge number of node JSON/XML file - indeed the aim is not to forget the rules, but that's also a good question with which rules should we test?

Co-authored-by: Max Leske <[email protected]>
@theseion
Copy link
Collaborator

theseion commented May 2, 2025

I saw, but honestly: your question is absolutely legitimate. That's why I'm asking you: what should we expect with what size of XML, and/or how many nodes of XML?

Unfortunately, this depends heavily on the installation. With SOAP, for example, you'll probably process 50-100 nodes per request, with some requests sending much larger XMLs. I don't know whether we can do much more than document the performance given different numbers of nodes and let users decide how to configure the engine.

And of course: without any rules there is no any performance issue even with a huge number of node JSON/XML file - indeed the aim is not to forget the rules, but that's also a good question with which rules should we test?

You could use rules with @unconditionalMatch and multiMatch, one each for ARGS, ARGS_NAMES, and XML. This should give you worst case performance for each target.

@airween airween requested a review from theseion May 2, 2025 21:10
@airween
Copy link
Member Author

airween commented May 2, 2025

I added a new commit:

  • a3876e3 - there is a small change in the node value's parsing: if the node value contains a multi-byte character the SAX calls the function multiple times, so we need to concatenate those sub-strings, not creating a new string every time

@airween airween merged commit 220caa5 into owasp-modsecurity:v3/master May 4, 2025
50 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.x Related to ModSecurity version 3.x
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants