Detect if a string was filtered #571

hrishikesh-k · 2022-09-09T05:42:20Z

My question is somewhat related to: #292

Question or comment

TL;DR: Detect if string was modified after filtering or get the filtered content.

I'm trying to find a way to check whether sanitize-html actually filtered anything out of a given string. For example, I'm also using the library to filter my query params. If the query params contain the expected values, like numbers or short strings (which is what my front-end sends, though people can always use tools like curl), everything should work fine. But if someone sends a malicious query param value, I want to catch it with this library: if the library filters anything out, I wish to reject that API call instead of proceeding with the filtered value, as it would most likely be invalid or unexpected.

For example:

https://www.example.com/api/?param1=12345

should work fine as even if I pass it through sanitize-html, it will remain unchanged.

However, if someone sends a request like:

https://www.example.com/api/?param1=some-malicious-string

and sanitize-html filters something from it, I wish to stop my API from processing further.

I have considered comparing the original string with the sanitized string like:

const sanitizeHtml = require('sanitize-html')

const original = request.query['param1']
const sanitized = sanitizeHtml(original)
if (original === sanitized) {
  // unchanged: process the request
} else {
  // reject: sanitize-html filtered something
}

But I was wondering if there's any better way to do this instead of having to filter multiple params like this.

Also, when I use it with my message body, I cannot rely on this comparison, as I would expect some attributes to be stripped out. I'm using WindiCSS with Attributify mode: https://windicss.org/features/attributify.html in the TipTap editor. The attributes are only for styling in the frontend, and I do not care about them in the backend, so it's okay for those to be filtered out, which is why I'm not using this library's allow-list for those (but I can if that's the only way).
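
A minimal sketch of how the comparison above could be applied to every query parameter in one pass, rather than one param at a time. `findFilteredParams` is a hypothetical helper, not part of sanitize-html, and the sanitizer is passed in as an argument (in a real app this would presumably be the function exported by sanitize-html). Because the check is a plain string comparison, it may also flag harmless values whose text is merely re-escaped on output, so treat it as a strict "reject anything that changes" gate.

```javascript
// Hypothetical helper (not part of sanitize-html): returns the names of
// query parameters whose value is not a string or changes when sanitized.
// `sanitize` is injected; in a real app pass require('sanitize-html').
function findFilteredParams(query, sanitize) {
  const changed = [];
  for (const [name, value] of Object.entries(query)) {
    // Query parsers may yield arrays for repeated keys; check every value.
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      // Non-string values (e.g. nested objects) are treated as unexpected.
      if (typeof v !== 'string' || sanitize(v) !== v) {
        changed.push(name);
        break; // one offending value is enough to flag this parameter
      }
    }
  }
  return changed;
}

// Usage inside a request handler (framework details are assumptions):
//   const rejected = findFilteredParams(request.query, sanitizeHtml);
//   if (rejected.length > 0) { /* respond with 400 and stop */ }
```

Returning the offending names rather than a boolean makes it possible to report which parameter was rejected.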

I was planning to use DOMPurify and found that they had this option: https://github.com/cure53/DOMPurify#okay-makes-sense-lets-move-on (.removed which showed the filtered content), but I'm having this issue: kkomelin/isomorphic-dompurify#54 (for non-Next.js apps running on AWS Lambda), and thus, need to use something different.

I did not find any relevant info in the docs or in the issues at the moment.

Let me know if this question/request doesn't make sense and I'd be happy to clarify further.

Details:

Version of Node.js:
16

Server Operating System:
Linux

Additional context:
N/A

Screenshots:
N/A


boutell · 2022-09-12T12:19:45Z

There is currently no support for detecting whether sanitize-html made any changes.

Simple string comparison will fail because sanitize-html always formats attributes in a consistent way, e.g. it always uses double quotes, whereas the original might use single quotes or no quotes and still be valid in many cases.

I think tracking whether sanitize-html discarded anything would be a feature worth having, and it would make a good pull request. It could take time to reach a complete feature set there, including support for letting custom transformation functions etc. report that they did something.

One problem with this idea, though, is that if you're OK with some changes and not others, it might not satisfy you, and there's no truly universal way to check for the changes that matter to some developers and not to others.

Closing because the question has an answer, but this doesn't mean it's not a topic of interest.

carcinocron · 2022-09-27T17:45:36Z

Would it theoretically work to run sanitize with no rules to get the "consistent way" for reference, then check it against the result of running sanitize with rules?
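
A rough sketch of that idea, untested against the real library: sanitize once with a permissive baseline to get the normalized form, sanitize again with the real rules, and compare the two outputs. `changedByRules` is a hypothetical name, the sanitizer is injected as an argument, and the sketch assumes that `allowedTags: false` and `allowedAttributes: false` make sanitize-html keep every tag and attribute so that the baseline pass only normalizes. Whether that pass is truly lossless for all input is not confirmed in this thread.

```javascript
// Hypothetical helper (not part of sanitize-html). `sanitize` is injected;
// in a real app pass require('sanitize-html').
function changedByRules(input, sanitize, strictOptions, baselineOptions) {
  // Assumption: with both options set to false, the sanitizer keeps all
  // tags and attributes and only re-serializes the markup consistently.
  const baseline = baselineOptions || {
    allowedTags: false,
    allowedAttributes: false,
  };
  const canonical = sanitize(input, baseline); // normalized, nothing removed
  const strict = sanitize(input, strictOptions); // normalized and filtered
  // Any difference should now come from the rules, not from quoting style.
  return canonical !== strict;
}

// Usage (option values are illustrative):
//   if (changedByRules(body, sanitizeHtml, { allowedTags: ['p', 'a'] })) {
//     /* reject the request */
//   }
```

The optional `baselineOptions` argument is one way to approach the "some changes are OK" case: for instance, a baseline that allows all tags but uses the same attribute rules as the strict pass would ignore stripped attributes and only report removed tags.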

hrishikesh-k added the question label Sep 9, 2022

boutell closed this as completed Sep 12, 2022
