Fix empty and multiple values headers #413
Closed
@devl00p, @bretfourbe, @fwininger
While testing the wapp module, which uses Wappalyzer, I realized that the module messes up some header configurations.
First, when the regex is empty in our JSON file, we replace it with its attribute, as you can see here
But then we compare this value (the copied attribute) with the header's value from the response instead of simply checking that the header is present, so the technology is never reported.
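To make the intended behavior concrete, here is a minimal sketch of a presence check for an empty regex. The function name `header_matches` and the dictionary-of-lists shape are assumptions made for illustration; this is not the module's actual code.

```python
import re


def header_matches(headers: dict[str, list[str]], name: str, pattern: str) -> bool:
    """Hypothetical helper: True if `name` is present and, when a pattern
    is given, at least one of its values matches that pattern."""
    values = headers.get(name.lower())
    if values is None:
        # The header is absent from the response: nothing to detect.
        return False
    if not pattern:
        # Empty regex in the JSON database: presence alone is enough.
        return True
    return any(re.search(pattern, value) for value in values)


# Assumed shape: lowercase header names mapped to the list of their values.
response_headers = {"x-powered-by": ["PHP/8.1"]}
```

With this shape, `header_matches(response_headers, "X-Powered-By", "")` succeeds on presence alone, while a non-empty pattern still has to match one of the values.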
The second bug I found concerns headers that have multiple values for the same attribute. Let's say a response has 2 Server headers (which will most likely never happen for this attribute, but whatever):
The code used to concatenate the values of each attribute into a single string in a dictionary, like this:
This makes some regexes fail. Suppose we can detect a technology from its Server attribute "that" with the regex "^that$": it will not match, because the regex expects the string to start and end with this word. As this behavior comes from HTTPX, you can find more information here.
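The failure can be reproduced with plain strings. This is only an illustration: the two values "this" and "that" are made up, and the ", " separator is assumed to mirror how the values end up joined.

```python
import re

# Two hypothetical values received for the same header name.
values = ["this", "that"]

# Joined into a single string, the anchored regex no longer matches.
joined = ", ".join(values)
match_joined = re.search(r"^that$", joined)

# Checked one value at a time, the same regex matches the second value.
match_separate = any(re.search(r"^that$", value) for value in values)
```

Here `match_joined` is `None` while `match_separate` is `True`, which is the difference between the old and the new behavior.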
Instead of using a split method (because we can't assume that a header value will never contain ", " itself), I safely rebuilt the dictionary from the httpx.Headers object, so now we have a dictionary that looks more like this:
And the regex is searched against each element.
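A rough sketch of the regrouping step, assuming the input is a list of (name, value) tuples in the format returned by multi_items() on httpx.Headers. The helper name `group_header_values` is hypothetical, and the sample tuples are invented so the sketch runs without httpx.

```python
from collections import defaultdict


def group_header_values(items: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Hypothetical helper: group (name, value) pairs into a dict of lists,
    keeping every value separate instead of joining them into one string."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for name, value in items:
        # Header names are case-insensitive, so normalize them to lowercase.
        grouped[name.lower()].append(value)
    return dict(grouped)


# Sample pairs standing in for the output of multi_items().
sample_items = [("Server", "this"), ("server", "that"), ("X-Frame-Options", "DENY")]
grouped_headers = group_header_values(sample_items)
```

Because no string is ever split, a value that itself contains a comma stays intact as a single list element.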
I also found a third bug: the keys provided by the JSON database are in lowercase, while the keys of the cookie dictionary built from the httpx.Response are in uppercase. I lowercased all the keys so that all the technos can be detected.
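The key normalization could look like the sketch below. The helper name `lowercase_keys` and the sample cookie are assumptions, and a plain dict stands in for the cookies taken from the response.

```python
def lowercase_keys(cookies: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: return a copy of `cookies` with lowercase names.
    If two names differ only by case, the later one wins."""
    return {name.lower(): value for name, value in cookies.items()}


# Invented cookie names, as they might come back from a response.
response_cookies = {"PHPSESSID": "abc123", "JSESSIONID": "xyz"}
normalized_cookies = lowercase_keys(response_cookies)
```

A lookup with a lowercase database key such as "phpsessid" then finds the cookie.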
Finally, I've written some more unit tests for those cases to check that everything runs smoothly.
EDIT 1: I found out that
is a bad way to check the unit tests, since we could put technos in the expected values that are not in the persister, so I patched them to use sets instead, which guarantees the existence and uniqueness of each expected string:
It turned out that more bugs were involved than just the headers. I'll keep working on them, even though the name of the MR is losing its meaning as this patch grows bigger.
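To illustrate why comparing sets is stricter, here is a small sketch with made-up technology names. The weak check shown is only one possible example of a check that tolerates extra expected values; it is not necessarily the one the tests used before.

```python
# Hypothetical names read back from the persister after a scan.
detected = ["Joomla", "PHP"]

# Expected values containing a techno that was never detected.
expected = ["Joomla", "PHP", "Drupal"]

# A one-directional check passes although "Drupal" is missing.
weak_check = all(name in expected for name in detected)

# Comparing sets requires both sides to contain exactly the same names.
strict_check = set(detected) == set(expected)
```

Here `weak_check` is `True` and `strict_check` is `False`, so only the set comparison catches the stray expected value.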
EDIT 2:
In the unit test file, for the DOM, a techno (Joomla) implied by another one (Sellacious) was missing, so I added it. Some attributes were also set to id whereas they have to be set to class according to the JSON.
I encountered the same issue with the meta tags that I had with the headers (since they are stored in a dictionary, all tags with the same name are squashed into one regardless of their content). This made me add a new method to the HTML parser that provides a list of (name, content) tuples, so they all stay distinct (and use the same format as multi_items() from httpx.Headers), and I reparsed them into a dictionary of lists in the Wappalyzer class, just like the headers.
All the unit tests are now green, and I haven't found any more bugs so far.
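The meta tag handling can be sketched the same way. The class `MetaCollector`, the function `group_metas`, and the sample HTML are hypothetical, and the standard library parser is used here only as a stand-in for the project's own HTML parser.

```python
from collections import defaultdict
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical collector: keep every <meta name=... content=...> tag
    as a (name, content) tuple so duplicate names are not overwritten."""

    def __init__(self) -> None:
        super().__init__()
        self.metas: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name is not None and content is not None:
            self.metas.append((name.lower(), content))


def group_metas(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group (name, content) tuples into a dictionary of lists."""
    grouped: defaultdict[str, list[str]] = defaultdict(list)
    for name, content in pairs:
        grouped[name].append(content)
    return dict(grouped)


# Invented page with two meta tags sharing the same name.
SAMPLE_HTML = (
    '<html><head><meta name="generator" content="Joomla">'
    '<meta name="generator" content="Sellacious"></head></html>'
)
collector = MetaCollector()
collector.feed(SAMPLE_HTML)
grouped_metas = group_metas(collector.metas)
```

Both generator values survive in `grouped_metas`, so a regex can be tried against each of them in turn.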