Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix empty and multiple values headers #413

Closed
wants to merge 1 commit into from

Conversation

RMI78
Copy link
Contributor

@RMI78 RMI78 commented Jun 16, 2023

@devl00p, @bretfourbe, @fwininger
While testing the wapp module that use wappalyzer, I realized the module mess up with some headers configurations

Firstly, when the regex is empty in our JSON file, we replace it with its attribute as you can see here
But then we compare this value (the copied attribute) with its value from the response instead of checking if it is simply here, which will not raise it.

The second bug I found was with headers that have multiple values for the same attribute. Let's admit a response has 2 headers Server (which will most likely never happens for this attribute but whatever):

Server : this 
Server : that

The code use to concatenate each value from attributes in a dictionary with a single string that way:

{
   "Server": "this, that"
}

Which will make some regexes fail: admitting we can detect a technology thanks to its server attribute "that" by its regex, which is "^that$". It will not work as the regex is expecting the string to start and end with this word. As this behavior comes from HTTPX, you can find more information here.

Instead of using a split method (because we can't take for granted the fact that all the attributes will never have a " ," as values inside of them) I safely reparsed the dictionary from the httpx.Header object, so now we have a dictionary that looks more like this:

{
   "Server": ["this", "that"]
}

And the regex search for each element.

I also found a third bug in which the keys provided by the JSON database are set in lowercases, meanwhile the keys of the dictionary made by the httpx.Response for the cookies are in uppercase. I lowered down all the keys, so it can detect all the technos

Finally, I've written some more unit tests for those cases to check that everything runs smoothly.

EDIT 1: I found out that

 expected_result = [
            '{"name": "Astra Widgets", "versions": ["1.5.4"], "categories": ["WordPress plugins", "Widgets"], "groups": ["Add-ons", "Other"]}',
            '{"name": "PHP", "versions": [], "categories": ["Programming languages"], "groups": ["Web development"]}'

        ]
for arg in persister.add_payload.call_args_list:
            assert arg[1]["info"] in expected_result

Is a bad way to check the units tests since we can put technos in expected values that couldn't be in the persister, so I patched them using sets instead which guarantees the existence and uniqueness of each expected strings:

 expected_result = [
            '{"name": "Astra Widgets", "versions": ["1.5.4"], "categories": ["WordPress plugins", "Widgets"], "groups": ["Add-ons", "Other"]}',
            '{"name": "PHP", "versions": [], "categories": ["Programming languages"], "groups": ["Web development"]}'

        ]
assert set([arg[1]['info'] for arg in persister.add_payload.call_args_list]) == expected_result

It turned out that there were more bugs involved than simply the headers, I'll keep working on them even tho the name of the MR is loosing its meaning as this patch is growing bigger

EDIT 2:
In the unit test file, for the DOM, a techno (Joomla) implied by another one (Sellacious) was missing, so I added it and some attributes were set to id whereas they have to be set to class according to the JSON.
I encountered the same issue with the meta tags that I've experienced with the headers tags (the fact that as they are stored in a dictionary, every tag with the same name will be squashed into one regardless of their content). This made me add a new method in the HTML parser to allow a list of tuples (name, content) so they are all unique (and use the same format as multi_items() from httpx.Headers), and I reparsed them as a dictionary of lists in the Wappalyzer class just like the headers

All the unit tests are now green, and I haven't found anymore bugs so far

@RMI78 RMI78 force-pushed the wappalyzer_fix branch 7 times, most recently from 5ae8739 to 423abee Compare June 20, 2023 08:45
@RMI78 RMI78 force-pushed the wappalyzer_fix branch 2 times, most recently from c4346c6 to 1775ffe Compare June 26, 2023 13:44
@codecov-commenter
Copy link

codecov-commenter commented Jun 26, 2023

Codecov Report

Merging #413 (c4346c6) into master (9c91017) will increase coverage by 0.09%.
The diff coverage is 100.00%.

❗ Current head c4346c6 differs from pull request most recent head fdf333d. Consider uploading reports for the commit fdf333d to get more accurate results

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master     #413      +/-   ##
==========================================
+ Coverage   76.25%   76.34%   +0.09%     
==========================================
  Files          94       94              
  Lines        8531     8548      +17     
==========================================
+ Hits         6505     6526      +21     
+ Misses       2026     2022       -4     
Impacted Files Coverage Δ
wapitiCore/parsers/html_parser.py 92.52% <100.00%> (+0.19%) ⬆️
wapitiCore/wappalyzer/wappalyzer.py 95.28% <100.00%> (+1.47%) ⬆️

... and 1 file with indirect coverage changes

@RMI78 RMI78 marked this pull request as draft July 3, 2023 13:01
@RMI78
Copy link
Contributor Author

RMI78 commented Jul 3, 2023

As this MR modifies the code a lot, it requires being split into multiple smaller MR with further comments if needed.

@Slokilla
Copy link
Collaborator

As you splitted this merge request, I think you can close it.

#434
#441

@RMI78 RMI78 closed this Aug 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants