Prevent addition of duplicates in process_entity_whitelist #144

Open
boolean5 opened this issue Jul 1, 2020 · 0 comments

boolean5 commented Jul 1, 2020

In `process_entity_whitelist`, the `urls` set is updated but never consulted, so duplicate entries can be added to the list. Shouldn't we check whether a URL is already in this set before adding it?
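
A minimal sketch of the check being proposed, not the code in `lists2safebrowsing.py`: the helper name `add_unique` and the sample URLs are hypothetical, and `output` stands in for whatever list the function appends to.

```python
def add_unique(url, urls, output):
    """Append url to output only if it is not already in the urls set."""
    if url in urls:
        return False  # duplicate: leave both containers untouched
    urls.add(url)
    output.append(url)
    return True


urls = set()
output = []
candidates = [
    "www.example.com/?resource=a.example",
    "www.example.com/?resource=a.example",  # repeated on purpose
    "www.example.com/?resource=b.example",
]
for candidate in candidates:
    add_unique(candidate, urls, output)
```

With the membership test in place, the repeated candidate is skipped and `output` keeps first-seen order.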

I also noticed that the `if prop == res` check (https://github.com/mozilla-services/shavar-list-creation/blob/master/lists2safebrowsing.py#L348) runs before the domains corresponding to `prop` and `res` are canonicalized. As a result, if `prop` is `www.example...com///` and `res` is `www.example.com`, the two compare as different and `www.example.com/?resource=www.example.com` is added to the list. It seems to me that this defeats the purpose of the `if prop == res` check.

Another observation: canonicalization is not applied to query parameters. If `prop` is `www.example.com` and `res` is `www.example...com///`, the URL added to the list is `www.example.com/?resource=www.example...com///`. Is that intentional?
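
To make the two canonicalization observations concrete, here is a rough sketch of one possible ordering: canonicalize both values first, compare afterwards, and build the entry from the canonical forms. It assumes the unescaped query parameter is not intentional. `toy_canonicalize` is a hypothetical stand-in that only collapses repeated dots and strips trailing slashes; it is not the project's canonicalization routine, and `whitelist_entry` is likewise an illustrative name.

```python
import re


def toy_canonicalize(domain):
    """Hypothetical stand-in for the project's canonicalization step."""
    domain = domain.strip().lower().rstrip("/")
    # Collapse runs of dots, e.g. "example...com" -> "example.com".
    return re.sub(r"\.{2,}", ".", domain)


def whitelist_entry(prop, res):
    """Return the entry for (prop, res), or None if both are the same domain."""
    prop_c = toy_canonicalize(prop)
    res_c = toy_canonicalize(res)
    if prop_c == res_c:
        return None  # comparison happens on canonical forms
    return "{}/?resource={}".format(prop_c, res_c)


same = whitelist_entry("www.example...com///", "www.example.com")
messy = whitelist_entry("www.example.com", "tracker.example...net///")
```

Under these assumptions `same` is `None`, so the self-referencing entry is never produced, and `messy` carries the canonical resource in its query parameter. The pair from the last paragraph (`www.example.com` and `www.example...com///`) would also be skipped, since both reduce to the same domain.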

boolean5 self-assigned this Jul 17, 2020