-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Several standard library models #17454
Conversation
- empty models for now - `summaryModel` of `codeql/python-all` will be added to shortly.
- `quote` together with `re.compile` recover regex injection alerts on haiwen/seahub - `quote_plus` recovers the URL redirection alert on DemocracyClub/EveryElection - `unquote` recovers path injection alerts on `cloudera/hue` - it was tedious finding justifications for the rest..
There is already a model there so we add to that one. We did observe that this existing model was blocked by the external MaD model. This is concerning and needs to be cleared up.
Two of the generated summaries have been excluded: - ["re", "Member[split]", "Argument[0,pattern:]", "ReturnValue", "taint"] From the documentation, it is not clear why pattern should figure in the return value, as that is the part denoting split point and thus all those instances are filtered out. From the implementation Spit function: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L199 _compile function being called by split: https://github.com/python/cpython/blob/3.12/Lib/re/__init__.py#L280 We see that in case the pattern is already a compiled `Pattern`, it is returned directly from _compile and could thus be part of the return value from split. This is probably not possible to arrange for an attacker, and so an FP in practice. - ["urllib2", "Member[unquote]", "Argument[0,string:]", "ReturnValue", "taint"] urllib2 seems to be only in Python2 (e.g. https://docs.python.org/2.7/library/urllib2.html) and I cannot locate the function unquote.
Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
It is not clear from the code how this could happen and I do not remember the path I saw, perhaps it was unreasonable.
It would be nice to simplify to a single sequence content type..
MaD row numbers in provenance column
This is MaD...
e0cec32
to
3434c38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I already approved all these changes as part of other chains of PRs, right?
In that case, I only have one nitpick: With this PR we have two files for modeling stdlib with MaD (https://github.com/github/codeql/blob/main/python/ql/lib/semmle/python/frameworks/Stdlib.model.yml) so we should consolidate to only using one location. Personally I feel the obvious place to put modeling of a new library would be ql/lib/semmle/python/frameworks/<package>.model.yml
(such as the one for Asyncpg). So unless you strongly object, would you mind moving the new models to python/ql/lib/semmle/python/frameworks/Stdlib.model.yml
? 🙏
Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
Nice. I had totally missed that we had started using MaD for StdLib. In that case I will put the new models there too. I will also adjust #17565. |
Co-authored-by: Rasmus Wriedt Larsen <[email protected]>
This PR is merging the long-lived branch containing the modelling work to stop extracting the standard library by default.
Pull Request checklist
All query authors
Internal query authors only
.ql
,.qll
, or.qhelp
files. See the documentation (internal access required).