Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Go: template/text.Template execution methods: support reading arbitrary content #17701

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

smowton
Copy link
Contributor

@smowton smowton commented Oct 8, 2024

Add a mechanism for specifying dataflow nodes that ought read taint through any access path, and use it to make template/text functions read taint out of fields.

Note this PR currently has #18355 applied because it appears to significantly improve performance for some outlier repositories. This shouldn't be merged until that PR has been dealt with.

@smowton smowton requested a review from a team as a code owner October 8, 2024 16:18
Comment on lines 159 to 160
or
defaultImplicitTaintReadGlobal(node, c)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs to be done in the corresponding file in tainttracking2.

Comment on lines 81 to 82
any(ImplicitFieldReadNode ifrn).shouldImplicitlyReadAllFields(node) and
getElementTypeIncludingFields*(node.getType()) = containerType
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be slightly easier to understand if you replace this with or defaultImplicitTaintReadGlobal(node, c). (In theory you could also remove these lines and the overall functionality would be the same, but I think best to not do that.)

go/ql/lib/semmle/go/frameworks/stdlib/TextTemplate.qll Fixed Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think this new test folder is best in go/ql/test/library-tests/semmle/go/frameworks? All the other folders in there are for tests of particular libraries. What about in go/ql/test/library-tests/semmle/go/dataflow?

@smowton smowton force-pushed the smowton/feature/read-fields-before-executetemplate branch from c0cbb4a to 63ff0cb Compare October 17, 2024 14:47
Copy link
Contributor

github-actions bot commented Oct 17, 2024

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

go

Generated file changes for go

  • Changes to framework-coverage-go.rst:
-    `Standard library <https://pkg.go.dev/std>`_,"````, ``archive/*``, ``bufio``, ``bytes``, ``cmp``, ``compress/*``, ``container/*``, ``context``, ``crypto``, ``crypto/*``, ``database/*``, ``debug/*``, ``embed``, ``encoding``, ``encoding/*``, ``errors``, ``expvar``, ``flag``, ``fmt``, ``go/*``, ``hash``, ``hash/*``, ``html``, ``html/*``, ``image``, ``image/*``, ``index/*``, ``io``, ``io/*``, ``log``, ``log/*``, ``maps``, ``math``, ``math/*``, ``mime``, ``mime/*``, ``net``, ``net/*``, ``os``, ``os/*``, ``path``, ``path/*``, ``plugin``, ``reflect``, ``reflect/*``, ``regexp``, ``regexp/*``, ``slices``, ``sort``, ``strconv``, ``strings``, ``sync``, ``sync/*``, ``syscall``, ``syscall/*``, ``testing``, ``testing/*``, ``text/*``, ``time``, ``time/*``, ``unicode``, ``unicode/*``, ``unsafe``",52,605,104
+    `Standard library <https://pkg.go.dev/std>`_,"````, ``archive/*``, ``bufio``, ``bytes``, ``cmp``, ``compress/*``, ``container/*``, ``context``, ``crypto``, ``crypto/*``, ``database/*``, ``debug/*``, ``embed``, ``encoding``, ``encoding/*``, ``errors``, ``expvar``, ``flag``, ``fmt``, ``go/*``, ``hash``, ``hash/*``, ``html``, ``html/*``, ``image``, ``image/*``, ``index/*``, ``io``, ``io/*``, ``log``, ``log/*``, ``maps``, ``math``, ``math/*``, ``mime``, ``mime/*``, ``net``, ``net/*``, ``os``, ``os/*``, ``path``, ``path/*``, ``plugin``, ``reflect``, ``reflect/*``, ``regexp``, ``regexp/*``, ``slices``, ``sort``, ``strconv``, ``strings``, ``sync``, ``sync/*``, ``syscall``, ``syscall/*``, ``testing``, ``testing/*``, ``text/*``, ``time``, ``time/*``, ``unicode``, ``unicode/*``, ``unsafe``",52,603,104
-    Totals,,459,943,1532
+    Totals,,459,941,1532
  • Changes to framework-coverage-go.csv:
- text/template,,,6,,,,,,,,,,,,,,,,,,,,,,,6,
+ text/template,,,4,,,,,,,,,,,,,,,,,,,,,,,4,

@smowton smowton force-pushed the smowton/feature/read-fields-before-executetemplate branch from fe808f7 to d822d14 Compare December 16, 2024 14:21
@owen-mc
Copy link
Contributor

owen-mc commented Dec 17, 2024

@smowton What is the status of this? Is it waiting for review, or are you still working on it?

@smowton
Copy link
Contributor Author

smowton commented Dec 17, 2024

@owen-mc thanks to my currently blurry work situation I found time to fix it up. Currently running QA; please do review.

@smowton
Copy link
Contributor Author

smowton commented Dec 19, 2024

QA results: new TPs from around 20 repos. Found and fixed an FP based on defining an interface that could be text/template.Template or could be html/template.Template (the latter is sanitizing, and in practice text/template is not used).

Significant slowdowns in two, which so far as I can see are simply caused by significantly more flow being possible when text-template instantiation is able to pull taint down from fields to top level. I propose to go ahead on grounds that the vast majority of repositories suffer no consequences, and that is real flow (though we could iterate to try to find heuristics to prevent excessively-expensive exploration if necessary).

smowton added a commit to smowton/codeql that referenced this pull request Dec 20, 2024
Observations:
* revFlowThrough can be much larger than the other reverse-flow predicates, presumably when there are many different innerReturnAps.
* It is only ever used in conjunction with flowThroughIntoCall, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`.
* `revFlowThroughArg` can then be trivially inlined.

Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial ram available, and presumably falls much more under GHA-style memory pressure.
smowton added a commit to smowton/codeql that referenced this pull request Dec 20, 2024
Observations:
* revFlowThrough can be much larger than the other reverse-flow predicates, presumably when there are many different innerReturnAps.
* It is only ever used in conjunction with flowThroughIntoCall, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`.
* `revFlowThroughArg` can then be trivially inlined.

Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial ram available, and presumably falls much more under GHA-style memory pressure.
Copy link
Contributor

@owen-mc owen-mc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking in good shape. I assume that we should wait for #18355 to be merged so that we can avoid the two significant slowdowns you found?

Otherwise it's too easy to define a common interface to both text/template, which doesn't sanitize, and html/template, which does.
Observations:
* revFlowThrough can be much larger than the other reverse-flow predicates, presumably when there are many different innerReturnAps.
* It is only ever used in conjunction with flowThroughIntoCall, which can therefore be pushed in, and several of its parameters can thereby be dropped in exchange for exposing `arg`.
* `revFlowThroughArg` can then be trivially inlined.

Result: on repository `go-gitea/gitea` with PR github#17701 producing a wider selection of access paths than are seen on `main`, `revFlowThrough` drops in size from ~120m tuples to ~4m, and the runtime of the reverse-flow computation for dataflow stage 4 goes from dominating the forward-flow cost to relatively insignificant. Overall runtime falls from 3 minutes to 2 with substantial ram available, and presumably falls much more under GHA-style memory pressure.
@smowton smowton force-pushed the smowton/feature/read-fields-before-executetemplate branch from 0988340 to 2580ea2 Compare January 10, 2025 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants