Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Add Gradio models #16135

Merged
merged 13 commits into from
May 16, 2024
1 change: 1 addition & 0 deletions docs/codeql/reusables/supported-frameworks.rst
Original file line number Diff line number Diff line change
Expand Up @@ -202,6 +202,7 @@ and the CodeQL library pack ``codeql/python-all`` (`changelog <https://github.co
Flask-Admin, Web framework
Tornado, Web framework
Twisted, Web framework
Gradio, Web framework
starlette, Asynchronous Server Gateway Interface (ASGI)
ldap3, Lightweight Directory Access Protocol (LDAP)
python-ldap, Lightweight Directory Access Protocol (LDAP)
Expand Down
1 change: 1 addition & 0 deletions python/ql/lib/semmle/python/Frameworks.qll
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ private import semmle.python.frameworks.FastApi
private import semmle.python.frameworks.Flask
private import semmle.python.frameworks.FlaskAdmin
private import semmle.python.frameworks.FlaskSqlAlchemy
private import semmle.python.frameworks.Gradio
private import semmle.python.frameworks.Httpx
private import semmle.python.frameworks.Idna
private import semmle.python.frameworks.Invoke
Expand Down
123 changes: 123 additions & 0 deletions python/ql/lib/semmle/python/frameworks/Gradio.qll
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
/**
* Provides classes modeling security-relevant aspects of the `gradio` PyPI package.
* See https://pypi.org/project/gradio/.
*/

import python
import semmle.python.dataflow.new.RemoteFlowSources
import semmle.python.dataflow.new.TaintTracking
import semmle.python.ApiGraphs

/**
* Provides models for the `gradio` PyPI package.
* See https://pypi.org/project/gradio/.
*/
module Gradio {
/**
* The event handlers, Interface and gradio.ChatInterface classes, which take untrusted data.
*/
class GradioInput extends API::CallNode {
sylwia-budzynska marked this conversation as resolved.
Show resolved Hide resolved
GradioInput() {
this =
API::moduleImport("gradio")
.getMember([
"Button", "Textbox", "UploadButton", "Slider", "JSON", "HTML", "Markdown", "File",
"AnnotatedImage", "Audio", "BarPlot", "Chatbot", "Checkbox", "CheckboxGroup",
"ClearButton", "Code", "ColorPicker", "Dataframe", "Dataset", "DownloadButton",
"Dropdown", "DuplicateButton", "FileExplorer", "Gallery", "HighlightedText",
"Image", "ImageEditor", "Label", "LinePlot", "LoginButton", "LogoutButton",
"Model3D", "Number", "ParamViewer", "Plot", "Radio", "ScatterPlot", "SimpleImage",
Comment on lines +24 to +29
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these vulnerable? I quickly checked a few of them and it looks like:

  • HighlightedText only allows the user to specify substrings of a given text
  • ColorPicker only passes on hex strings
  • AnnotatedImage does not accept user input at all

We should probably only include actually vulnerable inputs here in order to avoid the query getting noisy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s the event listeners that take untrusted user input, so, for example, gradio.AnnotatedImage.select(fn, inputs, outputs) (see docs) and that's what we model here.

To clarify - with the above models, we look for any event listeners of the listed classes. For example, gr.Button has only one, click() (see this example, which asks for a user's name and displays it). Side note that if an event listener does not exist for a given class, CodeQL doesn't (can't) match it (for example there is no gr.Button.select()). I found this way of modeling to be the most succinct and it doesn't slow performance, but I'm open to feedback around it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fine to model situations that will never match, I was more worried about false positives. But on reflection, I think this is fine to start; we can refine it if we actually encounter any problems..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already found a few vulnerabilities using these models, including in stable-diffusion-webui (120k stars):
https://securitylab.github.com/advisories/GHSL-2024-010_stable-diffusion-webui/
https://securitylab.github.com/advisories/GHSL-2024-019_GHSL-2024-024_kohya_ss/
To be fair, I tested the AnnotatedImage.select() event listener (and several other event listeners on the list), and it does take user input, but it's not very popular to use, so I haven't actually seen vulnerabilities around it. Most of the vulnerabilities happen with Button.click() (including the 7 vulns I linked to above). I think there shouldn't be many false positives here, but I'll be happy to help if anything needs refining.

"State", "Video"
])
.getReturn()
.getMember([
"change", "input", "click", "submit", "edit", "clear", "play", "pause", "stop",
"end", "start_recording", "pause_recording", "stop_recording", "focus", "blur",
"upload", "release", "select", "stream", "like", "load", "key_up",
])
.getACall()
or
this = API::moduleImport("gradio").getMember(["Interface", "ChatInterface"]).getACall()
}
}

/**
* The `inputs` parameters in Gradio event handlers, that are lists and are sources of untrusted data.
* This model allows tracking each element list back to source, f.ex. `gr.Textbox(...)`.
*/
class GradioInputList extends RemoteFlowSource::Range {
sylwia-budzynska marked this conversation as resolved.
Show resolved Hide resolved
GradioInputList() {
exists(GradioInput call |
// limit only to lists of parameters given to `inputs`.
(
(
call.getKeywordParameter("inputs").asSink().asCfgNode() instanceof ListNode
or
call.getParameter(1).asSink().asCfgNode() instanceof ListNode
) and
(
this = call.getKeywordParameter("inputs").getASubscript().getAValueReachingSink()
or
this = call.getParameter(1).getASubscript().getAValueReachingSink()
)
)
)
}

override string getSourceType() { result = "Gradio untrusted input" }
}

/**
* The `inputs` parameters in Gradio event handlers, that are not lists and are sources of untrusted data.
*/
class GradioInputParameter extends RemoteFlowSource::Range {
sylwia-budzynska marked this conversation as resolved.
Show resolved Hide resolved
GradioInputParameter() {
exists(GradioInput call |
this = call.getParameter(0, "fn").getParameter(_).asSource() and
// exclude lists of parameters given to `inputs`
not call.getKeywordParameter("inputs").asSink().asCfgNode() instanceof ListNode and
not call.getParameter(1).asSink().asCfgNode() instanceof ListNode
)
}

override string getSourceType() { result = "Gradio untrusted input" }
}

/**
* The `inputs` parameters in Gradio decorators to event handlers, that are sources of untrusted data.
*/
class GradioInputDecorator extends RemoteFlowSource::Range {
sylwia-budzynska marked this conversation as resolved.
Show resolved Hide resolved
GradioInputDecorator() {
exists(GradioInput call |
this = call.getReturn().getACall().getParameter(0).getParameter(_).asSource()
)
}

override string getSourceType() { result = "Gradio untrusted input" }
}

/**
* Extra taint propagation for tracking `inputs` parameters in Gradio event handlers, that are lists.
*/
private class ListTaintStep extends TaintTracking::AdditionalTaintStep {
override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
exists(GradioInput node |
// handle cases where there are multiple arguments passed as a list to `inputs`
(
(
node.getKeywordParameter("inputs").asSink().asCfgNode() instanceof ListNode
or
node.getParameter(1).asSink().asCfgNode() instanceof ListNode
) and
exists(int i | nodeTo = node.getParameter(0, "fn").getParameter(i).asSource() |
nodeFrom.asCfgNode() =
node.getKeywordParameter("inputs").asSink().asCfgNode().(ListNode).getElement(i)
or
nodeFrom.asCfgNode() =
node.getParameter(1).asSink().asCfgNode().(ListNode).getElement(i)
)
)
)
}
}
}
4 changes: 4 additions & 0 deletions python/ql/src/change-notes/2024-04-05-gradio-models.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
---
category: minorAnalysis
---
* Added models of `gradio` PyPI package.
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
testFailures
failures
33 changes: 33 additions & 0 deletions python/ql/test/library-tests/frameworks/gradio/source_test.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,33 @@
import gradio as gr


with gr.Blocks() as demo:
name = gr.Textbox(label="Name")
output = gr.Textbox(label="Output Box")
# static block - not used as a source
static_block = gr.HTML("""
<div style='height: 100px; width: 800px; background-color: pink;'></div>
""")
greet_btn = gr.Button("Hello")

# decorator
@greet_btn.click(inputs=name, outputs=output)
def greet(name): # $ source=name
return "Hello " + name + "!"

# `click` event handler with keyword arguments
def greet1(name): # $ source=name
return "Hello " + name + "!"

greet1_btn = gr.Button("Hello")
greet1_btn.click(fn=greet1, inputs=name, outputs=output, api_name="greet")

# `click` event handler with positional arguments
def greet2(name): # $ source=name
return "Hello " + name + "!"

greet2_btn = gr.Button("Hello")
greet2_btn.click(fn=greet2, inputs=name, outputs=output, api_name="greet")


demo.launch()
20 changes: 20 additions & 0 deletions python/ql/test/library-tests/frameworks/gradio/source_test.ql
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
import python
import semmle.python.dataflow.new.RemoteFlowSources
import TestUtilities.InlineExpectationsTest
private import semmle.python.dataflow.new.internal.PrintNode

module SourceTest implements TestSig {
string getARelevantTag() { result = "source" }

predicate hasActualResult(Location location, string element, string tag, string value) {
exists(location.getFile().getRelativePath()) and
exists(RemoteFlowSource rfs |
location = rfs.getLocation() and
element = rfs.toString() and
value = prettyNode(rfs) and
tag = "source"
)
}
}

import MakeTest<SourceTest>
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
edges
| taint_step_test.py:5:5:5:8 | ControlFlowNode for path | taint_step_test.py:19:43:19:46 | ControlFlowNode for path | provenance | |
| taint_step_test.py:5:12:5:35 | ControlFlowNode for Attribute() | taint_step_test.py:5:5:5:8 | ControlFlowNode for path | provenance | |
| taint_step_test.py:6:5:6:8 | ControlFlowNode for file | taint_step_test.py:19:48:19:51 | ControlFlowNode for file | provenance | |
| taint_step_test.py:6:12:6:35 | ControlFlowNode for Attribute() | taint_step_test.py:6:5:6:8 | ControlFlowNode for file | provenance | |
| taint_step_test.py:11:18:11:21 | ControlFlowNode for path | taint_step_test.py:12:9:12:16 | ControlFlowNode for filepath | provenance | |
| taint_step_test.py:11:18:11:21 | ControlFlowNode for path | taint_step_test.py:12:9:12:16 | ControlFlowNode for filepath | provenance | AdditionalTaintStep |
| taint_step_test.py:11:24:11:27 | ControlFlowNode for file | taint_step_test.py:12:9:12:16 | ControlFlowNode for filepath | provenance | AdditionalTaintStep |
| taint_step_test.py:12:9:12:16 | ControlFlowNode for filepath | taint_step_test.py:13:19:13:26 | ControlFlowNode for filepath | provenance | |
| taint_step_test.py:19:43:19:46 | ControlFlowNode for path | taint_step_test.py:11:18:11:21 | ControlFlowNode for path | provenance | AdditionalTaintStep |
| taint_step_test.py:19:48:19:51 | ControlFlowNode for file | taint_step_test.py:11:24:11:27 | ControlFlowNode for file | provenance | AdditionalTaintStep |
nodes
| taint_step_test.py:5:5:5:8 | ControlFlowNode for path | semmle.label | ControlFlowNode for path |
| taint_step_test.py:5:12:5:35 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| taint_step_test.py:6:5:6:8 | ControlFlowNode for file | semmle.label | ControlFlowNode for file |
| taint_step_test.py:6:12:6:35 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| taint_step_test.py:11:18:11:21 | ControlFlowNode for path | semmle.label | ControlFlowNode for path |
| taint_step_test.py:11:24:11:27 | ControlFlowNode for file | semmle.label | ControlFlowNode for file |
| taint_step_test.py:12:9:12:16 | ControlFlowNode for filepath | semmle.label | ControlFlowNode for filepath |
| taint_step_test.py:13:19:13:26 | ControlFlowNode for filepath | semmle.label | ControlFlowNode for filepath |
| taint_step_test.py:19:43:19:46 | ControlFlowNode for path | semmle.label | ControlFlowNode for path |
| taint_step_test.py:19:48:19:51 | ControlFlowNode for file | semmle.label | ControlFlowNode for file |
subpaths
#select
| taint_step_test.py:13:19:13:26 | ControlFlowNode for filepath | taint_step_test.py:5:12:5:35 | ControlFlowNode for Attribute() | taint_step_test.py:13:19:13:26 | ControlFlowNode for filepath | This path depends on a $@. | taint_step_test.py:5:12:5:35 | ControlFlowNode for Attribute() | user-provided value |
| taint_step_test.py:13:19:13:26 | ControlFlowNode for filepath | taint_step_test.py:6:12:6:35 | ControlFlowNode for Attribute() | taint_step_test.py:13:19:13:26 | ControlFlowNode for filepath | This path depends on a $@. | taint_step_test.py:6:12:6:35 | ControlFlowNode for Attribute() | user-provided value |
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import gradio as gr
import os

with gr.Blocks() as demo:
path = gr.Textbox(label="Path") # $ source=gr.Textbox(..)
file = gr.Textbox(label="File") # $ source=gr.Textbox(..)
output = gr.Textbox(label="Output Box")


# path injection sink
def fileread(path, file):
filepath = os.path.join(path, file)
with open(filepath, "r") as f:
return f.read()


# `click` event handler with `inputs` containing a list
greet1_btn = gr.Button("Path for the file to display")
greet1_btn.click(fn=fileread, inputs=[path,file], outputs=output, api_name="fileread")


demo.launch()
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Security/CWE-022/PathInjection.ql
Loading