Skip to content

Commit a325909

Browse files
authored
Security scanner for Azure ML Compute Instance (Azure#1755)
* Security scanner for Azure ML Compute Instance detecting malware and vulnerabilities * Apply black formatting * Apply doc template * Address PR feedback * Update doc * Update README.md * Address PR feedback
1 parent f2119b7 commit a325909

13 files changed

+3155
-0
lines changed
+187
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,187 @@
1+
---
2+
page_type: sample
3+
languages:
4+
- bash
5+
- python
6+
products:
7+
- azure-machine-learning
8+
description: Sample setup script to scan Compute Instances for malware and security vulnerabilities
9+
---
10+
11+
# Compute Instance Security Scanner
12+
13+
[![license: MIT](https://img.shields.io/badge/License-MIT-purple.svg)](../../../LICENSE)
14+
15+
A security scanner for Azure ML [Compute Instances](https://learn.microsoft.com/en-us/azure/machine-learning/concept-compute-instance) reporting malware and vulnerabilities in OS and Python packages to [Azure Log Analytics](https://learn.microsoft.com/en-us/azure/azure-monitor/logs/log-analytics-overview). For details on the vulnerability management process for the Azure Machine Learning service, see [Vulnerability Management](https://learn.microsoft.com/azure/machine-learning/concept-vulnerability-management).
16+
17+
## Getting Started
18+
19+
> Prerequisite: an Azure ML workspace with a Compute Instance running and [diagnostic logs](https://learn.microsoft.com/en-us/azure/machine-learning/monitor-azure-machine-learning) streaming to Log Analytics. See further down for alternative setups.
20+
21+
1. **Upload the scanner to Azure ML**:
22+
1. Download [`amlsecscan.py`](amlsecscan.py)
23+
2. Open [Azure ML Studio](https://ml.azure.com/)
24+
3. Go to the Notebooks tab
25+
4. Upload the file into your user folder `/Users/{user_name}` (replacing `{user_name}` with your user alias)
26+
2. **Install the scanner**: open a terminal in Azure ML Notebooks and run `sudo ./amlsecscan.py install`
27+
3. **Run a scan**: in the terminal, run `sudo ./amlsecscan.py scan all` (this takes a few minutes)
28+
29+
## Assessments
30+
31+
The security scanner installs [ClamAV](https://www.clamav.net/) to report malware and [Trivy](https://github.com/aquasecurity/trivy) to report OS and Python vulnerabilities.
32+
33+
Security scans are scheduled via CRON jobs to run either daily around 5AM or 10 minutes after OS startup. A CRON job also emits heartbeats every 10 minutes. Scans have their CPU usage limited to 20% and are deprioritized by running at priority 19.
34+
35+
Trivy is configured to report vulnerabilities of severity either `HIGH` or `CRITICAL` for which a fix is available. The ClamAV realtime scanning is not enabled.
36+
37+
## Telemetry
38+
39+
In Log Analytics, the scanner reports hearbeats to table `AmlSecurityComputeHealth_CL` and assessment results to `AmlSecurityComputeAssessments_CL`.
40+
41+
Examples of Log Analytics [KQL](https://docs.microsoft.com/en-us/azure/data-explorer/kql-quick-reference) queries:
42+
- Recent heartbeats and scan status: `AmlSecurityComputeHealth_CL | top 100 by TimeGenerated desc`
43+
- Recent assessments: `AmlSecurityComputeAssessments_CL | top 100 by TimeGenerated desc`
44+
45+
## Installation
46+
47+
Irrespective of how the scanner is installed, the scanner script must first be copied to the Azure ML workspace:
48+
1. Download [amlsecscan.py](amlsecscan.py)
49+
2. Open [Azure ML Studio](https://ml.azure.com/)
50+
3. Go to the Notebooks tab
51+
4. Upload the file into your user folder `/Users/{user_name}` (replacing `{user_name}` with your user alias)
52+
53+
The scanner can be installed on both existing and new Compute Instances.
54+
55+
### Existing Compute Instances
56+
57+
To install the scanner using default settings, run the `install` command without parameters:
58+
```bash
59+
sudo ./amlsecscan.py install
60+
```
61+
The scanner will use the first Log Analytics workspace to which the Azure ML workspace streams [diagnostic logs](https://learn.microsoft.com/en-us/azure/machine-learning/monitor-azure-machine-learning).
62+
63+
If diagnostic logs are not enabled or an alternative Log Analytics workspace needs to be used, its ARM Resource ID can be specified on the command line using the `-la` parameter:
64+
```bash
65+
sudo ./amlsecscan.py install -la /subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.OperationalInsights/workspaces/{workspace_name}
66+
```
67+
68+
Another option, in case this configuration is reused multiple times, is to store the ARM Resource ID of the Log Analytics workspace in a JSON configuration file called
69+
`amlsecscan.json` in the same folder as `amlsecscan.py`:
70+
```json
71+
{
72+
"logAnalyticsResourceId": "/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}/providers/Microsoft.OperationalInsights/workspaces/{workspace_name}"
73+
}
74+
```
75+
76+
> The ARM Resource ID of a Log Analytics workspace can be obtained by opening the [Azure Portal](https://portal.azure.com), navigating to the Log Analytics workspace, and copying this substring from the browser URL.
77+
78+
### New Compute Instances
79+
80+
#### Using Azure ML Studio
81+
82+
1. Create a file called `amlsecscan.sh` with content `sudo python3 amlsecscan.py install` .
83+
2. Open the [Compute Instance list](https://ml.azure.com/compute/list) in [Azure ML Studio](https://ml.azure.com)
84+
3. Click on the `+ New` button
85+
4. In the pop-up, select the machine name and size then click `Next: Advanced Settings`
86+
5. Toggle `Provision with setup script`, select `Local file`, and pick `amlsecscan.sh`
87+
6. Click on the `Create` button
88+
89+
#### Using an ARM Template
90+
91+
For automated deployments, the scanner can be installed as part of the ARM templates deploying the Compute Instances.
92+
93+
Start by creating an ARM template, say `deploy.json`, with a `setupScripts` section. In the example below replace `{azure_ml_workspace_name}`, `{azure_ml_compute_name}`, `{user_name}` with appropriate values. You may also want to adjust `location`/`VMSize`, add `schedules`, etc.
94+
```json
95+
{
96+
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
97+
"contentVersion": "1.0.0.0",
98+
"resources": [
99+
{
100+
"type": "Microsoft.MachineLearningServices/workspaces/computes",
101+
"name": "{azure_ml_workspace_name}/{azure_ml_compute_name}",
102+
"location": "westus2",
103+
"apiVersion": "2021-07-01",
104+
"properties": {
105+
"computeType": "ComputeInstance",
106+
"disableLocalAuth": true,
107+
"properties": {
108+
"VMSize": "Standard_F2s_v2",
109+
"applicationSharingPolicy": "Personal",
110+
"sshSettings": { "sshPublicAccess": "Disabled" },
111+
"setupScripts": {
112+
"scripts": {
113+
"creationScript": {
114+
"scriptSource":"inline",
115+
"scriptData":"[base64('sudo python3 {user_name}/amlsecscan.py install')]",
116+
"timeout": "10m"
117+
}
118+
}
119+
}
120+
}
121+
}
122+
}
123+
]
124+
}
125+
```
126+
127+
Deploy the ARM template using the [Az CLI](https://docs.microsoft.com/en-us/cli/azure/):
128+
```bash
129+
az login
130+
az account set --subscription {subscription_id}
131+
az deployment group create --resource-group {resource_group_name} --template-file deploy.json
132+
```
133+
134+
## Clean Up
135+
136+
To stop scan scheduling and remove the scanner, run `sudo ./amlsecscan.py uninstall` .
137+
138+
## Troubleshooting
139+
140+
### Ensure that telemetry is emitted
141+
142+
Check the scanner health in Log Analytics: `AmlSecurityComputeHealth_CL | top 100 by TimeGenerated desc` .
143+
144+
It should show heartbeats with `Type_s == 'Heartbeat'` every 10 minutes.
145+
146+
Scan status (`Type_s == 'ScanMalware'` or `Type_s == 'ScanVulnerabilities'`) should appear as pairs of log entries,
147+
one with `Status_s = 'Started'` followed by one with `Status_s = 'Succeeded'`. If `Status_s` is `Failed`, the `Details_s` field includes the error message.
148+
149+
If heartbeats are not present in Log Analytics, verify whether heartbeats can be emitted by running `./amlsecscan.py heartbeat` in a terminal on the Compute Instance.
150+
151+
### Ensure that the scanner is running
152+
153+
If logs are missing in Log Analytics, scans may not be running. Local logs are available in `syslog` for investigation:
154+
- Check `cron` logs: `sudo cat /var/log/syslog | grep -i cron`
155+
- Check scanner logs: `sudo cat /var/log/syslog | grep -i amlsecscan`
156+
157+
The CRON configuration is located at `/etc/cron.d/amlsecscan` .
158+
159+
Scans can be run manually with higher verbosity to get more details: `sudo /home/azureuser/.amlsecscan/run.sh scan all -ll DEBUG` .
160+
161+
### Investigate Compute Instance deployment failures
162+
163+
Compute Instance creation logs are stored under `/Logs/{azure_ml_compute_name}/creation`.
164+
They can also be found by selecting the Compute Instance in Azure ML Studio, clicking on the `Logs` tab, and opening the file `Setup > stdout`.
165+
166+
### Verify that malware gets reported
167+
168+
Malware detection can be verified by downloading a simulated malware file: `wget -O ~/eicar.com.txt https://secure.eicar.org/eicar.com.txt` .
169+
170+
The malware should be reported in Log Analytics: `AmlSecurityComputeAssessments_CL | where Type_s == 'Malware' | top 100 by TimeGenerated desc` .
171+
172+
### Verify that the scanner files are present
173+
174+
After installation, the following files should be present on the Compute Instance:
175+
176+
File|Description
177+
--|--
178+
`/home/azureuser/.amlsecscan/config.json`|Scanner configuration
179+
`/home/azureuser/.amlsecscan/run.sh`|Scanner CRON entry point
180+
`/etc/cron.d/amlsecscan`|Scanner CRON schedule
181+
182+
### Verify that resource-usage limits are in place
183+
184+
When running through CRON schedule, scans have their CPU usage limited to 20% and are deprioritized by running at priority 19.
185+
When running manually, CPU usage is not limited and priority is left as default.
186+
187+
After a scan is run, its `cgroups` configuration limiting resource usage can be found under `/sys/fs/cgroup/cpu/amlsecscan` .

0 commit comments

Comments
 (0)