Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Defender export provider inadvertently downloads user data when retrieving a count of users without advanced auditing #1404

Open
tkol2022 opened this issue Nov 5, 2024 · 6 comments · May be fixed by #1406
Assignees
Labels
enhancement This issue or pull request will add new or improve existing functionality
Milestone

Comments

@tkol2022
Copy link
Collaborator

tkol2022 commented Nov 5, 2024

💡 Summary

Recently we noticed that when the tenant size becomes larger (e.g. 20,000 users) the Defender provider takes significantly longer to execute than previously observed and that prompted my code review which resulted in this issue.

The Defender export provider contains code with calls the server-side API associated with Get-MgBetaUser using a filter to get a count of users that do not have the advanced auditing feature enabled. The way the code is structured, the only piece of data that ends up being used is a result count value which is passed to the Rego policy checks. I manually ran the cmdlet with the same parameters used in the code against a test tenant and noticed that it not only provides data in the count variable, but the cmdlet also returns a dataset of all the users that match the filter with a comprehensive set of properties in Powershell objects. This leads me to believe that for large tenants, particularly ones that have a lot of users that are NOT assigned the advanced audit feature, this code may take some time to complete because it is downloading the dataset from the server. It could also be the case that downloading the data is fast and the query simply takes a long time to complete on the server side so that should be investigated.

I think this should be investigated and anything that we can do to improve the performance of this policy check for larger tenants should be considered. This is especially pertinent because many large tenants might not have the advanced audit feature turned on for a wide subset of users.

Here is the code that performs the count against the server using a Filter parameter.
Image

Observe below when I ran the same command manually against a test tenant that it returns the full dataset of matching users and their properties. Even though we are not using this information in the code because the cmdlet is being piped to Out-Null, the data is still being downloaded which could take time, not to mention a Powershell object is being created for every downloaded user with a host of properties in RAM.

Get-MgBetaUser -Filter "not assignedPlans/any(a:a/servicePlanId eq 2f442157-a11c-46b9-ae5b-6e39ff4e5849 and a/capabilityStatus eq 'Enabled')" -Count UsersWithoutAdvancedAuditCount -ConsistencyLevel eventual -All

Image

Each downloaded user gets its own Powershell object each containing 128 property keys and associated values:
Image

Motivation and context

Improving the performance of the tool is paramount to user satisfaction. Also with larger tenants this code seems to significantly slow down our automated functional tests when running Defender so if we correct this it will enable us to test features for larger tenants without performance impacts.

@tkol2022 tkol2022 added the enhancement This issue or pull request will add new or improve existing functionality label Nov 5, 2024
@tkol2022
Copy link
Collaborator Author

tkol2022 commented Nov 5, 2024

I marked this as high priority because I think we should investigate to ensure that the tool performs within a reasonable time threshold against larger tenants to contribute towards higher user satisfaction.

@gdasher
Copy link
Collaborator

gdasher commented Nov 6, 2024

Can you check whether adding $top=1 or some such improves performance by sending back less data?

@tkol2022
Copy link
Collaborator Author

tkol2022 commented Nov 6, 2024

Can you check whether adding $top=1 or some such improves performance by sending back less data?

Yes. Already got a fix working with that approach yesterday in MS Graph Explorer but now trying to replicate the fix using Cmdlets or Invoke-MgGraphRequest calls. Also running the profiler to ensure that indeed this is what slows down Defender with larger tenant counts and we don't chase the wrong root cause.

@tkol2022
Copy link
Collaborator Author

tkol2022 commented Nov 6, 2024

Size data

Based on some tests today I calculated the average downloaded size of a user record from the Get-MgBetaUser cmdlet.

1,144 users = 28,228,552 bytes
Each user's downloaded record is ~ 24,675 bytes

Therefore, if a tenant had 20k users
20,000 users = 493,500,000 or about 1/2 Gigabyte of data would be downloaded

@tkol2022
Copy link
Collaborator Author

tkol2022 commented Nov 6, 2024

Description of the fix

To correct the problem of downloading unnecessary data and to make the query run faster I made the following code change to the lines in ExportDefenderProvider.psm1.

I replaced "All = $true" with "Top = 1" in the query parameters when calling the Get-MgBetaUser cmdlet. This still runs the query in the back-end but only returns a single record which seems to make the operation lightning fast.

Image

I verified that the count that is created by the query is still correct in the JSON.

Image

In a test tenant that has 2,041 users, the new query takes 0.65 seconds.
Image

In a test tenant that has 2,041 users, the old query (without the fix) takes 10.76 seconds.
Image

@tkol2022
Copy link
Collaborator Author

tkol2022 commented Nov 6, 2024

How the fix works under the hood

If you were to run the query in MS Graph Explorer you can see how the fix works a little bit under the hood. Note that the fix requires adding a request header named ConsistencyLevel = eventual and a querystring of $count=true&$top=1. The count is returned in a variable named @odata.count inside the response body.

Image

@tkol2022 tkol2022 self-assigned this Nov 6, 2024
@tkol2022 tkol2022 added this to the Kraken milestone Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement This issue or pull request will add new or improve existing functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants