
Centralize what flags as "malicious" #95

Closed
Robin5605 opened this issue Jul 11, 2023 · 9 comments · May be fixed by #132

@Robin5605 (Contributor)

The way the current system works is that the API returns all packages that have been scanned within the constraints given in the request (it quite literally just dumps the SQLAlchemy query result). This means the consumer (the bot, in this case) has to filter through the response for the packages it wants to display (here, packages with a score greater than or equal to 5).

It has been expressed numerous times that what constitutes "malicious" should live in a centralized location (such as this API). There are a few ways of going about this, and I'd like to get ideas on the table in this issue. A basic solution we could start with is a field in constants.py that we can tweak (though worth noting we would have to redeploy to change it). The API response would then return a list of packages scanned, and a list of malicious packages.
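The constants.py starting point could look roughly like the sketch below. The `MALICIOUS_THRESHOLD` name, the `PackageScan` shape, and the response fields are all hypothetical, not the project's actual code:

```python
# A minimal sketch of the "threshold in constants.py" idea; the names
# MALICIOUS_THRESHOLD, PackageScan, and the response keys are hypothetical.
from dataclasses import dataclass

MALICIOUS_THRESHOLD = 5  # tweakable, but changing it requires a redeploy


@dataclass
class PackageScan:
    name: str
    version: str
    score: int


def build_response(scans: list[PackageScan]) -> dict:
    """Return both the full scan list and the subset flagged malicious."""
    malicious = [s for s in scans if s.score >= MALICIOUS_THRESHOLD]
    return {
        "all_scans": [s.name for s in scans],
        "malicious": [s.name for s in malicious],
    }
```

With this shape the bot can display the `malicious` list directly instead of re-filtering by score client-side.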

We can also discuss having the API itself dispatch a webhook to the appropriate channels instead of having the bot poll the API every 60 seconds. I'm leaning more towards this approach.

@import-pandas-as-numpy (Member) commented Jul 11, 2023

This seems like a desirable end goal, albeit I'm not keen on redeploying just to fix the weighting.
I think ultimately the thing we care most about is malicious packages; having a way to query this (and a way to query changes to it) can help inform our actual detection metrics.

I pitched the idea of an additional table to track malicious packages (to the outrage of everyone), but I do think that segregating detections from the global package list can help us aggregate data better about what exactly we're detecting and why. This will become a lot more relevant when we introduce additional detection schemas such as the AST idea, where we'll want to know what context something was detected in, since both detection systems will be using YARA.

Do I think it's a perfect idea? Nah. But if we look at how we're trying to do the BigQuery dataset polling, where the bot simply revolves every 'x' seconds and moves the query window to cover the latest notifications, it seems like we might be able to apply that here. To clarify, because I think it's kind of confusing: being able to query our table every minute, or providing a callback for the cronjob to run that query as well, might be useful. (i.e. when the cronjob runs and adds jobs to the package queue, it could also query the current state of the database using the last time it was run and now, and report all the detections via a webhook.)

This simplifies the model at least in my head.
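The cronjob-callback idea above could be sketched as follows. The detection row shape, the query callable, and the webhook sender are all hypothetical stand-ins:

```python
# A rough sketch of the cronjob callback: each run reports every detection
# recorded between the previous run and now, then persists "now" as the new
# watermark. fetch_detections_between and send are hypothetical stand-ins.
import datetime
from typing import Callable


def report_new_detections(
    fetch_detections_between: Callable,  # (start, end) -> list of (name, score)
    send: Callable[[str], None],         # webhook sender
    last_run: datetime.datetime,
    now: datetime.datetime,
) -> datetime.datetime:
    for name, score in fetch_detections_between(last_run, now):
        send(f"{name} flagged with score {score}")
    return now  # store this as the next run's starting point
```

The returned timestamp plays the same role as the moving BigQuery query window: each run only covers what happened since the previous one.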

@Robin5605 (Contributor Author)

Here's another idea that was proposed:
We have an endpoint that, when hit, will send an embed webhook to some configurable webhook URL with "all packages scanned in the last 60 seconds". We could then configure a Kubernetes cron job to run at some configurable interval and hit that endpoint.

As for malicious packages, we could simply dispatch those as they come in from clients (so it'd be real-time).
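The digest half of that endpoint could be sketched like this; the pair layout and message format are hypothetical:

```python
# A sketch of the cron-hit endpoint's handler: gather everything scanned
# within the last 60 seconds and build one digest message to post to the
# configured webhook URL. All names here are hypothetical.
import datetime

REPORT_WINDOW = datetime.timedelta(seconds=60)


def scan_digest(scans, now: datetime.datetime) -> str:
    """scans: iterable of (package_name, finished_at) pairs."""
    cutoff = now - REPORT_WINDOW
    recent = sorted(name for name, finished_at in scans if finished_at >= cutoff)
    return f"{len(recent)} package(s) scanned in the last 60s: {', '.join(recent)}"
```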

@import-pandas-as-numpy (Member)

> Here's another idea that was proposed: We have an endpoint that, when hit, will send an embed webhook to some configurable webhook URL with "all packages scanned in the last 60 seconds". We could then configure a Kubernetes cron job to run at some configurable interval and hit that endpoint.
>
> As for malicious packages, we could simply dispatch those as they come in from clients (so it'd be real-time).

This doesn't make sense to me (dispatching from the clients), as we'd lose the premise of this: the score threshold. Unless we're pushing it down to the clients themselves through get job. Which, ehhhh... I mean, I guess?

@Robin5605 (Contributor Author) commented Jul 11, 2023

> This doesn't make sense to me (dispatching from the clients), as we'd lose the premise of this: the score threshold.

The way this would work is that clients would send their results up to the API as usual (including the score, the rules matched, etc.). If the submitted score exceeds some threshold we've set server-side, we'd trigger a webhook (from the server). This requires no change in client behaviour.
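A minimal sketch of that server-side flow, assuming a hypothetical result dict shape and webhook sender (not the project's real API):

```python
# Sketch of server-side dispatch: clients submit results unchanged, and the
# API itself fires the webhook when the score crosses the threshold.
# The result dict shape and send_webhook callable are assumptions.
MALICIOUS_THRESHOLD = 5


def handle_client_result(result: dict, send_webhook) -> bool:
    """Called when a client submits a scan result; returns True if reported."""
    if result["score"] >= MALICIOUS_THRESHOLD:
        rules = ", ".join(result["rules_matched"])
        send_webhook(f"{result['name']} scored {result['score']} (rules: {rules})")
        return True
    return False
```

Because the comparison happens in the API, the threshold stays centralized and clients never need to know it exists.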

@Robin5605 (Contributor Author)

> albeit I'm not keen on redeploying just to fix the weighting.

As for this, we could probably just save it in the database and have an endpoint to tweak it. Perhaps a function in the bot could hit this endpoint so we can tweak the weights without having to make an HTTP request ourselves.
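The database-backed threshold could be sketched as a small accessor pair that the endpoint (and a bot command wrapping it) would call; the single-row settings store here is a hypothetical stand-in:

```python
# Sketch of keeping the threshold in the database rather than constants.py,
# so it can be tweaked without a redeploy. SettingsStore is a hypothetical
# stand-in for a one-row settings table.
class SettingsStore:
    """Stand-in for a one-row settings table in the database."""

    def __init__(self, threshold: int = 5):
        self._threshold = threshold

    def get_threshold(self) -> int:
        return self._threshold  # a real store would SELECT the settings row

    def set_threshold(self, value: int) -> None:
        if value < 0:
            raise ValueError("threshold must be non-negative")
        self._threshold = value  # a real store would UPDATE the settings row
```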

@import-pandas-as-numpy (Member)

Both sound reasonable to me, thanks for clarifying. You're cleared hot for implementation unless anyone has some other nits.

Robin5605 self-assigned this Jul 12, 2023
@Robin5605 (Contributor Author)

On further thought, this might be more difficult than anticipated to do within the API 🤔
That is, as long as we keep the interaction-based "reporting" functionality.
We'd need to set up interactions over HTTP with Discord, then create endpoints to handle all of that. It might be easier to just keep using the bot, honestly. I'll leave this up for discussion for now in case someone has any better ideas.

@Robin5605 (Contributor Author)

I propose a new endpoint, perhaps something along the lines of GET /scans, that will return a list of all packages scanned and a list of packages that were malicious.
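A possible response shape for that endpoint, with hypothetical field names:

```python
# Hypothetical response builder for the proposed GET /scans endpoint: both
# the full scan list and the malicious subset, so the bot no longer needs
# to filter by score client-side. Field names are assumptions.
def get_scans(scans: list[dict], threshold: int = 5) -> dict:
    """scans: rows with at least 'name' and 'score' keys."""
    return {
        "all_scans": [s["name"] for s in scans],
        "malicious": [s["name"] for s in scans if s["score"] >= threshold],
    }
```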

@Robin5605 (Contributor Author)

Going to close this as superseded by #260.
