
Add ThreatLevel scorer #459

Draft: wants to merge 1 commit into base: develop
Conversation

@regulartim (Collaborator) commented Feb 17, 2025

Description

Integrate a mathematical model (essentially a weighted sum) that scores IoCs by how much of a threat they pose.

Related issues

part of #36

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality).

Checklist

  • I have read and understood the rules about how to Contribute to this project.
  • The pull request is for the branch develop.
  • I have added documentation of the new features.
  • Linters (Black, Flake, Isort) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved. All the tests (new and old ones) gave 0 errors.
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated (check CONTRIBUTE.md).
  • If the GUI has been modified:
    • I have provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.

Important Rules

  • If you fail to complete the Checklist properly, your PR won't be reviewed by the maintainers.
  • If your changes decrease the overall test coverage (you will know after the Codecov CI job is done), you should add the required tests to fix the problem.
  • Every time you make changes to the PR and you think the work is done, you should explicitly ask for a review. After being reviewed and receiving a "change request", you should explicitly ask for a review again once you have made the requested changes.

@regulartim (Collaborator, Author)

Open questions:

  • Is "threat level" an appropriate name?
  • The weights are arbitrarily chosen by me. Are there more sensible weights?
  • One part of the score is based on the Spamhaus ASN-DROP list. Is an enrichment like that something that we want in GreedyBear?
  • If yes, how to properly give them credit? (see: https://www.spamhaus.org/blocklists/do-not-route-or-peer/)
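For context, the droplist enrichment mentioned above might look like this minimal sketch, assuming a newline-delimited JSON feed with an `asn` field. The actual ASN-DROP file format and GreedyBear's schema may differ:

```python
# Hypothetical sketch of an ASN-droplist check. The input format
# (one JSON object per line, with an "asn" key) is an assumption.
import json

def load_droplisted_asns(path: str) -> set[int]:
    """Parse a droplist file, skipping records without an 'asn' key."""
    asns = set()
    with open(path) as fh:
        for line in fh:
            record = json.loads(line)
            if "asn" in record:
                asns.add(int(record["asn"]))
    return asns

def is_droplisted(ioc_asn: int, droplist: set[int]) -> bool:
    """True if the IoC's ASN appears on the loaded droplist."""
    return ioc_asn in droplist
```

The set lookup keeps the per-IoC check O(1), so the enrichment adds negligible cost even for large feeds.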

@mlodic (Member) commented Feb 17, 2025

Is "threat level" an appropriate name?

yeah I mean, why not? :P

Anyway, I think the score you created reflects the probability that an IP address is generating "internet noise" more than how dangerous it is. In practice, you want to proactively block those IP addresses if you are doing network protection. Very differently, if you are doing threat hunting to find the needle in the haystack, you want the exact opposite: filter those IP addresses out and focus on the ones that performed very few and sparse interactions.

So, depending on how the data is used, its meaning changes. This is one of the reasons why I called the original data "feeds" without using the word "threat".

So I would call it "noise" or something like that. That's also how other providers like GreyNoise manage this information.

The weights are arbitrarily chosen by me. Are there more sensible weights?

I really like this stuff, I ❤️ magic numbers :) Feel free to experiment with anything you want.

I think your metrics can reflect the "noise" really well if we decide to interpret it like that. Another interesting score could try to extract the rarest ones. This could bring some additional FPs but, at the same time, it could find really wonderful TPs.
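The "rarest ones" idea could be sketched as a ranking that is essentially the opposite of a noise score. The field names below are hypothetical:

```python
# Hypothetical "rarity" ranking: IoCs with the fewest and most sparse
# interactions come first. Field names are assumptions for illustration.
def rarity_rank(iocs: list[dict]) -> list[dict]:
    """Sort IoCs so the rarest (lowest interaction count, shortest
    observed activity span) appear at the top of the list."""
    return sorted(iocs, key=lambda ioc: (ioc["interaction_count"], ioc["days_seen"]))
```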

One part of the score is based on the Spamhaus ASN-DROP list. Is an enrichment like that something that we want in GreedyBear?

Yep, it's some sort of external validation that I think helps a lot.

If yes, how to properly give them credit? (see: https://www.spamhaus.org/blocklists/do-not-route-or-peer/)

We can explicitly say it in the documentation. That's what I do in IntelOwl.

@regulartim (Collaborator, Author)

Thanks for your feedback. :)

Anyway I think that the score that you created reflects more the probability that an IP address is performing "internet noise" than how much it is dangerous.

When you look at it from that perspective, the Random Forest scores perform much better in that regard: they are specifically trained to "predict noise". Therefore this score would give us no added value. I will think about it and probably talk to Daniel.

I also asked Spamhaus if they are OK with the integration of the ASN-DROP list, but they have not replied yet.

@regulartim (Collaborator, Author)

I will soon evaluate the different scorers with data from AbuseIPDB. If this scorer performs worse than the Random Forest models, I see no point in integrating it into GreedyBear. Until then, I would just like to leave this PR open.
