
Add ThreatLevel scorer #459

Draft: wants to merge 1 commit into base: develop
Conversation

@regulartim (Collaborator) commented Feb 17, 2025

Description

Integrate a mathematical model (essentially a weighted sum) that scores IoCs by how much of a threat they pose.

Related issues

part of #36

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality).

Checklist

  • I have read and understood the rules about how to Contribute to this project.
  • The pull request is for the branch develop.
  • I have added documentation of the new features.
  • Linters (Black, Flake, Isort) gave 0 errors. If you have correctly installed pre-commit, it does these checks and adjustments on your behalf.
  • I have added tests for the feature/bug I solved. All the tests (new and old ones) gave 0 errors.
  • If changes were made to an existing model/serializer/view, the docs were updated and regenerated (check CONTRIBUTE.md).
  • If the GUI has been modified:
    • I have provided a screenshot of the result in the PR.
    • I have created new frontend tests for the new component or updated existing ones.

Important Rules

  • If you fail to complete the Checklist properly, your PR won't be reviewed by the maintainers.
  • If your changes decrease the overall test coverage (you will know after the Codecov CI job is done), you should add the required tests to fix the problem.
  • Every time you make changes to the PR and you think the work is done, you should explicitly ask for a review. After being reviewed and receiving a "change request", you should explicitly ask for a review again once you have made the requested changes.

@regulartim (Collaborator, Author)

Open questions:

  • Is "threat level" an appropriate name?
  • The weights are arbitrarily chosen by me. Are there more sensible weights?
  • One part of the score is based on the Spamhaus ASN-DROP list. Is an enrichment like that something that we want in GreedyBear?
  • If yes, how to properly give them credit? (see: https://www.spamhaus.org/blocklists/do-not-route-or-peer/)
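For context, the droplist enrichment mentioned above might look like this minimal sketch, assuming a newline-delimited JSON feed with an `asn` field. The actual ASN-DROP file format and GreedyBear's schema may differ:

```python
# Hypothetical sketch of an ASN-droplist check. The input format
# (one JSON object per line, with an "asn" key) is an assumption.
import json

def load_droplisted_asns(path: str) -> set[int]:
    """Parse a droplist file, skipping records without an 'asn' key."""
    asns = set()
    with open(path) as fh:
        for line in fh:
            record = json.loads(line)
            if "asn" in record:
                asns.add(int(record["asn"]))
    return asns

def is_droplisted(ioc_asn: int, droplist: set[int]) -> bool:
    """True if the IoC's ASN appears on the loaded droplist."""
    return ioc_asn in droplist
```

The set lookup keeps the per-IoC check O(1), so the enrichment adds negligible cost even for large feeds.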

@mlodic (Member) commented Feb 17, 2025

Is "threat level" an appropriate name?

yeah I mean, why not? :P

Anyway, I think the score you created reflects the probability that an IP address is generating "internet noise" more than how dangerous it is. In practice, you want to proactively block those IP addresses if you are doing network protection. Very differently, if you are doing threat hunting to find the needle in the haystack, you want the exact opposite: filter those IP addresses out and focus on the ones that performed very few and sparse interactions.

So, depending on how the data is used, its meaning changes. This is one of the reasons why I called the original data "feeds" without using the word "threat".

So I would call it "noise" or something like that. That's also how other providers like GreyNoise manage this information.

The weights are arbitrarily chosen by me. Are there more sensible weights?

I really like this stuff, I ❤️ magic numbers :) Feel free to experiment with anything you want.

I think your metrics can reflect the "noise" really well if we decide to interpret it like that. Another interesting score could try to extract the rarest ones. This could bring some additional FPs but, at the same time, it could find really wonderful TPs.
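The "rarest ones" idea could be sketched as a ranking that is essentially the opposite of a noise score. The field names below are hypothetical:

```python
# Hypothetical "rarity" ranking: IoCs with the fewest and most sparse
# interactions come first. Field names are assumptions for illustration.
def rarity_rank(iocs: list[dict]) -> list[dict]:
    """Sort IoCs so the rarest (lowest interaction count, shortest
    observed activity span) appear at the top of the list."""
    return sorted(iocs, key=lambda ioc: (ioc["interaction_count"], ioc["days_seen"]))
```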

One part of the score is based on the Spamhaus ASN-DROP list. Is an enrichment like that something that we want in GreedyBear?

Yep, it's some sort of external validation that I think helps a lot.

If yes, how to properly give them credit? (see: https://www.spamhaus.org/blocklists/do-not-route-or-peer/)

We can explicitly say it in the documentation. That's what I do in IntelOwl.

@regulartim (Collaborator, Author)

Thanks for your feedback. :)

Anyway I think that the score that you created reflects more the probability that an IP address is performing "internet noise" than how much it is dangerous.

When you look at it from that perspective, the Random Forest scores perform much better in that regard: they are specifically trained to "predict noise". Therefore this score would give us no added value. I will think about it and probably talk to Daniel.

I also asked Spamhaus if they are OK with the integration of the ASN-DROP list, but they have not replied yet.

@regulartim (Collaborator, Author)

I will soon evaluate the different scorers with data from AbuseIPDB. If this scorer performs worse than the Random Forest models, I see no point in integrating it into GreedyBear. Until then, I would just like to leave this PR open.
