This project is a proof of concept (POC) for a central database that ingests big data from several open-source threat-intelligence systems, combines it with homegrown threat intelligence, and analyses the result using AI to produce a percentage match against a list of indicators observed within a given environment. The system then returns the approximate match to the user, who rates it, and that rating feeds an ML engine.
The goal is to give cybersecurity teams the ability to create detections that cover the gaps in their security tool suite, where tools cannot be integrated with one another. Connecting these ecosystems provides a threat-intelligence capability to security systems that hold vast amounts of valuable but unused data points.
From a high-level perspective, the system is divided into the following components:
- collectors - numerous crawlers that consume data from feeds.
- normalisers - ingest data from the collector bots in real time and parse it into a common schema.
- analysers - consume normalised data and correlate events and indicators using AI.
- workers - move data through workflows, read from and write to databases, and maintain and update the system.
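
The normaliser and analyser steps above can be sketched roughly as follows. This is a minimal illustration only, not the project's implementation: the `Indicator` schema, its field names, the raw record keys (`indicator`, `type`), and the function names are all hypothetical, and the percentage match uses a plain string-similarity ratio from the Python standard library as a stand-in for the AI-based correlation described above.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Indicator:
    """Hypothetical normalised schema; the field names are assumptions."""
    value: str   # e.g. a domain, IP address, or file hash
    kind: str    # e.g. "domain", "ip", "sha256"
    source: str  # feed or system the indicator came from


def normalise(raw: dict, source: str) -> Indicator:
    """Normaliser step: map a raw feed record onto the schema.

    The raw keys "indicator" and "type" are assumed for illustration.
    """
    return Indicator(
        value=str(raw["indicator"]).strip().lower(),
        kind=str(raw.get("type", "unknown")).strip().lower(),
        source=source,
    )


def percent_match(observed: str, known: Indicator) -> float:
    """Analyser step: similarity expressed as a percentage.

    A simple character-level ratio stands in for the AI model here.
    """
    ratio = SequenceMatcher(None, observed.strip().lower(), known.value).ratio()
    return round(ratio * 100, 1)


def best_match(observed: str, known: list[Indicator]) -> tuple[Indicator, float]:
    """Return the closest known indicator and its percentage score."""
    scored = [(ind, percent_match(observed, ind)) for ind in known]
    return max(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    feed = [
        {"indicator": " Evil-Domain.example ", "type": "Domain"},
        {"indicator": "10.0.0.1", "type": "ip"},
    ]
    known = [normalise(record, "example-feed") for record in feed]
    match, score = best_match("evil-domain.exampl", known)
    print(match.value, score)
```

In a fuller version, the user's rating of each returned match would be stored alongside the score so that it can be used to train the ML engine; that feedback loop is omitted here.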
The system's tech stack is likely to change as the POC progresses and as capabilities and features are updated. The current stack is:
- Compute (DigitalOcean Droplet)
- DataFrames (NumPy, Pandas)
- Artificial Intelligence & ML (PyTorch)