The difference between ML and rule-based systems is explained using the example of a spam filter.
Traditional rule-based systems rely on a hand-written set of characteristics (keywords, email length, etc.) that identify an email as spam or not. Because spam emails keep changing over time, the rules have to be updated continually, and maintaining the code becomes intractable as the system grows.
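As a rough illustration of such a system, here is a minimal sketch of a rule-based filter. The keywords, the length cutoff, and the function name are hypothetical and chosen only for this example; they are not taken from a real filter.

```python
# Hypothetical hand-written rules; keywords and cutoff are illustrative only.
SPAM_KEYWORDS = {"deposit", "winner", "free money"}
MAX_SUBJECT_LENGTH = 60


def is_spam_rule_based(subject: str, body: str) -> bool:
    """Return True if any hand-written rule fires."""
    text = f"{subject} {body}".lower()
    # Rule 1: the email mentions a known spam keyword.
    if any(keyword in text for keyword in SPAM_KEYWORDS):
        return True
    # Rule 2: the subject line is unusually long.
    if len(subject) > MAX_SUBJECT_LENGTH:
        return True
    return False


print(is_spam_rule_based("You are a WINNER", "Claim your prize"))      # True
print(is_spam_rule_based("Meeting notes", "See the attached agenda"))  # False
print(is_spam_rule_based("You are a W1NNER", "Claim your prize"))      # False
```

The last call shows the maintenance issue: a small change in spelling slips past the keyword rule, so someone has to add yet another rule by hand.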
ML can be used to solve this problem with the following steps:
Emails from the user's spam folder and inbox give examples of spam and non-spam.
Rules/characteristics from rule-based systems can be used as a starting point to define features for the ML model. The value of the target variable for each email can be defined based on where the email was obtained from (spam folder or inbox).
Each email can be encoded (converted) to the values of its features and target.
A machine learning algorithm can then be applied to the encoded emails to build a model that can predict whether a new email is spam or not spam. The predictions are probabilities, and to make a decision it is necessary to define a threshold to classify emails as spam or not spam.
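The steps above can be sketched in plain Python. Everything here is an assumption made for illustration: the emails are made up, the three features are hypothetical, and logistic regression trained with simple gradient descent stands in for "a machine learning algorithm", which the notes do not specify.

```python
import math


def encode(email: dict) -> list[float]:
    """Convert an email to feature values (hypothetical features)."""
    text = f"{email['subject']} {email['body']}".lower()
    return [
        1.0 if "deposit" in text else 0.0,
        1.0 if "winner" in text else 0.0,
        1.0 if len(email["subject"]) > 20 else 0.0,
    ]


# Made-up examples; the target comes from where each email was found.
spam_folder = [
    {"subject": "You are the lucky winner today", "body": "Claim now"},
    {"subject": "Urgent", "body": "Send a deposit to unlock your account"},
    {"subject": "Deposit required for your prize", "body": "You are a winner"},
]
inbox = [
    {"subject": "Lunch tomorrow?", "body": "Are you free at noon"},
    {"subject": "Meeting notes", "body": "Agenda attached"},
    {"subject": "Quarterly report draft for review", "body": "Comments welcome"},
]

X = [encode(email) for email in spam_folder + inbox]
y = [1.0] * len(spam_folder) + [0.0] * len(inbox)  # 1 = spam, 0 = not spam


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, learning_rate=0.5, epochs=1000):
    """Fit logistic regression with per-example gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in zip(X, y):
            score = bias + sum(w * f for w, f in zip(weights, features))
            error = sigmoid(score) - target
            bias -= learning_rate * error
            weights = [w - learning_rate * error * f
                       for w, f in zip(weights, features)]
    return weights, bias


def predict_proba(email: dict, weights, bias) -> float:
    """Return the predicted probability that the email is spam."""
    features = encode(email)
    return sigmoid(bias + sum(w * f for w, f in zip(weights, features)))


weights, bias = train(X, y)

THRESHOLD = 0.5  # assumed cutoff; in practice it is tuned for the use case

new_emails = [
    {"subject": "Action needed", "body": "A deposit is required today"},
    {"subject": "Coffee later?", "body": "Let me know"},
]
for email in new_emails:
    probability = predict_proba(email, weights, bias)
    label = "spam" if probability >= THRESHOLD else "not spam"
    print(f"{email['subject']!r}: {probability:.2f} -> {label}")
```

Raising `THRESHOLD` makes the filter more conservative (fewer legitimate emails flagged, more spam let through), while lowering it does the opposite.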
The notes are written by the community. If you see an error here, please create a PR with a fix.