v2.1.16.0
What's new in RuleKit version 2.1.16.0?
1. RuleKit and RapidMiner part ways 💔
Since its inception, RuleKit has used the RapidMiner Java API for various tasks, such as loading data and measuring model performance. As of major version 2, RuleKit has finally parted ways with RapidMiner. This was made possible mainly by the recent work of our contributors, Wojciech Górka and Mateusz Kalisch.
This change brings several benefits, including:
- a huge reduction in the size of the RuleKit Java package jar file (from 131 MB to 40.9 MB);
- the jar file is now small enough to fit into the Python package distribution, so it no longer has to be downloaded in a separate step.
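If you want to confirm that the jar really ships with your installed Python package, you can search the package directory for jar files. The sketch below is a hypothetical helper (find_bundled_jars is not part of RuleKit) and assumes the jar is stored somewhere under the installed rulekit package directory. It only locates the package with importlib, without importing it.

```python
import importlib.util
from pathlib import Path


def find_bundled_jars(package_name: str) -> list[Path]:
    """Return all .jar files found under an installed package's directory.

    Hypothetical helper: assumes the jar is shipped inside the package folder.
    Returns an empty list if the package is not installed.
    """
    spec = importlib.util.find_spec(package_name)  # does not import the package
    if spec is None or not spec.submodule_search_locations:
        return []
    jars: list[Path] = []
    for location in spec.submodule_search_locations:
        jars.extend(sorted(Path(location).rglob("*.jar")))
    return jars


print(find_bundled_jars("rulekit"))
```

An empty list means either that rulekit is not installed in the current environment or that the jar lives outside the package directory, in which case the assumption above does not hold.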
Although the license has remained the same (GNU AGPL-3.0 license), for commercial projects that require the ability to distribute RuleKit code as part of a program that cannot be distributed under the AGPL, it may be possible to obtain an appropriate license from the authors. Feel free to contact us!
2. ⚠️ BREAKING CHANGE: the min_rule_covered algorithm parameter was removed
Until this version, this parameter was marked as deprecated, and using it only produced a warning. It has now been removed completely, so any code that still passes it must be updated.
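Code that still passes min_rule_covered has to switch to the parameter it aliased. Earlier deprecation messages pointed to minsupp_new as the replacement; treat that mapping as an assumption and confirm it against the RuleKit documentation for the version you use. If your parameters live in config files, a small hypothetical helper (migrate_params is not part of RuleKit) can rename the key before the dict is unpacked into the constructor:

```python
from typing import Any

# Assumed mapping: removed parameter -> its replacement (verify against the RuleKit docs).
RENAMED_PARAMS = {"min_rule_covered": "minsupp_new"}


def migrate_params(params: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of params with removed parameter names replaced."""
    migrated = dict(params)  # never mutate the caller's dict
    for old, new in RENAMED_PARAMS.items():
        if old not in migrated:
            continue
        # Refuse to guess if both names are present with conflicting values.
        if new in migrated and migrated[new] != migrated[old]:
            raise ValueError(f"Both {old!r} and {new!r} given with different values")
        migrated[new] = migrated.pop(old)
    return migrated


params = migrate_params({"min_rule_covered": 5, "max_growing": 10})
print(params)  # {'max_growing': 10, 'minsupp_new': 5}
# The result can then be passed on, e.g. RuleClassifier(**params)
```

Raising on conflicting values, instead of silently picking one, keeps a stale config from quietly changing the induced rules.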
3. ⚠️ BREAKING CHANGE: the classification metric negative_voting_conflicts is no longer available
As of this version, the metrics returned by the RuleClassifier.predict method with return_metrics=True no longer include the negative_voting_conflicts metric.
In fact, this metric cannot be calculated without access to the true labels. Since the predict method does not take labels as an argument, the values it previously reported for this metric were, unfortunately, incorrect.
If you really need this metric, you can still calculate it, but it takes more effort. Here is an example of how to do it with the currently available API:
```python
import re
from collections import defaultdict

import numpy as np
from sklearn.datasets import load_iris

from rulekit.classification import RuleClassifier

X, y = load_iris(return_X_y=True)
clf = RuleClassifier()
clf.fit(X, y)
prediction: np.ndarray = clf.predict(X)

# 1. Group rules by decision class based on their conclusions,
#    e.g. "IF ... THEN class = {0}" -> "0"
rule_decision_class_regex = re.compile(r"^.+THEN .+ = \{(.+)\}$")
grouped_rules: dict[str, list[int]] = defaultdict(list)
for i, rule in enumerate(clf.model.rules):
    rule_decision_class: str = rule_decision_class_regex.search(
        str(rule)).group(1)
    grouped_rules[rule_decision_class].append(i)

# 2. Get rules covering each example
#    (rows = examples, columns = rules, 1 = the rule covers the example)
coverage_matrix: np.ndarray = clf.get_coverage_matrix(X)

# 3. Group coverages of the rules with the same decision class
grouped_coverage_matrix: np.ndarray = np.zeros(
    shape=(coverage_matrix.shape[0], len(grouped_rules))
)
for i, rule_indices in enumerate(grouped_rules.values()):
    grouped_coverage_matrix[:, i] = np.sum(
        coverage_matrix[:, rule_indices], axis=1
    )
grouped_coverage_matrix[grouped_coverage_matrix > 0] = 1

# 4. Find examples with voting conflicts, i.e. examples covered by rules
#    of more than one decision class
voting_conflicts_mask: np.ndarray = np.sum(grouped_coverage_matrix, axis=1) > 1

# 5. Find examples with negative voting conflicts (voting conflicts where
#    the predicted class is not equal to the actual class)
negative_conflicts_mask: np.ndarray = voting_conflicts_mask & (y != prediction)
negative_conflicts: int = int(np.sum(negative_conflicts_mask))
print("Number of negative voting conflicts:", negative_conflicts)
```
Not so simple, right?
Perhaps in the future we will add an API for calculating this metric in a more user-friendly way.
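Until such an API exists, the steps of the example can be wrapped in a reusable function. The sketch below is hypothetical (count_negative_voting_conflicts is not part of RuleKit). It takes rule strings and a coverage matrix instead of a fitted model, so the counting logic can be checked on toy data, and it assumes the rule conclusion format "... THEN <label> = {<class>}" used in the example.

```python
import re
from collections import defaultdict

import numpy as np

# Captures the decision class from a conclusion like "... THEN class = {1}".
RULE_CLASS_PATTERN = re.compile(r"^.+THEN .+ = \{(.+)\}$")


def count_negative_voting_conflicts(
    rules: list[str],
    coverage_matrix: np.ndarray,
    y_true: np.ndarray,
    y_pred: np.ndarray,
) -> int:
    """Count misclassified examples covered by rules of two or more classes.

    rules: rule strings, one per column of coverage_matrix.
    coverage_matrix: (n_examples, n_rules) array of 0/1 coverage flags.
    """
    # Map each decision class to the column indices of its rules.
    grouped: dict[str, list[int]] = defaultdict(list)
    for i, rule in enumerate(rules):
        match = RULE_CLASS_PATTERN.search(rule)
        if match is None:
            raise ValueError(f"Cannot parse decision class from rule: {rule!r}")
        grouped[match.group(1)].append(i)
    if not grouped:
        return 0  # no rules, hence no conflicts

    coverage = np.asarray(coverage_matrix) > 0
    # One column per class: True if any rule of that class covers the example.
    class_coverage = np.column_stack(
        [coverage[:, idx].any(axis=1) for idx in grouped.values()]
    )
    # Voting conflict: rules from more than one class cover the example.
    conflicts = class_coverage.sum(axis=1) > 1
    misclassified = np.asarray(y_true) != np.asarray(y_pred)
    return int(np.sum(conflicts & misclassified))


rules = [
    "IF a < 1 THEN class = {0}",
    "IF b > 2 THEN class = {1}",
    "IF c = {x} THEN class = {1}",
]
# Example 2 is covered by two rules of the same class, so it is not a conflict.
coverage = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [1, 0, 1]])
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([1, 0, 0, 0])
print(count_negative_voting_conflicts(rules, coverage, y_true, y_pred))  # 2
```

With a fitted model, it could be called as count_negative_voting_conflicts([str(r) for r in clf.model.rules], clf.get_coverage_matrix(X), y, clf.predict(X)). Keeping the function on plain arrays makes it independent of the Java backend and easy to test.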
4. 🕰️ DEPRECATION: the download_jar command is now deprecated
Since RapidMiner's dependencies were removed from the RuleKit Java package, its jar file has shrunk significantly and is now small enough to be bundled with the Python package. You no longer need to download it in a separate step with this command, as was required before:
```shell
python -m rulekit download_jar
```
This command now does nothing except emit a warning. It will be removed entirely in the next major version (3).