The code for determining the similarity of two condition thresholds is shown below:
```python
PERMISSIBLE_DELTA = 0.1
# …

def condition_similarity(condition1: Condition, condition2: Condition):
    # Different attributes
    if condition1.attribute != condition2.attribute:
        return 0
    # Different operators
    # TODO: Extend???
    if condition1.operator != condition2.operator:
        return 0
    # Handle <= as a special case as per paper
    if condition1.operator == Operator.LE and condition2.operator == Operator.LE:
        t = abs(PERMISSIBLE_DELTA * condition1.threshold)
        x = abs(condition1.threshold - condition2.threshold)
        if x == 0:
            return 1
        return 1 - (x / t) if x < t else 0
    return 1
```
(The original code also contained a bug in the calculation of the tolerance, `t`, which was fixed in PR #6.)
This threshold logic is not appropriate for ordinal attributes. For example, the UCI Poker Hand dataset represents card ranks as integers from 1 to 13. With `PERMISSIBLE_DELTA = 0.1`, a Queen (12) has a tolerance, `t`, of 12 * 0.1 = 1.2, so a rank difference of 1 falls inside it and the Queen would be considered similar to a Jack (11) or a King (13). An Ace (1), however, has a tolerance of 1 * 0.1 = 0.1, so it wouldn't be considered similar to any other card, even though its neighbours are also only one rank away.
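To make the arithmetic concrete, here is a minimal sketch that lifts the tolerance test for two `<=` conditions out of `condition_similarity` into a standalone function on raw thresholds. The name `threshold_similarity` and the rank constants are hypothetical, introduced only for illustration.

```python
PERMISSIBLE_DELTA = 0.1


def threshold_similarity(t1: float, t2: float) -> float:
    """Tolerance test copied from the <= branch of condition_similarity."""
    t = abs(PERMISSIBLE_DELTA * t1)  # tolerance scales with the first threshold
    x = abs(t1 - t2)                 # distance between the two thresholds
    if x == 0:
        return 1.0
    return 1 - (x / t) if x < t else 0.0


# Card ranks as encoded in the UCI Poker Hand dataset (Ace = 1, ..., King = 13)
ACE, TWO, JACK, QUEEN, KING = 1, 2, 11, 12, 13

print(f"Queen vs Jack: {threshold_similarity(QUEEN, JACK):.3f}")  # 0.167
print(f"Queen vs King: {threshold_similarity(QUEEN, KING):.3f}")  # 0.167
print(f"Ace vs Two: {threshold_similarity(ACE, TWO):.3f}")        # 0.000
```

Because `t` grows with the threshold, the same one-rank step is tolerated for high ranks and rejected for low ones.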
The `similar_tree` module needs to be modified to allow a list of attributes to be treated as ordinal, with the tolerance logic adjusted accordingly. For those attributes, the condition similarity should be 1 if the thresholds represent the same partitioning (e.g. `<= 2.0` is the same as `<= 2.9`, as both split {1, 2} vs {3, 4, …}), and 0 otherwise.
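One possible approach, sketched here under the assumption that ordinal attributes take integer values: an integer v satisfies v <= t exactly when v <= floor(t), and v > t exactly when v > floor(t), so two thresholds induce the same partition precisely when their floors are equal. The `Operator` and `Condition` classes below are hypothetical stand-ins for the project's own types, and `ordinal_condition_similarity` and the attribute name `"rank"` are illustrative, not part of `similar_tree`.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Operator(Enum):
    # Hypothetical stand-in for the project's Operator enum
    LE = "<="
    GT = ">"


@dataclass(frozen=True)
class Condition:
    # Hypothetical stand-in: attribute name, comparison operator, split threshold
    attribute: str
    operator: Operator
    threshold: float


def ordinal_condition_similarity(c1: Condition, c2: Condition) -> int:
    """Return 1 if both conditions split an integer-valued attribute identically, else 0."""
    if c1.attribute != c2.attribute or c1.operator != c2.operator:
        return 0
    # For integer values, only floor(threshold) determines which side a value lands on
    return 1 if math.floor(c1.threshold) == math.floor(c2.threshold) else 0


a = Condition("rank", Operator.LE, 2.0)
b = Condition("rank", Operator.LE, 2.9)
c = Condition("rank", Operator.LE, 3.0)
print(ordinal_condition_similarity(a, b))  # 1: both split {1, 2} vs {3, 4, ...}
print(ordinal_condition_similarity(a, c))  # 0: <= 3.0 splits {1, 2, 3} vs {4, ...}
```

In `similar_tree`, a check like this could be used whenever the condition's attribute appears in the user-supplied list of ordinal attributes, keeping the existing tolerance logic for all other attributes.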
Secondly, the code only handles the case of two `<=` operators, not two `>` operators. For two `>` operators it falls through to the final `return 1` (perfect similarity), even if the thresholds differ.
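A minimal sketch of one way to close this gap, assuming the same tolerance rule is also wanted for `>` conditions (the issue does not prescribe a specific fix): apply the test whenever both operators are `<=` or both are `>`. As before, `Operator` and `Condition` are hypothetical stand-ins for the project's types.

```python
from dataclasses import dataclass
from enum import Enum

PERMISSIBLE_DELTA = 0.1


class Operator(Enum):
    # Hypothetical stand-in for the project's Operator enum
    LE = "<="
    GT = ">"


@dataclass(frozen=True)
class Condition:
    # Hypothetical stand-in: attribute name, comparison operator, split threshold
    attribute: str
    operator: Operator
    threshold: float


def condition_similarity(c1: Condition, c2: Condition) -> float:
    # Different attributes or different operators are never similar
    if c1.attribute != c2.attribute or c1.operator != c2.operator:
        return 0.0
    # Apply the tolerance test to both <= and >, not just <=
    if c1.operator in (Operator.LE, Operator.GT):
        t = abs(PERMISSIBLE_DELTA * c1.threshold)
        x = abs(c1.threshold - c2.threshold)
        if x == 0:
            return 1.0
        return 1 - (x / t) if x < t else 0.0
    # Any other operator keeps the original fallback
    return 1.0


a = Condition("rank", Operator.GT, 2.5)
b = Condition("rank", Operator.GT, 10.5)
print(condition_similarity(a, b))  # 0.0 (the original code returns 1)
```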