Support ordinal types #10

Open
anjsimmo opened this issue Oct 12, 2023 · 0 comments
@anjsimmo
Contributor

The code for determining the similarity of two condition thresholds is shown below:

```python
PERMISSIBLE_DELTA = 0.1
…
def condition_similarity(condition1: Condition, condition2: Condition):
    # Different attributes
    if condition1.attribute != condition2.attribute:
        return 0

    # Different operators
    # TODO: Extend???
    if condition1.operator != condition2.operator:
        return 0

    # Handle <= as a special case as per paper
    if condition1.operator == Operator.LE and condition2.operator == Operator.LE:
        t = abs(PERMISSIBLE_DELTA * condition1.threshold)
        x = abs(condition1.threshold - condition2.threshold)
        if x == 0:
            return 1
        return 1 - (x / t) if x < t else 0
    return 1
```

(The original code also contained a bug in the calculation of the tolerance, t, which was fixed in PR #6.)

This threshold logic is not appropriate for ordinal numbers. For example, the UCI Poker Hand dataset represents card ranks as numbers from 1 to 13. As PERMISSIBLE_DELTA = 0.1, a Queen (12) has a tolerance, t, of 12 * 0.1 = 1.2, which means it would be considered partially similar (1 - 1/1.2 ≈ 0.17) to a Jack (11) or King (13). An Ace (1), however, has a tolerance of only 1 * 0.1 = 0.1, so it wouldn't be considered similar to any other card.
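The asymmetry can be reproduced with a minimal sketch that copies only the `<=` branch of `condition_similarity` into a standalone function operating on bare thresholds. `le_similarity` and the `RANKS` mapping are hypothetical names for illustration; they are not part of the similar_tree module.

```python
PERMISSIBLE_DELTA = 0.1


# Hypothetical helper: the `<=` branch of condition_similarity, without the
# Condition/Operator types, so it can be run on bare thresholds.
def le_similarity(threshold1: float, threshold2: float) -> float:
    t = abs(PERMISSIBLE_DELTA * threshold1)  # tolerance grows with threshold1
    x = abs(threshold1 - threshold2)
    if x == 0:
        return 1
    return 1 - (x / t) if x < t else 0


# Card ranks as encoded in the UCI Poker Hand dataset
RANKS = {"Ace": 1, "Two": 2, "Jack": 11, "Queen": 12, "King": 13}

print(round(le_similarity(RANKS["Queen"], RANKS["Jack"]), 3))  # 0.167
print(round(le_similarity(RANKS["Queen"], RANKS["King"]), 3))  # 0.167
print(le_similarity(RANKS["Ace"], RANKS["Two"]))  # 0
```

Both neighbours of the Queen get a small non-zero similarity, while the Ace scores 0 against the Two even though both pairs are exactly one rank apart.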

The similar_tree module needs to be modified to allow a list of attributes to be treated as ordinal numbers, with the tolerance logic adjusted accordingly. For these attributes, the condition similarity should be 1 if the thresholds represent the same partitioning (e.g. <= 2.0 is the same as <= 2.9, as both split {1, 2} vs {3, 4, ...}), and 0 otherwise.
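One way to express the "same partitioning" rule is to count how many of an attribute's possible values fall on the `<=` side of each threshold: two thresholds produce the same split exactly when those counts match. The sketch below assumes the full set of values for each ordinal attribute is known in advance (e.g. ranks 1 to 13 for the Poker Hand data); `ordinal_threshold_similarity` is a hypothetical helper, not an existing function in the module.

```python
from bisect import bisect_right
from typing import Iterable


def ordinal_threshold_similarity(
    threshold1: float, threshold2: float, ordinal_values: Iterable[float]
) -> int:
    """Return 1 if both thresholds split the ordinal values identically, else 0.

    `ordinal_values` is an assumed input: every value the attribute can take.
    """
    values = sorted(ordinal_values)
    # bisect_right gives the number of values <= threshold, i.e. the size of
    # the left-hand side of the split. `> threshold` selects the complement,
    # so the same comparison also works for two `>` conditions.
    left1 = bisect_right(values, threshold1)
    left2 = bisect_right(values, threshold2)
    return 1 if left1 == left2 else 0


POKER_RANKS = range(1, 14)  # 1 = Ace, ..., 11 = Jack, 12 = Queen, 13 = King

print(ordinal_threshold_similarity(2.0, 2.9, POKER_RANKS))  # 1: both split {1, 2} vs {3, ..., 13}
print(ordinal_threshold_similarity(2.9, 3.0, POKER_RANKS))  # 0: {1, 2} vs {1, 2, 3}
print(ordinal_threshold_similarity(1.0, 1.5, POKER_RANKS))  # 1: the Ace is no longer a special case
```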

Secondly, the code only deals with the case of two <= operators, not two > operators. In the case of two > operators it will return 1 (perfect similarity) even if the thresholds differ.
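A minimal sketch of both fixes together is given here. Because the original `Condition` and `Operator` definitions are elided in the snippet above, it uses hypothetical stand-ins with only the fields the function reads. The `ordinal_attributes` parameter is also hypothetical, and it assumes ordinal attributes take integer values, so a threshold t splits them exactly as floor(t) does (v <= t iff v <= floor(t), and v > t iff v > floor(t)). Applying the same relative tolerance to two `>` conditions as to two `<=` conditions is an assumed fix, not something confirmed by the paper.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import AbstractSet

PERMISSIBLE_DELTA = 0.1


class Operator(Enum):  # hypothetical stand-in for the project's Operator
    LE = "<="
    GT = ">"


@dataclass(frozen=True)
class Condition:  # hypothetical stand-in for the project's Condition
    attribute: str
    operator: Operator
    threshold: float


def condition_similarity(
    condition1: Condition,
    condition2: Condition,
    ordinal_attributes: AbstractSet[str] = frozenset(),  # hypothetical parameter
) -> float:
    # Different attributes or operators: not similar at all
    if condition1.attribute != condition2.attribute:
        return 0
    if condition1.operator != condition2.operator:
        return 0

    # Both conditions now share one operator (<= or >), so the threshold
    # comparison below covers two > operators as well as two <= operators.
    if condition1.attribute in ordinal_attributes:
        # Integer-valued ordinals: equal floors mean the same partitioning
        same_split = math.floor(condition1.threshold) == math.floor(condition2.threshold)
        return 1 if same_split else 0

    # Continuous attributes: tolerance relative to condition1's threshold
    t = abs(PERMISSIBLE_DELTA * condition1.threshold)
    x = abs(condition1.threshold - condition2.threshold)
    if x == 0:
        return 1
    return 1 - (x / t) if x < t else 0


rank = frozenset({"rank"})
print(condition_similarity(Condition("rank", Operator.GT, 2.0), Condition("rank", Operator.GT, 2.9), rank))  # 1
print(condition_similarity(Condition("price", Operator.GT, 10.0), Condition("price", Operator.GT, 20.0)))  # 0
```

The second call returns 0, where the original code returns 1 for any pair of `>` conditions. Keying the tolerance on condition1 keeps the original asymmetry (swapping the arguments can change the result for continuous attributes); that behaviour is carried over unchanged.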
