Added Kulczynski Measure and Imbalance Ratio as quality metrics #882

Open
wants to merge 1 commit into
base: master
1 change: 1 addition & 0 deletions docs/sources/CHANGELOG.md
@@ -21,6 +21,7 @@ The CHANGELOG for the current development version is available at

- The `mlxtend.evaluate.bootstrap_point632_score` now supports `fit_params`. ([#861](https://github.com/rasbt/mlxtend/pull/861))
- The `mlxtend/plotting/decision_regions.py` function now has a `contourf_kwargs` for matplotlib to change the look of the decision boundaries if desired. ([#881](https://github.com/rasbt/mlxtend/pull/881) via [[pbloem](https://github.com/pbloem)])
- The `mlxtend.frequent_patterns.metrics` module now provides the **Kulczynski Measure** and the **Imbalance Ratio** as `kulczynski_measure` and `imbalance_ratio`. ([#840](https://github.com/rasbt/mlxtend/issues/840))

##### Changes

389 changes: 389 additions & 0 deletions docs/sources/user_guide/frequent_patterns/metrics.ipynb
@@ -0,0 +1,389 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluating the Quality of Association Rules"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A strong association rule, that is, one with high support and confidence, may or may not be interesting for a specific application. Additional measures have been developed to help evaluate how interesting a rule is. `mlxtend` implements two such measures: the Kulczynski measure and the imbalance ratio."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Kulczynski Measure:\n",
"\n",
"The Kulczynski measure $K_{A,B}$ can be interpreted as the average of the confidence of $A ⇒ B$ and the confidence of $B ⇒ A$.\n",
"\n",
"The Kulczynski measure $K_{A,B} ∈ [0, 1]$ of the itemsets $A ⊆ I$ and\n",
"$B ⊆ I$ such that $A ∩ B = \\varnothing$, where $I$ is the set of all items, is given by\n",
"\n",
"$$K_{A,B} = \\frac{V_{A⇒B} + V_{B⇒A}}{2}$$\n",
"\n",
"where $V_{A⇒B}$ is the confidence of the rule $A ⇒ B$. Expressed in terms of support, this becomes\n",
"\n",
"$$K_{A,B} = \\frac{1}{2} \\Bigg[\\frac{sup(A \\cup B)}{sup(A)} + \\frac{sup(A \\cup B)}{sup(B)} \\Bigg]$$\n",
"\n",
"- If $K_{A,B} = 0$, then no transaction contains both $A$ and $B$: $A ⊆ T$ implies that $B \\nsubseteq T$ for any transaction $T$\n",
"- If $K_{A,B} = 1$, then $A ⊆ T$ if and only if $B ⊆ T$ for any transaction $T$\n",
"- Note that the Kulczynski measure is symmetric: $K_{A,B} = K_{B,A}$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Imbalance Ratio:\n",
"\n",
"The imbalance ratio $I_{A,B}$ can be interpreted as the absolute difference between the support counts of $A$ and $B$, divided by the number of transactions that contain $A$, $B$, or both.\n",
"\n",
"The imbalance ratio $I_{A,B} ∈ [0, 1]$ of the itemsets $A ⊆ I$ and $B ⊆ I$ is given by\n",
"\n",
"$$I_{A,B} = \\frac{|N_A - N_B|}{N_A + N_B - N_{A \\cup B}}$$\n",
"\n",
"where $N_X$ is the support count of the itemset $X$, i.e. the number of transactions that contain $X$.\n",
"\n",
"- If $I_{A,B} = 0$, then $A$ and $B$ have the same support\n",
"- If $I_{A,B} = 1$, then either $A$ or $B$ has zero support\n",
"- Note that the imbalance ratio is symmetric: $I_{A,B} = I_{B,A}$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## References\n",
"\n",
"[1] Chapter 6 of J. Han, M. Kamber, J. Pei, “Data Mining: Concepts and Techniques”, 3rd edition, Elsevier/Morgan Kaufmann, 2012"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 1 -- Evaluate the Kulczynski Measure of an Association Rule:\n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>antecedents</th>\n",
" <th>consequents</th>\n",
" <th>antecedent support</th>\n",
" <th>consequent support</th>\n",
" <th>support</th>\n",
" <th>confidence</th>\n",
" <th>lift</th>\n",
" <th>leverage</th>\n",
" <th>conviction</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>(Eggs)</td>\n",
" <td>(Kidney Beans)</td>\n",
" <td>0.8</td>\n",
" <td>1.0</td>\n",
" <td>0.8</td>\n",
" <td>1.00</td>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>(Kidney Beans)</td>\n",
" <td>(Eggs)</td>\n",
" <td>1.0</td>\n",
" <td>0.8</td>\n",
" <td>0.8</td>\n",
" <td>0.80</td>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>(Eggs)</td>\n",
" <td>(Onion)</td>\n",
" <td>0.8</td>\n",
" <td>0.6</td>\n",
" <td>0.6</td>\n",
" <td>0.75</td>\n",
" <td>1.25</td>\n",
" <td>0.12</td>\n",
" <td>1.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>(Onion)</td>\n",
" <td>(Eggs)</td>\n",
" <td>0.6</td>\n",
" <td>0.8</td>\n",
" <td>0.6</td>\n",
" <td>1.00</td>\n",
" <td>1.25</td>\n",
" <td>0.12</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>(Milk)</td>\n",
" <td>(Kidney Beans)</td>\n",
" <td>0.6</td>\n",
" <td>1.0</td>\n",
" <td>0.6</td>\n",
" <td>1.00</td>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>(Onion)</td>\n",
" <td>(Kidney Beans)</td>\n",
" <td>0.6</td>\n",
" <td>1.0</td>\n",
" <td>0.6</td>\n",
" <td>1.00</td>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>(Yogurt)</td>\n",
" <td>(Kidney Beans)</td>\n",
" <td>0.6</td>\n",
" <td>1.0</td>\n",
" <td>0.6</td>\n",
" <td>1.00</td>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>(Eggs, Onion)</td>\n",
" <td>(Kidney Beans)</td>\n",
" <td>0.6</td>\n",
" <td>1.0</td>\n",
" <td>0.6</td>\n",
" <td>1.00</td>\n",
" <td>1.00</td>\n",
" <td>0.00</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>(Eggs, Kidney Beans)</td>\n",
" <td>(Onion)</td>\n",
" <td>0.8</td>\n",
" <td>0.6</td>\n",
" <td>0.6</td>\n",
" <td>0.75</td>\n",
" <td>1.25</td>\n",
" <td>0.12</td>\n",
" <td>1.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>(Onion, Kidney Beans)</td>\n",
" <td>(Eggs)</td>\n",
" <td>0.6</td>\n",
" <td>0.8</td>\n",
" <td>0.6</td>\n",
" <td>1.00</td>\n",
" <td>1.25</td>\n",
" <td>0.12</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>(Eggs)</td>\n",
" <td>(Onion, Kidney Beans)</td>\n",
" <td>0.8</td>\n",
" <td>0.6</td>\n",
" <td>0.6</td>\n",
" <td>0.75</td>\n",
" <td>1.25</td>\n",
" <td>0.12</td>\n",
" <td>1.6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>(Onion)</td>\n",
" <td>(Eggs, Kidney Beans)</td>\n",
" <td>0.6</td>\n",
" <td>0.8</td>\n",
" <td>0.6</td>\n",
" <td>1.00</td>\n",
" <td>1.25</td>\n",
" <td>0.12</td>\n",
" <td>inf</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" antecedents consequents antecedent support \\\n",
"0 (Eggs) (Kidney Beans) 0.8 \n",
"1 (Kidney Beans) (Eggs) 1.0 \n",
"2 (Eggs) (Onion) 0.8 \n",
"3 (Onion) (Eggs) 0.6 \n",
"4 (Milk) (Kidney Beans) 0.6 \n",
"5 (Onion) (Kidney Beans) 0.6 \n",
"6 (Yogurt) (Kidney Beans) 0.6 \n",
"7 (Eggs, Onion) (Kidney Beans) 0.6 \n",
"8 (Eggs, Kidney Beans) (Onion) 0.8 \n",
"9 (Onion, Kidney Beans) (Eggs) 0.6 \n",
"10 (Eggs) (Onion, Kidney Beans) 0.8 \n",
"11 (Onion) (Eggs, Kidney Beans) 0.6 \n",
"\n",
" consequent support support confidence lift leverage conviction \n",
"0 1.0 0.8 1.00 1.00 0.00 inf \n",
"1 0.8 0.8 0.80 1.00 0.00 1.0 \n",
"2 0.6 0.6 0.75 1.25 0.12 1.6 \n",
"3 0.8 0.6 1.00 1.25 0.12 inf \n",
"4 1.0 0.6 1.00 1.00 0.00 inf \n",
"5 1.0 0.6 1.00 1.00 0.00 inf \n",
"6 1.0 0.6 1.00 1.00 0.00 inf \n",
"7 1.0 0.6 1.00 1.00 0.00 inf \n",
"8 0.6 0.6 0.75 1.25 0.12 1.6 \n",
"9 0.8 0.6 1.00 1.25 0.12 inf \n",
"10 0.6 0.6 0.75 1.25 0.12 1.6 \n",
"11 0.8 0.6 1.00 1.25 0.12 inf "
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"from mlxtend.preprocessing import TransactionEncoder\n",
"from mlxtend.frequent_patterns import apriori, association_rules\n",
"from mlxtend.frequent_patterns import metrics\n",
"\n",
"# Toy transaction database: one list of items per transaction\n",
"dataset = [['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n",
"           ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],\n",
"           ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],\n",
"           ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],\n",
"           ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs']]\n",
"\n",
"# One-hot encode the transactions into a boolean DataFrame\n",
"te = TransactionEncoder()\n",
"te_ary = te.fit_transform(dataset)\n",
"df = pd.DataFrame(te_ary, columns=te.columns_)\n",
"\n",
"# Mine itemsets with support >= 0.6, then derive rules with confidence >= 0.7\n",
"freq_items = apriori(df, min_support=0.6, use_colnames=True)\n",
"rules = association_rules(freq_items, metric=\"confidence\", min_threshold=0.7)\n",
"rules"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.875"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = frozenset(['Onion'])\n",
"b = frozenset(['Kidney Beans', 'Eggs'])\n",
"metrics.kulczynski_measure(rules, a, b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 2 -- Evaluate the Imbalance Ratio of an Association Rule:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.2500000000000001"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"a = frozenset(['Onion'])\n",
"b = frozenset(['Kidney Beans', 'Eggs'])\n",
"metrics.imbalance_ratio(freq_items, a, b)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.2"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
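
One way to cross-check the two notebook outputs is to recompute both measures directly from the transaction list, using raw support counts instead of the `rules` or `freq_items` DataFrames. The sketch below is not the implementation in this PR: `support_count`, `kulczynski_from_counts`, and `imbalance_ratio_from_counts` are hypothetical helper names introduced only for illustration, and the code assumes that both itemsets occur in at least one transaction, so there is no zero-division handling.

```python
# Plain-Python sanity check of the two formulas from the notebook.
# All helper names here are hypothetical; they are not part of mlxtend.

dataset = [
    ['Milk', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Dill', 'Onion', 'Nutmeg', 'Kidney Beans', 'Eggs', 'Yogurt'],
    ['Milk', 'Apple', 'Kidney Beans', 'Eggs'],
    ['Milk', 'Unicorn', 'Corn', 'Kidney Beans', 'Yogurt'],
    ['Corn', 'Onion', 'Onion', 'Kidney Beans', 'Ice cream', 'Eggs'],
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= set(t))


def kulczynski_from_counts(a, b, transactions):
    """Average of conf(A => B) and conf(B => A).

    Assumes both A and B have non-zero support.
    """
    n_a = support_count(a, transactions)
    n_b = support_count(b, transactions)
    n_ab = support_count(a | b, transactions)
    return 0.5 * (n_ab / n_a + n_ab / n_b)


def imbalance_ratio_from_counts(a, b, transactions):
    """|N_A - N_B| / (N_A + N_B - N_{A u B}).

    Assumes at least one transaction contains A or B.
    """
    n_a = support_count(a, transactions)
    n_b = support_count(b, transactions)
    n_ab = support_count(a | b, transactions)
    return abs(n_a - n_b) / (n_a + n_b - n_ab)


a = frozenset(['Onion'])
b = frozenset(['Kidney Beans', 'Eggs'])

print(kulczynski_from_counts(a, b, dataset))       # 0.875
print(imbalance_ratio_from_counts(a, b, dataset))  # 0.25
```

Here the support counts are 3 for `a`, 4 for `b`, and 3 for their union, so the Kulczynski measure is (3/3 + 3/4) / 2 = 0.875 and the imbalance ratio is |3 - 4| / (3 + 4 - 3) = 0.25, in line with the notebook outputs. Working with integer counts also gives exactly `0.25`, whereas the `0.2500000000000001` in the notebook most likely comes from floating-point arithmetic on the relative supports 0.6 and 0.8.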