diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 00000000..5ac52534
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,60 @@
+# Installing SDMetrics
+
+## Requirements
+
+**SDMetrics** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/)
+
+Also, although it is not strictly required, the usage of a [virtualenv](
+https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
+interfering with other software installed in the system where **SDMetrics** is run.
+
+## Install with pip
+
+The easiest and recommended way to install **SDMetrics** is using [pip](
+https://pip.pypa.io/en/stable/):
+
+```bash
+pip install sdmetrics
+```
+
+This will pull and install the latest stable release from [PyPi](https://pypi.org/).
+
+## Install with conda
+
+**SDMetrics** can also be installed using [conda](https://docs.conda.io/en/latest/):
+
+```bash
+conda install -c sdv-dev -c conda-forge sdmetrics
+```
+
+This will pull and install the latest stable release from [Anaconda](https://anaconda.org/).
+
+## Install from source
+
+If you want to install **SDMetrics** from source, you need to first clone the repository
+and then execute the `make install` command inside the `stable` branch. Note that this
+command works only on Unix-based systems like GNU/Linux and macOS:
+
+```bash
+git clone https://github.com/sdv-dev/SDMetrics
+cd SDMetrics
+git checkout stable
+make install
+```
+
+## Install for development
+
+If you intend to modify the source code or contribute to the project you will need to
+install it from source using the `make install-develop` command. In this case, we
+recommend branching from `master` first:
+
+```bash
+git clone git@github.com:sdv-dev/SDMetrics
+cd SDMetrics
+git checkout master
+git checkout -b <name-of-your-branch>
+make install-develop
+```
+
+For more details about how to contribute to the project please visit the [Contributing Guide](
+CONTRIBUTING.rst).
diff --git a/MANIFEST.in b/MANIFEST.in
index 469520f5..c1d5fd0b 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -3,6 +3,7 @@
 include CONTRIBUTING.rst
 include HISTORY.md
 include LICENSE
 include README.md
+include sdmetrics/demos/*.pkl
 recursive-include tests *
 recursive-exclude * __pycache__
diff --git a/README.md b/README.md
index c5dbe790..071989e2 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@

-“DAI-Lab”
-An open source project from Data to AI Lab at MIT.
+DAI-Lab
+An Open Source Project from the Data to AI Lab, at MIT

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) @@ -9,218 +11,126 @@ [![Tests](https://github.com/sdv-dev/SDMetrics/workflows/Run%20Tests/badge.svg)](https://github.com/sdv-dev/SDMetrics/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster) [![Coverage Status](https://codecov.io/gh/sdv-dev/SDMetrics/branch/master/graph/badge.svg)](https://codecov.io/gh/sdv-dev/SDMetrics) -

- -

+ Metrics for Synthetic Data Generation Projects +* Website: https://sdv.dev +* Documentation: https://sdv.dev/SDV +* Repository: https://github.com/sdv-dev/SDMetrics * License: [MIT](https://github.com/sdv-dev/SDMetrics/blob/master/LICENSE) * Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha) -* Documentation: https://sdv-dev.github.io/SDMetrics -* Homepage: https://github.com/sdv-dev/SDMetrics # Overview -The **SDMetrics** library provides a set of **dataset-agnostic tools** for evaluating the **quality of a synthetic database** by comparing it to the real database that it is modeled after. It includes a variety of metrics such as: +The **SDMetrics** library provides a set of **dataset-agnostic tools** for evaluating the **quality +of a synthetic database** by comparing it to the real database that it is modeled after. - - **Statistical metrics** which use statistical tests to compare the distributions of the real and synthetic distributions. - - **Detection metrics** which use machine learning to try to distinguish between real and synthetic data. - - **Descriptive metrics** which compute descriptive statistics on the real and synthetic datasets independently and then compare the values. +It supports multiple data modalities: -# Install +* **Single Columns**: Compare 1 dimensional `numpy` arrays representing individual columns. +* **Column Pairs**: Compare how columns in a `pandas.DataFrame` relate to each other, in groups of 2. +* **Single Table**: Compare an entire table, represented as a `pandas.DataFrame`. +* **Multi Table**: Compare multi-table and relational datasets represented as a python `dict` with + multiple tables passed as `pandas.DataFrame`s. +* **Time Series**: Compare tables representing ordered sequences of events. + +It includes a variety of metrics such as: -## Requirements +* **Statistical metrics** which use statistical tests to compare the distributions of the real + and synthetic distributions. +* **Detection metrics** which use machine learning to try to distinguish between real and synthetic data. +* **Efficacy metrics** which compare the performance of machine learning models when run on the synthetic and real data. +* **Bayesian Network and Gaussian Mixture metrics** which learn the distribution of the real data + and evaluate the likelihood of the synthetic data belonging to the learned distribution. +* **Privacy metrics** which evaluate whether the synthetic data is leaking information about the real data. -**SDMetrics** has been developed and tested on [Python 3.6, 3.7 and 3.8](https://www.python.org/downloads/) +# Install -Also, although it is not strictly required, the usage of a [virtualenv]( -https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid -interfering with other software installed in the system where **SDMetrics** is run. +**SDMetrics** is part of the **SDV** project and is automatically installed alongside it. For +details about this process please visit the [SDV Installation Guide]( +https://sdv.dev/SDV/getting_started/install.html) -## Install with pip +Optionally, **SDMetrics** can also be installed as a standalone library using the following commands: -The easiest and recommended way to install **SDMetrics** is using [pip]( -https://pip.pypa.io/en/stable/): +**Using `pip`:** ```bash pip install sdmetrics ``` -This will pull and install the latest stable release from [PyPi](https://pypi.org/). 
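As a quick illustration of the data modalities listed above, the sketch below (not part of this diff, using made-up toy values) shows the kind of object each modality expects as input:

```python3
import numpy as np
import pandas as pd

# Single Columns: a 1-dimensional numpy array per column.
real_column = np.array([0.1, 0.5, 0.9])

# Column Pairs / Single Table: a pandas.DataFrame
# (2 columns for a pair, all columns for a full table).
real_table = pd.DataFrame({
    'age': [23, 35, 41],
    'salary': [40000, 55000, 62000],
})

# Multi Table: a python dict mapping table names to pandas.DataFrames.
real_tables = {
    'users': pd.DataFrame({'user_id': [0, 1], 'country': ['US', 'ES']}),
    'sessions': pd.DataFrame({'session_id': [10, 11], 'user_id': [0, 1]}),
}
```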
- -If you want to install from source or contribute to the project please read the -[Contributing Guide](https://sdv-dev.github.io/SDMetrics/contributing.html#get-started). - -## Install with conda - -**SDMetrics** can also be installed using [conda](https://docs.conda.io/en/latest/): +**Using `conda`:** ```bash conda install -c sdv-dev -c conda-forge sdmetrics ``` -This will pull and install the latest stable release from [Anaconda](https://anaconda.org/). - - -# Basic Usage - -Let's run the demo code from **SDV** to generate a simple synthetic dataset: - -```python3 -from sdv import load_demo, SDV - -metadata, real_tables = load_demo(metadata=True) - -sdv = SDV() -sdv.fit(metadata, real_tables) - -synthetic_tables = sdv.sample_all(20) -``` - -Now that we have a synthetic dataset, we can evaluate it using **SDMetrics** by calling the `evaluate` function which returns an instance of `MetricsReport` with the default metrics: - -```python3 -from sdmetrics import evaluate - -report = evaluate(metadata, real_tables, synthetic_tables) -``` - -## Examining Metrics - -This `report` object makes it easy to examine the metrics at different levels of granularity. For example, the `overall` method returns a single scalar value which functions as a composite score combining all of the metrics. This score can be passed to an optimization routine (i.e. to tune the hyperparameters in a model) and minimized in order to obtain higher quality synthetic data. - -```python3 -print(report.overall()) -``` - -In addition, the `report` provides a `highlights` method which identifies the worst performing metrics. This provides useful hints to help users identify where their synthetic data falls short (i.e. which tables/columns/relationships are not being modeled properly). - -```python3 -print(report.highlights()) -``` - -## Visualizing Metrics +For more installation options please visit the [SDMetrics installation Guide](INSTALL.md) -Finally, the `report` object provides a `visualize` method which generates a figure showing some of the key metrics. - -```python3 -figure = report.visualize() -figure.savefig("sdmetrics-report.png") -``` +# Usage -

- -

+**SDMetrics** is included as part of the framework offered by SDV to evaluate the quality of +your synthetic dataset. For more details about how to use it please visit the corresponding +User Guides: -# Advanced Usage +* [Evaluating Single Table Data](https://sdv.dev/SDV/user_guides/single_table/evaluation.html) +* Evaluating Multi Table Data (Coming soon) +* Evaluating Time Series Data (Coming soon) -## Specifying Metrics +## Standalone usage -Instead of running all the default metrics, you can specify exactly what metrics you -want to run by creating an empty `MetricsReport` and adding the metrics yourself. For -example, the following code only computes the machine learning detection-based metrics. +**SDMetrics** can also be used as a standalone library to run metrics individually. -The `MetricsReport` object includes a `details` method which returns all of the -metrics that were computed. +In this short example we show how to use it to evaluate a toy multi-table dataset and its +synthetic replica by running all the compatible multi-table metrics on it: ```python3 -from sdmetrics import detection -from sdmetrics.report import MetricsReport - -report = MetricsReport() -report.add_metrics(detection.metrics(metadata, real_tables, synthetic_tables)) -``` - -## Creating Metrics - -Suppose you want to add some new metrics to this library. To do this, you simply -need to write a function which yields instances of the `Metric` object: - -```python3 -from sdmetrics.report import Metric - -def my_custom_metrics(metadata, real_tables, synthetic_tables): - name = "abs-diff-in-number-of-rows" - - for table_name in metadata.get_tables(): - - # Absolute difference in number of rows - nb_real_rows = len(real_tables[table_name]) - nb_synthetic_rows = len(synthetic_tables[table_name]) - value = float(abs(nb_real_rows - nb_synthetic_rows)) +import sdmetrics - # Specify some useful tags for the user - tags = set([ - "priority:high", - "table:%s" % table_name - ]) +# Load the demo data, which includes: +# - A dict containing the real tables as pandas.DataFrames. +# - A dict containing the synthetic clones of the real data. +# - A dict containing metadata about the tables. +real_data, synthetic_data, metadata = sdmetrics.load_demo() - yield Metric(name, value, tags) -``` - -To attach your metrics to a `MetricsReport` object, you can use the `add_metrics` -method and provide your custom metrics iterator: - -```python3 -from sdmetrics.report import MetricsReport +# Obtain the list of multi table metrics, which is returned as a dict +# containing the metric names and the corresponding metric classes. +metrics = sdmetrics.multi_table.MultiTableMetric.get_subclasses() -report = MetricsReport() -report.add_metrics(my_custom_metrics(metadata, real_tables, synthetic_tables)) +# Run all the compatible metrics and get a report +sdmetrics.compute_metrics(metrics, real_data, synthetic_data, metadata=metadata) ``` -See `sdmetrics.detection`, `sdmetrics.efficacy`, and `sdmetrics.statistical` for -more examples of how to implement metrics. 
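For running one metric at a time instead of the full collection shown above, a minimal sketch like the following should work, assuming the single-table metrics named in the results table below (for example `KSTest`) are exposed under `sdmetrics.single_table` and follow the `BaseMetric.compute(real_data, synthetic_data)` interface defined in this diff:

```python3
import sdmetrics
from sdmetrics.single_table import KSTest  # assumed import path

# Load the single-table demo: real table, synthetic copy and metadata.
real_data, synthetic_data, metadata = sdmetrics.load_demo(modality='single_table')

# Each metric exposes a static `compute` method that returns a single score.
score = KSTest.compute(real_data, synthetic_data)
print(score)  # bounded by the metric's min_value (0) and max_value (1); goal is MAXIMIZE
```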
+The output will be a table with all the details about the executed metrics and their score: -## Filtering Metrics +| metric | name | score | min_value | max_value | goal | +|------------------------------|-----------------------------------------|------------|-------------|-------------|----------| +| CSTest | Chi-Squared | 0.76651 | 0 | 1 | MAXIMIZE | +| KSTest | Inverted Kolmogorov-Smirnov D statistic | 0.75 | 0 | 1 | MAXIMIZE | +| KSTestExtended | Inverted Kolmogorov-Smirnov D statistic | 0.777778 | 0 | 1 | MAXIMIZE | +| LogisticDetection | LogisticRegression Detection | 0.882716 | 0 | 1 | MAXIMIZE | +| SVCDetection | SVC Detection | 0.833333 | 0 | 1 | MAXIMIZE | +| BNLikelihood | BayesianNetwork Likelihood | nan | 0 | 1 | MAXIMIZE | +| BNLogLikelihood | BayesianNetwork Log Likelihood | nan | -inf | 0 | MAXIMIZE | +| LogisticParentChildDetection | LogisticRegression Detection | 0.619444 | 0 | 1 | MAXIMIZE | +| SVCParentChildDetection | SVC Detection | 0.916667 | 0 | 1 | MAXIMIZE | -The `MetricsReport` object includes a `details` method which returns all of the -metrics that were computed. - -```python3 -from sdmetrics.report import MetricsReport - -report = evaluate(metadata, real_tables, synthetic_tables) -report.details() -``` - -To filter these metrics, you can provide a filter function. For example, to only -see metrics that are associated with the `users` table, you can run +# What's next? -```python3 -def my_custom_filter(metric): - if "table:users" in metric.tags: - return True - return False +If you want to read more about each individual metric, please visit the following folders: -report.details(my_custom_filter) -``` +* Single Column Metrics: [sdmetrics/single_column](sdmetrics/single_column) +* Single Table Metrics: [sdmetrics/single_table](sdmetrics/single_table) +* Multi Table Metrics: [sdmetrics/multi_table](sdmetrics/multi_table) -Examples of standard tags implemented by the built-in metrics are shown below. - - - - - - - - - - - - - - - - - -
-| Tag | Description |
-|-----|-------------|
-| priority:high | This tag tells the user to pay extra attention to this metric. It typically indicates that the objects being evaluated by the metric are unusually bad (i.e. the synthetic values look very different from the real values). |
-| table:TABLE_NAME | This tag indicates that the metric involves the table specified by TABLE_NAME. |
-| column:COL_NAME | This tag indicates that the metric involves the column specified by COL_NAME. If the column names are not unique across the entire database, then it needs to be combined with the table:TABLE_NAME tag to uniquely identify a specific column. |
- -As this library matures, we will define additional standard tags and/or promote them to -first class attributes. +# The Synthetic Data Vault -# What's next? +

+This repository is part of The Synthetic Data Vault Project

-For more details about **SDMetrics** and all its possibilities and features, please check -the [documentation site](https://sdv-dev.github.io/SDMetrics/). +* Website: https://sdv.dev +* Documentation: https://sdv.dev/SDV diff --git a/conda/meta.yaml b/conda/meta.yaml index 2f9f52d6..8acde4fe 100644 --- a/conda/meta.yaml +++ b/conda/meta.yaml @@ -1,5 +1,9 @@ {% set name = 'sdmetrics' %} +<<<<<<< HEAD +{% set version = '0.1.0.dev2' %} +======= {% set version = '0.0.5.dev0' %} +>>>>>>> master package: name: "{{ name|lower }}" diff --git a/resources/visualize.png b/resources/visualize.png new file mode 100644 index 00000000..1d9924cb Binary files /dev/null and b/resources/visualize.png differ diff --git a/sdmetrics/__init__.py b/sdmetrics/__init__.py index 52186c8e..d0e21a98 100644 --- a/sdmetrics/__init__.py +++ b/sdmetrics/__init__.py @@ -4,10 +4,65 @@ __author__ = 'MIT Data To AI Lab' __email__ = 'dailabmit@gmail.com' -__version__ = '0.0.5.dev0' +__version__ = '0.1.0.dev2' -from sdmetrics.evaluation import evaluate +import pandas as pd + +from sdmetrics import ( + column_pairs, demos, goal, multi_table, single_column, single_table, timeseries) +from sdmetrics.demos import load_demo __all__ = [ - 'evaluate' + 'demos', + 'load_demo', + 'goal', + 'multi_table', + 'column_pairs', + 'single_column', + 'single_table', + 'timeseries', ] + + +def compute_metrics(metrics, real_data, synthetic_data, metadata=None, **kwargs): + """Compute a collection of metrics on the given data. + + Args: + metrics (list[sdmetrics.base.BaseMetric]): + Metrics to compute. + real_data: + Data from the real dataset + synthetic_data: + Data from the synthetic dataset + metadata (dict): + Dataset metadata. + **kwargs: + Any additional arguments to pass to the metrics. + + Returns: + pandas.DataFrame: + Dataframe containing the metric scores, as well as information + about each metric such as the min and max values and its goal. + """ + # Only add metadata to kwargs if passed, to stay compatible + # with metrics that do not expect a metadata argument + if metadata is not None: + kwargs['metadata'] = metadata + + scores = [] + for name, metric in metrics.items(): + try: + score = metric.compute(real_data, synthetic_data, **kwargs) + except Exception: + score = None + + scores.append({ + 'metric': name, + 'name': metric.name, + 'score': score, + 'min_value': metric.min_value, + 'max_value': metric.max_value, + 'goal': metric.goal.name, + }) + + return pd.DataFrame(scores) diff --git a/sdmetrics/base.py b/sdmetrics/base.py new file mode 100644 index 00000000..b83212b1 --- /dev/null +++ b/sdmetrics/base.py @@ -0,0 +1,60 @@ +"""BaseMetric class.""" + + +class BaseMetric: + """Base class for all the metrics in SDMetrics. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = None + goal = None + min_value = None + max_value = None + + @classmethod + def get_subclasses(cls, include_parents=False): + """Recursively find subclasses of this metric. + + If `include_parents` is passed as `True`, intermediate child classes + that also have subclasses will be included. Otherwise, only classes + without subclasses will be included to ensure that they are final + implementations and are ready to be run on data. 
+ + Args: + include_parents (bool): + Whether to include subclasses which are parents to + other classes. Defaults to ``False``. + """ + subclasses = dict() + for child in cls.__subclasses__(): + grandchildren = child.get_subclasses(include_parents) + subclasses.update(grandchildren) + if include_parents or not grandchildren: + subclasses[child.__name__] = child + + return subclasses + + @staticmethod + def compute(real_data, synthetic_data): + """Compute this metric. + + Args: + real_data: + The values from the real dataset. + synthetic_data: + The values from the synthetic dataset. + + Returns: + Union[float, tuple[float]]: + Metric output or outputs. + """ + raise NotImplementedError() diff --git a/sdmetrics/column_pairs/__init__.py b/sdmetrics/column_pairs/__init__.py new file mode 100644 index 00000000..875977db --- /dev/null +++ b/sdmetrics/column_pairs/__init__.py @@ -0,0 +1,13 @@ +"""Metrics to compare column pairs.""" + +from sdmetrics.column_pairs import statistical +from sdmetrics.column_pairs.base import ColumnPairsMetric +from sdmetrics.column_pairs.statistical.kl_divergence import ( + ContinuousKLDivergence, DiscreteKLDivergence) + +__all__ = [ + 'statistical', + 'ColumnPairsMetric', + 'ContinuousKLDivergence', + 'DiscreteKLDivergence', +] diff --git a/sdmetrics/column_pairs/base.py b/sdmetrics/column_pairs/base.py new file mode 100644 index 00000000..e101f797 --- /dev/null +++ b/sdmetrics/column_pairs/base.py @@ -0,0 +1,41 @@ +"""Base class for metrics that compare pairs of columns.""" + +from sdmetrics.base import BaseMetric + + +class ColumnPairsMetric(BaseMetric): + """Base class for metrics that compare pairs of columns. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = None + goal = None + min_value = None + max_value = None + + @staticmethod + def compute(real_data, synthetic_data): + """Compute this metric. + + Args: + real_data (pandas.DataFrame): + The values from the real dataset, passed as pandas.DataFrame + with 2 columns. + synthetic_data (pandas.DataFrame): + The values from the synthetic dataset, passed as a + pandas.DataFrame with 2 columns. + + Returns: + Union[float, tuple[float]]: + Metric output. 
+ """ + raise NotImplementedError() diff --git a/sdmetrics/column_pairs/statistical/__init__.py b/sdmetrics/column_pairs/statistical/__init__.py new file mode 100644 index 00000000..c0c86c0f --- /dev/null +++ b/sdmetrics/column_pairs/statistical/__init__.py @@ -0,0 +1,9 @@ +"""Statistical Metrics to compare column pairs.""" + +from sdmetrics.column_pairs.statistical.kl_divergence import ( + ContinuousKLDivergence, DiscreteKLDivergence) + +__all__ = [ + 'ContinuousKLDivergence', + 'DiscreteKLDivergence', +] diff --git a/sdmetrics/column_pairs/statistical/kl_divergence.py b/sdmetrics/column_pairs/statistical/kl_divergence.py new file mode 100644 index 00000000..100160a9 --- /dev/null +++ b/sdmetrics/column_pairs/statistical/kl_divergence.py @@ -0,0 +1,94 @@ +"""ColumnPair metrics based on Kullback–Leibler Divergence.""" + +import numpy as np +import pandas as pd +from scipy.special import kl_div + +from sdmetrics.column_pairs.base import ColumnPairsMetric +from sdmetrics.goal import Goal +from sdmetrics.utils import get_frequencies + + +class ContinuousKLDivergence(ColumnPairsMetric): + """Continuous Kullback–Leibler Divergence based metric. + + This approximates the KL divergence by binning the continuous values + to turn them into categorical values and then computing the relative + entropy. Afterwards normalizes the value applying `1 / (1 + KLD)`. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = 'Continuous Kullback–Leibler Divergence' + goal = Goal.MAXIMIZE + min_value = 0.0 + max_value = 1.0 + + @staticmethod + def compute(real_data, synthetic_data): + """Compare two pairs of continuous columns using Kullback–Leibler Divergence. + + Args: + real_data (pandas.DataFrame): + The values from the real dataset, passed as pandas.DataFrame + with 2 columns. + synthetic_data (pandas.DataFrame): + The values from the synthetic dataset, passed as a + pandas.DataFrame with 2 columns. + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + real_data[pd.isnull(real_data)] = 0.0 + synthetic_data[pd.isnull(synthetic_data)] = 0.0 + column1, column2 = real_data.columns[:2] + + real, xedges, yedges = np.histogram2d(real_data[column1], real_data[column2]) + synthetic, _, _ = np.histogram2d( + synthetic_data[column1], synthetic_data[column2], bins=[xedges, yedges]) + + f_obs, f_exp = synthetic.flatten() + 1e-5, real.flatten() + 1e-5 + f_obs, f_exp = f_obs / np.sum(f_obs), f_exp / np.sum(f_exp) + + return 1 / (1 + np.sum(kl_div(f_obs, f_exp))) + + +class DiscreteKLDivergence(ColumnPairsMetric): + """Discrete Kullback–Leibler Divergence based metric. + + This computes the KL divergence and afterwards normalizes the + value applying `1 / (1 + KLD)`. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. 
+ """ + + name = 'Discrete Kullback–Leibler Divergence' + goal = Goal.MAXIMIZE + min_value = 0.0 + max_value = 1.0 + + @staticmethod + def compute(real_data, synthetic_data): + columns = real_data.columns[:2] + real = real_data[columns].itertuples(index=False) + synthetic = synthetic_data[columns].itertuples(index=False) + + f_obs, f_exp = get_frequencies(real, synthetic) + return 1 / (1 + np.sum(kl_div(f_obs, f_exp))) diff --git a/sdmetrics/constraint/__init__.py b/sdmetrics/constraint/__init__.py deleted file mode 100644 index e290abae..00000000 --- a/sdmetrics/constraint/__init__.py +++ /dev/null @@ -1,41 +0,0 @@ -""" -This module implements constraint checking which makes sure the statistical -properties of the synthetic data match the specified metadata. -""" -from sdmetrics.report import Goal, Metric - - -def metrics(metadata, real_tables, synthetic_tables): - """ - This function takes in (1) a `sdv.Metadata` object which describes a set of - relational tables, (2) a set of "real" tables corresponding to the metadata, - and (3) a set of "synthetic" tables corresponding to the metadata. It yields - a sequence of `Metric` objects. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Yields: - Metric: The next metric. - """ - for table_name in set(real_tables): - key = metadata.get_primary_key(table_name) - for child_name in metadata.get_children(table_name): - child_key = metadata.get_foreign_key(table_name, child_name) - - parent_keys = set(synthetic_tables[table_name][key].values) - child_keys = set(synthetic_tables[child_name][child_key].values) - - yield Metric( - name="foreign-key", - value=float(parent_keys.issuperset(child_keys)), - tags=set([ - "table:%s" % table_name, - "child:%s" % child_name, - ]), - goal=Goal.MAXIMIZE, - unit="binary", - domain=(0.0, 1.0) - ) diff --git a/sdmetrics/datasets/__init__.py b/sdmetrics/datasets/__init__.py deleted file mode 100644 index ffd5fcb1..00000000 --- a/sdmetrics/datasets/__init__.py +++ /dev/null @@ -1,93 +0,0 @@ -""" -This module provides simulated datasets than can be used to experiment with -the SDMetrics library. -""" -import os -from glob import glob - -import pandas as pd -from sdv import Metadata - -_DIR_ = os.path.dirname(__file__) - - -def list_datasets(): - """ - This function returns the list of datasets that are built-in. These - dataset names can be passed to `Dataset.load`. - - Returns: - (List[str]): A list of dataset names. - """ - datasets = [] - for path_to_metadata in glob(os.path.join(_DIR_, "**/metadata.json")): - path_to_dataset = os.path.dirname(path_to_metadata) - dataset_name = os.path.basename(path_to_dataset) - datasets.append(dataset_name) - return datasets - - -class Dataset(): - """ - The Dataset object represents a simulated dataset with metadata, real data, and two - tiers of synthetic data. - - Attributes: - metadata (str): The SDV Metadata object. - tables (Dict[str, DataFrame]): A mapping from table names to the real tables. - lq_synthetic (Dict[str, DataFrame]): A low quality synthetic copy of the tables. - hq_synthetic (Dict[str, DataFrame]): A high quality synthetic copy of the tables. 
- """ - - def __init__(self, metadata, tables, lq_synthetic, hq_synthetic): - self.metadata = metadata - self.tables = tables - self.lq_synthetic = lq_synthetic - self.hq_synthetic = hq_synthetic - - @staticmethod - def load(dataset, is_path=False): - """This function loads a SDMetrics dataset which consists of a metadata - object, a set of real tables, a set of low quality synthetic tables, and - a set of high quality synthetic tables. - - Arguments: - dataset (str): The name of the dataset (or the path to the dataset). - - Returns: - (Dataset): An instance of the Dataset object. - """ - if is_path: - path_to_dataset = dataset - else: - path_to_dataset = os.path.join(_DIR_, dataset) - - metadata = Metadata(os.path.join(path_to_dataset, "metadata.json")) - tables = Dataset._load_tables(os.path.join(path_to_dataset)) - lq_synthetic = Dataset._load_tables(os.path.join(path_to_dataset, "low_quality")) - hq_synthetic = Dataset._load_tables(os.path.join(path_to_dataset, "high_quality")) - return Dataset(metadata, tables, lq_synthetic, hq_synthetic) - - def save(self, path_to_dataset): - """This exports the dataset to disk at the specified directory. - - Arguments: - path_to_dataset (str): The location to store the dataset. - """ - self.metadata.to_json(os.path.join(path_to_dataset, "metadata.json")) - self._save_tables(path_to_dataset, self.tables) - self._save_tables(os.path.join(path_to_dataset, "low_quality"), self.lq_synthetic) - self._save_tables(os.path.join(path_to_dataset, "high_quality"), self.hq_synthetic) - - @staticmethod - def _load_tables(path_to_tables): - tables = {} - for path_to_csv in glob(os.path.join(path_to_tables, "*.csv")): - table_name = os.path.basename(path_to_csv).replace(".csv", "") - tables[table_name] = pd.read_csv(path_to_csv) - return tables - - @staticmethod - def _save_tables(path_to_tables, tables): - for table_name, df in tables.items(): - df.to_csv(os.path.join(path_to_tables, "%s.csv" % table_name), index=False) diff --git a/sdmetrics/datasets/sdstash-st1/high_quality/table1.csv b/sdmetrics/datasets/sdstash-st1/high_quality/table1.csv deleted file mode 100644 index eeeb8f49..00000000 --- a/sdmetrics/datasets/sdstash-st1/high_quality/table1.csv +++ /dev/null @@ -1,101 +0,0 @@ -x --0.08020410394611338 -0.6090531202951258 -0.309882260166061 -0.5139729333762323 -0.8976880310235987 -0.22972502819708412 -0.36315709435313853 -0.9111909023938428 -0.9699357223272856 -0.32150593405742234 -0.8196233428382076 -0.02047156553567677 -0.40862187948101036 -0.5778453433385543 -0.6180389075429159 -0.5826079919450073 -0.7375109820528915 --0.012373415862248383 -0.48926621641252366 -0.6449333376430064 -0.5439133981388247 -0.7259948546943924 -0.6949472450483942 -0.502707140545422 -0.30891044555352065 -1.0601762747371664 -0.4072205165033259 -0.3416642648505211 -0.32251699465202244 -0.17205598155568175 -0.40882479347551726 -0.014488200814539913 -0.9172375065610439 -0.2923748742083112 -0.7216774320087529 --0.027327032631373482 -0.4982073351212599 -0.8379126773768563 -0.11261337974201366 -0.16000297663362 --0.1737723386554403 -0.0456860895177795 -0.8866164238697604 -0.7910541356449283 -0.1680199369444552 -0.13717871482935495 -0.28705111828536356 -0.8747166383440026 -0.8534847506612324 -0.24069582319262972 -0.883520295699166 -0.47663992846399267 -0.33779848581208843 -0.6060689322388315 -0.16462267211968962 -0.3331549975521262 -0.7157682646375476 -0.41432702225078105 -0.8582033215241348 -0.9237385741744489 -0.7655835322258496 -0.6252039729965092 -0.2647583664777314 
--0.11789150222696188 -0.1058444112122237 -0.9491298509935573 --0.050763464863517016 -0.3432035656025237 -0.10117955306788026 -0.47663191746980704 -0.04611018726600537 -0.33621598297058575 -0.9079953499350875 -0.24984749423786107 -0.15196925022333202 -0.4253616613270504 -0.06410383103627748 -0.26711672667799385 -0.6428084473946952 -0.19274108842217186 -0.6011043997847645 -0.9273485866933086 -0.36501625458009307 -0.2770260522538017 -0.7673414304333845 -0.06746065991846965 -0.7924881700188493 -1.0174731222307367 -0.21926186473825535 -1.0774984834450705 -0.8597701056176573 -0.3443035453379035 -0.5108256750710315 -0.08332362449540559 -0.627552717810767 -0.8381720219581053 -0.09810297587970829 -0.3141949847981755 -0.2922753974915657 -0.43515271616776335 diff --git a/sdmetrics/datasets/sdstash-st1/low_quality/table1.csv b/sdmetrics/datasets/sdstash-st1/low_quality/table1.csv deleted file mode 100644 index 8222df15..00000000 --- a/sdmetrics/datasets/sdstash-st1/low_quality/table1.csv +++ /dev/null @@ -1,101 +0,0 @@ -x --1.0028855018221776 -0.32310058139240205 -1.9687792192465965 --0.7318563807119592 -0.29609369169372746 --0.14516650488420102 -1.8777073688869197 -0.5192111030092026 -1.0244136684001584 -0.869389829567331 -0.7350982923149781 --0.7564449941741488 -1.917947434141636 --0.9166526398471938 -0.24986490494711844 -1.0081254578479846 --0.2989381543902381 --0.34062614193113316 -1.390593872714673 -0.38052100542671563 --0.23937052284919869 -1.6289747376199915 -1.0384163344559862 -0.09397264247659554 -1.2890084042812717 -0.5529951771856968 --0.8606689396578348 --2.208308417364486 --0.36825513324115455 --0.6744440655483699 -1.3447447534568007 -0.20020911725660595 -1.936251341558658 --0.876964572602337 -1.4037561072338134 -0.34896100324861423 -0.5111657187540306 -0.6687841768433829 -2.0263745616374362 -0.502046839153131 -0.6724428934469765 -1.718255688131618 --0.3793950699862968 -2.14274555861384 -1.3549623289241681 -1.5078688440911312 -0.9752563390099068 --0.7433552196407304 --0.6350340075909156 -0.8744003446124756 -0.47208117427463536 -0.7298307323752585 -0.990895506761392 -0.23199430093096352 --1.1684943039579503 --0.9255481818549581 -1.3072475336503913 --1.5825150287556942 -0.2805695653376954 -0.24673099247052732 -0.9873256467923439 -2.405763202861248 -0.4569349697632572 -2.2066721293678064 -0.8261547526979678 --0.6909242435970869 -1.444910853974509 -0.1344633407944862 -1.0638918843303888 -1.3150914545909318 -2.181973187295058 -0.2860062298759013 --0.5984572666070119 --0.07635500936045772 --0.3533693652424704 --1.0893338218079538 -0.10998082639241596 -0.4440262472584392 --0.5460883744223078 -0.36134533402722513 --0.12184814848551184 -0.6013637083454997 -1.1207246392933312 -1.4950144572425157 --1.0256071357342942 -1.5765572241415424 -0.7612758950547746 -0.0041309281320912605 --0.4762205958836504 -0.21618299096306637 -1.5800973146448265 --0.6416867828796371 -2.5290397473690014 -0.4899191354618346 -0.11373045210905602 -0.9684668152220431 -0.5193780899356597 -0.6772912897186418 --0.5876686296567976 -0.5546179080802535 diff --git a/sdmetrics/datasets/sdstash-st1/make.py b/sdmetrics/datasets/sdstash-st1/make.py deleted file mode 100644 index c6d64337..00000000 --- a/sdmetrics/datasets/sdstash-st1/make.py +++ /dev/null @@ -1,30 +0,0 @@ -import os - -import numpy as np -import pandas as pd -from sdv import Metadata - -from sdmetrics.datasets import Dataset - -size = 100 -tables = { - "table1": pd.DataFrame({ - "x": np.random.random(size=size) - }) -} -lq_synthetic = { - "table1": pd.DataFrame({ - "x": 
np.random.random(size=size) + np.random.normal(size=size) - }) -} -hq_synthetic = { - "table1": pd.DataFrame({ - "x": np.random.random(size=size) + np.random.normal(size=size) / 10.0 - }) -} - -metadata = Metadata() -for table_name, df in tables.items(): - metadata.add_table(table_name, data=df) -dataset = Dataset(metadata, tables, lq_synthetic, hq_synthetic) -dataset.save(os.path.dirname(__file__)) diff --git a/sdmetrics/datasets/sdstash-st1/metadata.json b/sdmetrics/datasets/sdstash-st1/metadata.json deleted file mode 100644 index 1b2cf65d..00000000 --- a/sdmetrics/datasets/sdstash-st1/metadata.json +++ /dev/null @@ -1,12 +0,0 @@ -{ - "tables": { - "table1": { - "fields": { - "x": { - "type": "numerical", - "subtype": "float" - } - } - } - } -} \ No newline at end of file diff --git a/sdmetrics/datasets/sdstash-st1/table1.csv b/sdmetrics/datasets/sdstash-st1/table1.csv deleted file mode 100644 index b2e62956..00000000 --- a/sdmetrics/datasets/sdstash-st1/table1.csv +++ /dev/null @@ -1,101 +0,0 @@ -x -0.5152976895263293 -0.20644500012756972 -0.7144737719791164 -0.7639552741273046 -0.789847852301893 -0.2690052966840618 -0.35401808415609737 -0.7730018785485335 -0.5242542965230864 -0.57552502285943 -0.6867018105672488 -0.1707965139545985 -0.4205276821905962 -0.8939370743324755 -0.1551604187223118 -0.10269048881617382 -0.06780540585018746 -0.0014658023142801735 -0.8567495214246276 -0.507993721095559 -0.6762948786894512 -0.244424506375866 -0.6457488218201497 -0.8139312974324804 -0.7150543449571003 -0.022813718922299442 -0.4318661468613413 -0.7487931870768174 -0.21956065828961868 -0.10839409211675155 -0.6522055706470374 -0.488205747892544 -0.2194792771991444 -0.7626715652446779 -0.9262971509669802 -0.8775373455155924 -0.3312469210671408 -0.5906783012848273 -0.9071011384886144 -0.6252155935929389 -0.8432101757954332 -0.8420730670780202 -0.20164355356298613 -0.960118119875518 -0.7983818296917689 -0.49726793957763804 -0.9469495973484978 -0.7404162016141373 -0.3092567832592623 -0.626958221677592 -0.10025486566366293 -0.1827777871925267 -0.003404620478615783 -0.7211448794904967 -0.7139448519649153 -0.00436520519280903 -0.19949233170228242 -0.03376012081050683 -0.7065605156884 -0.57687636985962 -0.31765521779989536 -0.38797308905760497 -0.6654624032644878 -0.7865896250613191 -0.90654740678947 -0.3304260147629695 -0.8916836682648523 -0.04552782095820096 -0.12776026680102737 -0.19150825460504017 -0.8627373685001837 -0.24156222246576708 -0.2074285782081825 -0.708727233369215 -0.6902669149373415 -0.8761296190034704 -0.6516664910630053 -0.4168431932463742 -0.07311031739908636 -0.2831830545635623 -0.5233986767586314 -0.15539684581562896 -0.8954964717706625 -0.6265986514708847 -0.7596470439542783 -0.5076657217168931 -0.9226286664255998 -0.015053312786867723 -0.40350371852549083 -0.5584868673639922 -0.9410505154695019 -0.5862482302570781 -0.07210023368807505 -0.30512047502503514 -0.7349629867217586 -0.7139076362301929 -0.2687352423961783 -0.159264389495834 -0.9620005057179593 -0.6829580488076438 diff --git a/sdmetrics/datasets/sdstash-st2/high_quality/table1.csv b/sdmetrics/datasets/sdstash-st2/high_quality/table1.csv deleted file mode 100644 index 89ff95fc..00000000 --- a/sdmetrics/datasets/sdstash-st2/high_quality/table1.csv +++ /dev/null @@ -1,101 +0,0 @@ -x,y -0.18177958585223764,10.683019701938639 -0.7735030859222969,9.4894302490051 -0.7770302631648719,10.226890981632598 -0.3744840366848735,12.33840604310064 -0.5407195992425569,9.455044219378664 -0.37197777762861034,7.99162712231625 
-0.9780202589337507,10.929597493763248 -0.5566511139842961,9.992564846898564 --0.2347942541288341,10.390142188937515 -0.9556252804374838,10.272413067484209 -0.16266601587468768,9.204194154214285 -0.7088561595664351,10.674712867871156 -0.045627627687826106,10.819642843866214 -0.8321158451535884,10.034860859435463 -0.2982094587418628,9.715429481260893 --0.031202337361403265,10.051145412713069 -0.7669908183681662,8.997187227505874 -0.334052725643825,10.927178589509655 -1.052316124864032,8.953841997845634 -0.7009804074666012,12.01005930636321 -0.38895537215202175,10.565300357306597 -0.019957619734458885,9.45965838013841 -0.4950152778339895,9.062561385233355 --0.003703361804610103,7.590327557301075 -0.0026792013959059183,8.357545802638374 -0.7664097680585865,9.993649040232205 -0.0058697750427547934,10.509884698439908 -0.5040442244933349,10.828596571649681 --0.03763793858350217,10.057205853259156 -0.17674679356935183,9.971415977297243 -0.6022956803781814,11.339044770080253 -0.40415301401972664,10.488698521028343 -0.702494102694077,9.126833480561306 -0.8564358109258526,9.48702217399043 -0.5033441396301086,10.467870998857896 -0.7793574299538046,10.610048114171649 -0.26119460589558086,10.51988703553944 --0.025467178066053628,11.59744723016837 -0.12159416023702625,8.857426574025672 -0.4081696710004204,10.23501280212778 -0.8499627661447868,9.629559211972806 --0.03354342717155072,10.152082647667509 -0.4355387131195128,9.112330132136945 -0.50074860439432,11.557151646716543 -0.13993944701073896,10.509806701397931 -0.2860263135547731,7.215787472850548 -0.24486837476371542,9.176730218849317 -0.43733182543397087,9.109980918454204 -1.0847299046995078,10.153028666657969 -0.5684080701820691,10.819082138909375 -0.14364219089588431,9.889918079977432 -0.7293991790632872,7.763795245060157 -0.9230029882708531,11.214872714171525 -0.9852365588080579,11.742365600211523 -0.9084468373820045,11.586692201736367 -0.4023622627639177,10.72311950276763 -0.7332485654447655,9.601164962298604 -0.8764951901409563,9.524170304919943 -0.30652090910808216,9.275711310283452 -0.39332635000445443,10.23700938689483 -0.19879034668330015,10.038104008984433 -0.42027786027361846,10.961553125323238 -0.8866807516543147,10.324566483471099 -0.2273716415564138,10.929901745978885 -0.2877773466220368,11.115581716018859 -0.5591260685252668,11.195548812235984 -0.9468874813842892,9.74832579113569 -0.16707403113831454,10.59691811016325 -0.909103629418907,9.62600428311208 -0.9398173340984431,11.105971982262925 -0.5356889162379349,9.497482552847046 -1.0712978058838283,10.147981304941293 -0.6695948745521602,10.459750843543535 -0.3976646398730438,10.0660145752112 -0.6725010847808285,7.858852018104441 -0.9307545744271719,9.848901829294661 -0.21938057663012406,10.5219878589092 -0.17860382308189737,9.783417880628198 -0.44421509995373015,10.498066329476737 -0.7629683879234429,9.633176400139341 -0.5783066738354545,9.79970372592637 -0.3152364286244593,10.585144590891822 -0.15830470766628638,8.751896481864403 -0.3947603743610643,9.706893936120437 -0.3102095947446845,10.690377749489441 -0.759163939934386,10.604856313300141 -0.8601175493942642,10.338010167637298 -0.4932135535027428,8.720689879248283 -0.09034619412442896,12.191400112444077 -0.6848407053917385,8.817004246618101 -0.10602349924486691,12.236560700080817 -0.3006150011177074,10.678997523565823 -0.9009539952666095,10.357721521634051 -1.1708973211912732,8.956963407311623 -0.9940784400252335,9.063102077362831 -0.707475634713587,8.370083518472336 -0.6162162598610572,11.13507885088745 
-0.2361527729837166,8.76695152296602 -0.2464818478441217,10.961664289604117 -1.238889921628858,10.654990953636947 diff --git a/sdmetrics/datasets/sdstash-st2/low_quality/table1.csv b/sdmetrics/datasets/sdstash-st2/low_quality/table1.csv deleted file mode 100644 index 74dbe3d9..00000000 --- a/sdmetrics/datasets/sdstash-st2/low_quality/table1.csv +++ /dev/null @@ -1,101 +0,0 @@ -x,y -2.6666629940706397,10.914135436476322 -2.1625565825851365,8.993907223689314 --0.2443626913393957,13.629303041892296 -0.11556426290779531,11.393535685803482 --0.30286247619140294,9.235755780451381 -2.586938938983626,9.505056546342598 -1.2253399602132982,10.184540873431896 -1.9444976939863685,11.019837009831344 -0.2664519512660953,8.860723808780108 --0.11627358183268072,11.069348238292088 -1.8739780533158785,11.828571390944216 -1.4158040643258087,13.087668892385146 -0.7648652310730384,8.90890729194428 -0.9023443826464551,11.154357934914303 -1.001718346161879,8.689365511388067 -0.6316676360034106,10.352317355577803 -2.013199505405496,8.96309711661918 -0.2632843486380005,11.618506256768349 --0.4378224946898327,10.684420324611223 --0.15771683770246536,10.082300183627213 -0.10050260061171545,10.36034813146827 --0.46578058620776275,9.79247232152277 --0.8811055515972618,10.559504752526474 -1.3619214025927189,10.110361434953981 -0.561735384128764,7.380107262818184 -0.3113706752654969,10.55974637442981 -0.3981992901775735,10.597097167826538 -0.11602958919567385,8.220283700302744 -0.8725322573605846,13.253940450617218 -1.2723609183492446,9.869764377935669 -2.1017905240958012,9.171766839421142 --0.009450390655535101,7.279416831800637 -0.04403087074687924,11.071865342492746 --0.8325293473849246,11.27872635906433 -1.7144116149831685,6.829922614024722 -2.1099937707372636,10.699875385031364 -1.5722480268685868,10.57927165397142 -0.7219242355209121,10.338438042283887 -0.5537383950840872,9.545834300903008 -2.0250407592919832,9.982889229425405 -0.03690442311495917,8.84658078315291 -0.10503051875853181,11.103803653035136 -0.34089137164248173,10.149824559361388 -0.6402490005584835,8.132581403589933 -1.063669862306523,9.339231037407655 --0.345885499592925,8.609167413072909 -0.23103462053232432,10.119031956284823 -1.1395658691817163,8.418844657916846 -0.6803305999171179,8.828545006795878 -1.1343523124147028,11.543087677086643 -0.027313341896598953,11.608484818786096 -2.889602626761082,11.067089799926663 -1.3264596471817323,10.753884807601137 --0.7976891200332498,10.808781126675031 -1.9763206642342666,9.55337397358415 --0.995009090700237,10.260272776239722 --0.09420823892019373,10.944320486423342 -1.5201086353237114,11.732264406933627 -0.18428285367418196,13.203074699874309 -0.0451950815561033,9.425600098323404 -0.39507718652628354,9.434269107495016 -1.4991177783675769,9.66648705254497 -1.624772277831238,8.821993964083694 -0.5728405896909342,10.244460640346418 -1.093711826918214,11.716217310178074 -1.0761417667320596,9.755287922686144 -2.660144748739863,10.559341447018745 -0.44524081498945817,9.04095107736294 --1.054014382510209,8.996237924091941 -1.128543862929818,10.31615434009451 -0.8797469852643894,9.693830747580071 --0.6870040170729148,12.207696240797842 --0.849137854561037,9.990953364250283 -0.22853658905872676,9.840659644486877 -0.15739214736960605,11.22343366282503 -0.09847616920111628,10.18956082561234 -3.0273757553093765,10.663072315107305 --0.45156493755242133,9.596475760491781 -0.2479187265410253,10.421936151419954 -1.1328495807136052,10.087705251114334 -0.862571644061165,12.034819724622523 
-0.9516166434908476,9.2573750275016 --1.4873341788641101,9.904601056720121 -1.5842076534484097,9.30570624083148 -0.08425369547484651,9.604106383237399 -1.5715227828475484,11.249359129988504 -1.2967415684010826,12.311951636613031 -0.4720578515755164,8.59096454184605 -2.47168716837107,10.585122021934316 -0.22136466788583586,11.219148744001146 -0.6161587213547804,12.784808242005509 -1.5858071275835597,7.983756258079236 -2.5975263128186104,10.25998396682012 -1.1530988457465736,12.05521504271848 --0.8797468460353832,8.593702703839737 -0.5056320139545794,9.611146744468542 --1.5035134084135975,10.651627930642322 --0.617584179652297,9.313277141264404 -1.5070805595146815,8.164642128362354 --0.21150651546652288,10.428417361092693 diff --git a/sdmetrics/datasets/sdstash-st2/make.py b/sdmetrics/datasets/sdstash-st2/make.py deleted file mode 100644 index aaa30212..00000000 --- a/sdmetrics/datasets/sdstash-st2/make.py +++ /dev/null @@ -1,33 +0,0 @@ -import os - -import numpy as np -import pandas as pd -from sdv import Metadata - -from sdmetrics.datasets import Dataset - -size = 100 -tables = { - "table1": pd.DataFrame({ - "x": np.random.random(size=size), - "y": np.random.normal(size=size, loc=10.0) - }) -} -lq_synthetic = { - "table1": pd.DataFrame({ - "x": np.random.random(size=size) + np.random.normal(size=size), - "y": np.random.normal(size=size, loc=10.0) + np.random.normal(size=size) - }) -} -hq_synthetic = { - "table1": pd.DataFrame({ - "x": np.random.random(size=size) + np.random.normal(size=size) / 10.0, - "y": np.random.normal(size=size, loc=10.0) + np.random.normal(size=size) / 10.0 - }) -} - -metadata = Metadata() -for table_name, df in tables.items(): - metadata.add_table(table_name, data=df) -dataset = Dataset(metadata, tables, lq_synthetic, hq_synthetic) -dataset.save(os.path.dirname(__file__)) diff --git a/sdmetrics/datasets/sdstash-st2/metadata.json b/sdmetrics/datasets/sdstash-st2/metadata.json deleted file mode 100644 index ca3b4687..00000000 --- a/sdmetrics/datasets/sdstash-st2/metadata.json +++ /dev/null @@ -1,16 +0,0 @@ -{ - "tables": { - "table1": { - "fields": { - "y": { - "type": "numerical", - "subtype": "float" - }, - "x": { - "type": "numerical", - "subtype": "float" - } - } - } - } -} \ No newline at end of file diff --git a/sdmetrics/datasets/sdstash-st2/table1.csv b/sdmetrics/datasets/sdstash-st2/table1.csv deleted file mode 100644 index 449eb9bd..00000000 --- a/sdmetrics/datasets/sdstash-st2/table1.csv +++ /dev/null @@ -1,101 +0,0 @@ -x,y -0.6611774877075522,10.227423128091793 -0.8915849685675971,10.772189332288878 -0.8290395145142145,11.491181350898495 -0.3378410195000735,7.980120775285399 -0.5550817414131906,9.950532215475198 -0.26606016707932234,9.368317781014715 -0.16328159082779803,8.155772729814885 -0.23809896286472176,9.668850087777901 -0.4317329587798131,10.935559216184908 -0.2808800517283242,11.00276891532982 -0.3155466516632074,10.04522703805313 -0.2875216493026641,9.10123303384881 -0.4294838680267854,10.21021462931048 -0.5628968136257334,10.040681484936194 -0.486014494531281,11.837616076066734 -0.8532721608357607,10.207384545979261 -0.6289801396457932,10.276216648835957 -0.4002226152114927,11.422868461672781 -0.8428609891131289,8.764392269287645 -0.33182670783427715,8.213251869395284 -0.5602332878155968,10.564429432237079 -0.7327129137442727,10.528211856674796 -0.5239330253017592,7.889581080463904 -0.711827684652229,10.615694183063185 -0.5528102672943702,10.872174923152528 -0.34664118666549715,6.2088048308705766 -0.9897167867803034,9.232466992976281 
-0.7620139228284653,8.879989238828818 -0.21283546213209292,10.215786139632455 -0.5160455780162494,10.622484165823925 -0.20024673288084494,8.90306526754825 -0.6662065416045947,9.88860311165576 -0.01183718015146873,10.138574007493371 -0.5971587716121722,12.048068308688132 -0.18805686323258264,8.785872416477474 -0.889010669091212,10.683812214712594 -0.627392932231121,8.516263180641939 -0.629209011279374,9.389568423172381 -0.25127210015655566,10.091578689387774 -0.3883035518325396,10.185998482407848 -0.9542220157691558,9.685080997965144 -0.5435299976972922,11.232785542663963 -0.14160855596431865,10.362662715815361 -0.43834591749656693,7.76379760011946 -0.04187742141266648,11.407905270110032 -0.8789775029666745,9.439616545205837 -0.6736345808996088,11.308257140051237 -0.6085236522918531,10.544818170702312 -0.885347147577855,11.012348726548932 -0.4204834797524887,11.028407172453843 -0.8147408498896069,9.052015493698956 -0.13593776950374947,10.302983636855517 -0.019535612589182794,8.008729527290907 -0.5542446109950969,8.623302849511996 -0.6423478070173537,9.607332071691271 -0.7832152365993793,9.655009001449258 -0.5076334655362357,10.406800901414544 -0.3910963676800008,9.768150486301096 -0.6800494924897732,12.088785938894196 -0.9113161049553626,10.253956060844471 -0.02205341480993772,8.936667866336261 -0.32277879650840946,11.288728859506259 -0.9577901969225755,8.746379532953718 -0.2796866207240132,10.411509715998003 -0.8182362332113298,8.875128743238182 -0.9378200745600737,10.557787268858531 -0.16612951637160323,10.542351504585445 -0.8696400182945767,9.809656862095238 -0.6449930635530672,10.350618628950844 -0.9112570821730007,10.282324504774461 -0.6682447018054063,11.876404565626038 -0.8108055864967738,9.28109809539602 -0.9552367341820432,11.462283874980987 -0.7083208066623619,11.004627177378293 -0.027747587028684184,10.713523630750577 -0.8068375623915376,8.736490473002787 -0.6155460660005723,9.933660355280953 -0.1988389700939981,11.511938745482302 -0.8893648839500184,9.57206171213619 -0.631880370947405,11.77922637743399 -0.06592730201514296,9.59702511413633 -0.9285027582665322,10.0923949082146 -0.36669917913537464,10.158540140381414 -0.8108789570084657,9.942674794753058 -0.26988938186262335,11.17264557931026 -0.3692926397512978,10.16313290313457 -0.7049532847867912,9.507700861477671 -0.45315266055943604,11.111138376027352 -0.8200705706374034,9.629850852871161 -0.5178569467393328,9.680725948342417 -0.07888425671707189,7.742132894858115 -0.5015813223660738,9.48048095156824 -0.8279342846570298,11.518310400784733 -0.016250007875908246,11.039300935625068 -0.4998441553305316,9.313925515858422 -0.6173241812155247,10.218383495482948 -0.9138518740240682,8.780376927785559 -0.31128630097476495,9.47358218021993 -0.7429658619232243,9.746076128146717 -0.6048918421774555,8.09219174711575 diff --git a/sdmetrics/demos.py b/sdmetrics/demos.py new file mode 100644 index 00000000..a7c36341 --- /dev/null +++ b/sdmetrics/demos.py @@ -0,0 +1,78 @@ +"""Functions to load demos with real and synthetic data of different data modalities.""" + +import pathlib +import pickle + + +def load_demo(modality='multi_table'): + """Load demo data of the indicated data modality. + + By default, multi_table demo is loaded. + + Output is the real data, the synthetic data and the metadata dict. + + Args: + modality (str): + Data modality to load. It can be multi_table, single_table + or timeseries. + + Returns: + tuple: + Real data, Synthetic data, Metadata. 
+ """ + demo_path = pathlib.Path(__file__).parent / 'demos' / f'{modality}.pkl' + with open(demo_path, 'rb') as demo_file: + return pickle.load(demo_file) + + +def load_multi_table_demo(): + """Load multi-table demo data. + + The dataset is the ``SDV`` demo data, which consists of three + tables, ``users``, ``sessions`` and ``transactions``, with + simulated data about user browsing sessions and transactions + made during those sessions, and a synthetic copy of it made + by the ``sdv.relational.HMA1`` model. + + Returns: + tuple: + * dict: Real tables. + * dict: Synthetic tables. + * dict: Dataset Metadata. + """ + return load_demo('multi_table') + + +def load_single_table_demo(): + """Load multi-table demo data. + + The dataset is the ``student_placements`` tabular demo from SDV + and a synthetic copy of it made using he ``sdv.tabular.CTGAN`` + model. + + Returns: + tuple: + * pandas.DataFrame: Real table. + * pandas.DataFrame: Synthetic table. + * dict: Table Metadata. + """ + return load_demo('single_table') + + +def load_timeseries_demo(): + """Load time series demo data. + + The dataset is the ``sunglasses`` demo data from the DeepEcho + project, which contains simulated data from a chain of sunglasses + stores, and a synthetic copy of it made by the ``sdv.timeseries.PAR`` + model. + + It has 1 entity column, 1 context column and a datetime sequence index. + + Returns: + tuple: + * pandas.DataFrame: Real table. + * pandas.DataFrame: Synthetic table. + * dict: Table Metadata. + """ + return load_demo('timeseries') diff --git a/sdmetrics/demos/multi_table.pkl b/sdmetrics/demos/multi_table.pkl new file mode 100644 index 00000000..52dd8274 Binary files /dev/null and b/sdmetrics/demos/multi_table.pkl differ diff --git a/sdmetrics/demos/single_table.pkl b/sdmetrics/demos/single_table.pkl new file mode 100644 index 00000000..2aa6c094 Binary files /dev/null and b/sdmetrics/demos/single_table.pkl differ diff --git a/sdmetrics/demos/timeseries.pkl b/sdmetrics/demos/timeseries.pkl new file mode 100644 index 00000000..cb66e9d4 Binary files /dev/null and b/sdmetrics/demos/timeseries.pkl differ diff --git a/sdmetrics/detection/__init__.py b/sdmetrics/detection/__init__.py deleted file mode 100644 index fc948be2..00000000 --- a/sdmetrics/detection/__init__.py +++ /dev/null @@ -1,23 +0,0 @@ -""" -This module implements machine learning methods for detecting synthetic data. -""" -from sdmetrics.detection.tabular import LogisticDetector - - -def metrics(metadata, real_tables, synthetic_tables): - """ - This function takes in (1) a `sdv.Metadata` object which describes a set of - relational tables, (2) a set of "real" tables corresponding to the metadata, - and (3) a set of "synthetic" tables corresponding to the metadata. It yields - a sequence of `Metric` objects. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Yields: - Metric: The next metric. - """ - for detector in [LogisticDetector()]: - yield from detector.metrics(metadata, real_tables, synthetic_tables) diff --git a/sdmetrics/detection/tabular/__init__.py b/sdmetrics/detection/tabular/__init__.py deleted file mode 100644 index cbaadedc..00000000 --- a/sdmetrics/detection/tabular/__init__.py +++ /dev/null @@ -1,8 +0,0 @@ -""" -This module implements machine learning methods for detecting synthetic -rows in a single table. 
-""" -from sdmetrics.detection.tabular.base import TabularDetector -from sdmetrics.detection.tabular.sklearn import LogisticDetector, SVCDetector - -__all__ = ["TabularDetector", "LogisticDetector", "SVCDetector"] diff --git a/sdmetrics/detection/tabular/base.py b/sdmetrics/detection/tabular/base.py deleted file mode 100644 index e38d3e65..00000000 --- a/sdmetrics/detection/tabular/base.py +++ /dev/null @@ -1,149 +0,0 @@ -import warnings - -import numpy as np -from rdt import HyperTransformer -from sklearn.metrics import roc_auc_score -from sklearn.model_selection import StratifiedKFold - -from sdmetrics.report import Goal, Metric - - -class TabularDetector(): - - name = "" - - def fit(self, X, y): - """This function implements a fit procedure which trains a binary - classification model where class=1 indicates the data is synthetic - and class=0 indicates that the data is real. - - Arguments: - X (np.ndarray): The numerical features (i.e. transformed rows). - y (np.ndarray): The binary classification target. - """ - raise NotImplementedError() - - def predict_proba(self, X): - """This function predicts the probability that each of the samples - comes from the synthetic dataset. - - Arguments: - X (np.ndarray): The numerical features (i.e. transformed rows). - - Returns: - np.ndarray: The probability that the class is 1. - """ - raise NotImplementedError() - - def metrics(self, metadata, real_tables, synthetic_tables): - """ - This function yields a sequence of Metric object. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Yields: - Metric: The next metric. - """ - yield from self._single_table_detection(metadata, real_tables, synthetic_tables) - yield from self._parent_child_detection(metadata, real_tables, synthetic_tables) - - def _single_table_detection(self, metadata, real_tables, synthetic_tables): - # Single Table Detection - for table_name in set(real_tables): - table_fields = list(metadata.get_dtypes(table_name, ids=False).keys()) - auroc = self._compute_auroc( - real_tables[table_name][table_fields], - synthetic_tables[table_name][table_fields]) - - yield Metric( - name=self.name, - value=auroc, - tags=set([ - "detection:auroc", - "table:%s" % table_name - ]), - goal=Goal.MINIMIZE, - unit="auroc", - domain=(0.0, 1.0) - ) - - def _parent_child_detection(self, metadata, real_tables, synthetic_tables): - # Parent-Child Table Detection - for table_name in set(real_tables): - key = metadata.get_primary_key(table_name) - table_fields = [key] + list(metadata.get_dtypes(table_name, ids=False)) - for child_name in metadata.get_children(table_name): - child_key = metadata.get_foreign_key(table_name, child_name) - child_fields = [child_key] + list(metadata.get_dtypes(child_name, ids=False)) - - real = self._denormalize( - real_tables[table_name][table_fields], - key, - real_tables[child_name][child_fields], - child_key - ) - synthetic = self._denormalize( - synthetic_tables[table_name][table_fields], - key, - synthetic_tables[child_name][child_fields], - child_key - ) - - auroc = self._compute_auroc(real, synthetic) - - yield Metric( - name=self.name, - value=auroc, - tags=set([ - "detection:auroc", - "table:%s" % table_name, - "table:%s" % child_name, - ] + (["priority:high"] if auroc > 0.9 else [])), - goal=Goal.MINIMIZE, - unit="auroc", - domain=(0.0, 1.0) - ) - - def _compute_auroc(self, real_table, synthetic_table): - 
transformer = HyperTransformer() - real_table = transformer.fit_transform(real_table).values - synthetic_table = transformer.transform(synthetic_table).values - - X = np.concatenate([real_table, synthetic_table]) - y = np.hstack([np.ones(len(real_table)), np.zeros(len(synthetic_table))]) - X[np.isnan(X)] = 0.0 - - if len(X) < 20: - warnings.warn("Not enough data, skipping the detection tests.") - - scores = [] - kf = StratifiedKFold(n_splits=3, shuffle=True) - for train_index, test_index in kf.split(X, y): - self.fit(X[train_index], y[train_index]) - y_pred = self.predict_proba(X[test_index]) - auroc = roc_auc_score(y[test_index], y_pred) - if auroc < 0.5: - auroc = 1.0 - auroc - scores.append(auroc) - return np.mean(scores) - - @staticmethod - def _denormalize(table, key, child_table, child_key): - """ - Given a parent table (with a primary key) and a child table (with a foreign key), - this performs an outer join and returns a single flat table. - """ - flat = table.merge( - child_table, - how='outer', - left_on=key, - right_on=child_key) - - del flat[key] - if child_key != key: - del flat[child_key] - - return flat diff --git a/sdmetrics/detection/tabular/sklearn.py b/sdmetrics/detection/tabular/sklearn.py deleted file mode 100644 index 9a3e27cf..00000000 --- a/sdmetrics/detection/tabular/sklearn.py +++ /dev/null @@ -1,53 +0,0 @@ -"""scikit-learn based TabularDetectors.""" - -import numpy as np -from sklearn.impute import SimpleImputer -from sklearn.linear_model import LogisticRegression -from sklearn.pipeline import Pipeline -from sklearn.preprocessing import RobustScaler -from sklearn.svm import SVC - -from sdmetrics.detection.tabular.base import TabularDetector - - -class ScikitLearnDetector(TabularDetector): - - def _get_classifier(self): - """Build and return an instance of a scikit-learn Classifier.""" - raise NotImplementedError() - - def fit(self, X, y): - """This function trains a sklearn pipeline with a robust scalar - and a logistic regression classifier. - - Arguments: - X (np.ndarray): The numerical features (i.e. transformed rows). - y (np.ndarray): The binary classification target. - """ - X[np.isin(X, [np.inf, -np.inf])] = None - self.model = Pipeline([ - ('imputer', SimpleImputer()), - ('scalar', RobustScaler()), - ('classifier', self._get_classifier()), - ]) - self.model.fit(X, y) - - def predict_proba(self, X): - X[np.isin(X, [np.inf, -np.inf])] = None - return self.model.predict_proba(X)[:, 1] - - -class LogisticDetector(ScikitLearnDetector): - - name = "logistic" - - def _get_classifier(self): - return LogisticRegression(solver="lbfgs") - - -class SVCDetector(ScikitLearnDetector): - - name = "svc" - - def _get_classifier(self): - return SVC(probability=True, gamma='scale') diff --git a/sdmetrics/efficacy/__init__.py b/sdmetrics/efficacy/__init__.py deleted file mode 100644 index 3cc0a0f7..00000000 --- a/sdmetrics/efficacy/__init__.py +++ /dev/null @@ -1,7 +0,0 @@ -""" -This module implements classes/methods to help users evaluate the efficacy -of the synthetic data (compared to the real data) on various tasks. 
-""" -from sdmetrics.efficacy.base import MLEfficacy - -__all__ = ['MLEfficacy'] diff --git a/sdmetrics/efficacy/base.py b/sdmetrics/efficacy/base.py deleted file mode 100644 index 532b815e..00000000 --- a/sdmetrics/efficacy/base.py +++ /dev/null @@ -1,105 +0,0 @@ -import numpy as np -from sklearn.model_selection import KFold - -from sdmetrics.report import Goal, Metric - - -class MLEfficacy(): - - name = "" - target_table_name = "" - target_column_name = "" - - metric_unit = "" - metric_goal = Goal.IGNORE - metric_domain = (float("-inf"), float("inf")) - - def fit(self, X, y): - """This function implements a fit procedure which trains a supervised - learning model. - - Arguments: - X (np.ndarray): The numerical features (i.e. transformed rows). - y (np.ndarray): The output/target value. - """ - raise NotImplementedError() - - def score(self, X, y): - """This function scores this model on the (test) dataset. - - Arguments: - X (np.ndarray): The numerical features (i.e. transformed rows). - y (np.ndarray): The output/target value. - - Returns: - float: The value of the appropriate metric. - """ - raise NotImplementedError() - - def metrics(self, metadata, real_tables, synthetic_tables): - real_table = real_tables[self.target_table_name] - synthetic_table = synthetic_tables[self.target_table_name] - delta_score, synthetic_score = self._evaluate_score(real_table, synthetic_table) - - # Score on the synthetic table. Evaluated on real. - yield Metric( - name=self.name, - value=synthetic_score, - tags=set([ - "efficacy:ml", - "table:%s" % self.target_table_name, - "column:%s" % self.target_column_name, - ]), - goal=self.metric_goal, - unit=self.metric_unit, - domain=self.metric_domain, - description="Score on the real test set using the machine learning" - " model trained on synthetic data." - ) - - # Score on synthetic minus score on real. Evaluated on real. - delta_domain = self.metric_domain[1] - self.metric_domain[0] - yield Metric( - name=self.name, - value=delta_score, - tags=set([ - "efficacy:ml", - "table:%s" % self.target_table_name, - "column:%s" % self.target_column_name, - ]), - goal=self.metric_goal, - unit="delta_%s" % self.metric_unit, - domain=(-delta_domain, delta_domain), - description="Diff in score on real when trained on synthetic vs real." - ) - - def _evaluate_score(self, real, synthetic): - """ - This computes and returns the score of the model on the real test set when - it is trained on the synthetic data. It also returns the difference in score - of the model on the real data when trained on the synthetic data minus the - score when trained on the real data. - """ - real_X = real.loc[:, real.columns != self.target_column_name].values - real_y = real[self.target_column_name].values - - synthetic_X = synthetic.loc[:, synthetic.columns != self.target_column_name].values - synthetic_y = synthetic[self.target_column_name].values - - delta_scores = [] - synthetic_scores = [] - kf = KFold(n_splits=3, shuffle=True) - - for train_index, test_index in kf.split(real_X, real_y): - # Train a model on the real dataset and test on the real dataset. - self.fit(real_X[train_index], real_y[train_index]) - real_score = self.score(real_X[test_index], real_y[test_index]) - - # Train a model on the synthetic dataset and test it on the real test dataset. 
- self.fit(synthetic_X, synthetic_y) - synthetic_score = self.score(real_X[test_index], real_y[test_index]) - - delta_scores.append(synthetic_score - real_score) - synthetic_scores.append(synthetic_score) - - return np.mean(delta_scores), np.mean(synthetic_scores) diff --git a/sdmetrics/evaluation.py b/sdmetrics/evaluation.py deleted file mode 100644 index 9944d524..00000000 --- a/sdmetrics/evaluation.py +++ /dev/null @@ -1,56 +0,0 @@ -# -*- coding: utf-8 -*- - -"""Evaluation module.""" - -from sdmetrics import constraint, detection, statistical -from sdmetrics.report import MetricsReport - - -def _validate(metadata, real, synthetic): - """ - This checks to make sure the real and synthetic databases correspond to - the given metadata object. - """ - metadata.validate(real) - metadata.validate(synthetic) - - -def _metrics(metadata, real, synthetic): - """ - This function takes in (1) a `sdv.Metadata` object which describes a set of - relational tables, (2) a set of "real" tables corresponding to the metadata, - and (3) a set of "synthetic" tables corresponding to the metadata. It yields - a sequence of `Metric` objects. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Yields: - Metric: The next metric. - """ - _validate(metadata, real, synthetic) - - yield from constraint.metrics(metadata, real, synthetic) - yield from detection.metrics(metadata, real, synthetic) - yield from statistical.metrics(metadata, real, synthetic) - - -def evaluate(metadata, real, synthetic): - """ - This generates a MetricsReport for the given metadata and tables with the - default/built-in metrics. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Returns: - MetricsReport: A report containing the default metrics. - """ - _validate(metadata, real, synthetic) - report = MetricsReport() - report.add_metrics(_metrics(metadata, real, synthetic)) - return report diff --git a/sdmetrics/goal.py b/sdmetrics/goal.py new file mode 100644 index 00000000..edd06b0c --- /dev/null +++ b/sdmetrics/goal.py @@ -0,0 +1,15 @@ +"""SDMetrics Goal Enumeration.""" + +from enum import Enum + + +class Goal(Enum): + """Goal Enumeration. + + This enumerates the `goal` for a metric; the value of a metric can be ignored, + minimized, or maximized. + """ + + IGNORE = "ignore" + MAXIMIZE = "maximize" + MINIMIZE = "minimize" diff --git a/sdmetrics/multi_table/README.md b/sdmetrics/multi_table/README.md new file mode 100644 index 00000000..1a7cbceb --- /dev/null +++ b/sdmetrics/multi_table/README.md @@ -0,0 +1,98 @@ +# Multi Table Metrics + +The metrics found on this folder operate on multi-table datasets, passed as two python `dict`s +containing tables as `pandas.DataFrame`s. + +Implemented metrics: + +* Parent-Child Detection metrics: Metrics that de-normalize each parent-child relationship + in the dataset and then execute a *Single Table Detection Metric* on the generated tables. + * `LogisticParentChildDetection`: Parent-child detection metric based on a `LogisticDetection`. + * `SVCParentChildDetection`: Parent-child detection metric based on a `SVCDetection`. +* Multi Single Table Metrics: Metrics that execute a Single Table Metric on each table from the + dataset and then return the average score obtained by it. 
+ * `CSTest`: Multi Single Table metric based on the Single Table CSTest metric. + * `KSTest`: Multi Single Table metric based on the Single Table KSTest metric. + * `KSTestExtended`: Multi Single Table metric based on the Single Table KSTestExtended metric. + * `LogisticDetection`: Multi Single Table metric based on the Single Table LogisticDetection metric. + * `SVCDetection`: Multi Single Table metric based on the Single Table SVCDetection metric. + * `BNLikelihood`: Multi Single Table metric based on the Single Table BNLikelihood metric. + * `BNLogLikelihood`: Multi Single Table metric based on the Single Table BNLogLikelihood metric. + +## MultiTableMetric + +All the multi table metrics are subclasses form the `sdmetrics.multi_table.MultiTableMetric` +class, which can be used to locate all of them: + +```python3 +In [1]: from sdmetrics.multi_table import MultiTableMetric + +In [2]: MultiTableMetric.get_subclasses() +Out[2]: +{'CSTest': sdmetrics.multi_table.multi_single_table.CSTest, + 'KSTest': sdmetrics.multi_table.multi_single_table.KSTest, + 'KSTestExtended': sdmetrics.multi_table.multi_single_table.KSTestExtended, + 'LogisticDetection': sdmetrics.multi_table.multi_single_table.LogisticDetection, + 'SVCDetection': sdmetrics.multi_table.multi_single_table.SVCDetection, + 'BNLikelihood': sdmetrics.multi_table.multi_single_table.BNLikelihood, + 'BNLogLikelihood': sdmetrics.multi_table.multi_single_table.BNLogLikelihood, + 'LogisticParentChildDetection': sdmetrics.multi_table.detection.parent_child.LogisticParentChildDetection, + 'SVCParentChildDetection': sdmetrics.multi_table.detection.parent_child.SVCParentChildDetection} +``` + +## Multi Table Inputs and Outputs + +All the multi table metrics operate on at least two inputs: + +* `real_data`: A dict containing the table names and data from the real dataset passed as + `pandas.DataFrame`s +* `synthetic_data`: A dict containing the table names and data from the synthetic dataset passed + as `pandas.DataFrame`s + +For example, a `KStestExtended` metric can be used as follows: + +```python3 +In [3]: from sdmetrics.multi_table import KSTestExtended + +In [4]: from sdmetrics import load_demo + +In [5]: real_data, synthetic_data, metadata = load_demo() + +In [6]: KSTestExtended.compute(real_data, synthetic_data) +Out[6]: 0.8194444444444443 +``` + +Some metrics also require additional information, such as the relationships that exist between +the tables. + +For example, this is how you would use a `LogisticParentChildDetection` metric: + +```python3 +In [7]: from sdmetrics.multi_table import LogisticParentChildDetection + +In [8]: foreign_keys = [ + ...: ('users', 'user_id', 'sessions', 'user_id'), + ...: ('sessions', 'session_id', 'transactions', 'session_id') + ...: ] + +In [9]: LogisticParentChildDetection.compute(real_data, synthetic_data, foreign_keys=foreign_keys) +Out[9]: 0.7569444444444444 +``` + +Additionally, all the metrics accept a `metadata` argument which must be a dict following +the Metadata JSON schema from SDV, which will be used to determine which columns are compatible +with each one of the different metrics, as well as to extract any additional information required +by the metrics, such as the mentioned relationships. + +If this dictionary is not passed it will be built based on the data found in the real table, +but in this case some field types may not represent the data accurately (e.g. 
categorical +columns that contain only integer values will be seen as numerical), and any additional +information required by the metrics will not be populated. + +For example, we could execute the same metric as before by passing the `metadata` dict instead +of having to specify the individual `foreign_keys`: + +```python +In [10]: LogisticParentChildDetection.compute(real_data, synthetic_data, metadata) +Out[10]: 0.7569444444444444 +``` diff --git a/sdmetrics/multi_table/__init__.py b/sdmetrics/multi_table/__init__.py new file mode 100644 index 00000000..01b4baa1 --- /dev/null +++ b/sdmetrics/multi_table/__init__.py @@ -0,0 +1,28 @@ +"""Metrics for multi table datasets.""" + +from sdmetrics.multi_table import detection, multi_single_table +from sdmetrics.multi_table.base import MultiTableMetric +from sdmetrics.multi_table.detection.base import DetectionMetric +from sdmetrics.multi_table.detection.parent_child import ( + LogisticParentChildDetection, ParentChildDetectionMetric, SVCParentChildDetection) +from sdmetrics.multi_table.multi_single_table import ( + BNLikelihood, BNLogLikelihood, CSTest, KSTest, KSTestExtended, LogisticDetection, + MultiSingleTableMetric, SVCDetection) + +__all__ = [ + 'detection', + 'multi_single_table', + 'MultiTableMetric', + 'DetectionMetric', + 'ParentChildDetectionMetric', + 'LogisticParentChildDetection', + 'SVCParentChildDetection', + 'BNLikelihood', + 'BNLogLikelihood', + 'CSTest', + 'KSTest', + 'KSTestExtended', + 'LogisticDetection', + 'SVCDetection', + 'MultiSingleTableMetric', +] diff --git a/sdmetrics/multi_table/base.py b/sdmetrics/multi_table/base.py new file mode 100644 index 00000000..a644b897 --- /dev/null +++ b/sdmetrics/multi_table/base.py @@ -0,0 +1,44 @@ +"""Base Multi Table metric class.""" + +from sdmetrics.base import BaseMetric + + +class MultiTableMetric(BaseMetric): + """Base class for metrics that apply to multiple tables. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = None + goal = None + min_value = None + max_value = None + + @staticmethod + def compute(real_data, synthetic_data, metadata=None): + """Compute this metric. + + Args: + real_data (dict[str, pandas.DataFrame]): + The tables from the real dataset, passed as a dictionary of + table names and pandas.DataFrames. + synthetic_data (dict[str, pandas.DataFrame]): + The tables from the synthetic dataset, passed as a dictionary of + table names and pandas.DataFrames. + metadata (dict): + Multi-table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. + + Returns: + Union[float, tuple[float]]: + Metric output. 
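As an illustrative aside (not part of the original patch), the subclass-discovery pattern shown in the README above can be combined with this base class to score the demo data with every registered multi-table metric. The error handling below is an assumption, since some metrics need extra inputs (e.g. foreign keys) or optional dependencies:

```python3
# Sketch only: iterate over every MultiTableMetric subclass and compute it on
# the demo data. The try/except is an assumption; some metrics may require
# additional arguments or optional dependencies.
from sdmetrics import load_demo
from sdmetrics.multi_table import MultiTableMetric

real_data, synthetic_data, metadata = load_demo()

scores = {}
for name, metric in MultiTableMetric.get_subclasses().items():
    try:
        scores[name] = metric.compute(real_data, synthetic_data, metadata)
    except Exception as error:
        scores[name] = f'Error: {error}'

print(scores)
```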
+ """ + raise NotImplementedError() diff --git a/sdmetrics/multi_table/detection/__init__.py b/sdmetrics/multi_table/detection/__init__.py new file mode 100644 index 00000000..7cbaab6b --- /dev/null +++ b/sdmetrics/multi_table/detection/__init__.py @@ -0,0 +1 @@ +"""Machine Learning Detection metrics that work on multiple tables.""" diff --git a/sdmetrics/multi_table/detection/base.py b/sdmetrics/multi_table/detection/base.py new file mode 100644 index 00000000..6b2da8e0 --- /dev/null +++ b/sdmetrics/multi_table/detection/base.py @@ -0,0 +1,47 @@ +"""Base class for Machine Learning Detection metrics that work on multiple tables.""" + +from sdmetrics.multi_table.base import MultiTableMetric + + +class DetectionMetric(MultiTableMetric): + """Base class for Machine Learning Detection based metrics on multiple tables. + + These metrics build a Machine Learning Classifier that learns to tell the synthetic + data apart from the real data, which later on is evaluated using Cross Validation. + + The output of the metric is one minus the average ROC AUC score obtained. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = None + goal = None + min_value = None + max_value = None + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None): + """Compute this metric. + + Args: + real_data (dict[str, pandas.DataFrame]): + The tables from the real dataset. + synthetic_data (dict[str, pandas.DataFrame]): + The tables from the synthetic dataset. + metadata (dict): + Multi-table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + raise NotImplementedError() diff --git a/sdmetrics/multi_table/detection/parent_child.py b/sdmetrics/multi_table/detection/parent_child.py new file mode 100644 index 00000000..133e40c1 --- /dev/null +++ b/sdmetrics/multi_table/detection/parent_child.py @@ -0,0 +1,122 @@ +"""Base class for Machine Learning Detection metrics that work on parent-child pairs of tables.""" + +import numpy as np + +from sdmetrics.multi_table.detection.base import DetectionMetric +from sdmetrics.single_table.detection import LogisticDetection, SVCDetection +from sdmetrics.utils import NestedAttrsMeta + + +class ParentChildDetectionMetric(DetectionMetric, + metaclass=NestedAttrsMeta('single_table_metric')): + """Base class for Multi-table Detection metrics based on parent-child relationships. + + These metrics denormalize the parent-child relationships from the dataset and then + apply a Single Table Detection metric on the resulting tables. + + The output of the metric is one minus the average ROC AUC score obtained. + + A part from the real and synthetic data, these metrics need to be passed + a list with the foreign key relationships that exist between the tables. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. 
+ single_table_metric (sdmetrics.single_table.detection.base.DetectionMetric): + The single table detection metric to use. + """ + + single_table_metric = None + + @staticmethod + def _extract_foreign_keys(metadata): + if not isinstance(metadata, dict): + metadata = metadata.to_dict() + + foreign_keys = [] + for child_table, child_meta in metadata['tables'].items(): + for child_key, field_meta in child_meta['fields'].items(): + ref = field_meta.get('ref') + if ref: + foreign_keys.append((ref['table'], ref['field'], child_table, child_key)) + + return foreign_keys + + @staticmethod + def _denormalize(data, foreign_key): + """Denormalize the child table over the parent.""" + parent_table, parent_key, child_table, child_key = foreign_key + + flat = data[parent_table].merge( + data[child_table], + how='outer', + left_on=parent_key, + right_on=child_key + ) + + del flat[parent_key] + if child_key != parent_key: + del flat[child_key] + + return flat + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, foreign_keys=None): + """Compute this metric. + + This denormalizes the parent-child relationships from the dataset and then + applies a Single Table Detection metric on the resulting tables. + + The output of the metric is one minus the average ROC AUC score obtained. + + A part from the real and synthetic data, either a ``foreign_keys`` list + containing the relationships between the tables or a ``metadata`` that can be + used to create such list must be passed. + + Args: + real_data (dict[str, pandas.DataFrame]): + The tables from the real dataset. + synthetic_data (dict[str, pandas.DataFrame]): + The tables from the synthetic dataset. + metadata (dict): + Multi-table metadata dict. If not passed, foreign keys must be + passed. + foreign_keys (list[tuple[str, str, str, str]]): + List of foreign key relationships specified as tuples + that contain (parent_table, parent_key, child_table, child_key). + Ignored if metada is given. + + Returns: + float: + Average of the scores obtained by the single table metric. + """ + if metadata: + foreign_keys = cls._extract_foreign_keys(metadata) + if not foreign_keys: + raise ValueError('No foreign keys given') + + scores = [] + for foreign_key in foreign_keys: + real = cls._denormalize(real_data, foreign_key) + synth = cls._denormalize(synthetic_data, foreign_key) + scores.append(cls.single_table_metric.compute(real, synth)) + + return np.mean(scores) + + +class LogisticParentChildDetection(ParentChildDetectionMetric): + """ParentChild detection metric based on a LogisticRegression.""" + + single_table_metric = LogisticDetection + + +class SVCParentChildDetection(ParentChildDetectionMetric): + """ParentChild detection metric based on a SVC.""" + + single_table_metric = SVCDetection diff --git a/sdmetrics/multi_table/multi_single_table.py b/sdmetrics/multi_table/multi_single_table.py new file mode 100644 index 00000000..fa9ddcf9 --- /dev/null +++ b/sdmetrics/multi_table/multi_single_table.py @@ -0,0 +1,144 @@ +"""MultiTable metrics based on applying SingleTable metrics on all the tables.""" + +from collections import defaultdict + +import numpy as np + +from sdmetrics import single_table +from sdmetrics.multi_table.base import MultiTableMetric +from sdmetrics.utils import NestedAttrsMeta + + +class MultiSingleTableMetric(MultiTableMetric, metaclass=NestedAttrsMeta('single_table_metric')): + """MultiTableMetric subclass that applies a SingleTableMetric on each table. 
+ + This class can either be used by creating a subclass that inherits from it and + sets the SingleTable Metric as the `single_table_metric` attribute, + or by creating an instance of this class passing the underlying SingleTable + metric as an argument. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + single_table_metric (sdmetrics.single_table.base.SingleTableMetric): + SingleTableMetric to apply. + """ + + single_table_metric = None + + def __init__(self, single_table_metric): + self.single_table_metric = single_table_metric + self.compute = self._compute + + def _compute(self, real_data, synthetic_data, metadata=None): + """Compute this metric. + + This applies the underlying single table metric to all the tables + found in the dataset and then returns the average score obtained. + + Args: + real_data (dict[str, pandas.DataFrame]): + The tables from the real dataset. + synthetic_data (dict[str, pandas.DataFrame]): + The tables from the synthetic dataset. + metadata (dict): + Multi-table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. + **kwargs: + Any additional keyword arguments will be passed down + to the single table metric + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + if set(real_data.keys()) != set(synthetic_data.keys()): + raise ValueError('`real_data` and `synthetic_data` must have the same tables') + + if metadata is None: + metadata = {'tables': defaultdict(type(None))} + elif not isinstance(metadata, dict): + metadata = metadata.to_dict() + + values = [] + for table_name, real_table in real_data.items(): + synthetic_table = synthetic_data[table_name] + table_meta = metadata['tables'][table_name] + + score = self.single_table_metric.compute(real_table, synthetic_table, table_meta) + values.append(score) + + return np.nanmean(values) + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, **kwargs): + """Compute this metric. + + This applies the underlying single table metric to all the tables + found in the dataset and then returns the average score obtained. + + Args: + real_data (dict[str, pandas.DataFrame]): + The tables from the real dataset. + synthetic_data (dict[str, pandas.DataFrame]): + The tables from the synthetic dataset. + metadata (dict): + Multi-table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. + **kwargs: + Any additional keyword arguments will be passed down + to the single table metric + + Returns: + Union[float, tuple[float]]: + Metric output. 
+ """ + return cls._compute(cls, real_data, synthetic_data, metadata, **kwargs) + + +class CSTest(MultiSingleTableMetric): + """MultiSingleTableMetric based on SingleTable CSTest.""" + + single_table_metric = single_table.multi_single_column.CSTest + + +class KSTest(MultiSingleTableMetric): + """MultiSingleTableMetric based on SingleTable KSTest.""" + + single_table_metric = single_table.multi_single_column.KSTest + + +class KSTestExtended(MultiSingleTableMetric): + """MultiSingleTableMetric based on SingleTable KSTestExtended.""" + + single_table_metric = single_table.multi_single_column.KSTestExtended + + +class LogisticDetection(MultiSingleTableMetric): + """MultiSingleTableMetric based on SingleTable LogisticDetection.""" + + single_table_metric = single_table.detection.LogisticDetection + + +class SVCDetection(MultiSingleTableMetric): + """MultiSingleTableMetric based on SingleTable SVCDetection.""" + + single_table_metric = single_table.detection.SVCDetection + + +class BNLikelihood(MultiSingleTableMetric): + """MultiSingleTableMetric based on SingleTable BNLikelihood.""" + + single_table_metric = single_table.bayesian_network.BNLikelihood + + +class BNLogLikelihood(MultiSingleTableMetric): + """MultiSingleTableMetric based on SingleTable BNLogLikelihood.""" + + single_table_metric = single_table.bayesian_network.BNLogLikelihood diff --git a/sdmetrics/report.py b/sdmetrics/report.py deleted file mode 100644 index b91ca6b0..00000000 --- a/sdmetrics/report.py +++ /dev/null @@ -1,248 +0,0 @@ -# -*- coding: utf-8 -*- - -"""MetricsReport module. - -This module defines the classes Goal, Metric and MetricsReport, which -are used for reporting the results of the different evaluation -metrics executed on the data. -""" - -from enum import Enum - -import pandas as pd - - -class Goal(Enum): - """ - This enumerates the `goal` for a metric; the value of a metric can be ignored, - minimized, or maximized. - """ - - IGNORE = "ignore" - MAXIMIZE = "maximize" - MINIMIZE = "minimize" - - -class Metric(): - """ - This represents a single instance of a Metric. - - Attributes: - name (str): The name of the attribute. - value (float): The value of the attribute. - tags (set(str)): A set of arbitrary strings/tags for the attribute. - goal (Goal): Whether the value should maximized, minimized, or ignored. - unit (str): The "unit" of the metric (i.e. p-value, entropy, mean-squared-error). - domain (tuple): The range of values the metric can take on. - description (str): An arbitrary text description of the attribute. 
- """ - - def __init__(self, name, value, tags=None, goal=Goal.IGNORE, - unit="", domain=(float("-inf"), float("inf")), description=""): - self.name = name - self.value = value - self.tags = tags if tags else set() - self.goal = goal - self.unit = unit - self.domain = domain - self.description = description - self._validate() - - def _validate(self): - assert isinstance(self.name, str) - assert isinstance(self.value, float) - assert isinstance(self.tags, set) - assert isinstance(self.goal, Goal) - assert isinstance(self.unit, str) - assert isinstance(self.domain, tuple) - assert isinstance(self.description, str) - assert self.domain[0] <= self.value and self.value <= self.domain[1] - assert all(isinstance(t, str) for t in self.tags) - - def __eq__(self, other): - my_attrs = (self.name, self.value, self.goal, self.unit) - your_attrs = (other.name, other.value, other.objective, self.unit) - return my_attrs == your_attrs - - def __hash__(self): - return hash(self.name) + hash(self.value) - - def __str__(self): - return """Metric(\n name=%s, \n value=%.2f, \n tags=%s, \n description=%s\n)""" % ( - self.name, self.value, self.tags, self.description) - - -class MetricsReport(): - """ - The `MetricsReport` object is responsible for storing metrics and providing a user - friendly API for accessing them. - """ - - def __init__(self): - self.metrics = [] - - def add_metric(self, metric): - """ - This adds the given `Metric` object to this report. - """ - assert isinstance(metric, Metric) - self.metrics.append(metric) - - def add_metrics(self, iterator): - """ - This takes an iterator which yields `Metric` objects and adds all - of these metrics to this report. - """ - for metric in iterator: - self.add_metric(metric) - - def overall(self): - """ - This computes a single scalar score for this report. To produce higher quality - synthetic data, the model should try to maximize this score. - - Returns: - float: The scalar value to maximize. - """ - score = 0.0 - for metric in self.metrics: - if metric.goal == Goal.MAXIMIZE: - score += metric.value - elif metric.goal == Goal.MINIMIZE: - score -= metric.value - return score - - def details(self, filter_func=None): - """ - This returns a DataFrame containing all of the metrics in this report. You can - optionally use `filter_func` to specify a lambda function which takes in the - metric and returns True if it should be included in the output. - - Args: - filter_func (function, optional): A function which takes a Metric object - and returns True if it should be included. Defaults to accepting all - Metric objects. - - Returns: - DataFrame: A table listing all the (selected) metrics. - """ - if not filter_func: - def filter_func(metric): - return True - rows = [] - for metric in self.metrics: - if not filter_func(metric): - continue - table_tags = [tag for tag in metric.tags if "table:" in tag] - column_tags = [tag for tag in metric.tags if "column:" in tag] - misc_tags = metric.tags - set(table_tags) - set(column_tags) - rows.append({ - "Name": metric.name, - "Value": metric.value, - "Goal": metric.goal, - "Unit": metric.unit, - "Tables": ",".join(table_tags), - "Columns": ",".join(column_tags), - "Misc. Tags": ",".join(misc_tags), - }) - return pd.DataFrame(rows) - - def highlights(self): - """ - This returns a DataFrame containing all of the metrics in this report which - contain the "priority:high" tag. - - Returns: - DataFrame: A table listing all the high-priority metrics. 
- """ - return self.details(lambda metric: "priority:high" in metric.tags) - - def visualize(self): - """ - This returns a pyplot.Figure which shows some of the key metrics. - - Returns: - pyplot.Figure: A matplotlib figure visualizing key metricss. - """ - from matplotlib import rcParams - rcParams['font.family'] = 'sans-serif' - rcParams['font.sans-serif'] = ['DejaVu Sans'] - - import numpy as np - import seaborn as sns - import matplotlib.pyplot as plt - plt.style.use('seaborn') - - fig = plt.figure(figsize=(10, 12), constrained_layout=True) - gs = fig.add_gridspec(5, 4) - - # Detectability of synthetic tables - fig.add_subplot(gs[3:, :]) - labels, scores = [], [] - for metric in self.metrics: - tables = [tag.replace("table:", "") - for tag in metric.tags if "table:" in tag] - labels.append(" <-> ".join(tables)) - scores.append(metric.value) - df = pd.DataFrame({"score": scores, "label": labels}) - df = df.groupby("label").agg({"score": "mean"}).reset_index() - df = df.sort_values(["score"], ascending=False) - df = df.head(4) - sns.barplot( - x="label", - y="score", - data=df, - ci=None, - palette=sns.color_palette( - "coolwarm_r", - 7)) - plt.axhline(0.9, color="red", linestyle=":", label="Easy To Detect") - plt.axhline(0.7, color="green", linestyle=":", label="Hard To Detect") - plt.legend(loc="lower right") - plt.title("Detectability of Synthetic Tables", fontweight='bold') - plt.ylabel("auROC") - plt.xlabel("") - - # Coming soon. - fig.add_subplot(gs[1:3, 2:]) - pvalues = np.array([m.value for m in self.metrics if m.unit == "p-value"]) - sizes = [np.sum(pvalues < 0.1), np.sum(pvalues > 0.1)] - labels = ['Reject (p<0.1)', 'Fail To Reject'] - plt.pie(sizes, labels=labels) - plt.axis('equal') - plt.title("Columnwise Statistical Tests", fontweight='bold') - plt.ylabel("") - plt.xlabel("") - - # Coming soon. - fig.add_subplot(gs[:3, :2]) - labels, scores = [], [] - for metric in self.metrics: - if metric.unit != "entropy": - continue - for tag in metric.tags: - if "column:" not in tag: - continue - labels.append(tag.replace("column:", "")) - scores.append(metric.value) - df = pd.DataFrame({"score": scores, "label": labels}) - df = df.groupby("label").agg({"score": "mean"}).reset_index() - df = df.sort_values(["score"], ascending=False) - df = df.head(8) - sns.barplot(x="score", y="label", data=df, ci=None, palette=sns.color_palette("Blues_d")) - plt.title("Column Divergence", fontweight='bold') - plt.ylabel("") - plt.xlabel("") - - # Coming soon. - fig.add_subplot(gs[:1, 2:]) - plt.text(0.5, 0.7, r'Overall Score', fontsize=14, fontweight='bold', ha="center") - plt.text(0.5, 0.4, r'%.2f' % self.overall(), fontsize=36, ha="center") - rectangle = plt.Rectangle((0.2, 0.3), 0.6, 0.6, ec='black', fc='white') - plt.gca().add_patch(rectangle) - plt.ylabel("") - plt.xlabel("") - plt.axis('off') - - fig.tight_layout(pad=2.0) - return fig diff --git a/sdmetrics/single_column/README.md b/sdmetrics/single_column/README.md new file mode 100644 index 00000000..7e852e77 --- /dev/null +++ b/sdmetrics/single_column/README.md @@ -0,0 +1,50 @@ +# Single Column Metrics + +The metrics found on this folder operate on individual columns (or univariate random variables), +passed as two 1 dimensional arrays. + +Implemented metrics: + +* Statistical: Metrics that compare the arrays using statistical tests + * `CSTest`: Chi-Squared test to compare the distributions of two categorical columns. + * `KSTest`: Kolmogorov-Smirnov test to compare the distributions of two numerical columns using + their empirical CDF. 
+ +## SingleColumnMetric + +All the single column metrics are subclasses form the `sdmetrics.single_column.SingleColumnMetric` +class, which can be used to locate all of them: + +```python3 +In [1]: from sdmetrics.single_column import SingleColumnMetric + +In [2]: SingleColumnMetric.get_subclasses() +Out[2]: +{'CSTest': sdmetrics.single_column.statistical.cstest.CSTest, + 'KSTest': sdmetrics.single_column.statistical.kstest.KSTest} +``` + +## Single Column Inputs and Outputs + +All the single column metrics operate on just two inputs: + +* `real_data`: A 1d numpy array, coming from the real dataset. +* `synthetic_data`: A 1d numpy array, coming from the synthetic dataset. + +For example, this how the KSTest metric can be computed for the `age` column +from the demo data: + +```python3 +In [3]: from sdmetrics import load_demo + +In [4]: real_data, synthetic_data, metadata = load_demo() + +In [5]: from sdmetrics.single_column import KSTest + +In [6]: real_column = real_data['users']['age'].to_numpy() + +In [7]: synthetic_column = synthetic_data['users']['age'].to_numpy() + +In [8]: KSTest.compute(real_column, synthetic_column) +Out[8]: 0.8 +``` diff --git a/sdmetrics/single_column/__init__.py b/sdmetrics/single_column/__init__.py new file mode 100644 index 00000000..cae63bf7 --- /dev/null +++ b/sdmetrics/single_column/__init__.py @@ -0,0 +1,14 @@ +"""Metrics for Single columns.""" + +from sdmetrics.single_column import base, statistical +from sdmetrics.single_column.base import SingleColumnMetric +from sdmetrics.single_column.statistical.cstest import CSTest +from sdmetrics.single_column.statistical.kstest import KSTest + +__all__ = [ + 'base', + 'statistical', + 'SingleColumnMetric', + 'CSTest', + 'KSTest', +] diff --git a/sdmetrics/single_column/base.py b/sdmetrics/single_column/base.py new file mode 100644 index 00000000..f508f92b --- /dev/null +++ b/sdmetrics/single_column/base.py @@ -0,0 +1,41 @@ +"""Base SingleColumnMetric class.""" + +from sdmetrics.base import BaseMetric + + +class SingleColumnMetric(BaseMetric): + """Base class for metrics that apply to individual columns. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = None + goal = None + min_value = None + max_value = None + + @staticmethod + def compute(real_data, synthetic_data): + """Compute this metric. + + Args: + real_data (Union[numpy.ndarray, pandas.Series]): + The values from the real dataset, passed as a 1d numpy + array or as a pandas.Series. + synthetic_data (Union[numpy.ndarray, pandas.Series]): + The values from the synthetic dataset, passed as a 1d numpy + array or as a pandas.Series. + + Returns: + Union[float, tuple[float]]: + Metric output. 
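As an illustrative aside (not part of the original patch), a new single-column metric would subclass `SingleColumnMetric` and implement `compute`; the metric below is a hypothetical toy example shown only to illustrate the API:

```python3
# Hypothetical toy metric, shown only to illustrate the SingleColumnMetric API.
import numpy as np

from sdmetrics.goal import Goal
from sdmetrics.single_column.base import SingleColumnMetric


class MeanSimilarity(SingleColumnMetric):
    """Score how close the two column means are (1.0 means identical means)."""

    name = 'Mean Similarity'
    goal = Goal.MAXIMIZE
    min_value = 0.0
    max_value = 1.0

    @staticmethod
    def compute(real_data, synthetic_data):
        real_mean = np.nanmean(np.asarray(real_data, dtype=float))
        synthetic_mean = np.nanmean(np.asarray(synthetic_data, dtype=float))
        return 1.0 / (1.0 + abs(real_mean - synthetic_mean))
```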
+ """ + raise NotImplementedError() diff --git a/sdmetrics/single_column/statistical/__init__.py b/sdmetrics/single_column/statistical/__init__.py new file mode 100644 index 00000000..692879fa --- /dev/null +++ b/sdmetrics/single_column/statistical/__init__.py @@ -0,0 +1,9 @@ +"""Univariate goodness-of-fit tests.""" + +from sdmetrics.single_column.statistical.cstest import CSTest +from sdmetrics.single_column.statistical.kstest import KSTest + +__all__ = [ + 'CSTest', + 'KSTest' +] diff --git a/sdmetrics/single_column/statistical/cstest.py b/sdmetrics/single_column/statistical/cstest.py new file mode 100644 index 00000000..fa467f96 --- /dev/null +++ b/sdmetrics/single_column/statistical/cstest.py @@ -0,0 +1,54 @@ +"""Chi-Squared test based metric.""" + +from scipy.stats import chisquare + +from sdmetrics.goal import Goal +from sdmetrics.single_column.base import SingleColumnMetric +from sdmetrics.utils import get_frequencies + + +class CSTest(SingleColumnMetric): + """Chi-Squared test based metric. + + This metric uses the Chi-Squared test to compare the distributions + of the two categorical columns. It returns the resulting p-value so that + a small value indicates that we can reject the null hypothesis (i.e. and + suggests that the distributions are different). + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = 'Chi-Squared' + goal = Goal.MAXIMIZE + min_value = 0.0 + max_value = 1.0 + + @staticmethod + def compute(real_data, synthetic_data): + """Compare two discrete columns using a Chi-Squared test. + + Args: + real_data (Union[numpy.ndarray, pandas.Series]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.Series]): + The values from the synthetic dataset. + + Returns: + float: + The Chi-Squared test p-value + """ + f_obs, f_exp = get_frequencies(real_data, synthetic_data) + if len(f_obs) == len(f_exp) == 1: + pvalue = 1.0 + else: + _, pvalue = chisquare(f_obs, f_exp) + + return pvalue diff --git a/sdmetrics/single_column/statistical/kstest.py b/sdmetrics/single_column/statistical/kstest.py new file mode 100644 index 00000000..0db576b2 --- /dev/null +++ b/sdmetrics/single_column/statistical/kstest.py @@ -0,0 +1,55 @@ +"""Kolmogorov-Smirnov test based Metric.""" + +import pandas as pd +from scipy.stats import ks_2samp + +from sdmetrics.goal import Goal +from sdmetrics.single_column.base import SingleColumnMetric + + +class KSTest(SingleColumnMetric): + """Kolmogorov-Smirnov test based metric. + + This function uses the two-sample Kolmogorov–Smirnov test to compare + the distributions of the two continuous columns using the empirical CDF. + It returns 1 minus the KS Test D statistic, which indicates the maximum + distance between the expteced CDF and the observed CDF values. + + As a result, the output value is 1.0 if the distributions are identical + and 0.0 if they are completely different. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. 
+ """ + + name = 'Inverted Kolmogorov-Smirnov D statistic' + goal = Goal.MAXIMIZE + min_value = 0.0 + max_value = 1.0 + + @staticmethod + def compute(real_data, synthetic_data): + """Compare two continuous columns using a Kolmogorov–Smirnov test. + + Args: + real_data (Union[numpy.ndarray, pandas.Series]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.Series]): + The values from the synthetic dataset. + + Returns: + float: + 1 minus the Kolmogorov–Smirnov D statistic. + """ + real_data = pd.Series(real_data).fillna(0) + synthetic_data = pd.Series(synthetic_data).fillna(0) + statistic, _ = ks_2samp(real_data, synthetic_data) + + return 1 - statistic diff --git a/sdmetrics/single_table/README.md b/sdmetrics/single_table/README.md new file mode 100644 index 00000000..973789e2 --- /dev/null +++ b/sdmetrics/single_table/README.md @@ -0,0 +1,150 @@ +# Single Table Metrics + +The metrics found on this folder operate on individual tables, passed as two `pandas.DataFrame`s. + +Implemented metrics: + +* BayesianNetwork Metrics: Metrics that fit a BayesianNetwork to the distribution of the real data + and later on evaluate the likelihood of the synthetic data having been sampled from that + distribution. + * `BNLikelihood`: Returns the average likelihood across all the rows in the synthetic dataset. + * `BNLogLikelihood`: Returns the average log likelihood across all the rows in the synthetic + dataset. +* GaussianMixture Metrics: Metrics that fit a GaussianMixture model to the distribution of the + real data and later on evaluate the likelihood of the synthetic data having been sampled from that + distribution. + * `GMLogLikelihood`: Fits multiple GMMs to the real data using different numbers of components + and returns the average log likelihood given by them to the synthetic data. +* Detection Metrics: Metrics that train a Machine Learning Classifier to distinguish between + the real and the synthetic data. The score obtained by these metrics is the complementary of the + score obtained by the classifier when cross validated. + * `LogisticDetection`: Detection metric based on a LogisticRegression from scikit learn. + * `SVCDetection`: Detection metric based on a SVC from scikit learn. +* ML Efficacy Metrics: Metrics that evaluate the score obtained by a Machine Learning model + when fitted on the synthetic data and then evaluated on the real data. The output is the score + obtained by the model. **warning**: These metrics can only be run on datasets that represent + machine learning problems, and the metric score range depends on the difficulty of the + corresponding problem. + * `BinaryDecisionTreeClassifier`: ML Efficacy metric for binary classifications problems, based + on a DecisionTreeClassifier from scikit-learn. + * `BinaryAdaBoostClassifier`: ML Efficacy metric for binary classifications problems, based + on an AdaBoostClassifier from scikit-learn. + * `BinaryLogisticRegressionClassifier`: ML Efficacy metric for binary classifications problems, based + on a LogisticRegression from scikit-learn. + * `BinaryMLPClassifier`: ML Efficacy metric for binary classifications problems, based + on an MLPClassifier from scikit-learn. + * `MulticlassDecisionTreeClassifier`: ML Efficacy metric for multiclass classifications problems, based + on a DecisionTreeClassifier from scikit-learn. + * `MulticlassMLPClassifier`: ML Efficacy metric for multiclass classifications problems, based + on an MLPClassifier from scikit-learn. 
+ * `LinearRegressionClassifier`: ML Efficacy metric for regression problems, based + on a LinearRegression from scikit-learn. + * `MLPRegressor`: ML Efficacy metric for regression problems, based + on an MLPRegressor from scikit-learn. + * `MLEfficacy`: Generic ML Efficacy metric that detects the type of ML Problem associated + with the dataset by analyzing the target column type and then applies all the metrics + that are compatible with it. +* MultiSingleColumn Metrics: Metrics that apply a Single Column metric on each column from + the table that is compatible with it and then compute the average across all the columns. + * `CSTest`: MultiSingleColumn metric based on applying the Single Column CSTest on all + the categorical variables. + * `KSTest`: MultiSingleColumn metric based on applying the Single Column KSTest on all + the numerical variables. + * `KSTestExtended`: MultiSingleColumn metric based on applying the Single Column KSTest on + all the numerical variables that result from transforming all the columsn from the tables + using an RDT HyperTransformer. +* MultiColumnPairs Metrics: Metrics that apply a ColumnPairs metric on each pair of columns from + the tables which are compatible with it and then compute the average across all the columns pairs. + * `ContinuousKLDivergence`: MultiColumnPairs metric based on applying the ColumnPairs + ContinuousKLDivergence on all the possible pairs of numerical columns. + * `DiscreteKLDivergence`: MultiColumnPairs metric based on applying the ColumnPairs + DiscreteKLDivergence on all the possible pairs of categorical and boolean columns. + +## SingleTableMetric + +All the single table metrics are subclasses form the `sdmetrics.single_table.SingleTableMetric` +class, which can be used to locate all of them: + +```python3 +In [1]: from sdmetrics.single_table import SingleTableMetric + +In [2]: SingleTableMetric.get_subclasses() +Out[2]: +{'BNLogLikelihood': sdmetrics.single_table.bayesian_network.BNLogLikelihood, + 'LogisticDetection': sdmetrics.single_table.detection.sklearn.LogisticDetection, + 'SVCDetection': sdmetrics.single_table.detection.sklearn.SVCDetection, + 'BinaryDecisionTreeClassifier': sdmetrics.single_table.efficacy.binary.BinaryDecisionTreeClassifier, + 'BinaryAdaBoostClassifier': sdmetrics.single_table.efficacy.binary.BinaryAdaBoostClassifier, + 'BinaryLogisticRegression': sdmetrics.single_table.efficacy.binary.BinaryLogisticRegression, + 'BinaryMLPClassifier': sdmetrics.single_table.efficacy.binary.BinaryMLPClassifier, + 'MulticlassDecisionTreeClassifier': sdmetrics.single_table.efficacy.multiclass.MulticlassDecisionTreeClassifier, + 'MulticlassMLPClassifier': sdmetrics.single_table.efficacy.multiclass.MulticlassMLPClassifier, + 'LinearRegression': sdmetrics.single_table.efficacy.regression.LinearRegression, + 'MLPRegressor': sdmetrics.single_table.efficacy.regression.MLPRegressor, + 'GMLogLikelihood': sdmetrics.single_table.gaussian_mixture.GMLogLikelihood, + 'CSTest': sdmetrics.single_table.multi_single_column.CSTest, + 'KSTest': sdmetrics.single_table.multi_single_column.KSTest, + 'KSTestExtended': sdmetrics.single_table.multi_single_column.KSTestExtended, + 'ContinuousKLDivergence': sdmetrics.single_table.multi_column_pairs.ContinuousKLDivergence, + 'DiscreteKLDivergence': sdmetrics.single_table.multi_column_pairs.DiscreteKLDivergence} +``` + +## Single Table Inputs and Outputs + +All the single table metrics operate on at least two inputs: + +* `real_data`: A `pandas.DataFrame` with the data from the real dataset. 
+* `synthetic_data`: A `pandas.DataFrame` with the data from the synthetic dataset. + +For example, a `LogisticDetection` metric can be used on the `users` table from the +demo data as follows: + +```python3 +In [3]: from sdmetrics.single_table import LogisticDetection + +In [4]: from sdmetrics import load_demo + +In [5]: real_data, synthetic_data, metadata = load_demo() + +In [6]: real_table = real_data['users'] + +In [7]: synthetic_table = synthetic_data['users'] + +In [8]: LogisticDetection.compute(real_table, synthetic_table) +Out[8]: 1.0 +``` + +Some metrics also require additional information, such as the `target` column to use +when running an ML Efficacy metric. + +For example, this is how you would use a `MulticlassDecisionTreeClassifier` on the `country` +column from the demo table `users`: + +```python3 +In [9]: from sdmetrics.single_table import MulticlassDecisionTreeClassifier + +In [10]: MulticlassDecisionTreeClassifier.compute(real_table, synthetic_table, target='country') +Out[10]: (0.05555555555555555,) +``` + +Additionally, all the metrics accept a `metadata` argument which must be a dict following +the Metadata JSON schema from SDV, which will be used to determine which columns are compatible +with each one of the different metrics, as well as to extract any additional information required +by the metrics, such as the `target` column to use for ML Efficacy metrics. + +If this dictionary is not passed it will be built based on the data found in the real table, +but in this case some field types may not represent the data accurately (e.g. categorical +columns that contain only integer values will be seen as numerical), and any additional +information required by the metrics will not be populated. + +For example, we could execute the same metric as before by adding the `target` entry to the +metadata dict: + +```python +In [11]: users_metadata = metadata['tables']['users'].copy() + +In [12]: users_metadata['target'] = 'country' + +In [13]: MulticlassDecisionTreeClassifier.compute(real_table, synthetic_table, metadata=users_metadata) +Out[13]: (0.05555555555555555,) +``` diff --git a/sdmetrics/single_table/__init__.py b/sdmetrics/single_table/__init__.py new file mode 100644 index 00000000..a9a0ad3b --- /dev/null +++ b/sdmetrics/single_table/__init__.py @@ -0,0 +1,58 @@ +"""Metrics for single table datasets.""" + +from sdmetrics.single_table import ( + base, bayesian_network, detection, efficacy, gaussian_mixture, multi_single_column) +from sdmetrics.single_table.base import SingleTableMetric +from sdmetrics.single_table.bayesian_network import BNLikelihood, BNLogLikelihood +from sdmetrics.single_table.detection.base import DetectionMetric +from sdmetrics.single_table.detection.sklearn import ( + LogisticDetection, ScikitLearnClassifierDetectionMetric, SVCDetection) +from sdmetrics.single_table.efficacy.base import MLEfficacyMetric +from sdmetrics.single_table.efficacy.binary import ( + BinaryAdaBoostClassifier, BinaryDecisionTreeClassifier, BinaryEfficacyMetric, + BinaryLogisticRegression, BinaryMLPClassifier) +from sdmetrics.single_table.efficacy.multiclass import ( + MulticlassDecisionTreeClassifier, MulticlassEfficacyMetric, MulticlassMLPClassifier) +from sdmetrics.single_table.efficacy.regression import ( + LinearRegression, MLPRegressor, RegressionEfficacyMetric) +from sdmetrics.single_table.gaussian_mixture import GMLogLikelihood +from sdmetrics.single_table.multi_column_pairs import ( + ContinuousKLDivergence, DiscreteKLDivergence, MultiColumnPairsMetric) +from 
sdmetrics.single_table.multi_single_column import ( + CSTest, KSTest, KSTestExtended, MultiSingleColumnMetric) + +__all__ = [ + 'bayesian_network', + 'base', + 'detection', + 'efficacy', + 'gaussian_mixture', + 'multi_single_column', + 'SingleTableMetric', + 'BNLikelihood', + 'BNLogLikelihood', + 'DetectionMetric', + 'LogisticDetection', + 'SVCDetection', + 'ScikitLearnClassifierDetectionMetric', + 'MLEfficacyMetric', + 'BinaryEfficacyMetric', + 'BinaryDecisionTreeClassifier', + 'BinaryAdaBoostClassifier', + 'BinaryLogisticRegression', + 'BinaryMLPClassifier', + 'MulticlassEfficacyMetric', + 'MulticlassDecisionTreeClassifier', + 'MulticlassMLPClassifier', + 'RegressionEfficacyMetric', + 'LinearRegression', + 'MLPRegressor', + 'GMLogLikelihood', + 'MultiColumnPairsMetric', + 'ContinuousKLDivergence', + 'DiscreteKLDivergence', + 'MultiSingleColumnMetric', + 'CSTest', + 'KSTest', + 'KSTestExtended', +] diff --git a/sdmetrics/single_table/base.py b/sdmetrics/single_table/base.py new file mode 100644 index 00000000..1d005fe6 --- /dev/null +++ b/sdmetrics/single_table/base.py @@ -0,0 +1,116 @@ +"""Base Single Table metric class.""" + +from operator import attrgetter + +from sdmetrics.base import BaseMetric + + +class SingleTableMetric(BaseMetric): + """Base class for metrics that apply to single tables. + + Input to these family of metrics are ``pandas.DataFrame``s and + ``dict`` representations of the corresponding ``Table`` metadata. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = None + goal = None + min_value = None + max_value = None + + _DTYPES_TO_TYPES = { + 'i': { + 'type': 'numerical', + 'subtype': 'integer', + }, + 'f': { + 'type': 'numerical', + 'subtype': 'float', + }, + 'O': { + 'type': 'categorical', + }, + 'b': { + 'type': 'boolean', + }, + 'M': { + 'type': 'datetime', + } + } + + @classmethod + def _select_fields(cls, metadata, types): + fields = [] + if isinstance(types, str): + types = (types, ) + + for field_name, field_meta in metadata['fields'].items(): + field_type = field_meta['type'] + field_subtype = field_meta.get('subtype') + if any(t in types for t in (field_type, (field_type, ), (field_type, field_subtype))): + fields.append(field_name) + + return fields + + @classmethod + def _validate_inputs(cls, real_data, synthetic_data, metadata=None): + """Validate the inputs and return a valid metadata. + + If a metadata is passed, the data is validated against it. + + If no metadata is passed, one is built based on the ``real_data`` values. 
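As an illustrative aside (not part of the original patch), this is roughly what the metadata inferred from dtype kinds looks like for a small `pandas.DataFrame`, following the `_DTYPES_TO_TYPES` mapping defined above; the column names are made up:

```python3
# Sketch of the dtype-kind based inference; column names are hypothetical.
import pandas as pd

real_data = pd.DataFrame({
    'age': [25, 31, 40],            # int64  -> kind 'i'
    'country': ['US', 'FR', 'US'],  # object -> kind 'O'
    'active': [True, False, True],  # bool   -> kind 'b'
})

print(real_data.dtypes.apply(lambda dtype: dtype.kind).to_dict())
# {'age': 'i', 'country': 'O', 'active': 'b'}
#
# which maps to a metadata dict along the lines of:
# {'fields': {'age': {'type': 'numerical', 'subtype': 'integer'},
#             'country': {'type': 'categorical'},
#             'active': {'type': 'boolean'}}}
```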
+ """ + if set(real_data.columns) != set(synthetic_data.columns): + raise ValueError('`real_data` and `synthetic_data` must have the same columns') + + if metadata is not None: + if not isinstance(metadata, dict): + metadata = metadata.to_dict() + + fields = metadata['fields'] + for column in real_data.columns: + if column not in fields: + raise ValueError(f'Column {column} not found in metadata') + + for field in fields.keys(): + if field not in real_data.columns: + raise ValueError(f'Field {field} not found in data') + + return metadata + + dtype_kinds = real_data.dtypes.apply(attrgetter('kind')) + return {'fields': dtype_kinds.apply(cls._DTYPES_TO_TYPES.get).to_dict()} + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None): + """Compute this metric. + + Real data and synthetic data must be passed as ``pandas.DataFrame`` instances + and ``metadata`` as a ``Table`` metadata ``dict`` representation. + + If no ``metadata`` is given, one will be built from the values observed + in the ``real_data``. + + Args: + real_data (pandas.DataFrame): + The values from the real dataset, passed as a pandas.DataFrame. + synthetic_data (pandas.DataFrame): + The values from the synthetic dataset, passed as a pandas.DataFrame. + metadata (dict): + Table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + raise NotImplementedError() diff --git a/sdmetrics/single_table/bayesian_network.py b/sdmetrics/single_table/bayesian_network.py new file mode 100644 index 00000000..6c82fa08 --- /dev/null +++ b/sdmetrics/single_table/bayesian_network.py @@ -0,0 +1,186 @@ +"""BayesianNetwork based metrics for single table.""" + +import json +import logging + +import numpy as np +from pomegranate import BayesianNetwork + +from sdmetrics.goal import Goal +from sdmetrics.single_table.base import SingleTableMetric + +LOGGER = logging.getLogger(__name__) + + +class BNLikelihood(SingleTableMetric): + """BayesianNetwork Likelihood Single Table metric. + + This metric fits a BayesianNetwork to the real data and then evaluates how + likely it is that the synthetic data belongs to the same distribution. + + The output is the average probability across all the synthetic rows. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. 
+ """ + + name = 'BayesianNetwork Likelihood' + goal = Goal.MAXIMIZE + min_value = 0.0 + max_value = 1.0 + + @classmethod + def _likelihoods(cls, real_data, synthetic_data, metadata=None, structure=None): + metadata = cls._validate_inputs(real_data, synthetic_data, metadata) + structure = metadata.get('structure', structure) + fields = cls._select_fields(metadata, ('categorical', 'boolean')) + + if not fields: + return np.full(len(real_data), np.nan) + + LOGGER.debug('Fitting the BayesianNetwork to the real data') + if structure: + if isinstance(structure, dict): + structure = BayesianNetwork.from_json(json.dumps(structure)).structure + + bn = BayesianNetwork.from_structure(real_data[fields].to_numpy(), structure) + else: + bn = BayesianNetwork.from_samples(real_data[fields].to_numpy(), algorithm='chow-liu') + + LOGGER.debug('Evaluating likelihood of the synthetic data') + probabilities = [] + for _, row in synthetic_data[fields].iterrows(): + try: + probabilities.append(bn.probability([row.to_numpy()])) + except ValueError: + probabilities.append(0) + + return np.asarray(probabilities) + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, structure=None): + """Compute this metric. + + This fits a BayesianNetwork to the real data and then evaluates how + likely it is that the synthetic data belongs to the same distribution. + + Real data and synthetic data must be passed as ``pandas.DataFrame`` instances + and ``metadata`` as a ``Table`` metadata ``dict`` representation. + + If no ``metadata`` is given, one will be built from the values observed + in the ``real_data``. + + If a ``structure`` is given, either directly or as a ``structure`` first level + entry within the ``metadata`` dict, it is passed to the underlying BayesianNetwork + for fitting. Otherwise, the structure is learned from the data using the ``chow-liu`` + algorithm. + + ``structure`` can be passed as either a tuple of tuples representing only the + network structure or as a ``dict`` representing a full serialization of a previously + fitted ``BayesianNetwork``. In the later scenario, only the ``structure`` will be + extracted from the ``BayesianNetwork`` instance, and then a new one will be fitted + to the given data. + + The output is the average probability across all the synthetic rows. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + metadata (dict): + Table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. Optionally, the metadata can include + a ``structure`` entry with the structure of the Bayesian Network. + structure (dict): + Optional. BayesianNetwork structure to use when fitting + to the real data. If not passed, learn it from the data + using the ``chow-liu`` algorith. This is ignored if ``metadata`` + is passed and it contains a ``structure`` entry in it. + + Returns: + float: + Mean of the probabilities returned by the Bayesian Network. + """ + return np.mean(cls._likelihoods(real_data, synthetic_data, metadata, structure)) + + +class BNLogLikelihood(BNLikelihood): + """BayesianNetwork Log Likelihood Single Table metric. + + This metric fits a BayesianNetwork to the real data and then evaluates how + likely it is that the synthetic data belongs to the same distribution. + + The output is the average log probability across all the synthetic rows. 
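A usage sketch for `BNLikelihood`, assuming `pomegranate` is available (it is added as a dependency in `setup.py` below); the data is made up, and only categorical and boolean columns are used:

```python
import pandas as pd
from sdmetrics.single_table import BNLikelihood

real = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'blue'] * 25,
    'flag': [True, False, True, True] * 25,
})
synthetic = pd.DataFrame({
    'color': ['red', 'blue', 'blue', 'red'] * 25,
    'flag': [True, True, False, True] * 25,
})

score = BNLikelihood.compute(real, synthetic)  # mean probability, in [0, 1]
print(score)
```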
+ + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = 'BayesianNetwork Log Likelihood' + goal = Goal.MAXIMIZE + min_value = -np.inf + max_value = 0 + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, structure=None): + """Compute this metric. + + This fits a BayesianNetwork to the real data and then evaluates how + likely it is that the synthetic data belongs to the same distribution. + + Real data and synthetic data must be passed as ``pandas.DataFrame`` instances + and ``metadata`` as a ``Table`` metadata ``dict`` representation. + + If no ``metadata`` is given, one will be built from the values observed + in the ``real_data``. + + If a ``structure`` is given, either directly or as a ``structure`` first level + entry within the ``metadata`` dict, it is passed to the underlying BayesianNetwork + for fitting. Otherwise, the structure is learned from the data using the ``chow-liu`` + algorithm. + + ``structure`` can be passed as either a tuple of tuples representing only the + network structure or as a ``dict`` representing a full serialization of a previously + fitted ``BayesianNetwork``. In the later scenario, only the ``structure`` will be + extracted from the ``BayesianNetwork`` instance, and then a new one will be fitted + to the given data. + + The output is the average log probability across all the synthetic rows. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + metadata (dict): + Table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. Optionally, the metadata can include + a ``structure`` entry with the structure of the Bayesian Network. + structure (dict): + Optional. BayesianNetwork structure to use when fitting + to the real data. If not passed, learn it from the data + using the ``chow-liu`` algorith. This is ignored if ``metadata`` + is passed and it contains a ``structure`` entry in it. + + Returns: + float: + Mean of the log probabilities returned by the Bayesian Network. 
+ """ + likelihoods = cls._likelihoods(real_data, synthetic_data, metadata, structure) + likelihoods[np.where(likelihoods == 0)] = 1e-8 + return np.mean(np.log(likelihoods)) diff --git a/sdmetrics/single_table/detection/__init__.py b/sdmetrics/single_table/detection/__init__.py new file mode 100644 index 00000000..b987a119 --- /dev/null +++ b/sdmetrics/single_table/detection/__init__.py @@ -0,0 +1,8 @@ +"""Machine Learning Detection metrics for single table datasets.""" + +from sdmetrics.single_table.detection.sklearn import LogisticDetection, SVCDetection + +__all__ = [ + 'LogisticDetection', + 'SVCDetection' +] diff --git a/sdmetrics/single_table/detection/base.py b/sdmetrics/single_table/detection/base.py new file mode 100644 index 00000000..23ea8b18 --- /dev/null +++ b/sdmetrics/single_table/detection/base.py @@ -0,0 +1,81 @@ +"""Base class for Machine Learning Detection metrics for single table datasets.""" + +import numpy as np +from rdt import HyperTransformer +from sklearn.metrics import roc_auc_score +from sklearn.model_selection import StratifiedKFold + +from sdmetrics.goal import Goal +from sdmetrics.single_table.base import SingleTableMetric + + +class DetectionMetric(SingleTableMetric): + """Base class for Machine Learning Detection based metrics on single tables. + + These metrics build a Machine Learning Classifier that learns to tell the synthetic + data apart from the real data, which later on is evaluated using Cross Validation. + + The output of the metric is one minus the average ROC AUC score obtained. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = 'SingleTable Detection' + goal = Goal.MAXIMIZE + min_value = 0.0 + max_value = 1.0 + + @staticmethod + def _fit_predict(X_train, y_train, X_test): + """Fit a classifier and then use it to predict.""" + raise NotImplementedError() + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None): + """Compute this metric. + + This builds a Machine Learning Classifier that learns to tell the synthetic + data apart from the real data, which later on is evaluated using Cross Validation. + + The output of the metric is one minus the average ROC AUC score obtained. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + metadata (dict): + Table metadata dict. If not passed, it is build based on the + real_data fields and dtypes. + + Returns: + float: + One minus the ROC AUC Cross Validation Score obtained by the classifier. 
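Referring back to `BNLogLikelihood` above: rows with zero probability are clipped to `1e-8` before taking logs, which keeps the mean finite. A quick numeric check with made-up likelihoods:

```python
import numpy as np

likelihoods = np.array([0.2, 0.05, 0.0])
likelihoods[likelihoods == 0] = 1e-8   # same clipping as in BNLogLikelihood
print(np.mean(np.log(likelihoods)))    # finite, but strongly negative
```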
+ """ + metadata = cls._validate_inputs(real_data, synthetic_data, metadata) + transformer = HyperTransformer(dtype_transformers={'O': 'one_hot_encoding'}) + real_data = transformer.fit_transform(real_data).values + synthetic_data = transformer.transform(synthetic_data).values + + X = np.concatenate([real_data, synthetic_data]) + y = np.hstack([np.ones(len(real_data)), np.zeros(len(synthetic_data))]) + if np.isin(X, [np.inf, -np.inf]).any(): + X[np.isin(X, [np.inf, -np.inf])] = np.nan + + scores = [] + kf = StratifiedKFold(n_splits=3, shuffle=True) + for train_index, test_index in kf.split(X, y): + y_pred = cls._fit_predict(X[train_index], y[train_index], X[test_index]) + roc_auc = roc_auc_score(y[test_index], y_pred) + + scores.append(max(0.5, roc_auc) * 2 - 1) + + return 1 - np.mean(scores) diff --git a/sdmetrics/single_table/detection/sklearn.py b/sdmetrics/single_table/detection/sklearn.py new file mode 100644 index 00000000..8f32ece6 --- /dev/null +++ b/sdmetrics/single_table/detection/sklearn.py @@ -0,0 +1,69 @@ +"""scikit-learn based DetectionMetrics for single table datasets.""" + +from sklearn.impute import SimpleImputer +from sklearn.linear_model import LogisticRegression +from sklearn.pipeline import Pipeline +from sklearn.preprocessing import RobustScaler +from sklearn.svm import SVC + +from sdmetrics.single_table.detection.base import DetectionMetric + + +class ScikitLearnClassifierDetectionMetric(DetectionMetric): + """Base class for Detection metrics build using Scikit Learn Classifiers. + + The base class for these metrics makes a prediction using a scikit-learn + pipeline which contains a SimpleImputer, a RobustScaler and finally + the classifier, which is defined in the subclasses. + """ + + name = 'Scikit-Learn Detection' + + @staticmethod + def _get_classifier(): + """Build and return an instance of a scikit-learn Classifier.""" + raise NotImplementedError() + + @classmethod + def _fit_predict(cls, X_train, y_train, X_test): + """Fit a pipeline to train data and then use it to make prediction on test data.""" + model = Pipeline([ + ('imputer', SimpleImputer()), + ('scalar', RobustScaler()), + ('classifier', cls._get_classifier()), + ]) + model.fit(X_train, y_train) + + return model.predict_proba(X_test)[:, 1] + + +class LogisticDetection(ScikitLearnClassifierDetectionMetric): + """ScikitLearnClassifierDetectionMetric based on a LogisticRegression. + + This metric builds a LogisticRegression Classifier that learns to tell the synthetic + data apart from the real data, which later on is evaluated using Cross Validation. + + The output of the metric is one minus the average ROC AUC score obtained. + """ + + name = "LogisticRegression Detection" + + @staticmethod + def _get_classifier(): + return LogisticRegression(solver="lbfgs") + + +class SVCDetection(ScikitLearnClassifierDetectionMetric): + """ScikitLearnClassifierDetectionMetric based on a SVC. + + This metric builds a SVC Classifier that learns to tell the synthetic + data apart from the real data, which later on is evaluated using Cross Validation. + + The output of the metric is one minus the average ROC AUC score obtained. 
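To make the scoring above concrete: each fold's ROC AUC is floored at 0.5 (chance level), rescaled to [0, 1], and the final score is one minus the fold average. The numbers below are invented:

```python
import numpy as np

# An AUC at chance level contributes a perfect fold score (0),
# while an AUC of 1.0 (fully detectable) contributes the worst score (1).
fold_aucs = [0.52, 0.61, 0.55]
fold_scores = [max(0.5, auc) * 2 - 1 for auc in fold_aucs]
print(1 - np.mean(fold_scores))   # closer to 1.0 = harder to tell apart
```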
+ """ + + name = "SVC Detection" + + @staticmethod + def _get_classifier(): + return SVC(probability=True, gamma='scale') diff --git a/sdmetrics/single_table/efficacy/__init__.py b/sdmetrics/single_table/efficacy/__init__.py new file mode 100644 index 00000000..12891501 --- /dev/null +++ b/sdmetrics/single_table/efficacy/__init__.py @@ -0,0 +1,27 @@ +from sdmetrics.single_table.efficacy import binary, multiclass, regression +from sdmetrics.single_table.efficacy.base import MLEfficacyMetric +from sdmetrics.single_table.efficacy.binary import ( + BinaryAdaBoostClassifier, BinaryDecisionTreeClassifier, BinaryEfficacyMetric, + BinaryLogisticRegression, BinaryMLPClassifier) +from sdmetrics.single_table.efficacy.multiclass import ( + MulticlassDecisionTreeClassifier, MulticlassEfficacyMetric, MulticlassMLPClassifier) +from sdmetrics.single_table.efficacy.regression import ( + LinearRegression, MLPRegressor, RegressionEfficacyMetric) + +__all__ = [ + 'binary', + 'multiclass', + 'regression', + 'MLEfficacyMetric', + 'BinaryEfficacyMetric', + 'BinaryDecisionTreeClassifier', + 'BinaryAdaBoostClassifier', + 'BinaryLogisticRegression', + 'BinaryMLPClassifier', + 'MulticlassEfficacyMetric', + 'MulticlassDecisionTreeClassifier', + 'MulticlassMLPClassifier', + 'RegressionEfficacyMetric', + 'LinearRegression', + 'MLPRegressor' +] diff --git a/sdmetrics/single_table/efficacy/base.py b/sdmetrics/single_table/efficacy/base.py new file mode 100644 index 00000000..52331a3c --- /dev/null +++ b/sdmetrics/single_table/efficacy/base.py @@ -0,0 +1,123 @@ +"""Base class for Efficacy metrics for single table datasets.""" + +import numpy as np +import rdt +from sklearn.impute import SimpleImputer +from sklearn.pipeline import Pipeline +from sklearn.preprocessing import RobustScaler + +from sdmetrics.single_table.base import SingleTableMetric + + +class MLEfficacyMetric(SingleTableMetric): + """Base class for Machine Learning Efficacy metrics on single tables. + + These metrics fit a Machine Learning model on the synthetic data and + then evaluate it making predictions on the real data. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + model: + Model class to use for the prediction. + model_kwargs: + Keyword arguments to use to create the model instance. 
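Before moving on to the efficacy metrics, a usage sketch for the detection metrics defined above; the dataframes are hypothetical:

```python
import pandas as pd
from sdmetrics.single_table import LogisticDetection

real = pd.DataFrame({'a': [1, 2, 3, 4] * 25, 'b': ['x', 'y', 'x', 'y'] * 25})
synthetic = pd.DataFrame({'a': [1, 2, 4, 3] * 25, 'b': ['x', 'x', 'y', 'y'] * 25})

# Values close to 1.0 mean the classifier cannot tell real and synthetic apart.
print(LogisticDetection.compute(real, synthetic))
```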
+ """ + + name = None + goal = None + min_value = None + max_value = None + MODEL = None + MODEL_KWARGS = None + METRICS = None + + @classmethod + def _fit_predict(cls, synthetic_data, synthetic_target, real_data): + """Fit a model in the synthetic data and make predictions for the real data.""" + unique_labels = np.unique(synthetic_target) + if len(unique_labels) == 1: + predictions = np.full(len(real_data), unique_labels[0]) + else: + transformer = rdt.HyperTransformer(dtype_transformers={'O': 'one_hot_encoding'}) + real_data = transformer.fit_transform(real_data) + synthetic_data = transformer.transform(synthetic_data) + + real_data[np.isin(real_data, [np.inf, -np.inf])] = None + synthetic_data[np.isin(synthetic_data, [np.inf, -np.inf])] = None + + model_kwargs = cls.MODEL_KWARGS.copy() if cls.MODEL_KWARGS else {} + model = cls.MODEL(**model_kwargs) + + pipeline = Pipeline([ + ('imputer', SimpleImputer()), + ('scaler', RobustScaler()), + ('model', model) + ]) + + pipeline.fit(synthetic_data, synthetic_target) + + predictions = pipeline.predict(real_data) + + return predictions + + @classmethod + def _validate_inputs(cls, real_data, synthetic_data, metadata, target): + metadata = super()._validate_inputs(real_data, synthetic_data, metadata) + if 'target' in metadata: + target = metadata['target'] + elif target is None: + raise TypeError('`target` must be passed either directly or inside `metadata`') + + return target + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, target=None, scorer=None): + """Compute this metric. + + This fits a Machine Learning model on the synthetic data and + then evaluates it making predictions on the real data. + + A ``target`` column name must be given, either directly or as a first level + entry in the ``metadata`` dict, which will be used as the target column for the + Machine Learning prediction. + + Optionally, a list of ML scorer functions can be given. Otherwise, the default + one for the type of problem is used. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + target (str): + Name of the column to use as the target. + scorer (Union[callable, list[callable], NoneType]): + Scorer (or list of scorers) to apply. If not passed, use the default + one for the type of metric. + + Returns: + union[float, tuple[float]]: + Scores obtained by the models when evaluated on the real data. 
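As described above, the target column may also travel inside the metadata instead of being passed explicitly. A hypothetical metadata dict following that convention:

```python
metadata = {
    'fields': {
        'amount': {'type': 'numerical', 'subtype': 'float'},
        'fraud': {'type': 'categorical'},
    },
    'target': 'fraud',   # picked up by _validate_inputs when target=None
}
# Any MLEfficacyMetric subclass could then be called as
# SomeEfficacyMetric.compute(real_data, synthetic_data, metadata)
```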
+ """ + target = cls._validate_inputs(real_data, synthetic_data, metadata, target) + + real_data = real_data.copy() + synthetic_data = synthetic_data.copy() + real_target = real_data.pop(target) + synthetic_target = synthetic_data.pop(target) + + predictions = cls._fit_predict(synthetic_data, synthetic_target, real_data) + + scorer = scorer or cls.SCORER + if isinstance(scorer, (list, tuple)): + scorers = scorer + return tuple((scorer(real_target, predictions) for scorer in scorers)) + else: + return scorer(real_target, predictions) diff --git a/sdmetrics/single_table/efficacy/binary.py b/sdmetrics/single_table/efficacy/binary.py new file mode 100644 index 00000000..3d53b182 --- /dev/null +++ b/sdmetrics/single_table/efficacy/binary.py @@ -0,0 +1,74 @@ +"""Base class for Efficacy metrics for single table datasets.""" + +from sklearn.ensemble import AdaBoostClassifier +from sklearn.linear_model import LogisticRegression +from sklearn.metrics import f1_score +from sklearn.neural_network import MLPClassifier +from sklearn.tree import DecisionTreeClassifier + +from sdmetrics.goal import Goal +from sdmetrics.single_table.efficacy.base import MLEfficacyMetric + + +class BinaryEfficacyMetric(MLEfficacyMetric): + """Base class for Binary Classification Efficacy metrics.""" + + name = None + goal = Goal.MAXIMIZE + min_value = 0 + max_value = 1 + SCORER = f1_score + + +class BinaryDecisionTreeClassifier(BinaryEfficacyMetric): + """Binary DecisionTreeClassifier Efficacy based metric. + + This fits a DecisionTreeClassifier to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = DecisionTreeClassifier + MODEL_KWARGS = { + 'max_depth': 15, + 'class_weight': 'balanced' + } + + +class BinaryAdaBoostClassifier(BinaryEfficacyMetric): + """Binary AdaBoostClassifier Efficacy based metric. + + This fits an AdaBoostClassifier to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = AdaBoostClassifier + + +class BinaryLogisticRegression(BinaryEfficacyMetric): + """Binary LogisticRegression Efficacy based metric. + + This fits a LogisticRegression to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = LogisticRegression + MODEL_KWARGS = { + 'solver': 'lbfgs', + 'n_jobs': 2, + 'class_weight': 'balanced', + 'max_iter': 50 + } + + +class BinaryMLPClassifier(BinaryEfficacyMetric): + """Binary MLPClassifier Efficacy based metric. + + This fits a MLPClassifier to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = MLPClassifier + MODEL_KWARGS = { + 'hidden_layer_sizes': (50, ), + 'max_iter': 50 + } diff --git a/sdmetrics/single_table/efficacy/mlefficacy.py b/sdmetrics/single_table/efficacy/mlefficacy.py new file mode 100644 index 00000000..5fc41fb4 --- /dev/null +++ b/sdmetrics/single_table/efficacy/mlefficacy.py @@ -0,0 +1,86 @@ +import logging + +import numpy as np + +from sdmetrics.goal import Goal +from sdmetrics.single_table.efficacy.base import MLEfficacyMetric +from sdmetrics.single_table.efficacy.binary import BinaryEfficacyMetric +from sdmetrics.single_table.efficacy.multiclass import MulticlassEfficacyMetric +from sdmetrics.single_table.efficacy.regression import RegressionEfficacyMetric + +LOGGER = logging.getLogger(__name__) + + +class MLEfficacy(MLEfficacyMetric): + """Problem and ML Model agnostic efficacy metric. 
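A usage sketch for the binary efficacy metrics defined above; the table and target column are made up:

```python
import pandas as pd
from sdmetrics.single_table import BinaryLogisticRegression

real = pd.DataFrame({
    'amount': [10.0, 25.0, 7.5, 40.0] * 25,
    'fraud': [0, 1, 0, 1] * 25,
})
synthetic = pd.DataFrame({
    'amount': [12.0, 23.0, 9.0, 38.0] * 25,
    'fraud': [0, 1, 0, 1] * 25,
})

# The model is trained on the synthetic table and the F1 score is
# computed against the real `fraud` column.
print(BinaryLogisticRegression.compute(real, synthetic, target='fraud'))
```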
+ + This metric analyzes the target column and applies all the Regression, Binary + Classification or Multiclass Classification metrics to the table depending + on the type of column that needs to be predicted. + + The output is the average score obtained by the different metrics of the + chosen type. + """ + + name = 'Machine Learning Efficacy' + goal = Goal.MAXIMIZE + min_value = -np.inf + max_value = np.inf + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, target=None): + """Compute this metric. + + A ``target`` column name must be given, either directly or as a first level + entry in the ``metadata`` dict, which will be used as the target column for the + Machine Learning prediction. + + This analyzes the target column and applies all the Regression, Binary + Classification or Multiclass Classification metrics to the table depending + on the type of column that needs to be predicted. + + The output is the average score obtained by the different metrics of the + chosen type. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + target (str): + Name of the column to use as the target. + scorer (Union[callable, list[callable], NoneType]): + Scorer (or list of scorers) to apply. If not passed, use the default + one for the type of metric. + + Returns: + union[float, tuple[float]]: + Scores obtained by the models when evaluated on the real data. + """ + target = cls._validate_inputs(real_data, synthetic_data, metadata, target) + target_type = metadata['fields'][target]['type'] + target_data = real_data[target] + uniques = target_data.unique() + if len(uniques) == 2: + LOGGER.info('MLEfficacy: Selecting Binary Classification metrics') + if target_data.dtype == 'object': + first_label = target_data.unique()[0] + real_data = real_data.copy() + synthetic_data = synthetic_data.copy() + real_data[target] = target_data == first_label + synthetic_data[target] = synthetic_data[target] == first_label + + metrics = BinaryEfficacyMetric.get_subclasses() + elif target_type == 'numerical': + LOGGER.info('MLEfficacy: Selecting Regression metrics') + metrics = RegressionEfficacyMetric.get_subclasses() + elif target_type == 'categorical': + LOGGER.info('MLEfficacy: Selecting Multiclass Classification metrics') + metrics = MulticlassEfficacyMetric.get_subclasses() + else: + raise ValueError(f'Unsupported target type: {target_type}') + + scores = [] + for name, metric in metrics.items(): + LOGGER.info('MLEfficacy: Computing %s', name) + scores.append(metric.compute(real_data, synthetic_data, metadata, target)) diff --git a/sdmetrics/single_table/efficacy/multiclass.py b/sdmetrics/single_table/efficacy/multiclass.py new file mode 100644 index 00000000..a42a2ed6 --- /dev/null +++ b/sdmetrics/single_table/efficacy/multiclass.py @@ -0,0 +1,50 @@ +"""Base class for Efficacy metrics for single table datasets.""" + +from sklearn.metrics import f1_score +from sklearn.neural_network import MLPClassifier +from sklearn.tree import DecisionTreeClassifier + +from sdmetrics.goal import Goal +from sdmetrics.single_table.efficacy.base import MLEfficacyMetric + + +def f1_macro(real_target, predictions): + return f1_score(real_target, predictions, average='macro') + + +class MulticlassEfficacyMetric(MLEfficacyMetric): + """Base class for Multiclass Classification Efficacy Metrics.""" + + name = None + goal = Goal.MAXIMIZE + min_value = 0 + max_value = 1 
+ SCORER = f1_macro + + +class MulticlassDecisionTreeClassifier(MulticlassEfficacyMetric): + """Multiclass DecisionTreeClassifier Efficacy based metric. + + This fits a DecisionTreeClassifier to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = DecisionTreeClassifier + MODEL_KWARGS = { + 'max_depth': 30, + 'class_weight': 'balanced', + } + + +class MulticlassMLPClassifier(MulticlassEfficacyMetric): + """Multiclass MLPClassifier Efficacy based metric. + + This fits a MLPClassifier to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = MLPClassifier + MODEL_KWARGS = { + 'hidden_layer_sizes': (100, ), + 'max_iter': 50 + } diff --git a/sdmetrics/single_table/efficacy/regression.py b/sdmetrics/single_table/efficacy/regression.py new file mode 100644 index 00000000..c7b4be48 --- /dev/null +++ b/sdmetrics/single_table/efficacy/regression.py @@ -0,0 +1,42 @@ +"""Regression Efficacy based metrics.""" + +import numpy as np +from sklearn import linear_model, neural_network +from sklearn.metrics import r2_score + +from sdmetrics.goal import Goal +from sdmetrics.single_table.efficacy.base import MLEfficacyMetric + + +class RegressionEfficacyMetric(MLEfficacyMetric): + """RegressionEfficacy base class.""" + + name = None + goal = Goal.MAXIMIZE + min_value = -np.inf + max_value = 1 + SCORER = r2_score + + +class LinearRegression(RegressionEfficacyMetric): + """LinearRegression Efficacy based metric. + + This fits a LinearRegression to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = linear_model.LinearRegression + + +class MLPRegressor(RegressionEfficacyMetric): + """MLPRegressor Efficacy based metric. + + This fits a MLPRegressor to the synthetic data and + then evaluates it making predictions on the real data. + """ + + MODEL = neural_network.MLPRegressor + MODEL_KWARGS = { + 'hidden_layer_sizes': (100, ), + 'max_iter': 50 + } diff --git a/sdmetrics/single_table/gaussian_mixture.py b/sdmetrics/single_table/gaussian_mixture.py new file mode 100644 index 00000000..cbf66d6b --- /dev/null +++ b/sdmetrics/single_table/gaussian_mixture.py @@ -0,0 +1,86 @@ +"""GaussianMixture based metrics for single table.""" + +import numpy as np +from sklearn.mixture import GaussianMixture + +from sdmetrics.goal import Goal +from sdmetrics.single_table.base import SingleTableMetric + + +class GMLogLikelihood(SingleTableMetric): + """GaussianMixture Single Table metric. + + This metric fits multiple GaussianMixture models to the real data and then + evaluates how likely it is that the synthetic data belongs to the same + distribution as the real data. + + By default, GaussianMixture models with 10, 20 and 30 components are + fitted a total of 3 times. + + The output is the average log likelihood across all the GMMs. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + """ + + name = 'GaussianMixture Log Likelihood' + goal = Goal.MAXIMIZE + min_value = -np.inf + max_value = np.inf + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, + n_components=(10, 20, 30), iterations=3): + """Compute this metric. 
+ + This fits multiple GaussianMixture models to the real data and then + evaluates how likely it is that the synthetic data belongs to the same + distribution as the real data. + + By default, GaussianMixture models with 10, 20 and 30 components are + fitted a total of 3 times. + + Real data and synthetic data must be passed as ``pandas.DataFrame`` instances + and ``metadata`` as a ``Table`` metadata ``dict`` representation. + + If no ``metadata`` is given, one will be built from the values observed + in the ``real_data``. + + The output is the average log likelihood across all the GMMs. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + n_components (tuple[int]): + Tuple indicating the number of components to use + for the tests. Defaults to (10, 20, 30) + iterations (int): + Number of times that each number of components should + be evaluated. + + Returns: + float: + Average score returned by the GaussianMixtures. + """ + metadata = cls._validate_inputs(real_data, synthetic_data, metadata) + fields = cls._select_fields(metadata, 'numerical') + if not fields: + return np.nan + + scores = [] + for _ in range(iterations): + for nc in n_components: + gmm = GaussianMixture(nc, covariance_type='diag') + gmm.fit(real_data[fields]) + scores.append(gmm.score(synthetic_data[fields])) + + return np.mean(scores) diff --git a/sdmetrics/single_table/multi_column_pairs.py b/sdmetrics/single_table/multi_column_pairs.py new file mode 100644 index 00000000..cb3f02c6 --- /dev/null +++ b/sdmetrics/single_table/multi_column_pairs.py @@ -0,0 +1,145 @@ +"""SingleTable metrics based on applying a ColumnPairsMetrics on all the possible column pairs.""" + +from itertools import combinations + +import numpy as np + +from sdmetrics import column_pairs +from sdmetrics.single_table.base import SingleTableMetric +from sdmetrics.utils import NestedAttrsMeta + + +class MultiColumnPairsMetric(SingleTableMetric, metaclass=NestedAttrsMeta('column_pairs_metric')): + """SingleTableMetric subclass that applies a ColumnPairsMetric on each possible column pair. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + column_pairs_metric (sdmetrics.column_pairs.base.ColumnPairsMetric): + ColumnPairsMetric to apply. + field_types (dict): + Field types to which the SingleColumn metric will be applied. + """ + + column_pairs_metric = None + column_pairs_metric_kwargs = None + field_types = None + + def __init__(self, column_pairs_metric, **column_pairs_metric_kwargs): + self.column_pairs_metric = column_pairs_metric + self.column_pairs_metric_kwargs = column_pairs_metric_kwargs + self.compute = self._compute + + def _compute(self, real_data, synthetic_data, metadata=None, **kwargs): + """Compute this metric. + + This is done by grouping all the columns that are compatible with the + underlying ColumnPairs metric in groups of 2 and then evaluating them + using the ColumnPairs metric. + + The output is the average of the scores obtained. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. 
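Going back to `GMLogLikelihood` above, a usage sketch with hypothetical numerical data; the result is an average log likelihood and can be any real number:

```python
import numpy as np
import pandas as pd
from sdmetrics.single_table import GMLogLikelihood

real = pd.DataFrame({
    'a': np.random.normal(size=300),
    'b': np.random.normal(loc=5, size=300),
})
synthetic = pd.DataFrame({
    'a': np.random.normal(size=300),
    'b': np.random.normal(loc=5, size=300),
})

# Fewer components and a single iteration keep the sketch fast.
print(GMLogLikelihood.compute(real, synthetic, n_components=(5,), iterations=1))
```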
+ synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + metadata (dict): + Table metadata dict. + **kwargs: + Any additional keyword arguments will be passed down + to the column pairs metric + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + metadata = self._validate_inputs(real_data, synthetic_data, metadata) + + fields = self._select_fields(metadata, self.field_types) + + values = [] + for columns in combinations(fields, r=2): + real = real_data[list(columns)] + synthetic = synthetic_data[list(columns)] + values.append(self.column_pairs_metric.compute(real, synthetic)) + + return np.nanmean(values) + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, **kwargs): + """Compute this metric. + + Args: + real_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the real dataset. + synthetic_data (Union[numpy.ndarray, pandas.DataFrame]): + The values from the synthetic dataset. + metadata (dict): + Table metadata dict. + **kwargs: + Any additional keyword arguments will be passed down + to the column pairs metric + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + return cls._compute(cls, real_data, synthetic_data, metadata, **kwargs) + + +class ContinuousKLDivergence(MultiColumnPairsMetric): + """MultiColumnPairsMetric based on ColumnPairs ContinuousKLDivergence. + + This approximates the KL divergence by binning the continuous values + to turn them into categorical values and then computing the relative + entropy. Afterwards normalizes the value applying `1 / (1 + KLD)`. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + column_pairs_metric (sdmetrics.column_pairs.base.ColumnPairsMetric): + ColumnPairs ContinuousKLDivergence. + field_types (dict): + Field types to which the SingleColumn metric will be applied. + """ + + field_types = ('numerical', ) + column_pairs_metric = column_pairs.statistical.kl_divergence.ContinuousKLDivergence + + +class DiscreteKLDivergence(MultiColumnPairsMetric): + """MultiColumnPairsMetric based on ColumnPairs DiscreteKLDivergence. + + This computes the KL divergence and afterwards normalizes the + value applying `1 / (1 + KLD)`. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + column_pairs_metric (sdmetrics.column_pairs.base.ColumnPairsMetric): + ColumnPairs DiscreteKLDivergence. + field_types (dict): + Field types to which the SingleColumn metric will be applied. 
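The `1 / (1 + KLD)` normalization mentioned above maps any non-negative divergence into the (0, 1] range, with 1 meaning identical distributions. A quick check with made-up divergence values:

```python
# Larger divergences give scores closer to 0; a divergence of 0 gives 1.0.
for kld in (0.0, 0.5, 2.0, 10.0):
    print(kld, 1 / (1 + kld))   # 1.0, ~0.67, ~0.33, ~0.09
```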
+ """ + + field_types = ('boolean', 'categorical') + column_pairs_metric = column_pairs.statistical.kl_divergence.DiscreteKLDivergence diff --git a/sdmetrics/single_table/multi_single_column.py b/sdmetrics/single_table/multi_single_column.py new file mode 100644 index 00000000..e7a3c8b1 --- /dev/null +++ b/sdmetrics/single_table/multi_single_column.py @@ -0,0 +1,178 @@ +"""SingleTable metrics based on applying a SingleColumnMetric on all the columns.""" + +import numpy as np +from rdt import HyperTransformer + +from sdmetrics import single_column +from sdmetrics.single_table.base import SingleTableMetric +from sdmetrics.utils import NestedAttrsMeta + + +class MultiSingleColumnMetric(SingleTableMetric, + metaclass=NestedAttrsMeta('single_column_metric')): + """SingleTableMetric subclass that applies a SingleColumnMetric on each column. + + This class can either be used by creating a subclass that inherits from it and + sets the SingleColumn Metric as the `single_column_metric` attribute, + or by creating an instance of this class passing the underlying SingleColumn + metric as an argument. + + Attributes: + name (str): + Name to use when reports about this metric are printed. + goal (sdmetrics.goal.Goal): + The goal of this metric. + min_value (Union[float, tuple[float]]): + Minimum value or values that this metric can take. + max_value (Union[float, tuple[float]]): + Maximum value or values that this metric can take. + single_column_metric (sdmetrics.single_column.base.SingleColumnMetric): + SingleColumn metric to apply. + field_types (dict): + Field types to which the SingleColumn metric will be applied. + """ + + single_column_metric = None + single_column_metric_kwargs = None + field_types = None + + def __init__(self, single_column_metric=None, **single_column_metric_kwargs): + self.single_column_metric = single_column_metric + self.single_column_metric_kwargs = single_column_metric_kwargs + self.compute = self._compute + + def _compute(self, real_data, synthetic_data, metadata=None, **kwargs): + """Compute this metric. + + This is done by computing the underlying SingleColumn metric to all the + columns that are compatible with it. + + The output is the average of the scores obtained. + + Args: + real_data (pandas.DataFrame): + The values from the real dataset. + synthetic_data (pandas.DataFrame): + The values from the synthetic dataset. + metadata (dict): + Table metadata dict. + **kwargs: + Any additional keyword arguments will be passed down + to the single column metric + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + metadata = self._validate_inputs(real_data, synthetic_data, metadata) + + fields = self._select_fields(metadata, self.field_types) + scores = [] + for column_name, real_column in real_data.items(): + if column_name in fields: + real_column = real_column.values + synthetic_column = synthetic_data[column_name].values + + score = self.single_column_metric.compute( + real_column, + synthetic_column, + **(self.single_column_metric_kwargs or {}), + **kwargs + ) + scores.append(score) + + return np.nanmean(scores) + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None, **kwargs): + """Compute this metric. + + This is done by computing the underlying SingleColumn metric to all the + columns that are compatible with it. + + The output is the average of the scores obtained. + + Args: + real_data (pandas.DataFrame): + The values from the real dataset. + synthetic_data (pandas.DataFrame): + The values from the synthetic dataset. 
+ metadata (dict): + Table metadata dict. + **kwargs: + Any additional keyword arguments will be passed down + to the single column metric + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + return cls._compute(cls, real_data, synthetic_data, metadata, **kwargs) + + +class CSTest(MultiSingleColumnMetric): + """MultiSingleColumnMetric based on SingleColumn CSTest. + + This function applies the single column ``CSTest`` metric to all + the discrete columns found in the table and then returns the average + of all the scores obtained. + """ + + field_types = ('boolean', 'categorical') + single_column_metric = single_column.statistical.CSTest + + +class KSTest(MultiSingleColumnMetric): + """MultiSingleColumnMetric based on SingleColumn KSTest. + + This function applies the single column ``KSTest`` metric to all + the numerical columns found in the table and then returns the average + of all the scores obtained. + """ + + field_types = ('numerical', ) + single_column_metric = single_column.statistical.KSTest + + +class KSTestExtended(MultiSingleColumnMetric): + """KSTest variation that transforms everything to numerical before comparing the tables. + + This is done by applying an ``rdt.HyperTransformer`` to the data with the + default values and afterwards applying a regular single_column ``KSTest`` + metric to all the generated numerical columns. + """ + + single_column_metric = single_column.statistical.KSTest + field_types = ('numerical', 'categorical', 'boolean', 'datetime') + + @classmethod + def compute(cls, real_data, synthetic_data, metadata=None): + """Compute this metric. + + Args: + real_data (pandas.DataFrame): + The values from the real dataset. + synthetic_data (pandas.DataFrame): + The values from the synthetic dataset. + metadata (dict): + Table metadata dict. + + Returns: + Union[float, tuple[float]]: + Metric output. + """ + metadata = cls._validate_inputs(real_data, synthetic_data, metadata) + transformer = HyperTransformer() + fields = cls._select_fields(metadata, cls.field_types) + real_data = transformer.fit_transform(real_data[fields]) + synthetic_data = transformer.transform(synthetic_data[fields]) + + values = [] + for column_name, real_column in real_data.items(): + real_column = real_column.values + synthetic_column = synthetic_data[column_name].values + + score = cls.single_column_metric.compute(real_column, synthetic_column) + values.append(score) + + return np.nanmean(values) diff --git a/sdmetrics/statistical/__init__.py b/sdmetrics/statistical/__init__.py deleted file mode 100644 index b9fe0379..00000000 --- a/sdmetrics/statistical/__init__.py +++ /dev/null @@ -1,25 +0,0 @@ -""" -This module implements statistical methods for comparing the distributions of -the two databases. -""" -from sdmetrics.statistical.bivariate import ContinuousDivergence, DiscreteDivergence -from sdmetrics.statistical.univariate import CSTest, KSTest - - -def metrics(metadata, real_tables, synthetic_tables): - """ - This function takes in (1) a `sdv.Metadata` object which describes a set of - relational tables, (2) a set of "real" tables corresponding to the metadata, - and (3) a set of "synthetic" tables corresponding to the metadata. It yields - a sequence of `Metric` objects. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Yields: - Metric: The next metric. 
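For reference, a usage sketch of the new single-table `CSTest` and `KSTest` wrappers defined above, with hypothetical data; each one averages the per-column single-column scores:

```python
import pandas as pd
from sdmetrics.single_table import CSTest, KSTest

real = pd.DataFrame({
    'cat': ['a', 'b', 'a', 'b'] * 50,
    'num': [1.0, 2.0, 3.0, 4.0] * 50,
})
synthetic = pd.DataFrame({
    'cat': ['a', 'a', 'b', 'b'] * 50,
    'num': [1.1, 2.2, 2.9, 4.1] * 50,
})

print(CSTest.compute(real, synthetic))   # categorical/boolean columns only
print(KSTest.compute(real, synthetic))   # numerical columns only
```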
- """ - for method in [CSTest(), KSTest(), DiscreteDivergence(), ContinuousDivergence()]: - yield from method.metrics(metadata, real_tables, synthetic_tables) diff --git a/sdmetrics/statistical/bivariate/__init__.py b/sdmetrics/statistical/bivariate/__init__.py deleted file mode 100644 index 0134f7b3..00000000 --- a/sdmetrics/statistical/bivariate/__init__.py +++ /dev/null @@ -1,9 +0,0 @@ -""" -This module implements bivariate KL-divergence/relative-entropy measures. -""" - -from sdmetrics.statistical.bivariate.base import BivariateMetric -from sdmetrics.statistical.bivariate.continuous import ContinuousDivergence -from sdmetrics.statistical.bivariate.discrete import DiscreteDivergence - -__all__ = ["BivariateMetric", "DiscreteDivergence", "ContinuousDivergence"] diff --git a/sdmetrics/statistical/bivariate/base.py b/sdmetrics/statistical/bivariate/base.py deleted file mode 100644 index c184a28e..00000000 --- a/sdmetrics/statistical/bivariate/base.py +++ /dev/null @@ -1,74 +0,0 @@ -from itertools import permutations - -from sdmetrics.report import Metric - - -class BivariateMetric(): - """ - Attributes: - name (str): The name of the bivariate metric. - dtypes (list[str]): The ordered pairs of data types to accept (i.e. - [(float, floatt), (float, intt)]). - """ - - name = "" - dtypes = [] - - @staticmethod - def metric(real_2d, synthetic_2d): - """This function is expected to perform a statistical test on the two - samples, each of which contains two columns, and return a tuple containing - (value, goal, unit, domain). See the Metric object for what these fields - represent. - - Arguments: - real_2d (np.ndarray): Two columns from the real database. - synthetic_2d (np.ndarray): Two columns from the synthetic database. - - Returns: - (str, Goal, str, tuple): A tuple containing (value, goal, unit, domain) - which corresponds to the fields in a Metric object. - """ - raise NotImplementedError() - - def metrics(self, metadata, real_tables, synthetic_tables): - """ - This function iterates over all the pairs of columns in all the tables - and, if the data types match, it yields the Metric. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Yields: - Metric: The next metric. 
- """ - tables = set(real_tables).union(synthetic_tables) - for name in tables: - dtypes = metadata.get_dtypes(name) - real = real_tables[name] - synthetic = synthetic_tables[name] - yield from self._compute(name, dtypes, real, synthetic) - - def _compute(self, name, dtypes, real, synthetic): - for (col1, col1_type), (col2, col2_type) in permutations( - dtypes.items(), r=2): - if (col1_type, col2_type) not in self.dtypes: - continue - X1 = real[[col1, col2]].values - X2 = synthetic[[col1, col2]].values - value, goal, unit, domain = self.metric(X1, X2) - yield Metric( - name=self.name, - value=value, - tags=set([ - "statistic:bivariate", - "table:%s" % name, - "column:%s" % col1, - "column:%s" % col2, - ]), - goal=goal, - unit=unit, - domain=domain - ) diff --git a/sdmetrics/statistical/bivariate/continuous.py b/sdmetrics/statistical/bivariate/continuous.py deleted file mode 100644 index 77374037..00000000 --- a/sdmetrics/statistical/bivariate/continuous.py +++ /dev/null @@ -1,47 +0,0 @@ - -import numpy as np -from scipy.special import rel_entr - -from sdmetrics.report import Goal -from sdmetrics.statistical.bivariate.base import BivariateMetric - - -class ContinuousDivergence(BivariateMetric): - - name = "continuous-kl" - dtypes = [ - ("float", "float"), - ("float", "int"), - ("int", "int") - ] - - @staticmethod - def metric(real, synthetic): - """ - This approximates the KL divergence by binning the continuous values - to turn them into categorical values and then computing the relative - entropy. - - TODO: - * Investigate a KDE-based approach. - - Arguments: - real (np.ndarray): The values from the real database. - synthetic (np.ndarray): The values from the synthetic database. - - Returns: - (str, Goal, str, tuple): A tuple containing (value, goal, unit, domain) - which corresponds to the fields in a Metric object. - """ - real[np.isnan(real)] = 0.0 - synthetic[np.isnan(synthetic)] = 0.0 - - real, xedges, yedges = np.histogram2d(real[:, 0], real[:, 1]) - synthetic, _, _ = np.histogram2d( - synthetic[:, 0], synthetic[:, 1], bins=[xedges, yedges]) - - f_obs, f_exp = synthetic.flatten() + 1e-5, real.flatten() + 1e-5 - f_obs, f_exp = f_obs / np.sum(f_obs), f_exp / np.sum(f_exp) - - value = np.sum(rel_entr(f_obs, f_exp)) - return value, Goal.MINIMIZE, "entropy", (0.0, float("inf")) diff --git a/sdmetrics/statistical/bivariate/discrete.py b/sdmetrics/statistical/bivariate/discrete.py deleted file mode 100644 index 257ac285..00000000 --- a/sdmetrics/statistical/bivariate/discrete.py +++ /dev/null @@ -1,27 +0,0 @@ - -import numpy as np -from scipy.special import rel_entr - -from sdmetrics.report import Goal -from sdmetrics.statistical.bivariate.base import BivariateMetric -from sdmetrics.statistical.utils import frequencies - - -class DiscreteDivergence(BivariateMetric): - - name = "discrete-kl" - dtypes = [ - ("object", "object"), - ("object", "bool"), - ("bool", "bool") - ] - - @staticmethod - def metric(real, synthetic): - assert real.shape[1] == 2, "Expected 2d data." - assert synthetic.shape[1] == 2, "Expected 2d data." 
- real = [(x[0], x[1]) for x in real] - synthetic = [(x[0], x[1]) for x in synthetic] - f_obs, f_exp = frequencies(real, synthetic) - value = np.sum(rel_entr(f_obs, f_exp)) - return value, Goal.MINIMIZE, "entropy", (0.0, float("inf")) diff --git a/sdmetrics/statistical/univariate/__init__.py b/sdmetrics/statistical/univariate/__init__.py deleted file mode 100644 index 65421ee0..00000000 --- a/sdmetrics/statistical/univariate/__init__.py +++ /dev/null @@ -1,8 +0,0 @@ -""" -This module implements univariate goodness-of-fit tests. -""" -from sdmetrics.statistical.univariate.base import UnivariateMetric -from sdmetrics.statistical.univariate.cstest import CSTest -from sdmetrics.statistical.univariate.kstest import KSTest - -__all__ = ["UnivariateMetric", "CSTest", "KSTest"] diff --git a/sdmetrics/statistical/univariate/base.py b/sdmetrics/statistical/univariate/base.py deleted file mode 100644 index 4b6dbdc5..00000000 --- a/sdmetrics/statistical/univariate/base.py +++ /dev/null @@ -1,69 +0,0 @@ - -from sdmetrics.report import Metric - - -class UnivariateMetric(): - """ - Attributes: - name (str): The name of the univariate metric. - dtypes (list[str]): The data types to accept (i.e. [float, str]). - """ - - name = "" - dtypes = [] - - @staticmethod - def metric(real_column, synthetic_column): - """This function is expected to perform a statistical test on the two - samples and return a tuple containing (value, goal, unit, domain). See the - Metric object for what these fields represent. - - Arguments: - real_column (np.ndarray): The values from the real database. - synthetic_column (np.ndarray): The values from the synthetic database. - - Returns: - (str, Goal, str, tuple): A tuple containing (value, goal, unit, domain) - which corresponds to the fields in a Metric object. - """ - raise NotImplementedError() - - def metrics(self, metadata, real_tables, synthetic_tables): - """This function iterates over all the columns in all the tables and, if - the data type of a column matches the data types for which this metric is - defined, it computes the metric for that column and yields it. - - Args: - metadata (sdv.Metadata): The Metadata object from SDV. - real_tables (dict): A dictionary mapping table names to dataframes. - synthetic_tables (dict): A dictionary mapping table names to dataframes. - - Yields: - Metric: The next metric. 
- """ - assert real_tables.keys() == synthetic_tables.keys() - for table_name in real_tables.keys(): - dtypes = metadata.get_dtypes(table_name) - real = real_tables[table_name] - synthetic = synthetic_tables[table_name] - yield from self._compute(table_name, dtypes, real, synthetic) - - def _compute(self, name, dtypes, real, synthetic): - for column_name, column_type in dtypes.items(): - if column_type not in self.dtypes: - continue - x1 = real[column_name].values - x2 = synthetic[column_name].values - value, goal, unit, domain = self.metric(x1, x2) - yield Metric( - name=self.name, - value=value, - tags=set([ - "statistic:univariate", - "table:%s" % name, - "column:%s" % column_name - ] + (["priority:high"] if value < 0.1 and unit == "p-value" else [])), - goal=goal, - unit=unit, - domain=domain - ) diff --git a/sdmetrics/statistical/univariate/cstest.py b/sdmetrics/statistical/univariate/cstest.py deleted file mode 100644 index d7c1e59f..00000000 --- a/sdmetrics/statistical/univariate/cstest.py +++ /dev/null @@ -1,35 +0,0 @@ - -from scipy.stats import chisquare - -from sdmetrics.report import Goal -from sdmetrics.statistical.univariate.base import UnivariateMetric -from sdmetrics.statistical.utils import frequencies - - -class CSTest(UnivariateMetric): - - name = "chisquare" - dtypes = ["object", "bool"] - - @staticmethod - def metric(real_column, synthetic_column): - """This function uses the Chi-squared test to compare the distributions - of the two categorical columns. It returns the resulting p-value so that - a small value indicates that we can reject the null hypothesis (i.e. and - suggests that the distributions are different). - - Arguments: - real_column (np.ndarray): The values from the real database. - synthetic_column (np.ndarray): The values from the synthetic database. - - Returns: - (str, Goal, str, tuple): A tuple containing (value, goal, unit, domain) - which corresponds to the fields in a Metric object. - """ - f_obs, f_exp = frequencies(real_column, synthetic_column) - if len(f_obs) == len(f_exp) == 1: - pvalue = 1.0 - else: - _, pvalue = chisquare(f_obs, f_exp) - - return pvalue, Goal.MAXIMIZE, "p-value", (0.0, 1.0) diff --git a/sdmetrics/statistical/univariate/kstest.py b/sdmetrics/statistical/univariate/kstest.py deleted file mode 100644 index f29dcfac..00000000 --- a/sdmetrics/statistical/univariate/kstest.py +++ /dev/null @@ -1,34 +0,0 @@ -import numpy as np -from scipy.stats import ks_2samp - -from sdmetrics.report import Goal -from sdmetrics.statistical.univariate.base import UnivariateMetric - - -class KSTest(UnivariateMetric): - - name = "kstest" - dtypes = ["float", "int"] - - @staticmethod - def metric(real_column, synthetic_column): - """This function uses the two-sample Kolmogorov–Smirnov test to compare - the distributions of the two continuous columns using the empirical CDF. - It returns the resulting p-value so that a small value indicates that we - can reject the null hypothesis (i.e. and suggests that the distributions - are different). - - Arguments: - real_column (np.ndarray): The values from the real database. - synthetic_column (np.ndarray): The values from the synthetic database. - - Returns: - (str, Goal, str, tuple): A tuple containing (value, goal, unit, domain) - which corresponds to the fields in a Metric object. 
- """ - real_column = real_column.copy() - synthetic_column = synthetic_column.copy() - real_column[np.isnan(real_column)] = 0.0 - synthetic_column[np.isnan(synthetic_column)] = 0.0 - statistic, pvalue = ks_2samp(real_column, synthetic_column) - return pvalue, Goal.MAXIMIZE, "p-value", (0.0, 1.0) diff --git a/sdmetrics/statistical/utils.py b/sdmetrics/statistical/utils.py deleted file mode 100644 index baaea131..00000000 --- a/sdmetrics/statistical/utils.py +++ /dev/null @@ -1,28 +0,0 @@ -import warnings -from collections import Counter - - -def frequencies(real, synthetic): - """ - Given two iterators containing categorical data, this transforms it into - observed/expected frequencies which can be used for statistical tests. It - adds a regularization term to handle cases where the synthetic data contains - values that don't exist in the real data. - - Args: - real (list): A list of hashable objects. - synthetic (list): A list of hashable objects. - - Yields: - (list, list): The observed and expected frequencies (as a percent). - """ - f_obs, f_exp = [], [] - real, synthetic = Counter(real), Counter(synthetic) - for value in synthetic: - if value not in real: - warnings.warn("Unexpected value %s in synthetic data." % (value,)) - real[value] += 1e-6 # Regularization to prevent NaN. - for value in real: - f_obs.append(synthetic[value] / sum(synthetic.values())) - f_exp.append(real[value] / sum(real.values())) - return f_obs, f_exp diff --git a/sdmetrics/timeseries/__init__.py b/sdmetrics/timeseries/__init__.py new file mode 100644 index 00000000..53564c89 --- /dev/null +++ b/sdmetrics/timeseries/__init__.py @@ -0,0 +1 @@ +"""Metrics for timeseries datasets.""" diff --git a/sdmetrics/utils.py b/sdmetrics/utils.py new file mode 100644 index 00000000..7af49fbb --- /dev/null +++ b/sdmetrics/utils.py @@ -0,0 +1,69 @@ +"""SDMetrics utils to be used across all the project.""" + +import warnings +from collections import Counter + + +def NestedAttrsMeta(nested): + """Metaclass factory that defines a Metaclass with a dynamic attribute name.""" + + class Metaclass(type): + """Metaclass which pulls the attributes from a nested object using properties.""" + + def __getattr__(cls, attr): + """If cls does not have the attribute, try to get it from the nested object.""" + nested_obj = getattr(cls, nested) + if hasattr(nested_obj, attr): + return getattr(nested_obj, attr) + + raise AttributeError(f"type object '{cls.__name__}' has no attribute '{attr}'") + + @property + def name(cls): + return getattr(cls, nested).name + + @property + def goal(cls): + return getattr(cls, nested).goal + + @property + def max_value(cls): + return getattr(cls, nested).max_value + + @property + def min_value(cls): + return getattr(cls, nested).min_value + + return Metaclass + + +def get_frequencies(real, synthetic): + """Get percentual frequencies for each possible real categorical value. + + Given two iterators containing categorical data, this transforms it into + observed/expected frequencies which can be used for statistical tests. It + adds a regularization term to handle cases where the synthetic data contains + values that don't exist in the real data. + + Args: + real (list): + A list of hashable objects. + synthetic (list): + A list of hashable objects. + + Yields: + tuble[list, list]: + The observed and expected frequencies (as a percent). 
+ """ + f_obs, f_exp = [], [] + real, synthetic = Counter(real), Counter(synthetic) + for value in synthetic: + if value not in real: + warnings.warn(f'Unexpected value {value} in synthetic data.') + real[value] += 1e-6 # Regularization to prevent NaN. + + for value in real: + f_obs.append(synthetic[value] / sum(synthetic.values())) + f_exp.append(real[value] / sum(real.values())) + + return f_obs, f_exp diff --git a/setup.cfg b/setup.cfg index 8754246f..6e9c373b 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,5 +1,5 @@ [bumpversion] -current_version = 0.0.5.dev0 +current_version = 0.1.0.dev2 commit = True tag = True parse = (?P\d+)\.(?P\d+)\.(?P\d+)(\.(?P[a-z]+)(?P\d+))? @@ -34,7 +34,7 @@ universal = 1 [flake8] max-line-length = 99 exclude = docs, .tox, .git, __pycache__, .ipynb_checkpoints -ignore = # keep empty to prevent default ignores +ignore = SFS3 [isort] include_trailing_comment = True @@ -49,3 +49,4 @@ test = pytest [tool:pytest] collect_ignore = ['setup.py'] + diff --git a/setup.py b/setup.py index 520b4afa..e69821d1 100644 --- a/setup.py +++ b/setup.py @@ -12,14 +12,14 @@ history = history_file.read() install_requires = [ - 'sdv>=0.5.0,<0.6', - 'rdt>=0.2.8.dev0,<0.3', + 'rdt>=0.2.10.dev0,<0.3', 'scikit-learn>=0.20,<1', 'scipy>=1.1.0,<2', 'numpy>=1.15.4,<2', - 'pandas>=0.21,<2', + 'pandas>=1,<1.1.5', 'seaborn>=0.9,<0.11', 'matplotlib>=2.2.2,<3.2.2', + 'pomegranate>=0.13.0,<0.13.5', ] setup_requires = [ @@ -27,7 +27,6 @@ ] tests_require = [ - 'parameterized', 'pytest>=3.4.2', 'pytest-cov>=2.6.0', 'pytest-rerunfailures>=9.1.1,<10', @@ -38,32 +37,33 @@ development_requires = [ # general - 'bumpversion>=0.5.3', + 'bumpversion>=0.5.3,<0.6', 'pip>=9.0.1', - 'watchdog>=0.8.3', + 'watchdog>=0.8.3,<0.11', # docs - 'm2r>=0.2.1', - 'nbsphinx>=0.5.0', - 'Sphinx>=2.4.0,<3.0.0', - 'sphinx_rtd_theme>=0.2.4', - 'autodocsumm>=0.1.13', + 'm2r>=0.2.0,<0.3', + 'nbsphinx>=0.5.0,<0.7', + 'Sphinx>=1.7.1,<3', + 'sphinx_rtd_theme>=0.2.4,<0.6', + 'autodocsumm>=0.1.13,<0.2', # style check 'flake8>=3.7.7,<4', + 'flake8-absolute-import>=1.0,<2', 'isort>=4.3.4,<5', # fix style issues - 'autoflake>=1.2', - 'autopep8>=1.4.3', + 'autoflake>=1.1,<2', + 'autopep8>=1.4.3,<2', # distribute on PyPI - 'twine>=1.10.0', + 'twine>=1.10.0,<4', 'wheel>=0.30.0', # Advanced testing - 'coverage>=4.5.1', - 'tox>=2.9.1', + 'coverage>=4.5.1,<6', + 'tox>=2.9.1,<4', ] setup( @@ -98,6 +98,6 @@ test_suite='tests', tests_require=tests_require, url='https://github.com/sdv-dev/SDMetrics', - version='0.0.5.dev0', + version='0.1.0.dev2', zip_safe=False, ) diff --git a/tests/end-to-end/test_sdmetrics.py b/tests/end-to-end/test_sdmetrics.py deleted file mode 100644 index 04486190..00000000 --- a/tests/end-to-end/test_sdmetrics.py +++ /dev/null @@ -1,29 +0,0 @@ -from unittest import TestCase - -from parameterized import parameterized -from sdv import SDV, load_demo - -from sdmetrics import evaluate -from sdmetrics.datasets import Dataset, list_datasets - - -class TestSDMetrics(TestCase): - - def test_integration(self): - metadata, tables = load_demo(metadata=True) - - sdv = SDV() - sdv.fit(metadata, tables) - synthetic = sdv.sample_all(20) - - metrics = evaluate(metadata, tables, synthetic) - metrics.overall() - metrics.details() - metrics.highlights() - - @parameterized.expand(list_datasets()) - def test_data_driven(self, dataset): - dataset = Dataset.load(dataset) - hq_report = evaluate(dataset.metadata, dataset.tables, dataset.hq_synthetic) - lq_report = evaluate(dataset.metadata, dataset.tables, dataset.lq_synthetic) - assert 
hq_report.overall() > lq_report.overall() diff --git a/tests/unit/detection/__init__.py b/tests/integration/__init__.py similarity index 100% rename from tests/unit/detection/__init__.py rename to tests/integration/__init__.py diff --git a/tests/unit/detection/tabular/__init__.py b/tests/integration/column_pairs/__init__.py similarity index 100% rename from tests/unit/detection/tabular/__init__.py rename to tests/integration/column_pairs/__init__.py diff --git a/tests/integration/column_pairs/statistical/__init__.py b/tests/integration/column_pairs/statistical/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/integration/column_pairs/statistical/test_kl_divergence.py b/tests/integration/column_pairs/statistical/test_kl_divergence.py new file mode 100644 index 00000000..78d612be --- /dev/null +++ b/tests/integration/column_pairs/statistical/test_kl_divergence.py @@ -0,0 +1,121 @@ +import numpy as np +import pandas as pd + +from sdmetrics.column_pairs.statistical.kl_divergence import ( + ContinuousKLDivergence, DiscreteKLDivergence) + + +class TestContinuousKLDivergence: + + @staticmethod + def ones(): + return pd.DataFrame({ + 'a': [1] * 100, + 'b': [1.0] * 100, + }) + + @staticmethod + def zeros(): + return pd.DataFrame({ + 'a': [0] * 100, + 'b': [0.0] * 100, + }) + + @staticmethod + def real(): + return pd.DataFrame({ + 'a': np.random.normal(size=600), + 'b': np.random.randint(0, 10, size=600), + }) + + @staticmethod + def good(): + return pd.DataFrame({ + 'a': np.random.normal(loc=0.01, size=600), + 'b': np.random.randint(0, 10, size=600), + }) + + @staticmethod + def bad(): + return pd.DataFrame({ + 'a': np.random.normal(loc=5, scale=3, size=600), + 'b': np.random.randint(5, 15, size=600), + }) + + def test_perfect(self): + output = ContinuousKLDivergence.compute(self.ones(), self.ones()) + + assert output == 1 + + def test_awful(self): + output = ContinuousKLDivergence.compute(self.ones(), self.zeros()) + + assert 0.0 <= output < 0.1 + + def test_good(self): + output = ContinuousKLDivergence.compute(self.real(), self.good()) + + assert 0.5 < output <= 1 + + def test_bad(self): + output = ContinuousKLDivergence.compute(self.real(), self.bad()) + + assert 0 <= output < 0.5 + + +class TestDiscreteKLDivergence: + + @staticmethod + def ones(): + return pd.DataFrame({ + 'a': ['a'] * 100, + 'b': [True] * 100, + }) + + @staticmethod + def zeros(): + return pd.DataFrame({ + 'a': ['b'] * 100, + 'b': [False] * 100, + }) + + @staticmethod + def real(): + return pd.DataFrame({ + 'a': ['a', 'b', 'b', 'c', 'c', 'c'] * 100, + 'b': [True, True, True, True, True, False] * 100, + }) + + @staticmethod + def good(): + return pd.DataFrame({ + 'a': ['a', 'b', 'b', 'b', 'c', 'c'] * 100, + 'b': [True, True, True, True, False, False] * 100, + }) + + @staticmethod + def bad(): + return pd.DataFrame({ + 'a': ['a', 'a', 'a', 'a', 'b', 'b'] * 100, + 'b': [True, False, False, False, False, False] * 100, + }) + + def test_perfect(self): + output = DiscreteKLDivergence.compute(self.ones(), self.ones()) + + assert output == 1 + + def test_awful(self): + output = DiscreteKLDivergence.compute(self.ones(), self.zeros()) + + assert 0.0 <= output < 0.1 + + def test_good(self): + output = DiscreteKLDivergence.compute(self.real(), self.good()) + + assert 0.5 < output <= 1 + + def test_bad(self): + output = DiscreteKLDivergence.compute(self.real(), self.bad()) + + assert 0 <= output < 0.5 diff --git a/tests/integration/multi_table/__init__.py b/tests/integration/multi_table/__init__.py new file mode 
100644 index 00000000..e69de29b diff --git a/tests/integration/multi_table/test_multi_single_table.py b/tests/integration/multi_table/test_multi_single_table.py new file mode 100644 index 00000000..38e3862c --- /dev/null +++ b/tests/integration/multi_table/test_multi_single_table.py @@ -0,0 +1,93 @@ +import numpy as np +import pandas as pd +import pytest + +from sdmetrics.multi_table.multi_single_table import ( + CSTest, KSTest, LogisticDetection, SVCDetection) + +METRICS = [CSTest, KSTest, LogisticDetection, SVCDetection] + + +@pytest.fixture +def ones(): + data = pd.DataFrame({ + 'a': [1] * 100, + 'b': [True] * 100, + }) + return {'a': data, 'b': data.copy()} + + +@pytest.fixture +def zeros(): + data = pd.DataFrame({ + 'a': [0] * 100, + 'b': [False] * 100, + }) + return {'a': data, 'b': data.copy()} + + +@pytest.fixture +def real_data(): + data = pd.DataFrame({ + 'a': np.random.normal(size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 100, + 'd': [True, True, True, True, True, False] * 100, + }) + return {'a': data, 'b': data.copy()} + + +@pytest.fixture +def good_data(): + data = pd.DataFrame({ + 'a': np.random.normal(loc=0.01, size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'b', 'c', 'c'] * 100, + 'd': [True, True, True, True, False, False] * 100, + }) + return {'a': data, 'b': data.copy()} + + +@pytest.fixture +def bad_data(): + data = pd.DataFrame({ + 'a': np.random.normal(loc=5, scale=3, size=600), + 'b': np.random.randint(5, 15, size=600), + 'c': ['a', 'a', 'a', 'a', 'b', 'b'] * 100, + 'd': [True, False, False, False, False, False] * 100, + }) + return {'a': data, 'b': data.copy()} + + +@pytest.mark.parametrize('metric', METRICS) +def test_max(metric, ones): + output = metric.compute(ones, ones.copy()) + + assert output == 1 + + +@pytest.mark.parametrize('metric', METRICS) +def test_min(metric, ones, zeros): + output = metric.compute(ones, zeros) + + assert np.round(output, decimals=5) == 0 + + +@pytest.mark.parametrize('metric', METRICS) +def test_good(metric, real_data, good_data): + output = metric.compute(real_data, good_data) + + assert 0.5 < output <= 1 + + +@pytest.mark.parametrize('metric', METRICS) +def test_bad(metric, real_data, bad_data): + output = metric.compute(real_data, bad_data) + + assert 0 <= output < 0.5 + + +@pytest.mark.parametrize('metric', METRICS) +def test_fail(metric): + with pytest.raises(ValueError): + metric.compute({'a': None, 'b': None}, {'a': None}) diff --git a/tests/integration/multi_table/test_multi_table.py b/tests/integration/multi_table/test_multi_table.py new file mode 100644 index 00000000..383fb294 --- /dev/null +++ b/tests/integration/multi_table/test_multi_table.py @@ -0,0 +1,22 @@ +import pandas as pd + +from sdmetrics import compute_metrics +from sdmetrics.demos import load_multi_table_demo +from sdmetrics.multi_table.base import MultiTableMetric + + +def test_compute_all(): + real_data, synthetic_data, metadata = load_multi_table_demo() + + output = compute_metrics( + MultiTableMetric.get_subclasses(), + real_data, + synthetic_data, + metadata=metadata + ) + + assert not pd.isnull(output.score.mean()) + + scores = output[output.score.notnull()] + + assert scores.score.between(scores.min_value, scores.max_value).all() diff --git a/tests/integration/multi_table/test_parent_child.py b/tests/integration/multi_table/test_parent_child.py new file mode 100644 index 00000000..4ec9ae71 --- /dev/null +++ b/tests/integration/multi_table/test_parent_child.py @@ -0,0 +1,123 @@ 
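For reference, here is a minimal, hypothetical sketch of how the two helpers added in `sdmetrics/utils.py` earlier in this patch can be used. Neither snippet is part of the patch itself: the example data, the `chisquare` call, and the `FakeSingleTableMetric`/`WrappedMetric` names are illustrative assumptions only.

```python
# Sketch 1 (assumption, not from this patch): get_frequencies turns two lists
# of categories into observed/expected frequencies that can be fed to a
# chi-squared test such as scipy.stats.chisquare.
from scipy.stats import chisquare

from sdmetrics.utils import get_frequencies

real = ['a', 'b', 'b', 'c', 'c', 'c'] * 100
synthetic = ['a', 'b', 'b', 'b', 'c', 'c'] * 100

# f_obs holds the synthetic frequencies, f_exp the (regularized) real ones.
f_obs, f_exp = get_frequencies(real, synthetic)
statistic, p_value = chisquare(f_obs, f_exp)
print(p_value)  # higher p-values indicate similar category frequencies
```

And a sketch of the `NestedAttrsMeta` delegation behaviour:

```python
# Sketch 2 (assumption, not from this patch): NestedAttrsMeta('single_table_metric')
# builds a metaclass that forwards attribute lookups from the wrapper class to
# the object stored under the given attribute name.
from sdmetrics.utils import NestedAttrsMeta


class FakeSingleTableMetric:
    name = 'FakeMetric'
    goal = 'maximize'
    min_value = 0.0
    max_value = 1.0


class WrappedMetric(metaclass=NestedAttrsMeta('single_table_metric')):
    single_table_metric = FakeSingleTableMetric


# Attributes missing on WrappedMetric resolve through the nested metric.
assert WrappedMetric.name == 'FakeMetric'
assert WrappedMetric.max_value == 1.0
```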
+import numpy as np +import pandas as pd +import pytest + +from sdmetrics.multi_table.detection.parent_child import ( + LogisticParentChildDetection, SVCParentChildDetection) + +METRICS = [LogisticParentChildDetection, SVCParentChildDetection] + + +def ones(): + parent = pd.DataFrame({ + 'id': range(10), + 'a': [1] * 10, + 'b': [True] * 10, + }) + child = pd.DataFrame({ + 'parent_id': list(range(10)) * 10, + 'a': [1] * 100, + 'b': [True] * 100, + }) + return {'parent': parent, 'child': child} + + +def zeros(): + parent = pd.DataFrame({ + 'id': range(10), + 'a': [0] * 10, + 'b': [False] * 10, + }) + child = pd.DataFrame({ + 'parent_id': list(range(10)) * 10, + 'a': [0] * 100, + 'b': [False] * 100, + }) + return {'parent': parent, 'child': child} + + +def real_data(): + parent = pd.DataFrame({ + 'id': range(60), + 'a': np.random.normal(size=60), + 'b': np.random.randint(0, 10, size=60), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 10, + 'd': [True, True, True, True, True, False] * 10, + }) + child = pd.DataFrame({ + 'parent_id': list(range(60)) * 10, + 'a': np.random.normal(size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 100, + 'd': [True, True, True, True, True, False] * 100, + }) + return {'parent': parent, 'child': child} + + +def good_data(): + parent = pd.DataFrame({ + 'id': range(60), + 'a': np.random.normal(loc=0.01, size=60), + 'b': np.random.randint(1, 11, size=60), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 10, + 'd': [True, True, True, True, True, False] * 10, + }) + child = pd.DataFrame({ + 'parent_id': list(range(60)) * 10, + 'a': np.random.normal(loc=0.01, size=600), + 'b': np.random.randint(1, 11, size=600), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 100, + 'd': [True, True, True, True, True, False] * 100, + }) + return {'parent': parent, 'child': child} + + +def bad_data(): + parent = pd.DataFrame({ + 'id': range(60), + 'a': np.random.normal(loc=5, scale=3, size=60), + 'b': np.random.randint(5, 15, size=60), + 'c': ['a', 'a', 'a', 'a', 'b', 'b'] * 10, + 'd': [True, False, False, False, False, False] * 10, + }) + child = pd.DataFrame({ + 'parent_id': list(range(60)) * 10, + 'a': np.random.normal(loc=5, scale=3, size=600), + 'b': np.random.randint(5, 15, size=600), + 'c': ['a', 'a', 'a', 'a', 'b', 'b'] * 100, + 'd': [True, False, False, False, False, False] * 100, + }) + return {'parent': parent, 'child': child} + + +FKS = [ + ('parent', 'id', 'child', 'parent_id') +] + + +@pytest.mark.parametrize('metric', METRICS) +def test_max(metric): + output = metric.compute(ones(), ones(), foreign_keys=FKS) + + assert output == 1 + + +@pytest.mark.parametrize('metric', METRICS) +def test_min(metric): + output = metric.compute(ones(), zeros(), foreign_keys=FKS) + + assert np.round(output, decimals=5) == 0 + + +@pytest.mark.parametrize('metric', METRICS) +def test_good(metric): + output = metric.compute(real_data(), good_data(), foreign_keys=FKS) + + assert 0.5 < output <= 1 + + +@pytest.mark.parametrize('metric', METRICS) +def test_bad(metric): + output = metric.compute(real_data(), bad_data(), foreign_keys=FKS) + + assert 0 <= output < 0.5 diff --git a/tests/integration/single_column/__init__.py b/tests/integration/single_column/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/integration/single_column/statistical/__init__.py b/tests/integration/single_column/statistical/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/integration/single_column/statistical/test_cstest.py 
b/tests/integration/single_column/statistical/test_cstest.py new file mode 100644 index 00000000..ecc3b021 --- /dev/null +++ b/tests/integration/single_column/statistical/test_cstest.py @@ -0,0 +1,40 @@ +import numpy as np +import pandas as pd +import pytest + +from sdmetrics.single_column.statistical.cstest import CSTest + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_max(array_like): + data = array_like(['a', 'b', 'b', 'c', 'c', 'c'] * 100) + output = CSTest.compute(data, data) + + assert output == 1 + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_min(array_like): + real = array_like(['a', 'b', 'b', 'c', 'c', 'c'] * 100) + synth = array_like(['d', 'e', 'e', 'f', 'f', 'f'] * 100) + output = CSTest.compute(real, synth) + + assert output == 0 + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_good(array_like): + real = array_like(['a', 'b', 'b', 'c', 'c', 'c'] * 100) + synth = array_like(['a', 'b', 'b', 'b', 'c', 'c'] * 100) + output = CSTest.compute(real, synth) + + assert 0.5 < output <= 1.0 + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_bad(array_like): + real = array_like(['a', 'b', 'b', 'c', 'c', 'c'] * 100) + synth = array_like(['a', 'a', 'a', 'a', 'b', 'c'] * 100) + output = CSTest.compute(real, synth) + + assert 0.0 <= output < 0.5 diff --git a/tests/integration/single_column/statistical/test_kstest.py b/tests/integration/single_column/statistical/test_kstest.py new file mode 100644 index 00000000..8c4d6d4f --- /dev/null +++ b/tests/integration/single_column/statistical/test_kstest.py @@ -0,0 +1,40 @@ +import numpy as np +import pandas as pd +import pytest + +from sdmetrics.single_column.statistical.kstest import KSTest + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_max(array_like): + data = array_like(np.random.normal(size=1000)) + output = KSTest.compute(data, data) + + assert output == 1 + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_min(array_like): + real = array_like(np.random.normal(size=1000)) + synth = array_like(np.random.normal(loc=1000, scale=10, size=1000)) + output = KSTest.compute(real, synth) + + assert output == 0 + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_good(array_like): + real = array_like(np.random.normal(size=1000)) + synth = array_like(np.random.normal(loc=0.1, size=1000)) + output = KSTest.compute(real, synth) + + assert 0.5 < output <= 1.0 + + +@pytest.mark.parametrize("array_like", [np.array, pd.Series]) +def test_bad(array_like): + real = array_like(np.random.normal(size=1000)) + synth = array_like(np.random.normal(loc=3, scale=3, size=1000)) + output = KSTest.compute(real, synth) + + assert 0.0 <= output < 0.5 diff --git a/tests/integration/single_table/__init__.py b/tests/integration/single_table/__init__.py new file mode 100644 index 00000000..e69de29b diff --git a/tests/integration/single_table/efficacy/test_binary.py b/tests/integration/single_table/efficacy/test_binary.py new file mode 100644 index 00000000..c72091f2 --- /dev/null +++ b/tests/integration/single_table/efficacy/test_binary.py @@ -0,0 +1,50 @@ +import numpy as np +import pandas as pd +import pytest + +from sdmetrics.single_table.efficacy.binary import ( + BinaryAdaBoostClassifier, BinaryDecisionTreeClassifier, BinaryLogisticRegression, + BinaryMLPClassifier) + +METRICS = [ + BinaryAdaBoostClassifier, + BinaryDecisionTreeClassifier, + BinaryLogisticRegression, + BinaryMLPClassifier 
+] + + +def real_data(): + return pd.DataFrame({ + 'a': np.random.normal(size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 100, + 'd': [True, True, True, True, True, False] * 100, + }) + + +def good_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=0.01, size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'b', 'c', 'c'] * 100, + 'd': [True, True, True, True, False, False] * 100, + }) + + +def bad_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=5, scale=3, size=600), + 'b': np.random.randint(5, 15, size=600), + 'c': ['a', 'a', 'a', 'a', 'b', 'b'] * 100, + 'd': [True, False, False, False, False, False] * 100, + }) + + +@pytest.mark.parametrize('metric', METRICS) +def test_rank(metric): + bad = metric.compute(real_data(), bad_data(), target='d') + good = metric.compute(real_data(), good_data(), target='d') + real = metric.compute(real_data(), real_data(), target='d') + + assert metric.min_value <= bad < good <= real <= metric.max_value diff --git a/tests/integration/single_table/efficacy/test_multiclass.py b/tests/integration/single_table/efficacy/test_multiclass.py new file mode 100644 index 00000000..5b882f78 --- /dev/null +++ b/tests/integration/single_table/efficacy/test_multiclass.py @@ -0,0 +1,50 @@ +import numpy as np +import pandas as pd +import pytest + +from sdmetrics.single_table.efficacy.multiclass import ( + MulticlassDecisionTreeClassifier, MulticlassMLPClassifier) + +METRICS = [ + MulticlassDecisionTreeClassifier, + MulticlassMLPClassifier, +] + + +@pytest.fixture +def real_data(): + return pd.DataFrame({ + 'a': np.random.normal(size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 100, + 'd': [True, True, True, True, True, False] * 100, + }) + + +@pytest.fixture +def good_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=0.01, size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'b', 'c', 'c'] * 100, + 'd': [True, True, True, True, False, False] * 100, + }) + + +@pytest.fixture +def bad_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=5, scale=3, size=600), + 'b': np.random.randint(5, 15, size=600), + 'c': ['a', 'a', 'a', 'a', 'b', 'b'] * 100, + 'd': [True, False, False, False, False, False] * 100, + }) + + +@pytest.mark.parametrize('metric', METRICS) +def test_rank(metric, real_data, good_data, bad_data): + bad = metric.compute(real_data, bad_data, target='c') + good = metric.compute(real_data, good_data, target='c') + real = metric.compute(real_data, real_data, target='c') + + assert metric.min_value <= bad < good < real <= metric.max_value diff --git a/tests/integration/single_table/efficacy/test_regression.py b/tests/integration/single_table/efficacy/test_regression.py new file mode 100644 index 00000000..1330e04d --- /dev/null +++ b/tests/integration/single_table/efficacy/test_regression.py @@ -0,0 +1,49 @@ +import numpy as np +import pandas as pd +import pytest + +from sdmetrics.single_table.efficacy.regression import LinearRegression, MLPRegressor + +METRICS = [ + LinearRegression, + MLPRegressor, +] + + +@pytest.fixture +def real_data(): + return pd.DataFrame({ + 'a': np.random.normal(size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 100, + 'd': [True, True, True, True, True, False] * 100, + }) + + +@pytest.fixture +def good_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=0.01, size=600), + 'b': np.random.randint(0, 10, size=600), + 'c': 
['a', 'b', 'b', 'b', 'c', 'c'] * 100, + 'd': [True, True, True, True, False, False] * 100, + }) + + +@pytest.fixture +def bad_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=5, scale=3, size=600), + 'b': np.random.randint(5, 15, size=600), + 'c': ['a', 'a', 'a', 'a', 'b', 'b'] * 100, + 'd': [True, False, False, False, False, False] * 100, + }) + + +@pytest.mark.parametrize('metric', METRICS) +def test_rank(metric, real_data, good_data, bad_data): + bad = metric.compute(real_data, bad_data, target='a') + good = metric.compute(real_data, good_data, target='a') + real = metric.compute(real_data, real_data, target='a') + + assert metric.min_value <= bad < good < real <= metric.max_value diff --git a/tests/integration/single_table/test_single_table.py b/tests/integration/single_table/test_single_table.py new file mode 100644 index 00000000..79f030a7 --- /dev/null +++ b/tests/integration/single_table/test_single_table.py @@ -0,0 +1,105 @@ +import numpy as np +import pandas as pd +import pytest + +from sdmetrics import compute_metrics +from sdmetrics.demos import load_single_table_demo +from sdmetrics.single_table.base import SingleTableMetric +from sdmetrics.single_table.bayesian_network import BNLikelihood, BNLogLikelihood +from sdmetrics.single_table.detection import LogisticDetection, SVCDetection +from sdmetrics.single_table.gaussian_mixture import GMLogLikelihood +from sdmetrics.single_table.multi_column_pairs import ContinuousKLDivergence, DiscreteKLDivergence +from sdmetrics.single_table.multi_single_column import CSTest, KSTest, KSTestExtended + +METRICS = [ + CSTest, + KSTest, + KSTestExtended, + LogisticDetection, + SVCDetection, + ContinuousKLDivergence, + DiscreteKLDivergence, + BNLikelihood, + BNLogLikelihood, + GMLogLikelihood, +] + + +@pytest.fixture +def ones(): + return pd.DataFrame({ + 'a': [1] * 300, + 'b': [True] * 300, + 'c': [1.0] * 300, + 'd': [True] * 300, + }) + + +@pytest.fixture +def zeros(): + return pd.DataFrame({ + 'a': [0] * 300, + 'b': [False] * 300, + 'c': [0.0] * 300, + 'd': [False] * 300, + }) + + +@pytest.fixture +def real_data(): + return pd.DataFrame({ + 'a': np.random.normal(size=1800), + 'b': np.random.randint(0, 10, size=1800), + 'c': ['a', 'b', 'b', 'c', 'c', 'c'] * 300, + 'd': [True, True, True, True, True, False] * 300, + }) + + +@pytest.fixture +def good_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=0.01, size=1800), + 'b': np.random.randint(0, 10, size=1800), + 'c': ['a', 'b', 'b', 'b', 'c', 'c'] * 300, + 'd': [True, True, True, True, False, False] * 300, + }) + + +@pytest.fixture +def bad_data(): + return pd.DataFrame({ + 'a': np.random.normal(loc=5, scale=3, size=1800), + 'b': np.random.randint(5, 15, size=1800), + 'c': ['a', 'a', 'a', 'a', 'b', 'b'] * 300, + 'd': [True, False, False, False, False, False] * 300, + }) + + +@pytest.mark.parametrize('metric', METRICS) +def test_rank(metric, ones, zeros, real_data, good_data, bad_data): + worst = metric.compute(ones, zeros) + best = metric.compute(ones, ones) + + bad = metric.compute(real_data, bad_data) + good = metric.compute(real_data, good_data) + real = metric.compute(real_data, real_data) + + assert metric.min_value <= worst < best <= metric.max_value + assert metric.min_value <= bad < good < real <= metric.max_value + + +def test_compute_all(): + real_data, synthetic_data, metadata = load_single_table_demo() + + output = compute_metrics( + SingleTableMetric.get_subclasses(), + real_data, + synthetic_data, + metadata=metadata + ) + + assert not 
pd.isnull(output.score.mean()) + + scores = output[output.score.notnull()] + + assert scores.score.between(scores.min_value, scores.max_value).all() diff --git a/tests/unit/detection/tabular/test_logistic.py b/tests/unit/detection/tabular/test_logistic.py deleted file mode 100644 index d2c18cce..00000000 --- a/tests/unit/detection/tabular/test_logistic.py +++ /dev/null @@ -1,25 +0,0 @@ -import numpy as np - -from sdmetrics.detection.tabular.sklearn import LogisticDetector, SVCDetector - - -def test_logistic_nan_inf(): - """Make sure that NaN and Inf inputs are handled without crashes.""" - detector = LogisticDetector() - - X = np.array([[1, 2, 3, np.inf, None]]).T - y = np.array([1, 0, 0, 1, 1]) - detector.fit(X, y) - - detector.predict_proba(X) - - -def test_svc_nan_inf(): - """Make sure that NaN and Inf inputs are handled without crashes.""" - detector = SVCDetector() - - X = np.array([[1, 2, 3, np.inf, None]]).T - y = np.array([1, 0, 0, 1, 1]) - detector.fit(X, y) - - detector.predict_proba(X) diff --git a/tests/unit/test_report.py b/tests/unit/test_report.py deleted file mode 100644 index d1991323..00000000 --- a/tests/unit/test_report.py +++ /dev/null @@ -1,34 +0,0 @@ -from unittest import TestCase - -from sdmetrics.report import Goal, Metric, MetricsReport - - -class TestMetricsReport(TestCase): - - def test_report(self): - report = MetricsReport() - - report.add_metric(Metric( - name="one", value=10.0, tags=set(["a", "b"]), goal=Goal.MINIMIZE)) - assert report.overall() == -10.0 - assert len(report.details()) == 1 - assert len(report.details(lambda metric: False)) == 0 - - report.add_metric( - Metric( - name="two", - value=3.0, - tags=set(["a"]))) - assert report.overall() == -10.0 - assert len(report.details()) == 2 - assert len(report.details(lambda metric: "a" in metric.tags)) == 2 - assert len(report.details(lambda metric: "b" in metric.tags)) == 1 - - report.add_metric( - Metric( - name="three", - value=5.0, - goal=Goal.MAXIMIZE, - tags=set(["priority:high"]))) - assert report.overall() == -5.0 - assert len(report.highlights()) == 1 diff --git a/tutorials/1_getting_started.ipynb b/tutorials/1_getting_started.ipynb deleted file mode 100644 index 6876085c..00000000 --- a/tutorials/1_getting_started.ipynb +++ /dev/null @@ -1,650 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "# Getting Started\n", - "In this post, we'll demonstrate some of the core functionality of the **SDMetrics** library by using it to evaluate a synthetic dataset." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "nbsphinx": "hidden" - }, - "outputs": [], - "source": [ - "import warnings\n", - "warnings.filterwarnings(\"ignore\")" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "## Generating Synthetic Datasets\n", - "The **SDV** library provides tools for generating synthetic relational databases. Let's start by loading the Walmart dataset and generating a synthetic copy with 20 rows in the root table." 
- ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "> stores\n", - " Store Type Size\n", - " 1 A 151315\n", - " 2 A 202307\n", - " 3 B 37392\n", - " 4 A 205863\n", - " 5 B 34875\n", - "\n", - "> features\n", - " Store Date Temperature Fuel_Price MarkDown1 MarkDown2 MarkDown3 MarkDown4 MarkDown5 CPI Unemployment IsHoliday\n", - " 1 2010-02-05 42.31 2.572 NaN NaN NaN NaN NaN 211.096358 8.106 False\n", - " 1 2010-02-12 38.51 2.548 NaN NaN NaN NaN NaN 211.242170 8.106 True\n", - " 1 2010-02-19 39.93 2.514 NaN NaN NaN NaN NaN 211.289143 8.106 False\n", - " 1 2010-02-26 46.63 2.561 NaN NaN NaN NaN NaN 211.319643 8.106 False\n", - " 1 2010-03-05 46.50 2.625 NaN NaN NaN NaN NaN 211.350143 8.106 False\n", - "\n", - "> depts\n", - " Store Dept Date Weekly_Sales IsHoliday\n", - " 1 1 2010-02-05 24924.50 False\n", - " 1 1 2010-02-12 46039.49 True\n", - " 1 1 2010-02-19 41595.55 False\n", - " 1 1 2010-02-26 19403.54 False\n", - " 1 1 2010-03-05 21827.90 False\n", - "\n" - ] - } - ], - "source": [ - "from sdv import load_demo\n", - "\n", - "metadata, real_tables = load_demo(\"walmart\", metadata=True)\n", - "for table_name, df in real_tables.items():\n", - " print(\">\", table_name)\n", - " print(df.head().to_string(index=False))\n", - " print()" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "> stores\n", - "Type Size Store\n", - " A 158863 0\n", - " A 148881 1\n", - " A 141181 2\n", - " B 39402 3\n", - " A 184711 4\n", - "\n", - "> features\n", - " Date MarkDown1 Store IsHoliday MarkDown4 MarkDown3 Fuel_Price Unemployment Temperature MarkDown5 MarkDown2 CPI\n", - "2011-02-02 14400.357649 0 False 2640.453224 2443.106190 3.550959 6.253251 76.279741 NaN 10794.829869 193.555658\n", - "2010-06-28 1992.568051 0 False NaN NaN 3.352604 4.566446 44.368874 1399.856071 NaN 173.742572\n", - "2011-11-20 15145.367881 0 False 1946.317167 3950.841584 3.539616 5.883201 72.896482 NaN 10712.726537 196.000664\n", - "2011-03-14 NaN 0 False NaN NaN 4.352685 4.818791 72.563082 5790.683617 NaN 182.003539\n", - "2012-02-05 7106.857092 0 False NaN NaN 3.638750 5.102400 46.585475 3104.993430 NaN 188.226169\n", - "\n", - "> depts\n", - " Date Weekly_Sales Store Dept IsHoliday\n", - "2010-03-12 14187.100825 0 86 False\n", - "2012-05-24 1280.838783 0 53 False\n", - "2010-11-18 -25100.204034 0 49 False\n", - "2012-04-26 21091.759568 0 -11 False\n", - "2012-06-07 26532.973434 0 23 False\n", - "\n" - ] - } - ], - "source": [ - "from sdv import SDV\n", - "\n", - "sdv = SDV()\n", - "sdv.fit(metadata, real_tables)\n", - "\n", - "synthetic_tables = sdv.sample_all(100)\n", - "for table_name, df in synthetic_tables.items():\n", - " print(\">\", table_name)\n", - " print(df.head().to_string(index=False))\n", - " print()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Evaluation with SDMetrics\n", - "Now that we have (1) a metadata object, (2) a set of real tables, and (3) a set of fake tables, we can pass them to **SDMetrics** for evaluation. The simplest way to get started with **SDMetrics** is to use the `evaluate` function which generates a report with the default metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 4, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from sdmetrics import evaluate\n", - "report = evaluate(metadata, real_tables, synthetic_tables)\n", - "report" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "fragment" - } - }, - "source": [ - "The metrics report can provide an **overall score**. This is a single scalar value which you can pass to an optimization routine (i.e. to tune some hyperparameters in your model)." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "-52.87242536803069\n" - ] - } - ], - "source": [ - "print(report.overall())" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "Furthermore, the metrics report can also provide some key highlights. This shows the problem areas where your model performs especially poorly - for example, the below highlights suggest that our model is very bad at modeling the `MarkDownX` columns since the kstest has detected that the distributions look quite different between the real and synthetic versions." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
NameValueGoalUnitTablesColumnsMisc. Tags
0kstest5.037590e-24Goal.MAXIMIZEp-valuetable:featurescolumn:MarkDown1statistic:univariate,priority:high
1kstest2.999024e-37Goal.MAXIMIZEp-valuetable:featurescolumn:MarkDown4statistic:univariate,priority:high
2kstest3.759181e-190Goal.MAXIMIZEp-valuetable:featurescolumn:MarkDown3statistic:univariate,priority:high
3kstest3.072000e-79Goal.MAXIMIZEp-valuetable:featurescolumn:Fuel_Pricestatistic:univariate,priority:high
4kstest1.294061e-32Goal.MAXIMIZEp-valuetable:featurescolumn:Unemploymentstatistic:univariate,priority:high
5kstest1.148167e-09Goal.MAXIMIZEp-valuetable:featurescolumn:Temperaturestatistic:univariate,priority:high
6kstest5.218656e-16Goal.MAXIMIZEp-valuetable:featurescolumn:MarkDown5statistic:univariate,priority:high
7kstest1.564043e-87Goal.MAXIMIZEp-valuetable:featurescolumn:MarkDown2statistic:univariate,priority:high
8kstest0.000000e+00Goal.MAXIMIZEp-valuetable:featurescolumn:CPIstatistic:univariate,priority:high
9kstest0.000000e+00Goal.MAXIMIZEp-valuetable:deptscolumn:Weekly_Salesstatistic:univariate,priority:high
10kstest0.000000e+00Goal.MAXIMIZEp-valuetable:deptscolumn:Deptstatistic:univariate,priority:high
\n", - "
" - ], - "text/plain": [ - " Name Value Goal Unit Tables \\\n", - "0 kstest 5.037590e-24 Goal.MAXIMIZE p-value table:features \n", - "1 kstest 2.999024e-37 Goal.MAXIMIZE p-value table:features \n", - "2 kstest 3.759181e-190 Goal.MAXIMIZE p-value table:features \n", - "3 kstest 3.072000e-79 Goal.MAXIMIZE p-value table:features \n", - "4 kstest 1.294061e-32 Goal.MAXIMIZE p-value table:features \n", - "5 kstest 1.148167e-09 Goal.MAXIMIZE p-value table:features \n", - "6 kstest 5.218656e-16 Goal.MAXIMIZE p-value table:features \n", - "7 kstest 1.564043e-87 Goal.MAXIMIZE p-value table:features \n", - "8 kstest 0.000000e+00 Goal.MAXIMIZE p-value table:features \n", - "9 kstest 0.000000e+00 Goal.MAXIMIZE p-value table:depts \n", - "10 kstest 0.000000e+00 Goal.MAXIMIZE p-value table:depts \n", - "\n", - " Columns Misc. Tags \n", - "0 column:MarkDown1 statistic:univariate,priority:high \n", - "1 column:MarkDown4 statistic:univariate,priority:high \n", - "2 column:MarkDown3 statistic:univariate,priority:high \n", - "3 column:Fuel_Price statistic:univariate,priority:high \n", - "4 column:Unemployment statistic:univariate,priority:high \n", - "5 column:Temperature statistic:univariate,priority:high \n", - "6 column:MarkDown5 statistic:univariate,priority:high \n", - "7 column:MarkDown2 statistic:univariate,priority:high \n", - "8 column:CPI statistic:univariate,priority:high \n", - "9 column:Weekly_Sales statistic:univariate,priority:high \n", - "10 column:Dept statistic:univariate,priority:high " - ] - }, - "execution_count": 6, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "report.highlights()" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "In addition, you will also be able to generate a visualization of the metrics." 
- ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAArsAAANGCAYAAAD9GEbxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjEsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+j8jraAAAgAElEQVR4nOzdeXhN1/7H8XdGJFGESIiqsQeRiKY105AqVb2plEppampQaq6KsSgNRbkSY1vzEKWpKmosNSc1BUVjSJAYagohZDy/P1znJzVHNHF8Xs/jec7ZZ++1vuvc9LmfrKy9toXRaDQiIiIiImKGLHO6ABERERGRp0VhV0RERETMlsKuiIiIiJgthV0RERERMVsKuyIiIiJithR2RURERMRsKexKrpWcnMzYsWNp0KABlStXxtvbm6FDh5KQkPDQa+Pi4jAYDAwZMuRfqPTeGjRogMFgwGAw4O7uzn/+8x9mz55NRkZGrqlRRETE3FnndAEi99OrVy/Wr19Pu3btqFu3LtHR0UycOJGDBw+yaNEiLCwscrrEh3J1deWbb74hISGBX375ha+++orjx48zbNgwihYtyqJFiyhcuHBOlykiImK2NLMrudLhw4dZv349fn5+BAUFUbt2bdq1a8fIkSOpU6cOSUlJGI1GZs2aRb169ahcuTJ+fn5ERUXds72AgADc3d1N7zt06IDBYAAgIiICg8HA9OnT8fPzo2rVqkyfPp2lS5dSvXp1fHx82L9/PwAhISEYDAZWrlzJm2++SfXq1fnuu+/uOw5bW1s8PT3x9vZm3Lhx1K1bl0WLFnHy5En+/vtvWrZsybfffsvGjRsxGAzMmzfPdO1HH33Ea6+9RkpKClFRUbRs2RJPT0/efvtt9uzZk6n2cePGUa9ePX7++WeuX79O586dcXd3p1WrVowZMwaDwcDevXsB2LhxI76+vlSpUoUWLVpw/PhxAMLDwzEYDCxevBhfX1+8vLwYOXKkqZ6jR48SEBBA1apVeeutt1i9erXpsx9//JHGjRvj4eFB+/btOX/+/GP97y0iIvK0KOxKrrRr1y4A6tevn+l4kyZN6N69O/b29ixdupTg4GB8fHyYMmUKaWlpdOjQgStXrmSpz8WLF9O1a1ecnJwIDQ1l8+bNDB48mLNnzzJp0qRM5y5cuJCBAwfi4uLC+PHjuXTp0iP10axZM4xGI5GRkZmO16pVCwcHBzZv3gzAtWvX2L17N2+88QZpaWl07tyZxMREJk6ciKurK7169TIthwBYtmwZQ4cOpUaNGsyYMYMNGzbQpk0bPvzwQxYvXmw678yZM3Tr1g0HBwcmTZpERkYGAwYMyFTLnDlz6NWrF1WqVGHOnDkcPHiQ1NRUOnfuzN9//83EiRMpXbo0ffr0ITY2lqioKAYOHMjLL79MSEgIJ06cYNSoUY/13YuIiDwtCruSK90OrIUKFbrvOUuXLsXOzo4BAwZQt25dUyDcuHFjlvqsX78+Pj4++Pj4kJycTJs2bWjatCkGg4GYmJhM57Zu3ZrXX38dPz8/0tLSiI2NfaQ+XnjhBQASExMzHbe1tcXb25vIyEhSUlLYsmULqampNG7cmJ07d3Lp0iVatGhBrVq1aNeuHWfOnDH9QgDg4+NDgwYNcHZ2ZuvWrdjZ2dGzZ0+aNGnC66+/bjpv48aNpKSk0KZNG2rUqEGrVq3Ys2cPp0+fNp3z7rvv4u3tTatWrQA4duwYUVFRnDp1Cn9/f+rWrcuXX35JSEgIefPmZe3atRiNRjp27Ejt2rXx9fVl7dq1pKSkPNb3LyIi8jRoza7kSrdD7sWLF+97zt9//03hwoWxsbEBwMnJyXQ8K4oUKQKAnZ0dAI6OjgDY29vfdVOci4sLAPnz5wcgNTX1kfo4e/YscO8Q36hRI5YvX87OnTv5/fffKVCgALVq1TItFxg1alSmGdO4uDiKFy8OQNGiRU3HL1++TOHChbG2ts5UK8DVq1cB6NatW6a+4+LiHji229/p7e+kcOHCpln3222+9957mdo8f/48rq6uD/tKREREniqFXcmVqlWrBsC6deto3Lix6fiCBQvYsGEDI0aMwMXFhaioKFJTU7GxsTHNTt4Z7m6zsbEhNTWVtLQ0rK2tc2xN6fLly7GysqJGjRqkpaVl+qxu3brky5ePTZs2sWnTJnx8fLCxscHZ2RmATp060bBhQ9P5xYsX5+jRowBYWv7/H2kKFSrEkSNHSE9Px8rKyhSwAVNbgwYNwtPT03S8VKlSmQLvP93+ReD293b+/HlWr15N9erVTUF74sSJpvAN6MY7ERHJFbSMQXKlsmXL0qxZM9MOBlu2bGHWrFl8/fXXXLx4kSJFiuDr68v169cZPXo0mzZtYtq0aRQqVAhvb++72itVqpTphrawsDBOnTr1r4wjJSWFvXv3sm3bNvr168eOHTsICAi4ZyDPly8f9erV44cffuDChQu89dZbAHh4eFCsWDF+//13EhISWLlyJcOHDyc5OfmefVavXp1r164xfvx4VqxYkWlZR506dbCzs+O3337j6tWrLFy4kK+//jpTWL4XT09PihUrRlhYGJs2bWLkyJEEBwdjYWFBw4YNsbS0ZM2aNVy9epXp06czZcoUbG1ts/7FiYiIZBOFXcm1RowYQbdu3diwYQOdO3dm9uzZvPvuu3z//fdYWVnh6+vL559/zqpVq+jSpQsODg7MmDHD9Of3O7Vv3x53d3dCQ0PZt28fPj4+/8oY4uPjadmyJR9//DG7du3is88+Iygo6L7nv/nmm1y/fp2CBQtSq1YtAPLkycPkyZPJly8f3bp1Y+3atfj7+2eaRb3Txx9/TM2aNU3BvmnTpgBYWFhQpEgRJk+ezKVLl+jSpQtRUVF06NABe3v7B47D1taWqVOnUqRIEXr06MGhQ4cYO3Ys5cqVw2AwMGbMGA4cOECXLl04ffo0HTt2fGiAFhER+TdYGI1GY04XISLZJyMjg9OnT+Ps7IyNjQ1ffvkl8+bNY926dbz44os5XZ6IiMi/SlMvImZm3rx5+Pj48M033/Dbb7+xatUqSpUqpZvFRETkuaSZXREzk5aWRnBwMCtXruTmzZu4u7szePBgypcvn9OliYiI/OsUdkVERETEbGkZg4iIiIiYLYVdERERETFbeqhENklLS+fy5aScLuOJFCpk98yPAcxjHOYwBtA4spuT093b6omIyINpZjebWFtb5XQJT8wcxgDmMQ5zGANoHCIikvMUdkVERETEbCnsioiIiIjZUtgVEREREbOlG9SySa23W+Z0CSLyDPl51nc5XYKIyHNBM7siIiIiYrYUdkVERETEbCnsioiIiIjZUtgVEREREbOlG9REROS5lp6ezrFjx3K6DJGHKlu2LFZWesjN49LMroiIPNeOHTtGTExMTpch8k
AxMTH6pSyLNLMrIiLPvdKlS/Pyyy/ndBki8hRoZldEREREzJbCroiIiIiYLYVdERERETFbCrsiIiIiYrYUdkVERETEbCnsioiIiIjZUtgVEREREbOlsCsiIiIiZkthV0RERETMlsKuiIiIiJgthV0RERERMVsKuyIiIvJEIiIiMBgMBAUFARAeHo7BYCAkJCSHKxMB65wu4N8UGxvLiBEjuHz5MgCenp7069ePxo0b4+LigpWVFTdu3KB27dr06tWLuLg4unfvTnh4eA5XLiIiAtHR0UybNo2IiAgSEhIoWLAgNWrU4JNPPqFs2bI5Xd5jWbp0KfPnzyc2NpakpCScnZ1p1KgRPXr0IG/evDldnpiR5ybspqen061bNwYNGkT16tUxGo2MGDGCSZMmAfDtt99ib29PRkYGHTp0YOfOnbi4uORw1SIiIrdERETQsWNHUlNT8ff3p2zZshw7doywsDDWr1/Pd999h5eXV06X+UgWL17MoEGDKFWqFB07dsTGxobw8HBmzJjBpUuXGD16dE6XKGbkuVnGsGXLFsqWLUv16tUBsLCwoG/fvnTt2jXTeZaWlri5uXHixImcKFNEROQuGRkZDBo0iJs3b/L1118zZMgQWrduzZAhQxg9ejRJSUkMHDgQo9GIv78/BoOBnTt3mq6fM2cOBoPBFCK3bduGv78/VatWpU6dOkyZMsV0bkhICAaDgenTp/Pee+/Rvn174FbYvn1N7dq1GT58OCkpKVkaz+rVqwHo06cPgYGBtG3blu+//x4/Pz8qVKhgOi8yMpL3338fd3d3vL29GTNmTKY+V65cia+vL+7u7lSrVo1evXpx7tw50+cGg4E6deoQFhZG1apVTd/J3Llzefvtt/Hw8KBJkyb89ttvWRqHPBuem7AbExNDxYoVMx3Lmzcvtra2mY7duHGDiIgI3Nzc/s3yRERE7uvPP//k5MmTlCpViqZNm2b6rGnTppQoUYKYmBgOHTpE48aNAdi8ebPpnA0bNgDw1ltvceTIETp27MjZs2fp3r07Xl5eTJgwgeXLl2dqd+bMmVStWpWWLVuSkJBA586diY6ONl0zf/58Zs+enaXxODo6AvDdd9+xfft2UlJScHJyIjg4mHbt2gFw4sQJPv74Y2JiYujWrRseHh589913jBs3DoCtW7fSq1cvEhIS6NGjB2+88QYrV66kXbt2pKammvq6cuUKP/74I7169aJ48eL88MMPjBgxggIFCtCnTx/y5s1L9+7diY+Pz9JYJPd7bpYxpKWlkZ6eft/PAwMDsbKyIiMjg/fff58KFSoQFxf3L1YoIiJyb6dOnQKgdOnSd31mYWFB+fLliYuLIzY2ljfffJPg4GA2b95Mr169uHbtGn/88QclSpTAw8ODkSNHkpqaSrt27WjSpAlNmjRh06ZNLFy4MFOQNhgMDBo0CIBr164xc+ZM7OzsKFy4MB4eHqxevZqdO3cSGBj42OP5+OOP2bRpE1FRUbRt2xYbGxvc3d1p3LgxLVu2JG/evISHh5OcnMynn35qWr7h4OBg+v/yqVOnAhAcHEytWrUAOHnyJH/88QcRERHUqVMHgJSUFPr168err74KwPz58wEICgqiWLFilC5dmsDAQJYsWUKPHj0eeyyS+z03Ybd8+fIsXLgw07Hk5GTTcoXba3ZFRESeNUajEbi1FK948eJUqVKFffv2ceHCBXbu3ElqaipNmjQBbv2lE+Crr77iq6++MrVx/PjxTG3euZwgT548LF68mGXLlmVaRpCUlJSlel9++WXWrl3LihUr2LZtG7t37zb9W7duHXPmzDHVU65cOQBsbGwy1RsdHQ2Au7u76VilSpX4448/OH78uCns/nMst8ffokWLTDX9c/xiPp6bsFuzZk2+/vpr1q9fj4+PD0ajkbFjx5IvX76cLk1EROSBbs/oHjlyBKPRiIWFhekzo9HIkSNHAEw7MjRq1IioqCi2bNnC9u3bAUxh97bbSwNus7TMvLIxT548ptc//PADS5Yswc3NjU6dOnHt2jUGDBjwRGPKnz8//v7++Pv7YzQa2bx5M926dSMyMpLo6GjTDO7tIP9P9zqekZEBkOn7Ae65u8OkSZMyLWUsVKhQlsciudtzs2bX1taW6dOn88MPP9C8eXNatmxJvnz59CcLERHJ9SpUqECZMmWIi4tj6dKlmT5bsWIF8fHxVKpUifLlywO3wi7cWqu7adMmSpUqZbpv5XZwzpcvH/Xq1aNevXpYWFjg7Ox83/6PHj0KQLNmzWjUqJHpL6H3C6IPcu3aNbp168YHH3xAcnIycCuc1qtXj5IlSwK3/vJ6O7gfPnwYuLWrUocOHejQoQMZGRm8/PLLAOzfv9/U9r59+4D/nw2+l9vjd3Z2pl69eri7u2NjY4OTk9Njj0WeDc/NzC6Aq6sr06ZNu+v4/e7CLFGihPbYFRGRHGdhYcGIESPo0KEDAwYMICoqivLly3P8+HEWLlxI/vz5M/2Jv0SJElSuXJm1a9eSnp6Ov7+/6bPmzZuzYMECpkyZQmpqKjExMSxdupQOHTrw+eef37P/2yH0559/Jjk5mZ9//pmiRYsSHR3NihUrKFKkyCOP5fa62927d9OqVSveeust8ubNy86dO4mOjqZcuXJUqlSJwoULM2vWLL7//nusra05dOgQW7ZsISAgAEtLS7p06UL79u0ZMGAAAQEBHDx4kKioKNzc3Ew7L92Lv78/Q4cOJSgoCD8/P9avX8+uXbuYNm2athw1U8/NzK6IiMizzMvLix9++IFGjRqxZs0avvrqK9asWYOvry8//fTTXTsONW7c2LQU4M4lDAaDgUmTJlGqVCmmTZtGZGQknTp1onfv3vftu2XLlvj4+HD06FGWLl3KsGHDaNOmDTdv3uSXX3557LFMmDCBbt26cfPmTSZPnsyoUaOIioqiZcuWzJ49G2tra1xdXZk3bx5lypQhNDSUqKgoAgMDTYG8Vq1aTJs2jfz58zN+/Hi2bNlCs2bN+O677+5aknEnf39/PvvsM9LT05k4cSJXr15lzJgxeHt7P/Y45NlgYczK3yDkLrXebpnTJYjIM+TnWd899jVOTvmfQiVy+0an238WF8mN9HOadZrZFRERERGzpbArIiIiImZLYVdEREREzJbCroiIiIiYLYVdERERETFbCrsiIiIiYrYUdkVERETEbCnsioiIiIjZeq4eFywiImKOwsPD6d+/f5au/f333x/4mNy0tDTWr1/PqlWr2L9/PxcvXiQ1NZUCBQpQpkwZqlevTosWLXB2ds5q+fd0+fJlfvrpJ7Zu3cpff/3FlStXsLS0pECBAhgMBmrXrk2zZs0oUKDAPa+Pi4vDx8fniWqYM2fOAx89LM8GhV0RERG5p4MHD9KvXz/T07vudOHCBS5cuEBkZCRTp07ls88+o23bttnS7+LFiwkODub69et3fXbz5k3OnTvHpk2bCA0NZdCgQbz77rvZ0q+YJ4VdERERM+Lp6flY4e+FF1645/GoqCjatWtnCpx58uShYcOGVKpUCSsrK06cOMHq1atNM73Bw
cGkpKTQsWPHJ6p/5syZjBo1yvS+ZMmSNGjQAFdXVzIyMjhx4gTr1q3j77//JjExkX79+nHt2jU+/PDDTO0ULFiQoUOHPlbfqampjBkzhpSUFGxsbHB1dX2isUjuoLArIiJiRkqXLs0HH3zwRG1cv36dXr16mYJuxYoVmTRp0l3h77PPPqN3795s3LgRgP/+9780atSIl156KUv9Hj16lLFjx5ret2/fns8++wwrK6tM5wUFBTFw4EB++eUXAEaPHk3dunUz9evg4PDY38OkSZNISUkx9V2iRIksjUNyF92gJiIiIpksWbKE+Ph4APLnz8/06dPvOctpb2/P+PHjKVKkCHBrfW9YWFiW+w0LCyMtLQ2ASpUq0a9fv7uCLtyaZR45ciROTk4ApKSkEB4enuV+AU6ePMm0adMAKFasGJ988skTtSe5h8KuiIiIZLJq1SrTaz8/P4oWLXrfc+3s7GjSpInpfWRkZJb73bt3r+n1G2+88cBz8+TJQ926dU3vo6KistwvwLBhw0hOTgagX79+5MuX74nak9xDyxhEREQkEw8PDwoUKMDff/9NnTp1Hnp+6dKlTa/PnTuX5X6vXr1qev2ggH2vc65cuZLlftetW8eWLVsAePXVV3nrrbey3JbkPgq72WTbikWcP5+Y02U8ESen/M/8GMA8xmEOYwCNQ+RZ9bjbmN1e5wqQkZGR5X4dHR05ceIEAImJD/9v7s7dGgoVKpSlPtPS0kzrhC0sLLK8hZvkXlrGICIiIk8kJibG9PpJdjB49dVXTa8fZTnEvn37TK+9vLyy1OeiRYtM9b/99ttUrlw5S+1I7qWwKyIiYoaMRiPHjh1j9erVLFmyhJUrV7Jnzx5SU1OztZ+UlBRWr15tel+vXr0st+Xv70+ePHkA2LhxI1u3br3vuRs3bjSt07W3t6dFixaP3V9SUhKhoaEA2NjY0LNnzyxULbmdljGIiIiYkdTUVGbOnMncuXNNOyrcyc7OjubNm/PJJ5/g6Oj4xP1NmzaNy5cvA7duGnvvvfey3FaJEiUYMmQIgwYNwmg00rlzZ9q3b0/Tpk156aWXMBqNnDx5kuXLlzNjxgwALC0tGT58+COt8f2nsLAwLl26BECzZs148cUXs1y75F4KuyIiImZk+fLlLF++/L6fJyUlMWfOHNasWcOUKVOoVKlSlvvasWOHabsugLZt21K8ePEstwfQvHlzHB0dGT16NLGxsUydOpWpU6fe81yDwUC/fv2oXbv2Y/eTnJxsCsxWVlZP/DAMyb20jEFERMTMFC1alKCgIFauXMm+ffvYtWsXixYtomXLlqZ9a8+ePUvHjh2zvHvCgQMH6Natm2lZhKenJ926dcuW+l999VX8/f1xcXG57zkvvfQS7777Lm5ublnq46effuL8+fMA1K9fX7O6ZkwzuyIiIs84Ozs7041hHh4ejBgxAgcHB9PnefLkwdPTE09PT3x8fOjSpQtpaWmcP3+er7/+mnHjxj1Wfzt37qRz586mHRNcXV2ZOHEiNjY2TzyWbdu20bt3b9PSiFKlSuHt7U3x4sVJT0/n5MmTbNy4kRMnTjB69GimTZvGuHHjHmmLtDstWLDA9Pr9999/4rol97IwGo3GnC7CXDzrWxOZy/ZK5jAOcxgDaBxPow7JftHR0QC8/PLLOVzJv+e///0vkydPBm6tef39998fec3rqlWr+Pzzz00PYChZsiSzZs16ol0Ybtu/fz+tW7c2td23b1/at2+PpWXmP0SnpaURGhrKlClTgFthftGiRVSsWPGR+tm9e7fpUcLFihXjt99+u6uP3OZ5/DnNLprZFRERyWWOHz9ORETEA8+pXr06ZcqUyVL7H330EVOnTiUjI4OMjAy2b9+Or6/vQ6+bO3cuX331lWkv3QoVKvDtt99m6eawexkyZIgp6AYEBPDxxx/f8zxra2t69uxJfHw8y5YtIzk5meDgYObMmfNI/fz444+m176+vrk+6MqTUdgVERHJZfbu3cvQoUMfeE5wcHCWw26hQoUoVaoUx48fB+DUqVMPPN9oNDJu3Di+/fZb07GaNWsSEhJC/vzZ8xeHAwcOcPDgQdP7wMDAh17Tpk0bli1bBkBERATnzp3D2dn5gdekp6ezbt060/v69etnsWJ5VuhXGRERkedQgQIFTK8f9rSyoUOHZgq6fn5+fPvtt9kWdCHzAyKcnZ0fGlrh1syyhYWF6f2hQ4ceek1kZCQJCQkAFCxYEA8PjyxUK88SzexmE+/2A3O6BPmHxaODcroEEZEs8fPzw8/P76n2ceejdh8UWkeNGkVYWJjpfefOnenVq1e213PlyhXTazs7u0e6xtraGmtra9OOEFevXn3oNZs3bza9rl27tpYwPAcUdkVERJ5xqampnDlzhps3bz7SDUwpKSmcOHHC9P5+e+OGhYUxc+ZM0/v+/fvTtm3bJ673Xu7cPeL2zOvDJCUlZXoi3AsvvPDQa3bv3m16XaVKlceoUJ5V+nVGRETkGbZ8+XI8PDxo2LAhrVu35saNGw+9ZtOmTaYbweDWzW7/9OeffzJixAjT+169ej21oAu39s297fLly5w8efKh1xw4cOC+bdxLSkpKpmvKly//mFXKs0hhV0RE5BlWrVo104Mirl69yty5cx94/s2bNxk/frzpffXq1e/aNiw1NZX+/fubZk3/85//0Llz52yuPLNq1aplWr7wKDsr3Lm84qWXXqJ06dIPPD8mJibTTLDC7vNBYVdEROQZVrRo0Uzre8ePH8/ChQtN24Pd6cKFC3Tq1ImjR48Ct9a89unT567zfvzxR/766y9T+8OGDXviOgMCAjAYDBgMBoKC7r6nIm/evLRp08b0ft68ecydO5d7PQ7AaDTy7bffsmLFCtOxTp06PbSGO5du2Nvb4+Tk9LjDkGeQ1uyKiIg84/r168fevXv566+/yMjIYOjQocyaNYu6devi6upKWloahw8f5rfffiMpKQkACwsLhgwZcte61bS0tEw7L1SoUIGff/75sepp0KDBI+2m8E+ffPIJu3btIjIyEqPRyIgRI1iwYAH16tWjWLFiAJw+fZrff/+d2NhY03VNmzZ9pBv64uLiTK8fZX2vmAeFXRERkWecvb098+fPZ/Dgwfz6668AxMbGZgqEd3JycmL48OE0aNDgrs/Onj2bKRRu2rSJTZs2PVY9ZcqUyVLYzZMnD9OnT2fUqFEsWrQIo9HI8ePHTfsB/5OVlRUff/wx3bt3z7QF2f3cucXanTfEiXlT2BURETED+fPnZ8KECQQGBvLzzz+zZ88eTp48ybVr17C1tcXR0RE3Nzfq1q3Lf/7zH/LkyZPTJd9Tvnz5GDZsGO3btyc8PJxdu3YRExNj2lasQIEClC1bltdee4333nvPNOP7KG7PaoPC7vPEwnivxTDy2LTPbu7zLO+z6+SUn/PnH7zJ+7NA48j+OiT7RUdHAzzSll0iOUU/p1mnG9RERERExGwp7IqIiIiI2VLYFRERERGzpbArIiIiImZLYVdEREREzJbCroiIiIiYLYVdERERETFbCrsiIiIi
YrYUdkVERETEbCnsioiIiIjZss7pAkRERHJaTExMTpcg8kAxMTGULl06p8t4Jv3rYTcuLo533nmHypUrm45VqFCBgQMHPvL13bt3Jzw8/L7nuLm58corrwBw/fp1mjdvTqtWrTKdc+jQIdauXUv37t2zMAoRETEXZcuWzekSRB6qdOnS+lnNohyZ2S1dujRz5859au07ODiY2k9JScHPz4969epRokQJADIyMqhYsSIVK1Z8ajWIiMizwcrKipdffjmnyxCRpyRXrNmNiIjINMNavXp1AM6dO0fHjh1p06YNHTp04MyZM4/dtq2tLeXLlycuLo6QkBD69u1L69at2b59u6nP5cuX4+/vT/PmzVm5ciUAa9as4YMPPqB169aMHj06G0YpIiIiIv+2XBF272fixIm0a9eO2bNnExAQwJQpUx67jcuXL3Po0CHKly8PQHp6OgsXLsTS8tbQr1+/zqRJk5g9ezYzZszgl19+ISkpialTpzJ79mzmz59PfHw8e/bsydaxiYiIiMjTlyPLGGJiYggICDC9r1Wr1j3P279/P8ePH2fy5Mmkp6dTuHDhR5GYMsEAACAASURBVGr/2rVrmdr//PPPTde6u7tnOjc2NpaSJUuSJ08e8uTJw5QpUzh48CDx8fF06NABgMTEROLj46latepjjVNEREREclauWLMbGRnJoUOHTO/T09NNrydMmICzs7PpfVxc3EPbv3PN7j/Z2NjcdSwjI+OuY5UqVWLmzJkP7UtEREREcq9csYwhf/78XL58GYCTJ0+SmJgIQJUqVVi3bh0A27dvZ/ny5dned6lSpYiNjSUpKYnk5GTatm1LqVKlOH78OBcvXgRuLac4d+5ctvctIiIiIk9Xrthn12AwYGtrS7du3XjppZdwdXUlIyODTz/9lP79+7Ny5UosLCwIDg7O9r7t7e3p0aMHbdu2JSMjg48++gg7OzsGDhxIYGAgNjY2uLm5UbRo0WzvW0RERESeLguj0WjM6SLMgXf7R9snWP49i0cH5XQJWebklJ/z5xNzuownpnFkfx0iIvJ4csXMblaEhoYSERFx1/GvvvqKF198MQcqEhEREZHc5pkNu59++imffvppTpchIiIiIrlYrrhBTURERETkaVDYFRERERGzpbArIiIiImZLYVdEREREzJbCroiIiIiYLYVdERERETFbCrsiIiIiYrYUdkVERETEbCnsioiIiIjZUtgVEREREbOlsCsiIiIiZkthV0RERETMlsKuiIiIiJgthV0RERERMVsKuyIiIiJitqxzugBzsXHGSM6fT8zpMp6Ik1P+Z34MYD7jEBERkSenmV0RERERMVsKuyIiIiJithR2RURERMRsKeyKiIiIiNlS2BURERERs6WwKyIiIiJmS2FXRERERMyWwq6IiIiImC2FXRERERExWwq7IiIiImK29LjgbPJm/xk5XcJzZX7vFjldgoiIiDwDNLMrIiIiImZLYVdEREREzJbCroiIiIiYLYVdERERETFbCrsiIiIiYrYUdkVERETEbCnsioiIiIjZUtgVEREREbOlsCsiIiIiZkthV0RERETMlsKuiIiIiJgthV0RERERMVsKuyIiIiJithR2RURERMRsKeyKiIiIiNlS2BURERERs6WwKyIiIiJmS2FXRERERMzWQ8NuREQE3bt3z3QsJCSEefPmPbWibouLi8PPz++p9/Mgf/zxBxcvXszRGkREREQkazSz+xA//vijwq6IiIjIM8o6qxdmZGTQsGFD3njjDXbu3EnBggWZNm0aSUlJDBw4kISEBNLT0xk0aBAVKlSgQYMG+Pv7s2LFCjw9PXF0dGTjxo1UrVqVIUOG0LdvX+zt7YmNjeXSpUuMGjWKF154wdRfREQE33zzDdbW1ri4uBAcHEzr1q0ZN24cJUuW5OzZs3Tp0oW+ffsyb948rKys+PPPP+nevTtr1qwhOjqaL7/8kho1arBmzRpmzpyJpaUlHh4e9OvXjyVLlrBnzx4uXrxITEwMgYGBFCtWjHXr1nHkyBFCQkIoXrx4tnzpIiIiIvLvyPLMrqWlJadOncLX15fFixeTkJBAdHQ0c+bMoU6dOsyePZsvvviCMWPGmK6pXLkyP/74I8uXL6datWosXryY5cuXk5GRgZWVFZaWlsyaNYs+ffowZcqUTP0NGTKE8ePHM3/+fAoWLMiyZcvw9fVl5cqVAKxfv563334bS0tLDh8+zNixY+nduzeTJk1iwoQJ9OzZk2XLlpGUlMTUqVOZPXs28+fPJz4+nj179mBpaUl0dDSTJk1i8uTJzJs3j9q1a1OxYkWCg4MVdEVERESeQVkOuxYWFjg4OFChQgUAihUrxtWrV9m/fz8LFy4kICCA4cOHk5iYaLqmcuXKWFtbU6BAAdzc3LC2tsbBwYGbN28C8NprrwHg7u5OTEyM6bqEhASsrKxMgdPLy4vDhw/z9ttvs2bNGgA2bNhA06ZNATAYDNja2lKkSBHKlSuHjY0NRYoU4dq1a8TGxhIfH0+HDh0ICAjg5MmTxMfHA+Dp6YmVlRUuLi5cvXo1q1+NiIiIiOQSD13GYG9vnymwwq3wWbJkSaysrDIdNxqNAAwcOBAvL6+72rrz/Dtf377uzvcWFham9xYWFnedY2FhQaFChXBxcWHfvn0YjUacnZ2JjY3F2vr/h3Xn69sqVarEzJkzMx0LDw+/57kiIiIi8ux66MyuwWDg1KlTnDx5EoBLly6xZcsWatasec/zq1Spwvr16wE4evQos2bNeuRidu7cCcD+/fspU6aM6XiBAgUwGo2mGdgdO3ZQuXJlAHx9fRk2bBiNGzd+pD5Kly7N8ePHTTedTZw4kXPnzt33fAsLC9LT0x95DCIiIiKSezx0KtPGxoaxY8fSr18/ANLT0xkwYABFixa95/kffvgh/fv3p1WrVqYb1B7VjRs3aNOmDVeuXGH06NGZPhs+fDi9e/fGysqKUqVK8fbbbwNQv359Bg8eTKNGjR6pj3z58jFw4EACAwOxsbHBzc3tvmMBqFatGj179iQ0NJTy5cs/8lhEREREJOdZGP+5PiCHBAUF0ahRI+rXr/9Y123fvp3w8PBMN8LlhDf7z8jR/p8383u3uO9nTk75OX8+8b6fPwvMYQygcTyNOkRE5PE804tUQ0ND2bJlC//9739zuhQRERERyYVyTdgdNWrUY1/z6aef8umnnz6FakRERETEHOgJaiIiIiJithR2RURERMRsKeyKiIiIiNlS2BURERERs6WwKyIiIiJmS2FXRETkDsnJyYwdO5YGDRpQuXJlvL29GTp0KAkJCQ+9Ni4uDoPBwJAhQ/6FSh9PQEAA7u7u2drmjBkzaNSoEe7u7tSsWZOuXbuannYKMHv2bOLi4h7aTnx8fKYnrj5qrYmJiYSEhJjeBwUFYTAYOH/+/OMN5A736/t22/f6d2cNjyMyMpJ169ZluVZ5NLlm6zEREZHcoFevXqxfv5527dpRt25doqOjmThxIgcPHmTRokVYWFjkdIlZ8sUXX5CUlJRt7S1ZsoTRo0fz0Ucf8cY
bb3DkyBGCg4O5evUqc+fOJS4ujq+++ooKFSpQokSJB7YVHh7OTz/9RNu2bR+r1rVr1xIaGkq3bt0A6NKlC/7+/hQsWPCJx/dPt9uGW0+LLV++PF988QUALi4uWWozJCQEV1dX3njjjWyrU+6mmV0REZH/OXz4MOvXr8fPz4+goCBq165Nu3btGDlyJHXq1CEpKQmj0cisWbOoV68elStXxs/Pj6ioqHu2989Zwg4dOmAwGACIiIjAYDAwffp0/Pz8qFq1KtOnT2fp0qVUr14dHx8f9u/fD9wKRQaDgZUrV/Lmm29SvXp1vvvuOwBq1arFgAEDAFi5ciUGg4FevXoBsGXLFgwGAzt37mTYsGG0bt0agNTUVL744gtq1qyJp6cn7du358yZMwBcv36doKAgatasSbVq1QgNDb3n2Pbt2wfABx98QPXq1fnwww/59ttv6du3L3Fxcfj4+ADw0UcfERISgtFoZNSoUdSqVYtXX32VPn36kJycTEhICKGhocTHx2MwGIiLi3ukWsPDw+nfvz8ABoOBiIgIJk+eTMuWLU2z8Lt27eK9996jatWq+Pn5ERkZCXDfWh6kZMmSeHp64unpiYWFBQ4ODqb3Li4uTJ8+nQYNGlClShV69uxpCut//fUXrVq1omrVqtSqVYsJEyZgNBoJCAggMjKSn376iQYNGgAwf/58fHx8cHd35z//+Q87dux4YE3yaBR2RURE/mfXrl0Adz26vkmTJnTv3h17e3uWLl1KcHAwPj4+TJkyhbS0NDp06MCVK1ey1OfixYvp2rUrTk5OhIaGsnnzZgYPHszZs2eZNGlSpnMXLlzIwIEDcXFxYfz48Vy6dIlXXnmFP//8E4Ddu3fj4uLCnj17ADhw4AC2trZ4eHhkaufnn38mLCyMfv36MWnSJOLj45kwYQIAY8aMYenSpfTo0YOuXbsSEhLC1q1b76q7QoUKALRt25YxY8awadMmXnnlFTw8PChatCiffPIJAEOGDKFFixYsW7aMmTNn8sEHH/Dpp5+yfPlyFi5cSIsWLahUqRJOTk4sWrSIokWLPlKt3t7evP766wAsWrQINze3TNclJCTQqVMnbGxsCA0NxdbWli5dupCYmHjfWrJq5cqVjBs3jnr16jFu3Di2bdvGtGnTABg+fDjXrl1jypQpdO3ale+//54tW7aYZoVff/11QkNDOXnyJMOHD6dhw4bMmDEDV1dXBg8eTHp6epbrklsUdkVERP7ndmAtVKjQfc9ZunQpdnZ2DBgwgLp169K5c2cSExPZuHFjlvqsX78+Pj4++Pj4kJycTJs2bWjatCkGg4GYmJhM57Zu3ZrXX38dPz8/0tLSiI2NxcvLi6NHj5KcnMzu3bsJCAjg7NmznDt3jj///BMPDw9sbW0ztWM0GoFbs44ODg6sWLGC0aNHA7eWBhgMBpo3b07r1q1xdXVl+fLld9XdsmVLAgMDSUpK4rvvviMwMJDatWuzePFibG1tKVmyJADlypXDxcWFGjVqsHTpUgIDA2nRogUAR44cwcXFBQcHB2xtbfH09HzkWh0dHXF0dATA09MTBweHTNf9/vvvJCYm0r59e2rXrs3YsWMZM2YM6enp960lq9auXQtAjx498Pb2pkGDBqbvLCMjg8uXLxMXF4e3tzf79u2jbt26lCtXDgBHR0cqVapkGmdsbCw3btxg7NixrF27FisrqyzXJbco7IqIiPzP7ZB78eLF+57z999/U7hwYWxsbABwcnIyHc+KIkWKAGBnZwdgCnD29vakpqZmOvf22tD8+fMDt/7E7+XlRVpaGrt37+bw4cPUr1+fMmXKsHv3bg4cOMCrr756V5++vr74+fmxcOFC3n//ferVq8eqVauAW4H/8OHDuLm54ebmRnx8/D1vMrOysuKzzz5jx44dLFq0iE6dOpGWlsawYcO4dOnSXeefO3eOoKAgXnnlFV555RWAR5q1fFCtD3Lu3Dng/7/PEiVKUL9+fQoWLJjlWu7n6tWrANSoUQM3Nzd++uknzpw5Q0ZGBoMGDcLFxYVBgwbRoEEDPvjgg3v+rLz00ksEBQWxb98+AgMDqVGjBt98802Wa5L/p7ArIiLyP9WqVQO46w75BQsWEBgYyLlz53BxceHSpUumIHr69Gng3jcp2djYkJqaSlpaGsAT7RJwP5UqVSJfvnwsWLCAF154gbJly1K1alXWr1/P6dOn8fLyuusaW1tbhg8fTmRkJPPmzcPZ2ZkRI0YA4OzszMsvv8ySJUtM/4YOHXpXG8uWLSMsLAxra2s8PT3p3bs3LVq0IDU19Z6/LAQHB3PixAlCQ0OZN2/eI4/vQbU+yO1fQm5/5zExMcybN4/Tp09nuZb7ub30YuHChabvbNGiRQC4ubmxcOFCtm7dysiRI9m/fz9z5869ZzsffvghW7duZdmyZTRq1Ihp06Zx/PjxJ67veaewKyIi8j9ly5alWbNm/PLLL3z11Vds2bKFWbNm8fXXX3Px4kWKFCmCr68v169fZ/To0WzatIlp06ZRqFAhvL2972qvVKlSphvawsLCOHXqVLbXbG1tjYeHB+vXr6dq1arArT/rr1q1CktLS9PM5Z3Gjh1L3bp12bJlC0ajETs7O/LlywfAm2++yfHjxzl8+DCnTp3iiy++YOfOnXe1sX79eoYNG8akSZOIjIzk119/5bfffqNYsWK89NJL5M2bF7j1J/5jx45x8+ZN4NaM8PLly3nhhReIiYnh9OnT5MmTh/Pnz7NmzRoSExMfudY8efIAt3aG+OcvEq+//jp2dnbMmDGDrVu3MmTIEMaPH0/evHkfWEtWNGrUCIBVq1aRkJDAN998w+LFi0lPT8fb25tu3bpx/PhxHBwcsLS0zFT/gQMH2LJlCxs3bsTT05MFCxZw7do18ufPj4WFhel7lKxT2BUREbnDiBEj6NatGxs2bKBz587Mnj2bd999l++//x4rKyt8fX35/PPPWbVqFV26dMHBwYEZM2aYlhbcqX379ri7uxMaGsq+fftMOxRkNy8vL9LT002zuK+88gqpqalUqFDhrrWst+t69dVX6du3Lx07diQ9PZ2xY8cC0K1bN9577z3GjRvH0KFDKVeuHO+8885dbQQHB9O6dWuWLFlC+/btGTx4MBUrVuT777/H1taWGjVqUK5cOcLCwvj111/p3r07Dg4O9O3blzJlytChQwcOHjzImjVr8PPzw8bGhoEDB5qWHzxKrU2bNsXR0ZGRI0cSHR2d6TpHR0cmT55MamoqXbt2JTExkUmTJuHo6PjAWrLC29ub/v37s27dOrp3705aWhrt2rXDxsaGYcOGERcXR/v27Rk2bBhNmzalXbt2APj7+3PixAm+/PJL6taty0cffcTkyZNp06YN27ZtIzg4mOLFi2epJvl/FsbbK6LlibzZf0ZOl/Bcmd+7xX0/c3LKz/nziff9/FlgDmMAjeNp1CEiIo9HM7siIiIiYrb0BLVssia4fa6Y+XkSuWX2SkRERCS7aGZXRERERMyWwq6IiIiImC2FXRERERExWwq7IiIiImK2FHZFRERExG
wp7IqIiIiI2VLYFRERERGzpbArIiIiImZLYVdEREREzJbCroiIiIiYLT0uOJt8MOHXnC7hiS3s+VZOlyAiIiKSrTSzKyIiIiJmS2FXRERERMyWwq6IiIiImC2FXRERERExWwq7IiIiImK2FHZFRERExGxp6zERERF5YunpGWQYjaRnGMn437/0DCMZxluvLSwssMtjTd48ih7y79JPnIiIiNwlNS2DsxevcyHhBleup3DlWvL//t16nXAtmavXUki4lsyN5LRHbtfK0gK7vNbY5bXBPq8N9vlssMtrjX0+GxzsbHAqaIdLYTtcCtvj4mincCxPTD9BIiIiz7FrSSmcOJvIibNXOXU2kfjz1zh94TrnE26QkWHM9v7SM4wkJqWSmJT6SOcXcLDFxdEe58J2ODvaUbyIPWVcC/KSS36srLQaUx5OYVdEROQ5kZyaTvTJyxyMucjh2Mscj0/g0tXknC7rgW7NJKfw18nLmY7bWltS2rUA5V8s+L9/hXB1csDS0iKHKpXcSmFXRETETF25lszBmEscjLnIoZhLHItPIC09+2drc0JKWgZ/nbjMXyf+PwTny2NN2RIFePnFQlR52Qn3soWxsbbKwSolN1DYFRERMROpaRnsO3qeiANn2Xf0AvHnr+V0Sf+qG8lpHDh2kQPHLhK+8Sh5bK2oXKYwXhWc8apQlOJODjldouQAhV0REZFn2PUbqew8dI4dB86w6/Dfj3WzmLlLTkln1+G/2XX4bwCKFbanqsEJrwrOeJQropvfnhP6X1lEROQZc/HKDXYcOMuOA2c4cOyC2SxNeNrOXLzOmW3XWbktFlsbK6q7ueDtVQIvQ1Hd7GbGFHZFRESeAcmp6WyNimdNxEkOxlzEqHz7RFJS09m8N57Ne+Mp4GBL3Squ1H/1RV4uWSinS5NsprArIiKSix2LS2B1xAk27Y7j+k0tUXgarlxLYfnWGJZvjcHVyR5vrxfxfqUELoXtc7o0yQYKuyIiIrlM0s1UNu6OY03ECY7FXcnpcp4r8eevM3/VYeavOoxHuSI0rVOG6m4u2tLsGaawKyIikkucPHuVnzYeY3NUPMkp6TldznNv39EL7Dt6AWdHO5rUKs2bNV7CIZ9NTpclj+mphd24uDh8fHxYvHgxHh4epuPNmzenXLlyjBo16qHXd+/enfDw8EzHGzRogIuLC0ajkeTkZPz8/Pjwww+ztfaLFy/y+eefk5KSws2bNxkwYABVq1bN1j5ERERuO3LqMj+siybiz7Nai5sLnbuUxMzlfxK29jANq7+Eb72yFC1kl9NlySN6qjO7L774Ir/++qsp7MbHx5OQkPBI1xof8F/7t99+i729PdevX6dPnz44ODjw7rvvZkvNAEuXLuXdd9/lnXfeITIykpCQEGbMmJFt7YuIiADsO3qexeuOsPfI+ZwuRR7BjeR0lm06zootMdT1dMWvfjlKFy+Q02XJQzzVfTY8PT3Zvn276f3q1aupU6cOAMuXL+f999/H39+fwYMHAxAeHk6PHj1o1aoV586dM133+++/ExgYSHp65j/p2NvbM2TIEL7//nsAIiIiaNmyJa1bt6ZPnz6kpKTQuHFj0tPTSUtLo2rVquzfvx+ADh06EB8fT8OGDRk9ejQtWrQgMDCQjIwMOnTowDvvvAPA2bNncXZ2fnpfkoiIPHciD56l78RNDJyyTUH3GZSeYWTj7jh6fLORr+fu5PSF5+vhHc+apxp2ra2tqVixInv37gVgw4YNvP766wDcuHGDSZMmERYWRmxsLH/99RcA586dY/78+bi4uABw4sQJpkyZwjfffIOV1d2P/CtevDiXL18mIyODIUOGMH78eObPn0/BggVZtmwZbm5uHDlyhEOHDuHu7s7evXvJyMjgwoULuLq6curUKXx9fVm8eDEJCQlER0cDcP78efz8/JgyZQq9e/d+ml+TiIg8J3YeOkf3cRv48vsIDt/xmFt5NhmNsHlvPF2//o3JP0Zx+erNnC5J7uGp36DWuHFjfv31V5ydnSlQoAB2drfWuNjb29OzZ08sLS05cuSIaXlD5cqVsbC4dcfjjRs36Nq1K6NHjyZ//vz37cNoNHL16lWsrKwoXrw4AF5eXuzevZtq1aqxd+9ekpOTadWqFevWreO1116jUqVKADg4OFChQgUAihUrxtWrVwFwcnIiPDycjRs30rdvX2bNmvVUvh8RETF/J85e5fufD7AnWrO45igt3civ22LZsPMU/6lXlvfql8Mur25kyy2e+uNCatasyfbt21mzZg0NGzYEIDk5mS+//JLx48czd+5cKleubDrfxub/fzjOnj2Ll5cXCxYsuG/7sbGxODk5YWFhcdc6XwsLC6pVq0ZUVBR79+6lTp06XLt2jT179lC9enWAu2aLjUYjERERpvDt7e3N4cOHn+xLEBGR59KVa8lM/jGK7uM2Kug+B26mpPPDumg+HrmOpb8fJTVNO2rkBk897Nra2lKpUiWWLFlCgwYNgFszttbW1hQtWpSTJ09y6NAhUlNT77q2dOnSDB06lJMnT7Jly5a7Pr9x4wZffvklbdq0oUCBAhiNRuLj4wHYsWMHlStXpnTp0pw9e5bExEQcHBwoUqQI69ato0aNGvet+bfffmPZsmUA/PXXX6YlFSIiIo8iNS2DnzYepVPwOn7dFktGhrZYeJ4kJqXw/bI/6TRqPTsOnMnpcp57/8o+u40bN+bSpUumpQgFCxakbt26NG/eHIPBQMeOHRk9ejQBAQF3XWthYcHIkSPp3LkzP/zwAwCBgYFYWlqSlJREs2bNaNasGQDDhw+nd+/eWFlZUapUKd5++20AHB0dsbe/9RSUKlWq8McffzwwwHbu3JmgoCDWrFlDamoqQ4cOzc6vQ0REzNiOA2eY8cufnLlwPadLkRx2/vINRs6MpKZ7MTo1c6dwgXw5XdJzycL4oD2+5JF9MOHXnC7hiS3s+RbnzyfmdBlPzMkp/zM/DnMYA2gcT6OOJxEXF8c777xjWjqWkpJC+fLlGTZs2D1vAAaYPn06r7322mPtNf7HH39QpkwZChcunOl4eno6nTt3ZvDgwZQsWfKR20tPT2fYsGGmG4jHjBnDiy++mOmcK1eu0Lt3b+zt7Zk4cSIAffr0ISAgAE9Pz0fu60lcTrzJpMVRRPx59l/pT54tdnmt+ahJJZrUKmW6N0n+HU99GYOIiOQepUuXZu7cucydO5dFixaRmprKL7/8ct/zO3bs+NgP1fnxxx+5ePHiXcfDwsLw8vJ65KAbHx/PsWPHWLp0KRYWFoSFhdGpUydCQkLuOveLL77Ay8sr07GgoCCGDRv2wH3bs8vG3XF0/fo3BV25r6SbaUwN38fnIZs5ceZqTpfzXFHYFRF5jlWpUoUTJ04AMH/+fFq1aoW/v79pB5qgoCA2bNhAWloagwYN4qOPPsLf358dO3YAsH37dvz9/WnRogWzZs1i69atrFu3jv79+3P69OlMfS1YsAB/f38AXn/9dSZMmEDr1q0JCAgw7YQDEB0dzeeff27agz0iIgIfHx8A6tSpQ2Rk5F3jG
<remainder of base64-encoded PNG data omitted: the figure rendered by the report.visualize() call in this cell>\n",
-       "text/plain": [
-        "
" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "report.visualize();" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "slideshow": { - "slide_type": "slide" - } - }, - "source": [ - "Finally, you can also print a detailed report containing all of the metrics that were computed." - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
NameValueGoalUnitTablesColumnsMisc. Tags
0foreign-key1.000000Goal.MAXIMIZEbinarytable:storeschild:features
1foreign-key1.000000Goal.MAXIMIZEbinarytable:storeschild:depts
2logistic0.536250Goal.MINIMIZEauroctable:featuresdetection:auroc
3logistic0.609923Goal.MINIMIZEauroctable:storesdetection:auroc
4logistic0.516956Goal.MINIMIZEauroctable:deptsdetection:auroc
........................
90continuous-kl5.439273Goal.MINIMIZEentropytable:featurescolumn:CPI,column:Unemploymentstatistic:bivariate
91continuous-kl1.993923Goal.MINIMIZEentropytable:featurescolumn:CPI,column:Temperaturestatistic:bivariate
92continuous-kl1.417703Goal.MINIMIZEentropytable:featurescolumn:CPI,column:MarkDown5statistic:bivariate
93continuous-kl1.668086Goal.MINIMIZEentropytable:featurescolumn:CPI,column:MarkDown2statistic:bivariate
94continuous-kl0.345429Goal.MINIMIZEentropytable:deptscolumn:Weekly_Sales,column:Deptstatistic:bivariate
\n", - "

95 rows × 7 columns

\n", - "
" - ], - "text/plain": [ - " Name Value Goal Unit Tables \\\n", - "0 foreign-key 1.000000 Goal.MAXIMIZE binary table:stores \n", - "1 foreign-key 1.000000 Goal.MAXIMIZE binary table:stores \n", - "2 logistic 0.536250 Goal.MINIMIZE auroc table:features \n", - "3 logistic 0.609923 Goal.MINIMIZE auroc table:stores \n", - "4 logistic 0.516956 Goal.MINIMIZE auroc table:depts \n", - ".. ... ... ... ... ... \n", - "90 continuous-kl 5.439273 Goal.MINIMIZE entropy table:features \n", - "91 continuous-kl 1.993923 Goal.MINIMIZE entropy table:features \n", - "92 continuous-kl 1.417703 Goal.MINIMIZE entropy table:features \n", - "93 continuous-kl 1.668086 Goal.MINIMIZE entropy table:features \n", - "94 continuous-kl 0.345429 Goal.MINIMIZE entropy table:depts \n", - "\n", - " Columns Misc. Tags \n", - "0 child:features \n", - "1 child:depts \n", - "2 detection:auroc \n", - "3 detection:auroc \n", - "4 detection:auroc \n", - ".. ... ... \n", - "90 column:CPI,column:Unemployment statistic:bivariate \n", - "91 column:CPI,column:Temperature statistic:bivariate \n", - "92 column:CPI,column:MarkDown5 statistic:bivariate \n", - "93 column:CPI,column:MarkDown2 statistic:bivariate \n", - "94 column:Weekly_Sales,column:Dept statistic:bivariate \n", - "\n", - "[95 rows x 7 columns]" - ] - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "report.details()" - ] - } - ], - "metadata": { - "file_extension": ".py", - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.6" - }, - "mimetype": "text/x-python", - "name": "python", - "npconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": 3 - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/2_writing_metrics.ipynb b/tutorials/2_writing_metrics.ipynb deleted file mode 100644 index 897de10c..00000000 --- a/tutorials/2_writing_metrics.ipynb +++ /dev/null @@ -1,461 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Writing Custom Metrics\n", - "Let's start by generating a simple synthetic dataset." 
- ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "nbsphinx": "hidden" - }, - "outputs": [], - "source": [ - "import warnings\n", - "\n", - "warnings.filterwarnings(\"ignore\")" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "2020-06-25 19:47:19,922 - INFO - modeler - Modeling users\n", - "2020-06-25 19:47:19,923 - INFO - metadata - Loading transformer CategoricalTransformer for field country\n", - "2020-06-25 19:47:19,924 - INFO - metadata - Loading transformer CategoricalTransformer for field gender\n", - "2020-06-25 19:47:19,924 - INFO - metadata - Loading transformer NumericalTransformer for field age\n", - "2020-06-25 19:47:19,941 - INFO - modeler - Modeling sessions\n", - "2020-06-25 19:47:19,942 - INFO - metadata - Loading transformer CategoricalTransformer for field device\n", - "2020-06-25 19:47:19,942 - INFO - metadata - Loading transformer CategoricalTransformer for field os\n", - "2020-06-25 19:47:19,954 - INFO - modeler - Modeling transactions\n", - "2020-06-25 19:47:19,955 - INFO - metadata - Loading transformer DatetimeTransformer for field timestamp\n", - "2020-06-25 19:47:19,955 - INFO - metadata - Loading transformer NumericalTransformer for field amount\n", - "2020-06-25 19:47:19,955 - INFO - metadata - Loading transformer BooleanTransformer for field approved\n", - "2020-06-25 19:47:20,710 - INFO - modeler - Modeling Complete\n" - ] - } - ], - "source": [ - "from sdv import load_demo, SDV\n", - "\n", - "sdv = SDV()\n", - "metadata, real_tables = load_demo(metadata=True)\n", - "\n", - "sdv.fit(metadata, real_tables)\n", - "synthetic_tables = sdv.sample_all(20)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next, we'll create an empty `MetricsReport` object to hold our custom metrics." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "from sdmetrics.report import MetricsReport\n", - "\n", - "report = MetricsReport()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Generic Metric\n", - "The simplest way to create a custom metric is to use the generic metric API. You simply write a function which yields a sequence of Metric objects, attach it to a metrics report, and you're ready to go!" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "from sdmetrics.report import Metric\n", - "\n", - "def my_custom_metrics(metadata, real_tables, synthetic_tables):\n", - " name = \"abs-diff-in-number-of-rows\"\n", - "\n", - " for table_name in real_tables:\n", - "\n", - " # Absolute difference in number of rows\n", - " nb_real_rows = len(real_tables[table_name])\n", - " nb_synthetic_rows = len(synthetic_tables[table_name])\n", - " value = float(abs(nb_real_rows - nb_synthetic_rows))\n", - "\n", - " # Specify some useful tags for the user\n", - " tags = set([\n", - " \"priority:high\",\n", - " \"table:%s\" % table_name\n", - " ])\n", - "\n", - " yield Metric(name, value, tags)\n", - " \n", - "report.add_metrics(my_custom_metrics(metadata, real_tables, synthetic_tables))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Statistic Metric\n", - "Alternatively, if you're looking to create a statistical metric which looks at univariate or bivariate distributions, you can subclass the `UnivariateMetric` class and fill in a single function. 
The base class will handle identifying the columns which have the correct data type, traversing the tables, and so on. You can simply focus on the math." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "from scipy.stats import chisquare\n", - "\n", - "from sdmetrics.report import Goal\n", - "from sdmetrics.statistical.univariate import UnivariateMetric\n", - "from sdmetrics.statistical.utils import frequencies\n", - "\n", - "class CSTest(UnivariateMetric):\n", - "\n", - " name = \"chisquare\"\n", - " dtypes = [\"object\", \"bool\"]\n", - "\n", - " @staticmethod\n", - " def metric(real_column, synthetic_column):\n", - " \"\"\"This function uses the Chi-squared test to compare the distributions\n", - " of the two categorical columns. It returns the resulting p-value so that\n", - " a small value indicates that we can reject the null hypothesis (i.e. and\n", - " suggests that the distributions are different).\n", - "\n", - " Arguments:\n", - " real_column (np.ndarray): The values from the real database.\n", - " synthetic_column (np.ndarray): The values from the fake database.\n", - "\n", - " Returns:\n", - " (str, Goal, str, tuple): A tuple containing (value, goal, unit, domain)\n", - " which corresponds to the fields in a Metric object.\n", - " \"\"\"\n", - " f_obs, f_exp = frequencies(real_column, synthetic_column)\n", - " statistic, pvalue = chisquare(f_obs, f_exp)\n", - " return pvalue, Goal.MAXIMIZE, \"p-value\", (0.0, 1.0)\n", - "\n", - "report.add_metrics(CSTest().metrics(metadata, real_tables, synthetic_tables))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Detection Metric\n", - "Similarly, if you're looking to create a detection metric, you can subclass the `TabularDetector` class and fill in the `fit` and `predict_proba` functions. The base class will handle denormalizing parent-child relationships, etc. so you can focus on the machine learning." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.pipeline import Pipeline\n", - "from sklearn.preprocessing import RobustScaler\n", - "from sklearn.svm import SVC\n", - "\n", - "from sdmetrics.detection.tabular import TabularDetector\n", - "\n", - "\n", - "class SVCDetector(TabularDetector):\n", - "\n", - " name = \"svc\"\n", - "\n", - " def fit(self, X, y):\n", - " \"\"\"This function trains a sklearn pipeline with a robust scalar\n", - " and a support vector classifier.\n", - "\n", - " Arguments:\n", - " X (np.ndarray): The numerical features (i.e. transformed rows).\n", - " y (np.ndarray): The binary classification target.\n", - " \"\"\"\n", - " self.model = Pipeline([\n", - " ('scalar', RobustScaler()),\n", - " ('classifier', SVC(probability=True, gamma='scale')),\n", - " ])\n", - " self.model.fit(X, y)\n", - "\n", - " def predict_proba(self, X):\n", - " return self.model.predict_proba(X)[:, 1]\n", - "\n", - "report.add_metrics(SVCDetector().metrics(metadata, real_tables, synthetic_tables))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Collecting Metrics\n", - "Now that we've generated all the metrics, we can explore the value of each metric using the standard `MetricsReport` interface which allows users to summarize, visualize, and explore the metrics at various levels of granularity." - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
ColumnsGoalMisc. TagsNameTablesUnitValue
0Goal.IGNOREpriority:highabs-diff-in-number-of-rowstable:users10.000000
1Goal.IGNOREpriority:highabs-diff-in-number-of-rowstable:sessions9.000000
2Goal.IGNOREpriority:highabs-diff-in-number-of-rowstable:transactions171.000000
3column:countryGoal.MAXIMIZEstatistic:univariatechisquaretable:usersp-value0.980379
4column:genderGoal.MAXIMIZEpriority:high,statistic:univariatechisquaretable:usersp-value0.000000
5column:deviceGoal.MAXIMIZEstatistic:univariatechisquaretable:sessionsp-value0.880443
6column:osGoal.MAXIMIZEstatistic:univariatechisquaretable:sessionsp-value0.779993
7column:approvedGoal.MAXIMIZEstatistic:univariatechisquaretable:transactionsp-value0.778779
8Goal.MINIMIZEdetection:aurocsvctable:usersauroc0.716270
9Goal.MINIMIZEdetection:aurocsvctable:transactionsauroc0.962614
10Goal.MINIMIZEdetection:aurocsvctable:sessionsauroc0.631614
11Goal.MINIMIZEdetection:aurocsvctable:sessions,table:usersauroc0.681944
12Goal.MINIMIZEpriority:high,detection:aurocsvctable:sessions,table:transactionsauroc0.900529
\n", - "
" - ], - "text/plain": [ - " Columns Goal Misc. Tags \\\n", - "0 Goal.IGNORE priority:high \n", - "1 Goal.IGNORE priority:high \n", - "2 Goal.IGNORE priority:high \n", - "3 column:country Goal.MAXIMIZE statistic:univariate \n", - "4 column:gender Goal.MAXIMIZE priority:high,statistic:univariate \n", - "5 column:device Goal.MAXIMIZE statistic:univariate \n", - "6 column:os Goal.MAXIMIZE statistic:univariate \n", - "7 column:approved Goal.MAXIMIZE statistic:univariate \n", - "8 Goal.MINIMIZE detection:auroc \n", - "9 Goal.MINIMIZE detection:auroc \n", - "10 Goal.MINIMIZE detection:auroc \n", - "11 Goal.MINIMIZE detection:auroc \n", - "12 Goal.MINIMIZE priority:high,detection:auroc \n", - "\n", - " Name Tables Unit \\\n", - "0 abs-diff-in-number-of-rows table:users \n", - "1 abs-diff-in-number-of-rows table:sessions \n", - "2 abs-diff-in-number-of-rows table:transactions \n", - "3 chisquare table:users p-value \n", - "4 chisquare table:users p-value \n", - "5 chisquare table:sessions p-value \n", - "6 chisquare table:sessions p-value \n", - "7 chisquare table:transactions p-value \n", - "8 svc table:users auroc \n", - "9 svc table:transactions auroc \n", - "10 svc table:sessions auroc \n", - "11 svc table:sessions,table:users auroc \n", - "12 svc table:sessions,table:transactions auroc \n", - "\n", - " Value \n", - "0 10.000000 \n", - "1 9.000000 \n", - "2 171.000000 \n", - "3 0.980379 \n", - "4 0.000000 \n", - "5 0.880443 \n", - "6 0.779993 \n", - "7 0.778779 \n", - "8 0.716270 \n", - "9 0.962614 \n", - "10 0.631614 \n", - "11 0.681944 \n", - "12 0.900529 " - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "report.details()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} diff --git a/tutorials/3_ml_efficacy.ipynb b/tutorials/3_ml_efficacy.ipynb deleted file mode 100644 index 6f4893e4..00000000 --- a/tutorials/3_ml_efficacy.ipynb +++ /dev/null @@ -1,364 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Machine Learning Efficacy\n", - "In this tutorial, we will write a custom metrics generator to evaluate the efficacy of a synthetic dataset on a machine learning task." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Loading the Dataset\n", - "The Boston housing prices dataset is available through sklearn." - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.datasets import load_boston\n", - "from sklearn.model_selection import train_test_split\n", - "\n", - "X, y = load_boston(return_X_y=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We will want to rearrange this dataset into a DataFrame where the last column is the regression target." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
x0x1x2x3x4x5x6x7x8x9x10x11x12y
00.0063218.02.310.00.5386.57565.24.09001.0296.015.3396.904.9824.0
10.027310.07.070.00.4696.42178.94.96712.0242.017.8396.909.1421.6
20.027290.07.070.00.4697.18561.14.96712.0242.017.8392.834.0334.7
30.032370.02.180.00.4586.99845.86.06223.0222.018.7394.632.9433.4
40.069050.02.180.00.4587.14754.26.06223.0222.018.7396.905.3336.2
\n", - "
" - ], - "text/plain": [ - " x0 x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 \\\n", - "0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 \n", - "1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 \n", - "2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 \n", - "3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 \n", - "4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 \n", - "\n", - " x11 x12 y \n", - "0 396.90 4.98 24.0 \n", - "1 396.90 9.14 21.6 \n", - "2 392.83 4.03 34.7 \n", - "3 394.63 2.94 33.4 \n", - "4 396.90 5.33 36.2 " - ] - }, - "execution_count": 2, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import numpy as np\n", - "import pandas as pd\n", - "\n", - "dataset = np.concatenate([X, np.expand_dims(y, 1)], axis=1)\n", - "dataset = pd.DataFrame(dataset)\n", - "dataset.columns = [\"x%s\" % i for i in range(X.shape[1])] + [\"y\"]\n", - "dataset.head()" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Generating Synthetic Data\n", - "We'll use copulas to generate a synthetic copy of the data." - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [], - "source": [ - "from copulas.multivariate import GaussianMultivariate\n", - "\n", - "model = GaussianMultivariate()\n", - "model.fit(dataset)\n", - "synthetic_dataset = model.sample(len(dataset))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Creating the Metadata\n", - "This dataset only has a single table; however, we still need to create the Metadata object to let SDMetrics know about this table." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [], - "source": [ - "from sdv import Metadata\n", - "\n", - "real_tables = {\n", - " \"boston_housing_prices\": dataset\n", - "}\n", - "synthetic_tables = {\n", - " \"boston_housing_prices\": synthetic_dataset\n", - "}\n", - "\n", - "metadata = Metadata()\n", - "metadata.add_table(\"boston_housing_prices\", dataset)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Evaluating Efficacy\n", - "Let's write a custom efficacy metric which attempts to predict `y` from all other columns." - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.linear_model import LinearRegression\n", - "\n", - "from sdmetrics.efficacy import MLEfficacy\n", - "\n", - "\n", - "class MyCustomEfficacyMetric(MLEfficacy):\n", - " \n", - " name = \"housing_prices_prediction\"\n", - " \n", - " # Specify the table + target column\n", - " target_table_name = \"boston_housing_prices\"\n", - " target_column_name = \"y\"\n", - " \n", - " # Define the output of the score function\n", - " metric_unit = \"r_squared\"\n", - " metric_domain = (-np.inf, 1.0)\n", - "\n", - " def fit(self, X, y):\n", - " \"\"\"\n", - " Arguments:\n", - " X (np.ndarray): The numerical features (i.e. transformed rows).\n", - " y (np.ndarray): The binary classification target.\n", - " \"\"\"\n", - " self.model = LinearRegression()\n", - " self.model.fit(X, y)\n", - "\n", - " def score(self, X, y):\n", - " \"\"\"\n", - " Arguments:\n", - " X (np.ndarray): The numerical features (i.e. 
transformed rows).\n", - " y (np.ndarray): The binary classification target.\n", - "\n", - " Returns:\n", - " float: The value of the appropriate metric.\n", - " \"\"\"\n", - " return self.model.score(X, y)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Let's go ahead and run this." - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Metric(\n", - " name=housing_prices_prediction, \n", - " value=0.73, \n", - " tags={'column:y', 'table:boston_housing_prices', 'efficacy:ml'}, \n", - " description=Score on the real test set using the machine learning model trained on synthetic data.\n", - ")\n", - "\n", - "Metric(\n", - " name=housing_prices_prediction, \n", - " value=0.01, \n", - " tags={'column:y', 'table:boston_housing_prices', 'efficacy:ml'}, \n", - " description=Diff in score on real when trained on synthetic vs real.\n", - ")\n", - "\n" - ] - } - ], - "source": [ - "generator = MyCustomEfficacyMetric()\n", - "\n", - "for metric in generator.metrics(metadata, real_tables, synthetic_tables):\n", - " print(metric)\n", - " print()" - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.6.9" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -}
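For context on the two `Metric` objects printed above (`value=0.73` and `value=0.01`): their descriptions say they report the score of a model trained on synthetic data and evaluated on real data, and the difference against training on real data. A rough standalone sketch of that comparison, using only objects already defined in this notebook (`X`, `y`, `synthetic_dataset`) and plain scikit-learn; this illustrates the idea only and is not the actual `MLEfficacy` implementation, whose internal train/test handling is not shown here:

```python
# Illustrative sketch (not part of the original notebook). It mimics the kind
# of comparison the Metric descriptions above refer to; the real MLEfficacy
# base class may split and score the data differently.
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hold out part of the real data for scoring.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Baseline: train on real data, score on the real hold-out.
real_score = LinearRegression().fit(X_train, y_train).score(X_test, y_test)

# Train on the synthetic copy, score on the same real hold-out.
X_synth = synthetic_dataset.drop(columns=["y"]).values
y_synth = synthetic_dataset["y"].values
synthetic_score = LinearRegression().fit(X_synth, y_synth).score(X_test, y_test)

print(synthetic_score, real_score - synthetic_score)
```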