From d2c5f54ff5c4d4f29b278c033cee18e67f84b42b Mon Sep 17 00:00:00 2001 From: Yu Wu Date: Wed, 10 Jul 2024 17:50:14 +0800 Subject: [PATCH] update doc Signed-off-by: Yu Wu --- doc/2.0/fate/components/README.md | 25 +++--- doc/2.0/fate/components/README.zh.md | 81 ++++++++++--------- .../fate/components/feature_correlation.md | 34 ++++++++ 3 files changed, 88 insertions(+), 52 deletions(-) create mode 100644 doc/2.0/fate/components/feature_correlation.md diff --git a/doc/2.0/fate/components/README.md b/doc/2.0/fate/components/README.md index 72ffe0b61d..777fada88c 100644 --- a/doc/2.0/fate/components/README.md +++ b/doc/2.0/fate/components/README.md @@ -20,23 +20,24 @@ provide: For tutorial on running modules directly(without FATE-Client) with launcher, please refer [here](../ml/run_launchers.md). -| Algorithm | Module Name | Examples | Description | Data Input | Data Output | Model Input | Model Output | -|--------------------------------------------------|------------------------|------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-----------------------------------------------------------|-------------------------------|--------------| -| [Reader](readme.md) | | | Component to passing namespace,name to downstream tasks | | output_data | | | -| [PSI](psi.md) | PSI | [psi](../../../../examples/pipeline/psi) | Compute intersect data set of multiple parties without leakage of difference set information. Mainly used in hetero scenario task. | input_data | output_data | | | -| [Sampling](sample.md) | Sample | [sample](../../../../examples/pipeline/sample) | Federated Sampling data so that its distribution become balance in each party.This module supports local and federation scenario. 
| input_data | output_data | | | -| [Data Split](data_split.md) | DataSplit | [data split](../../../../examples/pipeline/data_split) | Split one data table into 3 tables by given ratio or count, this module supports local and federation scenario | input_data | train_output_data, validate_output_data, test_output_data | | | -| [Feature Scale](feature_scale.md) | FeatureScale | [feature scale](../../../../examples/pipeline/feature_scale) | module for feature scaling and standardization. | train_data, test_data | train_output_data, test_output_data | input_model | output_model | -| [Data Statistics](statistics.md) | Statistics | [statistics](../../../../examples/pipeline/statistics) | This component will do some statistical work on the data, including statistical mean, maximum and minimum, median, etc. | input_data | | | output_model | -| [Hetero Feature Binning](feature_binning.md) | HeteroFeatureBinning | [hetero feature binning](../../../../examples/pipeline/hetero_feature_binning) | With binning input data, calculates each column's iv and woe and transform data according to the binned information. | train_data, test_data | train_output_data, test_output_data | input_model | output_model | -| [Hetero Feature Selection](feature_selection.md) | HeteroFeatureSelection | [hetero feature selection](../../../../examples/pipeline/hetero_feature_selection) | Provide 3 types of filters. 
Each filters can select columns according to user config | train_data, test_data | train_output_data, test_output_data | input_models, input_model | output_model | +| Algorithm | Module Name | Examples | Description | Data Input | Data Output | Model Input | Model Output | +|--------------------------------------------------|------------------------|------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-----------------------------------------------------------|-----------------------------|--------------| +| [Reader](readme.md) | | | Component to passing namespace,name to downstream tasks | | output_data | | | +| [PSI](psi.md) | PSI | [psi](../../../../examples/pipeline/psi) | Compute intersect data set of multiple parties without leakage of difference set information. Mainly used in hetero scenario task. | input_data | output_data | | | +| [Sampling](sample.md) | Sample | [sample](../../../../examples/pipeline/sample) | Federated Sampling data so that its distribution become balance in each party.This module supports local and federation scenario. | input_data | output_data | | | +| [Data Split](data_split.md) | DataSplit | [data split](../../../../examples/pipeline/data_split) | Split one data table into 3 tables by given ratio or count, this module supports local and federation scenario | input_data | train_output_data, validate_output_data, test_output_data | | | +| [Feature Scale](feature_scale.md) | FeatureScale | [feature scale](../../../../examples/pipeline/feature_scale) | module for feature scaling and standardization. 
| train_data, test_data | train_output_data, test_output_data | input_model | output_model | +| [Data Statistics](statistics.md) | Statistics | [statistics](../../../../examples/pipeline/statistics) | This component will do some statistical work on the data, including statistical mean, maximum and minimum, median, etc. | input_data | | | output_model | +| [Hetero Feature Binning](feature_binning.md) | HeteroFeatureBinning | [hetero feature binning](../../../../examples/pipeline/hetero_feature_binning) | With binning input data, calculates each column's iv and woe and transform data according to the binned information. | train_data, test_data | train_output_data, test_output_data | input_model | output_model | +| [Hetero Feature Selection](feature_selection.md) | HeteroFeatureSelection | [hetero feature selection](../../../../examples/pipeline/hetero_feature_selection) | Provide 3 types of filters. Each filters can select columns according to user config | train_data, test_data | train_output_data, test_output_data | input_models, input_model | output_model | | [Coordinated-LR](logistic_regression.md) | CoordinatedLR | [coordinated LR](../../../../examples/pipeline/coordinated_lr) | Build hetero logistic regression model through multiple parties. | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | | [Coordinated-LinR](linear_regression.md) | CoordinatedLinR | [coordinated LinR](../../../../examples/pipeline/coordinated_linr) | Build hetero linear regression model through multiple parties. | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | | [Homo-LR](logistic_regression.md) | HomoLR | [homo lr](../../../../examples/pipeline/homo_lr) | Build homo logistic regression model through multiple parties. 
| train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | | [Homo-NN](homo_nn.md) | HomoNN | [homo nn](../../../../examples/pipeline/homo_nn) | Build homo neural network model through multiple parties. | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | | [Hetero-NN](hetero_nn.md) | HeteroNN | [hetero nn](../../../../examples/pipeline/hetero_nn) | Build hetero neural network model through multiple parties. | train_data, validate_data, test_data | train_output_data, test_output_data | warm_start_model, input_model | output_model | | [Hetero Secure Boosting](hetero_secureboost.md) | HeteroSecureBoost | [hetero secureboost](../../../../examples/pipeline/hetero_secureboost) | Build hetero secure boosting model through multiple parties | train_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | warm_start_model, input_model | output_model | -| [Evaluation](evaluation.md) | Evaluation | [evaluation](../../../../examples/pipeline/hetero_secureboost) | Output the model evaluation metrics for user. | input_datas | | | | -| [Union](union.md) | Union | [union](../../../../examples/pipeline/union) | Combine multiple data tables into one. | input_datas | output_data | | | +| [Evaluation](evaluation.md) | Evaluation | [evaluation](../../../../examples/pipeline/hetero_secureboost) | Output the model evaluation metrics for user. | input_datas | | | | +| [Union](union.md) | Union | [union](../../../../examples/pipeline/union) | Combine multiple data tables into one. | input_datas | output_data | | | | [SSHE-LR](logistic_regression.md) | SSHELR | [SSHE LR](../../../../examples/pipeline/sshe_lr) | Build hetero logistic regression model through two parties. 
| train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | | [SSHE-LinR](linear_regression.md) | SSHELinR | [SSHE LinR](../../../../examples/pipeline/sshe_linr) | Build hetero linear regression model through two parties. | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Feature Correlation](feature_correlation.md) | FeatureCorrelation | [Feature Correlation](../../../../examples/pipeline/feature_correlation) | Compute feature correlation locally or in hetero-federated setting. | input_data | | input_model | output_model | diff --git a/doc/2.0/fate/components/README.zh.md b/doc/2.0/fate/components/README.zh.md index 9dfda97458..bbc7758677 100644 --- a/doc/2.0/fate/components/README.zh.md +++ b/doc/2.0/fate/components/README.zh.md @@ -12,43 +12,44 @@ Federatedml模块包括许多常见机器学习算法联邦化实现。所有模 如需不通过FATE-Client直接调用算法模块,请查看此[教程](../ml/run_launchers.md). 
-| 算法 | 模块名 | 描述 | 数据输入 | 数据输出 | 模型输入 | 模型输出 | -|--------------------------------------------------|------------------------|--------------------------------------------|-----------------------------------------------|-----------------------------------------------------------|----------------------------------------|--------------------| -| [Reader](reader.md) | | 传递用户指定输入数据表给下游组件 | | output_data | | | -| [PSI](psi.md) | PSI | 计算两方的相交数据集,而不会泄漏任何差异数据集的信息。主要用于纵向任务 | input_data | output_data | | | -| [Sampling](sample.md) | Sample | 对数据进行联邦采样,使得数据分布在各方之间变得平衡。这一模块同时支持本地和联邦场景。 | input_data | output_data | | | -| [Data Split](data_split.md) | DataSplit | 将数据集切分成训练、验证、测试集。 | input_data | train_output_data, validate_output_data, test_output_data | | | -| [Feature Scale](feature_scale.md) | FeatureScale | 特征归一化和标准化。 | train_data, test_data | train_output_data, test_output_data | input_model | output_model | -| [Data Statistics](statistics.md) | Statistics | 计算各类统计指标。 | input_data | | | output_model | -| [Hetero Feature Binning](feature_binning.md) | HeteroFeatureBinning | 使用分箱的输入数据,计算每个列的iv和woe,并根据合并后的信息转换数据。 | train_data, test_data | train_output_data, test_output_data | input_model | output_model | -| [Hetero Feature Selection](feature_selection.md) | HeteroFeatureSelection | 提供多种类型的filter。每个filter都可以根据用户配置选择列。 | train_data, test_data | train_output_data, test_output_data | input_models, input_model | output_model | -| [Coordinated-LR](logistic_regression.md) | CoordinatedLR | 通过多方构建纵向逻辑回归模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Coordinated-LinR](linear_regression.md) | CoordinatedLinR | 通过多方建立纵向线性回归模块 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Homo-LR](logistic_regression.md) | HomoLR | 通过多方构建横向逻辑回归模块。 | train_data, validate_data, test_data, 
cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Homo-NN](homo_nn.md) | HomoNN | 通过多方构建横向神经网络模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Hetero-NN](hetero_nn.md) | HeteroNN | 通过多方构建纵向联邦神经网络模型。 | train_data, validate_data, test_data | train_data_output, predict_data_output | train_model_input, predict_model_input | train_model_output | -| [Hetero Secure Boosting](hetero_secureboost.md) | HeteroSecureBoost | 通过多方构建纵向联邦梯度提升树模型。 | train_data, test_data, cv_data | train_data_output, test_data_output, cv_output_datas | train_model_input, predict_model_input | train_model_output | -| [Evaluation](evaluation.md) | Evaluation | 评估二分类、多分类、回归等指标。 | input_data | | | | -| [Union](union.md) | Union | 将多个数据表合并成一个。 | input_data | output_data | | | -| [SSHE-LR](logistic_regression.md) | SSHELR | 通过两方构建纵向逻辑回归模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [SSHE-LinR](linear_regression.md) | SSHELinR | 通过两方构建纵向线性回归模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| 算法 | 模块名 | 描述 | 样例 | 数据输入 | 数据输出 | 模型输入 | 模型输出 | -|--------------------------------------------------|------------------------|--------------------------------------------|------------------------------------------------------------------------------------|-----------------------------------------------|-----------------------------------------------------------|----------------------------------------|--------------------| -| [Reader](reader.md) | | 传递用户指定输入数据表给下游组件 | | | output_data | | | -| [PSI](psi.md) | PSI | 计算两方的相交数据集,而不会泄漏任何差异数据集的信息。主要用于纵向任务 | [psi](../../../../examples/pipeline/psi) | input_data | output_data | | | -| 
[Sampling](sample.md) | Sample | 对数据进行联邦采样,使得数据分布在各方之间变得平衡。这一模块同时支持本地和联邦场景。 | [sample](../../../../examples/pipeline/sample) | input_data | output_data | | | -| [Data Split](data_split.md) | DataSplit | 将数据集切分成训练、验证、测试集。 | [data split](../../../../examples/pipeline/data_split) | input_data | train_output_data, validate_output_data, test_output_data | | | -| [Feature Scale](feature_scale.md) | FeatureScale | 特征归一化和标准化。 | [feature scale](../../../../examples/pipeline/feature_scale) | train_data, test_data | train_output_data, test_output_data | input_model | output_model | -| [Data Statistics](statistics.md) | Statistics | 计算各类统计指标。 | [statistics](../../../../examples/pipeline/statistics) | input_data | | | output_model | -| [Hetero Feature Binning](feature_binning.md) | HeteroFeatureBinning | 使用分箱的输入数据,计算每个列的iv和woe,并根据合并后的信息转换数据。 | [hetero feature binning](../../../../examples/pipeline/hetero_feature_binning) | train_data, test_data | train_output_data, test_output_data | input_model | output_model | -| [Hetero Feature Selection](feature_selection.md) | HeteroFeatureSelection | 提供多种类型的filter。每个filter都可以根据用户配置选择列。 | [hetero feature selection](../../../../examples/pipeline/hetero_feature_selection) | train_data, test_data | train_output_data, test_output_data | input_models, input_model | output_model | -| [Coordinated-LR](logistic_regression.md) | CoordinatedLR | 通过多方构建纵向逻辑回归模块。 | [coordinated LR](../../../../examples/pipeline/coordinated_lr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Coordinated-LinR](linear_regression.md) | CoordinatedLinR | 通过多方建立纵向线性回归模块 | [coordinated LinR](../../../../examples/pipeline/coordinated_linr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Homo-LR](logistic_regression.md) | HomoLR | 通过多方构建横向逻辑回归模块。 | [homo 
lr](../../../../examples/pipeline/homo_lr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Homo-NN](homo_nn.md) | HomoNN | 通过多方构建横向神经网络模块。 | [homo nn](../../../../examples/pipeline/homo_nn) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [Hetero-NN](hetero_nn.md) | HeteroNN | 通过多方构建纵向联邦神经网络模型。 | [hetero nn](../../../../examples/pipeline/hetero_nn) | train_data, validate_data, test_data | train_data_output, predict_data_output | train_model_input, predict_model_input | train_model_output | -| [Hetero Secure Boosting](hetero_secureboost.md) | HeteroSecureBoost | 通过多方构建纵向联邦梯度提升树模型。 | [hetero secureboost](../../../../examples/pipeline/hetero_secureboost) | train_data, test_data, cv_data | train_data_output, test_data_output, cv_output_datas | train_model_input, predict_model_input | train_model_output | -| [Evaluation](evaluation.md) | Evaluation | 评估二分类、多分类、回归等指标。 | [evaluation](../../../../examples/pipeline/hetero_secureboost) | input_data | | | | -| [Union](union.md) | Union | 将多个数据表合并成一个。 | [union](../../../../examples/pipeline/union) | input_data_list | output_data | | | -| [SSHE-LR](logistic_regression.md) | SSHELR | 通过两方构建纵向逻辑回归模块。 | [SSHE LR](../../../../examples/pipeline/sshe_lr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | -| [SSHE-LinR](linear_regression.md) | SSHELinR | 通过两方构建纵向线性回归模块。 | [SSHE LinR](../../../../examples/pipeline/sshe_linr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| 算法 | 模块名 | 描述 | 数据输入 | 数据输出 | 模型输入 | 模型输出 | 
+|--------------------------------------------------|------------------------|----------------------------------------------|-----------------------------------------------|-----------------------------------------------------------|----------------------------------------|--------------------| +| [Reader](reader.md) | | 传递用户指定输入数据表给下游组件 | | output_data | | | +| [PSI](psi.md) | PSI | 计算两方的相交数据集,而不会泄漏任何差异数据集的信息。主要用于纵向任务 | input_data | output_data | | | +| [Sampling](sample.md) | Sample | 对数据进行联邦采样,使得数据分布在各方之间变得平衡。这一模块同时支持本地和联邦场景。 | input_data | output_data | | | +| [Data Split](data_split.md) | DataSplit | 将数据集切分成训练、验证、测试集。 | input_data | train_output_data, validate_output_data, test_output_data | | | +| [Feature Scale](feature_scale.md) | FeatureScale | 特征归一化和标准化。 | train_data, test_data | train_output_data, test_output_data | input_model | output_model | +| [Data Statistics](statistics.md) | Statistics | 计算各类统计指标。 | input_data | | | output_model | +| [Hetero Feature Binning](feature_binning.md) | HeteroFeatureBinning | 使用分箱的输入数据,计算每个列的iv和woe,并根据合并后的信息转换数据。 | train_data, test_data | train_output_data, test_output_data | input_model | output_model | +| [Hetero Feature Selection](feature_selection.md) | HeteroFeatureSelection | 提供多种类型的filter。每个filter都可以根据用户配置选择列。 | train_data, test_data | train_output_data, test_output_data | input_models, input_model | output_model | +| [Coordinated-LR](logistic_regression.md) | CoordinatedLR | 通过多方构建纵向逻辑回归模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Coordinated-LinR](linear_regression.md) | CoordinatedLinR | 通过多方建立纵向线性回归模块 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Homo-LR](logistic_regression.md) | HomoLR | 通过多方构建横向逻辑回归模块。 | train_data, validate_data, test_data, cv_data | train_output_data, 
test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Homo-NN](homo_nn.md) | HomoNN | 通过多方构建横向神经网络模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Hetero-NN](hetero_nn.md) | HeteroNN | 通过多方构建纵向联邦神经网络模型。 | train_data, validate_data, test_data | train_data_output, predict_data_output | train_model_input, predict_model_input | train_model_output | +| [Hetero Secure Boosting](hetero_secureboost.md) | HeteroSecureBoost | 通过多方构建纵向联邦梯度提升树模型。 | train_data, test_data, cv_data | train_data_output, test_data_output, cv_output_datas | train_model_input, predict_model_input | train_model_output | +| [Evaluation](evaluation.md) | Evaluation | 评估二分类、多分类、回归等指标。 | input_data | | | | +| [Union](union.md) | Union | 将多个数据表合并成一个。 | input_data | output_data | | | +| [SSHE-LR](logistic_regression.md) | SSHELR | 通过两方构建纵向逻辑回归模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [SSHE-LinR](linear_regression.md) | SSHELinR | 通过两方构建纵向线性回归模块。 | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| 算法 | 模块名 | 描述 | 样例 | 数据输入 | 数据输出 | 模型输入 | 模型输出 | +|--------------------------------------------------|------------------------| -------------------------------------------- |------------------------------------------------------------------------------------|-----------------------------------------------|-----------------------------------------------------------|----------------------------------------|--------------------| +| [Reader](reader.md) | | 传递用户指定输入数据表给下游组件 | | | output_data | | | +| [PSI](psi.md) | PSI | 计算两方的相交数据集,而不会泄漏任何差异数据集的信息。主要用于纵向任务 | [psi](../../../../examples/pipeline/psi) | input_data | output_data | | | +| [Sampling](sample.md) | 
Sample | 对数据进行联邦采样,使得数据分布在各方之间变得平衡。这一模块同时支持本地和联邦场景。 | [sample](../../../../examples/pipeline/sample) | input_data | output_data | | | +| [Data Split](data_split.md) | DataSplit | 将数据集切分成训练、验证、测试集。 | [data split](../../../../examples/pipeline/data_split) | input_data | train_output_data, validate_output_data, test_output_data | | | +| [Feature Scale](feature_scale.md) | FeatureScale | 特征归一化和标准化。 | [feature scale](../../../../examples/pipeline/feature_scale) | train_data, test_data | train_output_data, test_output_data | input_model | output_model | +| [Data Statistics](statistics.md) | Statistics | 计算各类统计指标。 | [statistics](../../../../examples/pipeline/statistics) | input_data | | | output_model | +| [Hetero Feature Binning](feature_binning.md) | HeteroFeatureBinning | 使用分箱的输入数据,计算每个列的iv和woe,并根据合并后的信息转换数据。 | [hetero feature binning](../../../../examples/pipeline/hetero_feature_binning) | train_data, test_data | train_output_data, test_output_data | input_model | output_model | +| [Hetero Feature Selection](feature_selection.md) | HeteroFeatureSelection | 提供多种类型的filter。每个filter都可以根据用户配置选择列。 | [hetero feature selection](../../../../examples/pipeline/hetero_feature_selection) | train_data, test_data | train_output_data, test_output_data | input_models, input_model | output_model | +| [Coordinated-LR](logistic_regression.md) | CoordinatedLR | 通过多方构建纵向逻辑回归模块。 | [coordinated LR](../../../../examples/pipeline/coordinated_lr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Coordinated-LinR](linear_regression.md) | CoordinatedLinR | 通过多方建立纵向线性回归模块 | [coordinated LinR](../../../../examples/pipeline/coordinated_linr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Homo-LR](logistic_regression.md) | HomoLR | 通过多方构建横向逻辑回归模块。 | [homo 
lr](../../../../examples/pipeline/homo_lr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Homo-NN](homo_nn.md) | HomoNN | 通过多方构建横向神经网络模块。 | [homo nn](../../../../examples/pipeline/homo_nn) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Hetero-NN](hetero_nn.md) | HeteroNN | 通过多方构建纵向联邦神经网络模型。 | [hetero nn](../../../../examples/pipeline/hetero_nn) | train_data, validate_data, test_data | train_data_output, predict_data_output | train_model_input, predict_model_input | train_model_output | +| [Hetero Secure Boosting](hetero_secureboost.md) | HeteroSecureBoost | 通过多方构建纵向联邦梯度提升树模型。 | [hetero secureboost](../../../../examples/pipeline/hetero_secureboost) | train_data, test_data, cv_data | train_data_output, test_data_output, cv_output_datas | train_model_input, predict_model_input | train_model_output | +| [Evaluation](evaluation.md) | Evaluation | 评估二分类、多分类、回归等指标。 | [evaluation](../../../../examples/pipeline/hetero_secureboost) | input_data | | | | +| [Union](union.md) | Union | 将多个数据表合并成一个。 | [union](../../../../examples/pipeline/union) | input_data_list | output_data | | | +| [SSHE-LR](logistic_regression.md) | SSHELR | 通过两方构建纵向逻辑回归模块。 | [SSHE LR](../../../../examples/pipeline/sshe_lr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [SSHE-LinR](linear_regression.md) | SSHELinR | 通过两方构建纵向线性回归模块。 | [SSHE LinR](../../../../examples/pipeline/sshe_linr) | train_data, validate_data, test_data, cv_data | train_output_data, test_output_data, cv_output_datas | input_model, warm_start_model | output_model | +| [Feature Correlation](feature_correlation.md) | FeatureCorrelation | 计算本地或纵向联邦下的相关性系数。 | [Feature 
Correlation](../../../../examples/pipeline/feature_correlation) | input_data | | input_model | output_model |
diff --git a/doc/2.0/fate/components/feature_correlation.md b/doc/2.0/fate/components/feature_correlation.md
new file mode 100644
index 0000000000..9badd02914
--- /dev/null
+++ b/doc/2.0/fate/components/feature_correlation.md
@@ -0,0 +1,34 @@
+# Feature Correlation
+
+Feature correlation computes the Pearson correlation matrix in local and hetero-federated scenarios.
+To switch between the two modes, set `local_only` to `True` or `False` accordingly.
+
+The Pearson correlation coefficient measures the linear correlation between two variables, $X$ and $Y$, and is defined as
+
+$$\rho_{X,Y} = \frac{cov(X, Y)}{\sigma_X\sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X\sigma_Y} = E\left[\left(\frac{X-\mu_X}{\sigma_X}\cdot\frac{Y-\mu_Y}{\sigma_Y}\right)\right]$$
+
+Let
+
+$$\tilde{X} = \frac{X-\mu_X}{\sigma_X}, \tilde{Y}=\frac{Y-\mu_Y}{\sigma_Y}$$
+
+then
+
+$$\rho_{X, Y} = E[\tilde{X}\tilde{Y}]$$
+
+## Implementation Detail
+
+We use an MPC protocol called SPDZ for the heterogeneous Pearson correlation
+coefficient calculation. SPDZ ([Ivan Damgård](https://eprint.iacr.org/2011/535.pdf),
+[Marcel Keller](https://eprint.iacr.org/2017/1230.pdf)) is a
+multiparty computation scheme based on somewhat homomorphic encryption
+(SHE).
+
+
+## Features
+
+- local Pearson correlation coefficient computation
+- hetero-federated Pearson correlation coefficient computation
+- local VIF computation
+- computation on selected columns only (use `skip_col`)
+
+
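Note on the new `feature_correlation.md`: the identity $\rho_{X,Y} = E[\tilde{X}\tilde{Y}]$ stated there can be checked numerically in the local (single-party) case. The sketch below is plain Python and is not the FATE implementation; it does not involve SPDZ or any federation, and the function names `standardize`, `pearson`, and `correlation_matrix` are hypothetical helpers chosen for illustration. It assumes the population standard deviation (division by $n$), which is what makes the mean of the standardized products equal the coefficient.

```python
import math


def standardize(values):
    """Return (v - mean) / std for each value, using the population std (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        # A constant column has zero variance, so the coefficient is undefined.
        raise ValueError("cannot standardize a constant column")
    return [(v - mean) / std for v in values]


def pearson(x, y):
    """Pearson coefficient computed as the mean of products of standardized values."""
    if len(x) != len(y):
        raise ValueError("columns must have the same length")
    x_tilde, y_tilde = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(x_tilde, y_tilde)) / len(x_tilde)


def correlation_matrix(columns):
    """Build a nested dict {name_a: {name_b: rho}} from a dict of equal-length columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Illustrative data only: x1 is a positive multiple of x0, x2 decreases as x0 grows.
    data = {
        "x0": [1.0, 2.0, 3.0, 4.0],
        "x1": [2.0, 4.0, 6.0, 8.0],
        "x2": [4.0, 3.0, 2.0, 1.0],
    }
    for name, row in correlation_matrix(data).items():
        print(name, {other: round(rho, 6) for other, rho in row.items()})
```

Standardizing each column once and then averaging products mirrors the derivation in the document. How the hetero-federated mode shares or protects the standardized columns is not shown here and should be taken from the component's source.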