Del SIC and LinearGIC (#117)
* doc update

* doc
bbayukari authored Aug 5, 2024
1 parent 811f07c commit 116401b
Showing 6 changed files with 96 additions and 148 deletions.
58 changes: 46 additions & 12 deletions docs/source/feature/DataScienceTool.rst
@@ -80,13 +80,53 @@ Information Criterion

Information criterion is a statistical measure used to assess the goodness of fit of a model while penalizing model complexity. It helps in selecting the optimal model from a set of competing models: the smaller the information criterion, the better the model. In the context of sparsity-constrained optimization, information criteria can be used to evaluate different sparsity levels and identify the most suitable support size.
Five information criteria are implemented in ``skscope.utilities``: the Akaike information criterion (AIC, `[1]`_), the Bayesian information criterion (BIC, `[2]`_), the extended BIC (EBIC, `[3]`_), a special information criterion for linear models (LinearSIC, `[4]`_), and the generalized information criterion (GIC, `[5]`_).


.. list-table:: Information criteria implemented in the module ``skscope.utilities``.
:header-rows: 1

* - ``skscope.utilities``
- **Description**
- **Literature**
* - ``AIC``
- Akaike information criterion
- `[1]`_
* - ``BIC``
- Bayesian information criterion
- `[2]`_
* - ``EBIC``
- Extended Bayesian information criterion
- `[3]`_
* - ``LinearSIC``
- Special information criterion for linear models
- `[4]`_
* - ``GIC``
- Generalized information criterion
- `[5]`_
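
All of these criteria share the same calling convention, so they can be swapped freely through a solver's ``ic_method`` parameter (see the Usage section below). As a quick sketch:

.. code-block:: python

    # The five criteria live in ``skscope.utilities`` and are interchangeable
    # wherever a solver accepts an ``ic_method`` argument.
    from skscope.utilities import AIC, BIC, EBIC, LinearSIC, GIC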


Why ``LinearSIC`` is Necessary
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When discussing information criteria, we often involve the likelihood function of the model. For instance, the classic AIC formula is :math:`AIC = -2\log(L) + 2k`, where :math:`k` is the number of effective parameters and :math:`L` is the value of the likelihood function. In the context of maximum likelihood estimation, the objective function to be optimized is typically the negative log-likelihood, i.e., :math:`loss = -\log(L)`. This is the modeling approach we encourage, and the information criteria implemented in ``skscope``, including ``AIC``, ``BIC``, ``GIC``, and ``EBIC``, are based on this assumption.
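
As a small illustration of this convention (a sketch, not part of ``skscope``'s API), an information criterion can be computed directly from the optimized objective value:

.. code-block:: python

    # A minimal sketch: computing AIC from an optimized objective value,
    # where ``loss`` plays the role of -log(L) as described above.
    def aic(loss: float, effective_params_num: int) -> float:
        return 2 * loss + 2 * effective_params_num

    print(aic(120.5, 4))  # a model with loss 120.5 and 4 effective parameters: 249.0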

However, the most commonly used linear models in machine learning do not follow this approach; they typically use the mean squared error (MSE) as the loss function. This difference renders many of the aforementioned information criteria in ``skscope`` potentially inapplicable. To facilitate sparsity selection for users of linear models, we provide a special version of GIC for linear models, named ``LinearSIC``. The prefix "Linear" indicates that this information criterion is intended for linear models, and "SIC" is derived from the literature `[4]`_.

In summary, to achieve the same effect as using ``ic_type='gic'`` in `abess <https://abess.readthedocs.io/en/latest/Python-package/linear/Linear.html#abess.linear.LinearRegression>`_ (a short sketch follows this list):

- For linear models using MSE as the loss function, use ``LinearSIC``.
- For other models using negative log-likelihood as the loss function, use ``GIC``.
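
For concreteness, here is a minimal sketch of both cases. The data generation and the MSE loss are illustrative; the solver calls follow the usage shown elsewhere in this documentation.

.. code-block:: python

    import numpy as np
    import jax.numpy as jnp
    from skscope import ScopeSolver
    from skscope.utilities import GIC, LinearSIC

    n, p = 100, 50
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, p))
    y = X[:, :3] @ np.ones(3) + rng.normal(scale=0.1, size=n)

    # A linear model fitted with the sum of squared residuals: pair it with LinearSIC.
    def mse_loss(params):
        return jnp.sum((y - X @ params) ** 2)

    solver = ScopeSolver(p, sparsity=range(1, 10), sample_size=n, ic_method=LinearSIC)
    params = solver.solve(mse_loss, jit=True)

    # For a model whose objective is a negative log-likelihood,
    # pass ``ic_method=GIC`` instead.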


Usage
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If ``sparsity`` is a list and ``cv=None``, the solver will use an information criterion to evaluate each candidate sparsity level.
The input parameter ``ic_method`` of the solvers in ``skscope`` selects the information criterion. It should be a function that computes the criterion, with the same signature as this example (the trailing part of the signature is sketched here; see ``skscope.utilities`` for the built-in implementations):

.. code-block:: python

    def GIC(
        objective_value: float,
        dimensionality: int,
        effective_params_num: int,
        sample_size: int,  # assumed: trailing parameters follow skscope's built-in criteria
    ) -> float:
        ...
@@ -122,21 +162,13 @@ Here is an example using GIC to find the optimal support size.
Please note that the effectiveness of an information criterion heavily depends on the implementation of the objective function. Even for the same model, different objective function implementations often correspond to different IC implementations. Before usage, carefully check that the objective function and the information criterion implementation match.


- In ``skscope.utilities``, we implement a special information criterion named ``utilities.LinearSIC``. It is used to select the sparsity level in linear models and is equivalent to using ``ic_type='gic'`` in `abess <https://abess.readthedocs.io/en/latest/Python-package/linear/Linear.html#abess.linear.LinearRegression>`_.

- The difference between SIC and LinearSIC: ``utilities.SIC`` assumes that the objective function is the negative log-likelihood of a statistical model; ``utilities.LinearSIC`` assumes that the objective function is the sum of squared residuals, specifically adapted to linear models.

- GIC (generalized information criterion) refers to SIC in ``skscope.utilities``; i.e., ``utilities.GIC`` and ``utilities.SIC`` are completely identical, and ``utilities.LinearGIC`` and ``utilities.LinearSIC`` are the same.




Cross Validation
^^^^^^^^^^^^^^^^^^^^

Cross-validation is a technique used to assess the performance and generalization capability of a machine learning model. It involves partitioning the available data into multiple subsets, or folds, to train and test the model iteratively.

To utilize cross validation `[6]`_, there are some requirements:

1. The objective function must take data as input (see the sketch below).

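For requirement 1, the objective function receives the data explicitly, so that the solver can evaluate it on each training/validation fold. A minimal sketch (the logistic loss here is illustrative; consult the solver documentation for the cross-validation arguments themselves):

.. code-block:: python

    import jax.numpy as jnp

    # The data is an argument of the objective, so the solver can call the
    # objective with each fold's data.
    def logistic_loss(params, data):
        X, y = data
        xb = X @ params
        return jnp.sum(jnp.logaddexp(0.0, xb) - y * xb)
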
@@ -188,4 +220,6 @@ Reference

- _`[4]` Zhu, J., Wen, C., Zhu, J., Zhang, H., & Wang, X. (2020). A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52), 33117-33123.

- _`[5]` Zhu, J., Zhu, J., Tang, B., Chen, X., Lin, H., & Wang, X. (2023). Best-subset selection in generalized linear models: A fast and consistent algorithm via splicing technique. arXiv preprint. https://arxiv.org/abs/2308.00251

- _`[6]` Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The elements of statistical learning: Data mining, inference, and prediction (Vol. 2, pp. 1-758). New York: Springer.
@@ -2,7 +2,7 @@
"cells": [
{
"cell_type": "code",
"execution_count": null,
"execution_count": 1,
"id": "1ac67c32",
"metadata": {},
"outputs": [],
@@ -66,7 +66,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "4020f0a5",
"metadata": {},
"outputs": [],
@@ -87,7 +87,7 @@
},
{
"cell_type": "code",
"execution_count": 12,
"execution_count": 3,
"id": "dd0d0594",
"metadata": {},
"outputs": [],
@@ -125,7 +125,7 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": 4,
"id": "da82b90a",
"metadata": {},
"outputs": [],
@@ -144,7 +144,7 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 5,
"id": "29c7bf03",
"metadata": {},
"outputs": [],
@@ -159,24 +159,24 @@
"id": "415b29a7",
"metadata": {},
"source": [
"Here, we use SIC to decide the optimal support size. There are four types of information criterion can be implemented in `skscope.utilities`:\n",
"Here, we use GIC to decide the optimal support size. There are four types of information criterion can be implemented in `skscope.utilities`:\n",
"- Akaike information criterion (AIC)\n",
"- Bayesian information criterion (BIC)\n",
"- Extend BIC (EBIC)\n",
"- Special information criterion (SIC)\n",
"- Generalized information criterion (GIC)\n",
" \n",
"You can just need one line of code to call any IC, here we use SIC:"
"You can just need one line of code to call any IC, here we use GIC:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"execution_count": 6,
"id": "3f8fb0dd",
"metadata": {},
"outputs": [],
"source": [
"from skscope.utilities import SIC\n",
"solver = ScopeSolver(p, sparsity = range(10), sample_size = n, ic_method = SIC)\n",
"from skscope.utilities import GIC\n",
"solver = ScopeSolver(p, sparsity = range(10), sample_size = n, ic_method = GIC)\n",
"params = solver.solve(logistic_loss, jit=True)"
]
},
@@ -196,7 +196,7 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 7,
"id": "6b080ad7",
"metadata": {},
"outputs": [
@@ -232,7 +232,7 @@
},
{
"cell_type": "code",
"execution_count": 22,
"execution_count": 8,
"id": "95bab082",
"metadata": {},
"outputs": [
@@ -277,22 +277,20 @@
"id": "0a968a95",
"metadata": {},
"source": [
"Considering `skscope` also support cross validation (CV), we will use CV to select the optimal support set and compare its runtime with that of SIC. We first record the runtime of using SIC. "
"Considering `skscope` also support cross validation (CV), we will use CV to select the optimal support set and compare its runtime with that of GIC. We first record the runtime of using GIC. "
]
},
{
"cell_type": "code",
"execution_count": 25,
"execution_count": 9,
"id": "67358a0b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True support set: [ 90 97 340 395 477]\n",
"skscope estimated support set: [ 90 97 340 395 477]\n",
"Runtime of SIC: 0.7247357368469238 seconds\n"
"Runtime of GIC: 0.6969401836395264 seconds\n"
]
}
],
@@ -301,12 +299,12 @@
"# Record start time\n",
"start_time = time.time()\n",
"\n",
"solver_ic = ScopeSolver(p, sparsity = range(10), sample_size = n, ic_method = SIC)\n",
"solver_ic = ScopeSolver(p, sparsity = range(10), sample_size = n, ic_method = GIC)\n",
"params_ic = solver_ic.solve(logistic_loss, jit=True)\n",
"\n",
"# Calculate runtime\n",
"runtime = time.time() - start_time\n",
"print(\"Runtime of SIC:\", runtime, \"seconds\")"
"print(\"Runtime of GIC:\", runtime, \"seconds\")"
]
},
{
@@ -360,7 +358,7 @@
"id": "a8cacd2e",
"metadata": {},
"source": [
"Comparing the results of SIC and CV criteria, we find that while CV maintains high accuracy in variable selection, SIC exhibits a clear time advantage."
"Comparing the results of GIC and CV criteria, we find that while CV maintains high accuracy in variable selection, GIC exhibits a clear time advantage."
]
},
{
@@ -152,7 +152,7 @@
},
"source": [
"We use `skscope` to solve the sparse possion regression problem.\n",
"After defining the data generation and loss function, we can call `ScopeSolver` to solve the sparse-constrained optimization problem. We will use SIC to decide the optimal support size."
"After defining the data generation and loss function, we can call `ScopeSolver` to solve the sparse-constrained optimization problem. We will use GIC to decide the optimal support size."
]
},
{
@@ -164,9 +164,9 @@
},
"outputs": [],
"source": [
"from skscope.utilities import SIC\n",
"from skscope.utilities import GIC\n",
"\n",
"solver = ScopeSolver(p, sparsity = range(1,10), sample_size = n, ic_method = SIC)\n",
"solver = ScopeSolver(p, sparsity = range(1,10), sample_size = n, ic_method = GIC)\n",
"params = solver.solve(poisson_loss, jit=True)"
]
},
@@ -199,21 +199,21 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Special Information Criterion (SIC)\n",
"### Generalized Information Criterion (GIC)\n",
"\n",
"The form of the Special Information Criterion (SIC) is given by:\n",
"The form of the Generalized Information Criterion (GIC) is given by:\n",
"\n",
"$$ \\text{SIC} = -2\\log(L) + k\\log(p)\\log\\log(n) ,$$\n",
"$$ \\text{GIC} = -2\\log(L) + k\\log(p)\\log\\log(n) ,$$\n",
"\n",
"where $L$ is the maximized likelihood, and $k$ is the number of parameters. It can be observed that the penalty term of SIC is dependent on the dimension $p$, which allows SIC to still select relatively sparse solutions even when $p$ is large.\n",
"where $L$ is the maximized likelihood, and $k$ is the number of parameters. It can be observed that the penalty term of GIC is dependent on the dimension $p$, which allows GIC to still select relatively sparse solutions even when $p$ is large.\n",
"\n",
"The modified extended Bayesian information criterion (MEBIC), proposed by in [[5]](#refer-anchor-5), is equivalent to SIC when sample size is sufficiently large.\n",
"The modified extended Bayesian information criterion (MEBIC), proposed by in [[5]](#refer-anchor-5), is equivalent to GIC when sample size is sufficiently large.\n",
"\n",
"SIC can be utilized in the following scenarios:\n",
"GIC can be utilized in the following scenarios:\n",
"\n",
"- **Linear regression**: \n",
"\n",
" `skscope` is a method based on the splicing algorithm. SIC used in the splicing algorithm for linear regression is consistent under certain conditions [[6]](#refer-anchor-6). These conditions include constraints on the sample size $n$, dimension $p$, and the true sparsity level $s^*$:\n",
" `skscope` is a method based on the splicing algorithm. GIC used in the splicing algorithm for linear regression is consistent under certain conditions [[6]](#refer-anchor-6). These conditions include constraints on the sample size $n$, dimension $p$, and the true sparsity level $s^*$:\n",
" $$\\frac{s^*\\log(p)\\log\\log(n)}{n}=o(1).$$\n",
"\n",
"- **Single index models**:\n",
Expand All @@ -228,10 +228,10 @@
"\n",
" where $f(\\cdot)$ is a link function and $\\varepsilon$ is the error term.\n",
"\n",
" SIC used in the splicing algorithm for single index models is consistent under certain conditions [[7]](#refer-anchor-7). These conditions also include:\n",
" GIC used in the splicing algorithm for single index models is consistent under certain conditions [[7]](#refer-anchor-7). These conditions also include:\n",
" $$\\frac{s^*\\log(p)\\log\\log(n)}{n}=o(1).$$\n",
"\n",
"Below, we utilize SIC as the criterion for model selection within the `skscope`. "
"Below, we utilize GIC as the criterion for model selection within the `skscope`. "
]
},
{
@@ -607,7 +607,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"In the above experiment, it can be observed that in high dimensions, SIC and EBIC can obtain the true active set, while AIC, BIC and cross-validation may select more variables."
"In the above experiment, it can be observed that in high dimensions, GIC and EBIC can obtain the true active set, while AIC, BIC and cross-validation may select more variables."
]
},
{
@@ -674,7 +674,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.4"
"version": "3.10.13"
}
},
"nbformat": 4,
6 changes: 6 additions & 0 deletions docs/source/userguide/quickstart_practitioner.rst
@@ -19,14 +19,20 @@ Specifically, the submodule ``skscope.skmodel`` in :ref:`skscope <skscope_packag

* - **skmodel**
- **Description**
- **Document**
* - PortfolioSelection
- Construct sparse Markowitz portfolio
- `Portfolio selection <../gallery/Miscellaneous/portfolio-selection.html>`_
* - NonlinearSelection
- Select relevant features with nonlinear effects
- `Non-linear feature selection via HSIC-SCOPE <../gallery/Miscellaneous/hsic-splicing.html>`_
* - RobustRegression
- A robust regression dealing with outliers
- `Robust regression <../gallery/LinearModelAndVariants/robust-regression.html>`_
* - MultivariateFailure
- Multivariate failure time model in survival analysis
- `Multivariate failure time model <../gallery/SurvivalModels/multivariate-failure-time-model.html>`_
* - IsotonicRegression
- Fit the data with a non-decreasing curve
- `Isotonic Regression <../gallery/LinearModelAndVariants/isotonic-regression.html>`_
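
Each ``skmodel`` estimator follows the scikit-learn interface. A hedged sketch of the first entry (the constructor argument shown is an illustrative assumption, not the definitive signature; check the linked document before use):

.. code-block:: python

    import numpy as np
    from skscope.skmodel import PortfolioSelection

    # Hypothetical usage: build a sparse Markowitz portfolio from a matrix of
    # historical returns (rows: trading days, columns: candidate assets).
    returns = np.random.default_rng(0).normal(scale=0.01, size=(252, 100))

    model = PortfolioSelection(sparsity=10)  # ``sparsity`` is an assumed parameter name
    model.fit(returns)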
