ENH: Adds selective hyperparameter optimization #58

Open

wants to merge 2 commits into main

Changes from all commits
112 changes: 106 additions & 6 deletions docs/available_models.md
@@ -4,15 +4,115 @@ This document contains a list of all the models available in the _ageml_ package

## Model List

- - [Linear Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) (`linear_reg` in ageml)
+ - [Linear Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html) (`linear_reg` in `ageml`)
    - __Hyperparameters__: None</br></br>
- - [Ridge Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html) (`ridge` in ageml)
+ - [Ridge Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html) (`ridge` in `ageml`)
    - __Hyperparameters__: `alpha`</br></br>
- - [Lasso Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) (`lasso` in ageml)
+ - [Lasso Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html) (`lasso` in `ageml`)
    - __Hyperparameters__: `alpha`</br></br>
- - [XGBoost Regression](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn) (`xgboost` in ageml)
+ - [XGBoost Regression](https://xgboost.readthedocs.io/en/latest/python/python_api.html#module-xgboost.sklearn) (`xgboost` in `ageml`)
    - __Hyperparameters__: `eta`, `gamma`, `max_depth`, `min_child_weight`, `max_delta_step`, `subsample`, `colsample_bytree`, `colsample_bylevel`, `colsample_bynode`, `lambda`, `alpha`</br></br>
- - [Epsilon-Support Vector Regression](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR) (`linear_svr` in ageml)
+ - [Epsilon-Support Vector Regression](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR) (`linear_svr` in `ageml`)
    - __Hyperparameters__: `C`, `epsilon`</br></br>
- - [Random Forest Regression](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) (`rf` in ageml)
+ - [Random Forest Regression](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html) (`rf` in `ageml`)
    - __Hyperparameters__: `n_estimators`, `max_depth`, `min_samples_split`, `min_samples_leaf`, `max_features`, `min_impurity_decrease`, `max_leaf_nodes`, `min_weight_fraction_leaf`


### Model Hyperparameters

When specifying model hyperparameter ranges, be aware that some hyperparameters are sampled on a logarithmic scale, in line with the `scikit-learn` API. See the table below for a reference of the hyperparameters that have this constraint explicitly implemented in `ageml` (or, alternatively, import and print the `AgeML.model_hyperparameter_types` dictionary); a sketch of how these types can drive grid construction follows the table.
<table>
<tr>
<th>Model</th>
<th>Parameter</th>
<th>Type</th>
</tr>
<tr>
<td rowspan="1">Lasso Regression</td>
<td>alpha</td>
<td>log</td>
</tr>
<tr>
<td rowspan="2">Epsilon-Support Vector Regression</td>
<td>C</td>
<td>log</td>
</tr>
<tr>
<td>epsilon</td>
<td>log</td>
</tr>
<tr>
<td rowspan="11">XGBoost Regression</td>
<td>eta</td>
<td>float</td>
</tr>
<tr>
<td>gamma</td>
<td>float</td>
</tr>
<tr>
<td>max_depth</td>
<td>int</td>
</tr>
<tr>
<td>min_child_weight</td>
<td>int</td>
</tr>
<tr>
<td>max_delta_step</td>
<td>int</td>
</tr>
<tr>
<td>subsample</td>
<td>float</td>
</tr>
<tr>
<td>colsample_bytree</td>
<td>float</td>
</tr>
<tr>
<td>colsample_bylevel</td>
<td>float</td>
</tr>
<tr>
<td>colsample_bynode</td>
<td>float</td>
</tr>
<tr>
<td>lambda</td>
<td>log</td>
</tr>
<tr>
<td>alpha</td>
<td>log</td>
</tr>
<tr>
<td rowspan="9">Random Forest Regression</td>
<td>n_estimators</td>
<td>int</td>
</tr>
<tr>
<td>max_depth</td>
<td>int</td>
</tr>
<tr>
<td>min_samples_split</td>
<td>int</td>
</tr>
<tr>
<td>min_samples_leaf</td>
<td>int</td>
</tr>
<tr>
<td>max_features</td>
<td>int</td>
</tr>
<tr>
<td>min_impurity_decrease</td>
<td>log</td>
</tr>
<tr>
<td>max_leaf_nodes</td>
<td>int</td>
</tr>
</table>
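
To make the `log`/`int`/`float` types concrete, here is a minimal, hedged sketch of how such types could drive grid construction. This is not the `ageml` implementation; `param_types` and `make_grid` are hypothetical illustrations of the idea.

```python
# Hypothetical sketch: build a per-parameter value grid that honors the
# types listed in the table above. Not the actual ageml implementation.
import numpy as np

# Assumed shape, loosely mirroring AgeML.model_hyperparameter_types.
param_types = {"alpha": "log", "max_depth": "int", "subsample": "float"}

def make_grid(low, high, n_points, kind):
    """Return n_points values in [low, high], spaced according to `kind`."""
    if kind == "log":
        # Even spacing in log10 space, e.g. 1e-3, 1e-2, ..., 1e1.
        return np.logspace(np.log10(low), np.log10(high), n_points)
    if kind == "int":
        # Even spacing, rounded to unique integers.
        return np.unique(np.linspace(low, high, n_points).round().astype(int))
    return np.linspace(low, high, n_points)  # plain float spacing

print(make_grid(1e-3, 10, 5, "log"))  # [1.e-03 1.e-02 1.e-01 1.e+00 1.e+01]
print(make_grid(2, 10, 5, "int"))     # [ 2  4  6  8 10]
```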
74 changes: 49 additions & 25 deletions src/ageml/commands.py
@@ -100,8 +100,8 @@ def configure_parser(self):
        self.parser.add_argument(
            "-ht",
            "--hyperparameter_tuning",
-           nargs=1,
-           default=["0"],
+           nargs="+",
+           default=["2"],
            help=messages.hyperparameter_grid_description,
        )
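
As a quick, self-contained illustration (plain argparse, values made up): with `nargs="+"`, every whitespace-separated token after the flag is collected into a single list of strings, which the parsing code below then splits apart.

```python
# Minimal sketch of the flag's collection behavior; not ageml code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-ht", "--hyperparameter_tuning", nargs="+", default=["2"])

args = parser.parse_args(["-ht", "10", "C=1,3", "kernel=linear,rbf"])
print(args.hyperparameter_tuning)  # ['10', 'C=1,3', 'kernel=linear,rbf']
```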

@@ -164,12 +164,40 @@ def configure_args(self, args):
        else:
            args.model_params = {}

-       # Set hyperparameter grid search value
-       if len(args.hyperparameter_tuning) > 1 or not args.hyperparameter_tuning[0].isdigit():
-           raise ValueError("Hyperparameter grid points must be a non negative integer.")
+       # Parse hyperparameter_tuning values
+       hyperparam_tuning = args.hyperparameter_tuning
+       if not hyperparam_tuning[0].isdigit() or int(convert(hyperparam_tuning[0])) < 2:
+           raise ValueError("Hyperparameter grid points must be an integer greater than 1.")
        else:
-           args.hyperparameter_tuning = args.hyperparameter_tuning[0]
-           args.hyperparameter_tuning = int(convert(args.hyperparameter_tuning))
+           args.hyperparameter_tuning = int(convert(hyperparam_tuning[0]))

+       hyperparameter_params = {}
+       if len(hyperparam_tuning) > 1:
+           for item in hyperparam_tuning[1:]:
+               if item.count("=") != 1:
+                   err_msg = (
+                       "Hyperparameter tuning parameters must be in the format "
+                       "param1=value1_low,value1_high param2=kernel_A,kernel_B,kernel_C..."
+                   )

Contributor: Should also check that two values are always given, a low and a high. What happens if someone gives C=1,2,3? This should throw an error, since the user should write C=1,3 with -ht 3 to get the 3 hyperparameter points 1, 2, 3. Also, -ht should be a minimum of 2.

Contributor: What about hyperparameters like kernels, where you want to choose between different kernels? Those should not be affected by -ht.

Contributor (author): Addressed! Thanks for this.

+                   raise ValueError(err_msg)
+               key, values = item.split("=")
+               values = [convert(value) for value in values.split(",")]
+
+               vals_are_str = all([isinstance(value, str) for value in values])
+               vals_are_num = all([isinstance(value, (int, float)) for value in values])
+               # If not 2 values provided in numerical hyperparams, raise error
+               if vals_are_num and len(values) != 2:
+                   err_msg = "Numerical hyperparameter values must be exactly two numbers (e.g.: param1=2,3)."
+                   raise ValueError(err_msg)
+               # If no value provided in categorical hyperparams, raise error
+               elif vals_are_str and len(values) < 1:
+                   err_msg = "Categorical hyperparameter values must be at least one string (e.g.: param1=kernel_A)."
+                   raise ValueError(err_msg)
+               hyperparameter_params[key] = values
+
+       # Add attribute to args
+       args.hyperparameter_params = hyperparameter_params

        # Set polynomial feature extension value
        if len(args.feature_extension) > 1 or not args.feature_extension[0].isdigit():
            raise ValueError("Polynomial feature extension degree must be a non negative integer.")
@@ -227,13 +255,11 @@ def configure_parser(self):
            help=messages.factors_long_description,
        )

-       self.parser.add_argument("--covariates", metavar="FILE", required=False,
-                                help=messages.covar_long_description)
-       self.parser.add_argument("--clinical", metavar="FILE", required=False,
-                                help=messages.clinical_long_description)
-       self.parser.add_argument("--covcorr_mode", metavar="MODE", required=False,
-                                choices=["cn", "each", "all"],
-                                help=messages.covcorr_mode_long_description)
+       self.parser.add_argument("--covariates", metavar="FILE", required=False, help=messages.covar_long_description)
+       self.parser.add_argument("--clinical", metavar="FILE", required=False, help=messages.clinical_long_description)
+       self.parser.add_argument(
+           "--covcorr_mode", metavar="MODE", required=False, choices=["cn", "each", "all"], help=messages.covcorr_mode_long_description
+       )


class ClinicalGroups(Interface):
@@ -284,11 +310,10 @@ def configure_parser(self):
        )

        # Optional arguments
-       self.parser.add_argument("--covariates", metavar="FILE", required=False,
-                                help=messages.covar_long_description)
-       self.parser.add_argument("--covcorr_mode", metavar="MODE", required=False,
-                                choices=["cn", "each", "all"],
-                                help=messages.covcorr_mode_long_description)
+       self.parser.add_argument("--covariates", metavar="FILE", required=False, help=messages.covar_long_description)
+       self.parser.add_argument(
+           "--covcorr_mode", metavar="MODE", required=False, choices=["cn", "each", "all"], help=messages.covcorr_mode_long_description
+       )


class ClinicalClassification(Interface):
@@ -372,12 +397,11 @@ def configure_parser(self):
        )

        # Optional arguments
-       self.parser.add_argument("--covariates", metavar="FILE", required=False,
-                                help=messages.covar_long_description)
-       self.parser.add_argument("--covcorr_mode", metavar="MODE", required=False,
-                                choices=["cn", "each", "all"],
-                                help=messages.covcorr_mode_long_description)
+       self.parser.add_argument("--covariates", metavar="FILE", required=False, help=messages.covar_long_description)
+       self.parser.add_argument(
+           "--covcorr_mode", metavar="MODE", required=False, choices=["cn", "each", "all"], help=messages.covcorr_mode_long_description
+       )

    def configure_args(self, args):
        """Configure arguments with required formatting for modelling.

7 changes: 5 additions & 2 deletions src/ageml/messages.py
@@ -108,8 +108,11 @@

hyperparameter_grid_description = (
    "Number of points for which the hyperparameter optimization Grid Search will train\n"
-   "a model. The parameter ranges are predefined. An integer is required.\n"
-   "(e.g. -ht 100 / --hyperparameter_tuning 100)"
+   "a model, and parameter ranges to sample from. An integer is required, followed \n"
+   "by the parameters to optimize. (e.g. -ht 10 C=1,3 kernel=linear,rbf)\n"
+   "For more information on how to specify the hyperparameter ranges, refer to the \n"
+   "documentation in the ageml repository:\n"
+   "https://github.com/compneurobilbao/ageml/tree/main/docs/available_models.md#model-hyperparameters"
)
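
To spell out the example in this message: under the stated semantics, `-ht 3 C=1,3 kernel=linear,rbf` describes three evenly spaced values of `C` crossed with the two kernels. This is a sketch under that assumption only; `ageml` may space log-typed parameters differently (see `docs/available_models.md`).

```python
# Sketch of the grid implied by: -ht 3 C=1,3 kernel=linear,rbf
# Assumes numeric low,high pairs expand to ht evenly spaced points and
# categorical values are used verbatim (unaffected by ht).
from itertools import product

import numpy as np

ht = 3
c_values = [float(c) for c in np.linspace(1, 3, ht)]  # [1.0, 2.0, 3.0]
kernels = ["linear", "rbf"]                           # categorical, unaffected by ht

grid = list(product(c_values, kernels))               # 3 x 2 = 6 candidate settings
print(len(grid), grid[0])                             # 6 (1.0, 'linear')
```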

thr_long_description = "Threshold for classification. Default: 0.5 \n" "The threshold is used for assigning hard labels. (e.g. --thr 0.5)"
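
As a one-line illustration of what this threshold does when assigning hard labels (a sketch, not `ageml`'s code):

```python
import numpy as np

proba = np.array([0.2, 0.5, 0.9])         # predicted probabilities
thr = 0.5
hard_labels = (proba >= thr).astype(int)  # array([0, 1, 1])
```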