From e4c208fd53011ac10598aa8ef00b306c5eb13031 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sat, 27 Apr 2024 10:09:46 +0800 Subject: [PATCH 01/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 165 ++++++++++++++++-- 1 file changed, 149 insertions(+), 16 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index f22e18baf6..96c62d7b31 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -49,7 +49,9 @@ "\n", "### Overview\n", "\n", - "Remember that the main objective of any machine learning model is to generalize the learning based on training data, so that it will be able to do predictions accurately on unknown data. As you can notice the words 'Overfitting' and 'Underfitting' are kind of opposite of the term 'Generalization'. Overfitting and underfitting models don't generalize well and results in poor performance.\n", + "Remember that the main objective of any machine learning model is to generalize the learning based on training data, so that it will be able to do predictions accurately on unknown data. Here are a few concepts: the first is 'Hypothesis', the second is 'Truth'. When we obtain data and train it, we propose a hypothesis, and the process of forcing the hypothesis to be as close to the truth as possible is our training process. This process is called 'fitting', which means the model tries to learn the patterns, relationships, or rules in the data in order to make predictions or classifications on unknown data. Due to the existence of errors in the hypothesis, we introduce the concepts of generalization error and empirical error (training error). The generalization error represents the error in unknown samples when we fit the model to the truth. It is uncertain. 
On the other hand, the empirical error represents the error on the training set, and it can be determined. In order to reduce the error and approach the truth, we need model evaluation. However, due to the occurrence of overfitting, a smaller error does not necessarily indicate a better model.\n", + "\n", + "As you can notice the words 'Overfitting' and 'Underfitting' are kind of opposite of the term 'Generalization'. Overfitting and underfitting models don't generalize well and results in poor performance.\n", "\n", "These are the samples of over-fitting and under-fitting in regression:\n", "\n", @@ -60,6 +62,8 @@ "Over-fitting and under-fitting in regression\n", ":::\n", "\n", + "During the fitting process, we have an important parameter called 'bias'. It refers to the deviation of the model from the true relationship when attempting to fit the data.\n", + "\n", "### Underfitting\n", "\n", "* Underfitting occurs when machine learning model don't fit the training data well enough. It is usually caused by simple function that cannot capture the underlying trend in the data.\n", @@ -84,6 +88,7 @@ "* A good fitting model generalizes the learnings from training data and provide accurate predictions on new data\n", "* To get the good fitting model, keep training and testing the model till you get the minimum train and test error. Here important parameter is 'test error' because low train error may cause overfitting so always keep an eye on test error fluctuations. The sweet spot is just before the test error start to rise.\n", "\n", + "In summary, the goal of model selection is to find a model that fits the training data well and has low prediction error on new, unseen data. If the chosen model is too simple, it may not fit the training data well, resulting in underfitting. 
If it is too complex, it may overfit, reducing its predictive performance on new data.\n", "Now let's take a look at another example, hoping it will be helpful for your understanding.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/classification.png\n", @@ -163,6 +168,15 @@ "source": [ "## Bias variance tradeoff\n", "\n", + "In this section we discuss the bias-variance tradeoff.\n", + "\n", + "So what are bias and variance, and why are they so important in model selection?\n", + "\n", + "Bias refers to the error introduced by the model's incorrect assumptions about, or oversimplifications of, the problem. When a model has high bias, it may overlook key features or patterns in the data, resulting in systematic errors in its predictions. In other words, a high-bias model tends to be wrong in a consistent way, even on the training data.\n", + "\n", + "Variance refers to how sensitive the model is to the particular training data it sees. When a model has high variance, small perturbations in the training data can change it substantially, and it may overfit the noise and details in the training data, leading to poor generalization to new data. In other words, a high-variance model is strongly influenced by randomness in the training sample, which shows up as large prediction errors on new data.\n", + "\n", + "Here are some illustrations showing the relationship between bias and variance in data fitting.\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/graphicalillustration.png\n", "---\n", "name: graphicalillustration-ms\n", @@ -170,14 +184,91 @@ "Graphical illustration of variance and bias\n", ":::\n", "\n", - "\n", - "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/total_error.png\n", "---\n", "name: Model-complexity-ms\n", "---\n", "Model complexity v.s. 
error\n", - ":::" + ":::\n", + "\n", + "### Metrics\n", + "\n", + "Were there some ways that can be used to represent the bias and variance of a model?\n", + "\n", + "For classification models, a natural starting point is the confusion matrix, from which several evaluation metrics can be computed.\n", + "\n", + "The confusion matrix is a table that shows how the model's predictions correspond to the actual labels, which helps us understand the model's performance on each class.\n", + "\n", + "For binary classification, it has the following structure:\n", + "\n", + "| | Predicted Positive | Predicted Negative |\n", + "|---|---|---|\n", + "| **Actual Positive** | True Positive (TP) | False Negative (FN) |\n", + "| **Actual Negative** | False Positive (FP) | True Negative (TN) |\n", + "\n", + "\n", + "The four values have the following meanings:\n", + "* True Positive (TP): the number of positive instances correctly predicted as positive by the model.\n", + "* False Negative (FN): the number of positive instances incorrectly predicted as negative by the model.\n", + "* False Positive (FP): the number of negative instances incorrectly predicted as positive by the model.\n", + "* True Negative (TN): the number of negative instances correctly predicted as negative by the model.\n", + "\n", + "From these four counts, we can compute the following metrics:\n", + "\n", + "**Accuracy**: the ratio of correctly predicted samples to the total number of samples.\n", + "Accuracy = (TP + TN) / (TP + TN + FP + FN)\n", + "\n", + "**Precision**: the proportion of true positives among the instances predicted as positive, measuring how reliable the model's positive predictions are.\n", + "Precision = TP / (TP + FP)\n", + "\n", + "**Recall**: the proportion of true positives among the actual positive instances, measuring how many of the actual positives the model finds.\n", + "Recall = TP / (TP + FN)\n", + "\n", + "**F1 Score**: the harmonic mean of precision and recall, which balances 
the two in a single number.\n", + "F1 Score = 2 * (Precision * Recall) / (Precision + Recall)\n", + "\n", + "When evaluating the bias of a model, we usually consider metrics such as precision, accuracy, and F1 score. A lower F1 score may indicate that the model has issues in balancing accuracy and identification ability, but it cannot be simply equated to lower bias. By considering multiple metrics and the specific requirements of the application scenario, a more comprehensive assessment of the model's performance can be achieved.\n", + "\n", + "Does a lower recall rate indicate better bias?\n", + "\n", + "**No**, a lower recall rate does not indicate better bias. In machine learning, recall rate is a metric that measures the model's ability to identify positive instances. A higher recall rate indicates that the model can better identify positive instances, while a lower recall rate means that the model may miss some true positive instances.\n", + "\n", + "What about the F1 score? Does a lower F1 score indicate better bias?\n", + "\n", + "**No**, a lower F1 score does not indicate better bias. The F1 score is the harmonic mean of precision and recall, so it reflects both how reliable the model's positive predictions are and how well it identifies positive instances. Bias, by contrast, refers to the extent to which a model makes incorrect assumptions or oversimplifies the problem, and it affects the model's prediction accuracy. A lower bias indicates that the model can better fit the training data and is closer to the true underlying relationship.\n", + "\n", + "The F1 score combines the precision and recall of the model. For certain applications, we care about both the model's prediction accuracy (precision) and its ability to identify positive instances (recall). 
Therefore, a higher F1 score indicates that the model performs well in balancing prediction accuracy and identification ability.\n", + "\n", + "All these metrics primarily measure a model's performance on a specific dataset, whereas model bias refers to the systematic deviation of the model from the true trends in the data, which may affect the model's ability to generalize.\n", + "\n", + "Is there, then, any way to estimate the bias of a model indirectly?\n", + "\n", + "Comparing the training error with the validation error is a viable approach: a high training error suggests high bias, while a large gap between the two suggests high variance. The holdout method, cross-validation, and bootstrapping are common ways to obtain the validation estimate.\n", + "\n", + "So how do these methods work?\n", + "\n", + "### Holdout Method\n", + "\n", + "The holdout method splits the dataset into mutually exclusive training and testing sets, trains the model on the training set, and then evaluates its performance on the testing set. By comparing the performance of different models on the held-out set, we can select the best-performing model. The split should use stratified sampling, which preserves the proportion of each class in both sets.\n", + "\n", + "However, different random splits yield different samples, so the evaluation results also vary from split to split. Typically, a large portion of the dataset (70-80%) is used as the training set and the remainder as the testing set.\n", + "Because the testing set is then only a small portion of the whole dataset, the evaluation results can be unstable; a common remedy is to repeat the random split several times and average the results.\n", + "\n", + "### Cross-Validation\n", + "\n", + "K-fold cross-validation splits the dataset into K mutually exclusive subsets of similar size. Each subset is used in turn as the validation set, while the remaining K-1 subsets are used to train the model. 
By averaging the K validation results, we obtain an estimate of the model's performance that can be used to select the best model.\n", + "\n", + "The stability and fidelity of cross-validation results depend largely on the value of K. In the extreme case where K equals the number of samples m, each validation set contains a single sample; this is leave-one-out cross-validation (LOOCV), which is mainly practical for small datasets.\n", + "\n", + "Cross-validation gives a more reliable estimate than a single holdout split, but it can be time-consuming on large datasets because the model must be trained K times.\n", + "\n", + "In practice, 10-fold cross-validation is a common choice for indirectly assessing the generalization ability of a model.\n", + "\n", + "### Bootstrapping\n", + "\n", + "Bootstrapping is based on sampling with replacement. Given a dataset D containing m samples, we randomly pick one sample, copy it into a new dataset D', and put it back, so it can be picked again. After repeating this m times, D' contains m samples, some of which appear multiple times while others do not appear at all. D' is then used as the training set, and the samples that never appear in D' can be used as the testing set.\n", + "\n", + "Since the draws are independent, the probability that a specific sample is never selected in m draws is $(1-\\frac{1}{m})^m$. As m approaches infinity, this probability tends to $\\lim_{m \\to \\infty}(1-\\frac{1}{m})^m = \\frac{1}{e} \\approx 0.368$, where e is the base of the natural logarithm (approximately 2.71828). Therefore, when m is sufficiently large, about 36.8% of the original samples never appear in the training set and are available for testing." ] }, { @@ -209,24 +300,19 @@ "\n", "You'll explore how the capacity of a network can affect its performance in the exercise.\n", "\n", - "## Early Stopping\n", + "Determining an appropriate model capacity is a crucial task in model selection. 
Here are some common methods and guidelines to help determine the right model capacity:\n", "\n", - "We mentioned that when a model is too eagerly learning noise, the validation loss may start to increase during training. 
To prevent this, we can simply stop the training whenever it seems the validation loss isn't decreasing anymore. Interrupting the training this way is called **early stopping**.\n", + "Rule of thumb: In general, if the dataset is small or the task is relatively simple, choosing a lower-capacity model may be more suitable to avoid overfitting. For larger datasets or complex tasks, a higher-capacity model may be able to better fit the data.\n", "\n", - "Once we detect that the validation loss is starting to rise again, we can reset the weights back to where the minimum occured. This ensures that the model won't continue to learn noise and overfit the data.\n", + "Cross-validation: As discussed above, cross-validation is one of the most important tools in model selection. Training models of different capacities and comparing their cross-validated performance indicates which capacity generalizes best.\n", "\n", - "Training with early stopping also means we're in less danger of stopping the training too early, before the network has finished learning signal. So besides preventing overfitting from training too long, early stopping can also prevent *underfitting* from not training long enough. Just set your training epochs to some large number (more than you'll need), and early stopping will take care of the rest.\n", "\n", + "Learning curves: Learning curves can help determine if the model capacity is appropriate. By plotting the performance of the model on the training set and the validation set as the number of training samples increases, one can observe the model's fitting and generalization abilities. If the model performs poorly on both the training set and the validation set, it may be underfitting due to low capacity. If the model performs well on the training set but poorly on the validation set, it may be overfitting due to high capacity. 
Adjustments to the model capacity can be made based on the trend of the learning curve.\n", "\n", - "## Adding Early Stopping\n", + "Regularization: Model capacity can also be adjusted through regularization, which is discussed later. Increasing the regularization parameter reduces the effective model capacity and the risk of overfitting, while decreasing it increases capacity and fitting ability. By evaluating the model on the validation set with different regularization parameters, an appropriate value can be chosen.\n", "\n", - "In Keras, we include early stopping in our training through a callback. A **callback** is just a function you want run every so often while the network trains. The early stopping callback will run after every epoch. (Keras has [a variety of useful callbacks](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks) pre-defined, but you can [define your own](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback), too.)\n", "\n", + "Model comparison experiments: Train and evaluate models with different capacities and compare their performance on the validation set. By comparing the generalization performance of different-capacity models, select the model capacity with the best performance.\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/traintestoverfitting.png\n", - "---\n", - "name: EarlyStopping-ms\n", - "---\n", - "Early stopping\n", - ":::" + "Considering the above methods and guidelines, selecting an appropriate model capacity requires a balance between theory and practice and decision-making based on the specific problem and available resources. 
The ultimate goal is to choose a model that performs well on both the training data and new data, achieving good generalization ability.\n" ] }, { @@ -296,6 +382,7 @@ "ElasticNet \n", ":::\n", "\n", + "L1 and L2 regularization are both very common, but they suit different scenarios. L1 regularization tends to drive some weights exactly to zero, so it is suitable when feature selection or model interpretability is required. L2 regularization shrinks all weights toward zero without eliminating them; it is more general and works well in most cases to prevent overfitting and improve generalization.\n", "\n", "### The impact of the value of $\\lambda$ \n", "\n", @@ -306,6 +393,46 @@ "The impact of the value of $\\lambda$ \n", ":::\n", "\n", + "The value of $\\lambda$ controls the strength of weight regularization.\n", + "\n", + "When $\\lambda$ is small, the effect of weight regularization is relatively minor. The network is more likely to learn complex patterns and structures, which can lead to overfitting. This means that the model may perform well on the training data but generalize poorly to new data.\n", + "\n", + "When $\\lambda$ is large, the effect of weight regularization becomes more pronounced. The network is constrained to simpler patterns and structures, reducing the risk of overfitting. This can improve the model's generalization on new data but may slightly decrease performance on the training data. If $\\lambda$ is too large, however, the model may become too simple and underfit.\n", + "\n", + "The appropriate value of $\\lambda$ depends on the specific problem and dataset. Typically, cross-validation or another evaluation method is used to select it, balancing model complexity against generalization ability.\n", + "\n", + "A concrete example makes the effect of regularization easier to understand. 
Let's take the example of linear regression.\n", + "\n", + "Suppose we have a dataset containing house area and prices, and we want to use a linear regression model to predict house prices. We can define a linear regression model that includes an intercept term and a coefficient for the house area.\n", + "\n", + "Without regularization, the objective of the model is to minimize the mean squared error (MSE) on the training data. This means the model will try to find the line that best fits the training data, minimizing the differences between the predicted values and the actual values.\n", + "\n", + "However, if the training data contains noise or outliers, or if the training set is relatively small, the model may overfit the data, leading to a decrease in prediction performance on new data. In such cases, regularization can help control the complexity of the model and reduce the risk of overfitting.\n", + "\n", + "By adding L2 regularization (Ridge regularization) to the linear regression model, we add the squared L2 norm of the coefficients to the loss function as a penalty term, so the objective becomes $\\text{MSE} + \\lambda \\sum_j w_j^2$, where $w_j$ are the model coefficients. This encourages the model to prefer smaller coefficients during training and prevents them from becoming too large.\n", + "\n", + "The effect of regularization is achieved by balancing the trade-off between minimizing the training error and minimizing the penalty term. A larger regularization parameter will penalize larger parameter values more strongly, making the model smoother and reducing the differences between parameters. 
This helps reduce the risk of overfitting and improves the model's generalization ability on new data.\n", + "\n", + "In summary, the role of regularization in linear regression models is to control the complexity of the model, reduce the risk of overfitting, and improve the model's generalization ability on new data.\n", + "\n", + "## Early Stopping\n", + "\n", + "We mentioned that when a model learns noise too eagerly, the validation loss may start to increase during training. To prevent this, we can simply stop the training whenever it seems the validation loss isn't decreasing anymore. Interrupting the training this way is called **early stopping**.\n", + "\n", + "Once we detect that the validation loss is starting to rise again, we can reset the weights back to where the minimum occurred. This ensures that the model won't continue to learn noise and overfit the data.\n", + "\n", + "Training with early stopping also means we're in less danger of stopping the training too early, before the network has finished learning signal. So besides preventing overfitting from training too long, early stopping can also prevent *underfitting* from not training long enough. Just set your training epochs to some large number (more than you'll need), and early stopping will take care of the rest.\n", + "\n", + "## Adding Early Stopping\n", + "\n", + "In Keras, we include early stopping in our training through a callback. A **callback** is just a function you want to run every so often while the network trains. The early stopping callback will run after every epoch. 
(Keras has [a variety of useful callbacks](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks) pre-defined, but you can [define your own](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback), too.)\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/traintestoverfitting.png\n", + "---\n", + "name: EarlyStopping-ms\n", + "---\n", + "Early stopping\n", + ":::\n", "\n", "## Dropout\n", "\n", @@ -358,6 +485,8 @@ "How to choose a good model\n", ":::\n", "\n", + "The image above illustrates why the bias-variance tradeoff is central to model selection, and to machine learning in general. While the model is learning signal, increasing its complexity improves its performance on new data. However, once the model starts to learn noise, its variance grows and the error on new data begins to rise. This is where cross-validation, mentioned earlier, comes into play: it lets us estimate where this turning point lies.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/Bias-vs.webp\n", "---\n", "name: Conclusion-ms\n", @@ -365,6 +494,10 @@ "Conclusion \n", ":::\n", "\n", + "The purpose of model selection is to choose the best model among multiple candidate models for a given machine learning problem. The best model is the one that performs well on the training data and also generalizes well to unseen data.\n", + "\n", + "Model selection matters because different models suit different kinds of data and problems of different complexity. Selecting an appropriate model can improve prediction accuracy, robustness, and interpretability.\n", + "\n", "## Your turn! 🚀\n", "\n", "Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In this assignment, you'll have the opportunity to apply your understanding of these concepts and techniques. 
Please complete the following tasks:\n", From b10f1bb430594b922771ea8c626df125f11a859b Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sat, 27 Apr 2024 13:46:34 +0800 Subject: [PATCH 02/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 77 +++++++++++++++---- 1 file changed, 63 insertions(+), 14 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 96c62d7b31..be51a1a7ce 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -49,7 +49,11 @@ "\n", "### Overview\n", "\n", - "Remember that the main objective of any machine learning model is to generalize the learning based on training data, so that it will be able to do predictions accurately on unknown data. Here are a few concepts: the first is 'Hypothesis', the second is 'Truth'. When we obtain data and train it, we propose a hypothesis, and the process of forcing the hypothesis to be as close to the truth as possible is our training process. This process is called 'fitting', which means the model tries to learn the patterns, relationships, or rules in the data in order to make predictions or classifications on unknown data. Due to the existence of errors in the hypothesis, we introduce the concepts of generalization error and empirical error (training error). The generalization error represents the error in unknown samples when we fit the model to the truth. It is uncertain. On the other hand, the empirical error represents the error on the training set, and it can be determined. In order to reduce the error and approach the truth, we need model evaluation. 
However, due to the occurrence of overfitting, a smaller error does not necessarily indicate a better model.\n", + "Remember that the main objective of any machine learning model is to generalize what it learns from the training data, so that it can make accurate predictions on unseen data. Two concepts are useful here: the 'hypothesis', which is the function the model learns, and the 'truth', which is the real relationship that generated the data. When we train a model on data, we propose a hypothesis, and training is the process of bringing that hypothesis as close to the truth as possible.\n", + "\n", + "### Fitting process\n", + "\n", + "Adjusting the hypothesis to the data in this way is called 'fitting': the model tries to learn the patterns, relationships, or rules in the data so that it can make predictions or classifications on unseen data. Because the hypothesis never matches the truth exactly, we distinguish two kinds of error. The generalization error is the model's expected error on new, unseen samples; it cannot be computed directly and can only be estimated. The empirical error (training error) is the error on the training set, which can be computed exactly. To reduce the error and approach the truth, we need model evaluation. However, because of overfitting, a smaller training error does not necessarily indicate a better model.\n", "\n", "As you can notice the words 'Overfitting' and 'Underfitting' are kind of opposite of the term 'Generalization'. Overfitting and underfitting models don't generalize well and results in poor performance.\n", "\n", @@ -64,26 +68,26 @@ "\n", "During the fitting process, we have an important parameter called 'bias'. It refers to the deviation of the model from the true relationship when attempting to fit the data.\n", "\n", - "### Underfitting\n", + "#### Underfitting\n", "\n", "* Underfitting occurs when machine learning model don't fit the training data well enough. 
It is usually caused by simple function that cannot capture the underlying trend in the data.\n", "* Underfitting models have high error in training as well as test set. This behavior is called as 'Low Bias'\n", "* This usually happens when we try to fit linear function for non-linear data.\n", "* Since underfitting models don't perform well on training set, it's very easy to detect underfitting\n", "\n", - "#### How To Avoid Underfitting?\n", + "##### How To Avoid Underfitting?\n", "* Increasing the model complexity. e.g. If linear function under fit then try using polynomial features\n", "* Increase the number of features by performing the feature engineering\n", "\n", - "### Overfitting\n", + "#### Overfitting\n", "* Overfitting occurs when machine learning model tries to fit the training data too well. It is usually caused by complicated function that creates lots of unnecessary curves and angles that are not related with data and end up capturing the noise in data.\n", "* Overfitting models have low error in training set but high error in test set. This behavior is called as 'High Variance'\n", "\n", - "#### How To Avoid Overfitting?\n", + "##### How To Avoid Overfitting?\n", "* Since overfitting algorithm captures the noise in data, reducing the number of features will help. We can manually select only important features or can use model selection algorithm for same\n", "* We can also use the 'Regularization' technique. It works well when we have lots of slightly useful features. Sklearn linear model(Ridge and LASSO) uses regularization parameter 'alpha' to control the size of the coefficients by imposing a penalty. 
Please refer below tutorials for more details.\n", "\n", - "### Good Fitting \n", + "#### Good Fitting\n", "* It is a sweet spot between Underfitting and Overfitting model\n", "* A good fitting model generalizes the learnings from training data and provide accurate predictions on new data\n", "* To get the good fitting model, keep training and testing the model till you get the minimum train and test error. Here important parameter is 'test error' because low train error may cause overfitting so always keep an eye on test error fluctuations. The sweet spot is just before the test error start to rise.\n", @@ -191,7 +195,16 @@ "Model complexity v.s. error\n", ":::\n", "\n", - "### Metrics\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "2f23ec03", + "metadata": {}, + "source": [ + "\n", + "## Metrics\n", "\n", "Were there some ways that can be used to represent the bias and variance of a model?\n", "\n", @@ -228,7 +241,15 @@ "F1 Score = 2 * (Precision * Recall) / (Precision + Recall)\n", "\n", "When evaluating the bias of a model, we usually consider metrics such as precision, accuracy, and F1 score. A lower F1 score may indicate that the model has issues in balancing accuracy and identification ability, but it cannot be simply equated to lower bias. By considering multiple metrics and the specific requirements of the application scenario, a more comprehensive assessment of the model's performance can be achieved.\n", - "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "1130ca67", + "metadata": {}, + "source": [ + "## Evaluation methods\n", "Does a lower recall rate indicate better bias?\n", "\n", "**No**, a lower recall rate does not indicate better bias. In machine learning, recall rate is a metric that measures the model's ability to identify positive instances. 
A higher recall rate indicates that the model can better identify positive instances, while a lower recall rate means that the model may miss some true positive instances.\n", @@ -288,9 +309,17 @@ "\n", "Ideally, we would create models that learn all of the signal and none of the noise. This will practically never happen. Instead we make a trade. We can get the model to learn more signal at the cost of learning more noise. So long as the trade is in our favor, the validation loss will continue to decrease. After a certain point, however, the trade can turn against us, the cost exceeds the benefit, and the validation loss begins to rise.\n", "\n", - "This trade-off indicates that there can be two problems that occur when training a model: not enough signal or too much noise. **Underfitting** the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*. **Overfitting** the training set is when the loss is not as low as it could be because the model learned too much *noise*. The trick to training deep learning models is finding the best balance between the two.\n", + "This trade-off indicates that there can be two problems that occur when training a model: not enough signal or too much noise. Underfitting the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*. Overfitting the training set is when the loss is not as low as it could be because the model learned too much *noise*. 
The trick to training deep learning models is finding the best balance between the two.\n", "\n", "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "5957cb79", + "metadata": {}, + "source": [ "\n", "## Capacity\n", "\n", @@ -312,7 +341,7 @@ "\n", "Model comparison experiments: Train and evaluate models with different capacities and compare their performance on the validation set. By comparing the generalization performance of different-capacity models, select the model capacity with the best performance.\n", "\n", - "Considering the above methods and guidelines, selecting an appropriate model capacity requires a balance between theory and practice and decision-making based on the specific problem and available resources. The ultimate goal is to choose a model that performs well on both the training data and new data, achieving good generalization ability.\n" + "Considering the above methods and guidelines, selecting an appropriate model capacity requires a balance between theory and practice and decision-making based on the specific problem and available resources. The ultimate goal is to choose a model that performs well on both the training data and new data, achieving good generalization ability." ] }, { @@ -413,7 +442,14 @@ "\n", "The effect of regularization is achieved by balancing the trade-off between minimizing the training error and minimizing the penalty term. A larger regularization parameter will penalize larger parameter values more strongly, making the model smoother and reducing the differences between parameters. 
This helps reduce the risk of overfitting and improves the model's generalization ability on new data.\n", "\n", - "In summary, the role of regularization in linear regression models is to control the complexity of the model, reduce the risk of overfitting, and improve the model's generalization ability on new data.\n", + "In summary, the role of regularization in linear regression models is to control the complexity of the model, reduce the risk of overfitting, and improve the model's generalization ability on new data.\n" + ] + }, + { + "cell_type": "markdown", + "id": "a1b5d9f0", + "metadata": {}, + "source": [ "\n", "## Early Stopping\n", "\n", @@ -432,7 +468,14 @@ "name: EarlyStopping-ms\n", "---\n", "Early stopping\n", - ":::\n", + ":::\n" + ] + }, + { + "cell_type": "markdown", + "id": "03100d74", + "metadata": {}, + "source": [ "\n", "## Dropout\n", "\n", @@ -464,8 +507,14 @@ "\n", "Consider the neurons at the output layer. During training, each neuron usually get activations only from two neurons from the hidden layer (while being connected to four), due to dropout. Now, imagine we finished the training and remove dropout. Now activations of the output neurons will be computed based on four values from the hidden layer. This is likely to put the output neurons in unusual regime, so they will produce too large absolute values, being overexcited \n", "\n", - "To avoid this, the trick is to multiply the input connections' weights of the last layer by 1-p (so, by 0.5). Alternatively, one can multiply the outputs of the hidden layer by 1-p, which is basically the same \n", - "\n", + "To avoid this, the trick is to multiply the input connections' weights of the last layer by 1-p (so, by 0.5). 
Alternatively, one can multiply the outputs of the hidden layer by 1-p, which is basically the same " + ] + }, + { + "cell_type": "markdown", + "id": "755eed48", + "metadata": {}, + "source": [ "\n", "## Conclusions\n", "\n", From 8c4657229d68fd206c5e2aa2a376c66fd60c6b50 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sat, 27 Apr 2024 15:54:46 +0800 Subject: [PATCH 03/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 72 ++++++++++++------- 1 file changed, 45 insertions(+), 27 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index be51a1a7ce..a368a198de 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -292,29 +292,6 @@ "Since the sampling is conducted independently, the probability that a specific sample is never selected in m iterations of sampling is [(1-1/m)^m]. As m approaches infinity, i.e., m→∞, the limit of this probability is 1/e, where e is the base of the natural logarithm and approximately equal to 2.71828. Therefore, when m is sufficiently large, the probability that a specific sample is never selected in m iterations of sampling is close to 1/e." ] }, - { - "cell_type": "markdown", - "id": "b4f22209", - "metadata": {}, - "source": [ - "## Interpreting the Learning Curves\n", - "\n", - "You might think about the information in the training data as being of two kinds: *signal* and *noise*. The signal is the part that generalizes, the part that can help our model make predictions from new data. The noise is that part that is *only* true of the training data; the noise is all of the random fluctuation that comes from data in the real-world or all of the incidental, non-informative patterns that can't actually help the model make predictions. 
The noise is the part might look useful but really isn't.\n", - "\n", - "We train a model by choosing weights or parameters that minimize the loss on a training set. You might know, however, that to accurately assess a model's performance, we need to evaluate it on a new set of data, the *validation* data. \n", - "\n", - "When we train a model we've been plotting the loss on the training set epoch by epoch. To this we'll add a plot the validation data too. These plots we call the **learning curves**. To train deep learning models effectively, we need to be able to interpret them.\n", - "\n", - "Now, the training loss will go down either when the model learns signal or when it learns noise. But the validation loss will go down only when the model learns signal. (Whatever noise the model learned from the training set won't generalize to new data.) So, when a model learns signal both curves go down, but when it learns noise a *gap* is created in the curves. The size of the gap tells you how much noise the model has learned.\n", - "\n", - "Ideally, we would create models that learn all of the signal and none of the noise. This will practically never happen. Instead we make a trade. We can get the model to learn more signal at the cost of learning more noise. So long as the trade is in our favor, the validation loss will continue to decrease. After a certain point, however, the trade can turn against us, the cost exceeds the benefit, and the validation loss begins to rise.\n", - "\n", - "This trade-off indicates that there can be two problems that occur when training a model: not enough signal or too much noise. Underfitting the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*. Overfitting the training set is when the loss is not as low as it could be because the model learned too much *noise*. 
The trick to training deep learning models is finding the best balance between the two.\n", - "\n", - "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise.\n", - "\n" - ] - }, { "cell_type": "markdown", "id": "5957cb79", @@ -344,6 +321,29 @@ "Considering the above methods and guidelines, selecting an appropriate model capacity requires a balance between theory and practice and decision-making based on the specific problem and available resources. The ultimate goal is to choose a model that performs well on both the training data and new data, achieving good generalization ability." ] }, + { + "cell_type": "markdown", + "id": "5617ad72", + "metadata": {}, + "source": [ + "## Interpreting the Learning Curves\n", + "\n", + "You might think about the information in the training data as being of two kinds: *signal* and *noise*. The signal is the part that generalizes, the part that can help our model make predictions from new data. The noise is the part that is *only* true of the training data; the noise is all of the random fluctuation that comes from data in the real world or all of the incidental, non-informative patterns that can't actually help the model make predictions. The noise is the part that might look useful but really isn't.\n", + "\n", + "We train a model by choosing weights or parameters that minimize the loss on a training set. You might know, however, that to accurately assess a model's performance, we need to evaluate it on a new set of data, the *validation* data.\n", + "\n", + "When we train a model, we plot the loss on the training set epoch by epoch. To this we'll add a plot of the validation loss too. We call these plots the **learning curves**. To train deep learning models effectively, we need to be able to interpret them.\n", + "\n", + "Now, the training loss will go down either when the model learns signal or when it learns noise.
But the validation loss will go down only when the model learns signal. (Whatever noise the model learned from the training set won't generalize to new data.) So, when a model learns signal both curves go down, but when it learns noise a *gap* is created in the curves. The size of the gap tells you how much noise the model has learned.\n", + "\n", + "Ideally, we would create models that learn all of the signal and none of the noise. This will practically never happen. Instead we make a trade. We can get the model to learn more signal at the cost of learning more noise. So long as the trade is in our favor, the validation loss will continue to decrease. After a certain point, however, the trade can turn against us, the cost exceeds the benefit, and the validation loss begins to rise.\n", + "\n", + "This trade-off indicates that there can be two problems that occur when training a model: not enough signal or too much noise. Underfitting the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*. Overfitting the training set is when the loss is not as low as it could be because the model learned too much *noise*. The trick to training deep learning models is finding the best balance between the two.\n", + "\n", + "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise.\n", + "\n" + ] + }, { "cell_type": "markdown", "id": "037aa612", @@ -353,10 +353,21 @@ "\n", "You may be familiar with Occam's Razor principle: given two explanations for something, the explanation most likely to be correct is the 'simplest' one, the one that makes the least amount of assumptions. 
This also applies to the models learned by neural networks: given some training data and a network architecture, there are multiple sets of weights values (multiple models) that could explain the data, and simple models are less likely to overfit than complex ones.\n", "\n", - "A 'simple model' in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parmeters altogether, as we saw in the section above). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weeights only to take small values, which makes the distribution of weight values more 'regular'. This is called 'weight regularization', and it is done by adding to the loss function of the network a cost associated with having large weights. This cost comes in two flavors:\n", + "A 'simple model' in this context is a model where the distribution of parameter values has less entropy (or a model with fewer parameters altogether, as we saw in the section above). Thus a common way to mitigate overfitting is to put constraints on the complexity of a network by forcing its weights to take only small values, which makes the distribution of weight values more 'regular'. This is called 'weight regularization', and it is done by adding to the loss function of the network a cost associated with having large weights.\n", + "\n", + "Consider an objective (target) function that includes a regularization term:\n", + "\n", + "J(θ) = L(θ) + λR(θ)\n", + "\n", + "Here, J(θ) is the objective (target) function, θ represents the model's parameters, L(θ) is the loss function (typically the model's error on the training data), R(θ) is the regularization term, and λ is the regularization parameter.\n", + "\n", + "The loss function L(θ) measures how well the model fits the training data, and our goal is to minimize it.
The regularization term R(θ) constrains or penalizes the values of the model's parameters, and it controls the complexity of the model.\n", "\n", - "- L1 regularization, where the cost added is proportional to the aboslute value of the weights coefficients (i.e. to what is called the 'L1 norm' of the weights).\n", - "- L2 regularization, where the cost added is proportional to the square of the value of the weights coefficients (i.e. to what is called the 'L2 norm' of the weights). L2 regularization is also called weight decay in the context of neurral networks. Don't let the different name confuse you: weight decay is mathematically the exact same as L2 regularization.\n", + "The regularization parameter λ determines the weight of the regularization term in the target function. As λ approaches 0, the regularization term becomes negligible and the objective reduces to minimizing the loss function. Conversely, as λ becomes very large, the regularization term dominates, and minimizing the objective drives the parameter values toward zero.\n", + "\n", + "There are two common forms of this cost. L1 regularization (as used in Lasso regression) takes the regularization term R(θ) to be the sum of the absolute values of the parameters θ: R(θ) = ||θ||₁. L1 regularization can drive some of the model's parameters to exactly zero, which yields sparse models and performs feature selection.\n", + "\n", + "L2 regularization (as used in Ridge regression) takes R(θ) to be the sum of the squares of the parameters θ, i.e. the squared L2 norm: R(θ) = ||θ||₂².
L2 regularization shrinks the model's parameter values toward zero but generally does not make them exactly zero, so it does not perform feature selection.\n", "\n", "In `tf.keras`, weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Let's add L2 weight regularization now.\n", "\n", @@ -415,6 +426,10 @@ "\n", "### The impact of the value of $\\lambda$ \n", "\n", + "Note that the objective function contains not only the regularization term but also the regularization parameter $\\lambda$, which scales it.\n", + "\n", + "Choosing the regularization parameter is an important part of regularization: $\\lambda$ is a hyperparameter and usually needs to be tuned, for example by comparing performance on a validation set.\n", + "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/lagrange-animation.gif\n", "---\n", "name: impact-of-lambda-ms\n", @@ -442,7 +457,10 @@ "\n", "The effect of regularization is achieved by balancing the trade-off between minimizing the training error and minimizing the penalty term. A larger regularization parameter will penalize larger parameter values more strongly, making the model smoother and reducing the differences between parameters.
This helps reduce the risk of overfitting and improves the model's generalization ability on new data.\n", "\n", - "In summary, the role of regularization in linear regression models is to control the complexity of the model, reduce the risk of overfitting, and improve the model's generalization ability on new data.\n" + "In summary, the role of regularization in linear regression models is to control the complexity of the model, reduce the risk of overfitting, and improve the model's generalization ability on new data.\n", + "\n", + "In this section, we primarily utilize learning curves to optimize the regularization parameter, also known as the learning curve.\n", + "\n" ] }, { From 656c3c725e3eb610b71be301745e540d285388df Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sat, 27 Apr 2024 16:43:29 +0800 Subject: [PATCH 04/20] Create confusion_matrix.jpg --- images/model-selection/confusion_matrix.jpg | Bin 0 -> 15031 bytes 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 images/model-selection/confusion_matrix.jpg diff --git a/images/model-selection/confusion_matrix.jpg b/images/model-selection/confusion_matrix.jpg new file mode 100644 index 0000000000000000000000000000000000000000..854d9b54f26e5985d30e325ce1c64dd1a1c39c9d GIT binary patch literal 15031 zcmd5?1zgn2_TOdc?vPID?rtQdJEcLCR1hR3R7wz}QyK(GrMpYIq@|JW`tPD%^sXYvjB8CX<2Ci5C{MO!4KeS67U27cU|BS5a5xJ z!4EnbDk>T}0VWpsA)v-1!~?(7lOLqP*UFtAq> z0Aw%@g$_aoW5r~Up5IE~Jk(6CRk>UoQ$*%-vz)YX`oB&@o&6h=APddo9-t~1|0V;Z zA@;ZpZ7Ol>8UxLqeyI)K*7sevzJ3s00mSmjt#?7mKcWDD?^zPzdx0V_NcMi9j^M(x zPWL_!*A?I$8~-JPxPmTffS^Fl$A_DJvWI}=dhjq-j%^M|jZSo;^OUeWVDG8O26+-S zJQn9G$1)-U_A{-}lobnjiihAaH)=BT18=(0JOH=97bv99uJ;x{*FzArWSPEql?z`6 zfIuQ3K`;mmi69jYMy?ZEINiq=j+KC;9CJMb;~=fl;qZ-3=-zrT@x6#W?nC*iKQJ(s zDC*(;$`VODGpkPv*JF^8&ylyUTX#O0L^~+o$J=8G}h`e-63xX2n*{8>w4Xb#u1votIlee#V~wL4{vgr2?&UBK#cjl zpc^?gJ-qovK`7xR3x~ijtAFRiRxC-90EhJPdJdPnq7G_rzrx0M4&W9v-qiNTj*s3y 
zfZvSu8}!fB`8xFb%3p!uO+E_l#1y`5LAqGyVNN}2McfWGsbcjHcccC)@$h4den<10 z;-qwjCs}E8bSv-V9x-DkJ^zb@WwJ7R36f2Y_(ZO9dG1T5+>F0U*mh|N>Cp6r6P+-c zYU*+6$V>lb5O4!tqHU%z>T{73fBg?IVa?%lchhLh{vR;j>8_9@ULA0J@Vo5fF~#L) zq=S1witqiyWZ7%Bl&W%`2_aM%pjyk)sO22yA6`!t7bdyHE_CsW9sPk5VEVqZ-Xu{< zwD)%q*m|gccc&2|=3rrn&fQ^iCVX%&KEVPOl08?gyGh!8>9B|U*Avjg0_*%4J{m8% z)I)|KTbAa?IM?sCJ_t+F?kgdrf)xeBKdm7`x$k-VTA4pt2LO0tJh07xlL(#(F65+W zUWi(i4c+VBU~0jg#uUm^V5{Y`rw8z^F>kMES5Ss)*KOLC;+Ne1?vM%hc`BJULzU%4 z!EN|ljezAOwCv5ZkEfJDN%DQobCWj-?m9hH)_?>OC}iW^_=0~j8Q)%((T(Eq+)iCq zFbsf!ui?}@!{*;ros6@C006+x)`LGUXG+}^kmtzk$pk155+3*JE zgzD*)igG?NCdf?koM^%%)61=bEgtKqX-`dsXe(QnE=D}g_SEaa-3~Rc2nFp+7U6`Q z*B5?pij*Vy5Wim4eaRA92p0el{@DFkwPpm>QXfLHjwd^j#wETxDI7~r5Ui_X|AVE8 zvNXHoEAf|VDmHxiAXDH7#~n)Akc35@mX&I`%u6(0mKsCwZ{>-H&^evqIvmHFZwR>l zNtLhopPB93w4ZJD4gJXrzJP!507E7LGv3y!Vj(8~~VmwIJrU-&c3g1i4rF(|+3Q+y-AM4GRM(N7xz z#U5$OxPVS!B&YtoXqO9(N#02rJAOEX!=EV%>4Ow;A)bG-gI~r0fL#Yf)dzQO zgY)^vz2o8N9!=E`-e3D5)z6}S>-b5uui)=hPslcJI#Guy)BxHJi|QUIYm^hN~`)A?{x!u%r+!xu)X#eL3R$z5j*xBxt5pC49L(RDZ?E z^YJ%b|7i`rkn^B}w-E*!9u{pvBAy@(Y6xc=^2A9ia$R^Z2)j9%T`{Y<7~; zl!gf(j!=Zb%z3#pn^P=NiueLrrPIU2=El@(?_0_*x-u1;5@@5G&z#*pG0V&Jkq}8{ zpraT0Qbwr8jb>UF-|XOc#id5>%hNjBHrzmmO~xeX{TWmiz&GIPS63<^5QsIs%w}EV zqC%j-jG&^@sqf?572Idx{e~`*j&d26l|h+K%_kHd-7V2p5fjB5e{N{J-%mAemrbT6 z%bcq8y=!V99HI(#oU7ID2X#dc*mSxHDJ9sn0m@cUOxdS3=1S;KU*fwCRd1`bXzB8k zPti9aCi4U?^DVZXl3$ptJRc!qG;4#7a6pD-6Hd&shK3wF zp39Qa$7ny`+~>%7l>3q^n2VcjDfiq$)yKZg%j&^HRFUCLdGrL5WKBmg657&sL<13c zLzGS!VIrtU1fKwMU7~RzS(b_goo!K=qh)3?JLpmr{6e>a1!3QuQ?1s&RVIy9%Tgbf z0!n;B3Kyb?L9Voin%{Gr=Y<{_fQgRC3FJUCm0p5L)X>}$7i-LCP2F?#SX}y9;WDmQ z+?hj5#Y@giIr;DN-5(YGFsG$LN$uxa_9BvO+;6WH19)0NMsshG(}FW?HB| zr|hFliPosdXg zidaL8@k0dkqpxH#?i#kMZq`BMRIZtAc3 z zebUjhFLB!RTj(Ehv70Im?NQ(eD|6Ds!f_2kBQmoD9G zS2RL#g#KXeZwq=6tzzr!uR1`Ip`=J2Bpo7-vo}B|B0oHY@sdz2qC10hiH@atLyyh5 z$qs}l2V;>HHLmOIF_$H?|F5e;IKw~~3VQ#=UX?ffqj#?Wn!=-bq_)FhWovB{2d0~k z-X7LypBd&qF8oh-*Z^$CUQGnJOub$D*hOMD{iXU!O%+=}z;D#geg6r{x}REFN4Y(C 
z>3TCR_4vZ8rQP~Rj}1@+)rYuo8CJ}$06Y@PMJ5Ym9^sRsmqtR@1I|U~<}*`^-Wo~J zoK)WM_d_E9Ee1@$?XGV4&vW!rOWQF9$2dH>pORPUJ+|ava&}|rwHc}}CHaA4g0?h% zr9Hwx9=)TrtI1J#?voVwdLQU#qz0<~6o%L)#mUOgK=N=TN`eJD3UCy6J-pC)U30SU zGHUgA)ZPn7HEwDZJs~gmIWKo2cMaYPDvG_00i=VM#UX$C-ui;qojd7?nCY;Ini7TT z99d{GJ-QZDA}6%^D)hF4hV-Xat{yyiHt9AF#bpiqt{D2e4(ZfN?wt$8`h;QgaA-JO zDE`k*33z(?hZyl)+exXo`$eLA%~&v!9ByJoE0WTBm?$F2he!yL)u&q*Q}SV&BQwQ_ zs}Q(9UjWi4QN09y>c_j4VLdy^Ge?gluQuNzXttX*NQLpLQGd6cJOJJRnR&t0r>R9z z!sJeEfa0Dr7ZNN9J6s1!DcYx!SfpmlM6_m95_%g#B%B?80&f_sRP>v!vYwaQ5f|H; z-mop$Fx%WhuY&<<}njvse;qo!avgA;qj||A{of;pBNh6wJ+NWnIv^yBE zVbA))%VNttDc9a^D_N6*@@IPme7TDizT9FG*uFrk&AJmVe9JDyiz%VqkWz3VXYKCV zy}Q)r_os0f?`8*+;V0%C_~jee361YAeqzVp;@;(>uT0m#1!P5 zOuEOEE>8UVr?>RS9jNs~z3}6g77BPBB}#xZ~moz=SFv*{t2h zL(d=68>?{ax4qeM8$61&ZRvPIEmol-4KtV~$A=#ZF`KUa zVsu7iocB}=YkeE%>3!vw@qYMZJ9io#xajYsx6?!dd5v=}EHz{6X)jG2Tpf#4q2L0a z{eByRO$&{2%!86v7|>Z_danS;S*CuP=|%5Z-tK?`pTz|tl21B^on*-g2Kz$ZO;brhQMH-T?sqT z$=xOK9fIW>X3U}RsqQ4OiMKTxQ$wNFs)NKhOYrdHpN8h;TXv()&sO21r(T2w5q(J* z3!wnyPKiat2fz9a2EN~EJmTy>zm4duFP+LB%qe+JoouQmiTJZ)TWaE zVBq_SM5HVuNa!z;V7Fo3q1UALc;C*Q)Bxwjn1 z{Xck_%ysk2_BiNIVxv5|J8Lb>jX2l~V`H_tyYrSTH2&iq1~;VmU-0?t_WzK>Ry@h< zPt<6R8RJtEJPc4v8<)C$cGZ3hB-|Sl@xcd+;bGSEX0VY&tApZUu)Km$P(l5?cT)gT z$T4e%uaPuxHwei=08YtrIqL{AfIo6fc_2;2)p;%Q# z24O*nB=N)w znH_$LY^XBoF|P;$L|C(|7G|kH)_Y=+YP#)G!n+7ru+9jdmQ<}YS^b6iSe|9Y(7K=~ zJ0RQf^LpA>eMF{$GEe0gM`*%}YH-w2ZN zvxGy761?q&y471 z>KyhkK6?t|6T6MOI;tNis1dyFSRZ;}x8oi;*#2Jb*4L`2YknX#vEf^+pAP74{BM9&n|%6y5Q)YXhiN^dp; z6X_6@Pr{VFdt=){J_{R}DHD5Dnrs2xEYo-aR!%~WKvGz8%&eJ?_s#UmxKT2#${gpa zO4RO`#WnHwzb&3q%>T~8j6*)qTT#Iu+1}45)lU|QLO+)EkLow8Q;{u8QHd3p@~fZb zuiedg(zeI}@{0miT%UAbHqI9AxtSNdTz{7F9;}RgaJh!H-bUgAl_=S)Smd+C+ipfI z2(5g_HU1`H5+!v_v9e8Sv0P26kNvIvdwz0uA*--qnFIVi^vJqdn?e3WY5GRL#<{ec z?xg6v9yt)VYt6S_jbfvkKXF>gb+luY2OvaFg1X8QvXa;&K1+ zFzTp9qFJ#N!ILOpeOvP70{;Cl`UheA&%@}8K9Dd9)HN}+`0+}8eMeM?pDn%@;I!k+ zKZr#5;|(>WpyzKt-F0oqW-#C-(kyoG)BLM1-*LR2t<&FqG-LF~!-eJSQq0&CK}sOhg4xFQC0Qs~AY`ZRERE 
z6~A5j+}RfYzb4-f$nDQeGxccdIca204O>@m>Du*Z#{YMTWC?oT*KIV;2sq463Q;i~ z9H{-5MOT1>FP}5uukTqf06-8B4*WwI5X_ga`ys!gQGv$5!sZZF#Ux{MU?mqbpi-JStf3C6giKA(UW$L}W8Kkibo z^?B#}hEwYXLkLdmvjr)Kw09r$t$qPaXmb&-n_CS(Xx?PJ@#dho)TZjrgJ=}3M_W`d zLA_!w>a1(s|`Kt&N8tzB%IL)%Un5**&Q9a4JO^79A=6yvT z_jC8n8*jD>Uw;@Hx&nN}KgafxpMdif=pAUt-7*5zRa@?87Vdbl7BDeVXofXt8sHvH za2d=T3P-|hBHqqhAoZ=3Iyy?s@Y9&Cg&LW8+<2M>+{M6q9u7bXe4MT^0{~Di-td5f zNKz1$t}%&KT4Yl-()l5G^XY{4^w^?<3$6uyez%5BOl}FAhQ*!S?iP*MXL%erntuvs zwXP=Vf7oYWV#3wcWh^b#v9>B6_+Xxq)+H|$9sT^*@Y8L%kqGGl5dj9_$OjzOA|roj zdZK0yF{i4-3+-L^3N37{WQJx=%ejiYhKhR>vv91qEwFK_xD~V9%Ypm#`2Dh=H7)u3a9i7oude`lx6U{6PZh^{e&3d^ z%D98bC?*v|`xXY5EUg(^+xI!T$}b_>f)G={DueJejOAo~{+TOI@_ne1%~{RI)rfr+ zK01fn|DcQRaOr(Ujj+sFGB`k(!2tpV4Fe7kXwdfn!BlmC#=vG5#Udl;U}aMQM~9fW za`=nu_yGS94H6%s)D3m2P-_zT%&++y|DB3%Tv@F=CCg4sX~H+L3`W2P@xc)GW;WgA zpTgX!&8?@Co><(WNId1|3ik^t<~XcoOWf~Y!&d<6TWfMrdeZQW zOdgfBWU(D6KHCd$D%oRiYIbm%+M?o;GR7bms?ry4asN2olU|c#*p(f|b1_iWZ-5w6=^!g_wV0>m|4P+TG*Zto1bOuRKbdRSt$6}VwOr2WQr z3RVuI6RFo|>}Cry<|1Ir;&A3^$U*rP0M?Q*+<{=xha6M3`v^Rq1fDixURI44ieuXo z_P*V5%#7asMvD&(UniW?*b(7E;Ccl%7jjjcU%!g>R_!?xSvHBe4}Mu?3YCBjq}1Sv zb!2biPT)y%)3^@d>>X=OejOH8kFGYM26AmH0ySyalL&xDkRB)-=87YPfL9#;-D(j_l&*odG*&J}1l*z3@N94(M@ z1sYQ_vKSqMRYLhi1Qwg9xJn}oyO=6AIi;Ed1&7aDM_4jrDHBtBr?3A#j0lONRXV!q z)8bKF`c<-*wx&(PUfLQLCRzvnMk6@?g{I4YiSIXgKQv+434L0V3uJ!{i??mN+WF=j zt*5?&)F^Y&9TNLU{Z@}@&gTGK&YtcMAg;Eju_f#$h=wexvOt zk&+)3O&1>a#&yP!q2&PmgYOXMeKaqR(;fAKPU$e@3gF6U5q_j8i;)Gr4#3<@Z1i}x z&-oY*y0(hGYLg+y^qlgxt-={}mTg4BtLliOc4Zv){)@^4tqYmsR`k*Sl0{TfD<(C8 zxz-@@aS+l5I(EO5NCP!nU~$Ju%xm}AHUoQ+*2x?Ol*>?$4YfQLER@g{#09VAz4PU57l3U^41s-$R{J0My7}Pg#U7mi}Sy zi1a7{kgmnIl47~4<`Xy4eV+pw-1F}HQ)xL71eC0UZ3?YIS)1Hg_~tj}IM5Q4LNv*p zXIp2aDLhH>jxOs{l1!uuKKthsYlV;eUyYh;dBQDCG$CcUT5 z&0jk&qrYV)v=xe5vV0waFa2z?%7<@SuPMT%LyKRoX;U2JHF|SxYiJfdber> zTY3s-OD$JajT$EKtJY}=Aar+sa)PsBb&7hO4rV>?!PzHM)S!@xHL$$-l2AThi@W+e z`m6XpC+~=V454ii9#F9q9kC>5-dI;Ucv3HfNlly2$5#77Q)XXy|7J%GCt;?0_GlXK 
zo>gFI&eKmx=>_FE;M}7qX#RVUk}><5d4(?%eX6YGzMoOHjYiC~5)92d^xaZAL3wOA z78Lo-e-q*@xdOa^a1{N@4-v8e41_%tgt^aetNn`nF3q(p*K@Ch1CzO&CdmcXvmtVa z>?>0O^p$7F#M=d~*1X6q3~0Hfqi`g&LB4SX zKqLpP=vottbY%N0SjX;&5*>RNr*d{5KcU_VR54yvNG?92SCMfZ&*cSjw+c9`6 zM3DR$E{u0h3&!XhB&fy=%eiSI3|m0a!U1BmT-cNokUn96?QjyDWOx&;0I(_CM~xMN zXoFy%GY0!yNKHhQkeCcPNC$0LL@hWUqg4>Vkkwgu+&5=*-o@1@|Ye|9CaaZoG!v}j|ZCRh`#Se&5X z$k*)uCM`kRabL@?e1EbQFV>scHSVM*-Oonyza#&flORHk3yzNgC8`vn;K9Nd3x+0 zwryn2D%tMG2+rDo{Brs#1>NqKJ5@2;JdG6sUc7p3EU88^1x{+dUeSdk-rRkco7W0$ z4H`dFG~>_))G(0(3Zd-}#@ehC?wELW`I=*88pbKC z9%A(Kx%))5{*Os~tfA2p$dYwzs);KyXFRL3q)+nJwtx=_d~1R$c~Cpw)LvA$lB2IZ zS|D31#o-NVBs#+L)_q=wr0EK=R@7Z{4 zFI%uN9!i2Xb#0u+(N97HB?o5L6_sA(i*QuK{1`49zBT!P$lV$>?XxqB%-g!aBEnsc z1|!6&Z4o87AEbfAn$z!M4HA^Uci!2DzxK9&%C1zAr$(kn>Tf>)7!yfbe{2=0U%whv zR!$OeMEy&FoP4Q0sQ&^1(g%G{2K8J1_jpMQK~upE;Oiq{pt+Wj=g^JWbs7fLp41o3 z4F!E}#r;CE^n9tVBjc3zP6p=~uVYr{okKx&YSN{LlDnd4Ta#dauop3ed;i0pry%>{ ziG(YhyB+7`MymP0>1bgqIoB>_*YY&+&b*E9`!N;fRk9|wVT0t6TGO~UIk`=tQ#!0X zJzBuwP_TQucb~xYV|?_Sb85z#C@DXJ7f(nIqn16Vk2M#3PPc{HzsK<=WnkHh#Jt2q z7z@wF_E)7|y=j)swRmigIqNYSXJqKLJZTb|Th*j>M@XwD)C<#Cg=vu)Da35o<~8tN z`_$JH5R_hl|Ar9Hfzeua@1%seGTEwCF+bL2aG#`^F87ygU4s~Cw(0tVaFT_-_gY; zoU}Tg)=|=3u49Hdy^yk}akD_N9ZYfJWUEOl?$ls}Vl}zDOeXc{%BECS%t(5PShAP5 zCLe8<=~@#F`3^o+6|8JKSU-E2)9a2T_9$A&>^a?5P}P#CV90IIsNIj7QZ;j%b7UL| zC&(nv9e+j?aZ2VDBtq<}H}sZQg-Wpm5znRQ53ob0x=+J9(7uh-x}-VyzeL zRZvmreNjMSAFPb&!aYz9{0tqTY>VFyAhi2&Ls#Zr)NFY4SSz{{>+X(nu0zxFblx#W zzT3l0)8|`@^KQ~@qM~oG_R*-C+`!hSJe`aZLfJEyf-aGf$C05ziTk>tK#tK3ctRyj1a&9fSSGpk4kb$t6%+A@}bx|_iSuwc& z%oZi{a#JSVCVav!yY*QT%X$=ys313_v;@62Dk}rtRMtwd@ z|9m?AE-)bJ$D#hhm1ObzQ4dK!ZJ>i$_kW2U*Zcl=!1E`^L>I=_{$0TG?G=DSp Date: Sat, 27 Apr 2024 16:43:31 +0800 Subject: [PATCH 05/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb 
b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index a368a198de..4d9f212dcc 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -215,10 +215,13 @@ "The confusion matrix displays the correspondence between the predicted results of a model and the actual labels in the form of a table, helping us understand the model's performance on different classes.\n", "\n", "The structure of the confusion matrix table is as follows:\n", - " \n", - " Predicted Positive Predicted Negative\n", - "Actual Positive True Positive (TP) False Negative (FN)\n", - "Actual Negative False Positive (FP) True Negative (TN)\n", + "\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/confusion_matrix.png\n", + "---\n", + "name: confusion_matrix\n", + "---\n", + "confusion matrix\n", + ":::\n", "\n", "The meanings of the four values are as follows:\n", "True Positive (TP): The number of positive instances correctly predicted as positive by the model.\n", From 992975f7bb9b51c18b9de378b725be7e0fbae58c Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sat, 27 Apr 2024 23:33:48 +0800 Subject: [PATCH 06/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 319 ++++++++++++------ 1 file changed, 208 insertions(+), 111 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 4d9f212dcc..6c5b3fe61c 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -49,11 +49,7 @@ "\n", "### Overview\n", "\n", - "Remember that the main objective of any machine learning model is to generalize the learning based on training data, so that it will be able to do predictions accurately on 
unknown data. Here are a few concepts: the first is 'Hypothesis', the second is 'Truth'. When we obtain data and train it, we propose a hypothesis, and the process of forcing the hypothesis to be as close to the truth as possible is our training process. \n", - "\n", - "### Fitting process\n", - "\n", - "This process is called 'fitting', which means the model tries to learn the patterns, relationships, or rules in the data in order to make predictions or classifications on unknown data. Due to the existence of errors in the hypothesis, we introduce the concepts of generalization error and empirical error (training error). The generalization error represents the error in unknown samples when we fit the model to the truth. It is uncertain. On the other hand, the empirical error represents the error on the training set, and it can be determined. In order to reduce the error and approach the truth, we need model evaluation. However, due to the occurrence of overfitting, a smaller error does not necessarily indicate a better model.\n", + "Remember that the main objective of any machine learning model is to generalize the learning based on training data, so that it will be able to do predictions accurately on unknown data. Here are a few concepts: the first is 'Hypothesis', the second is 'Truth'. When we obtain data and train it, we propose a hypothesis, and the process of forcing the hypothesis to be as close to the truth as possible is our training process. This process is called 'fitting', which means the model tries to learn the patterns, relationships, or rules in the data in order to make predictions or classifications on unknown data. Due to the existence of errors in the hypothesis, we introduce the concepts of generalization error and empirical error (training error). The generalization error represents the error in unknown samples when we fit the model to the truth. It is uncertain. 
On the other hand, the empirical error represents the error on the training set, and it can be determined. In order to reduce the error and approach the truth, we need model evaluation. However, due to the occurrence of overfitting, a smaller error does not necessarily indicate a better model.\n", "\n", "As you can notice the words 'Overfitting' and 'Underfitting' are kind of opposite of the term 'Generalization'. Overfitting and underfitting models don't generalize well and results in poor performance.\n", "\n", @@ -68,26 +64,26 @@ "\n", "During the fitting process, we have an important parameter called 'bias'. It refers to the deviation of the model from the true relationship when attempting to fit the data.\n", "\n", - "#### Underfitting\n", + "### Underfitting\n", "\n", "* Underfitting occurs when machine learning model don't fit the training data well enough. It is usually caused by simple function that cannot capture the underlying trend in the data.\n", "* Underfitting models have high error in training as well as test set. This behavior is called as 'Low Bias'\n", "* This usually happens when we try to fit linear function for non-linear data.\n", "* Since underfitting models don't perform well on training set, it's very easy to detect underfitting\n", "\n", - "##### How To Avoid Underfitting?\n", + "#### How To Avoid Underfitting?\n", "* Increasing the model complexity. e.g. If linear function under fit then try using polynomial features\n", "* Increase the number of features by performing the feature engineering\n", "\n", - "#### Overfitting\n", + "### Overfitting\n", "* Overfitting occurs when machine learning model tries to fit the training data too well. It is usually caused by complicated function that creates lots of unnecessary curves and angles that are not related with data and end up capturing the noise in data.\n", "* Overfitting models have low error in training set but high error in test set. 
This behavior is called as 'High Variance'\n", "\n", - "##### How To Avoid Overfitting?\n", + "#### How To Avoid Overfitting?\n", "* Since overfitting algorithm captures the noise in data, reducing the number of features will help. We can manually select only important features or can use model selection algorithm for same\n", "* We can also use the 'Regularization' technique. It works well when we have lots of slightly useful features. Sklearn linear model(Ridge and LASSO) uses regularization parameter 'alpha' to control the size of the coefficients by imposing a penalty. Please refer below tutorials for more details.\n", "\n", - "#### Good Fitting \n", + "### Good Fitting \n", "* It is a sweet spot between Underfitting and Overfitting model\n", "* A good fitting model generalizes the learnings from training data and provide accurate predictions on new data\n", "* To get the good fitting model, keep training and testing the model till you get the minimum train and test error. Here important parameter is 'test error' because low train error may cause overfitting so always keep an eye on test error fluctuations. The sweet spot is just before the test error start to rise.\n", @@ -119,6 +115,10 @@ "Training data points \n", ":::\n", "\n", + "First, we have some training data points, to which we fit regression models of different complexity.\n", + "\n", + "The figure below shows how an over-fitting model fits the training set.
\n", + "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-overfitting.jpg\n", "---\n", "name: Over-fitting-train-ms\n", @@ -126,6 +126,9 @@ "Over-fitting model fits very well on training data\n", ":::\n", "\n", + "Next, we evaluate the same model on the test set we held out.\n", + "\n", + "As you can see, this over-fitting model performs poorly on the test set.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-overfitting-testdata.jpg\n", "---\n", @@ -134,14 +137,21 @@ "Over-fitting model fits poorly on test data \n", ":::\n", "\n", + "Now let's fit an under-fitting model on the training set.\n", + "\n", + "The figure clearly shows that the model is under-fitting.\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-underfitting.jpg\n", "---\n", "name: Under-fitting-train-ms\n", "---\n", - "Under-fitting model fits poorly on training data\n", + "Under-fitting model fits poorly on training data.\n", ":::\n", "\n", + "But we can't judge a model only by how it looks on the training data; we have to test it.\n", + "\n", + "So let's evaluate it on the test set.\n", + "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-underfitting-test-data.jpg\n", "---\n", "name: Under-fitting-test-ms\n", @@ -149,14 +159,19 @@ "Under-fitting model fits poorly on test data\n", ":::\n", "\n", + "Having seen an under-fitting and an over-fitting model, let's look at what a good fit looks like.\n", + "\n", + "Here it is:\n", "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-perfect-fit.jpg\n", "---\n", "name: Perfect-fitting-train-ms\n", "---\n", - "Perfect-fitting model fits well on training data\n", + "Perfect-fitting model fits well on training data.\n", ":::\n", "\n", + "Remember, we 
have to test it on the test set; here is the result. It fits quite well on the test set.\n", + "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-perfect-fit-test-data.jpg\n", "---\n", "name: Perfect-fitting-test-ms\n", @@ -193,9 +208,7 @@ "name: Model-complexity-ms\n", "---\n", "Model complexity v.s. error\n", - ":::\n", - "\n", - "\n" + ":::\n" ] }, { @@ -208,22 +221,80 @@ "\n", "Were there some ways that can be used to represent the bias and variance of a model?\n", "\n", - "First, when we start training, \n", + "First, once we start training, how do we evaluate the goodness of fit?\n", "\n", - "We utilize the confusion matrix to obtain a set of parameters.\n", + "The simplest way is to compute metrics that serve as practical proxies for bias and variance. Here are several commonly used ones:\n", "\n", - "The confusion matrix displays the correspondence between the predicted results of a model and the actual labels in the form of a table, helping us understand the model's performance on different classes.\n", + "Accuracy: Accuracy is a commonly used evaluation metric for classification models. It is the proportion of samples that the model classifies correctly. A higher accuracy indicates better performance. However, when the classes in the dataset are imbalanced, accuracy can be misleading: a model that always predicts the majority class may still achieve high accuracy.\n", "\n", - "The structure of the confusion matrix table is as follows:\n", + "Precision and Recall: Precision and recall are primarily used to evaluate the performance of binary classification models, especially in the presence of class imbalance. Precision represents the proportion of true positive samples among those predicted as positive, while recall represents the proportion of true positive samples among all actual positive samples. 
Together, precision and recall give a more complete picture of the model's classification performance.\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/confusion_matrix.png\n", - "---\n", - "name: confusion_matrix\n", - "---\n", - "confusion matrix\n", - ":::\n", + "F1 Score: The F1 score is the harmonic mean of precision and recall, combining the two into a single balanced score. A higher F1 score indicates better performance.\n", + "\n", + "Mean Squared Error (MSE): MSE is a commonly used evaluation metric for regression models. It is the average of the squared differences between predicted values and true values. A smaller MSE indicates better performance.\n", + "\n", + "Log Loss: Log loss is commonly used in binary or multi-class probability prediction problems. It measures how far the predicted probabilities are from the true labels. A lower log loss indicates better performance.\n", + "\n", + "These metrics are used to evaluate the performance of models in the model selection process. 
However, it's important to note that these metrics only reflect the fit of the model to a particular dataset and may not fully capture its generalization performance.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "82087a9b", + "metadata": {}, + "source": [ + "### Confusion matrix" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c1fd3833", + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [], + "source": [ + "#This is a note of confusion matrix\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.metrics import confusion_matrix\n", + "\n", + "# Create actual labels and predicted labels\n", + "actual_labels = [0, 1, 0, 1, 1, 0, 0, 1]\n", + "predicted_labels = [0, 1, 1, 1, 0, 1, 0, 0]\n", + "\n", + "# Compute the confusion matrix\n", + "cm = confusion_matrix(actual_labels, predicted_labels)\n", + "\n", + "# Plot the confusion matrix\n", + "plt.imshow(cm, cmap=plt.cm.Blues)\n", + "plt.title('Confusion Matrix')\n", + "plt.colorbar()\n", + "plt.xticks([0, 1], ['Predicted 0', 'Predicted 1'])\n", + "plt.yticks([0, 1], ['Actual 0', 'Actual 1'])\n", + "\n", + "# Display counts in each cell\n", + "thresh = cm.max() / 2\n", + "for i in range(cm.shape[0]):\n", + " for j in range(cm.shape[1]):\n", + " plt.text(j, i, format(cm[i, j]), horizontalalignment=\"center\", color=\"white\" if cm[i, j] > thresh else \"black\")\n", + "\n", + "plt.xlabel('Predicted label')\n", + "plt.ylabel('True label')\n", + "plt.show()" + ] + }, + { + "cell_type": "markdown", + "id": "c92b2136", + "metadata": {}, + "source": [ "\n", - "The meanings of the four values are as follows:\n", + "There are four values in the matrix their meanings are as follows:\n", "True Positive (TP): The number of positive instances correctly predicted as positive by the model.\n", "False Negative (FN): The number of positive instances incorrectly predicted as negative by the model.\n", "False Positive (FP): The number of negative instances 
incorrectly predicted as positive by the model.\n", @@ -243,8 +314,7 @@ "F1 Score: The harmonic mean of precision and recall, considering both the accuracy and the identification ability of the model.\n", "F1 Score = 2 * (Precision * Recall) / (Precision + Recall)\n", "\n", - "When evaluating the bias of a model, we usually consider metrics such as precision, accuracy, and F1 score. A lower F1 score may indicate that the model has issues in balancing accuracy and identification ability, but it cannot be simply equated to lower bias. By considering multiple metrics and the specific requirements of the application scenario, a more comprehensive assessment of the model's performance can be achieved.\n", - "\n" + "When evaluating the bias of a model, we usually consider metrics such as precision, accuracy, and F1 score. A lower F1 score may indicate that the model has issues in balancing accuracy and identification ability, but it cannot be simply equated to lower bias. By considering multiple metrics and the specific requirements of the application scenario, a more comprehensive assessment of the model's performance can be achieved.\n" ] }, { @@ -252,7 +322,8 @@ "id": "1130ca67", "metadata": {}, "source": [ - "## Method\n", + "## Method \n", + "\n", "Does a lower recall rate indicate better bias?\n", "\n", "**No**, a lower recall rate does not indicate better bias. In machine learning, recall rate is a metric that measures the model's ability to identify positive instances. A higher recall rate indicates that the model can better identify positive instances, while a lower recall rate means that the model may miss some true positive instances.\n", @@ -297,7 +368,86 @@ }, { "cell_type": "markdown", - "id": "5957cb79", + "id": "a98e5323", + "metadata": {}, + "source": [ + "## Interpreting the Learning Curves\n", + "\n", + "You might think about the information in the training data as being of two kinds: *signal* and *noise*. 
The signal is the part that generalizes, the part that can help our model make predictions from new data. The noise is the part that is *only* true of the training data; it is all of the random fluctuation that comes from data in the real world, or all of the incidental, non-informative patterns that can't actually help the model make predictions. The noise is the part that might look useful but really isn't.\n", + "\n", + "We train a model by choosing weights or parameters that minimize the loss on a training set. You might know, however, that to accurately assess a model's performance, we need to evaluate it on a new set of data, the *validation* data. \n", + "\n", + "When we train a model, we plot the loss on the training set epoch by epoch. To this, we'll add a plot of the loss on the validation data. We call these plots **learning curves**. To train deep learning models effectively, we need to be able to interpret them.\n", + "\n", + "Now, the training loss will go down either when the model learns signal or when it learns noise. But the validation loss will go down only when the model learns signal. (Whatever noise the model learned from the training set won't generalize to new data.) So, when a model learns signal, both curves go down, but when it learns noise, a *gap* opens between the curves. The size of the gap tells you how much noise the model has learned.\n", + "\n", + "Ideally, we would create models that learn all of the signal and none of the noise. This will practically never happen. Instead, we make a trade. We can get the model to learn more signal at the cost of learning more noise. So long as the trade is in our favor, the validation loss will continue to decrease. After a certain point, however, the trade can turn against us: the cost exceeds the benefit, and the validation loss begins to rise.\n", + "\n", + "This trade-off indicates that there can be two problems that occur when training a model: not enough signal or too much noise. 
Underfitting the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*. Overfitting the training set is when the loss is not as low as it could be because the model learned too much *noise*. The trick to training deep learning models is finding the best balance between the two.\n", + "\n", + "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "63830061", + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [], + "source": [ + "#This is a note of a learning curve by using the iris dataset in sklearn\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.datasets import load_iris\n", + "from sklearn.model_selection import learning_curve\n", + "from sklearn.model_selection import train_test_split\n", + "from sklearn.linear_model import LogisticRegression\n", + "\n", + "# Load Iris Dataset\n", + "iris = load_iris()\n", + "X = iris.data\n", + "y = iris.target\n", + "\n", + "# Split the data into training and test sets\n", + "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", + "\n", + "# Create the learning curve function\n", + "def plot_learning_curve(estimator, X, y):\n", + " train_sizes, train_scores, val_scores = learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 10), cv=5)\n", + "\n", + " # Compute the average accuracy and standard deviation for the training and validation sets\n", + " train_mean = np.mean(train_scores, axis=1)\n", + " train_std = np.std(train_scores, axis=1)\n", + " val_mean = np.mean(val_scores, axis=1)\n", + " val_std = np.std(val_scores, axis=1)\n", + "\n", + " # Plot the learning curve\n", + " plt.figure(figsize=(8, 6))\n", + " plt.plot(train_sizes, train_mean, label='Training Accuracy', marker='o')\n", + " plt.fill_between(train_sizes, train_mean - train_std, 
train_mean + train_std, alpha=0.2)\n", + " plt.plot(train_sizes, val_mean, label='Validation Accuracy', marker='o')\n", + " plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.2)\n", + " plt.xlabel('Training Examples')\n", + " plt.ylabel('Accuracy')\n", + " plt.title('Learning Curve')\n", + " plt.legend(loc='best')\n", + " plt.grid(True)\n", + " plt.show()\n", + "\n", + "# Create a logistic regression mode\n", + "model = LogisticRegression()\n", + "\n", + "# Plot the learning curve\n", + "plot_learning_curve(model, X_train, y_train)" + ] + }, + { + "cell_type": "markdown", + "id": "7338d800", "metadata": {}, "source": [ "\n", @@ -324,29 +474,6 @@ "Considering the above methods and guidelines, selecting an appropriate model capacity requires a balance between theory and practice and decision-making based on the specific problem and available resources. The ultimate goal is to choose a model that performs well on both the training data and new data, achieving good generalization ability." ] }, - { - "cell_type": "markdown", - "id": "5617ad72", - "metadata": {}, - "source": [ - "## Interpreting the Learning Curves\n", - "\n", - "You might think about the information in the training data as being of two kinds: *signal* and *noise*. The signal is the part that generalizes, the part that can help our model make predictions from new data. The noise is that part that is *only* true of the training data; the noise is all of the random fluctuation that comes from data in the real-world or all of the incidental, non-informative patterns that can't actually help the model make predictions. The noise is the part might look useful but really isn't.\n", - "\n", - "We train a model by choosing weights or parameters that minimize the loss on a training set. You might know, however, that to accurately assess a model's performance, we need to evaluate it on a new set of data, the *validation* data. 
\n", - "\n", - "When we train a model we've been plotting the loss on the training set epoch by epoch. To this we'll add a plot the validation data too. These plots we call the **learning curves**. To train deep learning models effectively, we need to be able to interpret them.\n", - "\n", - "Now, the training loss will go down either when the model learns signal or when it learns noise. But the validation loss will go down only when the model learns signal. (Whatever noise the model learned from the training set won't generalize to new data.) So, when a model learns signal both curves go down, but when it learns noise a *gap* is created in the curves. The size of the gap tells you how much noise the model has learned.\n", - "\n", - "Ideally, we would create models that learn all of the signal and none of the noise. This will practically never happen. Instead we make a trade. We can get the model to learn more signal at the cost of learning more noise. So long as the trade is in our favor, the validation loss will continue to decrease. After a certain point, however, the trade can turn against us, the cost exceeds the benefit, and the validation loss begins to rise.\n", - "\n", - "This trade-off indicates that there can be two problems that occur when training a model: not enough signal or too much noise. Underfitting the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*. Overfitting the training set is when the loss is not as low as it could be because the model learned too much *noise*. 
The trick to training deep learning models is finding the best balance between the two.\n", - "\n", - "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise.\n", - "\n" - ] - }, { "cell_type": "markdown", "id": "037aa612", @@ -388,46 +515,42 @@ "L1 and L2 regularization\n", ":::\n", "\n", + "Both are very common regularization techniques, but they are suitable for different scenarios. L1 regularization is suitable for situations that require feature selection or demand model interpretability. On the other hand, L2 regularization is more general and applicable in most cases to prevent overfitting and improve model generalization ability.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "id": "2ccb9c25", + "metadata": {}, + "source": [ "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/L1L2contour.png\n", - "---\n", - "name: explainedairegularization-ms\n", - "---\n", - "L1 and L2 regularization \n", - ":::\n", - "\n", - "\n", - "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/ridgelassoItayEvron.gif\n", - "---\n", - "name: berkeley189s21-ms\n", - "---\n", - "Different $\\beta$ and ellipses \n", - ":::\n", - "\n", - "\n", + "## Early Stopping\n", "\n", + "We mentioned that when a model is too eagerly learning noise, the validation loss may start to increase during training. To prevent this, we can simply stop the training whenever it seems the validation loss isn't decreasing anymore. Interrupting the training this way is called **early stopping**.\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/p-norm_balls.webp\n", - "---\n", - "name: p-norm_balls\n", - "---\n", - "Different p norm \n", - ":::\n", + "Once we detect that the validation loss is starting to rise again, we can reset the weights back to where the minimum occured. 
This ensures that the model won't continue to learn noise and overfit the data.\n", "\n", + "Training with early stopping also means we're in less danger of stopping the training too early, before the network has finished learning signal. So besides preventing overfitting from training too long, early stopping can also prevent *underfitting* from not training long enough. Just set your training epochs to some large number (more than you'll need), and early stopping will take care of the rest.\n", "\n", + "## Adding Early Stopping\n", "\n", + "In Keras, we include early stopping in our training through a callback. A **callback** is just a function you want run every so often while the network trains. The early stopping callback will run after every epoch. (Keras has [a variety of useful callbacks](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks) pre-defined, but you can [define your own](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback), too.)\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/elastic_net_balls.webp\n", + ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/traintestoverfitting.png\n", "---\n", - "name: ElasticNet-ms\n", + "name: EarlyStopping-ms\n", "---\n", - "ElasticNet \n", - ":::\n", - "\n", - "Both are very common regularization techniques, but they are suitable for different scenarios. L1 regularization is suitable for situations that require feature selection or demand model interpretability. 
On the other hand, L2 regularization is more general and applicable in most cases to prevent overfitting and improve model generalization ability.\n", - "\n", - "### The impact of the value of $\\lambda$ \n", + "Early stopping\n", + ":::\n" + ] + }, + { + "cell_type": "markdown", + "id": "585321db", + "metadata": {}, + "source": [ + "## The impact of the value of $\\lambda$ \n", "\n", "We notice that the objective function contains not only the regularization term but also the regularization parameter $\\lambda$.\n", "\n", @@ -466,32 +589,6 @@ "\n" ] }, - { - "cell_type": "markdown", - "id": "a1b5d9f0", - "metadata": {}, - "source": [ - "\n", - "## Early Stopping\n", - "\n", - "We mentioned that when a model is too eagerly learning noise, the validation loss may start to increase during training. To prevent this, we can simply stop the training whenever it seems the validation loss isn't decreasing anymore. Interrupting the training this way is called **early stopping**.\n", - "\n", - "Once we detect that the validation loss is starting to rise again, we can reset the weights back to where the minimum occured. This ensures that the model won't continue to learn noise and overfit the data.\n", - "\n", - "Training with early stopping also means we're in less danger of stopping the training too early, before the network has finished learning signal. So besides preventing overfitting from training too long, early stopping can also prevent *underfitting* from not training long enough. Just set your training epochs to some large number (more than you'll need), and early stopping will take care of the rest.\n", - "\n", - "## Adding Early Stopping\n", - "\n", - "In Keras, we include early stopping in our training through a callback. A **callback** is just a function you want run every so often while the network trains. The early stopping callback will run after every epoch. 
(Keras has [a variety of useful callbacks](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks) pre-defined, but you can [define your own](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/LambdaCallback), too.)\n", - "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/traintestoverfitting.png\n", - "---\n", - "name: EarlyStopping-ms\n", - "---\n", - "Early stopping\n", - ":::\n" - ] - }, { "cell_type": "markdown", "id": "03100d74", @@ -591,7 +688,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.11.4" + "version": "3.12.3" } }, "nbformat": 4, From 71208a3174305c1e45d49169cc9af15148749a2e Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 10:38:44 +0800 Subject: [PATCH 07/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 6c5b3fe61c..a8ca03008d 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -293,6 +293,9 @@ "id": "c92b2136", "metadata": {}, "source": [ + "Above, we output a confusion matrix on actual_labels = [0, 1, 0, 1, 1, 0, 0, 1] and the predicted_labels = [0, 1, 1, 1, 0, 1, 0, 0]\n", + "\n", + "Of course, here we are just demonstrating how to output the confusion matrix to understand its meaning after obtaining these two sets of data. 
In the subsequent experiment, we will explain how to obtain the desired confusion matrix through code.\n", "\n", "There are four values in the matrix their meanings are as follows:\n", "True Positive (TP): The number of positive instances correctly predicted as positive by the model.\n", @@ -300,6 +303,8 @@ "False Positive (FP): The number of negative instances incorrectly predicted as positive by the model.\n", "True Negative (TN): The number of negative instances correctly predicted as negative by the model.\n", "\n", + "For the matrix above, TP is the number of samples predicted as 1 that are actually 1, FN is the number predicted as 0 that are actually 1, FP is the number predicted as 1 that are actually 0, and TN is the number predicted as 0 that are actually 0. In this example, each of the four counts equals 2.\n", + "\n", "After understanding the meaning of the matrix, we can use the following algorithms to calculate the desired metrics:\n", "\n", "Accuracy: The ratio of the number of correctly predicted samples to the total number of samples.\n", @@ -385,7 +390,7 @@ "\n", "This trade-off indicates that there can be two problems that occur when training a model: not enough signal or too much noise. Underfitting the training set is when the loss is not as low as it could be because the model hasn't learned enough *signal*. Overfitting the training set is when the loss is not as low as it could be because the model learned too much *noise*. 
The trick to training deep learning models is finding the best balance between the two.\n", "\n", - "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise.\n" + "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise later.\n" ] }, { @@ -445,6 +450,16 @@ "plot_learning_curve(model, X_train, y_train)" ] }, + { + "cell_type": "markdown", + "id": "e5aa3d14", + "metadata": {}, + "source": [ + "First, let's look at a plot: a simple learning curve on the iris dataset from sklearn.datasets. Notice that the two curves are far apart when there are few training examples, and as the number of training examples grows, the two curves move closer together and approach convergence.\n", + "\n", + "This is how a learning curve lets us observe the fitting process." + ] + }, { "cell_type": "markdown", "id": "7338d800", From 0bc0fa02733d37a230a8d1e6e91249765d85057f Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 12:52:59 +0800 Subject: [PATCH 08/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 59 +++++++++---------- 1 file changed, 29 insertions(+), 30 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index a8ca03008d..5ce292ed1b 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -409,45 +409,44 @@ "import matplotlib.pyplot as plt\n", "from sklearn.datasets import load_iris\n", "from sklearn.model_selection import learning_curve\n", - "from sklearn.model_selection import train_test_split\n", "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.preprocessing import StandardScaler\n", "\n", - "# Load Iris Dataset\n", + "# Load the Iris dataset\n", "iris 
= load_iris()\n", "X = iris.data\n", "y = iris.target\n", "\n", - "# Split the data into training and test sets\n", - "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n", - "\n", - "# Create the learning curve function\n", - "def plot_learning_curve(estimator, X, y):\n", - " train_sizes, train_scores, val_scores = learning_curve(estimator, X, y, train_sizes=np.linspace(0.1, 1.0, 10), cv=5)\n", - "\n", - " # Compute the average accuracy and standard deviation for the training and validation sets\n", - " train_mean = np.mean(train_scores, axis=1)\n", - " train_std = np.std(train_scores, axis=1)\n", - " val_mean = np.mean(val_scores, axis=1)\n", - " val_std = np.std(val_scores, axis=1)\n", - "\n", - " # Plot the learning curve\n", - " plt.figure(figsize=(8, 6))\n", - " plt.plot(train_sizes, train_mean, label='Training Accuracy', marker='o')\n", - " plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.2)\n", - " plt.plot(train_sizes, val_mean, label='Validation Accuracy', marker='o')\n", - " plt.fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.2)\n", - " plt.xlabel('Training Examples')\n", - " plt.ylabel('Accuracy')\n", - " plt.title('Learning Curve')\n", - " plt.legend(loc='best')\n", - " plt.grid(True)\n", - " plt.show()\n", - "\n", - "# Create a logistic regression mode\n", + "# Feature scaling\n", + "scaler = StandardScaler()\n", + "X = scaler.fit_transform(X)\n", + "\n", + "# Define a logistic regression model\n", "model = LogisticRegression()\n", "\n", + "# Define the range of training set sizes\n", + "train_sizes = np.linspace(0.1, 1.0, 10)\n", + "\n", + "# Generate learning curve data using the learning_curve function\n", + "train_sizes, train_scores, test_scores = learning_curve(model, X, y, train_sizes=train_sizes, cv=5)\n", + "\n", + "# Calculate the average accuracy for the training and test sets\n", + "train_scores_mean = np.mean(train_scores, axis=1)\n", + 
"test_scores_mean = np.mean(test_scores, axis=1)\n", + "\n", "# Plot the learning curve\n", - "plot_learning_curve(model, X_train, y_train)" + "plt.figure()\n", + "plt.title(\"Learning Curve\")\n", + "plt.xlabel(\"Training Examples\")\n", + "plt.ylabel(\"Accuracy\")\n", + "plt.grid()\n", + "\n", + "# Plot the accuracy curves for the training and test sets\n", + "plt.plot(train_sizes, train_scores_mean, 'o-', color=\"r\", label=\"Training Accuracy\")\n", + "plt.plot(train_sizes, test_scores_mean, 'o-', color=\"g\", label=\"Cross-validation Accuracy\")\n", + "plt.legend(loc=\"best\")\n", + "\n", + "plt.show()" ] }, { From 5b89101af4578193827095841dd284127e5b9bc6 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 13:13:46 +0800 Subject: [PATCH 09/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 55 +++++++++++++++++-- 1 file changed, 51 insertions(+), 4 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 5ce292ed1b..b58b5d508d 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -249,14 +249,25 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 4, "id": "c1fd3833", "metadata": { "tags": [ "hide-input" ] }, - "outputs": [], + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "#This is a note of confusion matrix\n", "import numpy as np\n", @@ -395,14 +406,50 @@ }, { "cell_type": "code", - "execution_count": null, + "execution_count": 3, "id": "63830061", "metadata": { "tags": [ "hide-input" ] }, - "outputs": [], + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\model_selection\\_validation.py:547: FitFailedWarning: \n", + "15 fits failed out of a total of 50.\n", + "The score on these train-test partitions for these parameters will be set to nan.\n", + "If these failures are not expected, you can try to debug them by setting error_score='raise'.\n", + "\n", + "Below are more details about the failures:\n", + "--------------------------------------------------------------------------------\n", + "15 fits failed with the following error:\n", + "Traceback (most recent call last):\n", + " File \"d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\model_selection\\_validation.py\", line 895, in _fit_and_score\n", + " estimator.fit(X_train, y_train, **fit_params)\n", + " File \"d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\base.py\", line 1474, in wrapper\n", + " return fit_method(estimator, *args, **kwargs)\n", + " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", + " File \"d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\linear_model\\_logistic.py\", line 1246, in fit\n", + " raise ValueError(\n", + "ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0\n", + "\n", + " warnings.warn(some_fits_failed_message, FitFailedWarning)\n" + ] + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], "source": [ "#This is a note of a learning curve by using the iris dataset in sklearn\n", "import numpy as np\n", From 648a4210f8973f25426ee7009a08d86bd98e7033 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 13:29:20 +0800 Subject: [PATCH 10/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 47 +++++++++---------- 1 file changed, 21 insertions(+), 26 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index b58b5d508d..cef35b22f9 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -404,9 +404,17 @@ "We'll look at a couple ways of getting more signal out of the training data while reducing the amount of noise later.\n" ] }, + { + "cell_type": "markdown", + "id": "1e5c8a70", + "metadata": {}, + "source": [ + "Let's first take a look at a learning curve. In this part we use the iris dataset that ships with scikit-learn."
+ ] + }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 6, "id": "63830061", "metadata": { "tags": [ @@ -414,31 +422,6 @@ ] }, "outputs": [ - { - "name": "stderr", - "output_type": "stream", - "text": [ - "d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\model_selection\\_validation.py:547: FitFailedWarning: \n", - "15 fits failed out of a total of 50.\n", - "The score on these train-test partitions for these parameters will be set to nan.\n", - "If these failures are not expected, you can try to debug them by setting error_score='raise'.\n", - "\n", - "Below are more details about the failures:\n", - "--------------------------------------------------------------------------------\n", - "15 fits failed with the following error:\n", - "Traceback (most recent call last):\n", - " File \"d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\model_selection\\_validation.py\", line 895, in _fit_and_score\n", - " estimator.fit(X_train, y_train, **fit_params)\n", - " File \"d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\base.py\", line 1474, in wrapper\n", - " return fit_method(estimator, *args, **kwargs)\n", - " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", - " File \"d:\\software\\environment for paper\\Lib\\site-packages\\sklearn\\linear_model\\_logistic.py\", line 1246, in fit\n", - " raise ValueError(\n", - "ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 0\n", - "\n", - " warnings.warn(some_fits_failed_message, FitFailedWarning)\n" - ] - }, { "data": { "image/png": "", @@ -452,6 +435,18 @@ ], "source": [ "#This is a note of a learning curve by using the iris dataset in sklearn\n", + "\n", + "import warnings\n", + "import numpy as np\n", + "import matplotlib.pyplot as plt\n", + "from sklearn.datasets import load_iris\n", + "from sklearn.model_selection import learning_curve\n", + "from sklearn.linear_model import LogisticRegression\n", + 
"from sklearn.preprocessing import StandardScaler\n", + "\n", + "# Ignore warnings\n", + "warnings.filterwarnings(\"ignore\")\n", + "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from sklearn.datasets import load_iris\n", From 832e17cd2bca21f7fe40adf5cdf8b97204911734 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 13:37:44 +0800 Subject: [PATCH 11/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index cef35b22f9..4ce9298570 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -498,7 +498,9 @@ "source": [ "First of all, let's take a look at a plot, this is a simple learning curve using an iris dataset in sklearn.dataset. We can simply notice the two curve we plot fells far apart when we have less examples, and when we enlarge the training examples we can see the two lines are approaching convergence.\n", "\n", - "This is how we can see the fitting process using learning curve." + "Why do the two curves converge?\n", + "\n", + "To learn patterns that generalize, a model needs a sufficient number of training samples. Suppose the data follow some function y = f(x); a learning algorithm essentially estimates f from many (x, y) pairs. With too few pairs, the estimate of f can match the training points yet generalize poorly, so the training and cross-validation scores stay far apart. As more pairs are added, the estimate of f improves and the two curves move closer together. This is how sample size affects the degree of fitting."
] }, { From 71766084aafc67322d5c1fcbe2918f7a5913a501 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 16:21:02 +0800 Subject: [PATCH 12/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 190 ++++++++---------- 1 file changed, 81 insertions(+), 109 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 4ce9298570..7b10cfda92 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -62,8 +62,14 @@ "Over-fitting and under-fitting in regression\n", ":::\n", "\n", - "During the fitting process, we have an important parameter called 'bias'. It refers to the deviation of the model from the true relationship when attempting to fit the data.\n", - "\n", + "An important quantity in the fitting process is 'bias'. It refers to the systematic deviation of the model from the true relationship when attempting to fit the data.\n" + ] + }, + { + "cell_type": "markdown", + "id": "a657c169", + "metadata": {}, + "source": [ "### Underfitting\n", "\n", "* Underfitting occurs when machine learning model don't fit the training data well enough. It is usually caused by simple function that cannot capture the underlying trend in the data.\n", @@ -104,10 +110,10 @@ "id": "23a0cb84", "metadata": {}, "source": [ - "\n", "### A simple example of linear regression \n", "\n", "This is a simple graphical representation of linear regression training. 
\n", + "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-datapoints.jpg\n", "---\n", "name: Datapoints-ms\n", @@ -117,67 +123,33 @@ "\n", "First we have some data points, then we're going to train it by linear regression.\n", "\n", - "This shows how an over-fitting model fits the trainingset.
\n", + "**Over-fitting model**\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-overfitting.jpg\n", - "---\n", - "name: Over-fitting-train-ms\n", - "---\n", - "Over-fitting model fits very well on training data\n", - ":::\n", - "\n", - "Of course we are going to fit the model on the testset we determined.\n", - "\n", - "AS you can see, this over-fitting model can't fit well on the testset.\n", - "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-overfitting-testdata.jpg\n", - "---\n", - "name: Over-fitting-test-ms\n", - "---\n", - "Over-fitting model fits poorly on test data \n", - ":::\n", - "\n", - "Let's fit an unerfitting model on the trainingset.\n", + "| ![Over-fitting-train-ms](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-overfitting.jpg) | ![Over-fitting-test-ms](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-overfitting-testdata.jpg) |\n", + "|:--:|:--:|\n", + "| Over-fitting model on training data | Over-fitting model on test data |\n", "\n", - "We can see the result very clearly on the picture that we'er 'under-fitting'.\n", + "As we can see, the over-fitting model fits the training data very well but fits the test data poorly.
\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-underfitting.jpg\n", - "---\n", - "name: Under-fitting-train-ms\n", - "---\n", - "Under-fitting model fits poorly on training data.\n", - ":::\n", + "**Under-fitting model**\n", "\n", - "But we can'tonly feel how it fits, we have to test it.\n", + "| ![Under-fitting-train-ms](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-underfitting.jpg) | ![Under-fitting-test-ms](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-underfitting-test-data.jpg) |\n", + "|:--:|:--:|\n", + "| Under-fitting model on training data | Under-fitting model on test data |\n", "\n", - "Then let's test it on the testset.\n", + "The under-fitting model, by contrast, fits both the training data and the test data poorly.\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-underfitting-test-data.jpg\n", - "---\n", - "name: Under-fitting-test-ms\n", - "---\n", - "Under-fitting model fits poorly on test data\n", - ":::\n", + "**Perfect-fitting model**\n", "\n", "After seeing the under-fitting model and the over-fitting model we are eager to know what is a good-fitting model.\n", "\n", - "Here we are\n", + "| ![Perfect-fitting-train-ms](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-perfect-fit.jpg) | ![Perfect-fitting-test-ms](https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-perfect-fit-test-data.jpg) |\n", + "|:--:|:--:|\n", + "| Perfect-fitting model on training data | Perfect-fitting model on test data |\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-perfect-fit.jpg\n", - "---\n", - "name: Perfect-fitting-train-ms\n", - "---\n", - "Perfect-fitting model fits well on training data.\n", - ":::\n", - "\n", - "Remember, we have to 
test it on
the testset, and the result comes right here. It fits quit well on the testset.\n", + "A perfect-fitting model fits well on both the training data and the test data.\n", "\n", - ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/bias-variance-perfect-fit-test-data.jpg\n", - "---\n", - "name: Perfect-fitting-test-ms\n", - "---\n", - "Perfect-fitting model fits well on test data\n", - ":::\n" + "When overfitting occurs, the model achieves high accuracy or low error on the training data but performs poorly on test data or on new data in practice. Underfitting, in contrast, means that the model cannot capture the underlying relationships or patterns in the data." ] }, { @@ -216,7 +188,6 @@ "id": "2f23ec03", "metadata": {}, "source": [ - "\n", "## Metrics\n", "\n", "Were there some ways that can be used to represent the bias and variance of a model?\n", @@ -225,15 +196,15 @@ "\n", "The simplest way is to output some metrics that can substitute for bias and variance. Here are several metrics that can be used for calculation:\n", "\n", - "Accuracy: Accuracy is a commonly used evaluation metric in classification models. It represents the proportion of correctly classified samples in the predictions made by the model. A higher accuracy indicates better performance. However, when there is class imbalance in the dataset, accuracy may underestimate the model's performance.\n", + "**Accuracy**: Accuracy is a commonly used evaluation metric for classification models. It is the proportion of samples that the model classifies correctly. A higher accuracy indicates better performance. However, when the classes in the dataset are imbalanced, accuracy can be misleading: a model that always predicts the majority class already scores highly, so accuracy tends to overstate the model's performance.\n", "\n", - "Precision and Recall: Precision and recall are primarily used to evaluate the performance of binary classification models, especially in the presence of class imbalance.
Precision represents the proportion of true positive samples among those predicted as positive, while recall represents the proportion of true positive samples among all actual positive samples. Precision and recall can help provide a comprehensive evaluation of the model's classification performance.\n", + "**Precision and Recall**: Precision and recall are primarily used to evaluate the performance of binary classification models, especially in the presence of class imbalance. Precision is the proportion of true positive samples among those predicted as positive, while recall is the proportion of true positive samples among all actual positive samples. Together, precision and recall give a more complete picture of the model's classification performance.\n", "\n", - "F1 Score: The F1 score is the harmonic mean of precision and recall, providing a balanced assessment of a model's accuracy and recall performance. A higher F1 score indicates better performance.\n", + "**F1 Score**: The F1 score is the harmonic mean of precision and recall, providing a balanced assessment of a model's precision and recall. A higher F1 score indicates better performance.\n", "\n", - "Mean Squared Error (MSE): MSE is a commonly used evaluation metric in regression models. It represents the average of the squared differences between predicted values and true values. A smaller MSE indicates better performance.\n", + "**Mean Squared Error (MSE)**: MSE is a commonly used evaluation metric for regression models. It is the average of the squared differences between predicted values and true values. A smaller MSE indicates better performance.\n", "\n", - "Log Loss: Log loss is commonly used in binary or multi-class probability prediction problems. It measures the difference between predicted probabilities and true labels.
A lower log loss indicates better performance.\n", + "**Log Loss**: Log loss is commonly used in binary or multi-class probability prediction problems. It measures the difference between predicted probabilities and true labels. A lower log loss indicates better performance.\n", "\n", "These metrics are used to evaluate the performance of models in the model selection process. However, it's important to note that these metrics only reflect the fit of the model to a particular dataset and may not fully capture its generalization performance.\n", "\n" @@ -251,11 +222,7 @@ "cell_type": "code", "execution_count": 4, "id": "c1fd3833", - "metadata": { - "tags": [ - "hide-input" - ] - }, + "metadata": {}, "outputs": [ { "data": { @@ -309,26 +276,30 @@ "Of course, here we are just demonstrating how to output the confusion matrix to understand its meaning after obtaining these two sets of data. In the subsequent experiment, we will explain how to obtain the desired confusion matrix through code.\n", "\n", "There are four values in the matrix their meanings are as follows:\n", - "True Positive (TP): The number of positive instances correctly predicted as positive by the model.\n", - "False Negative (FN): The number of positive instances incorrectly predicted as negative by the model.\n", - "False Positive (FP): The number of negative instances incorrectly predicted as positive by the model.\n", - "True Negative (TN): The number of negative instances correctly predicted as negative by the model.\n", + "- **True Positive (TP)**: The number of positive instances correctly predicted as positive by the model.\n", + "- **False Negative (FN)**: The number of positive instances incorrectly predicted as negative by the model.\n", + "- **False Positive (FP)**: The number of negative instances incorrectly predicted as positive by the model.\n", + "- **True Negative (TN)**: The number of negative instances correctly predicted as negative by the model.\n", "\n", "As for the matrix we have above,
TP is where we predicted as 1 and actually it is 1. FN is the acount that we predicted as 0 but actually it is 1. FP is predicted as 1 but actually it's 0. TN is we predicted as 0 and it's actually 0.\n", "\n", "After understanding the meaning of the matrix, we can use the following algorithms to calculate the desired metrics:\n", "\n", - "Accuracy: The ratio of the number of correctly predicted samples to the total number of samples.\n", - "Accuracy = (TP + TN) / (TP + TN + FP + FN)\n", + "**Accuracy**: The ratio of the number of correctly predicted samples to the total number of samples.\n", + "\n", + "**Accuracy = (TP + TN) / (TP + TN + FP + FN)**\n", + "\n", + "**Precision**: The proportion of true positive predictions among the predicted positive instances, measuring the prediction accuracy of the model.\n", + "\n", + "**Precision = TP / (TP + FP)**\n", "\n", - "Precision: The proportion of true positive predictions among the predicted positive instances, measuring the prediction accuracy of the model.\n", - "Precision = TP / (TP + FP)\n", + "**Recall**: The proportion of true positive predictions among the actual positive instances, measuring the model's ability to identify positives.\n", "\n", - "Recall: The proportion of true positive predictions among the actual positive instances, measuring the model's ability to identify positives.\n", - "Recall = TP / (TP + FN)\n", + "**Recall = TP / (TP + FN)**\n", "\n", - "F1 Score: The harmonic mean of precision and recall, considering both the accuracy and the identification ability of the model.\n", - "F1 Score = 2 * (Precision * Recall) / (Precision + Recall)\n", + "**F1 Score**: The harmonic mean of precision and recall, considering both the accuracy and the identification ability of the model.\n", + "\n", + "**F1 Score = 2 * (Precision * Recall) / (Precision + Recall)**\n", "\n", "When evaluating the bias of a model, we usually consider metrics such as precision, accuracy, and F1 score. 
A lower F1 score may indicate that the model has issues in balancing accuracy and identification ability, but it cannot be simply equated to lower bias. By considering multiple metrics and the specific requirements of the application scenario, a more comprehensive assessment of the model's performance can be achieved.\n" ] @@ -356,15 +327,27 @@ "\n", "Analyzing the difference between training error and validation error, Holdout Method,Cross-Validation, and Bootstrapping are all viable approaches.\n", "\n", - "So what are these method?\n", - "\n", + "So what are these methods?\n" + ] + }, + { + "cell_type": "markdown", + "id": "b5e50cc2", + "metadata": {}, + "source": [ "### Holdout Method\n", "\n", "Splitting the dataset into mutually exclusive training and testing sets, using the training set to train the model, and then evaluating the model's performance using the testing set. By comparing the performance on different models using the validation set, we can select the best-performing model. The sampling criteria require stratified sampling, which means dividing the data proportionally based on data types. \n", "\n", "However, since different partitioning methods yield different data samples, the results of model evaluation also differ.
Typically, we choose a large portion of the dataset (70-80%) as the training set and the remaining portion as the testing set.\n", - "By splitting the dataset, we can observe that the testing set only represents a small portion of the total dataset, which can lead to unstable evaluation results.\n", - "\n", + "Because the testing set is only a small portion of the total dataset, the evaluation results can vary noticeably from one split to another.\n" + ] + }, + { + "cell_type": "markdown", + "id": "c3090247", + "metadata": {}, + "source": [ "### Cross-Validation\n", "\n", "Splitting the dataset into K mutually exclusive subsets (K-fold cross-validation), using each subset as a validation set in turn and the remaining subsets as training sets to train the model and evaluate its performance. By averaging or aggregating the results from K validations, the best model can be selected.\n", @@ -373,13 +356,19 @@ "\n", "Cross-validation provides high precision, but it can be time-consuming when dealing with large datasets.\n", "\n", - "In general, using 10-fold cross-validation is sufficient to indirectly assess the generalization ability of a model.\n", - "\n", + "In practice, 10-fold cross-validation is a common choice and is usually sufficient to estimate the generalization ability of a model.\n" + ] + }, + { + "cell_type": "markdown", + "id": "81135eee", + "metadata": {}, + "source": [ "### Bootstrapping\n", "\n", "Bootstrapping, also known as resampling or sampling with replacement, is a technique where each time a copy of a sample is selected from a dataset containing m samples and added to the resulting dataset. This process is repeated m times, resulting in a dataset with m samples. (Some samples may appear multiple times in the resulting dataset.)
This resulting dataset is then used as the training set.\n", "\n", - "Since the sampling is conducted independently, the probability that a specific sample is never selected in m iterations of sampling is [(1-1/m)^m]. As m approaches infinity, i.e., m→∞, the limit of this probability is 1/e, where e is the base of the natural logarithm and approximately equal to 2.71828. Therefore, when m is sufficiently large, the probability that a specific sample is never selected in m iterations of sampling is close to 1/e." + "Since each draw is made independently, the probability that a specific sample is never selected in m draws is $(1-\\frac{1}{m})^m$, and $\\lim_{m \\to \\infty}(1-\\frac{1}{m})^m = \\frac{1}{e} \\approx 0.368$, where e ≈ 2.71828 is the base of the natural logarithm. Therefore, when m is sufficiently large, on average about 36.8% of the original samples never appear in the bootstrap training set.\n" ] }, { @@ -414,13 +403,9 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 1, "id": "63830061", - "metadata": { - "tags": [ - "hide-input" - ] - }, + "metadata": {}, "outputs": [ { "data": { @@ -437,12 +422,6 @@ "#This is a note of a learning curve by using the iris dataset in sklearn\n", "\n", "import warnings\n", - "import numpy as np\n", - "import matplotlib.pyplot as plt\n", - "from sklearn.datasets import load_iris\n", - "from sklearn.model_selection import learning_curve\n", - "from sklearn.linear_model import LogisticRegression\n", - "from sklearn.preprocessing import StandardScaler\n", "\n", "# Ignore warnings\n", "warnings.filterwarnings(\"ignore\")\n", @@ -508,7 +487,6 @@ "id": "7338d800", "metadata": {}, "source": [ - "\n", "## Capacity\n", "\n", "A model's **capacity** refers to the size and complexity of the patterns it is able to learn. For neural networks, this will largely be determined by how many neurons it has and how they are connected together.
If it appears that your network is underfitting the data, you should try increasing its capacity.\n", @@ -519,15 +497,15 @@ "\n", "Determining an appropriate model capacity is a crucial task in model selection. Here are some common methods and guidelines to help determine the right model capacity:\n", "\n", - "Rule of thumb: In general, if the dataset is small or the task is relatively simple, choosing a lower-capacity model may be more suitable to avoid overfitting. For larger datasets or complex tasks, a higher-capacity model may be able to better fit the data.\n", + "**Rule of thumb**: In general, if the dataset is small or the task is relatively simple, choosing a lower-capacity model may be more suitable to avoid overfitting. For larger datasets or complex tasks, a higher-capacity model may be able to better fit the data.\n", "\n", - "Cross-validation: This method has been mentioned earlier in the previous text, and it is an extremely important approach in model selection. Therefore, it is necessary to mention this method multiple times and gain a deeper understanding of it.\n", + "**Cross-validation**: This method was introduced above and is one of the most important approaches in model selection. Comparing the cross-validated performance of models with different capacities helps identify a suitable capacity.\n", "\n", - "Learning curves: Learning curves can help determine if the model capacity is appropriate. By plotting the performance of the model on the training set and the validation set as the number of training samples increases, one can observe the model's fitting and generalization abilities. If the model performs poorly on both the training set and the validation set, it may be underfitting due to low capacity. If the model performs well on the training set but poorly on the validation set, it may be overfitting due to high capacity.
Adjustments to the model capacity can be made based on the trend of the learning curve.\n", + "**Learning curves**: Learning curves can help determine if the model capacity is appropriate. By plotting the performance of the model on the training set and the validation set as the number of training samples increases, one can observe the model's fitting and generalization abilities. If the model performs poorly on both the training set and the validation set, it may be underfitting due to low capacity. If the model performs well on the training set but poorly on the validation set, it may be overfitting due to high capacity. Adjustments to the model capacity can be made based on the trend of the learning curve.\n", "\n", - "Regularization: Adjusting the model capacity through regularization techniques (which we will also mention in the text later). Increasing the regularization parameter can reduce model capacity and decrease the risk of overfitting. Decreasing the regularization parameter can increase model capacity and improve fitting ability. By evaluating the model performance on the validation set with different regularization parameters, an appropriate regularization parameter value can be chosen.\n", + "**Regularization**: Adjusting the model capacity through regularization techniques (which we will also mention in the text later). Increasing the regularization parameter can reduce model capacity and decrease the risk of overfitting. Decreasing the regularization parameter can increase model capacity and improve fitting ability. By evaluating the model performance on the validation set with different regularization parameters, an appropriate regularization parameter value can be chosen.\n", "\n", - "Model comparison experiments: Train and evaluate models with different capacities and compare their performance on the validation set. 
By comparing the generalization performance of different-capacity models, select the model capacity with the best performance.\n", + "**Model comparison experiments**: Train and evaluate models with different capacities and compare their performance on the validation set. By comparing the generalization performance of different-capacity models, select the model capacity with the best performance.\n", "\n", "Considering the above methods and guidelines, selecting an appropriate model capacity requires a balance between theory and practice and decision-making based on the specific problem and available resources. The ultimate goal is to choose a model that performs well on both the training data and new data, achieving good generalization ability." ] @@ -565,7 +543,6 @@ "\n", "\n", "\n", - "\n", ":::{figure} https://static-1300131294.cos.ap-shanghai.myqcloud.com/images/model-selection/circlesquare.png\n", "---\n", "name: circlesquare-ms\n", @@ -573,8 +550,7 @@ "L1 and L2 regularization\n", ":::\n", "\n", - "Both are very common regularization techniques, but they are suitable for different scenarios. L1 regularization is suitable for situations that require feature selection or demand model interpretability. On the other hand, L2 regularization is more general and applicable in most cases to prevent overfitting and improve model generalization ability.\n", - "\n" + "Both are very common regularization techniques, but they are suitable for different scenarios. L1 regularization is suitable for situations that require feature selection or demand model interpretability. On the other hand, L2 regularization is more general and applicable in most cases to prevent overfitting and improve model generalization ability.\n" ] }, { @@ -582,7 +558,6 @@ "id": "2ccb9c25", "metadata": {}, "source": [ - "\n", "## Early Stopping\n", "\n", "We mentioned that when a model is too eagerly learning noise, the validation loss may start to increase during training. 
To prevent this, we can simply stop the training whenever it seems the validation loss isn't decreasing anymore. Interrupting the training this way is called **early stopping**.\n", @@ -643,8 +618,7 @@ "\n", "In summary, the role of regularization in linear regression models is to control the complexity of the model, reduce the risk of overfitting, and improve the model's generalization ability on new data.\n", "\n", - "In this section, we primarily utilize learning curves to optimize the regularization parameter, also known as the learning curve.\n", - "\n" + "In this section, we primarily use learning curves to help choose the value of the regularization parameter.\n" ] }, { @@ -652,7 +626,6 @@ "id": "03100d74", "metadata": {}, "source": [ - "\n", "## Dropout\n", "\n", "Dropout is one of the most effective and most commonly used regularization techniques for neural network, developed by Hinton and his students at the University of Toronto. Dropout, applied to a layer, consists of randomly \"dropping out\" (i.e. set to zero) a number of output features of the layer during training. Let's say a given layer would normally have returned a vector [0.2, 0.5, 1.3, 0.8, 1.1] for a given input sample during training; aafter applying dropout, this vector will have a few zero entries distributed at random, e.g. [0, 0.5, 1.3, 0, 1.1]. The 'dropout rate' is the fraction of the features that are being zeroed-out; it is usually set between 0.2 and 0.5.
At test time, no units are dropped out, and instead the layer's output values are scaled down by a factor equal to the dropout rate, so as to balance for the fact that more units are active than at training time.\n", @@ -691,7 +664,6 @@ "id": "755eed48", "metadata": {}, "source": [ - "\n", "## Conclusions\n", "\n", "\n", From ded42d329ed97e28e3aa1fc93f7aaf1879afd719 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 16:21:53 +0800 Subject: [PATCH 13/20] Delete confusion_matrix.jpg --- images/model-selection/confusion_matrix.jpg | Bin 15031 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 images/model-selection/confusion_matrix.jpg diff --git a/images/model-selection/confusion_matrix.jpg b/images/model-selection/confusion_matrix.jpg deleted file mode 100644 index 854d9b54f26e5985d30e325ce1c64dd1a1c39c9d..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001
Date: Sun, 28 Apr 2024 17:11:09 +0800 Subject: [PATCH 14/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 48 +++++++++++++------ 1 file changed, 33 insertions(+), 15 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 7b10cfda92..5e39d49dd9 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -129,7 +129,7 @@ "|:--:|:--:|\n", "| Over-fitting-train-ms |  Over-fitting-test-ms |\n", "\n", - "As we can see, over-fitting model fits very well on training data, but 
Over-fitting model fits poorly on test data. \n", + "As we can see, the over-fitting model fits the training data very well, but it fits the test data poorly.\n", "\n", "**Under-fitting model**\n", "\n", @@ -147,9 +147,9 @@ "|:--:|:--:|\n", "| Perfect-fitting-train-ms |  Perfect-fitting-test-ms |\n", "\n", - "Perfect-fitting model fits well on training data on training data and test data!\n", + "The perfect-fitting model fits both the training data and the test data well!\n", "\n", - "When overfitting occurs, the model demonstrates high accuracy or low error on the training data but performs poorly on the testing data or new data in practical applications. In contrast, underfitting indicates that the model is unable to capture the complex relationships or patterns within the data." + "When over-fitting occurs, the model demonstrates high accuracy or low error on the training data but performs poorly on the testing data or new data in practical applications. In contrast, under-fitting indicates that the model is unable to capture the complex relationships or patterns within the data." ] }, { @@ -276,9 +276,13 @@ "Of course, here we are just demonstrating how to output the confusion matrix to understand its meaning after obtaining these two sets of data. 
In the subsequent experiment, we will explain how to obtain the desired confusion matrix through code.\n", "\n", "There are four values in the matrix their meanings are as follows:\n", + "\n", "**True Positive (TP)**: The number of positive instances correctly predicted as positive by the model.\n", + "\n", "**False Negative (FN)**: The number of positive instances incorrectly predicted as negative by the model.\n", + "\n", "**False Positive (FP)**: The number of negative instances incorrectly predicted as positive by the model.\n", + "\n", "**True Negative (TN)**: The number of negative instances correctly predicted as negative by the model.\n", "\n", "As for the matrix we have above, TP is where we predicted as 1 and actually it is 1. FN is the acount that we predicted as 0 but actually it is 1. FP is predicted as 1 but actually it's 0. TN is we predicted as 0 and it's actually 0.\n", @@ -287,19 +291,19 @@ "\n", "**Accuracy**: The ratio of the number of correctly predicted samples to the total number of samples.\n", "\n", - "**Accuracy = (TP + TN) / (TP + TN + FP + FN)**\n", + "$$\\text{Accuracy} = \\frac{TP + TN}{TP + TN + FP + FN}$$\n", "\n", "**Precision**: The proportion of true positive predictions among the predicted positive instances, measuring the prediction accuracy of the model.\n", "\n", - "**Precision = TP / (TP + FP)**\n", + "$$\\text{Precision} = \\frac{TP}{TP + FP}$$\n", "\n", "**Recall**: The proportion of true positive predictions among the actual positive instances, measuring the model's ability to identify positives.\n", "\n", - "**Recall = TP / (TP + FN)**\n", + "$$\\text{Recall} = \\frac{TP}{TP + FN}$$\n", "\n", "**F1 Score**: The harmonic mean of precision and recall, considering both the accuracy and the identification ability of the model.\n", "\n", - "**F1 Score = 2 * (Precision * Recall) / (Precision + Recall)**\n", + "$$F_1\\text{ Score} = \\frac{2 \\cdot \\text{Precision} \\cdot \\text{Recall}}{\\text{Precision} + \\text{Recall}}$$\n", "\n", "When evaluating the bias of a model, we 
usually consider metrics such as precision, accuracy, and F1 score. A lower F1 score may indicate that the model has issues in balancing accuracy and identification ability, but it cannot be simply equated to lower bias. By considering multiple metrics and the specific requirements of the application scenario, a more comprehensive assessment of the model's performance can be achieved.\n" ] @@ -368,7 +372,7 @@ "\n", "Bootstrapping, also known as resampling or sampling with replacement, is a technique where each time a copy of a sample is selected from a dataset containing m samples and added to the resulting dataset. This process is repeated m times, resulting in a dataset with m samples. (Some samples may appear multiple times in the resulting dataset.) This resulting dataset is then used as the training set.\n", "\n", - "Since the sampling is conducted independently, the probability that a specific sample is never selected in m iterations of sampling is [(1-1/m)^m]. As m approaches infinity, i.e., m→∞, the limit of this probability is 1/e, where e is the base of the natural logarithm and approximately equal to 2.71828. Therefore, when m is sufficiently large, the probability that a specific sample is never selected in m iterations of sampling is close to 1/e.\n" + "Since the sampling is conducted independently, the probability that a specific sample is never selected in m iterations of sampling is $ [(1-\\frac{1}{m})^m] $. As m approaches infinity, $ lim_{m \\to \\infty} (1 - \\frac{1}{m})^m = \\frac{1}{e} $ the limit of this probability is $1/e$ , where e is the base of the natural logarithm and approximately equal to 2.71828. 
Therefore, when m is sufficiently large, the probability that a specific sample is never selected in m iterations of sampling is close to $\\frac{1}{e}$ ≈ 0.36787944117$ .\n" ] }, { @@ -523,17 +527,17 @@ "\n", "Let's consider a target function with a regularization term, which can be represented as:\n", "\n", - "J(θ) = L(θ) + λR(θ)\n", + "$$J(\\theta) = L(\\theta) + \\lambda R(\\theta)$$\n", "\n", - "Here, J(θ) is the target function, θ represents the model's parameters, L(θ) is the loss function (typically the model's error on the training data), R(θ) is the regularization term, and λ is the regularization parameter.\n", + "Here, $J(\\theta)$ is the target function, $\\theta$ represents the model's parameters, $L(\\theta)$ is the loss function (typically the model's error on the training data), $R(\\theta)$ is the regularization term, and $\\lambda$ is the regularization parameter.\n", "\n", - "The loss function L(θ) measures how well the model fits the training data, and our goal is to minimize it. The regularization term R(θ) constrains or penalizes the values of the model's parameters, and it controls the complexity of the model.\n", + "The loss function $L(\\theta)$ measures how well the model fits the training data, and our goal is to minimize it. The regularization term $R(\\theta)$ constrains or penalizes the values of the model's parameters, and it controls the complexity of the model.\n", "\n", - "The regularization parameter λ determines the weight of the regularization term in the target function. When λ approaches 0, the impact of the regularization term becomes negligible, and the model's objective is primarily to minimize the loss function. 
On the other hand, when λ approaches infinity, the regularization term's impact becomes significant, and the model's objective is to minimize the regularization term as much as possible, leading to parameter values tending towards zero.\n", + "The regularization parameter $\\lambda$ determines the weight of the regularization term in the target function. When $\\lambda$ approaches 0, the impact of the regularization term becomes negligible, and the model's objective is primarily to minimize the loss function. On the other hand, when $\\lambda$ approaches infinity, the regularization term's impact becomes significant, and the model's objective is to minimize the regularization term as much as possible, leading to parameter values tending towards zero.\n", "\n", - "There are two forms of this cost: L1 regularization (also known as Lasso regression) with the regularization term R(θ) represented as the sum of the absolute values of the parameters θ: R(θ) = ||θ||₁. L1 regularization can induce certain parameters of the model to become zero, thereby achieving feature selection and sparsity.\n", + "There are two forms of this cost: L1 regularization (also known as Lasso regression) with the regularization term $R(\\theta)$ represented as the sum of the absolute values of the parameters $\\theta$: $R(\\theta) = ||\\theta||_1$. L1 regularization can induce certain parameters of the model to become zero, thereby achieving feature selection and sparsity.\n", "\n", - "L2 regularization (also known as Ridge regression) with the regularization term R(θ) represented as the square root of the sum of the squares of the parameters θ: R(θ) = ||θ||₂. 
L2 regularization encourages the parameter values of the model to gradually approach zero but not exactly become zero, hence it does not possess the ability for feature selection.\n", + "L2 regularization (also known as Ridge regression) with the regularization term $R(\\theta)$ represented as the sum of the squares of the parameters $\\theta$, i.e. the squared L2 norm: $R(\\theta) = ||\\theta||_2^2$. L2 regularization encourages the parameter values of the model to gradually approach zero but not exactly become zero, hence it does not possess the ability for feature selection.\n", "\n", "In `tf.keras`, weight regularization is added by passing weight regularizer instances to layers as keyword arguments. Let's add L2 weight regularization now.\n", "\n", @@ -698,7 +702,21 @@ "## Your turn! 🚀\n", "\n", "Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In this assignment, you'll have the opportunity to apply your understanding of these concepts and techniques. Please complete the following tasks:\n", - "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)" + "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n", + "\n", + "If you would like to learn more about open-source projects related to model selection.\n", + "\n", + "Here are some recommended open-source and free model selection projects on GitHub!\n", + "\n", + "[Model Zoo](https://github.com/modzo/model-zoo)\n", + "\n", + "[AutoML](https://github.com/automl/auto-sklearn)\n", + "\n", + "[ModelHub](https://github.com/modelhub-ai/modelhub)\n", + "\n", + "[Hugging Face Models](https://github.com/huggingface/models)\n", + "\n", + "These projects are open-source and provide rich documentation and example code. You can choose the appropriate model selection project based on your needs and explore them. 
" ] } ], From 7dd6169a916fd73c747dc0c9ea5215cd81bd84b3 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 18:12:27 +0800 Subject: [PATCH 15/20] Update cnn-vgg.ipynb --- .../deep-learning/cnn/cnn-vgg.ipynb | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb index 188b31cdbd..21304520af 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb @@ -452,7 +452,13 @@ "You can refer to those YouTube videos for further study:\n", "\n", "- [Convolutional Neural Networks (CNNs) explained, by deeplizard](https://www.youtube.com/watch?v=YRhxdVk_sIs)\n", - "- [Convolutional Neural Networks Explained (CNN Visualized), by Futurology](https://www.youtube.com/watch?v=pj9-rr1wDhM)" + "- [Convolutional Neural Networks Explained (CNN Visualized), by Futurology](https://www.youtube.com/watch?v=pj9-rr1wDhM)\n", + "\n", + "Here are some recommended open-source and free model selection projects on GitHub:\n", + "\n", + "[An automated machine learning tool(AutoML), by aron-bram](https://github.com/automl/auto-sklearn)\n", + "\n", + "[An open platform ModelHub,by 9zelle9](https://github.com/modelhub-ai/modelhub)" ] }, { From bdbe212f0728fa7614da4c454459ed1239a9bc79 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 18:12:33 +0800 Subject: [PATCH 16/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 23 ++++++++----------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 5e39d49dd9..772e51a7e8 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ 
b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -702,21 +702,16 @@ "## Your turn! 🚀\n", "\n", "Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In this assignment, you'll have the opportunity to apply your understanding of these concepts and techniques. Please complete the following tasks:\n", - "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n", - "\n", - "If you would like to learn more about open-source projects related to model selection.\n", - "\n", - "Here are some recommended open-source and free model selection projects on GitHub!\n", - "\n", - "[Model Zoo](https://github.com/modzo/model-zoo)\n", - "\n", - "[AutoML](https://github.com/automl/auto-sklearn)\n", - "\n", - "[ModelHub](https://github.com/modelhub-ai/modelhub)\n", - "\n", - "[Hugging Face Models](https://github.com/huggingface/models)\n", + "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Acknowledgments\n", "\n", - "These projects are open-source and provide rich documentation and example code. You can choose the appropriate model selection project based on your needs and explore them. " + "Thanks to xyb for organizing the content related to model selection and for their suggestion to concretize abstract concepts.\n" ] } ], From 273087dffef0c067dd1ceb21f04a08b2d2ee5b87 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 18:49:29 +0800 Subject: [PATCH 17/20] Revert "Update cnn-vgg.ipynb" This reverts commit 7dd6169a916fd73c747dc0c9ea5215cd81bd84b3. 
--- .../deep-learning/cnn/cnn-vgg.ipynb | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb b/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb index 21304520af..188b31cdbd 100644 --- a/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb +++ b/open-machine-learning-jupyter-book/deep-learning/cnn/cnn-vgg.ipynb @@ -452,13 +452,7 @@ "You can refer to those YouTube videos for further study:\n", "\n", "- [Convolutional Neural Networks (CNNs) explained, by deeplizard](https://www.youtube.com/watch?v=YRhxdVk_sIs)\n", - "- [Convolutional Neural Networks Explained (CNN Visualized), by Futurology](https://www.youtube.com/watch?v=pj9-rr1wDhM)\n", - "\n", - "Here are some recommended open-source and free model selection projects on GitHub:\n", - "\n", - "[An automated machine learning tool(AutoML), by aron-bram](https://github.com/automl/auto-sklearn)\n", - "\n", - "[An open platform ModelHub,by 9zelle9](https://github.com/modelhub-ai/modelhub)" + "- [Convolutional Neural Networks Explained (CNN Visualized), by Futurology](https://www.youtube.com/watch?v=pj9-rr1wDhM)" ] }, { From 5cd6281ac0163ae6c95beb794c6ae3d626cb6f5d Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 18:52:46 +0800 Subject: [PATCH 18/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 772e51a7e8..0c740edb1c 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -702,7 +702,13 @@ "## Your turn! 
🚀\n", "\n", "Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In this assignment, you'll have the opportunity to apply your understanding of these concepts and techniques. Please complete the following tasks:\n", - "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n" + "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n", + "\n", + "Here are some recommended open-source and free model selection projects on GitHub\n", + "\n", + "[An automated machine learning tool(AutoML), by aron-bram](https://github.com/automl/auto-sklearn)\\n\",\n", + "\n", + "[An open platform ModelHub,by 9zelle9](https://github.com/modelhub-ai/modelhub)\"\n" ] }, { From 172df9565ec5295b09305b5609f4f525de5b29f1 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 19:14:30 +0800 Subject: [PATCH 19/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index 0c740edb1c..f20008bf1a 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -372,7 +372,7 @@ "\n", "Bootstrapping, also known as resampling or sampling with replacement, is a technique where each time a copy of a sample is selected from a dataset containing m samples and added to the resulting dataset. This process is repeated m times, resulting in a dataset with m samples. (Some samples may appear multiple times in the resulting dataset.) 
This resulting dataset is then used as the training set.\n", "\n", - "Since the sampling is conducted independently, the probability that a specific sample is never selected in m iterations of sampling is $ [(1-\\frac{1}{m})^m] $. As m approaches infinity, $ lim_{m \\to \\infty} (1 - \\frac{1}{m})^m = \\frac{1}{e} $ the limit of this probability is $1/e$ , where e is the base of the natural logarithm and approximately equal to 2.71828. Therefore, when m is sufficiently large, the probability that a specific sample is never selected in m iterations of sampling is close to $\\frac{1}{e}$ ≈ 0.36787944117$ .\n" + "Since the sampling is conducted independently, the probability that a specific sample is never selected in $m$ iterations of sampling is $(1-\\frac{1}{m})^m$. As $m$ approaches infinity, this probability converges to $\\lim_{m \\to \\infty} (1 - \\frac{1}{m})^m = \\frac{1}{e}$, where $e$ is the base of the natural logarithm, approximately equal to 2.71828. Therefore, when $m$ is sufficiently large, the probability that a specific sample is never selected in $m$ iterations of sampling is close to $\\frac{1}{e} \\approx 0.36787944117$.\n" ] }, { @@ -704,11 +704,13 @@ "Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In this assignment, you'll have the opportunity to apply your understanding of these concepts and techniques. 
Please complete the following tasks:\n", "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n", "\n", - "Here are some recommended open-source and free model selection projects on GitHub\n", + "## Self study\n", "\n", - "[An automated machine learning tool(AutoML), by aron-bram](https://github.com/automl/auto-sklearn)\\n\",\n", + "Here are some recommended open-source and free model selection projects on GitHub that you can refer to for further study:\n", "\n", - "[An open platform ModelHub,by 9zelle9](https://github.com/modelhub-ai/modelhub)\"\n" + "- [An automated machine learning tool (AutoML), by aron-bram](https://github.com/automl/auto-sklearn)\n", + "\n", + "- [An open platform ModelHub, by 9zelle9](https://github.com/modelhub-ai/modelhub)\n" ] }, { From 6a6c8cfe43d71fe51b6936620c35425d0e7c3ec0 Mon Sep 17 00:00:00 2001 From: JERRYenSHU503 <1929891932@qq.com> Date: Sun, 28 Apr 2024 19:23:11 +0800 Subject: [PATCH 20/20] Update model-selection.ipynb --- .../ml-advanced/model-selection.ipynb | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb index f20008bf1a..6b33475aeb 100644 --- a/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb +++ b/open-machine-learning-jupyter-book/ml-advanced/model-selection.ipynb @@ -701,8 +701,18 @@ "\n", "## Your turn! 🚀\n", "\n", - "Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In this assignment, you'll have the opportunity to apply your understanding of these concepts and techniques. 
Please complete the following tasks:\n", - "[assignment](../assignments/ml-advanced/model-selection/model-selection-assignment-1)\n", + "Machine learning model selection and dealing with overfitting and underfitting are crucial aspects of the machine learning pipeline. In these assignments, you'll have the opportunity to apply your understanding of these concepts and techniques.\n", + "Please complete the following tasks:\n", + "\n", + "- [model-selection-assignment-1](../assignments/ml-advanced/model-selection/model-selection-assignment-1.ipynb)\n", + "\n", + "- [lasso-and-ridge-regression](../assignments/ml-advanced/model-selection/lasso-and-ridge-regression.ipynb)\n", + "\n", + "- [dropout-and-batch-normalization](../assignments/ml-advanced/model-selection/dropout-and-batch-normalization.ipynb)\n", + "\n", + "- [learning-curve-to-identify-overfit-underfit](../assignments/ml-advanced/model-selection/learning-curve-to-identify-overfit-underfit.ipynb)\n", + "\n", + "- [regularized-linear-models](../assignments/ml-advanced/model-selection/regularized-linear-models.ipynb)\n", "\n", "## Self study\n", "\n",