2.4 Setting up the validation framework

Notes

In general, the dataset is split into three parts: training, validation, and test. For each partition, we need a feature matrix (X) and a target vector (y). First, the size of each partition is calculated. Then the records are shuffled so that each partition holds a random sample of the dataset rather than a block of consecutive records. Finally, the partitions are created by selecting rows with the shuffled indices.
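The three steps described above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the toy dataframe, its column names, the 60/20/20 proportions, and the seed value are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; the column names are placeholders, not the course data.
df = pd.DataFrame({
    "feature": np.arange(10) * 1.5,
    "target": np.arange(10) * 10,
})

# 1. Compute the partition sizes (a 60/20/20 split is assumed here).
n = len(df)
n_val = int(n * 0.2)
n_test = int(n * 0.2)
n_train = n - n_val - n_test  # remainder, so the three sizes always sum to n

# 2. Shuffle the row positions with a fixed seed for reproducibility.
idx = np.arange(n)
np.random.seed(2)
np.random.shuffle(idx)  # shuffles idx in place

# 3. Slice the shuffled positions into three non-overlapping partitions.
df_train = df.iloc[idx[:n_train]]
df_val = df.iloc[idx[n_train:n_train + n_val]]
df_test = df.iloc[idx[n_train + n_val:]]
```

Computing the training size as the remainder avoids losing a record to rounding when the dataset length is not divisible by the chosen proportions.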

Pandas attributes and methods:

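df.iloc[] - selects rows (and columns) of a dataframe by integer position
df.reset_index() - resets the index to the default 0 to n-1 integer range; by default the old index is kept as a new column, pass drop=True to discard it
del df[col] - deletes a column from a dataframe; used here to remove the target variable from the feature dataframes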
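A short sketch of how these pandas operations might be combined once a partition exists. The dataframe contents and the column name "target" are hypothetical; the point is the order of operations: reset the index, extract y, then delete the target column.

```python
import pandas as pd

# Hypothetical partition whose index is out of order after shuffling.
df_train = pd.DataFrame(
    {"feature": [3.0, 1.0, 2.0], "target": [30, 10, 20]},
    index=[7, 2, 5],
)

# Replace the shuffled index with 0..n-1; drop=True discards the old index
# instead of keeping it as a new column.
df_train = df_train.reset_index(drop=True)

# Pull the target out as a NumPy array before deleting the column.
y_train = df_train["target"].values

# Remove the target so it cannot be used as a feature by accident.
del df_train["target"]
```

Extracting y before the deletion matters: once the column is removed, the target values are no longer available in the dataframe.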

Numpy methods:

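np.arange() - returns an array of evenly spaced numbers; np.arange(n) gives the integers from 0 to n-1
np.random.shuffle() - shuffles an array in place and returns None
np.random.seed() - sets the seed of the random number generator so that results are reproducible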
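The behaviour of these NumPy functions can be checked with a small sketch. The array length and the seed value are arbitrary choices for illustration.

```python
import numpy as np

idx = np.arange(5)  # the integers 0, 1, 2, 3, 4

# shuffle() modifies idx in place; its return value is None,
# so writing `idx = np.random.shuffle(idx)` would lose the array.
np.random.seed(42)
result = np.random.shuffle(idx)
first = idx.copy()

# Re-seeding with the same value reproduces the same permutation.
idx2 = np.arange(5)
np.random.seed(42)
np.random.shuffle(idx2)

same_order = bool((first == idx2).all())
```

Here `result` is None and `same_order` is True, which is why the seed is set immediately before shuffling when a reproducible split is needed.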

The entire code of this project is available in this Jupyter notebook.

⚠️ The notes are written by the community. If you see an error here, please create a PR with a fix.

Notes from Peter Ernicke

Navigation

Machine Learning Zoomcamp course
Session 2: Machine Learning for Regression
Previous: Exploratory data analysis
Next: Linear regression
