
2.4 Setting up the validation framework

Slides

Notes

In general, the dataset is split into three parts: training, validation, and test. For each partition, we need a feature matrix (X) and a target vector (y). First, the size of each partition is calculated. Then the records are shuffled so that each partition holds a random sample of the dataset rather than a block of consecutive records. Finally, the partitions are created by selecting rows with the shuffled indices.
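The steps above can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the toy dataframe, the column names `feature` and `target`, the 60/20/20 proportions, and the seed value are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset standing in for the project's dataframe
df = pd.DataFrame({
    "feature": np.arange(10) * 1.5,
    "target": np.arange(10) * 100,
})

# 1. Compute partition sizes (assumed 60/20/20 split)
n = len(df)
n_val = int(n * 0.2)
n_test = int(n * 0.2)
n_train = n - n_val - n_test  # the remainder goes to training, so sizes add up to n

# 2. Shuffle the row positions
idx = np.arange(n)
np.random.seed(2)        # arbitrary seed, only for reproducibility
np.random.shuffle(idx)   # shuffles idx in place

# 3. Select each partition by position using the shuffled indices
df_train = df.iloc[idx[:n_train]]
df_val = df.iloc[idx[n_train:n_train + n_val]]
df_test = df.iloc[idx[n_train + n_val:]]

print(len(df_train), len(df_val), len(df_test))
```

Computing `n_train` as the remainder avoids losing records to rounding when `n` is not divisible by the chosen fractions.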

Pandas attributes and methods:

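  • df.iloc[] - selects records of a dataframe by integer position
  • df.reset_index() - replaces the current index with the default one (0, 1, 2, ...); pass drop=True to discard the old index instead of keeping it as a column
  • del df[col] - deletes the column col from the dataframe; used here to remove the target variable from the feature dataframes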
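A short sketch of how these three pandas operations fit together when preparing one partition. The dataframe and the column names `feature` and `target` are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical dataframe with one feature and one target column
df = pd.DataFrame({
    "feature": [1.0, 2.0, 3.0, 4.0],
    "target": [10, 20, 30, 40],
})

# Select rows by position; the original labels (2, 0, 3) are kept
df_part = df.iloc[[2, 0, 3]]

# Replace the shuffled labels with 0, 1, 2 and discard the old index
df_part = df_part.reset_index(drop=True)

# Extract the target vector y, then remove the target from the features
y = df_part["target"].values
del df_part["target"]

print(df_part)
print(y)
```

Deleting the target column after extracting `y` helps prevent it from being used as a feature by accident during training.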

Numpy methods:

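  • np.arange() - returns an array of evenly spaced numbers within a given range
  • np.random.shuffle() - shuffles an array in place (it modifies the array and returns None)
  • np.random.seed() - sets the seed of the random number generator so that results are reproducible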
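The behaviour of these NumPy functions can be checked with a small sketch. The array length and the seed value 42 are arbitrary choices for the example.

```python
import numpy as np

# np.arange(5) creates the array [0, 1, 2, 3, 4]
idx = np.arange(5)

# shuffle works in place: idx itself is reordered and the return value is None
result = np.random.shuffle(idx)
print(result)  # None

# Setting the same seed before shuffling gives the same order each time
np.random.seed(42)
a = np.arange(5)
np.random.shuffle(a)

np.random.seed(42)
b = np.arange(5)
np.random.shuffle(b)

print(np.array_equal(a, b))  # True
```

Because `np.random.shuffle()` returns `None`, writing `idx = np.random.shuffle(idx)` would overwrite the array with `None`.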

The entire code of this project is available in this Jupyter notebook.

⚠️ The notes are written by the community.
If you see an error here, please create a PR with a fix.
