Issue via Email #250

whitead · 2024-08-28T16:42:43Z

B. Machine Learning, Chapter 3, Regression & Model Assessment, Section 3.6.1 k-Fold Cross-Validation. In the code snippets that calculate the 10-fold cross-validation across all cases, the training set is assembled incorrectly: it contains the held-out test fold and duplicated rows.

Instead of

train = pd.concat([soldata[splits[i]:], soldata[splits[i + 1]:]])

it should be

train = pd.concat([soldata[:splits[i]], soldata[splits[i + 1] :]])

The change has little effect when the whole dataset is used. When only a subset of the data is used, however, the error varies significantly with the choice of k.
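A minimal sketch of the corrected split follows. It assumes contiguous folds with evenly spaced boundaries. `kfold_split`, `fold_boundaries`, and the toy `data` frame are hypothetical stand-ins for the book's code and `soldata`. It uses `.iloc`, which slices rows by position just as `soldata[a:b]` does with integer bounds. The sketch checks that each training set excludes its test fold. It then reproduces the original line for fold 0, where both slices run to the end of the frame.

```python
import numpy as np
import pandas as pd


def fold_boundaries(n: int, k: int) -> list[int]:
    """Return k + 1 row positions 0 = b_0 <= ... <= b_k = n for k contiguous folds."""
    return np.linspace(0, n, k + 1).astype(int).tolist()


def kfold_split(df: pd.DataFrame, k: int):
    """Yield (train, test) pairs, holding out one contiguous fold at a time."""
    splits = fold_boundaries(len(df), k)
    for i in range(k):
        # Held-out fold: rows from splits[i] up to (not including) splits[i + 1]
        test = df.iloc[splits[i] : splits[i + 1]]
        # Training set: rows before the fold plus rows after it (the corrected line)
        train = pd.concat([df.iloc[: splits[i]], df.iloc[splits[i + 1] :]])
        yield train, test


# Hypothetical stand-in for `soldata`: 20 rows with a unique "id" column
data = pd.DataFrame({"id": np.arange(20), "x": np.random.default_rng(0).normal(size=20)})

for train, test in kfold_split(data, k=10):
    # Train and test never share a row, and together they cover every row once
    assert set(train["id"]).isdisjoint(test["id"])
    assert len(train) + len(test) == len(data)

# The original line for fold i = 0: both slices run to the end of the frame,
# so the "training" set includes the test fold and repeats rows 2..19
splits = fold_boundaries(len(data), 10)
buggy = pd.concat([data.iloc[splits[0] :], data.iloc[splits[1] :]])
print(len(buggy), buggy["id"].duplicated().sum())  # → 38 18
```

With 20 rows the buggy training set has 38 rows instead of 18. This shows how the original line both leaks the test fold into training and double-counts most of the data.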
