build_best versus build_merged: what is the difference if no test set is given? #39

Open
JanoschMenke opened this issue Jan 11, 2025 · 0 comments


I just wanted to ask what the difference is between build_best and build_merged when no test set is specified, as in the configuration below:

config = OptimizationConfig(
    data=Dataset(
        input_column="canonical",
        response_column="molwt",
        training_dataset_file="tests/data/DRD2/subset-50/train.csv",
        split_strategy=Random(),
    ),
)

Based on the results I get, I assume that build_merged is trained on the complete training_dataset supplied, while build_best is trained only on a subset, presumably generated by the split_strategy. But when I use 10-fold cross-validation, which of the 10 folds is the data split by?
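To illustrate the assumption I am making, here is a minimal, self-contained sketch (the helpers `random_split` and `kfold_splits` are hypothetical, not QSARtuna API): with a random holdout split there is one obvious training subset, but with 10-fold cross-validation there are ten equally valid train/test partitions, so it is unclear which one build_best would retrain on.

```python
# Hypothetical illustration of the question, NOT QSARtuna's implementation:
# a random holdout yields one training subset; k-fold CV yields k of them.
import random


def random_split(n_rows, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) for a random holdout split."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


def kfold_splits(n_rows, k=10):
    """Return k (train_indices, test_indices) pairs for k-fold CV.

    With 10-fold CV there is no single holdout set, which is why it is
    ambiguous which fold a 'best model' rebuild would leave out.
    """
    indices = list(range(n_rows))
    fold_size = n_rows // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    folds[-1].extend(indices[k * fold_size:])  # put any remainder in last fold
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# With 50 rows (as in tests/data/DRD2/subset-50/train.csv):
train, test = random_split(50)
print(len(train), len(test))      # one 40/10 split
print(len(kfold_splits(50)))      # ten candidate 45/5 splits
```

So my question is effectively: does build_best use a single partition like `random_split` produces, and if the strategy is 10-fold CV, which of the ten partitions applies?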

# Build (re-Train) and save the best model.
build_best(buildconfig, "target/best.pkl")

# Build (train) and save the model on the merged train+test data.
build_merged(buildconfig, "target/merged.pkl")