Cross validation fails in some cases when intermediate hyperparameters are present #62

Open
serbanstan opened this issue Jul 13, 2018 · 0 comments

serbanstan commented Jul 13, 2018

Our current DefaultClassificationTemplate places the CorEx text step before the imputer. If no hyperparameters are specified for CorEx, the run succeeds.
If only one set of hyperparameters is specified, for example

10, 0, 1, .9, .02 or
10, 0, 5, .9, .02

we get a successful run.

If we instead set 'n_grams' to the list [(1), (5)], cross-validation fails.

To reproduce the error, uncomment the hyperparameters in the CorEx step. The run below is on 38_sick. Note that CorEx shouldn't be doing any computation on this dataset; it should just return its input as output.
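
For reference, here is a rough sketch of how a list-valued hyperparameter turns into several candidate configurations. The dictionary layout and the `expand_candidates` helper are hypothetical and are not the dsbox-ta2 template API; only the `n_grams` name and the values 1 and 5 come from this report. One thing worth noting: in Python `(1)` is the integer 1, not a one-element tuple, so `[(1), (5)]` is the same list as `[1, 5]`.

```python
from itertools import product


def expand_candidates(space):
    """Return one dict per combination of candidate values (hypothetical helper)."""
    names = sorted(space)
    combos = product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical search space: only 'n_grams' and its two values come from the report.
space = {"n_grams": [(1), (5)]}
candidates = expand_candidates(space)
print(candidates)  # [{'n_grams': 1}, {'n_grams': 5}]
```

With a single candidate value the search has only one configuration to evaluate, which matches the cases that succeed; the failure shows up once there are two.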

Error log:

(dsbox-devel-710) [stan@dsbox01 python]$ python ta2-search /nas/home/stan/dsbox/runs2/config-seed/38_sick_config.json
Namespace(configuration_file='/nas/home/stan/dsbox/runs2/config-seed/38_sick_config.json', cpus=-1, debug=False, output_prefix=None, timeout=-1)
Using configuation:
{'cpus': '10',
 'dataset_schema': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/38_sick/38_sick_dataset/datasetDoc.json',
 'executables_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/38_sick/executables',
 'pipeline_logs_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/38_sick/logs',
 'problem_root': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/38_sick/38_sick_problem',
 'problem_schema': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/38_sick/38_sick_problem/problemDoc.json',
 'ram': '10Gi',
 'saved_pipeline_ID': '',
 'saving_folder_loc': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/38_sick',
 'temp_storage_root': '/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/38_sick/temp',
 'timeout': 9,
 'training_data_root': '/nfs1/dsbox-repo/data/datasets/seed_datasets_current/38_sick/38_sick_dataset'}
[INFO] No test data config found! Will split the data.
[INFO] Succesfully parsed test data
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 3018}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 3018)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
{'structural_type': <class 'd3m.container.pandas.DataFrame'>, 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table', 'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'), 'dimension': {'name': 'rows', 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/TabularRow',), 'length': 754}}
{'dimension': <FrozenOrderedDict OrderedDict([('name', 'rows'), ('semantic_types', ('https://metadata.datadrivendiscovery.org/types/TabularRow',)), ('length', 754)])>,
 'semantic_types': ('https://metadata.datadrivendiscovery.org/types/Table',
                    'https://metadata.datadrivendiscovery.org/types/DatasetEntryPoint'),
 'structural_type': <class 'd3m.container.pandas.DataFrame'>}
[INFO] Template choices:
Template ' Default_classification_template ' has been added to template base.
[INFO] Worker started, id: <_MainProcess(MainProcess, started)>
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.dsbox.Denormalize', 4986999622121787936)
/nfs1/dsbox-repo/stan/miniconda/envs/dsbox-devel-710/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
Using TensorFlow backend.
[INFO] Push@cache: ('d3m.primitives.datasets.DatasetToDataFrame', 4986999622121787936)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -2701047265198232908)
[INFO] Push@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 4441544048499093159)
[INFO] Push@cache: ('d3m.primitives.data.ColumnParser', 6916788228332877018)
[INFO] Push@cache: ('d3m.primitives.data.CastToType', -5585081685236413210)
[INFO] Push@cache: ('d3m.primitives.dsbox.CorexText', -1029106721422580684)
[INFO] Push@cache: ('d3m.primitives.sklearn_wrap.SKImputer', 2755365990599631608)
[INFO] Push@cache: ('d3m.primitives.sklearn_wrap.SKMultinomialNB', 2607380340696403083)
[INFO] Hit@cache: ('d3m.primitives.sklearn_wrap.SKImputer', 2755365990599631608)
The following pipeline file will be loaded:
/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/38_sick/pipelines/c27faa63-98ab-4b3b-93ab-b08084762be5.json






Pickling succeeded






****************************************************************************************************
[INFO] Running Pool: 1
[INFO] Worker started, id: <ForkProcess(ForkPoolWorker-2, started daemon)>
[INFO] Hit@cache: ('d3m.primitives.dsbox.Denormalize', 4986999622121787936)
[INFO] Hit@cache: ('d3m.primitives.datasets.DatasetToDataFrame', 4986999622121787936)
[INFO] Hit@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', -2701047265198232908)
[INFO] Hit@cache: ('d3m.primitives.data.ExtractColumnsBySemanticTypes', 4441544048499093159)
[INFO] Hit@cache: ('d3m.primitives.data.ColumnParser', 6916788228332877018)
[INFO] Hit@cache: ('d3m.primitives.data.CastToType', -5585081685236413210)
[INFO] Push@cache: ('d3m.primitives.dsbox.CorexText', -8598879279764128775)
[INFO] Hit@cache: ('d3m.primitives.sklearn_wrap.SKImputer', 2755365990599631608)
[INFO] Hit@cache: ('d3m.primitives.sklearn_wrap.SKMultinomialNB', 2607380340696403083)
[INFO] Hit@cache: ('d3m.primitives.sklearn_wrap.SKImputer', 2755365990599631608)
The following pipeline file will be loaded:
/nfs1/dsbox-repo/stan/dsbox-ta2/python/output/38_sick/pipelines/ab7db90a-48f3-46d9-b7b8-7a7f5aad7307.json






Pickling succeeded






[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
[WARN] write_training_results
Traceback (most recent call last):
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 192, in search_one_iter
    cross_validation_values.append(res['cross_validation_metrics'][0]['value'])
IndexError: list index out of range
Traceback (most recent call last):
  File "ta2-search", line 141, in <module>
    result = main(args)
  File "ta2-search", line 110, in main
    status = controller.train()
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/controller/controller.py", line 378, in train
    candidate, value = search.search_one_iter()
  File "/nfs1/dsbox-repo/stan/dsbox-ta2/python/dsbox/template/search.py", line 224, in search_one_iter
    best_cv_index = cross_validation_values.index(max(cross_validation_values))
ValueError: max() arg is an empty sequence
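
The two tracebacks appear to be linked: the `IndexError` suggests that `res['cross_validation_metrics']` came back as an empty list, so nothing was appended to `cross_validation_values`, and `max()` then failed on the empty list. Below is a minimal sketch of that failure mode with a guard, assuming each result is a dict holding a `cross_validation_metrics` list as the indexing in the traceback implies. The helper names are hypothetical and this is not the code in search.py.

```python
def collect_cv_values(results):
    """Gather the first cross-validation metric value from each result, skipping empty ones."""
    values = []
    for res in results:
        metrics = res.get("cross_validation_metrics") or []
        if not metrics:
            # An empty list here is what would raise IndexError on metrics[0].
            continue
        values.append(metrics[0]["value"])
    return values


def best_cv_index(values):
    """Return the index of the largest value, or None when there is nothing to compare."""
    if not values:
        # max([]) raises "ValueError: max() arg is an empty sequence".
        return None
    return values.index(max(values))


# Assumed shape of the failing case: every candidate returned no metrics.
results = [{"cross_validation_metrics": []}, {"cross_validation_metrics": []}]
values = collect_cv_values(results)
print(values)                 # []
print(best_cv_index(values))  # None
```

A guard like this would only hide the symptom; the open question is why the cross-validation metrics come back empty when more than one `n_grams` value is given.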