Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterized notebook examples #108

Merged
merged 21 commits into from
Jun 5, 2018
Merged

Parameterized notebook examples #108

merged 21 commits into from
Jun 5, 2018

Conversation

jlstevens
Copy link
Contributor

@jlstevens jlstevens commented May 30, 2018

This PR will add a number of examples to show how notebooks can be parameterized both with widgets for a normal notebook workflow and at the command line.

These examples are based on the GSSHA_Workflow notebook, illustrating how two parameters rain_intensity and rain_duration can be exposed. The workflow presented allows any parameters to be exposed in this way and also allows new parameters to be defined at the notebook level.

Notes:

  • Currently requires a fix available on parambokeh master regarding FileSelector.
  • Some sub-objects have their widgets set in a different cell as parambokeh lacks recursive editing (Support recursive editing ioam/parambokeh#58). This is also why the precedence of some parameters was set to -1.

"```bash\n",
"cd examples/topics/batched_example/\n",
"PARAM_JSON_INIT='{\"rain_intensity\":42, \"rain_duration\":50}' jupyter notebook GSSHA_Workflow_Batched_Example1.ipynb\n",
"```\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add the example nbconvert command as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 49912a8. It is a long command!

"metadata": {},
"outputs": [],
"source": [
"parambokeh.Widgets(NotebookParams, initializer=parambokeh.JSONInit())"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable.

"You can also generate a report HTML file using ``nbconvert`` as follows:\n",
"\n",
"```bash\n",
"PARAM_JSON_INIT='{\"rain_intensity\":25, \"rain_duration\":60}' jupyter nbconvert --execute GSSHA_Workflow_Batched_Example1.ipynb --output report1.html\n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably --output report1.html is optional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. It would output the notebook name with the .html extension otherwise.

It seemed worth showing this flexibility but I can also mention that is it optional if you wish.

Copy link
Contributor

@jbednar jbednar May 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think omitting it here is correct, because then the example shows the right way to do the simple case. And then the Lancet example can show how to encode parameter values into the filenames as a record, if appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@jlstevens
Copy link
Contributor Author

jlstevens commented Jun 1, 2018

@jbednar Added two new notebooks including Rain_Intensity_Sweep_Example.ipynb which demonstrates how you can use lancet to batch generate reports.

I should mention that this PR is intended as a prototype/example especially as we want to migrate the idea of global_params out of earthsim. One item still to do for this PR is to add instructions for installing lancet as it is currently not a dependency.

@jlstevens
Copy link
Contributor Author

The problem with trying to make lancet an optional dependency is that the tests fail. I suppose I can either update the lancet package to avoid param issues (at least on defaults) and/or skip tests for this notebook (makes sense as it takes a while to execute the 3 runs!).

@jbednar
Copy link
Contributor

jbednar commented Jun 1, 2018

Please update lancet, which needs to be done anyway. Skipping those tests is also reasonable, but should be a separate decision based only on duration, not on packaging issues.

@jlstevens
Copy link
Contributor Author

Please update lancet, which needs to be done anyway.

Agreed. I'll do that next though I am skipping this notebook in the tests as it does take too long to run.

@jlstevens jlstevens force-pushed the gssha_json_init branch 2 times, most recently from ea8826b to 2cc4e6f Compare June 1, 2018 17:18
@jlstevens
Copy link
Contributor Author

jlstevens commented Jun 1, 2018

@ceball I've not been able to skip this new Lancet notebook in the tests despite multiple attempts.

I've opened an nbsmoke issue about it holoviz-dev/nbsmoke#11 and would appreciate it if you could help me get the syntax right.

Meanwhile, I'll update the lancet package so it could run. Alternatively, I could use a short rain_duration in the code (e.g 60) , telling the reader to increase it to a larger value (e.g 3600) to see meaningful results (i.e enough to see that the simulation worked at all).

That said, I'm not sure it is worth making the story in the notebook more confusing for the sake of running tests on it, especially as this notebook is mainly designed to be an illustration. It isn't running particularly particularly useful simulation runs which is why I would like to skip the tests in this case.

@jbednar
Copy link
Contributor

jbednar commented Jun 1, 2018

In general I prefer to have tests run even for a very short time, rather than be skipped, which will catch the vast majority of errors regardless of duration. I suppose if we get the parameterization set up cleanly, we could make our notebooks have a flag that makes them run much more quickly (whatever it takes for them to do so) and have the tests set the appropriate environment variable?

@jlstevens
Copy link
Contributor Author

jlstevens commented Jun 1, 2018

Using an environment variable to control the lancet notebook which controls a batch of jobs via the same environment variable?

I suppose that would work, although it is environment variables all the way down which are clobbering each other (the lancet notebook is the parent process controlled by PARAM_JSON_INIT environment variable which it then sets differently for the jobs that it runs).

This approach does make sense in the long term but right now I think this notebook will suffer too much churn for it to be worthwhile. For instance, we already know a good chunk of this functionality should migrate to param, the command line interface of the console script can be improved etc. Even with more favorable parameter values, it will still slow down the overall testing cycle. I'm happy to set it up as suggested but only if you are sure it is worth it.

This would also involve setting up the environment variable in the testing environment and I'm not sure how to do that yet. Lastly, I note that all ^.*GSSHA.*\.ipynb$ notebooks are already being skipped...

@jbednar
Copy link
Contributor

jbednar commented Jun 1, 2018

I'm just pointing out the overall goal and plan; we should make interim decisions pragmatically, of course!

@jlstevens
Copy link
Contributor Author

jlstevens commented Jun 1, 2018

Lancet is now being updated on the defaults channel (it was mysteriously pinned to an old param even though the recipe didn't specify it). This means I should be able to add lancet as a dependency soon and the tests should pass (without disabling them for being too slow just yet).

@ceball
Copy link
Contributor

ceball commented Jun 1, 2018

I've not been able to skip this new Lancet notebook in the tests despite multiple attempts

I guess it was just that an underscore was missing (there's one in the notebook name, Rain_Intensity).

This means I should be able to add lancet as a dependency soon and the tests should pass (without disabling them for being too slow just yet).

lancet appears to be on conda-forge, so you could use that (we're already tied to conda-forge)

@jlstevens
Copy link
Contributor Author

I guess it was just that an underscore was missing (there's one in the notebook name, Rain_Intensity).

Thanks! I must have accidentally stripped out that first underscore when I was wondering if I hadn't escape underscores properly.

lancet appears to be on conda-forge, so you could use that (we're already tied to conda-forge)

It is now on conda-forge and defaults so I'll add it as a dependency.

setup.py Outdated
@@ -6,7 +6,12 @@
setup_args.update(dict(
name='earthsim',
version="0.1",
install_requires = ['lancet'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently we are all just using the environment file, so that's where to add the dependency for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Furthermore, putting it here would install some unrelated piece of software (https://pypi.org/project/lancet/). Unless I'm confused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed in 193293d.

@jlstevens
Copy link
Contributor Author

@jbednar Ready for review.

Copy link
Contributor

@jbednar jbednar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I had only minor suggestions.


Example usage:

param -cmd 'jupyter nbconvert --execute GSSHA_Workflow_Batched_Example2.ipynb' -p rain_intensity=25 -p rain_duration=3600
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably 'param' is 'earthsim' in this case?

Copy link
Contributor Author

@jlstevens jlstevens Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what to call it: right now it is in earthsim but we hope to migrate it to param. I'm happy to name it whatever you want though I don't know if an earthsim console script would have a long term future/purpose.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see -- I thought the command name would default to the module name, but I see later that it's explicitly declared param = earthsim.__main__:main. If it can really be supplied as an arbitrary command name, then param seems highly likely to conflict with something else on the user's path, so I think it should be something less generic. Maybe run_with_params? Not sure if running with params is more or less dangerous than running with scissors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe their is a unix/windows command called param so I don't see why param couldn't have a console script of that name. What do you think users might have on their path that conflicts with it?

If we have to worry about a clash, I suppose I might consider something like params or parameters instead.

Copy link
Contributor

@jbednar jbednar Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both of those are equally generic. If the intent is for param to supply this command, then calling it param is reasonable, as having the package param supply the param command won't be surprising to anyone. @ceball, do you see any conflict with some other usage of a param module command? Seems ok to me.

Copy link
Contributor Author

@jlstevens jlstevens Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the intent is for param to supply this command, then calling it param is reasonable

That was what I was thinking at least.

elif isinstance(v, list):
params[k] = param.List(**kws)
elif isinstance(v, np.ndarray):
params[k] = param.Array(**kws)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd greatly prefer this to be done using a lookup table. Something like:

param_types = {
    bool:	param.Boolean,
    int:	param.Integer,
    float:	param.Number,
    str:	param.String,
    dict:	param.Dict,
    tuple:	param.Tuple,
    list:	param.List,
    np.ndarray:	param.Array,
}


def params_from_kwargs(**kwargs):
    params = {}
    for k, v in kwargs.items():
        kws = dict(default=v)
        if isinstance(v, param.Parameter):
            params[k] = v
        elif v in param_types:
            params[k] = param_types[t](**kws)
        else:
            params[k] = param.Parameter(**kws)
    return params

That way other types could be registered by adding them to this dictionary.

Copy link
Contributor Author

@jlstevens jlstevens Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is code I've used in at least three separate contexts, once in another project and once in holoviews itself. I would like to see it moved to param instead of duplicating it in different places. It is often useful!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy for it to move to param, though if it's done there, it will need a good description mentioning the limitations (i.e. that only a few Parameter types will ever be supported in this way).

Copy link
Contributor Author

@jlstevens jlstevens Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll submit a PR to param for this code but I don't think this PR should wait on that. Once this is in param, I can update earthsim to import it accordingly.

be used to parameterize a notebook, display corresponding widgets
with parambokeh and control the workflow from the command line.
"""
name = 'GlobalParams'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What makes this "global_params"? It doesn't seem to have anything global about what it does or how it works. How about just calling it "parameterized()" with a name "parameterized", since it simply returns a Parameterized class? I strongly prefer objects to be named with an accurate description of what they do, to make it clear that no special magic is happening.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Yes, I do know that it's being used as a set of global parameters, later, but that shouldn't affect how it's named here. The globality can be conveyed by a local variable name, if needed, at the point of use.)

Copy link
Contributor Author

@jlstevens jlstevens Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name global_params was taken from topographica as the way it is used is very similar in concept. I agree that what it is is nothing global or special though 1) this doesn't define methods so the result is just a collection of parameters and nothing more 2) it is intended to be a singleton class for use at the top of a script/notebook to parameterize the rest of the workflow. What other uses do we have for classes that are just collections of parameters without code?

I don't mind renaming it parameterized though 1) that might get confused with param.Parameterized 2) the handle given by the user should still reflect the intent e.g:

nbparams = parameterized(...)

Edit: Github didn't update to show your second comment as I was typing up this reply!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see we're on the same wavelength. :-) Maybe parameters() is a better name, as it's indeed a collection of parameters with no methods, as you point out? Either is fine by me; parameterized emphasizes that it can be used like any other Parameterized class (while glossing over the lack of methods), while parameters() emphasizes that it's just some parameters with no methods (while not mentioning that it's a Parameterized). I have no preference, but do prefer either of those to global_params.

Copy link
Contributor Author

@jlstevens jlstevens Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of those two options, I think parameters is definitely the better choice. I'll update to use that name now.

Copy link
Contributor Author

@jlstevens jlstevens Jun 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is used as a singleton in once specific way: if you partition your parameter sets into two classes and try to set them separately via JSONInit in two different calls to parambokeh.Widgets, then you will see warnings because all the parameters controlling the workflow are supplied at once at the command line (via a single environment variable). Each JSONInit call should be issuing warnings about the parameters that it cannot set that were specified on the command line. For this reason, if you want to avoid these warnings, it is best to treat it as a singleton.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like that's true whether it's a singleton or not, or whether you use multiple explicit Parameterized classes -- unless the parameters supplied are present in all such objects, there will be an error. What makes it a single object is that we're applying a single set of parameters from the envt to it, which again is about how it is used, not about how it's defined.

I was about to say that we should supply a way for the method to avoid being a singleton by letting the user provide a name, but now I see there's a problem if the user does that:

name = 'Parameters'
params = params_from_kwargs(**kwargs)
params['name'] = param.String(default=name)

I.e., if the user passes in a 'name' kw, it will silently be overwritten. That's dangerous, as it's quite likely that eventually a user will have a parameter named 'name'. Seems like we could instead check for the name in the supplied kwargs and use that, using 'Parameters' only if not found, which would serve a double purpose of providing a way to have arbitrarily many different classes created in this way, and also avoids clobbering a user-defined name.

Copy link
Contributor Author

@jlstevens jlstevens Jun 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... unless the parameters supplied are present in all such objects, there will be an error.

That isn't necessarily true: you can target JSONInit to specifically named classes (Example 2 onwards), in which case you could partition the objects and avoid warnings (if you set it up right).

Currently the notebooks in this PR are using the simplest approach in Example 1. I could have tried to fix the class name for parameters using the other more complex formats but I didn't which means I can use name as you suggest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 9e38275

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks.

@jlstevens
Copy link
Contributor Author

@jbednar I had to rebase to get the tests passing again and I've pushed the rename to parameters. Other than settling on the name of the console script, I think this PR is ready to merge.

@jbednar
Copy link
Contributor

jbednar commented Jun 5, 2018

Seems ready to merge. Ok?

@jlstevens
Copy link
Contributor Author

Seems ready to merge. Ok?

Yup.

@jbednar jbednar merged commit 1384f2a into master Jun 5, 2018
@philippjfr philippjfr deleted the gssha_json_init branch October 17, 2018 17:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants