-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parameterized notebook examples #108
Conversation
"```bash\n", | ||
"cd examples/topics/batched_example/\n", | ||
"PARAM_JSON_INIT='{\"rain_intensity\":42, \"rain_duration\":50}' jupyter notebook GSSHA_Workflow_Batched_Example1.ipynb\n", | ||
"```\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add the example nbconvert command as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 49912a8. It is a long command!
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"parambokeh.Widgets(NotebookParams, initializer=parambokeh.JSONInit())" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
"You can also generate a report HTML file using ``nbconvert`` as follows:\n", | ||
"\n", | ||
"```bash\n", | ||
"PARAM_JSON_INIT='{\"rain_intensity\":25, \"rain_duration\":60}' jupyter nbconvert --execute GSSHA_Workflow_Batched_Example1.ipynb --output report1.html\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably --output report1.html
is optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It would output the notebook name with the .html
extension otherwise.
It seemed worth showing this flexibility but I can also mention that is it optional if you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think omitting it here is correct, because then the example shows the right way to do the simple case. And then the Lancet example can show how to encode parameter values into the filenames as a record, if appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure.
@jbednar Added two new notebooks including I should mention that this PR is intended as a prototype/example especially as we want to migrate the idea of |
The problem with trying to make |
Please update lancet, which needs to be done anyway. Skipping those tests is also reasonable, but should be a separate decision based only on duration, not on packaging issues. |
Agreed. I'll do that next though I am skipping this notebook in the tests as it does take too long to run. |
ea8826b
to
2cc4e6f
Compare
@ceball I've not been able to skip this new Lancet notebook in the tests despite multiple attempts. I've opened an nbsmoke issue about it holoviz-dev/nbsmoke#11 and would appreciate it if you could help me get the syntax right. Meanwhile, I'll update the lancet package so it could run. Alternatively, I could use a short That said, I'm not sure it is worth making the story in the notebook more confusing for the sake of running tests on it, especially as this notebook is mainly designed to be an illustration. It isn't running particularly particularly useful simulation runs which is why I would like to skip the tests in this case. |
In general I prefer to have tests run even for a very short time, rather than be skipped, which will catch the vast majority of errors regardless of duration. I suppose if we get the parameterization set up cleanly, we could make our notebooks have a flag that makes them run much more quickly (whatever it takes for them to do so) and have the tests set the appropriate environment variable? |
Using an environment variable to control the lancet notebook which controls a batch of jobs via the same environment variable? I suppose that would work, although it is environment variables all the way down which are clobbering each other (the lancet notebook is the parent process controlled by This approach does make sense in the long term but right now I think this notebook will suffer too much churn for it to be worthwhile. For instance, we already know a good chunk of this functionality should migrate to param, the command line interface of the console script can be improved etc. Even with more favorable parameter values, it will still slow down the overall testing cycle. I'm happy to set it up as suggested but only if you are sure it is worth it. This would also involve setting up the environment variable in the testing environment and I'm not sure how to do that yet. Lastly, I note that all |
I'm just pointing out the overall goal and plan; we should make interim decisions pragmatically, of course! |
Lancet is now being updated on the |
I guess it was just that an underscore was missing (there's one in the notebook name,
lancet appears to be on conda-forge, so you could use that (we're already tied to conda-forge) |
Thanks! I must have accidentally stripped out that first underscore when I was wondering if I hadn't escape underscores properly.
It is now on conda-forge and defaults so I'll add it as a dependency. |
setup.py
Outdated
@@ -6,7 +6,12 @@ | |||
setup_args.update(dict( | |||
name='earthsim', | |||
version="0.1", | |||
install_requires = ['lancet'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently we are all just using the environment file, so that's where to add the dependency for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Furthermore, putting it here would install some unrelated piece of software (https://pypi.org/project/lancet/). Unless I'm confused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed in 193293d.
@jbednar Ready for review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! I had only minor suggestions.
|
||
Example usage: | ||
|
||
param -cmd 'jupyter nbconvert --execute GSSHA_Workflow_Batched_Example2.ipynb' -p rain_intensity=25 -p rain_duration=3600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably 'param' is 'earthsim' in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't sure what to call it: right now it is in earthsim
but we hope to migrate it to param
. I'm happy to name it whatever you want though I don't know if an earthsim
console script would have a long term future/purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see -- I thought the command name would default to the module name, but I see later that it's explicitly declared param = earthsim.__main__:main
. If it can really be supplied as an arbitrary command name, then param
seems highly likely to conflict with something else on the user's path, so I think it should be something less generic. Maybe run_with_params
? Not sure if running with params is more or less dangerous than running with scissors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't believe their is a unix/windows command called param
so I don't see why param couldn't have a console script of that name. What do you think users might have on their path that conflicts with it?
If we have to worry about a clash, I suppose I might consider something like params
or parameters
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of those are equally generic. If the intent is for param to supply this command, then calling it param
is reasonable, as having the package param
supply the param
command won't be surprising to anyone. @ceball, do you see any conflict with some other usage of a param module command? Seems ok to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the intent is for param to supply this command, then calling it
param
is reasonable
That was what I was thinking at least.
elif isinstance(v, list): | ||
params[k] = param.List(**kws) | ||
elif isinstance(v, np.ndarray): | ||
params[k] = param.Array(**kws) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd greatly prefer this to be done using a lookup table. Something like:
param_types = {
bool: param.Boolean,
int: param.Integer,
float: param.Number,
str: param.String,
dict: param.Dict,
tuple: param.Tuple,
list: param.List,
np.ndarray: param.Array,
}
def params_from_kwargs(**kwargs):
params = {}
for k, v in kwargs.items():
kws = dict(default=v)
if isinstance(v, param.Parameter):
params[k] = v
elif v in param_types:
params[k] = param_types[t](**kws)
else:
params[k] = param.Parameter(**kws)
return params
That way other types could be registered by adding them to this dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is code I've used in at least three separate contexts, once in another project and once in holoviews itself. I would like to see it moved to param instead of duplicating it in different places. It is often useful!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy for it to move to param, though if it's done there, it will need a good description mentioning the limitations (i.e. that only a few Parameter types will ever be supported in this way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll submit a PR to param for this code but I don't think this PR should wait on that. Once this is in param, I can update earthsim to import it accordingly.
earthsim/__init__.py
Outdated
be used to parameterize a notebook, display corresponding widgets | ||
with parambokeh and control the workflow from the command line. | ||
""" | ||
name = 'GlobalParams' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What makes this "global_params"? It doesn't seem to have anything global about what it does or how it works. How about just calling it "parameterized()" with a name "parameterized", since it simply returns a Parameterized class? I strongly prefer objects to be named with an accurate description of what they do, to make it clear that no special magic is happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Yes, I do know that it's being used as a set of global parameters, later, but that shouldn't affect how it's named here. The globality can be conveyed by a local variable name, if needed, at the point of use.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name global_params
was taken from topographica as the way it is used is very similar in concept. I agree that what it is is nothing global or special though 1) this doesn't define methods so the result is just a collection of parameters and nothing more 2) it is intended to be a singleton class for use at the top of a script/notebook to parameterize the rest of the workflow. What other uses do we have for classes that are just collections of parameters without code?
I don't mind renaming it parameterized
though 1) that might get confused with param.Parameterized
2) the handle given by the user should still reflect the intent e.g:
nbparams = parameterized(...)
Edit: Github didn't update to show your second comment as I was typing up this reply!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see we're on the same wavelength. :-) Maybe parameters()
is a better name, as it's indeed a collection of parameters with no methods, as you point out? Either is fine by me; parameterized
emphasizes that it can be used like any other Parameterized class (while glossing over the lack of methods), while parameters()
emphasizes that it's just some parameters with no methods (while not mentioning that it's a Parameterized). I have no preference, but do prefer either of those to global_params
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of those two options, I think parameters
is definitely the better choice. I'll update to use that name now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is used as a singleton in once specific way: if you partition your parameter sets into two classes and try to set them separately via JSONInit
in two different calls to parambokeh.Widgets
, then you will see warnings because all the parameters controlling the workflow are supplied at once at the command line (via a single environment variable). Each JSONInit call should be issuing warnings about the parameters that it cannot set that were specified on the command line. For this reason, if you want to avoid these warnings, it is best to treat it as a singleton.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like that's true whether it's a singleton or not, or whether you use multiple explicit Parameterized classes -- unless the parameters supplied are present in all such objects, there will be an error. What makes it a single object is that we're applying a single set of parameters from the envt to it, which again is about how it is used, not about how it's defined.
I was about to say that we should supply a way for the method to avoid being a singleton by letting the user provide a name, but now I see there's a problem if the user does that:
name = 'Parameters'
params = params_from_kwargs(**kwargs)
params['name'] = param.String(default=name)
I.e., if the user passes in a 'name' kw, it will silently be overwritten. That's dangerous, as it's quite likely that eventually a user will have a parameter named 'name'. Seems like we could instead check for the name in the supplied kwargs and use that, using 'Parameters' only if not found, which would serve a double purpose of providing a way to have arbitrarily many different classes created in this way, and also avoids clobbering a user-defined name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... unless the parameters supplied are present in all such objects, there will be an error.
That isn't necessarily true: you can target JSONInit
to specifically named classes (Example 2 onwards), in which case you could partition the objects and avoid warnings (if you set it up right).
Currently the notebooks in this PR are using the simplest approach in Example 1. I could have tried to fix the class name for parameters using the other more complex formats but I didn't which means I can use name as you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in 9e38275
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thanks.
…re to notebook name.
b8dafb2
to
46c21c8
Compare
@jbednar I had to rebase to get the tests passing again and I've pushed the rename to |
Seems ready to merge. Ok? |
Yup. |
This PR will add a number of examples to show how notebooks can be parameterized both with widgets for a normal notebook workflow and at the command line.
These examples are based on the
GSSHA_Workflow
notebook, illustrating how two parametersrain_intensity
andrain_duration
can be exposed. The workflow presented allows any parameters to be exposed in this way and also allows new parameters to be defined at the notebook level.Notes:
parambokeh
master regardingFileSelector
.