-
Notifications
You must be signed in to change notification settings - Fork 299
Config file devel #41
base: branch-1.6
Are you sure you want to change the base?
Conversation
This is a much-requested feature from the early days of spark-ec2, so thank you for working on this. I wonder, though, if there is a way to implement this feature without using JSON -- which IMO is not appropriate as a configuration format -- and without manually maintaining this mapping of config name to variable name. Are there no features of argparse that can be used to offer this functionality in a more straightforward way? |
I really enjoy using this feature, it permits me to make less error when managing a cluster, but I ignored that it was a much-requested feature. I was thinking that JSON was an appropriate format, it seems clear and simple to manipulate. I didn't find a feature of argparse that gives us what we want, but I wrote a function to avoid to maintain the manual mapping. unneeded_opts = ("version", "help")
def mapping_conf(opt):
normal_opt = str(opt).split("/")[-1]
trans_opt = normal_opt.strip("-")
trans_opt = trans_opt.replace("-", "_")
return {trans_opt: normal_opt}
map_conf = {}
[map_conf.update(mapping_conf(opt)) for opt in parser.option_list]
[map_conf.pop(unneeded_opt, None) for unneeded_opt in unneeded_opts] map_conf:
Do you think this solution could do the job ? |
Yes, I think it's a step up to have a function that does the mapping for us. Regarding JSON vs. something else, I am biased since I chose YAML for Flintrock. 😄 The main problem I have with JSON is that you can't put comments in it. In Flintrock I also used Click to parse command-line arguments, since that lets the user load configs from a file but override individual settings at the command-line. Obviously, we don't have to do that here, as it would be a pretty big change. If this works well enough for @shivaram then it's fine by me too. |
I prefer YAML as well compared to JSON -- Its just more human friendly to read and format. Also for YAML parsing we should download the library similar to how we download boto -- That should make it transparent to the user ? |
|
||
|
||
# Only PyPI libraries are supported. | ||
external_libs = [ | ||
{ | ||
"name": "boto", | ||
"version": "2.34.0", | ||
"md5": "5556223d2d0cc4d06dd4829e671dcecd" | ||
"md5": "5556223d2d0cc4d06dd4829e671dcecd", | ||
"add-path": None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of add-path
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose add-path
is to precise a custom function to execute, because some libraries need to be added to the system path differently than just doing sys.path.insert(1, libs/LIBRARY_NAME).
@Tirami-su you will probably also want to add a note to the REAME, perhaps along with a brief example config file, to show users how this feature works. |
@nchammas No prob, I will add a description of this feature with an example. |
…located under .ssh/
@@ -327,13 +411,19 @@ def parse_args(): | |||
parser.add_option( | |||
"--instance-profile-name", default=None, | |||
help="IAM profile name to launch instances under") | |||
parser.add_option("-c", "--conf-file", default=None, | |||
help="Specify config file", metavar="FILE") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add some more description of what the config file format is here ? A pointer to some other readme / markdown file will be fine if we dont want to make this too long. Also it'll be good to clarify precedence -- i.e. if a config option is present both on command line and in the file which one is used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, we need to add some more description in the help section. It will be great to document this feature, can we add a pointer to the README.md
file ?
If -c/--conf-file
is used, only the arguments present in the configuration file are used and the options present in the command line, will be ignored. It'll nice to add this information somewhere everyone can spot it directly. Can we print a message to the console when -c/--conf-file
is used, to indicate that only the arguments in the configuration file are used ? Or write it in the help
section of add_option
?
When we will switch from optparse
to argparse
, we can easily manage config file and the other command line arguments at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized we haven't merged #34 yet - Will it be fine if I merge that first ?
Configuration file
-c
or--conf-file
option enables to use a YAML configuration file to simplify the way you interact with the cluster.Launch new cluster
./spark-ec2 --conf-file config.yml launch my_cluster
Example of a configuration file
Note
-c
or--config-file
option is precised, only the parameters present in the configuration file are used. If there is other options in the command line, they are ignored.~/.ssh/
.Modificaiton
JSON to YAML
Keys must be located in
~/.ssh/