This repository has been archived by the owner on Nov 23, 2017. It is now read-only.

Config file devel #41

Open

ar-ms wants to merge 5 commits into branch-1.6

Conversation

ar-ms

@ar-ms ar-ms commented Jul 20, 2016

Configuration file

The -c or --conf-file option lets you use a YAML configuration file to simplify the way you interact with the cluster.

Launch a new cluster

./spark-ec2 --conf-file config.yml launch my_cluster

Example of a configuration file

slaves: 2
instance_type: r3.xlarge
region: eu-west-1
zone: eu-west-1c
ami: ami-xxxxxxxx
spot_price: 0.07
hadoop_major_version: yarn

spark_ec2_git_repo: https://github.com/tirami-su/spark-ec2
spark_ec2_git_branch: config_file-devel

copy_aws_credentials: true
no_ganglia: y
resume: on

## Network and security
key_pair: xxxxx
identity_file: xxxxx.pem

credentials: 
   aws_access_key_id: XXXXX
   aws_secret_access_key: XXXXX

Note

  • In this version, when the -c or --conf-file option is specified, only the parameters present in the configuration file are used. Any other options given on the command line are ignored (an illustrative sketch follows below).
  • The identity file must be located in ~/.ssh/.
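
For illustration (not part of the PR): a minimal sketch of how the behaviour described in the first note could look, assuming the YAML keys match the option destinations listed later in this thread; load_conf_file is an illustrative name, not the PR's code.

import yaml

def load_conf_file(path, parser):
    # Illustrative helper: build an options object from the parser's
    # defaults plus the values found in the YAML file, ignoring the CLI.
    with open(path) as f:
        conf = yaml.safe_load(f) or {}
    opts = parser.get_default_values()   # optparse's built-in defaults
    for key, value in conf.items():
        setattr(opts, key, value)
    return opts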

Modifications

  • JSON to YAML
  • Keys must be located in ~/.ssh/

@nchammas
Contributor

This is a much-requested feature from the early days of spark-ec2, so thank you for working on this.

I wonder, though, if there is a way to implement this feature without using JSON -- which IMO is not appropriate as a configuration format -- and without manually maintaining this mapping of config name to variable name.

Are there no features of argparse that can be used to offer this functionality in a more straightforward way?

@ar-ms
Author

ar-ms commented Jul 25, 2016

I really enjoy using this feature; it helps me make fewer errors when managing a cluster. I didn't know it was a much-requested feature.

I thought JSON was an appropriate format; it seems clear and simple to manipulate.
We could use ConfigParser, which uses the INI file format.
The YAML format could be nice too, but it requires installing the PyYAML library...

I didn't find a feature of argparse that gives us what we want, but I wrote a function to avoid maintaining the manual mapping.

# Options that should not be settable from the config file.
unneeded_opts = ("version", "help")

def mapping_conf(opt):
    # str(opt) looks like "-s/--slaves"; keep the long form only.
    normal_opt = str(opt).split("/")[-1]
    # "--hadoop-major-version" -> "hadoop_major_version"
    trans_opt = normal_opt.strip("-").replace("-", "_")
    return {trans_opt: normal_opt}

map_conf = {}
for opt in parser.option_list:
    map_conf.update(mapping_conf(opt))
for unneeded_opt in unneeded_opts:
    map_conf.pop(unneeded_opt, None)

map_conf:

{
    'D': '-D',
    'additional_security_group': '--additional-security-group',
    'additional_tags': '--additional-tags',
    'ami': '--ami',
    'authorized_address': '--authorized-address',
    'conf_file': '--conf-file',
    'copy_aws_credentials': '--copy-aws-credentials',
    'delete_groups': '--delete-groups',
    'deploy_root_dir': '--deploy-root-dir',
    'ebs_vol_num': '--ebs-vol-num',
    'ebs_vol_size': '--ebs-vol-size',
    'ebs_vol_type': '--ebs-vol-type',
    'ganglia': '--ganglia',
    'hadoop_major_version': '--hadoop-major-version',
    'identity_file': '--identity-file',
    'instance_initiated_shutdown_behavior': '--instance-initiated-shutdown-behavior',
    'instance_profile_name': '--instance-profile-name',
    'instance_type': '--instance-type',
    'key_pair': '--key-pair',
    'master_instance_type': '--master-instance-type',
    'master_opts': '--master-opts',
    'no_ganglia': '--no-ganglia',
    'placement_group': '--placement-group',
    'private_ips': '--private-ips',
    'profile': '--profile',
    'region': '--region',
    'resume': '--resume',
    'slaves': '--slaves',
    'spark_ec2_git_branch': '--spark-ec2-git-branch',
    'spark_ec2_git_repo': '--spark-ec2-git-repo',
    'spark_git_repo': '--spark-git-repo',
    'spark_version': '--spark-version',
    'spot_price': '--spot-price',
    'subnet_id': '--subnet-id',
    'swap': '--swap',
    'use_existing_master': '--use-existing-master',
    'user': '--user',
    'user_data': '--user-data',
    'vpc_id': '--vpc-id',
    'wait': '--wait',
    'worker_instances': '--worker-instances',
    'zone': '--zone'
}
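
For illustration (not part of the PR): a hedged sketch of how map_conf could be used to validate the keys read from the YAML file before applying them; only yaml.safe_load and optparse's parse_args / parser.error are assumed to exist.

import yaml

(opts, args) = parser.parse_args()
with open(opts.conf_file) as f:
    conf = yaml.safe_load(f) or {}
for key, value in conf.items():
    if key not in map_conf:
        # Reject keys that do not correspond to any known command-line flag.
        parser.error("Unknown option in config file: %s" % key)
    setattr(opts, key, value)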

Do you think this solution could do the job?

@nchammas
Contributor

Yes, I think it's a step up to have a function that does the mapping for us.

Regarding JSON vs. something else, I am biased since I chose YAML for Flintrock. 😄 The main problem I have with JSON is that you can't put comments in it. In Flintrock I also used Click to parse command-line arguments, since that lets the user load configs from a file but override individual settings at the command-line.

Obviously, we don't have to do that here, as it would be a pretty big change. If this works well enough for @shivaram then it's fine by me too.

@shivaram
Contributor

I prefer YAML as well compared to JSON -- it's just more human-friendly to read and format.
@Tirami-su Would it be possible to have a similar function that generates the mapping for YAML?

Also, for YAML parsing we should download the library similar to how we download boto -- that should make it transparent to the user?
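
For illustration (not part of the PR): a hedged sketch of what this could look like, assuming PyYAML is simply appended to the external_libs list used for boto (shown in the review snippet further down); the version and md5 below are placeholders, not verified values.

# Only PyPI libraries are supported.
external_libs = [
    {
        "name": "boto",
        "version": "2.34.0",
        "md5": "5556223d2d0cc4d06dd4829e671dcecd",
    },
    {
        "name": "PyYAML",
        "version": "3.11",             # placeholder version
        "md5": "PLACEHOLDER-MD5",      # placeholder checksum of the release tarball
    },
]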

@ar-ms
Author

ar-ms commented Jul 25, 2016

@nchammas haha you're biased but you're right 😄 !

@shivaram I will take a look and let you know, I think that it should be possible. Yep you're right, we should do this 👍 !

@ar-ms
Author

ar-ms commented Jul 27, 2016

@nchammas, @shivaram I pushed the new version with the YAML configuration file format. You can check :) !



# Only PyPI libraries are supported.
external_libs = [
    {
        "name": "boto",
        "version": "2.34.0",
        "md5": "5556223d2d0cc4d06dd4829e671dcecd",
        "add-path": None
Contributor

What's the purpose of add-path?

Author

@ar-ms ar-ms Jul 27, 2016

The purpose of add-path is to specify a custom function to execute, because some libraries need to be added to the system path differently than just doing sys.path.insert(1, libs/LIBRARY_NAME).
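
For illustration (not part of the PR): a hypothetical sketch of how such an add-path hook might be consumed when setting up an external library; setup_lib and lib_dir are made-up names.

import os
import sys

def setup_lib(lib, lib_dir="lib"):
    lib_path = os.path.join(lib_dir, "%s-%s" % (lib["name"], lib["version"]))
    if lib.get("add-path") is not None:
        # Library needs a custom path-setup function.
        lib["add-path"](lib_path)
    else:
        # Default behaviour: put the library directory on sys.path.
        sys.path.insert(1, lib_path)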

@nchammas
Contributor

@Tirami-su you will probably also want to add a note to the README, perhaps along with a brief example config file, to show users how this feature works.

@ar-ms
Author

ar-ms commented Jul 27, 2016

@nchammas No prob, I will add a description of this feature with an example.

@@ -327,13 +411,19 @@ def parse_args():
parser.add_option(
"--instance-profile-name", default=None,
help="IAM profile name to launch instances under")
parser.add_option("-c", "--conf-file", default=None,
help="Specify config file", metavar="FILE")
Contributor

Can we add some more description of what the config file format is here? A pointer to some other readme / markdown file will be fine if we don't want to make this too long. Also it'll be good to clarify precedence -- i.e. if a config option is present both on the command line and in the file, which one is used?

Author

Yep, we need to add some more description in the help section. It would be great to document this feature; can we add a pointer to the README.md file?

If -c/--conf-file is used, only the arguments present in the configuration file are used, and the options present on the command line are ignored. It would be nice to put this information somewhere everyone can spot it directly. Can we print a message to the console when -c/--conf-file is used, to indicate that only the arguments in the configuration file are used? Or write it in the help section of add_option?

When we switch from optparse to argparse, we will be able to manage the config file and the other command-line arguments at the same time.
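
For illustration (not part of the PR): a rough sketch of the argparse-based precedence mentioned above, where config-file values become defaults and explicit command-line flags still win; the option subset and defaults are placeholders.

import argparse
import yaml

parser = argparse.ArgumentParser()
parser.add_argument("-c", "--conf-file")
parser.add_argument("--slaves", type=int, default=1)          # placeholder default
parser.add_argument("--instance-type", default="m3.large")    # placeholder default

# First pass: find out whether a config file was given.
args, _ = parser.parse_known_args()
if args.conf_file:
    with open(args.conf_file) as f:
        parser.set_defaults(**(yaml.safe_load(f) or {}))
# Second pass: command-line flags override config-file defaults.
args = parser.parse_args()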

Contributor

I just realized we haven't merged #34 yet -- will it be fine if I merge that first?
