Salt
I'm finding Salt to be a pain. For example, compare this:

```yaml
/etc/login.defs:
  file.replace:
    - pattern: "^#\\s*CREATE_HOME.*"
    - repl: CREATE_HOME no
```

with

```bash
file_replace '^#\s*CREATE_HOME.*' 'CREATE_HOME no' /etc/login.defs
```

where you implement file_replace as follows:

```bash
file_replace() {
    perl -pi -e "s{$1}{$2}" "$3"
}
```
The actual implementation of Salt's file.replace is substantially longer and more complex. The fact that the state module file.py is 5,020 lines of code indicates a real problem to me.
Salt is an excellent remote execution engine. It also manages CM state securely, but only if you use Pillars rather than Grains, and if you wrap the states you manage.

One thing it doesn't do well is authoritative configuration management. There is no easy "undo" of all (or any) operations. It doesn't handle deferred updates. Updates are "atomic enough" for clusters, because you can target machines, and the clients do all the work.

The configuration selectors are too complicated and, more importantly, insecure. According to the last SaltStack FAQ entry, you can't trust Grains; instead you should use the Minion ID. This subtle design flaw is critically important to the overall architecture.

The documentation strongly suggests using Grains for configuration selection. However, you can't actually modify them in a practical way, unlike the Pillar. It's a bit like using IDs vs. classes for CSS selectors: IDs seem easy at first, but then you learn that an ID must be globally unique on the page, which eliminates their utility for easy selection, so you end up using classes for everything. The same goes for Salt: you should use the Pillar for everything.
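For example, a pillar top file keyed on minion IDs might look like this (a sketch; the target patterns and pillar file names are made up):

```yaml
# /srv/pillar/top.sls
base:
  # Match on the minion ID, which the master verifies,
  # rather than on grains, which a minion can forge.
  'web*':
    - webserver
  'db01':
    - postgresql
```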
Salt is not fail fast. You can set failhard: true in the master and minion configuration, but this doesn't do much. The philosophy is to keep going in the event of errors. For example, pillar.stack keeps going when it can't load a file. The Salt API doesn't use "failhard" except when a state fails. However, there's no way to communicate External Pillar failures except by raising exceptions, which module writers rarely do.

While you might end up with a half-configured machine if you fail fast, you will end up with an incorrectly configured machine if you continue after errors. This is a major issue, and one you have to be careful to work around.
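For reference, the setting itself is a single line in the master and minion configuration:

```yaml
# /etc/salt/master and /etc/salt/minion
failhard: True
```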
- Like RPMs, there's no "undo edit". You have to code that yourself.
- Unlike RPMs, when you remove a file.managed, you have to replace it with file.absent so the file gets deleted the next time, as in the sketch below.
- service.running does not handle Docker well. The container may start fine but get an error later; service.running thinks everything is fine, but it isn't.
- It remounts NFS partitions constantly. See saltstack/salt#18630
- Setting user.present.createhome False is broken on Fedora in the stable release. Fixed in saltstack/salt#28726, but with the stable release that ships on Fedora you have to modify /etc/login.defs yourself.
- Passing context is painful unless you create a dictionary of values. See jupyterhub/init.sls
- The pillars and grains are not updated automatically, so you have to call saltutil.refresh_pillar AND saltutil.sync_all. This is encapsulated in [minion/init.sls](https://github.com/biviosoftware/salt-conf/blob/20570d3de26a887bbb0e7eef6d1ecb7d00a94615/srv/salt/minion/init.sls#L1) and sketched below.
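A minimal sketch of the same idea as states (note that a pillar refresh won't affect pillar data already rendered for the current run):

```yaml
refresh_pillar:
  module.run:
    - name: saltutil.refresh_pillar

sync_all:
  module.run:
    - name: saltutil.sync_all
```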
- Not clear what to do about building things like docker images on the client. When do you rebuild? Need to think about dependencies.
- Syntax is far too long:

```yaml
/foo/bar:
  file.directory:
    - makedirs: True
    - user: root
    - group: root
    - mode: 750
```

which should be:

```bash
mkdir -m 750 -p /foo/bar
```

These two commands are identical, because -p doesn't do anything (and doesn't fail) when the directory exists.
- You need a "reactor" to initialize the minion properly. See minion start reactor, sketched below.
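A minimal sketch of the wiring, assuming a reactor SLS that runs highstate when a minion starts (the file paths are arbitrary):

```yaml
# /etc/salt/master.d/reactor.conf
reactor:
  - 'salt/minion/*/start':
    - /srv/reactor/minion_start.sls
```

```yaml
# /srv/reactor/minion_start.sls
# Run highstate on the minion that just announced itself.
minion_start_highstate:
  local.state.highstate:
    - tgt: {{ data['id'] }}
```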
- systemd needs to be reloaded whenever a unit file changes, so you have to write:

```yaml
jupyterhub:
  service.running:
    - enable: True
    # onchanges is subtle. It doesn't run the state if there are no changes.
    # watch runs if there are changes OR if the watched state returns True
    # (it acts like require in this case). We want the server to be
    # running always when salt runs, so it has to be watch.
    - watch:
      - file: {{ zz.host_config }}
      - file: {{ zz.systemd_service }}
      - cmd: jupyterhub-image
```
The difference between onchanges and watch is hard to understand. The docs say watch is deprecated in favor of onchanges, but that's not true: you want this state to run even if there are no changes.
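For contrast, onchanges is the right requisite when the state should run only when something changed, e.g. a systemd daemon-reload (a sketch reusing the zz values above):

```yaml
jupyterhub_daemon_reload:
  cmd.run:
    - name: systemctl daemon-reload
    # Run only when the unit file actually changed; doing nothing
    # on an unchanged file is exactly what we want here.
    - onchanges:
      - file: {{ zz.systemd_service }}
```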
- Naming is a huge issue, because there is no scoping. Including a state file is flat, that is, the names have no package prefix. It's like programming in C. See a sample discussion
- failhard=True is not the default. If there's an error, other states continue
- Undefined values do not result in an error; they just render as empty strings in Jinja
- Jinja syntax is ugly
- Inter-node state dependencies do not exist. Orchestration is an inadequate substitute (see the sketch after this list).
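Orchestration runs from the master and sequences whole states across minions, which is the only place cross-node ordering exists (a sketch; the targets and SLS names are made up):

```yaml
# /srv/salt/orch/site.sls
# Run from the master with: salt-run state.orchestrate orch.site
deploy_db:
  salt.state:
    - tgt: 'db*'
    - sls: postgresql

deploy_web:
  salt.state:
    - tgt: 'web*'
    - sls: jupyterhub
    # Cross-node ordering only exists at this layer:
    - require:
      - salt: deploy_db
```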
Here is a salt state to create users:

```yaml
# https://github.com/saltstack/salt/issues/28726
# user.createhome False does not work without updating /etc/login.defs
users_login_defs:
  file.replace:
    - name: /etc/login.defs
    - pattern: "^#\\s*CREATE_HOME.*"
    - repl: CREATE_HOME no

{% for name, uid in pillar.users.iteritems() %}
users_{{ name }}:
  group.present:
    - name: {{ name }}
    - gid: {{ uid }}
  user.present:
    - name: {{ name }}
    - uid: {{ uid }}
    - gid: {{ uid }}
    - createhome: False
    - enforce_password: False
  mount.mounted:
    - name: /home/{{ name }}
    - device: /var/nfs/apa11/home/{{ name }}
    - fstype: none
    - opts: bind
    - mkmnt: True
{% endfor %}
```
The same thing in bash:
_user_home() {
[[ -d /home/$user ]] && return
mkdir -p /home/$user
mount -o bind -t nfs /var/nfs/apa11/home/$user /home/
}
_user_present() {
id $user >& /dev/null && return
groupadd -g $id $user
useradd -M -g $id -u $id $user
_user_home
}
for name in ${!pillar_users[@]}; do
uid=${pillar_users[$users]}
_user_present
done
Note the lack of the /etc/login.defs hack to work around saltstack/salt#28726. This is a typical problem with Salt: working around bugs and/or misfeatures.

Also note that the naming is clearer in the bash script. Salt pretends to be declarative, but it is imperative: states are executed in the order they are defined. In the above example, you'd have to add require statements to make the ordering explicit.
Some might argue that _user_present in Bash is not right, because it assumes it can create the group. That's true, but if there's already a group with the same name as the user, then you've failed, and you want to stop.
With Docker, we can buy into a single operating system, and the way it does business. This makes the need for portability across installs a non-issue. You either are a Red Hat or Debian/Ubuntu or CoreOS shop for your base operating system.
Salt does not understand "undo". This is a serious problem. If you remove a user from the pillar.users list, you need to remove them from the system. Salt does not handle this automatically, so you have to write an imperative script that checks for the existence of users. You can't do that declaratively, because you don't know which users are on the system.
In Bash, you could do this:

```bash
_user_remove_unknown() {
    for home in /home/*; do
        user=$(basename "$home")
        if [[ ! ${pillar_users[$user]} ]]; then
            umount "/home/$user"
            userdel -r "$user"
        fi
    done
}
```
There's no way to write this in Salt YAML syntax; the closest escape hatch is sketched below.
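The best you can do is shell out from a state (a sketch; the script path is made up):

```yaml
remove_unknown_users:
  cmd.script:
    - source: salt://users/remove_unknown_users.sh
```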
Salt is ultimately a remote execution model with a security layer. The Salt master can initiate (push) an update on a set of minions, or minions can initiate (pull) the update. That's convenient. Salt controls which pillars a minion can see.

Treating Salt as a secure file server executing a library of files is useful, and probably the best approach.
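In that model, states mostly push files and run them (a sketch; the names and paths are made up):

```yaml
deploy_script:
  file.managed:
    - name: /usr/local/bin/deploy
    - source: salt://bin/deploy
    - mode: 755

run_deploy:
  cmd.run:
    - name: /usr/local/bin/deploy
    # Re-run only when the pushed script changes.
    - onchanges:
      - file: deploy_script
```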
mod_init is a halfway state. There's no __env__, so you can't, say, call file.managed, because it tries to pass saltenv=__env__. You do get low as an argument, which is the same as the __lowstate__ global. So instead of using mod_init, we wrap all states with an init and just check a global variable to detect whether the module needs to be initialized.
Firewalld is supported, but it's inconsistent with other modules in that the entire zone must be defined in a single state declaration. Consider:

```yaml
public:
  firewalld.present:
    - ports:
      - 999/tcp
```

This will enable port 999 in the public zone. However, all other ports in /etc/firewalld/zones/public.xml will be cleared, and things like masquerade will be turned off, because the default value is False.
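So every zone setting has to live in one place, something like this sketch:

```yaml
public:
  firewalld.present:
    # Everything the zone needs must be declared here, or it
    # will be reset to the default on the next highstate.
    - masquerade: True
    - ports:
      - 22/tcp
      - 999/tcp
```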
The sysctl state module only supports setting a single value:

```yaml
vm.swappiness:
  sysctl.present:
    - value: 20
```

What you want is to configure sysctl in groups:

```yaml
my_sysctl_conf:
  sysctl.present:
    vm.swappiness: 20
    vm.dirty_background_ratio: 1
```
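Until something like that exists, the workaround is a Jinja loop (a sketch):

```yaml
{% for key, value in {'vm.swappiness': 20,
                      'vm.dirty_background_ratio': 1}.items() %}
{{ key }}:
  sysctl.present:
    - value: {{ value }}
{% endfor %}
```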
If there's a syntax or other error in a custom state module, Salt reports only that the module is not available, e.g. Reason: 'radia.cluster_stop' is not available. Run python <state module>.py to debug, and salt-call cp.list_master to see which files the master is serving. A fuller example:

```
ID: postgresql_image
Function: bivio.docker_image
Result: False
Comment: State 'bivio.docker_image' was not found in SLS 'test'
Reason: 'bivio.docker_image' is not available.
Started:
Duration:
Changes:
```

Salt failed to load the module bivio because of a syntax error, but Salt won't tell you that. You have to run:

```
python <state>.py
```
This happens when an accept attempt for a host is already queued:

```
$ salt-key -L
Accepted Keys:
Denied Keys:
v3
Unaccepted Keys:
v3
Rejected Keys:
```

You need to:

```
$ salt-key -d v3
The following keys are going to be deleted:
Denied Keys:
v3
Unaccepted Keys:
v3
Proceed? [N/y] y
No minions matched the target. No command was sent, no jid was assigned.
No minions matched the target. No command was sent, no jid was assigned.
Key for minion v3 deleted.
```
If the master log shows the following, it is likely a permission issue; check the /etc/salt/pki permissions in the container:

```
2016-05-11 02:02:07,031 [salt.config ][INFO ][20931] Found minion id from generate_minion_id(): salt-master
2016-05-11 02:02:07,141 [salt.utils.process][INFO ][35] Process <class 'salt.master.Maintenance'> (20931) died with exit status None, restarting...
```
SLS file names cannot have . in them. This took a while to debug: nothing works, and there isn't any error output, just foo.bar not found. It's referenced in the documentation on includes: a . indicates a / in the include mechanism.
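That is:

```yaml
include:
  # foo.bar means the file salt://foo/bar.sls,
  # never a file named foo.bar.sls
  - foo.bar
```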
Need an inventory of actual changes ("pchanges" in Salt's terminology) that can be undone: e.g., creating a directory can be undone, but "systemctl daemon-reload" cannot. Every highstate, we indicate all changes that would be made if this were a fresh install. If there are new operations that actually had pchanges, ones not already in the db, those changes are added to the db in the order they need to be done. For example, if there's a new keyfile for a service, it would have to appear before the service was started; that way, when you undo, you don't remove the keyfile before you stop the service. All operations probably have to be expressed as idempotent shell operations that can be reversed, e.g. mkdir /foo/bar and not mkdir -p /foo/bar, because we don't know that /foo already exists.
The question of data is important. Data will not be removed unless there is an indication that it can be removed, and there might be a need to back up the objects before they are removed. Consider /var/lib/postgresql/data. The remove sequence could be:
```bash
systemctl stop postgresql
docker rm -f postgresql
backup /var/lib/postgresql/data
rm -rf /var/lib/postgresql/data
```
The backup may be large, so where does it go? Maybe just mv /var/lib/postgresql/data /var/lib/postgresql/data-20160509-bivio-salt-bkp; the data is then "removed". We make sure the archive happens before the removal. With databases, you would want to execute an actual backup. For most other applications it doesn't matter.
You must define the module as a package, i.e. the file name must be salt/_modules/foomod/__init__.py.