
Reduce overly noisy logging of services #2500

Open
sgnn7 opened this issue Jan 20, 2016 · 15 comments

@sgnn7
Contributor

sgnn7 commented Jan 20, 2016

All services seem to create an exorbitant amount of log output on systemd machines, eating up nearly all available disk space on the machine/VM they're installed on:

  • flocker-* services have --journald appended to them
  • They are all really chatty - at least one message per service per second
  • /etc/systemd/journald.conf has ForwardToSyslog=True by default, so everything is sent to /var/log/syslog (see the journald.conf sketch after this list)
  • /var/log/syslog is rotated daily and 7 rotations are kept by default
  • /var/log/kern.log is rotated weekly and 4 rotations are kept by default
  • We get over 1GB of logs per day on a working cluster with no apps running, and far more when there are Flocker problems
  • Within a few days, we run out of space on the slave/master nodes
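For illustration, both the syslog forwarding and journald's own disk cap are controlled in /etc/systemd/journald.conf; a minimal sketch (the values here are illustrative, not recommendations):

[Journal]
# Keep journald from duplicating everything into /var/log/syslog
ForwardToSyslog=no
# Size-based cap on journald's own on-disk usage
SystemMaxUse=500M

followed by sudo systemctl restart systemd-journald to apply.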

Options:

@wallnerryan
Contributor

Thanks for filing this issue @sgnn7, will bring this back to our team.

@itamarst
Contributor

What operating system is this?

@itamarst
Contributor

In particular, on CentOS I believe we keep journald logs for Flocker services out of syslog, so one can rely on journald's log rotation; unlike syslog's, it's size-based, so you're not stuck waiting for a scheduled rotation.
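For reference, with size-based rotation the journal can also be inspected and trimmed on demand on reasonably recent systemd versions; an illustrative example:

$ journalctl --disk-usage
$ sudo journalctl --vacuum-size=500M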

@itamarst
Contributor

Also, are most of the logs from one particular service? I would expect flocker-container-agent to be by far the worst culprit. While it's required right now, we're working on making that service optional (since it's only really needed for our deprecated containers API). So that should reduce log volume by quite a bit.

@sgnn7
Contributor Author

sgnn7 commented Jan 21, 2016

@itamarst It's Ubuntu 15.10 (systemd) and all of the services are pretty noisy - I had to manually squelch/override these units in /etc/systemd/system in my deployment: flocker-control, flocker-docker-plugin, flocker-dataset-agent, and flocker-container-agent.

Even if you disable forwarding to syslog, I think kern.log still gets this stuff and will consume a lot of space, just on a smaller scale, and it might get log-rotated before it eats up the allocated space. Either way, seeing each service report a "success" JSON message every second on every machine is a bit superfluous and belongs on the "debug" side of the output, IMO, since the signal-to-noise ratio is pretty low.

@itamarst
Contributor

The once-a-second noise is mostly driven by flocker-container-agent; once it's gone, the quiescent state involves agents waking up only once a minute, i.e. <2% of the current noise.

When we support Ubuntu with journald (likely 16.04 LTS) we'll do the work to ensure we don't fill up the disk, including kern.log. Keep in mind that Ubuntu 15.10 is an unsupported system at the moment.

@sgnn7
Contributor Author

sgnn7 commented Jan 21, 2016

@itamarst Fair enough, though I suspect 16.04 LTS is going to be very close to 15.10, so fixing things on that distribution would go a long way toward making it compatible :)

@nikolaiderzhak

On Ubuntu 15.10 I fixed it by adding a number prefix to the rsyslog config filename for Flocker, so I believe the issue is the order in which rules are applied during rsyslogd init.

$ sudo mv /etc/rsyslog.d/flocker.conf /etc/rsyslog.d/11-flocker.conf 
$ cat /etc/rsyslog.d/11-flocker.conf 
# Flocker logs will end up in journald and can be accessed via journalctl. There
# is therefore no need to duplicate them in /var/log/messages; instead they
# should be discarded by rsyslog.
if $programname == 'flocker-control' then ~
if $programname == 'flocker-dataset-agent' then ~
if $programname == 'flocker-container-agent' then ~
$ sudo service rsyslog restart
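Worth noting: the trailing ~ is rsyslog's legacy discard action and is deprecated as of rsyslog v7; on a new enough rsyslog the equivalent modern form would be:

if $programname == 'flocker-control' then stop

and likewise for the other services.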

@nikolaiderzhak

I can still see the logs using journalctl if I need to. Also, disk usage shows 80M, not gigabytes as with syslog/kern.log before the change.

root@es-100-pem-300-docker-swarm-master-1:/home/ubuntu# journalctl --since='23:41:51'| tail -n1
Feb 24 23:42:59 es-100-pem-300-docker-swarm-master-1 flocker-control[27728]: {"timestamp": 1456357379.413892, "task_uuid": "dbe0f32b-47a6-4b54-9a71-5bb82c0f01e7", "action_type": "eliot:remote_task", "action_status": "succeeded", "task_level": [3, 15, 2, 3]}
root@es-100-pem-300-docker-swarm-master-1:/home/ubuntu# journalctl --disk-usage 
Archived and active journals take up 80.0M on disk.
root@es-100-pem-300-docker-swarm-master-1:/home/ubuntu# date
Wed Feb 24 23:43:14 UTC 2016

@nikolaiderzhak

flocker-docker-plugin is also chatty, so I added one more line to the config:

$ cat /etc/rsyslog.d/11-flocker.conf 
# Flocker logs will end up in journald and can be accessed via journalctl. There
# is therefore no need to duplicate them in /var/log/messages; instead they
# should be discarded by rsyslog.
if $programname == 'flocker-control' then ~
if $programname == 'flocker-dataset-agent' then ~
if $programname == 'flocker-container-agent' then ~
if $programname == 'flocker-docker-plugin' then ~
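If all the daemons keep the flocker- name prefix, a single property-based filter could stand in for the four lines; a sketch, assuming rsyslog's startswith comparison:

:programname, startswith, "flocker-" ~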

@wallnerryan
Contributor

Thanks @nikolaiderzhak! Appreciate you sharing this.

@rmmr

rmmr commented Mar 6, 2016

So I've been following the manual install method. Right now I think I have everything working on some Vagrant boxes. I've been closely watching the logs in /var/log/flocker. The flocker-container-agent.log file keeps spamming out these messages every second:

{"timestamp": 1457307418.0748, "task_uuid": "7bab5f41-8e15-4d3f-a486-8193b4d51549", "message_type": "flocker:node:docker:image_from_cache", "image": "sha256:0cf2bf2398fa0e976120b1a5520ca3847a3c499b1e0391e473c69ad7ca4eb099", "task_level": [2]}
{"timestamp": 1457307418.07537, "task_uuid": "7bab5f41-8e15-4d3f-a486-8193b4d51549", "action_type": "flocker:node:docker:inspect_image", "action_status": "succeeded", "task_level": [3]}
{"task_uuid": "776a4cf0-84ba-47fc-ab87-f1cb7227270b", "task_level": [2, 2, 2], "action_type": "flocker:agent:discovery", "timestamp": 1457307418.076333, "state": "NodeLocalState(node_state=NodeState(applications=ApplicationPSet([]), paths=None, manifestations=None, hostname=u'127.0.0.1', uuid=UUID('e95c8c59-95d9-471c-9240-f10302acc2c4'), devices=None))", "action_status": "succeeded"}
{"timestamp": 1457307418.092091, "task_uuid": "776a4cf0-84ba-47fc-ab87-f1cb7227270b", "message_type": "flocker:agent:converge:actions", "task_level": [2, 3], "calculated_actions": "NoOp(sleep=datetime.timedelta(0, 1))"}
{"timestamp": 1457307418.092744, "task_uuid": "776a4cf0-84ba-47fc-ab87-f1cb7227270b", "action_type": "flocker:change:noop", "action_status": "started", "task_level": [2, 4, 1]}
{"timestamp": 1457307418.093303, "task_uuid": "776a4cf0-84ba-47fc-ab87-f1cb7227270b", "action_type": "flocker:change:noop", "action_status": "succeeded", "task_level": [2, 4, 2]}
{"fsm_identifier": "<flocker.node._loop.ConvergenceLoop object at 0x7fc69356ee50>", "fsm_input": "<ConvergenceLoopInputs=SLEEP>", "timestamp": 1457307418.094136, "fsm_rich_input": "<_Sleep>", "action_status": "started", "task_uuid": "776a4cf0-84ba-47fc-ab87-f1cb7227270b", "action_type": "fsm:transition", "fsm_state": "<ConvergenceLoopStates=CONVERGING>", "task_level": [2, 5, 1]}
{"fsm_next_state": "<ConvergenceLoopStates=SLEEPING>", "task_level": [2, 5, 2], "action_type": "fsm:transition", "timestamp": 1457307418.094852, "fsm_output": ["<ConvergenceLoopOutputs=SCHEDULE_WAKEUP>"], "task_uuid": "776a4cf0-84ba-47fc-ab87-f1cb7227270b", "action_status": "succeeded"}
{"timestamp": 1457307418.095458, "task_uuid": "776a4cf0-84ba-47fc-ab87-f1cb7227270b", "action_type": "flocker:agent:converge", "action_status": "succeeded", "task_level": [2, 6]}

I have no idea if this is normal. If it is, is there a way I can set the verbosity to reduce the noise?
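(For anyone who just needs to read these logs, the eliot JSON lines can be thinned out client-side with jq; a sketch that drops the no-op convergence chatter, assuming the log path above:

$ tail -f /var/log/flocker/flocker-container-agent.log | jq -c 'select(.action_type != "flocker:change:noop")'

Note this only filters what you see; it doesn't reduce what gets written to disk.)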

@wallnerryan
Contributor

Hey @adaptivdesign, yes, this is actually normal for now. The caveat is that we are getting rid of the need for the container agent, so things like this are lower priority. In future releases (and I think in master right now) you have the option to turn the container agent off.

@sgnn7
Contributor Author

sgnn7 commented Mar 7, 2016

@adaptivdesign / @nikolaiderzhak I've created overrides for the noisy services, and they work pretty well: stdout no longer gets logged, while stderr output is still printed when there are problems:

  • /etc/systemd/system/flocker-<name_of_chatty_service>.service.d/squelch.conf:

[Service]
StandardOutput=null
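After dropping in an override like this, systemd needs to re-read its unit files and the service needs a restart, e.g.:

$ sudo systemctl daemon-reload
$ sudo systemctl restart flocker-dataset-agent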

@rmmr

rmmr commented Mar 7, 2016

@wallnerryan I totally understand. I'll keep an eye out for new releases. If it's nothing to worry about, I'll go with @sgnn7's solution for now 👍

Thank you both for your quick response!
