Initial commit
erothe committed Mar 4, 2020
0 parents commit acc6a23
Showing 16 changed files with 3,039 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
\#*
*pyc
674 changes: 674 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

126 changes: 126 additions & 0 deletions README.md
@@ -0,0 +1,126 @@
# mobylette

mobylette is a Python script that parses system log files
and reports which environment modules were loaded and how many times
each one was loaded. mobylette has a low memory footprint and can parse
several files in parallel on several CPUs, which makes it well suited to cluster
environments.

## TL;DR

```
git clone https://github.com/scibian/mobylette
cd mobylette
source install.sh
mobylette
```
## Prerequisites

mobylette was designed to work with Lmod for Scibian 8 (since version 6.6-0.2sci8u2)
and Lmod for Scibian 9 (since version 6.6-0.2sci9u4).

## Configuration

mobylette uses a configuration file (`mobylette.conf`) which tells it where to look for logs, which sub-directories to search and which files to read.

- `log_path` - mobylette will look for log files in this directory and all its sub-directories.
- `patterns` - This field accepts Python-compatible regular expressions. For example, mobylette can restrict the search to compressed files with the regex `filename[\S]gz`.

mobylette was designed to run in a cluster environment, where logs can be spread over hundreds or thousands of sub-directories. mobylette assumes these sub-directories follow a naming scheme of the form `prefix` + `nodes`.

- `prefix` - Prefix for the nodes, usually the first letters of the cluster name.
- `nodes` - A cluster usually has different types of nodes. This field accepts multiple values separated by commas.

If both fields are left empty, mobylette will search for files under the directory given by `log_path` and all its sub-directories.
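
The selection rule above can be sketched as follows. This is an illustration only: `build_nodes_tuple` and `is_selected` are hypothetical helper names, and the way two empty fields reduce to a single empty prefix (which matches every directory) is an assumption, not something taken from the mobylette sources.

```python
import re

def build_nodes_tuple(prefix, nodes):
    # Hypothetical helper: "ntr" + "compute,graph" -> ("ntrcompute", "ntrgraph").
    # "".split(",") yields [""], so two empty fields give ("",), which
    # matches every directory name (assumed behaviour).
    return tuple(prefix + node.strip() for node in nodes.split(","))

def is_selected(dir_name, file_name, nodes_tuple, pattern):
    # str.startswith accepts a tuple of candidate prefixes;
    # re.search looks for the pattern anywhere in the file name.
    return dir_name.startswith(nodes_tuple) and re.search(pattern, file_name) is not None

nodes_tuple = build_nodes_tuple("ntr", "compute,graph")
print(nodes_tuple)
print(is_selected("ntrcompute042", "clusterlogs.gz", nodes_tuple, "clusterlogs"))
print(is_selected("ntrlogin01", "clusterlogs.gz", nodes_tuple, "clusterlogs"))
```

The directory names `ntrcompute042` and `ntrlogin01` are made up for the example; only the first one starts with a configured `prefix` + `nodes` combination.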

mobylette looks for the configuration file first in `.mobylette` under the user's home directory and then in the `/etc` directory.
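
A minimal sketch of this lookup order, written for Python 3's `configparser`. The helper names `find_config` and `load_cluster_section` are hypothetical, and the exact file location inside `/etc` is an assumption; the `[CLUSTER]` section name comes from the sample configuration file.

```python
import configparser
import os

def find_config(candidates):
    # Return the first existing path, or None when no file is found.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

def load_cluster_section(path):
    # interpolation=None keeps "%" characters in regex patterns literal.
    parser = configparser.ConfigParser(interpolation=None)
    with open(path) as handle:
        parser.read_file(handle)
    return dict(parser["CLUSTER"])

candidates = [
    os.path.join(os.path.expanduser("~"), ".mobylette", "mobylette.conf"),
    os.path.join("/etc", "mobylette.conf"),  # assumed location inside /etc
]
config_path = find_config(candidates)
if config_path is None:
    print("no configuration file found")
else:
    print(load_cluster_section(config_path))
```

Checking the home directory first lets a user override a site-wide configuration without touching `/etc`.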

## Installation

After cloning the repository, source the `install.sh` script. This updates the `PATH` and `PYTHONPATH` variables with the location of the mobylette files.

If this is the first time you use mobylette, the script also creates a dummy configuration file under `.mobylette` in the user's home directory. This out-of-the-box configuration sets up mobylette to run against the two log files provided in the sample directory.

## Getting started

The `mobylette` command creates a CSV file (in the directory where the command was issued) and a horizontal bar chart of the data found in that file.

With the out-of-the-box configuration (i.e., `mobylette.conf` points to the sample directory), the following data is produced:

```
abc/123,1
gcc/9.2,3
ifaible/2020,1
ifaible/3030,2
ifort/2020,1
ifort/3030,2
python/2.7,2
python/3.6,1
```

![Number of module load commands](sample/0_abc123_python3.6.svg)
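
The report can be post-processed with a few lines of Python. The sketch below is not part of mobylette: `load_counts` is a hypothetical helper, and the sample rows are copied inline so the example runs on its own. In practice the text would come from the `report.csv` file written by the script.

```python
import csv
import io

# Sample rows copied from the report above; in practice the
# text would come from open("report.csv").
SAMPLE = """abc/123,1
gcc/9.2,3
ifaible/2020,1
ifaible/3030,2
ifort/2020,1
ifort/3030,2
python/2.7,2
python/3.6,1
"""

def load_counts(handle):
    # Each row is "<module>,<count>"; extra columns (such as the user
    # names written with -uniq users) are ignored.
    return {row[0]: int(row[1]) for row in csv.reader(handle) if row}

counts = load_counts(io.StringIO(SAMPLE))
# Most loaded modules first, ties broken alphabetically.
for module, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print("{:<14} {}".format(module, count))
```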


## Use cases

    $ mobylette

The standard behaviour is to count the number of modules found in distinct jobs.

    $ mobylette -group cat

The `-group cat` option tells mobylette to group the results by module category, that is, by the name of the module (the string that precedes the module version).
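
As an illustration of what "category" means here, the sketch below strips the version from module names written as `<name>/<version>`, the form used in the sample report. `module_category` is a hypothetical helper, and the example counts plain occurrences rather than distinct jobs.

```python
from collections import Counter

def module_category(module):
    # Assumes "<name>/<version>": everything before the last "/" is the
    # category. A module without a version is its own category.
    return module.rsplit("/", 1)[0]

loads = ["gcc/9.2", "python/2.7", "python/3.6", "ifort/2020", "ifort/3030"]
per_category = Counter(module_category(m) for m in loads)
print(sorted(per_category.items()))
```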

    $ mobylette -group path

When grouping by module path, only the first directory of the path is considered.

    $ mobylette -uniq users

Besides counting modules across distinct jobs, mobylette can also count the number of distinct users who loaded each module. The report file created with this option also lists the users who loaded the module.
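
The idea of counting distinct users can be sketched with a dictionary of sets. This is an illustration, not the code mobylette runs: `users_per_module` is a hypothetical helper and the user names are invented.

```python
def users_per_module(events):
    # events: iterable of (module, user) pairs. Building a set first
    # collapses repeated loads of the same module by the same user.
    users = {}
    for module, user in set(events):
        users.setdefault(module, set()).add(user)
    return users

events = [
    ("gcc/9.2", "alice"),
    ("gcc/9.2", "alice"),  # repeated load, counted once
    ("gcc/9.2", "bob"),
    ("python/3.6", "alice"),
]
for module, names in sorted(users_per_module(events).items()):
    print(module, len(names), ",".join(sorted(names)))
```

Each printed line mirrors a report row: the module, the number of distinct users, then the users themselves.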

    $ mobylette -uniq users -group cat

As before, modules can be grouped by category when counting unique users.

    $ mobylette -uniq users -group path

Grouping by path is also available when counting unique users.

## Filter options

    -start yyyymmdd

When the `-start` parameter is specified, mobylette only retains system log entries dated AFTER the given value.
Optionally, a time in hhmmss format can be appended directly to the date (no space in between).

    -end yyyymmdd

When the `-end` parameter is specified, mobylette only retains system log entries dated BEFORE the given value.
Optionally, a time in hhmmss format can be appended directly to the date (no space in between).
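
A sketch of how such bounds could be parsed and applied with the standard `datetime` module. `parse_bound` and `in_window` are hypothetical helpers, and treating both bounds as strict (entries exactly on a bound are dropped) is an assumption based on the AFTER/BEFORE wording.

```python
from datetime import datetime

def parse_bound(text):
    # Accepts "yyyymmdd" or "yyyymmddhhmmss" (time glued to the date, no space).
    if len(text) == 8:
        return datetime.strptime(text, "%Y%m%d")
    if len(text) == 14:
        return datetime.strptime(text, "%Y%m%d%H%M%S")
    raise ValueError("expected yyyymmdd or yyyymmddhhmmss, got {!r}".format(text))

def in_window(stamp, start=None, end=None):
    # Strict comparisons: entries exactly on a bound are dropped (assumption).
    if start is not None and stamp <= start:
        return False
    if end is not None and stamp >= end:
        return False
    return True

start = parse_bound("20200301")
end = parse_bound("20200304120000")
print(in_window(datetime(2020, 3, 2, 8, 30), start, end))
print(in_window(datetime(2020, 3, 4, 12, 0, 1), start, end))
```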

    -module <module1> <module2>

A list of modules can be given to mobylette. In that case, mobylette only updates the counts for modules found in the logs that belong to this list.

## Other options

    -cpus n

By default, mobylette uses up to 75% of the available cores. This behaviour can be changed by explicitly giving the number of cores to use with the `-cpus` parameter. Needless to say, don't run mobylette with more cores than you have.
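
The 75% rule can be sketched as below. `default_workers` and `resolve_workers` are hypothetical helpers; clamping to at least one worker and capping an explicit request at the number of cores are design choices of this sketch, not documented mobylette behaviour.

```python
import math
import multiprocessing as mp

def default_workers(total_cores, share=0.75):
    # floor(1 * 0.75) is 0, and multiprocessing.Pool rejects processes=0,
    # so the result is clamped to at least one worker.
    return max(1, int(math.floor(total_cores * share)))

def resolve_workers(requested, total_cores):
    # An explicit request is capped at the number of available cores.
    if requested is None:
        return default_workers(total_cores)
    return max(1, min(int(requested), total_cores))

print(resolve_workers(None, mp.cpu_count()))
```

The clamp matters on single-core machines, where taking 75% of the cores and rounding down would otherwise leave no worker at all.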

    -verbose

Prints the parsed options and the list of files found, which can be useful when debugging.

    -chart-color <hex-value, no pound sign>

Sets the color of the chart, given as a hex value without the leading `#`. Fashion is always changing. Try to keep up!

## Authors

EDF CCN-HPC

## License

mobylette is distributed under the terms of the GPL v3 licence.
210 changes: 210 additions & 0 deletions bin/mobylette
@@ -0,0 +1,210 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
##############################################################################
# #
# This file is part of the mobylette parsing tool. #
# Copyright (C) 2019 EDF SA #
# #
# mobylette is free software: you can redistribute it and/or modify #
# it under the terms of the GNU General Public License as published by #
# the Free Software Foundation, either version 3 of the License, or #
# (at your option) any later version. #
# #
# mobylette is distributed in the hope that it will be useful, #
# but WITHOUT ANY WARRANTY; without even the implied warranty of #
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the #
# GNU General Public License for more details. #
# #
# You should have received a copy of the GNU General Public License #
# along with mobylette. If not, see <http://www.gnu.org/licenses/>. #
# #
##############################################################################

import re # compile
import os # walk, path
import sys # stdout
import csv # writer
import time # sleep
import multiprocessing as mp # Process, Pool, cpu_count
from collections import Counter

from mobylette.args import ParseArgs
from mobylette.config import Config
from mobylette.chart import Chart
import mobylette.reader as reader

def create_file_list(pattern, log_path, nodes_tuple):
    ''' Returns the list of files matching pattern stored
        under log_path in directories whose name starts
        with one of the prefixes in nodes_tuple.
    '''
    files_list = []
    files_count = 0
    files_size = 0
    file_patt = re.compile(pattern)
    for root, dirs, files in os.walk(log_path, topdown=False):
        if os.path.basename(root).startswith(nodes_tuple):
            for name in files:
                match = file_patt.search(name)
                if match is not None:
                    files_count += 1
                    file_path = os.path.join(root, name)
                    files_size += os.path.getsize(file_path)
                    files_list.append(file_path)
    return files_list

def backspace(n):
    ''' Moves cursor n steps backwards '''
    sys.stdout.write((b'\x08' * n).decode())

def erase_forward(n):
    ''' Prints white space n times '''
    sys.stdout.write(' ' * n)
    sys.stdout.flush()

def point_sleep(n):
    ''' Prints a period and sleeps
        for 1 second (repeat n times) '''
    for i in range(n):
        sys.stdout.write('.')
        sys.stdout.flush()
        time.sleep(1)

def erase_sleep(n):
    ''' Prints white space and sleeps
        for 1 second (repeat n times) '''
    for i in range(n):
        sys.stdout.write(' ')
        sys.stdout.flush()
        time.sleep(1)

def print_usr_msg(msg):
    ''' Prints waiting message on screen '''
    sys.stdout.write(msg) ; sys.stdout.flush()
    length = 5
    while True:
        point_sleep(length)
        backspace(length)
        erase_forward(length)
        backspace(length)

if __name__ == "__main__":

    # All arguments are stored in a dictionary which
    # is used whenever needed across the program.
    parse = ParseArgs()
    options = {}
    options['internals'] = {'uniq' : parse.args.uniq, 'group' : parse.args.group, 'verbose' : parse.args.verbose, 'cpus' : parse.args.cpus}
    options['filter'] = {'start_date' : parse.args.start_date, 'end_date' : parse.args.end_date, 'module' : parse.args.module}
    options['chart'] = {'max_charts' : parse.args.max_charts, 'max_rows' : parse.args.max_rows, 'chart_color' : parse.args.chart_color}

    # Reads configuration file mobylette.conf
    conf = Config(options['internals']['verbose'])
    options['config'] = {'log_path' : conf.log_path, 'cluster_name' : conf.cluster_name, 'cluster_prefix' : conf.cluster_prefix,
                         'search_pattern' : conf.search_pattern, 'nodes_tuple' : conf.nodes_tuple, 'hostname' : conf.hostname}

    if options['internals']['verbose']:
        print(options)

    # Creates list of all files to be parsed, meaning all files that
    # match the pattern in mobylette.conf and that are located
    # in the path given by `log_path` in this same file.
    file_list = create_file_list(options['config']['search_pattern'],
                                 options['config']['log_path'],
                                 options['config']['nodes_tuple'])

    print('total files found: {}'.format(len(file_list)))

    if options['internals']['verbose']:
        print(file_list)

    if options['internals']['cpus'] is None:
        from math import floor
        cpu = max(1, int(floor(mp.cpu_count() * 0.75)))  # at least one worker
    else:
        cpu = int(options['internals']['cpus'])

    print("using {} cpus".format(cpu))

    start_date_arr = [options['filter']['start_date']] * len(file_list)
    end_date_arr = [options['filter']['end_date']] * len(file_list)
    modules_arr = [options['filter']['module']] * len(file_list)

    # Main logic for mobylette
    if 'users' in options['internals']['uniq']:
        worker = reader.read_users
        chart_label = 'Nombre modules (utilisateurs uniques)'
        if 'cat' == options['internals']['group']:
            worker = reader.read_users_cat
            chart_label = 'Nombre de modules par categorie (utilisateurs uniques)'
        if 'path' == options['internals']['group']:
            worker = reader.read_users_path
            chart_label = 'Nombre de modules par chemin (utilisateurs uniques)'
    elif 'jobs' in options['internals']['uniq']:
        worker = reader.read_jobs
        chart_label = 'Nombre de modules (jobs uniques)'
        if 'cat' == options['internals']['group']:
            worker = reader.read_jobs_cat
            chart_label = 'Nombre de modules par categorie (jobs uniques)'
        if 'path' == options['internals']['group']:
            worker = reader.read_jobs_path
            chart_label = 'Nombre de modules par chemin (jobs uniques)'

    # The following two commented lines are useful when debugging
    # the read functions where a serial version is needed.
    #params = (file_list[0], start_date_arr[0], end_date_arr[0], modules_arr[0])
    #result = worker(params)

    # Sends message to user while parsing files
    usr_msg = mp.Process(target=print_usr_msg, args=('parsing files ', ))
    usr_msg.start()

    # For this technique to work, the parameter array must be passed in this order:
    # file_list, start_array, end_array, module_list
    p = mp.Pool(processes=cpu)

    # For performance reasons we'll be calling a dedicated function
    # according to the parameters we have.
    result = p.map(worker, zip(file_list, start_date_arr, end_date_arr, modules_arr))

    usr_msg.terminate()
    usr_msg.join()
    sys.stdout.write('\n') ; sys.stdout.flush()

    print('checking results')

    # matter is the set of ALL DISTINCT tuples found. Because two files can reference the
    # same tuple, matter is built as a set so that each tuple is only counted once.
    # The matter variable is a set of tuples. It also flattens the `result` variable.
    matter = set([result[files].values()[0][item] for files in range(len(result))
                  for item in range(len(result[files].values()[0]))])
    bodies = {}
    for first, second in matter:
        bodies.setdefault(first, []).append(second)

    if len(matter) == 0:
        print('no modules found matching criteria')
        sys.exit()

    sorted_keys = sorted(bodies.keys())

    print('plotting graphs')

    # Create chart (ordered bodies)
    chart = Chart(sorted_keys, [len(bodies[mod]) for mod in sorted_keys],
                  options, x_label = chart_label, title = 'Chargement de modules', chart_color = options['chart']['chart_color'])

    print('writing report')

    # Create CSV report
    with open('report.csv', 'wb') as csv_file: # Just use 'w' mode in 3.x
        writer = csv.writer(csv_file)
        for key in sorted_keys:
            row = [key]
            row.append(len(bodies[key]))
            if 'users' in options['internals']['uniq'] and options['internals']['group'] is None:
                for val in sorted(bodies[key]):
                    row.append(val)
            writer.writerow(row)

    print('execution finished ok')
16 changes: 16 additions & 0 deletions config/mobylette.conf
@@ -0,0 +1,16 @@
[CLUSTER]
# Where the logs are stored.
# Mobylette will search inside this folder and subfolders.
log_path=/var/log
# Name of the cluster
name=nestor
# In which nodes mobylette should look for information.
# This can save a lot of time by skipping giant log files.
# Can have more than one value separated by commas.
nodes=compute,graph
# Mobylette will look for data in files whose names match this pattern.
patterns=clusterlogs
# mobylette assumes node names are obtained by
# concatenation between `prefix` and `nodes`.
# In this case, it will look for nodes named ntrcompute and ntrgraph.
prefix=ntr
34 changes: 34 additions & 0 deletions install.sh
@@ -0,0 +1,34 @@
#!/bin/bash

MOBYLETTE_HOME=`pwd`
export PATH=$MOBYLETTE_HOME/bin:$PATH
export PYTHONPATH=$MOBYLETTE_HOME:$PYTHONPATH

# Creates mobylette.conf
CONF_FILE=mobylette.conf
CONF_PATH=`eval echo ~$USER`/.mobylette

# On a first install there is nothing to overwrite, so the
# configuration is written without asking.
ANSWER=y
if [ -d "$CONF_PATH" ]; then
    read -p "$CONF_PATH already exists. Overwrite (y/n)? " ANSWER
fi

if [ "$ANSWER" = "y" ]; then

    rm -rf "$CONF_PATH"
    mkdir "$CONF_PATH"

    echo "[CLUSTER]" > "$CONF_PATH/$CONF_FILE"
    echo "log_path=$MOBYLETTE_HOME/" >> "$CONF_PATH/$CONF_FILE"
    echo "name=phoenix" >> "$CONF_PATH/$CONF_FILE"
    echo "nodes=" >> "$CONF_PATH/$CONF_FILE"
    echo "patterns=dataset" >> "$CONF_PATH/$CONF_FILE"
    echo "prefix=sample" >> "$CONF_PATH/$CONF_FILE"
fi

echo "mobylette ready"