commit acc6a23
Showing 16 changed files with 3,039 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
\#* | ||
*pyc |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,126 @@ | ||
# mobylette | ||
|
||
mobylette is a Python script that parses system log files | ||
and reports which environment modules were loaded and how many times | ||
each one was loaded. mobylette has a low memory footprint and can parse | ||
several files simultaneously using several CPUs, which is ideal for cluster | ||
environments. | ||
|
||
## tldr; | ||
|
||
``` | ||
git clone https://github.com/scibian/mobylette | ||
cd mobylette | ||
source install.sh | ||
mobylette | ||
``` | ||
## Prerequisites | ||
|
||
mobylette was designed to work with Lmod for Scibian 8 (since version 6.6-0.2sci8u2) | ||
and Lmod for Scibian 9 (since version 6.6-0.2sci9u4). | ||
|
||
## Configuration | ||
|
||
mobylette uses a configuration file (`mobylette.conf`) that tells it where to look for logs, which sub-directories to search, and which files to read. | ||
|
||
- `log_path` - mobylette will look for log files in this directory and all sub-directories. | ||
- `patterns` - A Python-compatible regular expression matched against file names. For example, the regex `filename[\S]gz` restricts the search to compressed files. | ||
|
||
mobylette was designed to run in a cluster environment, where there can be hundreds or thousands of sub-directories containing logs. mobylette assumes these sub-directories are named following a `prefix` + `nodes` scheme. | ||
|
||
- `prefix` - Prefix for the nodes, usually the first letters of the cluster name. | ||
- `nodes` - The types of nodes to search; a cluster usually has several. Accepts multiple values separated by commas. | ||
|
||
If both fields are left empty, mobylette searches for files under the directory given by `log_path` and all its sub-directories. | ||
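
The selection rules above can be sketched as follows. This is only an illustration, not mobylette's own code: the helper names `build_nodes_tuple` and `select_files` and the sample directory names are hypothetical. It assumes directory names are compared with `str.startswith` against `prefix` + each `nodes` value, that `patterns` is applied to file names with `re.search`, and that an empty `nodes` falls back to the bare prefix.

```python
import re

def build_nodes_tuple(prefix, nodes):
    """Combine `prefix` with each comma-separated value of `nodes`."""
    names = [n.strip() for n in nodes.split(',') if n.strip()]
    # Assumption: with no node types given, fall back to the bare prefix;
    # an empty prefix then matches every directory name.
    return tuple(prefix + n for n in names) or (prefix,)

def select_files(paths, pattern, nodes_tuple):
    """Keep (directory, filename) pairs whose directory name starts with
    one of nodes_tuple and whose file name matches the regex."""
    file_patt = re.compile(pattern)
    return [(d, f) for d, f in paths
            if d.startswith(nodes_tuple) and file_patt.search(f)]

nodes_tuple = build_nodes_tuple('ntr', 'compute,graph')

# Hypothetical directory/file pairs, as a directory walk might yield them.
paths = [('ntrcompute001', 'clusterlogs.gz'),
         ('ntrgraph01', 'clusterlogs'),
         ('ntradmin1', 'clusterlogs.gz')]

selected = select_files(paths, r'clusterlogs[\S]gz', nodes_tuple)
print(selected)
```

Only the compressed file under a `compute` node directory is kept: the second entry fails the regex and the third fails the directory-name test.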
|
||
mobylette looks for the configuration file first in `.mobylette` under the user's home directory and then in the `/etc` directory. | ||
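
A lookup of this kind could be sketched as below. The function names `find_config` and `load_cluster_section` are hypothetical, and the exact file location inside `/etc` is an assumption; only the search order and the `[CLUSTER]` section name come from the files in this commit.

```python
import os
try:
    import configparser                     # Python 3
except ImportError:
    import ConfigParser as configparser     # Python 2

def find_config(candidates):
    """Return the first existing path in candidates, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None

def load_cluster_section(path):
    """Read the [CLUSTER] section of the file into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return dict(parser.items('CLUSTER'))

# Search order: the user's home directory first, then /etc.
candidates = [
    os.path.join(os.path.expanduser('~'), '.mobylette', 'mobylette.conf'),
    os.path.join('/etc', 'mobylette.conf'),  # assumed location under /etc
]

conf_path = find_config(candidates)
if conf_path is not None:
    print(load_cluster_section(conf_path))
```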
|
||
## Installation | ||
|
||
After cloning the repository, source the `install.sh` script. This updates the `PATH` and `PYTHONPATH` variables with the location of the mobylette files. | ||
|
||
If this is the first time you are using mobylette, the script also creates a sample configuration file under `.mobylette` in the user's home directory. This out-of-the-box configuration sets up mobylette to run against the two log files provided in the sample directory. | ||
|
||
## Getting started | ||
|
||
The `mobylette` command creates a CSV file (in the directory where the command was issued) and a horizontal bar chart of the data in that file. | ||
|
||
With the out-of-the-box configuration (i.e., mobylette.conf points to the sample directory) the following data can be seen: | ||
|
||
``` | ||
abc/123,1 | ||
gcc/9.2,3 | ||
ifaible/2020,1 | ||
ifaible/3030,2 | ||
ifort/2020,1 | ||
ifort/3030,2 | ||
python/2.7,2 | ||
python/3.6,1 | ||
``` | ||
|
||
 | ||
|
||
|
||
## Use cases | ||
|
||
$ mobylette | ||
|
||
The standard behaviour is to count the number of modules found in distinct jobs. | ||
|
||
$ mobylette -group cat | ||
|
||
The `-group cat` option tells mobylette to group the results by module category, that is, the name of the module or the string that precedes its version. | ||
|
||
$ mobylette -group path | ||
|
||
When grouping by the module's path, only the first directory of the path is considered. | ||
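
The two grouping keys can be illustrated with a short sketch. The helpers `category` and `path_group` are hypothetical and not part of mobylette; the sketch assumes a module is written `name/version` and that "first directory" means the first non-empty component of the path. The job identifiers are made up.

```python
def category(module):
    """'gcc/9.2' -> 'gcc': the part that precedes the version."""
    return module.split('/', 1)[0]

def path_group(module_path):
    """First directory of a module path (leading slashes ignored)."""
    parts = [p for p in module_path.split('/') if p]
    return parts[0] if parts else ''

# Hypothetical (module, job) pairs found in the logs.
pairs = [('python/2.7', 'job1'), ('python/3.6', 'job1'), ('gcc/9.2', 'job2')]

# Grouping before de-duplicating means a job that loaded two versions
# of the same module counts once for that category.
grouped = {(category(m), j) for m, j in pairs}
print(sorted(grouped))
```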
|
||
$ mobylette -uniq users | ||
|
||
Besides counting modules across different jobs, mobylette can also count the modules loaded by different users. The report file created with this option also lists the users that loaded each module. | ||
|
||
$ mobylette -uniq users -group cat | ||
|
||
As before, modules can be grouped by category when counting unique users. | ||
|
||
$ mobylette -uniq users -group path | ||
|
||
Grouping by path is also available when counting unique users. | ||
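
Counting unique users per module amounts to de-duplicating (module, user) pairs and then grouping them by module. The sketch below mirrors that idea under stated assumptions: the function name `users_per_module` and the user names are hypothetical, and the row layout (module, count, then the users) is inferred from the report description above.

```python
def users_per_module(pairs):
    """Map each module to the sorted list of distinct users that loaded it."""
    bodies = {}
    # set() drops duplicates, e.g. the same pair seen in two log files.
    for module, user in set(pairs):
        bodies.setdefault(module, []).append(user)
    return {module: sorted(users) for module, users in bodies.items()}

# Hypothetical pairs extracted from several log files.
pairs = [('gcc/9.2', 'alice'), ('gcc/9.2', 'alice'),
         ('gcc/9.2', 'bob'), ('python/3.6', 'alice')]

users = users_per_module(pairs)

# One report row per module: name, number of distinct users, then the users.
rows = [[m, len(users[m])] + users[m] for m in sorted(users)]
print(rows)
```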
|
||
## Filter options | ||
|
||
-start yyyymmdd | ||
|
||
When the `-start` parameter is specified, mobylette only retains system log entries dated AFTER the given value. | ||
The time can optionally be appended immediately after the date (no space in between) in hhmmss format. | ||
|
||
-end yyyymmdd | ||
|
||
When the `-end` parameter is specified, mobylette only retains system log entries dated BEFORE the given value. | ||
The time can optionally be appended immediately after the date (no space in between) in hhmmss format. | ||
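
The two date filters can be sketched as below. The helper names `parse_bound` and `in_window` are hypothetical, and treating both bounds as exclusive is an assumption based on the words AFTER and BEFORE; mobylette's own comparison may differ.

```python
from datetime import datetime

def parse_bound(value):
    """Parse 'yyyymmdd' or 'yyyymmddhhmmss' into a datetime."""
    fmt = '%Y%m%d%H%M%S' if len(value) == 14 else '%Y%m%d'
    return datetime.strptime(value, fmt)

def in_window(stamp, start=None, end=None):
    """True if stamp is strictly after start and strictly before end.
    A bound left as None is not checked."""
    if start is not None and stamp <= parse_bound(start):
        return False
    if end is not None and stamp >= parse_bound(end):
        return False
    return True

# A log entry dated 1 February 2020, checked against both bounds.
print(in_window(datetime(2020, 2, 1), start='20200131', end='20200301120000'))
```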
|
||
-module <module1> <module2> | ||
|
||
A list of modules can be given to mobylette. mobylette then checks whether each module found in the logs belongs to this list before updating the counts. | ||
|
||
## Other options | ||
|
||
-cpus n | ||
|
||
By default, mobylette uses up to 75% of the available cores. This behaviour can be changed by explicitly giving the number of cores to use with the `-cpus` parameter. Needless to say, don't run mobylette with more cores than you have. | ||
|
||
-verbose | ||
|
||
Can be useful, who knows? | ||
|
||
-chart-color <hex-value, no pound sign> | ||
|
||
Fashion is always changing. Try to keep up! | ||
|
||
## Authors | ||
|
||
EDF CCN-HPC | ||
|
||
## License | ||
|
||
mobylette is distributed under the terms of the GPL v3 licence. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,210 @@ | ||
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- | ||
############################################################################## | ||
# # | ||
# This file is part of the mobylette parsing tool. # | ||
# Copyright (C) 2019 EDF SA # | ||
# # | ||
# mobylette is free software: you can redistribute it and/or modify # | ||
# it under the terms of the GNU General Public License as published by # | ||
# the Free Software Foundation, either version 3 of the License, or # | ||
# (at your option) any later version. # | ||
# # | ||
# mobylette is distributed in the hope that it will be useful, # | ||
# but WITHOUT ANY WARRANTY; without even the implied warranty of # | ||
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the # | ||
# GNU General Public License for more details. # | ||
# # | ||
# You should have received a copy of the GNU General Public License # | ||
# along with mobylette. If not, see <http://www.gnu.org/licenses/>. # | ||
# # | ||
############################################################################## | ||
|
||
import re # compile | ||
import os # walk, path | ||
import sys # stdout | ||
import csv # writer | ||
import time # sleep | ||
import multiprocessing as mp # Process, Pool, cpu_count | ||
from collections import Counter | ||
|
||
from mobylette.args import ParseArgs | ||
from mobylette.config import Config | ||
from mobylette.chart import Chart | ||
import mobylette.reader as reader | ||
|
||
def create_file_list(pattern, log_path, nodes_tuple): | ||
    ''' Returns the list of files whose name matches pattern, | ||
        stored under log_path in directories whose name starts | ||
        with one of the strings in nodes_tuple. | ||
    ''' | ||
    files_list = [] | ||
    file_patt = re.compile(pattern) | ||
    for root, dirs, files in os.walk(log_path, topdown=False): | ||
        # str.startswith accepts a tuple of candidate prefixes | ||
        if os.path.basename(root).startswith(nodes_tuple): | ||
            for name in files: | ||
                if file_patt.search(name) is not None: | ||
                    files_list.append(os.path.join(root, name)) | ||
    return files_list | ||
|
||
def backspace(n): | ||
    ''' Moves cursor n steps backwards ''' | ||
    sys.stdout.write((b'\x08' * n).decode()) | ||
|
||
def erase_forward(n): | ||
    ''' Prints white space n times ''' | ||
    sys.stdout.write(' ' * n) | ||
    sys.stdout.flush() | ||
|
||
def point_sleep(n): | ||
    ''' Prints a period and sleeps | ||
        for 1 second (repeat n times) ''' | ||
    for i in range(n): | ||
        sys.stdout.write('.') | ||
        sys.stdout.flush() | ||
        time.sleep(1) | ||
|
||
def erase_sleep(n): | ||
    ''' Prints white space and sleeps | ||
        for 1 second (repeat n times) ''' | ||
    for i in range(n): | ||
        sys.stdout.write(' ') | ||
        sys.stdout.flush() | ||
        time.sleep(1) | ||
|
||
def print_usr_msg(msg): | ||
    ''' Prints waiting message on screen ''' | ||
    sys.stdout.write(msg) ; sys.stdout.flush() | ||
    length = 5 | ||
    while True: | ||
        point_sleep(length) | ||
        backspace(length) | ||
        erase_forward(length) | ||
        backspace(length) | ||
|
||
if __name__ == "__main__": | ||
|
||
    # All arguments are stored in a dictionary which | ||
    # is used whenever needed across the program. | ||
    parse = ParseArgs() | ||
    options = {} | ||
    options['internals'] = {'uniq' : parse.args.uniq, 'group' : parse.args.group, 'verbose' : parse.args.verbose, 'cpus' : parse.args.cpus} | ||
    options['filter'] = {'start_date' : parse.args.start_date, 'end_date' : parse.args.end_date, 'module' : parse.args.module} | ||
    options['chart'] = {'max_charts' : parse.args.max_charts, 'max_rows' : parse.args.max_rows, 'chart_color' : parse.args.chart_color} | ||
|
||
    # Reads configuration file mobylette.conf | ||
    conf = Config(options['internals']['verbose']) | ||
    options['config'] = {'log_path' : conf.log_path, 'cluster_name' : conf.cluster_name, 'cluster_prefix' : conf.cluster_prefix, | ||
                         'search_pattern' : conf.search_pattern, 'nodes_tuple' : conf.nodes_tuple, 'hostname' : conf.hostname} | ||
|
||
    if options['internals']['verbose']: | ||
        print(options) | ||
|
||
    # Creates list of all files to be parsed, meaning all files that | ||
    # match the pattern in mobylette.conf and that are located also | ||
    # in the path given by `log_path` in this same file. | ||
    file_list = create_file_list(options['config']['search_pattern'], | ||
                                 options['config']['log_path'], | ||
                                 options['config']['nodes_tuple']) | ||
|
||
    print('total files found: {}'.format(len(file_list))) | ||
|
||
    if options['internals']['verbose']: | ||
        print(file_list) | ||
|
||
    if options['internals']['cpus'] is None: | ||
        # Use up to 75% of the cores, but never fewer than one: | ||
        # Pool(processes=0) raises ValueError on single-core machines. | ||
        cpu = max(1, int(mp.cpu_count() * 0.75)) | ||
    else: | ||
        cpu = int(options['internals']['cpus']) | ||
|
||
    print("using {} cpus".format(cpu)) | ||
|
||
    start_date_arr = [options['filter']['start_date']] * len(file_list) | ||
    end_date_arr = [options['filter']['end_date']] * len(file_list) | ||
    modules_arr = [options['filter']['module']] * len(file_list) | ||
|
||
    # Main logic for mobylette | ||
    if 'users' in options['internals']['uniq']: | ||
        worker = reader.read_users | ||
        chart_label = 'Nombre modules (utilisateurs uniques)' | ||
        if 'cat' == options['internals']['group']: | ||
            worker = reader.read_users_cat | ||
            chart_label = 'Nombre de modules par categorie (utilisateurs uniques)' | ||
        if 'path' == options['internals']['group']: | ||
            worker = reader.read_users_path | ||
            chart_label = 'Nombre de modules par chemin (utilisateurs uniques)' | ||
    elif 'jobs' in options['internals']['uniq']: | ||
        worker = reader.read_jobs | ||
        chart_label = 'Nombre de modules (jobs uniques)' | ||
        if 'cat' == options['internals']['group']: | ||
            worker = reader.read_jobs_cat | ||
            chart_label = 'Nombre de modules par categorie (jobs uniques)' | ||
        if 'path' == options['internals']['group']: | ||
            worker = reader.read_jobs_path | ||
            chart_label = 'Nombre de modules par chemin (jobs uniques)' | ||
|
||
    # The following two commented lines are useful when debugging | ||
    # the read functions, where a serial version is needed. | ||
    #params = (file_list[0], start_date_arr[0], end_date_arr[0], modules_arr[0]) | ||
    #result = worker(params) | ||
|
||
    # Sends message to user while parsing files | ||
    usr_msg = mp.Process(target=print_usr_msg, args=('parsing files ', )) | ||
    usr_msg.start() | ||
|
||
    # For this technique to work, the parameter array must be passed in this order: | ||
    # file_list, start_array, end_array, module_list | ||
    p = mp.Pool(processes=cpu) | ||
|
||
    # For performance reasons, a dedicated function is called | ||
    # according to the parameters we have. | ||
    result = p.map(worker, zip(file_list, start_date_arr, end_date_arr, modules_arr)) | ||
|
||
    usr_msg.terminate() | ||
    usr_msg.join() | ||
    sys.stdout.write('\n') ; sys.stdout.flush() | ||
|
||
    print('checking results') | ||
|
||
    # matter is the set of ALL DISTINCT tuples found. Because two files can reference the | ||
    # same tuple, a set is used so that each tuple is counted only once. | ||
    # Building it also flattens the `result` variable. | ||
    matter = set(item for res in result | ||
                 for item in list(res.values())[0]) | ||
    bodies = {} | ||
    for first, second in matter: | ||
        bodies.setdefault(first, []).append(second) | ||
|
||
    if len(matter) == 0: | ||
        print('no modules found matching criteria') | ||
        sys.exit() | ||
|
||
    sorted_keys = sorted(bodies.keys()) | ||
|
||
    print('plotting graphs') | ||
|
||
    # Create chart (ordered bodies) | ||
    chart = Chart(sorted_keys, [len(bodies[mod]) for mod in sorted_keys], | ||
                  options, x_label = chart_label, title = 'Chargement de modules', chart_color = options['chart']['chart_color']) | ||
|
||
    print('writing report') | ||
|
||
    # Create CSV report | ||
    with open('report.csv', 'wb') as csv_file: # Just use 'w' mode in 3.x | ||
        writer = csv.writer(csv_file) | ||
        for key in sorted_keys: | ||
            # One row per module: name, count, then (optionally) the users. | ||
            row = [key] | ||
            row.append(len(bodies[key])) | ||
            if 'users' in options['internals']['uniq'] and options['internals']['group'] is None: | ||
                for val in sorted(bodies[key]): | ||
                    row.append(val) | ||
            writer.writerow(row) | ||
|
||
    print('execution finished ok') |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
[CLUSTER] | ||
# Where the logs are stored. | ||
# Mobylette will search inside this folder and subfolders. | ||
log_path=/var/log | ||
# Name of the cluster | ||
name=nestor | ||
# In which nodes should mobylette look for information. | ||
# This can save a lot of time by skipping giant log files. | ||
# Can have more than one value separated by comma. | ||
nodes=compute,graph | ||
# Mobylette will look for data in files whose name matches this pattern (a regular expression). | ||
patterns=clusterlogs | ||
# mobylette assumes node names are obtained by | ||
# concatenating `prefix` and `nodes`. | ||
# In this case, it will look for nodes named ntrcompute and ntrgraph. | ||
prefix=ntr |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
#!/bin/bash | ||
|
||
MOBYLETTE_HOME=`pwd` | ||
export PATH=$MOBYLETTE_HOME/bin:$PATH | ||
export PYTHONPATH=$MOBYLETTE_HOME:$PYTHONPATH | ||
|
||
# Creates mobylette.conf | ||
CONF_FILE=mobylette.conf | ||
CONF_PATH=`eval echo ~$USER`/.mobylette | ||
|
||
if [ -d "$CONF_PATH" ]; then | ||
    read -p "$CONF_PATH already exists. Overwrite (y/n)? " ANSWER | ||
|
||
    # Keep the existing configuration unless the user explicitly agrees. | ||
    if [ "$ANSWER" != "y" ]; then | ||
        echo "mobylette ready" | ||
        return | ||
    fi | ||
|
||
    rm -r "$CONF_PATH" | ||
fi | ||
|
||
# First use, or overwrite confirmed: write the sample configuration. | ||
mkdir -p "$CONF_PATH" | ||
|
||
echo "[CLUSTER]" > "$CONF_PATH/$CONF_FILE" | ||
echo "log_path=$MOBYLETTE_HOME/" >> "$CONF_PATH/$CONF_FILE" | ||
echo "name=phoenix" >> "$CONF_PATH/$CONF_FILE" | ||
echo "nodes=" >> "$CONF_PATH/$CONF_FILE" | ||
echo "patterns=dataset" >> "$CONF_PATH/$CONF_FILE" | ||
echo "prefix=sample" >> "$CONF_PATH/$CONF_FILE" | ||
|
||
echo "mobylette ready" |