
Commit ba65670

265 inputoutput base model graphs for pv2puml (#266)
* 265 - update events to provide functionality to load and save models
* 265 - update pv to puml to provide functionality to use an input event model and save an event model
* 265 - update types so that global options concerning inputting models and saving models can be used
* 265 - update data ingestion to handle an input events model dict
* 265 - update otel_to_puml to provide functionality to add input models and save output models
* 265 - update main to handle input args for input models list and output models flag
* 265 - make sure events aren't altered during loop detection process
* 265 - update main args for input model and output model
* 265 - documentation of newly added arguments
* 265 - fix mypy, linting and flaky test issue
* 265 - add link in README
* 265 - add best practice section to provide users with information about considerations when creating PUMLs
* 265 - update to use conda-incubator
1 parent e74e48a commit ba65670

14 files changed with 1144 additions and 46 deletions.

.github/workflows/test-workflow.yml

+2 -3

```diff
@@ -26,11 +26,10 @@ jobs:
         uses: actions/checkout@v4
 
       - name: Setup-conda
-        uses: s-weigand/setup-conda@v1
+        uses: conda-incubator/setup-miniconda@v3
         with:
-          update-conda: true
           python-version: 3.11.9
-          conda-channels: conda-forge
+          channels: conda-forge
 
       - name: Install pre-requisites
         run: |
```

README.md

+13 -1

````diff
@@ -24,7 +24,8 @@ This project converts [OpenTelemetry (OTel)](https://opentelemetry.io/) data int
 </ol>
 </li>
 <li><a href="#tel2puml-cli-documentation">TEL2PUML CLI Documentation</a></li>
-<li><a href="#technical-implementation">Technical Implementtation</a></li>
+<li><a href="#technical-implementation">Technical Implementation</a></li>
+<li><a href="#best-practices">Best Practices</a></li>
 <li><a href="#documentation">Documentation</a></li>
 <li><a href="#dependencies">Dependencies</a></li>
 <li><a href="#contributing">Contributing</a></li>
@@ -321,6 +322,8 @@ python -m tel2puml -o OUTPUT_DIR otel2puml -c CONFIG_FILE [options]
 - `-ni`, `--no-ingest`: Do not load data into the data holder.
 - `-ug`, `--unique-graphs`: Find unique graphs within the data holder.
 - `-d`, `--debug`: Enable debug mode to view full error stack trace.
+- `-im`, `--input-puml-models`: Path to an input puml model. Can be used multiple times, once for each model the user wishes to input. [Usage](/docs/user/PUML_models.md)
+- `-om`, `--output-puml-models`: Flag to indicate whether to output the puml models. If this flag is not set, the puml models will not be output. [Usage](/docs/user/PUML_models.md)
 
 **Example:**
 
@@ -371,6 +374,8 @@ python -m tel2puml pv2puml [options] [FILE_PATHS...]
 - `-group-by-job`: Group events by job ID. Can only be used if there are single events in each input file otherwise an error will be raised.
 - `-mc`, `--mapping-config`: Path to the mapping configuration file. [Usage](docs/user/mapping_config.md)
 - `-d`, `--debug`: Enable debug mode to view full error stack trace.
+- `-im`, `--input-puml-models`: Path to an input puml model. Can be used multiple times, once for each model the user wishes to input. [Usage](/docs/user/PUML_models.md)
+- `-om`, `--output-puml-models`: Flag to indicate whether to output the puml models. If this flag is not set, the puml models will not be output. [Usage](/docs/user/PUML_models.md)
 
 **Notes:**
 
@@ -417,6 +422,13 @@ python -m tel2puml [subcommand] -h
 python -m tel2puml pv2puml -h
 ```
 
+## Best Practices
+
+- The tool was initially designed to work with OpenTelemetry data, but can in principle be used for any data that is a call tree with parent links and timestamps. Ensure that your data is in the correct format before running the tool.
+- When ingesting data using the `otel2pv` or `otel2puml` subcommands, the data is stored in a SQL database provided by the user. Some SQL functions are used to map data from the root node to the leaf nodes. For large volumes of data, this can take a long time and may require a large amount of memory. The best way to avoid this is to split the data into smaller chunks (making sure that the min and max timestamps of the chunks do not overlap) and ingest it in smaller parts, either:
+  - For the `otel2puml` command, by using the `-om` and `-im` args to save the models and then load them when processing the next chunk.
+  - For `otel2pv`, by using the `-ug` flag to find the unique event sequences and then `-se` to save them. Once this is done, all of the event sequences can be loaded and used with the `pv2puml` command to find the PUMLs.
+
 ## Technical Implementation
 
 To gain a better technical understanding of the project it's recommended to read the [technical implementation overview](docs/Technical_implementation_overview.md).
````
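The chunking advice in the Best Practices section can be illustrated with a short Python sketch. It is not part of `tel2puml`: the helper names `chunk_by_timestamp` and `ranges_are_disjoint`, and the assumption that each record exposes a single integer timestamp through a `key` function, are hypothetical. Records are sorted by timestamp and a new chunk is only started where the timestamp changes, so records sharing a timestamp never straddle two chunks and consecutive chunk ranges cannot overlap.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def chunk_by_timestamp(
    records: Sequence[T], chunk_size: int, key: Callable[[T], int]
) -> list[list[T]]:
    """Hypothetical helper: split records into time-ordered, non-overlapping chunks.

    chunk_size is a soft minimum; a chunk may grow past it so that records
    with the same timestamp are kept together.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    ordered = sorted(records, key=key)
    chunks: list[list[T]] = []
    current: list[T] = []
    for record in ordered:
        # Start a new chunk only at a timestamp boundary, so the max timestamp
        # of one chunk is strictly less than the min timestamp of the next.
        if len(current) >= chunk_size and key(record) != key(current[-1]):
            chunks.append(current)
            current = []
        current.append(record)
    if current:
        chunks.append(current)
    return chunks


def ranges_are_disjoint(ranges: Sequence[tuple[int, int]]) -> bool:
    """Hypothetical check: True if no two (min, max) ranges overlap or touch."""
    running_max = None
    for low, high in sorted(ranges):
        # Sorted by min, so any overlap shows up against the largest max so far.
        if running_max is not None and low <= running_max:
            return False
        running_max = high if running_max is None else max(running_max, high)
    return True
```

Each resulting chunk could then be ingested on its own; how the chunks are exported and passed to `otel2puml` or `otel2pv` depends on the data source and ingest configuration, which this sketch does not cover.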

docs/user/PUML_models.md

+189

```diff
@@ -0,0 +1,189 @@
+# PUML Models
+## Introduction
+Internally, `tel2puml` uses a model that it builds from the event sequences passed into it. This model is built from the following components:
+
+
+- **EventSet**: A unique "set" of `eventType` values and their counts, e.g., `A: 3, B: 2, C: 1`. It represents one possible combination of specific events and the number of times each occurs.
+- **Event**: Represents a single event with:
+  - `eventType`: e.g., `A`, `B`, `C`, etc.
+  - `outgoingEventSets`: A list of `EventSet` objects representing the possible next events that can occur after this event (forward connections).
+  - `incomingEventSets`: A list of `EventSet` objects representing the possible previous events that can occur before this event (backward connections).
+- **EventModel**: The main model built from the event sequences. It contains a list of `Event` objects representing the events in the sequence, and a list of `EventSet` objects representing the possible event sets that can occur in the sequence.
+
+Functionality is present in `tel2puml` to input and save these "models" as JSON files. Saved models can be loaded back into the tool and updated with new event sequences. This provides a way to build up a model over time as more event sequences are collected, or when large volumes of data need to be processed.
+
+## Model JSON file format
+The model is saved in a specific JSON file format. Note that a "Dummy" starting event with `eventType` `|||START|||` is always added to represent the start of the sequence. This ensures that all events have a starting point, so that logic can be derived even when the actual data has multiple starting points.
+
+The JSON file format has the following JSON schema:
+
+```json
+{
+  "$schema": "http://json-schema.org/schema#",
+  "type": "object",
+  "properties": {
+    "job_name": {
+      "type": "string"
+    },
+    "events": {
+      "type": "array",
+      "items": {
+        "type": "object",
+        "properties": {
+          "eventType": {
+            "type": "string"
+          },
+          "outgoingEventSets": {
+            "type": "array",
+            "items": {
+              "type": "array",
+              "items": {
+                "type": "object",
+                "properties": {
+                  "eventType": {
+                    "type": "string"
+                  },
+                  "count": {
+                    "type": "integer"
+                  }
+                },
+                "required": [
+                  "count",
+                  "eventType"
+                ]
+              }
+            }
+          },
+          "incomingEventSets": {
+            "type": "array",
+            "items": {
+              "type": "array",
+              "items": {
+                "type": "object",
+                "properties": {
+                  "eventType": {
+                    "type": "string"
+                  },
+                  "count": {
+                    "type": "integer"
+                  }
+                },
+                "required": [
+                  "count",
+                  "eventType"
+                ]
+              }
+            }
+          }
+        },
+        "required": [
+          "eventType",
+          "incomingEventSets",
+          "outgoingEventSets"
+        ]
+      }
+    }
+  },
+  "required": [
+    "events",
+    "job_name"
+  ]
+}
+```
+
+An example derived using the data in the [end-to-end walkthrough](/docs/e2e_walkthrough/example_pv_event_sequence_files) is shown below:
+
+```json
+{
+  "job_name": "Users_Service",
+  "events": [
+    {
+      "eventType": "return_response_200",
+      "outgoingEventSets": [],
+      "incomingEventSets": [
+        [{"eventType": "fetch_user_data_200", "count": 1}],
+        [{"eventType": "request_data_200", "count": 1}]
+      ]
+    },
+    {
+      "eventType": "fetch_user_data_200",
+      "outgoingEventSets": [
+        [{"eventType": "return_response_200", "count": 1}]
+      ],
+      "incomingEventSets": [
+        [{"eventType": "authorization_check_200", "count": 1}]
+      ]
+    },
+    {
+      "eventType": "authorization_check_200",
+      "outgoingEventSets": [
+        [{"eventType": "fetch_user_data_200", "count": 1}],
+        [{"eventType": "authorization_check_200", "count": 1}]
+      ],
+      "incomingEventSets": [
+        [{"eventType": "user_login_200", "count": 1}],
+        [{"eventType": "authorization_check_200", "count": 1}]
+      ]
+    },
+    {
+      "eventType": "user_login_200",
+      "outgoingEventSets": [
+        [{"eventType": "authenticate_credentials_200", "count": 1}],
+        [{"eventType": "authorization_check_200", "count": 1}]
+      ],
+      "incomingEventSets": [
+        [{"eventType": "|||START|||", "count": 1}]
+      ]
+    },
+    {
+      "eventType": "|||START|||",
+      "outgoingEventSets": [
+        [{"eventType": "user_login_200", "count": 1}]
+      ],
+      "incomingEventSets": []
+    },
+    {
+      "eventType": "request_data_200",
+      "outgoingEventSets": [
+        [{"eventType": "return_response_200", "count": 1}]
+      ],
+      "incomingEventSets": [
+        [{"eventType": "session_expired_200", "count": 1}],
+        [{"eventType": "validate_session_200", "count": 1}]
+      ]
+    },
+    {
+      "eventType": "validate_session_200",
+      "outgoingEventSets": [
+        [{"eventType": "authenticate_credentials_200", "count": 1}],
+        [{"eventType": "request_data_200", "count": 1}]
+      ],
+      "incomingEventSets": [
+        [{"eventType": "authenticate_credentials_200", "count": 1}]
+      ]
+    },
+    {
+      "eventType": "authenticate_credentials_200",
+      "outgoingEventSets": [
+        [{"eventType": "session_expired_200", "count": 1}],
+        [{"eventType": "validate_session_200", "count": 1}]
+      ],
+      "incomingEventSets": [
+        [{"eventType": "user_login_200", "count": 1}],
+        [{"eventType": "session_expired_200", "count": 1}],
+        [{"eventType": "validate_session_200", "count": 1}]
+      ]
+    },
+    {
+      "eventType": "session_expired_200",
+      "outgoingEventSets": [
+        [{"eventType": "authenticate_credentials_200", "count": 1}],
+        [{"eventType": "request_data_200", "count": 1}]
+      ],
+      "incomingEventSets": [
+        [{"eventType": "authenticate_credentials_200", "count": 1}]
+      ]
+    }
+  ]
+}
+```
```
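To make the documented JSON format concrete, the following sketch parses a model file into plain Python objects and checks the keys that the schema marks as required. The classes `EventSetSketch`, `EventSketch` and `EventModelSketch` and the function `load_model` are hypothetical stand-ins that mirror the `EventSet`, `Event` and `EventModel` descriptions; they are not `tel2puml`'s own classes, and the sketch does not enforce the full schema (value types, for example, are not checked).

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field

# Hypothetical stand-ins mirroring the documented model components;
# these are NOT tel2puml's own EventSet / Event / EventModel classes.


@dataclass(frozen=True)
class EventSetSketch:
    # Sorted (eventType, count) pairs, so equal sets compare equal.
    counts: tuple[tuple[str, int], ...]


@dataclass
class EventSketch:
    event_type: str
    outgoing: list[EventSetSketch] = field(default_factory=list)
    incoming: list[EventSetSketch] = field(default_factory=list)


@dataclass
class EventModelSketch:
    job_name: str
    events: dict[str, EventSketch]


def _require(obj: dict, keys: tuple[str, ...], where: str) -> None:
    """Raise ValueError if any key the schema marks as required is missing."""
    missing = [key for key in keys if key not in obj]
    if missing:
        raise ValueError(f"{where}: missing required keys {missing}")


def _parse_event_set(raw: list[dict]) -> EventSetSketch:
    # Each entry in an event set needs an eventType and a count.
    for entry in raw:
        _require(entry, ("eventType", "count"), "event set entry")
    return EventSetSketch(
        tuple(sorted((entry["eventType"], entry["count"]) for entry in raw))
    )


def load_model(text: str) -> EventModelSketch:
    """Parse a model JSON document into the sketch classes above."""
    data = json.loads(text)
    _require(data, ("job_name", "events"), "model")
    events: dict[str, EventSketch] = {}
    for raw_event in data["events"]:
        _require(
            raw_event,
            ("eventType", "incomingEventSets", "outgoingEventSets"),
            "event",
        )
        event = EventSketch(
            event_type=raw_event["eventType"],
            outgoing=[_parse_event_set(s) for s in raw_event["outgoingEventSets"]],
            incoming=[_parse_event_set(s) for s in raw_event["incomingEventSets"]],
        )
        events[event.event_type] = event
    return EventModelSketch(job_name=data["job_name"], events=events)
```

Applied to the Users_Service example above, `model.events["user_login_200"].incoming` would hold a single event set that references `|||START|||` with a count of 1.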

tel2puml/__main__.py

+50 -10

```diff
@@ -32,8 +32,11 @@
 from tel2puml.tel2puml_types import (
     OtelToPVArgs,
     PvToPumlArgs,
+    GlobalArgs,
     OtelPVOptions,
     PVPumlOptions,
+    GlobalOptions,
+    Options,
     PVEventMappingConfig,
 )
 from tel2puml.otel_to_pv.config import IngestDataConfig
@@ -63,6 +66,29 @@
     dest="output_file_directory",
 )
 
+# load and save model options, shared between otel2puml and pv2puml
+input_output_parent_parser = argparse.ArgumentParser(add_help=False)
+
+input_output_parent_parser.add_argument(
+    "-im",
+    "--input-puml-models",
+    metavar="input_puml_models",
+    help="Input puml models file paths. Can be used multiple times",
+    action="append",
+    dest="input_puml_models",
+    required=False,
+    default=[],
+)
+
+input_output_parent_parser.add_argument(
+    "-om",
+    "--output-puml-models",
+    help="Flag to indicate whether to save puml models",
+    action="store_true",
+    dest="output_puml_models",
+)
+
 # mapping config, shared between otel2pv and pv2puml
 mapping_config_parent_parser = argparse.ArgumentParser(add_help=False)
 
@@ -120,7 +146,9 @@
 otel_to_puml_parser = subparsers.add_parser(
     "otel2puml",
     help="otel to puml help",
-    parents=[otel_parent_parser, debug_parent_parser],
+    parents=[
+        otel_parent_parser, debug_parent_parser, input_output_parent_parser
+    ],
 )
 
 otel_to_pv_parser = subparsers.add_parser(
@@ -146,7 +174,10 @@
 pv_to_puml_parser = subparsers.add_parser(
     "pv2puml",
     help="pv to puml help",
-    parents=[mapping_config_parent_parser, debug_parent_parser],
+    parents=[
+        mapping_config_parent_parser, debug_parent_parser,
+        input_output_parent_parser
+    ],
 )
 pv_input_paths = pv_to_puml_parser.add_argument_group(
     "Input paths",
@@ -236,14 +267,13 @@ def find_files(directory: str) -> list[str]:
 def generate_component_options(
     command: Literal["otel2puml", "otel2pv", "pv2puml"],
     args_dict: dict[str, Any],
-) -> tuple[OtelPVOptions | None, PVPumlOptions | None]:
+) -> Options:
     """Generate puml options objects based on CLI arguments.
 
     :param command: The CLI command
     :type command: `Literal`["otel2puml", "otel2pv", "pv2puml"]
-    :return: A tuple containing component options
-    :rtype: `tuple`[:class:`OtelPVOptions` | `None`, :class:`PVPumlOptions`
-    | `None`]
+    :return: An `Options` object containing the component and global options
+    :rtype: :class:`Options`
     """
 
     otel_pv_options, pv_puml_options = None, None
@@ -281,8 +311,17 @@ def generate_component_options(
             **generate_config(str(pv_to_puml_obj.mapping_config_file))
         )
         pv_puml_options["mapping_config"] = mapping_config
+    global_obj = GlobalArgs(**args_dict)
+    global_options = GlobalOptions(
+        input_puml_models=global_obj.input_puml_models,
+        output_puml_models=global_obj.output_puml_models,
+    )
 
-    return otel_pv_options, pv_puml_options
+    return Options(
+        otel_pv_options=otel_pv_options,
+        pv_puml_options=pv_puml_options,
+        global_options=global_options,
+    )
 
 
 def handle_exception(
@@ -338,12 +377,13 @@ def main_handler(
     command: Literal["otel2puml", "otel2pv", "pv2puml"] = args_dict[
         "command"
     ]
-    otel_pv_options, pv_puml_options = generate_component_options(
+    options = generate_component_options(
         command, args_dict
     )
     otel_to_puml(
-        otel_pv_options,
-        pv_puml_options,
+        options.otel_pv_options,
+        options.pv_puml_options,
+        options.global_options,
         args_dict["output_file_directory"],
         args_dict["command"],
     )
```
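The shared `-im`/`-om` options use argparse's parent-parser pattern: the options are declared once on a parser created with `add_help=False` and passed through `parents=` to each subcommand. The sketch below reproduces that pattern in isolation with the same flag definitions as the diff, then packs the parsed values into a named container in the spirit of the new `GlobalOptions`/`Options` return type. `GlobalOptionsSketch`, `build_parser` and `parse_global_options` are hypothetical names, and the real subcommands take further arguments (such as `-c` for `otel2puml`) that are omitted here.

```python
import argparse
from typing import NamedTuple


class GlobalOptionsSketch(NamedTuple):
    """Hypothetical stand-in for the GlobalOptions built in __main__.py."""

    input_puml_models: list[str]
    output_puml_models: bool


def build_parser() -> argparse.ArgumentParser:
    # Parent parser holding the shared options; add_help=False avoids a
    # duplicate -h/--help when it is reused via parents=.
    io_parent = argparse.ArgumentParser(add_help=False)
    io_parent.add_argument(
        "-im",
        "--input-puml-models",
        metavar="input_puml_models",
        help="Input puml models file paths. Can be used multiple times",
        action="append",  # each -im appends one path to the list
        dest="input_puml_models",
        default=[],
    )
    io_parent.add_argument(
        "-om",
        "--output-puml-models",
        help="Flag to indicate whether to save puml models",
        action="store_true",  # False unless the flag is given
        dest="output_puml_models",
    )

    parser = argparse.ArgumentParser(prog="tel2puml-sketch")
    subparsers = parser.add_subparsers(dest="command")
    # Both subcommands inherit the same -im/-om definitions.
    subparsers.add_parser("otel2puml", parents=[io_parent])
    subparsers.add_parser("pv2puml", parents=[io_parent])
    return parser


def parse_global_options(argv: list[str]) -> GlobalOptionsSketch:
    args = build_parser().parse_args(argv)
    return GlobalOptionsSketch(
        input_puml_models=list(args.input_puml_models),
        output_puml_models=args.output_puml_models,
    )
```

For example, `parse_global_options(["pv2puml", "-im", "a.json", "-im", "b.json", "-om"])` returns `GlobalOptionsSketch(input_puml_models=['a.json', 'b.json'], output_puml_models=True)`. Returning a named container instead of a bare tuple, as the diff does with `Options`, lets callers such as `main_handler` read fields by name (`options.global_options`), so adding a field does not require changing positional unpacking at each call site.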
