refactor: streamline cluster analysis process (#15)
### Overview:
Consolidate the Python methods and files into a CLI that makes it easier
to discover how to analyze either local logs or remote logs from
Artifactory. Also switch to a smaller model that is less memory- and
disk-intensive and produces results faster; results from the two models
felt comparable. Rewrite the installation and running instructions to
support four ways of configuring the environment: DevContainer, plain
Docker, system Python, or `uv`.

Optimize various parts of the process, in part around the intermediate
pickle files that were being written and read between steps.
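
As a rough illustration of the consolidated CLI described above, a single entry point with one subcommand per log source might look like the sketch below. The program name `cluster-logs`, the subcommand names `local` and `remote`, and the `--path` and `--url` options are all hypothetical; the commit message does not specify the actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical interface: every name and option here is illustrative only.
    parser = argparse.ArgumentParser(prog="cluster-logs")
    sub = parser.add_subparsers(dest="source", required=True)

    # Subcommand for logs that are already on disk.
    local = sub.add_parser("local", help="analyze logs already on disk")
    local.add_argument("--path", default="repos", help="directory containing build logs")

    # Subcommand for logs that must first be fetched from Artifactory.
    remote = sub.add_parser("remote", help="analyze logs fetched from Artifactory")
    remote.add_argument("--url", required=True, help="Artifactory base URL")

    return parser


def describe(argv: list[str]) -> str:
    # Parse the arguments and report which analysis path would run.
    args = build_parser().parse_args(argv)
    if args.source == "local":
        return f"analyze local logs under {args.path}"
    return f"download logs from {args.url}, then analyze"
```

Calling `describe(["local"])` selects the local path with its default directory, while `describe(["remote", "--url", ...])` selects the Artifactory path. Keeping both sources behind one parser is what makes the options discoverable through `--help`.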
sjungling authored Jan 6, 2025
1 parent b1c7cb1 commit a95cbee
Showing 21 changed files with 1,738 additions and 850 deletions.
8 changes: 0 additions & 8 deletions .devcontainer/Dockerfile

This file was deleted.

8 changes: 4 additions & 4 deletions .devcontainer/devcontainer.json
@@ -1,6 +1,6 @@
 {
-"build": {
-"dockerfile": "Dockerfile",
-"context": ".."
-}
+"build": {
+"dockerfile": "../Dockerfile",
+"context": ".."
+}
 }
1 change: 1 addition & 0 deletions .gitignore
@@ -173,3 +173,4 @@ cython_debug/
 **/.idea/

 .python-version
+output
14 changes: 14 additions & 0 deletions Dockerfile
@@ -0,0 +1,14 @@
FROM mcr.microsoft.com/devcontainers/python:3.12 AS base
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app

COPY scripts/download_model.py .
RUN uv run download_model.py

COPY pyproject.toml .
COPY uv.lock .
RUN uv pip install --system -r pyproject.toml

COPY scripts/* .
COPY templates templates
180 changes: 44 additions & 136 deletions LOCAL_INSTALL.md
@@ -1,152 +1,60 @@
## Prerequisites
# Local Installation Guide

Please ensure you have the following tools installed on your system:
This project requires Python 3.12.x and manages dependencies using [`pyproject.toml`](pyproject.toml). Below are several methods to set up your environment.
## Using System Python with `venv`

* Python 3.12 (newer versions will not work)
* [pyenv](https://github.com/pyenv/pyenv) is recommended
* [Homebrew installation](https://formulae.brew.sh/formula/[email protected])
* [Official Python installer](https://www.python.org/downloads/release/python-31014/)
* [Git](https://git-scm.com/downloads)
1. Ensure Python 3.12.x is installed:
```bash
python --version
```

## Instructions
2. Create a virtual environment:
```bash
python -m venv .venv
```

### Step 1: Clone this project
3. Activate the virtual environment:
```bash
# On Unix/macOS
source .venv/bin/activate
# On Windows
.venv\Scripts\activate
```

```bash
git clone git@github.com:moderneinc/moderne-cluster-build-logs.git
cd moderne-cluster-build-logs
```

### Step 2: Set up the Python virtual environment
4. Install dependencies:
```bash
pip install -r pyproject.toml
```

You will be creating a server and running clustering inside of a Python virtual environment. To create said environment, please run:
## Using [uv](https://docs.astral.sh/uv/) (Fast Python Package Installer)

```bash
## Pick the one that applies to your system
python -m venv venv
1. [Install uv](https://docs.astral.sh/uv/getting-started/installation/#installation-methods):
```bash
pip install uv
```

## For Mac or Linux users
source venv/bin/activate
2. Create a virtual environment and install dependencies:
```bash
uv venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
uv pip install -r pyproject.toml
```

## For Windows users
source venv\Scripts\activate
```
## Using DevContainer

After running the `source` command, you should see that you're in a Python virtual environment.
This project includes DevContainer configuration for VS Code:

### Step 3: Install dependencies
1. Install the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) in VS Code
2. Open the project in VS Code
3. Click "Reopen in Container" when prompted or use the command palette (F1) and select "Dev Containers: Reopen in Container"

Double-check that `pip` is pointing to the correct Python version by running the following command. The output should include `python 3.12.X`. If it doesn't, try using `pip3` instead.
The container will automatically set up Python 3.12 and install all dependencies.

```bash
pip --version
```
## Using Docker

Once you've confirmed which `pip` works for you, install dependencies by running the following command:
Build and run the project using Docker:

```bash
pip install -r requirements.txt
```

### Step 4: Download the model

Download the model which will assist with tokenizing and clustering of the build log data.

```bash
python scripts/download_model.py
```

### Step 5: Gather build logs

In order to perform an analysis on your build logs, all of them need to be copied over to this directory. Please ensure that they are copied over inside a folder named `repos`.

You will also need a `build.xlsx` file that provides details about the builds such as where the build logs are located, what the outcome was, and what the path to the project is. This file should exist inside of `repos` directory.

Here is an example of what your directory should be look like if everything was set up correctly:

```
moderne-cluster-build-logs
├───scripts
│ (4 files)
└───repos
│ builds.xlsx
├───Org1
│ ├───Repo1
│ │ └───main
│ │ build.log
│ │
│ └───Repo2
│ └───master
│ build.log
├───Org2
│ ├───Repo1
│ │ └───main
│ │ build.log
│ │
│ └───Repo2
│ └───master
│ build.log
└───Org3
└───Repo1
└───main
build.log
```


#### Using Moderne mass ingest logs

If you want to use Moderne's mass ingest logs to run this scripts, you may use the following script to download a sample.

```bash
python scripts/00.download_ingest_samples.py
```

You will be prompted which of the slices you want to download. Enter the corresponding number and press `Enter`.


### Step 6: Run the scripts

_Please note these scripts won't function correctly if you haven't copied over the logs and `build.xlsx` file into the `repos` directory and put that inside of the `Clustering` directory you're working out of._

**Run the following scripts in order**:

1. Load the logs and extract relevant error messages and stacktraces from the logs:

```bash
python scripts/01.load_logs_and_extract.py
```

_Please note that the loaded logs only include those generated from failures to build Maven or Gradle projects. You can open `build.xlsx` if there are less logs loaded than expected_

2. Embed logs and cluster:

```bash
python scripts/02.embed_summaries_and_cluster.py
```

### Step 7: Analyze the results

Once you've run the two scripts, you should find that a `clusters_scatter.html` and `clusters_logs.html` file was produced. Open those in the browser of your choice to get detailed information about your build failures.

Success! You can now freely exit out of the Python virtual environment by typing `exit` into the command line.

## Example results

Below you can see some examples of the HTML files produced by following the above steps.

### clusters_scatter.html

This file is a visual representation of the build failure clusters. Clusters that contain the most number of dots should generally be prioritized over ones that contain fewer dots. You can hover over the dots to see part of the build logs.

![expected_clusters](images/expected_clusters.gif)

#### cluster_logs.html

To see the full extracted logs, you may use this file. This file shows all the logs that belong to a cluster.

![logs](images/expected_logs.png)
# Build the image
docker build -t moderne-cluster-build-logs:latest .
```