refactor: streamline cluster analysis process (#15)
### Overview: Consolidate the Python methods and files into a CLI, making it easier to discover how to analyze either local logs or remote logs from Artifactory. Also switch to a smaller model that needs less memory and disk and produces results faster; results from the two models felt comparable. Rewrote the installation and run instructions to support four setup options: DevContainer, plain Docker, system Python, or `uv`. Optimized various parts, in part around the handling of the intermediate pickle files that were being written and read between steps.
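To illustrate the shape of a CLI that covers both log sources, here is a minimal sketch using `argparse` subcommands. It is not the code from this commit: the program name `cluster-logs`, the subcommand names `local` and `artifactory`, and the `--url` and `--output-dir` options are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one subcommand per log source (all names hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="cluster-logs",  # hypothetical program name
        description="Cluster build-failure logs from a local directory or from Artifactory.",
    )
    subcommands = parser.add_subparsers(dest="source", required=True)

    # Hypothetical subcommand: analyze logs that are already on disk.
    local = subcommands.add_parser("local", help="analyze logs already on disk")
    local.add_argument("log_dir", help="directory containing the build logs")

    # Hypothetical subcommand: fetch logs from Artifactory before analyzing.
    remote = subcommands.add_parser("artifactory", help="download logs, then analyze them")
    remote.add_argument("--url", required=True, help="Artifactory base URL")

    # Options shared by both subcommands; the default directory is an assumption.
    for sub in (local, remote):
        sub.add_argument("--output-dir", default="output", help="where results are written")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["local", "repos"])
print(args.source, args.log_dir, args.output_dir)
```

Putting each log source behind its own subcommand means `--help` lists every supported workflow in one place, which is the discoverability benefit the overview describes.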
Showing 21 changed files with 1,738 additions and 850 deletions.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
{ | ||
"build": { | ||
"dockerfile": "Dockerfile", | ||
"context": ".." | ||
} | ||
"build": { | ||
"dockerfile": "../Dockerfile", | ||
"context": ".." | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -173,3 +173,4 @@ cython_debug/ | |
**/.idea/ | ||
|
||
.python-version | ||
output |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
FROM mcr.microsoft.com/devcontainers/python:3.12 AS base | ||
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/ | ||
|
||
WORKDIR /app | ||
|
||
COPY scripts/download_model.py . | ||
RUN uv run download_model.py | ||
|
||
COPY pyproject.toml . | ||
COPY uv.lock . | ||
RUN uv pip install --system -r pyproject.toml | ||
|
||
COPY scripts/* . | ||
COPY templates templates |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,152 +1,60 @@ | ||
## Prerequisites | ||
# Local Installation Guide | ||
|
||
Please ensure you have the following tools installed on your system: | ||
This project requires Python 3.12.x and manages dependencies using [`pyproject.toml`](pyproject.toml). Below are several methods to set up your environment. | ||
## Using System Python with `venv` | ||
|
||
* Python 3.12 (newer versions will not work) | ||
* [pyenv](https://github.com/pyenv/pyenv) is recommended | ||
* [Homebrew installation](https://formulae.brew.sh/formula/[email protected]) | ||
* [Official Python installer](https://www.python.org/downloads/release/python-31014/) | ||
* [Git](https://git-scm.com/downloads) | ||
1. Ensure Python 3.12.x is installed: | ||
```bash | ||
python --version | ||
``` | ||
|
||
## Instructions | ||
2. Create a virtual environment: | ||
```bash | ||
python -m venv .venv | ||
``` | ||
|
||
### Step 1: Clone this project | ||
3. Activate the virtual environment: | ||
```bash | ||
# On Unix/macOS | ||
source .venv/bin/activate | ||
# On Windows | ||
.venv\Scripts\activate | ||
``` | ||
|
||
```bash | ||
git clone git@github.com:moderneinc/moderne-cluster-build-logs.git | ||
cd moderne-cluster-build-logs | ||
``` | ||
|
||
### Step 2: Set up the Python virtual environment | ||
4. Install dependencies: | ||
```bash | ||
pip install -r pyproject.toml | ||
``` | ||
|
||
You will be creating a server and running clustering inside of a Python virtual environment. To create said environment, please run: | ||
## Using [uv](https://docs.astral.sh/uv/) (Fast Python Package Installer) | ||
|
||
```bash | ||
## Pick the one that applies to your system | ||
python -m venv venv | ||
1. [Install uv](https://docs.astral.sh/uv/getting-started/installation/#installation-methods): | ||
```bash | ||
pip install uv | ||
``` | ||
|
||
## For Mac or Linux users | ||
source venv/bin/activate | ||
2. Create a virtual environment and install dependencies: | ||
```bash | ||
uv venv | ||
source .venv/bin/activate # or .venv\Scripts\activate on Windows | ||
uv pip install -r pyproject.toml | ||
``` | ||
|
||
## For Windows users | ||
source venv\Scripts\activate | ||
``` | ||
## Using DevContainer | ||
|
||
After running the `source` command, you should see that you're in a Python virtual environment. | ||
This project includes DevContainer configuration for VS Code: | ||
|
||
### Step 3: Install dependencies | ||
1. Install the [Dev Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) in VS Code | ||
2. Open the project in VS Code | ||
3. Click "Reopen in Container" when prompted or use the command palette (F1) and select "Dev Containers: Reopen in Container" | ||
|
||
Double-check that `pip` is pointing to the correct Python version by running the following command. The output should include `python 3.12.X`. If it doesn't, try using `pip3` instead. | ||
The container will automatically set up Python 3.12 and install all dependencies. | ||
|
||
```bash | ||
pip --version | ||
``` | ||
## Using Docker | ||
|
||
Once you've confirmed which `pip` works for you, install dependencies by running the following command: | ||
Build and run the project using Docker: | ||
|
||
```bash | ||
pip install -r requirements.txt | ||
``` | ||
|
||
### Step 4: Download the model | ||
|
||
Download the model which will assist with tokenizing and clustering of the build log data. | ||
|
||
```bash | ||
python scripts/download_model.py | ||
``` | ||
|
||
### Step 5: Gather build logs | ||
|
||
In order to perform an analysis on your build logs, all of them need to be copied over to this directory. Please ensure that they are copied over inside a folder named `repos`. | ||
|
||
You will also need a `build.xlsx` file that provides details about the builds such as where the build logs are located, what the outcome was, and what the path to the project is. This file should exist inside of `repos` directory. | ||
|
||
Here is an example of what your directory should be look like if everything was set up correctly: | ||
|
||
``` | ||
moderne-cluster-build-logs | ||
│ | ||
├───scripts | ||
│ (4 files) | ||
│ | ||
└───repos | ||
│ builds.xlsx | ||
│ | ||
├───Org1 | ||
│ ├───Repo1 | ||
│ │ └───main | ||
│ │ build.log | ||
│ │ | ||
│ └───Repo2 | ||
│ └───master | ||
│ build.log | ||
│ | ||
├───Org2 | ||
│ ├───Repo1 | ||
│ │ └───main | ||
│ │ build.log | ||
│ │ | ||
│ └───Repo2 | ||
│ └───master | ||
│ build.log | ||
│ | ||
└───Org3 | ||
└───Repo1 | ||
└───main | ||
build.log | ||
``` | ||
|
||
|
||
#### Using Moderne mass ingest logs | ||
|
||
If you want to use Moderne's mass ingest logs to run this scripts, you may use the following script to download a sample. | ||
|
||
```bash | ||
python scripts/00.download_ingest_samples.py | ||
``` | ||
|
||
You will be prompted which of the slices you want to download. Enter the corresponding number and press `Enter`. | ||
|
||
|
||
### Step 6: Run the scripts | ||
|
||
_Please note these scripts won't function correctly if you haven't copied over the logs and `build.xlsx` file into the `repos` directory and put that inside of the `Clustering` directory you're working out of._ | ||
|
||
**Run the following scripts in order**: | ||
|
||
1. Load the logs and extract relevant error messages and stacktraces from the logs: | ||
|
||
```bash | ||
python scripts/01.load_logs_and_extract.py | ||
``` | ||
|
||
_Please note that the loaded logs only include those generated from failures to build Maven or Gradle projects. You can open `build.xlsx` if there are less logs loaded than expected_ | ||
|
||
2. Embed logs and cluster: | ||
|
||
```bash | ||
python scripts/02.embed_summaries_and_cluster.py | ||
``` | ||
|
||
### Step 7: Analyze the results | ||
|
||
Once you've run the two scripts, you should find that a `clusters_scatter.html` and `clusters_logs.html` file was produced. Open those in the browser of your choice to get detailed information about your build failures. | ||
|
||
Success! You can now freely exit out of the Python virtual environment by typing `exit` into the command line. | ||
|
||
## Example results | ||
|
||
Below you can see some examples of the HTML files produced by following the above steps. | ||
|
||
### clusters_scatter.html | ||
|
||
This file is a visual representation of the build failure clusters. Clusters that contain the most number of dots should generally be prioritized over ones that contain fewer dots. You can hover over the dots to see part of the build logs. | ||
|
||
[Image: expected_clusters] | ||
|
||
#### cluster_logs.html | ||
|
||
To see the full extracted logs, you may use this file. This file shows all the logs that belong to a cluster. | ||
|
||
[Image: logs] | ||
# Build the image | ||
docker build -t moderne-cluster-build-logs:latest . | ||
``` |