ready for review #1

Closed
wants to merge 22 commits into from
151 changes: 151 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,151 @@
# DeltaKernel Change Impact Analysis Tool

## Table of Contents

- [Introduction](#introduction)
- [Innovation](#innovation)
- [How to Use](#how-to-use)
- [Intermediate Files Generated](#intermediate-files-generated)
- [Operation Stages of the Tool](#operation-stages-of-the-tool)
  - [I. Compilation File List Generation](#i-compilation-file-list-generation)
  - [II. Git Diff Report Generation](#ii-git-diff-report-generation)
  - [III. Commit Metadata Retrieval](#iii-commit-metadata-retrieval)
  - [IV. Web Script Generation](#iv-web-script-generation)

## Introduction

DeltaKernel Change Impact Analysis Tool generates a visual report detailing changes in both header files and source code between two Linux kernel versions (tags in the Linux kernel repository: old_tag and new_tag). This tool helps developers compare updates between versions.

The diff report covers only the files in the Linux kernel repository that are actually used to build the kernel, which keeps the report focused on compile-time source code.

## Innovation

The idea of generating a web display for Linux kernel version change impact analysis is inspired by [Cregit](https://github.com/cregit/cregit). This tool extends that idea in the following ways:

- Performing a version analysis between two tags to identify updated files, similar to `git diff`, but restricted to files used to compile the kernel rather than the entire Linux source tree.
- Generating not only web source files but also lists of all source files and dependency/header files used in kernel compilation, which can be reused for additional analysis. (More details in [Intermediate Files Generated](#intermediate-files-generated))
- Enabling comparison between two specific tags/releases in the Linux kernel, highlighting all newly added and deleted lines. This provides a clear layout of differences between the tags. While Cregit organizes information by files and embeds the latest commit details in each line/token, it does not support direct comparison between two tags.
- User customization: users can define the URL of the Linux kernel repository and specify the subsystem to analyze. (More details in [How to Use](#how-to-use))
- Kernel configuration: the Linux kernel is configured with `make olddefconfig` in `build_scripts/build_collect_diff`, which updates the configuration using the existing `.config` file and applies default values for new options.

## How to Use

To use this tool on Linux (tested on `Ubuntu 22.04`), follow these steps:

**Clone the repository**:

```bash
git clone <repository_url>
```

**Navigate to the cloned repository**:

```bash
cd <repository_directory>
```

**Execute the tool by specifying the old and new tags**:

```bash
./run_tool <tag1> <tag2> [-c clone_path] [-u repo_link] [-s subsystem]
```

**Example Usage**:

```bash
./run_tool "v5.15" "v5.15.100" -c "linux-clone" -u "https://github.com/torvalds/linux" -s "security"
# The tool generates a web report of the changes from v5.15 to v5.15.100 for the security subsystem.
cd web_source_code # open index.html in a browser to view the result
```

**Command Line Arguments**:

- `<tag1>`: Specifies the old version tag.
- `<tag2>`: Specifies the new version tag.
- `-c <clone_name>`: Optional. Name of the directory into which the Linux source repository is cloned. Default is `linux-clone`. `delta-kernel` should be located in `$K/scripts/change-impact-tools`. Tags are checked out in `linux-clone`, a separate clone, so that `change-impact-tools/` is preserved.
- `-u <repo_link>`: Optional. Provides the URL for the Linux source code repository.
- `-s <subsystem>`: Optional. Specifies the subsystem for analysis (e.g., `-s security`).

**What the Tool Does**:

- Clones the Linux repository from `repo_link`.
- Copies `/delta-kernel/*` into `linux/scripts/change-impact-tools/`.
- Clones the Linux repository into a second repository named `clone_name` (default: `linux-clone`).
- Navigates to `linux-clone` and:
  - Checks out `tag1` and applies `fixdep-patch.file`.
  - Configures and compiles the kernel.
  - Collects the lists of compile-time source files and dependency files.
  - Generates diff reports based on the file lists.
  - Cleans up the working directory in `linux-clone`.
  - Retrieves git metadata for each file in the lists.
- Copies the file lists, git diff reports, and git metadata (`build_data/`) to `/delta-kernel`.

**linux-clone**:

After execution, `linux-clone` will have `tag2` checked out.

If a runtime git conflict is encountered, resolve it with the following steps:

```bash
cd linux-clone # or user-defined clone name
git reset --hard
git checkout master
cd .. # return to delta-kernel
./run_tool # the cloned repository will not be re-cloned
```

**Clean Up (Optional)**:

```bash
rm -r linux
rm -r linux-clone # or whatever you named the cloned directory
rm -r build_data
```

## Intermediate Files Generated

**/build_data:**

- `sourcefile.txt` - List of all built source code files
- `headerfile.txt` - List of all built dependency files
- `git_diff_sourcefile.txt` - Git diff report for source code files
- `git_diff_headerfile.txt` - Git diff report for dependency files
- `tokenize_header.json` - Commit metadata for the git diff of dependency files
- `tokenize_source.json` - Commit metadata for the git diff of source files

## Operation Stages of the Tool

The tool operates through a structured process to generate a comprehensive change impact analysis report. Here's a detailed breakdown of its operation:

### I. Compilation File List Generation

#### Header File

During Linux kernel compilation, `Makefile.build` calls `$K/scripts/basic/fixdep.c` to generate a `.cmd` file containing dependency information for each source file that the kernel compiles.

This tool applies a patch (`fixdep-patch.file`) to `fixdep.c` so that it also collects the dependency files for each source file and outputs a comprehensive list of all source files and their dependencies for the entire kernel compilation. The resulting `dependency_list.txt` is generated after kernel compilation.

#### Source Code

This tool leverages the `$K/scripts/clang-tools/gen_compile_commands` script to generate a `compile_commands.json` file. This file documents all source files involved in the compilation process. The `gen_compile_commands` script traverses each `.cmd` file to aggregate the list of source files.

Then, the tool invokes `parse_json` to parse `compile_commands.json`, generating **a list of source files**.
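
As a rough illustration of this step, the sketch below takes the parsed contents of a `compile_commands.json` (a JSON array whose entries carry a `file` key, as in the Clang compilation database format) and returns repository-relative source paths. The function name `list_source_files`, the sample paths, and the prefix-stripping rule are assumptions made for illustration; the actual `parse_json` script may behave differently.

```python
import json


def list_source_files(entries, clone_marker):
    """Return sorted, de-duplicated source paths relative to the clone directory.

    `entries` is the parsed compile_commands.json list; `clone_marker` is a
    path fragment such as "/linux-clone/" (hypothetical convention).
    """
    relative_paths = set()
    for entry in entries:
        path = entry["file"]
        # Keep only the part after the clone directory, if the marker is present.
        _, found, tail = path.partition(clone_marker)
        relative_paths.add(tail if found else path)
    return sorted(relative_paths)


if __name__ == "__main__":
    # Hand-written sample entries; real files are produced by gen_compile_commands.
    sample = json.loads("""[
      {"directory": "/work/linux-clone", "command": "gcc -c security/commoncap.c",
       "file": "/work/linux-clone/security/commoncap.c"},
      {"directory": "/work/linux-clone", "command": "gcc -c init/main.c",
       "file": "/work/linux-clone/init/main.c"}
    ]""")
    for source in list_source_files(sample, "/linux-clone/"):
        print(source)
```

Using a set before sorting removes duplicate entries and keeps the output order stable between runs.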

### II. Git Diff Report Generation

Using the file lists, the tool generates two separate git diff reports (a dependency diff report and a source diff report) for updates from **old_tag** to **new_tag**.
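
The per-file diff loop can be sketched in Python as follows. This is only an illustration of the idea (filter the file list by subsystem, then run `git diff <old_tag> <new_tag> -- <file>` for each remaining file); the tool itself does this in a shell script, and the helper names here are hypothetical. Unlike the shell script, this sketch does not first check that the file exists in the new tag.

```python
import subprocess


def filter_by_subsystem(paths, subsystem):
    """Keep paths located under the `subsystem` directory (e.g. "security")."""
    prefix = subsystem.rstrip("/") + "/"
    return [p for p in paths if p.startswith(prefix)]


def build_diff_command(old_tag, new_tag, path):
    """Build the `git diff` argument list for one file between two tags."""
    return ["git", "diff", old_tag, new_tag, "--", path]


def diff_report(paths, old_tag, new_tag, subsystem, repo_dir="."):
    """Concatenate non-empty per-file diffs into one report string."""
    sections = []
    for path in filter_by_subsystem(paths, subsystem):
        result = subprocess.run(
            build_diff_command(old_tag, new_tag, path),
            cwd=repo_dir, capture_output=True, text=True, check=False,
        )
        # Skip files whose diff failed or is empty.
        if result.returncode == 0 and result.stdout:
            sections.append(f"Diff for {path}\n{result.stdout}\n")
    return "".join(sections)
```

Passing the command as an argument list rather than a shell string avoids quoting problems with unusual file names.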

### III. Commit Metadata Retrieval

Based on the git diff reports, the tool retrieves commit metadata for each newly added line in the reports.

- **Tokenization**: If multiple commits modify a single line between the two tags, the tool splits the line into smaller parts (tokens) and associates commit information with the relevant tokens. The tokenized results are stored in JSON files.
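
To make the tokenization idea concrete, here is a minimal sketch of one possible way to attribute tokens to commits. It is not the tool's actual algorithm: the input shape (a chronological list of `(commit_id, line_text)` pairs), whitespace tokenization, and alignment with `difflib.SequenceMatcher` are all assumptions chosen for illustration.

```python
import difflib


def attribute_tokens(versions):
    """Map each token of the final line to the commit that introduced it.

    `versions` is a chronological list of (commit_id, line_text) pairs
    (hypothetical input shape). Tokens are whitespace-separated words.
    """
    first_commit, first_text = versions[0]
    tokens = first_text.split()
    owners = [first_commit] * len(tokens)
    for commit_id, text in versions[1:]:
        new_tokens = text.split()
        # By default, every token in the new version belongs to this commit.
        new_owners = [commit_id] * len(new_tokens)
        matcher = difflib.SequenceMatcher(a=tokens, b=new_tokens, autojunk=False)
        for block in matcher.get_matching_blocks():
            # Unchanged tokens keep the commit that originally introduced them.
            for offset in range(block.size):
                new_owners[block.b + offset] = owners[block.a + offset]
        tokens, owners = new_tokens, new_owners
    return list(zip(tokens, owners))
```

For example, if commit `c1` wrote `int x = 0;`, `c2` changed it to `int x = 1;`, and `c3` changed it to `long x = 1;`, the sketch attributes `long` to `c3`, `x` and `=` to `c1`, and `1;` to `c2`.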

### IV. Web Script Generation

Using the git diff reports and metadata stored in JSON files, the tool generates a web report displaying the changes.

The web report contains three HTML files:

- `index.html`: landing page with links to:
  - `sourcecode.html`: renders the content of the source diff report, with an embedded URL and an on-hover metadata box for each newly added line/token in new_tag.
  - `header.html`: renders the content of the dependency diff report, with an embedded URL and an on-hover metadata box for each newly added line/token in new_tag.
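
A minimal sketch of how one added line might be rendered with an embedded URL and hover metadata is shown below. The markup (an `<a>` element with a `title` tooltip and an `added` class), the function name, and the metadata keys are assumptions for illustration only; the report generated by the tool may use different HTML.

```python
import html


def render_added_line(text, commit_url, metadata):
    """Render one added line as a link whose tooltip shows commit metadata."""
    tooltip = "; ".join(f"{key}: {value}" for key, value in metadata.items())
    # Escape every value so that source code such as `a < b` stays valid HTML.
    return '<a class="added" href="{}" title="{}">{}</a>'.format(
        html.escape(commit_url, quote=True),
        html.escape(tooltip, quote=True),
        html.escape(text),
    )
```

Escaping matters here because kernel source lines routinely contain `<`, `>`, `&`, and quotes.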
147 changes: 147 additions & 0 deletions build_scripts/build_collect_diff
Original file line number Diff line number Diff line change
@@ -0,0 +1,147 @@
#!/bin/bash
#
# Script to build the kernel, collect compiled file lists using modified kernel scripts,
# and generate a git diff report based on the collected lists.

set -e

# Safely apply the fixdep patch to the Linux kernel tree
apply_patch() {
    # shellcheck disable=SC2154
    local patch_path="$kernel_repo_path/scripts/change-impact-tools/fixdep-patch.file"

    # Refuse to continue if the working tree has uncommitted changes
    if ! git diff --quiet; then
        echo "linux-clone has uncommitted changes. Please stash or commit them and run the script again."
        exit 1
    fi

    # Refuse to continue if a `git am` session is already in progress
    if git am --show-current-patch &> /dev/null; then
        echo "linux-clone already has a patch being applied. Run 'git am --abort' and run the tool again."
        exit 1
    fi
    echo "path check: $(pwd)"
    git apply < "$patch_path"
    echo "applied the git patch"
    echo "path check: $(pwd)"
}

# Extract the list of compiled source files from compile_commands.json
parse_source_json_file() {
    local python_path="$kernel_repo_path/scripts/change-impact-tools/build_scripts/parse_json"
    # shellcheck disable=SC2154
    local cloned_repo_name="/$clone_dir/"
    local input_path="$kernel_repo_path/scripts/change-impact-tools/build_data/compile_commands.json"
    local output_path="$kernel_repo_path/scripts/change-impact-tools/build_data/sourcefile.txt"

    "$python_path" "$cloned_repo_name" "$input_path" "$output_path"
    display_file_head "$kernel_repo_path/scripts/change-impact-tools/build_data" "sourcefile.txt" 3
}

# Generate the lists of built files after building the kernel
generate_compiled_file_lists() {
    # Generate compiled source files list
    local json_output_path="$kernel_repo_path/scripts/change-impact-tools/build_data/compile_commands.json"
    echo "path check: $(pwd)"
    ./scripts/clang-tools/gen_compile_commands.py -o "$json_output_path"

    parse_source_json_file
    echo "source compiled filelist generated to sourcefile.txt"

    # Generate compiled header files list
    local output_list="$kernel_repo_path/scripts/change-impact-tools/build_data/headerfile.txt"
    local output_json="$kernel_repo_path/scripts/change-impact-tools/build_data/source_dep.json"
    local dep_path="dependency_file.txt"
    local python_tool_path="$kernel_repo_path/scripts/change-impact-tools/build_scripts/parse_dep_list"

    "$python_tool_path" "$dep_path" "$output_json" "$output_list"
    echo "dependency compiled filelist generated to headerfile.txt"
}

# Clean up the working directory in linux-clone
cleanup_working_directory() {
    git reset --hard
    git clean -fdx
}

# Generate the diff between TAG1 and TAG2 for the files that were built
generate_git_diff() {

    # Select the input and output files for the requested file type
    local file_type="${1:-source}"
    local root_build_data_path="$kernel_repo_path/scripts/change-impact-tools/build_data"
    local diff_input="$root_build_data_path/sourcefile.txt"
    local diff_output="$root_build_data_path/filtered_diff_source.txt"

    if [ "$file_type" = "header" ]; then
        echo "[generate_git_diff] Generating dependency git diff report ..."
        diff_input="$root_build_data_path/headerfile.txt"
        diff_output="$root_build_data_path/filtered_diff_header.txt"
    else
        echo "[generate_git_diff] Generating source git diff report ..."
    fi
    echo "parsing for subsys: $SUBSYS"
    while IFS= read -r file
    do
        # Only consider files under the requested subsystem directory
        if [[ "$file" == "$SUBSYS"/* ]]; then
            echo "now generating git diff for $file"
            # Only diff files that exist in TAG2
            if git show "$TAG2:$file" &> /dev/null; then
                local diff_result
                echo "$file suitable for parse"
                diff_result=$(git diff "$TAG1" "$TAG2" -- "$file")
                if [[ -n "$diff_result" ]]; then
                    {
                        echo "Diff for $file"
                        echo "$diff_result"
                        echo ""
                    } >> "$diff_output"
                fi
            fi
        fi
    done < "$diff_input"
    echo "[generate_git_diff] Git diff report for $file_type files saved to build_data"

}


TAG1="$1"
TAG2="$2"
SUBSYS="$3"
echo "build and collect kernel for subsystem: $SUBSYS"

# Fetch tags from the repository
git fetch --tags
echo "Generating source file list for $TAG1"
git checkout "$TAG1"
# Preparation before running make
apply_patch
# NOTE: the README describes `make olddefconfig`, but this script runs `make defconfig`.
echo "starting to run make defconfig"
make defconfig
echo "finished make defconfig"



# Build linux kernel
echo "the current os-linux version: "
cat /etc/os-release

echo "start running make"
make -j "$(nproc)"
echo "finished compiling kernel"


# Collect build metadata
echo "starting on preparing compiled file list"
generate_compiled_file_lists

# Generate git diff report
echo "starting on generating git diff report on source"
generate_git_diff source
echo "starting on generating git diff report on header"
generate_git_diff header

# Clean up the working directory
cleanup_working_directory
25 changes: 25 additions & 0 deletions build_scripts/git_shortlog
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#!/bin/bash
#
# Fetch name and email information for Linux kernel contributors
set -e

usage() {
    echo "Usage: $0 tag"
    exit 1
}
TAG="$1"
[[ -z ${TAG} ]] && usage
git checkout "$TAG"

echo "Starting to generate the email name list ..."
# shellcheck disable=SC2154
git shortlog -e -s -n HEAD > "$curr_dir"/build_data/name_list.txt

# shellcheck disable=SC2154
if [ -s "$curr_dir"/build_data/name_list.txt ]; then
    echo "build_data/name_list.txt created successfully"
else
    echo "build_data/name_list.txt is empty or not created"
fi

echo "Finished generating name list"
72 changes: 72 additions & 0 deletions build_scripts/parse_dep_list
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
#!/usr/bin/python3
"""
The script parses the dependency list generated by patching `fixdep.c`.

This script takes three arguments:
1. The path of the dependency list
2. The output path for a JSON file
3. The output path for the list of header files

Usage:
    parse_dep_list <dep_list_path> <output_json_path>
                   <output_header_file_list_path>
"""
import re
import argparse
import json

# Lines of the form "source file := <path>" start a new dependency group
source_file_pattern = re.compile(r'^source file := (.+)$')


def parse_dependencies(dep_list_file, output_json, output_dep_list):
    """Parse dependency file generated by 'fixdep.c'."""
    dependencies = []
    dep_set = set()
    current_source_file = None

    for line in dep_list_file:
        line = line.strip()
        if not line:
            continue

        source_match = source_file_pattern.match(line)
        if source_match:
            current_source_file = source_match.group(1)
            dependencies.append({
                'source_file': current_source_file,
                'dependency_files': []
            })
        elif dependencies:
            # Dependency line belonging to the most recent source file;
            # lines seen before any "source file :=" header are skipped.
            dependencies[-1]['dependency_files'].append(line)
            dep_set.add(line)

    # Write dependency list to output file (sorted for reproducible output)
    with open(output_dep_list, 'w', encoding='utf-8') as output_list_file:
        for header_file in sorted(dep_set):
            output_list_file.write(header_file + '\n')

    # Dump dependencies into JSON file
    with open(output_json, 'w', encoding='utf-8') as json_file:
        json.dump(dependencies, json_file, indent=4)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="Process dependency list generated while compiling kernel.")
    parser.add_argument('input_file', type=str,
                        help="Path to input dependency file")
    parser.add_argument('output_json', type=str,
                        help="Path to output JSON file")
    parser.add_argument('output_header_list', type=str,
                        help="Path to output dependency list file")

    args = parser.parse_args()

    with open(args.input_file, 'r', encoding='utf-8') as input_file:
        parse_dependencies(input_file, args.output_json,
                           args.output_header_list)

    print("Dependency parsing complete.")