feat: Generate SBOM for a single file (#330)
nightlark committed Jan 28, 2025
1 parent 28ca5c0 commit 99b225e
Showing 5 changed files with 174 additions and 41 deletions.
24 changes: 22 additions & 2 deletions README.md
@@ -98,6 +98,26 @@ pip install -e ".[test,dev]"
pip install -e plugins/fuzzyhashes
```

## Quick Start: Generating an SBOM

Surfactant supports several subcommands that can be shown using `surfactant --help`. The main one for creating an SBOM is the `generate` subcommand, which takes the following arguments:

```bash
surfactant generate [OPTIONS] SPECIMEN_CONFIG SBOM_OUTFILE [INPUT_SBOM]
```

The two required arguments are a specimen configuration and the output SBOM file name. For the simple case of generating an SBOM for a single directory or file, it is enough to pass the path to that directory or file as the specimen configuration. For example, the following command generates an SBOM file called `output.json` with software entries for all files found in the folder `/usr/local/mysoftware`:

```bash
surfactant generate /usr/local/mysoftware output.json
```

The generated SBOM contains a software entry for each file. The captured install paths show where individual files are located within `/usr/local/mysoftware`. If a relative path had been given instead, such as `surfactant generate local/mysoftware output.json`, all of the install paths would appear under the relative path `local/mysoftware` rather than an absolute path.
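
The relationship between the path argument and the recorded install paths can be sketched in a few lines. This is a minimal illustration that loosely mirrors the directory handling added in `generate_utils.py` in this commit; the function name `emulated_specimen_config` is hypothetical, and `PurePosixPath` is used here (an assumption, the commit itself uses `pathlib.Path`) so the sketch behaves the same on every platform.

```python
from pathlib import PurePosixPath


def emulated_specimen_config(path_arg: str) -> list:
    """Build the one-entry config that a bare directory argument stands in for.

    Hypothetical helper for illustration only; not part of Surfactant's API.
    """
    path = PurePosixPath(path_arg)
    # A relative path with no parts (i.e. ".") gets an empty install prefix
    # instead of "./"; every other path is used as the prefix unchanged.
    if not path.is_absolute() and len(path.parts) == 0:
        prefix = ""
    else:
        prefix = path.as_posix()
    return [{"extractPaths": [path.as_posix()], "installPrefix": prefix}]


# Absolute argument: install paths are recorded under the absolute prefix.
print(emulated_specimen_config("/usr/local/mysoftware"))
# Relative argument: install paths are recorded under the relative prefix.
print(emulated_specimen_config("local/mysoftware"))
```

The install prefix simply follows whatever form the path was given in, which is why running the same command from a different working directory changes the install paths in the SBOM.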

For more control over the options used to create software entries and relationships, or for capturing information from multiple directories, see the following section on how to write a [Surfactant specimen config file](#build-configuration-file-for-sample). This configuration file is a JSON file that can then be given to Surfactant for the `SPECIMEN_CONFIG` argument.

NOTE: When using a Surfactant specimen configuration file, it is recommended that its name end in a `.json` file extension. Otherwise, you'll have to use a special prefix for the `SPECIMEN_CONFIG` argument to tell Surfactant to interpret the given file as a specimen configuration file, rather than generating an SBOM that only contains details on that one file.

## Settings

Surfactant settings can be changed using the `surfactant config` subcommand, or by hand editing the settings configuration file (this is not the same as the JSON file used to configure settings for a particular sample that is described later). The [settings documentation page](https://surfactant.readthedocs.io/en/latest/settings.html) has a list of available options that are built-into Surfactant.
@@ -377,10 +397,10 @@ NOTE: These examples have been simplified to show differences in output based on
### Run surfactant

```bash
$ surfactant generate [OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM]
$ surfactant generate [OPTIONS] SPECIMEN_CONFIG SBOM_OUTFILE [INPUT_SBOM]
```

**CONFIG_FILE**: (required) the config file created earlier that contains the information on the sample\
**SPECIMEN_CONFIG**: (required) the config file created earlier that contains the information on specimens to include in an SBOM, or the path to a specific file/directory to generate an SBOM for with some implied default configuration options\
**SBOM_OUTFILE**: (required) the desired name of the output file\
**INPUT_SBOM**: (optional) a base SBOM; use with care, as relationships could be incorrect when files are installed on different systems\
**--skip_gather**: (optional) skips gathering information on files and adding software entries\
12 changes: 9 additions & 3 deletions docs/configuration_files.md
@@ -1,7 +1,7 @@
# Configuration Files

There are several files for configuring different aspects of Surfactant functionality based on the subcommand used.
This page currently describes sample configuration files, and the Surfactant settings configuration file. The sample configuration file is used to generate an SBOM for a particular software/firmware sample, and will be the most frequently written by users. The Surfactant settings configuration file is used to turn on and off various Surfactant features, including settings for controlling functionality in Surfactant plugins.
This page currently describes specimen configuration files, and the Surfactant settings configuration file. The specimen configuration file is used to generate an SBOM for a particular software/firmware sample, and will be the most frequently written by users. The Surfactant settings configuration file is used to turn on and off various Surfactant features, including settings for controlling functionality in Surfactant plugins.

## Settings Configuration File

@@ -23,6 +23,12 @@ Getting the currently set value for the option would then be done with:
surfactant config core.recorded_institution
```

Another example of a setting you might want to change is `docker.enable_docker_scout`, which controls whether Docker Scout is enabled. To disable Docker Scout (which also suppresses the warning message about installing Docker Scout), set this option to `false`:

```bash
surfactant config docker.enable_docker_scout false
```

### Manual Editing

If desired, the settings config file can also be manually edited. The location of the file will depend on your platform.
@@ -37,9 +43,9 @@ The file itself is a TOML file, and for the previously mentioned example plugin
recorded_institution = "LLNL"
```

## Build sample configuration file
## Specimen Configuration File

A sample configuration file contains the information about the sample to gather information from. Example JSON sample configuration files can be found in the examples folder of this repository.
A specimen configuration file contains the information about the sample to gather information from. Example JSON specimen configuration files can be found in the examples folder of this repository.

- **extractPaths**: (required) the absolute path, or the path relative to the current working directory that `surfactant` is run from, to the sample folders; it cannot be a file. Note that even on Windows, Unix-style `/` directory separators should be used in paths.
- **archive**: (optional) the full path, including file name, of the zip, exe installer, or other archive file that the folders in `extractPaths` were extracted from. This is used to collect metadata about the overall sample and will be added as a "Contains" relationship to all software entries found in the various `extractPaths`.
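
As a rough illustration of the fields described above, the following sketch builds a one-entry specimen configuration, writes it out as JSON, and checks that every extract path exists. All paths are hypothetical placeholders, the helper name `missing_extract_paths` is made up for this example, and the `installPrefix` key is taken from the emulated configuration in this commit's `generate_utils.py` rather than from the list above.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical specimen: every path below is a placeholder for illustration.
specimen_config = [
    {
        "extractPaths": ["/tmp/extracted/mysoftware"],
        "archive": "/tmp/downloads/mysoftware.zip",
        "installPrefix": "/usr/local/mysoftware",
    }
]


def missing_extract_paths(config: list) -> list:
    """Return every extract path in the config that does not exist on disk."""
    return [
        pth
        for entry in config
        for pth in entry["extractPaths"]
        if not Path(pth).exists()
    ]


# Write the configuration with a .json extension so it is recognized as a config file.
out = Path(tempfile.mkdtemp()) / "specimen_config.json"
out.write_text(json.dumps(specimen_config, indent=2))
print(out.read_text())

# Paths reported here would need to be fixed before generating an SBOM.
print(missing_extract_paths(specimen_config))
```

The configuration is a JSON list so that several specimens, each with its own extract paths, can be described in one file.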
27 changes: 25 additions & 2 deletions docs/getting_started.md
@@ -76,13 +76,36 @@ pip install -e ".[test,dev]"
`pip install` with the `-e` or `--editable` option can also be used to install Surfactant plugins for development.

## Generating an SBOM

To create an SBOM, run the `surfactant generate` subcommand. For more details on the options it takes, please refer to this page on [basic usage](basic_usage.md). For more information on writing Surfactant configuration files for software specimens, see the [configuration files](configuration_files.md) page.
To create an SBOM, run the `surfactant generate` subcommand. For more details on the options it takes, please refer to this page on [basic usage](basic_usage.md). For more information on writing Surfactant configuration files for software specimens, see the documentation on how to build a [specimen configuration file](configuration_files.md#specimen-configuration-file).

The following diagram gives a high-level overview of what Surfactant does. The [internal implementation overview](internals_overview.md) page gives more detail about how Surfactant works internally.

![Surfactant Overview Diagram](img/surfactant_overview_diagram.svg)

In simpler cases, such as generating an SBOM for a single file or directory on the same system that Surfactant is running on, Surfactant can just be given the path to generate the SBOM for:

```bash
surfactant generate "C:/Program Files/Adobe/Acrobat Reader" acrobat_reader_sbom.json
```

This command will generate an output SBOM file named `acrobat_reader_sbom.json` for all files in `C:/Program Files/Adobe/Acrobat Reader`, with install paths for files in the SBOM that show them as being under `C:/Program Files/Adobe/Acrobat Reader`. Alternatively, running Surfactant from the `C:/Program Files/Adobe` folder with the command `surfactant generate "Acrobat Reader" acrobat_reader_sbom.json` would result in the install paths in the SBOM showing the files as being under the relative path `Acrobat Reader/`.

If the path points to a single file, an SBOM is generated for that single file, unless its name ends in a `.json` extension (or, in the very rare case, the path given to Surfactant begins with one of three special prefixes: `config:`, `file:`, or `dir:`).

If an SBOM is being generated that requires more fine-grained control over various options such as the install prefix, or for capturing information on multiple locations, then Surfactant should be given a path to a [specimen configuration file](configuration_files.md#specimen-configuration-file). It is strongly recommended to always include a `.json` file extension as part of the file name.

### Special Specimen Config Argument Prefixes

For the specimen config command line argument, the path to a file with a `.json` extension is always treated as a specimen configuration file, and a path to a file without a `.json` file extension is treated as being for generating an SBOM with just that single file. To override this behavior, the specimen configuration argument to `surfactant generate` recognizes the special prefixes `config:`, `file:`, and `dir:`. For example, `surfactant generate file:home/abc.json` would tell Surfactant to generate an SBOM with a single entry in it, for the file called `abc.json` in the `home` directory (without the `file:` prefix, `home/abc.json` would be interpreted as a specimen configuration file).

Similarly, a `config:` prefix forces Surfactant to interpret the given file path as a specimen configuration file even if the file name lacks a `.json` extension.

A file or directory name that starts with one of these special prefixes could cause problems. These cases should be extremely rare, and can always be solved by creating a specimen configuration file (which, since it is user created, can be given a file name that avoids the issue). A special prefix can also be used to solve the problem. For example, with a directory named `config:myapp`, running `surfactant generate config:myapp` will look for a specimen configuration file called `myapp`. To resolve this, the `dir:` prefix can be added to tell Surfactant "this directory is actually named config:myapp". Running `surfactant generate dir:config:myapp` would then generate an SBOM for everything in a directory called `config:myapp`.

NOTE: As long as the directory or file name that starts with the special prefix isn't the first thing in the argument, adding a special prefix shouldn't be necessary. For example, running `surfactant generate abc/config:myapp` or `surfactant generate /etc/config:myapp` to create an SBOM from a directory or file called `config:myapp` should work without issues since the specimen config argument doesn't start with one of the special prefixes.

A Surfactant specimen configuration file should never be given a name that starts with one of these special prefixes, and should always end in a `.json` file extension.
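
The prefix rules in this section can be summarized with a small sketch. This is an illustrative approximation of the behavior described above, not Surfactant's actual implementation: the function name `classify_specimen_arg` is hypothetical, and the `is_file` parameter is an assumption that stands in for a real filesystem check so the example runs anywhere.

```python
from pathlib import PurePosixPath

# Special prefixes recognized by the specimen config argument, per the text above.
PREFIXES = {"file:": "FILE", "dir:": "DIR", "config:": "CONFIG"}


def classify_specimen_arg(arg: str, is_file: bool) -> tuple:
    """Return (kind, path) for a specimen config argument.

    Hypothetical helper; `is_file` replaces an on-disk check for portability.
    """
    for prefix, kind in PREFIXES.items():
        if arg.startswith(prefix):
            # Only the leading prefix is stripped; the rest is the literal path.
            return kind, arg[len(prefix):]
    if not is_file:
        # Anything that is not a file is treated as a directory.
        return "DIR", arg
    if PurePosixPath(arg).suffix.lower() == ".json":
        return "CONFIG", arg
    return "FILE", arg


examples = [
    ("home/abc.json", True),       # .json file: specimen configuration file
    ("file:home/abc.json", True),  # forced single-file SBOM
    ("config:myapp", True),        # forced configuration file named "myapp"
    ("dir:config:myapp", False),   # directory literally named "config:myapp"
    ("abc/config:myapp", False),   # prefix not at the start: no special meaning
]
for arg, is_file in examples:
    print(arg, "->", classify_specimen_arg(arg, is_file))
```

Because only a prefix at the very start of the argument is significant, prepending `dir:` or `file:` is always enough to disambiguate a name that happens to begin with one of the special prefixes.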

## Understanding the SBOM Output

The following is a brief overview of the default SBOM file output format (which follows the CyTRICS schema). It is
40 changes: 6 additions & 34 deletions surfactant/cmd/generate.py
@@ -2,7 +2,6 @@
# See the top-level LICENSE file for details.
#
# SPDX-License-Identifier: MIT
import json
import os
import pathlib
import queue
@@ -13,6 +12,7 @@
from loguru import logger

from surfactant import ContextEntry
from surfactant.cmd.internal.generate_utils import SpecimenConfigParamType
from surfactant.configmanager import ConfigManager
from surfactant.fileinfo import sha256sum
from surfactant.plugin.manager import call_init_hooks, find_io_plugin, get_plugin_manager
@@ -110,17 +110,6 @@ def get_software_entry(
    return (sw_entry, sw_children)


def validate_config(config):
    for line in config:
        extract_path = line["extractPaths"]
        for pth in extract_path:
            extract_path_convert = pathlib.Path(pth)
            if not extract_path_convert.exists():
                logger.error("invalid path: " + str(pth))
                return False
    return True


def print_output_formats(ctx, _, value):
    if not value or ctx.resilient_parsing:
        return
@@ -194,9 +183,9 @@ def get_default_from_config(option: str, fallback: Optional[Any] = None) -> Any:

@click.command("generate")
@click.argument(
    "config_file",
    envvar="CONFIG_FILE",
    type=click.Path(exists=True),
    "specimen_config",
    envvar="SPECIMEN_CONFIG",
    type=SpecimenConfigParamType(),
    required=True,
)
@click.argument("sbom_outfile", envvar="SBOM_OUTPUT", type=click.File("w"), required=True)
@@ -266,7 +255,7 @@ def get_default_from_config(option: str, fallback: Optional[Any] = None) -> Any:
# Disable positional argument linter check -- could make keyword-only, but then defaults need to be set
# pylint: disable-next=too-many-positional-arguments
def sbom(
    config_file: str,
    specimen_config: list,
    sbom_outfile: click.File,
    input_sbom: click.File,
    skip_gather: bool,
@@ -289,26 +278,9 @@ def sbom(
    output_writer = find_io_plugin(pm, output_format, "write_sbom")
    input_reader = find_io_plugin(pm, input_format, "read_sbom")

    if pathlib.Path(config_file).is_file():
        with click.open_file(config_file) as f:
            try:
                config = json.load(f)
            except json.decoder.JSONDecodeError as err:
                logger.exception(f"Invalid JSON in given config file ({config_file})")
                raise SystemExit(f"Invalid JSON in given config file ({config_file})") from err
        # TODO: what if it isn't a JSON config file, but a single file to generate an SBOM for? perhaps file == "archive"?
    else:
        # Emulate a configuration file with the path
        config = []
        config.append({"extractPaths": [config_file], "installPrefix": config_file})

    # quit if invalid path found
    if not validate_config(config):
        return

    context: queue.Queue[ContextEntry] = queue.Queue()

    for cfg_entry in config:
    for cfg_entry in specimen_config:
        context.put(ContextEntry(**cfg_entry))

    # define the new_sbom variable type
112 changes: 112 additions & 0 deletions surfactant/cmd/internal/generate_utils.py
@@ -0,0 +1,112 @@
# Copyright 2025 Lawrence Livermore National Security, LLC
# See the top-level LICENSE file for details.
#
# SPDX-License-Identifier: MIT
import json
import os
import pathlib

import click


# pylint: disable=too-few-public-methods
class SpecimenConfigParamType(click.Path):
    """
    A custom Click parameter type for handling configuration paths.
    This class extends `click.Path` to provide additional functionality for
    handling different types of configuration paths, including files, directories,
    and JSON configuration files.
    Attributes:
        name (str): The name of the parameter type, set to "specimen_config".
    Methods:
        convert(value, param, ctx):
            Converts the input value based on its prefix and returns the appropriate
            configuration data. Supports the following prefixes:
            - "file:" for file paths
            - "dir:" for directory paths
            - "config:" for JSON configuration files
            If no prefix is provided, a file whose name ends in ".json" is loaded as
            a JSON configuration file, any other file is treated as a single file,
            and anything else is treated as a directory path.
    """

    name = "specimen_config"

    @staticmethod
    def _get_param_type(filename: str):
        """Determines the type and properties of a parameter based on its filename prefix.
        Args:
            filename (str): The filename string to analyze, optionally with a type prefix
        Returns:
            tuple: A 3-tuple containing:
                - str: The parameter type ('FILE', 'DIR', 'CONFIG', or '' for default)
                - Path: The filepath as a Path object
                - Path or None: The install prefix path (for files), directory path (for dirs),
                  or None (for config or default)
        """

        if filename.startswith("file:"):
            filepath = pathlib.Path(filename[5:])
            return "FILE", filepath, filepath.parent
        if filename.startswith("dir:"):
            filepath = pathlib.Path(filename[4:])
            return "DIR", filepath, filepath
        if filename.startswith("config:"):
            filepath = pathlib.Path(filename[7:])
            return "CONFIG", filepath, None
        return "", pathlib.Path(filename), None

    def convert(self, value, param, ctx):
        # value received may already be the right type
        if isinstance(value, list):
            return value

        param_type, filepath, installprefix = self._get_param_type(value)

        # validate filepath exists and is readable
        if not filepath.exists():
            self.fail(f"{value!r} does not exist", param, ctx)
        if not os.access(filepath, os.R_OK):
            self.fail(f"{value!r} is not readable", param, ctx)

        # no explicit type given, use heuristics to determine correct type
        if not param_type:
            # if it's a file (that ends in .json) then treat it as a CONFIG file
            # if it's not a file then probably a directory (or something odd...)
            if filepath.is_file():
                if filepath.suffix.lower() == ".json":
                    param_type = "CONFIG"
                else:
                    param_type = "FILE"
                    installprefix = filepath.parent
            else:
                param_type = "DIR"
                installprefix = filepath

        if param_type in ("FILE", "DIR"):
            # avoid a relative directory of "./" as the install prefix
            if not installprefix.is_absolute() and len(installprefix.parts) == 0:
                installprefix = ""
            else:
                installprefix = installprefix.as_posix()
            # emulate a configuration file with the given path
            config = [{"extractPaths": [filepath.as_posix()], "installPrefix": installprefix}]
        elif param_type == "CONFIG":
            with click.open_file(filepath) as f:
                try:
                    config = json.load(f)
                except json.decoder.JSONDecodeError as err:
                    self.fail(
                        f"{filepath.as_posix()!r} config file contains invalid JSON", param, ctx
                    )

            for entry in config:
                extract_path = entry["extractPaths"]
                for pth in extract_path:
                    extract_path_convert = pathlib.Path(pth)
                    if not extract_path_convert.exists():
                        self.fail(f"invalid extract path in config file: {pth}", param, ctx)
        else:
            self.fail(f"{value!r} is not a valid specimen config type", param, ctx)

        return config
