diff --git a/README.md b/README.md index 4071a781..64b80cb2 100644 --- a/README.md +++ b/README.md @@ -98,6 +98,26 @@ pip install -e ".[test,dev]" pip install -e plugins/fuzzyhashes ``` +## Quick Start: Generating an SBOM + +Surfactant supports several subcommands that can be shown using `surfactant --help`. The main one for creating an SBOM is the `generate` subcommand, which takes the following arguments: + +```bash +surfactant generate [OPTIONS] SPECIMEN_CONFIG SBOM_OUTFILE [INPUT_SBOM] +``` + +The two required arguments are a specimen configuration and the output SBOM file name. For a simple case of generating an SBOM for a single directory or file, it is enough to just use the path to the directory or file for the specimen configuration. For example, the following command will generate an SBOM file called `output.json` with software entries for all files found in the folder `mysoftware`: + +```bash +surfactant generate /usr/local/mysoftware output.json +``` + +In the generated SBOM, there will be software entries for each file. The install paths captured will say where individual files are located within `/usr/local/mysoftware` -- if instead a relative path had been given such as `surfactant generate local/mysoftware output.json`, all of the install paths for files would appear to be under the relative path `local/mysoftware` instead of an absolute path. + +For more control over the options used to create software entries and relationships, or for capturing information from multiple directories, see the following section on how to write a [Surfactant specimen config file](#build-configuration-file-for-sample). This configuration file is a JSON file that can then be given to Surfactant for the `SPECIMEN_CONFIG` argument. 
+ +NOTE: When using a Surfactant specimen configuration file, it is recommended that it end in a `.json` file extension; otherwise, you'll have to use a special prefix for the `SPECIMEN_CONFIG` argument to tell Surfactant that it should interpret the given file that doesn't end in `.json` as a specimen configuration file rather than generating an SBOM that only contains details on that one file. + ## Settings Surfactant settings can be changed using the `surfactant config` subcommand, or by hand editing the settings configuration file (this is not the same as the JSON file used to configure settings for a particular sample that is described later). The [settings documentation page](https://surfactant.readthedocs.io/en/latest/settings.html) has a list of available options that are built-into Surfactant. @@ -377,10 +397,10 @@ NOTE: These examples have been simplified to show differences in output based on ### Run surfactant ```bash -$ surfactant generate [OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM] +$ surfactant generate [OPTIONS] SPECIMEN_CONFIG SBOM_OUTFILE [INPUT_SBOM] ``` -**CONFIG_FILE**: (required) the config file created earlier that contains the information on the sample\ +**SPECIMEN_CONFIG**: (required) the config file created earlier that contains the information on specimens to include in an SBOM, or the path to a specific file/directory to generate an SBOM for with some implied default configuration options\ **SBOM OUTPUT**: (required) the desired name of the output file\ **INPUT_SBOM**: (optional) a base sbom, should be used with care as relationships could be messed up when files are installed on different systems\ **--skip_gather**: (optional) skips the gathering of information on files and adding software entires\ diff --git a/docs/basic_usage.md b/docs/basic_usage.md index 662e3f0d..06d422c8 100644 --- a/docs/basic_usage.md +++ b/docs/basic_usage.md @@ -10,10 +10,10 @@ In order to test out surfactant, you will need a sample file/folder. 
If you don' ## Running Surfactant ```bash -$ surfactant generate [OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM] +$ surfactant generate [OPTIONS] SPECIMEN_CONFIG SBOM_OUTFILE [INPUT_SBOM] ``` -**CONFIG_FILE**: (required) the config file created earlier that contains the information on the sample\ +**SPECIMEN_CONFIG**: (required) the config file created earlier that contains the information on specimens to include in an SBOM, or the path to a specific file/directory to generate an SBOM for with some implied default configuration options\ **SBOM OUTPUT**: (required) the desired name of the output file\ **INPUT_SBOM**: (optional) a base sbom, should be used with care as relationships could be messed up when files are installed on different systems\ **--skip_gather**: (optional) skips the gathering of information on files and adding software entires\ @@ -26,6 +26,7 @@ $ surfactant generate [OPTIONS] CONFIG_FILE SBOM_OUTFILE [INPUT_SBOM] **--include_all_files**: (optional) include all files in the SBOM, rather than just those recognized by Surfactant + ## Merging SBOMs A folder containing multiple separate SBOM JSON files can be combined using merge_sbom.py with a command such the one below that gets a list of files using ls, and then uses xargs to pass the resulting list of files to merge_sbom.py as arguments. diff --git a/docs/configuration_files.md b/docs/configuration_files.md index d27d76ca..d12d197e 100644 --- a/docs/configuration_files.md +++ b/docs/configuration_files.md @@ -1,7 +1,7 @@ # Configuration Files There are several files for configuring different aspects of Surfactant functionality based on the subcommand used. -This page currently describes sample configuration files, and the Surfactant settings configuration file. The sample configuration file is used to generate an SBOM for a particular software/firmware sample, and will be the most frequently written by users. 
The Surfactant settings configuration file is used to turn on and off various Surfactant features, including settings for controlling functionality in Surfactant plugins. +This page currently describes specimen configuration files, and the Surfactant settings configuration file. The specimen configuration file is used to generate an SBOM for a particular software/firmware sample, and will be the most frequently written by users. The Surfactant settings configuration file is used to turn on and off various Surfactant features, including settings for controlling functionality in Surfactant plugins. ## Settings Configuration File @@ -23,6 +23,12 @@ Getting the currently set value for the option would then be done with: surfactant config core.recorded_institution ``` +Another example of a setting you might want to change is `docker.enable_docker_scout`, which controls whether Docker Scout is enabled. To disable Docker Scout (which also suppresses the warning message about installing Docker Scout), set this option to `false`: + +```bash +surfactant config docker.enable_docker_scout false +``` + ### Manual Editing If desired, the settings config file can also be manually edited. The location of the file will depend on your platform. @@ -37,9 +43,9 @@ The file itself is a TOML file, and for the previously mentioned example plugin recorded_institution = "LLNL" ``` -## Build sample configuration file +## Specimen Configuration File -A sample configuration file contains the information about the sample to gather information from. Example JSON sample configuration files can be found in the examples folder of this repository. +A specimen configuration file contains the information about the sample to gather information from. Example JSON specimen configuration files can be found in the examples folder of this repository. 
- **extractPaths**: (required) the absolute path or relative path from location of current working directory that `surfactant` is being run from to the sample folders, cannot be a file. Note that even on Windows, Unix style `/` directory separators should be used in paths. - **archive**: (optional) the full path, including file name, of the zip, exe installer, or other archive file that the folders in `extractPaths` were extracted from. This is used to collect metadata about the overall sample and will be added as a "Contains" relationship to all software entries found in the various `extractPaths`. diff --git a/docs/getting_started.md b/docs/getting_started.md index 5f7f6815..da5ba185 100644 --- a/docs/getting_started.md +++ b/docs/getting_started.md @@ -76,13 +76,36 @@ pip install -e ".[test,dev]" `pip install` with the `-e` or `--editable` option can also be used to install Surfactant plugins for development. ## Generating an SBOM - -To create an SBOM, run the `surfactant generate` subcommand. For more details on the options it takes, please refer to this page on [basic usage](basic_usage.md). For more information on writing Surfactant configuration files for software specimens, see the [configuration files](configuration_files.md) page. +To create an SBOM, run the `surfactant generate` subcommand. For more details on the options it takes, please refer to this page on [basic usage](basic_usage.md). For more information on writing Surfactant configuration files for software specimens, see the documentation on how to build a [specimen configuration file](configuration_files.md#specimen-configuration-file). The following diagram gives a high-level overview of what Surfactant does. The [internal implementation overview](internals_overview.md) page gives more detail about how Surfactant works internally. 
![Surfactant Overview Diagram](img/surfactant_overview_diagram.svg) +In simpler cases such as generating an SBOM for a single file or directory that lives on the same system as Surfactant is being run on, Surfactant can just be given the path to generate the SBOM for: + +```bash +surfactant generate "C:/Program Files/Adobe/Acrobat Reader" acrobat_reader_sbom.json +``` + +This command will generate an output SBOM file named `acrobat_reader_sbom.json` for all files in `C:/Program Files/Adobe/Acrobat Reader`, with install paths for files in the SBOM that show them as being under `C:/Program Files/Adobe/Acrobat Reader`. Alternatively, running Surfactant from the `C:/Program Files/Adobe` folder with the command `surfactant generate "Acrobat Reader" acrobat_reader_sbom.json` would result in the install paths in the SBOM showing the files as being under the relative path `Acrobat Reader/`. + +If the path is to a single file an SBOM will be generated for that single file, unless its name ends in a `.json` extension (or the very rare case of the path being given to Surfactant beginning with one of 3 special prefixes: `config:`, `file:`, and `dir:`). + +If an SBOM is being generated that requires more fine-grained control over various options such as the install prefix, or for capturing information on multiple locations, then Surfactant should be given a path to a [specimen configuration file](configuration_files.md#specimen-configuration-file). It is strongly recommended to always include a `.json` file extension as part of the file name. + +### Special Specimen Config Argument Prefixes + +For the specimen config command line argument, the path to a file with a `.json` extension is always treated as a specimen configuration file, and a path to a file without a `.json` file extension is treated as being for generating an SBOM with just that single file. 
To override this behavior, the specimen configuration argument to `surfactant generate` recognizes the special prefixes `config:`, `file:`, and `dir:`. For example, `surfactant generate file:home/abc.json` would tell Surfactant to generate an SBOM with a single entry in it, for the file called `abc.json` in the `home` directory (without the `file:` prefix, `home/abc.json` would be interpreted as a specimen configuration file). + +Similarly, a `config:` prefix forces Surfactant to interpret the given file path as a specimen configuration file regardless of whether the file name has a `.json` extension. + +A file or directory name that starts with one of these special prefixes could cause problems; however, these cases should be extremely rare and can always be solved by creating a specimen configuration file (which, since it is user-created, can be given a file name that avoids such issues). A special prefix could also be used to solve the issue. For example, with a directory named `config:myapp`, running `surfactant generate config:myapp` will look for a specimen configuration file called `myapp`. To resolve this, the `dir:` prefix could be added to essentially tell Surfactant "this directory is actually named config:myapp". Running `surfactant generate dir:config:myapp` would then generate an SBOM for everything in a directory called `config:myapp`. + +NOTE: As long as the directory or file name that starts with the special prefix isn't the first thing in the argument, adding a special prefix shouldn't be necessary. For example, running `surfactant generate abc/config:myapp` or `surfactant generate /etc/config:myapp` to create an SBOM from a directory or file called `config:myapp` should work without issues since the specimen config argument doesn't start with one of the special prefixes. + +A Surfactant specimen configuration file should never be given a name that starts with one of these special prefixes, and should always end in a `.json` file extension. 
+ ## Understanding the SBOM Output The following is a brief overview of the default SBOM file output format (which follows the CyTRICS schema). It is diff --git a/surfactant/cmd/generate.py b/surfactant/cmd/generate.py index c0fdef90..fcb16031 100644 --- a/surfactant/cmd/generate.py +++ b/surfactant/cmd/generate.py @@ -2,7 +2,6 @@ # See the top-level LICENSE file for details. # # SPDX-License-Identifier: MIT -import json import os import pathlib import queue @@ -13,6 +12,7 @@ from loguru import logger from surfactant import ContextEntry +from surfactant.cmd.internal.generate_utils import SpecimenConfigParamType from surfactant.configmanager import ConfigManager from surfactant.fileinfo import sha256sum from surfactant.plugin.manager import call_init_hooks, find_io_plugin, get_plugin_manager @@ -110,17 +110,6 @@ def get_software_entry( return (sw_entry, sw_children) -def validate_config(config): - for line in config: - extract_path = line["extractPaths"] - for pth in extract_path: - extract_path_convert = pathlib.Path(pth) - if not extract_path_convert.exists(): - logger.error("invalid path: " + str(pth)) - return False - return True - - def print_output_formats(ctx, _, value): if not value or ctx.resilient_parsing: return @@ -194,9 +183,9 @@ def get_default_from_config(option: str, fallback: Optional[Any] = None) -> Any: @click.command("generate") @click.argument( - "config_file", - envvar="CONFIG_FILE", - type=click.Path(exists=True), + "specimen_config", + envvar="SPECIMEN_CONFIG", + type=SpecimenConfigParamType(), required=True, ) @click.argument("sbom_outfile", envvar="SBOM_OUTPUT", type=click.File("w"), required=True) @@ -266,7 +255,7 @@ def get_default_from_config(option: str, fallback: Optional[Any] = None) -> Any: # Disable positional argument linter check -- could make keyword-only, but then defaults need to be set # pylint: disable-next=too-many-positional-arguments def sbom( - config_file: str, + specimen_config: list, sbom_outfile: click.File, 
input_sbom: click.File, skip_gather: bool, @@ -289,26 +278,9 @@ def sbom( output_writer = find_io_plugin(pm, output_format, "write_sbom") input_reader = find_io_plugin(pm, input_format, "read_sbom") - if pathlib.Path(config_file).is_file(): - with click.open_file(config_file) as f: - try: - config = json.load(f) - except json.decoder.JSONDecodeError as err: - logger.exception(f"Invalid JSON in given config file ({config_file})") - raise SystemExit(f"Invalid JSON in given config file ({config_file})") from err - # TODO: what if it isn't a JSON config file, but a single file to generate an SBOM for? perhaps file == "archive"? - else: - # Emulate a configuration file with the path - config = [] - config.append({"extractPaths": [config_file], "installPrefix": config_file}) - - # quit if invalid path found - if not validate_config(config): - return - context: queue.Queue[ContextEntry] = queue.Queue() - for cfg_entry in config: + for cfg_entry in specimen_config: context.put(ContextEntry(**cfg_entry)) # define the new_sbom variable type diff --git a/surfactant/cmd/internal/generate_utils.py b/surfactant/cmd/internal/generate_utils.py new file mode 100644 index 00000000..0cfdb5d7 --- /dev/null +++ b/surfactant/cmd/internal/generate_utils.py @@ -0,0 +1,112 @@ +# Copyright 2025 Lawrence Livermore National Security, LLC +# See the top-level LICENSE file for details. +# +# SPDX-License-Identifier: MIT +import json +import os +import pathlib + +import click + + +# pylint: disable=too-few-public-methods +class SpecimenConfigParamType(click.Path): + """ + A custom Click parameter type for handling configuration paths. + This class extends `click.Path` to provide additional functionality for + handling different types of configuration paths, including files, directories, + and JSON configuration files. + Attributes: + name (str): The name of the parameter type, set to "specimen_config". 
+ Methods: + convert(value, param, ctx): + Converts the input value based on its prefix and returns the appropriate + configuration data. Supports the following prefixes: + - "file:" for file paths + - "dir:" for directory paths + - "config:" for JSON configuration files + If no prefix is provided, a file ending in ".json" is loaded as a JSON + configuration file, any other file is treated as a single file path, and + anything else is treated as a directory path. + """ + + name = "specimen_config" + + @staticmethod + def _get_param_type(filename: str): + """Determines the type and properties of a parameter based on its filename prefix. + Args: + filename (str): The filename string to analyze, optionally with a type prefix + Returns: + tuple: A 3-tuple containing: + - str: The parameter type ('FILE', 'DIR', 'CONFIG', or '' for default) + - Path: The filepath as a Path object + - Path or None: The install prefix path (for files), directory path (for dirs), + or None (for config or default) + """ + + if filename.startswith("file:"): + filepath = pathlib.Path(filename[5:]) + return "FILE", filepath, filepath.parent + if filename.startswith("dir:"): + filepath = pathlib.Path(filename[4:]) + return "DIR", filepath, filepath + if filename.startswith("config:"): + filepath = pathlib.Path(filename[7:]) + return "CONFIG", filepath, None + return "", pathlib.Path(filename), None + + def convert(self, value, param, ctx): + # value received may already be the right type + if isinstance(value, list): + return value + + param_type, filepath, installprefix = self._get_param_type(value) + + # validate filepath exists and is readable + if not filepath.exists(): + self.fail(f"{value!r} does not exist", param, ctx) + if not os.access(filepath, os.R_OK): + self.fail(f"{value!r} is not readable", param, ctx) + + # no explicit type given, use heuristics to determine correct type + if not param_type: + # if it's a file (that ends in .json) then treat it as a CONFIG file + # if it's not a file then probably a directory 
(or something odd...) + if filepath.is_file(): + if filepath.suffix.lower() == ".json": + param_type = "CONFIG" + else: + param_type = "FILE" + installprefix = filepath.parent + else: + param_type = "DIR" + installprefix = filepath + + if param_type in ("FILE", "DIR"): + # avoid a relative directory of "./" as the install prefix + if not installprefix.is_absolute() and len(installprefix.parts) == 0: + installprefix = "" + else: + installprefix = installprefix.as_posix() + # emulate a configuration file with the given path + config = [{"extractPaths": [filepath.as_posix()], "installPrefix": installprefix}] + elif param_type == "CONFIG": + with click.open_file(filepath) as f: + try: + config = json.load(f) + except json.decoder.JSONDecodeError as err: + self.fail( + f"{filepath.as_posix()!r} config file contains invalid JSON", param, ctx + ) + + for entry in config: + extract_path = entry["extractPaths"] + for pth in extract_path: + extract_path_convert = pathlib.Path(pth) + if not extract_path_convert.exists(): + self.fail(f"invalid extract path in config file: {pth}", param, ctx) + else: + self.fail(f"{value!r} is not a valid specimen config type", param, ctx) + + return config
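
Review note: the classification rules that `SpecimenConfigParamType` applies to the `SPECIMEN_CONFIG` argument can be summarized in a small standalone sketch. This is not the code from the patch. The function name `classify_specimen_arg` and the `is_file` parameter are hypothetical, introduced here so the rules can be exercised without touching the filesystem (the patch itself calls `filepath.is_file()`), and the existence and readability checks are left out.

```python
import pathlib

# Prefixes recognized by the patch, mapped to the type names it uses.
PREFIXES = {"file:": "FILE", "dir:": "DIR", "config:": "CONFIG"}


def classify_specimen_arg(arg: str, is_file: bool) -> tuple[str, str]:
    """Return (kind, path) for a SPECIMEN_CONFIG-style argument.

    `is_file` is a hypothetical stand-in for the filesystem check done in
    the patch; it is only consulted when no explicit prefix is present.
    """
    # An explicit prefix always wins and is stripped from the path.
    for prefix, kind in PREFIXES.items():
        if arg.startswith(prefix):
            return kind, arg[len(prefix):]
    # No prefix: anything that is not a file is treated as a directory.
    if not is_file:
        return "DIR", arg
    # No prefix: a file ending in .json (any letter case) is a config file.
    if pathlib.PurePosixPath(arg).suffix.lower() == ".json":
        return "CONFIG", arg
    return "FILE", arg


if __name__ == "__main__":
    for arg, is_file in [
        ("home/abc.json", True),
        ("file:home/abc.json", True),
        ("dir:config:myapp", False),
        ("abc/config:myapp", False),
    ]:
        print(arg, "->", classify_specimen_arg(arg, is_file))
```

The sketch mirrors the documented cases: only a prefix at the very start of the argument is interpreted, so `abc/config:myapp` is taken as an ordinary path, while `dir:config:myapp` names a directory called `config:myapp`.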