removed aiohttp from bd_data
matthewb66 committed Jun 3, 2024
1 parent ad94c3d commit b66fef0
Showing 4 changed files with 176 additions and 24 deletions.
165 changes: 150 additions & 15 deletions README.md
@@ -1,35 +1,170 @@
# bd_sig_filter
BD Script to ignore components matched from Signature scan likely to be partial or invalid matches.

# INTRODUCTION
# PROVISION OF THIS SCRIPT
This script is provided under the MIT OSS license (see LICENSE file).
It does not represent any extension of licensed functionality of Synopsys software itself and is provided as-is, without warranty or liability.
If you have comments or issues, please raise a GitHub issue here. Synopsys support is not able to respond to support tickets for this OSS utility. Users of this pilot project commit to engage properly with the authors to address any identified issues.

## INTRODUCTION
Black Duck Signature matching is a unique and powerful way to find OSS and 3rd party code within your applications and
environments.

Signature matching uses hierarchical folder analysis to find matches with depth, identifying the strongest match for components.
Most competitive SCA solutions use individual file matching which is not effective to identify component matches
because the majority of files in a component do not change between versions, so multiple matches will be identified for every file.
Signature matching uses hierarchical folder analysis to find matches with depth, identifying the most likely components matching the project.
Many competitive SCA solutions use individual file matching across all files in the project, which is not effective
for identifying component matches: the majority of files in a component do not change between versions,
so multiple matches will be identified for every file.

However, Signature matching can produce false positive matches, especially where template code hierarchies is repeated across
multiple components.
However, Signature matching can still produce false positive matches, especially where template code hierarchies
exist in custom and OSS code.

Furthermore, Signatures matches can be identified in folders containing Synopsys tools, or in cache and configuration
locations or test folders which can be ignored at scan time, but may exist in the Black Duck project. Additionally, when scanning
modified OSS, Signature scanning can result in more than 1 match for a component leading to the requirement
to curate the BOM to ignore components.
Furthermore, Signature matches can be identified in folders created by Synopsys tools, or in cache/config
locations or test folders; these folders can be ignored at scan time, but can exist in the Black Duck project and need to
be removed after scan completion. Additionally, when scanning
modified OSS, Signature scanning can identify the same component with multiple versions from a single project
location, with the need to curate the BOM to ignore duplicate components.

This script uses several techniques to examine the Signature match paths for components, looking for the component
This script uses several techniques to examine the Signature match paths for components, searching for the component
name and version in the path to determine matches which are likely correct and optionally marking them as reviewed.

It can ignore components only matched from paths which should be excluded (Synopsys tools, cache/config folders
It can also ignore components only matched from paths which should be excluded (Synopsys tools, cache/config folders
and test folders), and components which are duplicates across versions where the version string is not found
in the signature match path.

Options can be used to enable ignore and review actions, and other features.

# INSTALLATION
## PREREQUISITES

Python 3.8+ must be installed prior to using this script.

## INSTALLATION

The package can be installed using the command:

python3 -m pip install bd-sig-filter

Upgrade from a previous version using:

python3 -m pip install bd-sig-filter --upgrade

Alternatively, the repository can be cloned and the script run directly using the command:

python3 bd_sig_filter/bd_sig_filter.py OPTIONS

## USAGE

If installed as a package, run the utility using the command `bd-sig-filter`.

Alternatively if you have cloned the repo, use a command similar to:

python3 bd_sig_filter/bd_sig_filter.py OPTIONS

The package can be invoked as follows:

usage: bd_sig_filter [-h] [--blackduck_url BLACKDUCK_URL] [--blackduck_api_token BLACKDUCK_API_TOKEN] [--blackduck_trust_cert] [-p PROJECT] [-v VERSION] [--debug] [--logfile LOGFILE]
                     [--report_file REPORT_FILE] [--version_match_reqd] [--ignore] [--review] [--no_ignore_test] [--no_ignore_synopsys] [--no_ignore_defaults]
                     [--ignore_no_path_matches]

options:
  -h, --help            show this help message and exit
  --blackduck_url BLACKDUCK_URL
                        Black Duck server URL (REQUIRED)
  --blackduck_api_token BLACKDUCK_API_TOKEN
                        Black Duck API token (REQUIRED)
  --blackduck_trust_cert
                        Black Duck trust server cert
  -p PROJECT, --project PROJECT
                        Black Duck project to create (REQUIRED)
  -v VERSION, --version VERSION
                        Black Duck project version to create (REQUIRED)
  --debug               Debug logging mode
  --logfile LOGFILE     Logging output file
  --report_file REPORT_FILE
                        Report output file
  --version_match_reqd  Component matches require version string in path
  --ignore              Ignore components in synopsys, default or test folders and duplicates with wrong version
  --review              Mark components reviewed
  --no_ignore_test      Do not ignore components in test folders
  --no_ignore_synopsys  Do not ignore components in synopsys tool folders
  --no_ignore_defaults  Do not ignore components in default folders
  --ignore_no_path_matches
                        Also ignore components with no component/version match in signature path
                        (Use with caution)

The minimum required options are:

--blackduck_url https://BLACKDUCK_SERVER_URL
--blackduck_api_token BLACKDUCK_API_TOKEN
--project PROJECT
--version VERSION

Environment variables BLACKDUCK_URL, BLACKDUCK_API_TOKEN and BLACKDUCK_TRUST_CERT may also be used.
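
As an illustration of how command-line options and environment variables can be combined, the following is a minimal sketch and not the script's actual argument handling. The helper names `build_parser` and `missing_required` are hypothetical, and treating any non-empty `BLACKDUCK_TRUST_CERT` value as "trust" is an assumption.

```python
import argparse
import os


def build_parser(env=None):
    """Build a parser whose defaults fall back to environment variables."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser(prog="bd_sig_filter")
    parser.add_argument("--blackduck_url", default=env.get("BLACKDUCK_URL", ""))
    parser.add_argument("--blackduck_api_token",
                        default=env.get("BLACKDUCK_API_TOKEN", ""))
    # Assumption: any non-empty BLACKDUCK_TRUST_CERT value enables the flag.
    parser.add_argument("--blackduck_trust_cert", action="store_true",
                        default=bool(env.get("BLACKDUCK_TRUST_CERT")))
    parser.add_argument("-p", "--project", default="")
    parser.add_argument("-v", "--version", default="")
    return parser


def missing_required(args):
    """Return the names of required options that are still empty."""
    required = ["blackduck_url", "blackduck_api_token", "project", "version"]
    return [name for name in required if not getattr(args, name)]


# The URL comes from the (simulated) environment, the rest from the command line.
args = build_parser({"BLACKDUCK_URL": "https://bd.example.com"}).parse_args(
    ["--blackduck_api_token", "TOKEN", "-p", "myproj", "-v", "1.0"])
```

With this arrangement an explicit command-line value always overrides the environment, and `missing_required(args)` can be used to report which required settings were supplied by neither source.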

## SCRIPT BEHAVIOUR
The default behaviour of the script is to create a table of BOM components with details about what actions can be taken.
By default no actions will be taken, with only the table being created.

An example of the output table is shown below:

SUMMARY:
          Components    Ignored    Reviewed    Neither
------  ------------  ---------  ----------  ---------
Before           641          0           0        641
After            641         24         615          2

Component                             Match Type    Ignored    Reviewed    To be Ignored    To be Reviewed    Action
------------------------------------ ------------ --------- ---------- --------------- ---------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
aggs-matrix-stats/1.3.14              Dep+Sig       False      False       False            True              Mark REVIEWED - Dependency
aggs-matrix-stats/2.11.1              Sig           False      False       False            True              Mark REVIEWED - Compname & version in path '/Plugins/ActOnePluginInstaller/image/actone-plugins-installer 10.0.0.67/RCM_Plugins/actOne-opensearch-2.x-connector/lib/aggs-matrix-stats-client-2.11.1.jar', Match result 200
aircompressor/0.10                    Dep+Sig       False      False       False            True              Mark REVIEWED - Dependency
Amazon MSK Library for AW/2.0.2       Dep+Sig       False      False       False            True              Mark REVIEWED - Dependency
Apache HttpComponents Cor/5.2.4       Sig           False      False       True             False             Mark IGNORED - compname or version not found in paths & --ignore_no_path_matches set
WSDL4J/1.5.1                          Sig           False      False       False            False             No Action

Note that component names are truncated to 25 characters.

The `Before` and `After` rows in the SUMMARY list the total number of components, and how many components would be ignored or
marked reviewed by the script (if the `--ignore` and `--review` options are supplied).

The list of components shows the name, match types and current ignore/review statuses, with the future status
(after running the script with the `--ignore` and `--review` options) in the `To be Ignored` and `To be Reviewed`
columns and an explanation in the `Action` column.

The following options can be specified:

--ignore: Ignore components as shown in the `To be Ignored` column
--review: Mark components as reviewed as shown in the `To be Reviewed` column
--no_ignore_test: Do not ignore components with signature paths within test folders
--no_ignore_synopsys: Do not ignore components with signature paths within Synopsys tools folders (for example '.synopsys')
--no_ignore_defaults: Do not ignore components with signature paths in cache/config folders (for example '.git', '.m2', '.local')
--version_match_reqd:
Enforce search for component version string in signature paths for marking reviewed
(Paths containing only the component name will be used for matching otherwise)
--ignore_no_path_matches:
Components with no match in the signature path are left unreviewed by default, allowing
manual review. Use this option to ignore these components instead but use with caution
as it may exclude components which are legitimate (the Signature match path does not
have to include the component name or version).

The options --report_file and --logfile can be used to output the tabular report and logging data to
specified files.
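
The folder-based ignore options above can be sketched as follows. This is a simplified illustration and not the script's implementation: the function names and the exact folder sets are assumptions, with only the test-folder pattern and the example folder names ('.synopsys', '.git', '.m2', '.local') taken from this project.

```python
import re

# Illustrative folder lists; the real script may check more names than these.
TEST_FOLDERS = re.compile(r"^test$|^tests$|^testsuite$", re.IGNORECASE)
SYNOPSYS_FOLDERS = {".synopsys"}
DEFAULT_FOLDERS = {".git", ".m2", ".local"}


def should_ignore_path(path, no_ignore_test=False, no_ignore_synopsys=False,
                       no_ignore_defaults=False):
    """Return (ignore, reason) for one Signature match path."""
    for element in filter(None, path.split("/")):
        hit = ((not no_ignore_synopsys and element in SYNOPSYS_FOLDERS)
               or (not no_ignore_defaults and element in DEFAULT_FOLDERS)
               or (not no_ignore_test and TEST_FOLDERS.search(element) is not None))
        if hit:
            return True, f"Found '{element}' in Signature match path '{path}'"
    return False, ""


def component_ignorable(paths, **options):
    """A component is ignorable only if every one of its paths is excluded."""
    return bool(paths) and all(should_ignore_path(p, **options)[0] for p in paths)
```

Matching whole path elements (rather than substrings) keeps a folder such as `testing` or `latest` from being treated as a test folder, and requiring every path to be excluded reflects the rule that only components matched solely within such folders are ignored.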

## PROPOSED WORKFLOW

The script provides automatic classification of Signature scan results.

# USAGE
It can mark as reviewed those components which are either Dependencies, or which have signature match paths containing
the component name (and optionally the component version) and are therefore highly likely to be correctly identified
by Signature matching. Fuzzy pattern matching is used, so components could be marked as reviewed where only a partial
match exists, and components which should be matched may not be identified; manual curation may therefore
still be required.

# WORKFLOW
It will also ignore components only matched within extraneous folders (for example created by Synopsys tools,
config/cache folders or test folders).

Components shown with `No Action` are Signature matches where the component name or version
could not be identified in the signature paths, so they are potential false matches and require manual review.
Specify the `--ignore_no_path_matches` option to ignore these components automatically.
Duplicate components with multiple versions, where the version
is not found in the signature match path, are also marked as ignored.
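
The decisions described in this workflow can be summarised in a small sketch. The function `classify` and its parameters are hypothetical, the order of the checks is an assumption, and duplicate-version handling is left out for brevity; the inputs stand for the results of the path analysis rather than performing it.

```python
def classify(match_types, name_in_path, version_in_path, only_excluded_paths,
             version_match_reqd=False, ignore_no_path_matches=False):
    """Return 'REVIEW', 'IGNORE' or 'NO ACTION' for one BOM component."""
    if "Dep" in match_types:
        # Dependency matches (e.g. 'Dep+Sig') are treated as reliable.
        return "REVIEW"
    if only_excluded_paths:
        # Matched only within Synopsys tool, cache/config or test folders.
        return "IGNORE"
    if name_in_path and (version_in_path or not version_match_reqd):
        return "REVIEW"
    if ignore_no_path_matches:
        # Use with caution: legitimate components may be excluded.
        return "IGNORE"
    return "NO ACTION"
```

For example, `classify("Sig", True, False, False, version_match_reqd=True)` leaves the component for manual review because the version string was required but not found in the path.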

18 changes: 14 additions & 4 deletions bd_sig_filter/ComponentClass.py
@@ -18,7 +18,7 @@ def __init__(self, name, version, data):
self.sig_match_result = -1
self.compname_found = False
self.compver_found = False
self.reason = 'No Action'
self.reason = 'No Action - compname or version not found in Sig paths'
self.best_sigpath = ''

def get_compverid(self):
@@ -140,9 +140,19 @@ def filter_name_string(name):
# - for, with, in, on,
# Remove strings in brackets
# Replace / with space
ret_name = re.sub(r" for | with | in | on | a |apache ", r" ", name, flags=re.IGNORECASE)
ret_name = re.sub(r"\(.*\)", r"", ret_name)
ret_name = re.sub(r"/", r" ", ret_name)
ret_name = re.sub(r"\(.*\)", r"", name)
for rep in [r" for ", r" with ", r" in ", r" on ", r" a ", r" the ", r" by ",
r" and ", r"^apache | apache | apache$", r" bundle ", r" only | only$", r" from ",
r" to ", r" - "]:
ret_name = re.sub(rep, " ", ret_name, flags=re.IGNORECASE)
ret_name = re.sub(r"[/@#:]", " ", ret_name)
ret_name = re.sub(r" \w$| \w |^\w ", r" ", ret_name)
ret_name = ret_name.replace("::", " ")
ret_name = re.sub(r" +", r" ", ret_name)
ret_name = re.sub(r"^ ", r"", ret_name)
ret_name = re.sub(r" $", r"", ret_name)

logging.debug(f"filter_name_string(): Compname '{name}' replaced with '{ret_name}'")
return ret_name

@staticmethod
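
To see the effect of the revised name filtering, the hunk above can be exercised as a standalone function. This is a sketch based on the added lines rather than the exact class method: logging is omitted, the whitespace collapse uses `" +"`, leading and trailing spaces are removed with `strip()`, and the sample component names are illustrative inputs.

```python
import re

# Noise words and separators removed from component names before matching.
NOISE = [r" for ", r" with ", r" in ", r" on ", r" a ", r" the ", r" by ",
         r" and ", r"^apache | apache | apache$", r" bundle ", r" only | only$",
         r" from ", r" to ", r" - "]


def filter_name_string(name):
    """Reduce a component name to the tokens worth searching for in a path."""
    ret_name = re.sub(r"\(.*\)", "", name)            # drop bracketed text
    for rep in NOISE:
        ret_name = re.sub(rep, " ", ret_name, flags=re.IGNORECASE)
    ret_name = re.sub(r"[/@#:]", " ", ret_name)       # also covers '::'
    ret_name = re.sub(r" \w$| \w |^\w ", " ", ret_name)  # single-character words
    ret_name = re.sub(r" +", " ", ret_name)           # collapse repeated spaces
    return ret_name.strip()


print(filter_name_string("Apache Commons IO (legacy)"))  # Commons IO
print(filter_name_string("@angular/core"))               # angular core
```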
15 changes: 11 additions & 4 deletions bd_sig_filter/SigEntryClass.py
@@ -17,6 +17,7 @@ def __init__(self, src_entry):
return

def search_component(self, compname, compver):
logging.debug("")
logging.debug(f"search_component() Checking Comp '{compname}/{compver}' - {self.path}:")
# If component_version_reqd:
# - folder matches compname and compver
@@ -33,8 +34,10 @@ def search_component(self, compname, compver):
compver_in_element = 0

# test of path search
comp_in_path = fuzz.token_set_ratio(compstring, self.path)
logging.debug(f"search_component(): comp_in_path is {comp_in_path}: path='{self.path}")
newpath = self.path.replace(os.sep, " ")
newpath = re.sub(r"([a-zA-Z-]*)[0-9] ", r"\1 ", newpath)
comp_in_path = fuzz.token_set_ratio(compstring, newpath)
logging.debug(f"search_component(): TEST comp_in_path is {comp_in_path}: path='{self.path}")

found_compname_only = False
for element in self.elements:
@@ -62,6 +65,8 @@ def search_component(self, compname, compver):
if compver_in_element > 50:
logging.debug(f"search_component() - MATCHED component version ({compver}) in '{element}'")
return True, True, element_in_compname + compver_in_element
else:
test = 1

if found_compname_only:
logging.debug("search_component() - MATCHED Compname only")
@@ -89,9 +94,11 @@ def filter_folders(self):
return True, f"Found '{e}' in Signature match path '{self.path}'"

if not global_values.no_ignore_test:
test_folders = ['test', 'tests']
test_folders = r"^test$|^tests$|^testsuite$"
for e in self.elements:
if e in test_folders:
if re.search(test_folders, e, flags=re.IGNORECASE) is not None:
return True, f"Found '{e}' in Signature match path '{self.path}'"
# if e in test_folders:
# return True, f"Found '{e}' in Signature match path '{self.path}'"

return False, ''
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "bd_sig_filter"
version = "1.1"
version = "1.2"
authors = [
{ name="Matthew Brady", email="[email protected]" },
]
