wip: triagehistory: Try some optimisation
Signed-off-by: Mathieu Dubois-Briand <[email protected]>
mbriand committed Nov 7, 2024
1 parent f11cfd2 commit c2ca949
Showing 1 changed file with 12 additions and 2 deletions.
14 changes: 12 additions & 2 deletions swattool/triagehistory.py
@@ -63,8 +63,18 @@ def get_similarity_score(self, log_fingerprint: Collection[str]) -> float:
                             flags=re.IGNORECASE | re.MULTILINE)
 
         # Compute scores for all fingerprint fragment combinations
-        scores = [[jellyfish.jaro_similarity(f1, f2) for f2 in log_fingerprint]
-                  for f1 in self.log_fingerprint]
+        # Only consider combinations with similar positions in the files:
+        # reduce both false positives and computation time.
+        scores = [[0 for f2 in log_fingerprint] for f1 in self.log_fingerprint]
+        lendiff = len(self.log_fingerprint) - len(log_fingerprint)
+        for i, f1 in enumerate(self.log_fingerprint):
+            for j, f2 in enumerate(log_fingerprint):
+                maxdist = 2
+                startdist = i - j
+                enddist = lendiff - startdist
+                if min(abs(startdist), abs(enddist)) > maxdist:
+                    continue
+                scores[i][j] = jellyfish.jaro_similarity(f1, f2)
 
         # Compute the final score as 2 half-scores: fingerprint A to B, then B
         # to A, so the similarity score is commutative.
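
The position filter introduced by this change can be tried in isolation. The sketch below is not the project's code: `windowed_scores` is a hypothetical helper name, `maxdist` is hoisted into a parameter, and `difflib.SequenceMatcher` stands in for `jellyfish.jaro_similarity` so that the example runs with the standard library only. A pair `(i, j)` is scored only when the two fragments sit at most `maxdist` positions apart, measured either from the start of both fingerprints (`i - j`) or from their end (`lendiff - (i - j)`).

```python
from difflib import SequenceMatcher
from typing import Callable, List, Sequence


def _ratio(a: str, b: str) -> float:
    # Stand-in similarity (assumption): the commit uses jellyfish.jaro_similarity.
    return SequenceMatcher(None, a, b).ratio()


def windowed_scores(
    ours: Sequence[str],
    theirs: Sequence[str],
    maxdist: int = 2,
    similarity: Callable[[str, str], float] = _ratio,
) -> List[List[float]]:
    """Score only fragment pairs that sit at similar positions.

    Pairs outside the window keep a score of 0.0 and are never compared.
    """
    lendiff = len(ours) - len(theirs)
    scores = [[0.0] * len(theirs) for _ in ours]
    for i, f1 in enumerate(ours):
        for j, f2 in enumerate(theirs):
            startdist = i - j              # offset counted from the start
            enddist = lendiff - startdist  # same offset counted from the end
            if min(abs(startdist), abs(enddist)) > maxdist:
                continue
            scores[i][j] = similarity(f1, f2)
    return scores


if __name__ == "__main__":
    a = ["error: foo", "warning: bar", "error: baz", "note: qux"]
    b = ["error: foo", "error: baz", "note: qux"]
    for row in windowed_scores(a, b):
        print(["%.2f" % s for s in row])
```

For two fingerprints of equal length the window reduces to `abs(i - j) <= maxdist`, so each row holds at most `2 * maxdist + 1` computed scores instead of one per fragment of the other fingerprint.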
