Skip to content
This repository has been archived by the owner on Mar 8, 2024. It is now read-only.

Finding Frequent (Sub)sequences

Casper Boone edited this page May 10, 2018 · 3 revisions

For finding frequent (sub)sequences the PrefixSpan algorithm by Pei et al. (2004) is used1. The algorithm uses the 'divide and conquer' principle, and is partially inspired by the FP-tree structure used to mine sets of unordered items.

During each iteration, a new prefix is generated based on the current prefix and nodes observed to be frequent in the set of paths. If this prefix is observed in the set of paths it is added to the set of frequent sequences. After this, all suffixes of sequences which start with this prefix are then mined recursively, with the suffix of a sequence being everything that follows said prefix. This improves upon the a priori method by foregoing the need to generate candidate sequences during each iteration. A pseudocode representation is given below.

Procedure PrefixSpace(all_paths, minimum_support)
  frequent_items = all nodes in all_paths with occurrence >= minimum_support
  return PrefixSpace([], frequent_items, all_paths, [])
EndProcedure

SubProcedure PrefixSpace(prefix, frequent_items, projected_paths, frequent_sequences)
  for all item in frequent_items
    if (prefix + item) in a path in projected_paths or
      (prefix.last + item) in a path in projected_paths

      new_prefix <- prefix + item
      frequent_sequences <- frequent_sequences U new_prefix
      new_projected_paths <- empty_set

      for all sequence in projected_paths
        if item starts with prefix
          new_projected_paths <- new_projected_paths U (item - prefix)
        end if
      end for

      PrefixSpace(new_prefix, frequent_items, new_projected_paths, frequent_sequences)
    end if
  end for

  return frequent_sequences
SubEndProcedure

[1] Pei, J., Han, J., Mortazavi-Asl, B., Wang, J., Pinto, H., Chen, Q., ... & Hsu, M. C. (2004). Mining sequential patterns by pattern-growth: The prefixspan approach. IEEE Transactions on knowledge and data engineering, 16(11), 1424-1440.

Clone this wiki locally