This repository contains source code implementing algorithms and data structures that I needed to know while studying algorithms. I wrote the code myself, referencing the original paper that proposed each algorithm or papers that cite the original.
- Contributors: unsik6, StoneIron02
| Category | Subcategory | Problem | Name |
|---|---|---|---|
| Algorithms | String Algorithms | String Matching | Rabin-Karp algorithm |
| | | | Knuth-Morris-Pratt algorithm (KMP) |
| | | | Boyer-Moore algorithm (BM) |
| | | | Boyer-Moore-Horspool algorithm (BMH) |
| | | | Aho-Corasick Automata |
| | | 2D String Matching | BAKER-BIRD Algorithm |
| | | String Distance | (Weighted) Edit Distance algorithm |
| | | | the Longest Common Subsequence (LCS) |
| | | | Y. Sakai's Maximal Common Subsequence (MCS) |
| | | | DY Lee & JC Na's MCS |
| | | | HJ Shin & JS Sim's MCS |
| | Graph Algorithms | MST | Prim's Algorithm |
| | | | Kruskal's Algorithm |
| | | Shortest Path | Dijkstra's Algorithm |
| | | | Bellman-Ford's Algorithm |
| | | | Floyd-Warshall's Algorithm |
| Data Structures | Hierarchical Data Structures | Tree | General Tree |
| | | Priority Queue | Heap |
| | | Binary Search Tree | Red-Black Tree |
| | Set | | DisjointSet |
| | Graphs | Graph Representations | Adjacency List representation |
- The string matching problem is the problem of finding one or more patterns in one or more given texts.
### Rabin-Karp algorithm
- Contributor: StoneIron02
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
The Rabin-Karp algorithm is one of the pattern matching algorithms finding all locations of a pattern in a given text. It treats each substring of the text having the length of the pattern, and the pattern itself, as numbers, and compares those numbers modulo a prime q; only when the values agree are the actual strings compared.
- Time complexity:
Let m be the length of the pattern and n the length of the text. It takes O(m) time to preprocess the pattern and the first substring of the text. Computing the number of each following substring from the previous one takes O(1) time, so O(n-m) time in total. Whenever the number of the pattern and the number of a substring agree, we must compare the original strings to confirm a valid shift; that takes O(m) per valid shift or spurious hit, i.e. O(m) times the number of valid shifts and spurious hits. So the worst case is$$O((n-m+1)m)$$ and the expected time, if the number of valid shifts is O(1) and q is larger than m, is$$O(n)$$
- Space complexity:
Equal to the sum of the lengths of the pattern and the text.$$O(n+m)$$
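A minimal sketch of the idea described above, not the repository's implementation; the constants BASE and Q and the function name rabinKarp are illustrative choices.

```cpp
#include <string>
#include <vector>

// Rabin-Karp sketch: a rolling hash modulo a prime Q, with explicit string
// comparison on every hash hit to rule out spurious hits.
std::vector<int> rabinKarp(const std::string& text, const std::string& pat) {
    const long long BASE = 256, Q = 1000000007LL;   // illustrative choices
    int n = text.size(), m = pat.size();
    std::vector<int> shifts;
    if (m == 0 || n < m) return shifts;

    long long h = 1;                                // BASE^(m-1) mod Q
    for (int i = 0; i < m - 1; ++i) h = h * BASE % Q;

    long long hp = 0, ht = 0;                       // pattern / window hashes
    for (int i = 0; i < m; ++i) {
        hp = (hp * BASE + pat[i]) % Q;
        ht = (ht * BASE + text[i]) % Q;
    }
    for (int s = 0; s + m <= n; ++s) {
        if (hp == ht && text.compare(s, m, pat) == 0)   // confirm a valid shift
            shifts.push_back(s);
        if (s + m < n)                              // roll the window one step
            ht = ((ht - text[s] * h % Q + Q) * BASE + text[s + m]) % Q;
    }
    return shifts;
}
```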
### Knuth-Morris-Pratt algorithm (KMP)
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
The Knuth-Morris-Pratt (KMP) algorithm is one of the most famous string matching algorithms and is referenced by many later algorithms and papers. It finds all occurrences of a given pattern string in a given text string. Its main property is captured by the failure function (array): when a mismatch occurs while comparing a character of the text with a character of the pattern, we can skip as many shifts as allowed by the length of the longest prefix of the pattern that is also a suffix of the part of the pattern already matched with a substring of the text.
- Time complexity:
There is one preprocessing phase that constructs the failure array and one text-scanning phase. The failure array is built from the pattern by dynamic programming in time linear in the length of the pattern, and the text scan, which works in a similar way, runs in time linear in the length of the text. Across the two phases, each character of the pattern (or the text) is scanned only twice. So, where n is the length of the text and m is the length of the pattern, the total time complexity is$$O(n+m)$$
- Space complexity:
We only need the space for the failure array. So,$$O(m)$$
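A minimal sketch of the failure array and the text scan described above; the function names buildFailure and kmpSearch are illustrative, not the repository's API.

```cpp
#include <string>
#include <vector>

// fail[i] = length of the longest proper prefix of pat that is also a
// suffix of pat[0..i].
std::vector<int> buildFailure(const std::string& pat) {
    int m = pat.size();
    std::vector<int> fail(m, 0);
    for (int i = 1, k = 0; i < m; ++i) {
        while (k > 0 && pat[i] != pat[k]) k = fail[k - 1];  // fall back
        if (pat[i] == pat[k]) ++k;
        fail[i] = k;
    }
    return fail;
}

// Scan the text once, reusing the failure array to skip invalid shifts.
std::vector<int> kmpSearch(const std::string& text, const std::string& pat) {
    std::vector<int> occ;
    if (pat.empty()) return occ;
    std::vector<int> fail = buildFailure(pat);
    for (int i = 0, k = 0; i < (int)text.size(); ++i) {
        while (k > 0 && text[i] != pat[k]) k = fail[k - 1];
        if (text[i] == pat[k]) ++k;
        if (k == (int)pat.size()) {                  // a full match ends at i
            occ.push_back(i - k + 1);
            k = fail[k - 1];
        }
    }
    return occ;
}
```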
### Boyer-Moore algorithm (BM)
- Contributor: unsik6
- Reference: Robert S. Boyer and J Strother Moore, "A Fast String Searching Algorithm", Communications of the ACM, 20(10), 1977, 762-772
- Language used to implement: C++
- Abstract:
The Boyer-Moore (BM) algorithm is one of the most famous practical string matching algorithms. It is not only referenced by many algorithms and papers, but also often applied in programs that use string algorithms. It finds all occurrences of a given pattern string in a given text string. The special idea of the BM algorithm is scanning the text backward. Two heuristics use the information exposed by backward scanning: the Bad Character (BC) heuristic and the Good Suffix (GS) heuristic. (Actually, these heuristics were called delta1 and delta2 in the paper by Boyer and Moore; the names came later.)
The BC heuristic uses the text character that mismatched a character of the pattern. When a mismatch occurs, the nearest possibly valid shift is the one that aligns the mismatched character of the text with its rightmost occurrence in the pattern; all shifts before it are already known to be invalid, so we skip them.
But the BC heuristic is not enough. If the rightmost occurrence in the pattern of the mismatched text character lies to the right of the position where the mismatch occurred, the BC heuristic would move the pattern to the left. In that case we use the GS heuristic, which exploits the suffix of the pattern already matched with a substring of the text: align the matched substring of the text with the rightmost reoccurrence of the matched suffix inside the pattern. Since we already know the mismatched character, we only align with a reoccurrence whose preceding character differs from the character preceding the matched suffix of the pattern. Moreover, if only a suffix of the matched suffix reoccurs as a prefix of the pattern, that prefix is aligned with the end of the matched substring of the text.
So, when a mismatch occurs, the pointer to the next character to be scanned in the text is moved by the maximum of the distance given by the BC heuristic and the one given by the GS heuristic, and the pointer into the pattern is reset to the last position of the pattern.
- Time complexity:
Actually, the time complexity of this algorithm is O(nm) in the worst case, where n is the length of the text and m is the length of the pattern; but Boyer and Moore showed experimentally that their algorithm runs in linear time in practice.$$O(nm)$$ $$O(n + m)\ in\ average\ case$$
- Space complexity:
We need the BC array and the GS array. Since the BC array stores the rightmost occurrence in the pattern of each character, it has as many elements as the alphabet has characters. The GS array stores, for every position of the pattern, the distance the text pointer moves when a mismatch occurs there. So,$$O(|\Sigma| + m)$$
### Boyer-Moore-Horspool algorithm (BMH)
- Contributor: unsik6
- Reference: R. Nigel Horspool, "Practical Fast Searching in Strings", Software: Practice and Experience, 10 (1980), 501-506
- Language used to implement: C++
- Abstract:
The Boyer-Moore-Horspool (BMH) algorithm is a variant of the Boyer-Moore algorithm. Its main idea is to drop the Good Suffix heuristic of the BM algorithm: the cases in which the GS heuristic applies are very rare in real data, while the preprocessing time to construct the GS array is considerable. Since it uses only the Bad Character heuristic, this algorithm is more efficient for strings over a large alphabet, such as English text: the larger the alphabet of the input strings, the more often the BC heuristic yields a long shift.
- Time complexity:
The time complexity of this algorithm equals that of the BM algorithm, but the actual running time depends on the input strings. A paper by M. Crochemore and T. Lecroq reports experiments comparing the running times of the original BM algorithm and its variants.$$O(nm)$$ $$O(n + m)\ in\ average\ case$$
- Space complexity:
We need only the BC array. So,$$O(|\Sigma|)$$
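A minimal sketch of BMH as described above, assuming 8-bit characters; the function name bmhSearch is illustrative.

```cpp
#include <array>
#include <string>
#include <vector>

// BMH sketch: only the bad-character shift table is precomputed, and each
// window of the text is compared with the pattern backward.
std::vector<int> bmhSearch(const std::string& text, const std::string& pat) {
    int n = text.size(), m = pat.size();
    std::vector<int> occ;
    if (m == 0 || n < m) return occ;

    std::array<int, 256> shift;
    shift.fill(m);                         // characters absent from the pattern
    for (int i = 0; i < m - 1; ++i)        // last pattern character excluded
        shift[(unsigned char)pat[i]] = m - 1 - i;

    for (int s = 0; s + m <= n; ) {
        int j = m - 1;                     // scan the window backward
        while (j >= 0 && text[s + j] == pat[j]) --j;
        if (j < 0) occ.push_back(s);
        s += shift[(unsigned char)text[s + m - 1]];  // shift by the window's last char
    }
    return occ;
}
```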
### Aho-Corasick Automata
- Contributor: unsik6
- Reference: Alfred V. Aho and Margaret J. Corasick, "Efficient String Matching: An Aid to Bibliographic Search", Communications of the ACM, 18(6), 1975, 333-340
- Language used to implement: C++
- Abstract:
The Aho-Corasick automaton is a deterministic automaton that finds all locations of multiple patterns in a given text. Preprocessing the patterns constructs an automaton shaped like a trie; then the given text, the string in which we want to find the patterns, is fed through the automaton.
- Time complexity: (implementation: array)
Preprocessing all patterns to construct the automaton, including the trie, the failure function, and the output function, takes time linear in the total length of the patterns. Searching the patterns in a given text takes time linear in the length of the text. If the total length of all patterns is m, the alphabet of the text is of constant size, and the length of the text is n, then the time complexity of this algorithm is$$O(m + n)$$
- Space complexity: (implementation: array)
The number of nodes and edges of the automaton is at most the total length of the patterns, and the failure function can be stored as a linear array. However, if each pattern consists of a single repeated character and the lengths are pairwise distinct, the output function returns all patterns shorter than the pattern just found, so in the worst case$$O(m^2)$$
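A compact sketch of the automaton described above, assuming map-based nodes rather than the array implementation discussed in the complexity notes; all names are illustrative.

```cpp
#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct AhoCorasick {
    struct Node {
        std::map<char, int> next;      // trie edges
        int fail = 0;                  // failure link (node index)
        std::vector<int> out;          // ids of patterns ending here
    };
    std::vector<Node> t{1};            // node 0 is the root

    void add(const std::string& p, int id) {
        int v = 0;
        for (char c : p) {
            auto it = t[v].next.find(c);
            if (it == t[v].next.end()) {
                t.push_back(Node{});
                it = t[v].next.emplace(c, (int)t.size() - 1).first;
            }
            v = it->second;
        }
        t[v].out.push_back(id);
    }

    void build() {                     // BFS to fill failure links
        std::queue<int> q;
        for (auto [c, u] : t[0].next) q.push(u);   // root children keep fail = 0
        while (!q.empty()) {
            int v = q.front(); q.pop();
            for (auto [c, u] : t[v].next) {
                int f = t[v].fail;
                while (f && !t[f].next.count(c)) f = t[f].fail;
                auto it = t[f].next.find(c);
                t[u].fail = (it != t[f].next.end() && it->second != u) ? it->second : 0;
                // inherit outputs reachable through the failure link
                for (int id : t[t[u].fail].out) t[u].out.push_back(id);
                q.push(u);
            }
        }
    }

    // report (pattern id, end position) for every occurrence in the text
    void search(const std::string& text) const {
        int v = 0;
        for (int i = 0; i < (int)text.size(); ++i) {
            while (v && !t[v].next.count(text[i])) v = t[v].fail;
            auto it = t[v].next.find(text[i]);
            v = (it != t[v].next.end()) ? it->second : 0;
            for (int id : t[v].out)
                std::cout << "pattern " << id << " ends at " << i << '\n';
        }
    }
};
```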
### BAKER-BIRD Algorithm
- Contributor: unsik6
- Reference: R. S. Bird, "Two Dimensional Pattern Matching", Information Processing Letters (1977), 168-170; Theodore P. Baker, "A Technique for Extending Rapid Exact-Match String Matching to Arrays of More Than One Dimension", SIAM J. Comput. (1978), 533-541
- Language used to implement: C++
- Abstract:
R. S. Bird's two-dimensional pattern matching algorithm uses the Aho-Corasick automaton and the KMP pattern matching algorithm. The rows of the pattern (m x m) are treated as multiple patterns, so applying the Aho-Corasick automaton to row matching finds every pattern row occurring at each position of the text (n x n). In two-dimensional matching, the pattern is found wherever all of its rows occur in order, and that column-wise computation can be done with the KMP algorithm.
Baker also published a paper introducing two-dimensional pattern matching. He used only the M-P algorithm and its automaton (M-P equals KMP): he extended the M-P automaton for row matching, which is almost the same as the Aho-Corasick automaton, and used the M-P automaton again for column matching. So, generally, Baker's algorithm and Bird's algorithm are combined and called the 'Baker-Bird algorithm'.
- Time complexity: (implementation: array)
The preprocessing that constructs an AC automaton from all rows of the pattern takes O(m^2) time. The original paper introduces a second preprocessing step, labeling all rows of the pattern, which needs the same time; but it can run in O(m) time using the AC automaton already constructed, or even be folded into the construction of the automaton's output function. The last preprocessing step builds a KMP failure function over the sequence of row labels, which has length m, so it runs in O(m) time.
The search that finds all locations of the pattern in the text runs in O(n^2) time: there are n columns, and the KMP scan of each column takes O(n) time. Row matching also needs O(n^2) time, since pattern matching with an AC automaton is linear. So,$$O(m^2 + m + m + n^2 + n^2)$$ $$Total\ running\ time=O(n^2+m^2)$$
- Space complexity: (implementation: array)
There is the AC automaton, which has at most m^2 nodes; its failure function, with the same number of elements because it is implemented as an array; its output function, with m^2 elements; and the KMP failure function used in column matching, with m elements. We also need an array of n elements that records, for each column, how many rows of the pattern have matched so far before running the KMP algorithm in that column. So the space complexity of this algorithm is$$O(n+m^2)$$
- String distance means how different two strings are: the shorter the distance between two strings, the more similar they are.
- There are many measures of string distance, since the difference between strings can be checked in many ways: 'how many operations are needed to make a text from a pattern', 'how long a common substring of two strings is', and so on. Some of them are named '~ distance', but some do not look like distances, such as the LCS (the Longest Common Subsequence).
### (Weighted) Edit Distance algorithm
- Contributor: unsik6
- Reference: Venki et al., "Edit Distance | DP-5", GeeksforGeeks
- Language used to implement: C++
- Abstract:
'Edit distance' means how many operations are needed to transform a pattern into a text. The operations are also given as input; commonly INSERT, DELETE, and REPLACE are used. Where no operation is needed, a character of the text and a character of the pattern match. The edit distance is the minimum total cost of the operations performed to transform the pattern into the text.
The basic edit distance algorithm defines the cost of every operation as 1. So a matching position costs nothing, and when a character of the text can be aligned with a character of the pattern, one REPLACE is more efficient than deleting the pattern character and inserting the text character.
But the costs of operations are not always 1 and equal to each other, hence the weighted edit distance algorithm, which takes all operation costs as input. Sometimes the cost of doing nothing at a matching position is also given. (Strictly, it is not correct to charge for doing nothing, since doing nothing needs no work; but, for example, in some real-world problems doing nothing has more value, so we can define its cost as a negative value to make it more valuable than 0.) The basic edit distance algorithm belongs to weighted edit distance: it is the special case in which every operation cost is 1.
The edit distance algorithm is implemented with dynamic programming, like the LCS (the Longest Common Subsequence). Since what matters is how the subsequences of the two strings are aligned, there are very many cases.
- Time complexity:
This algorithm fills an n x m two-dimensional matrix, where n is the length of the text and m is the length of the pattern. It runs similarly to other dynamic programming algorithms such as MCM (Matrix Chain Multiplication), LCS, and LIS (the Longest Increasing Subsequence). So,$$O(nm)$$
- Space complexity:
We need an n x m matrix, but an edit distance can be computed using only two rows, toggling between them.$$O(n+m)$$
But if you want to print or know the sequence of operations that transforms the pattern into the text, we need a matrix of quadratic size, since we have to keep the optimal solutions of all subproblems and reconstruct the sequence by backtracking through the matrix. So, to know the sequence of operations we need$$O(nm)$$ space.
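A minimal sketch of the weighted edit distance DP with the two-row toggle described above; the cost parameters and the function name are illustrative, not the repository's interface (unit costs give the basic algorithm).

```cpp
#include <algorithm>
#include <string>
#include <vector>

// DP over two toggled rows: prev holds row i-1, cur holds row i.
int editDistance(const std::string& pat, const std::string& text,
                 int insCost = 1, int delCost = 1, int repCost = 1) {
    int m = pat.size(), n = text.size();
    std::vector<int> prev(n + 1), cur(n + 1);
    for (int j = 0; j <= n; ++j) prev[j] = j * insCost;  // insert text[0..j)
    for (int i = 1; i <= m; ++i) {
        cur[0] = i * delCost;                            // delete pat[0..i)
        for (int j = 1; j <= n; ++j) {
            int match = prev[j - 1] + (pat[i - 1] == text[j - 1] ? 0 : repCost);
            cur[j] = std::min({match, prev[j] + delCost, cur[j - 1] + insCost});
        }
        std::swap(prev, cur);
    }
    return prev[n];
}
```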
### the Longest Common Subsequence
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
The Longest Common Subsequence (LCS) is one of the measures of string distance: the longer the LCS, the more similar the strings. The LCS is computed by dynamic programming, and it is simple.
- Time complexity:
This algorithm fills an n x m two-dimensional matrix, where n is the length of the text and m is the length of the pattern. Sadly, for any positive constant c there is no O(n^{2-c})-time algorithm computing the length of the LCS if the strong exponential time hypothesis (SETH) is true. So,$$O(nm)$$
- Space complexity:
We need an n x m matrix, but the length of the LCS can be computed using only two rows, toggling between them. For convenience, I just implement the algorithm using an n by m matrix.$$O(n+m)$$
But if you want to print the LCS itself, we need the quadratic-size matrix, since we have to keep the optimal solutions of all subproblems and reconstruct the subsequence by backtracking through the matrix. So, to recover an LCS we need$$O(nm)$$ space.
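A minimal sketch of the two-row LCS-length DP described above; the function name lcsLength is illustrative.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// LCS length with two toggled rows: O(n+m) extra space.
int lcsLength(const std::string& a, const std::string& b) {
    std::vector<int> prev(b.size() + 1, 0), cur(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            if (a[i - 1] == b[j - 1]) cur[j] = prev[j - 1] + 1;  // extend a common character
            else cur[j] = std::max(prev[j], cur[j - 1]);
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```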
- For any positive constant c, there is no O(n^{2-c})-time algorithm to compute the length of the LCS if the strong exponential time hypothesis (SETH) is true.
- So, attention has turned to the Maximal Common Subsequence (MCS).
- An MCS is a common subsequence that is no longer a common subsequence when any character is inserted into it.
- This was first proposed by Campbell B. Fraser and Robert W. Irving (Fraser, Campbell B., Robert W. Irving, and Martin Middendorf, "Maximal common subsequences and minimal common supersequences", Information and Computation 124.2 (1996): 145-153). (Actually, they proposed the SMCS, the Shortest Maximal Common Subsequence.)
| Problem | Algorithms |
|---|---|
| Find a MCS of two strings | Y. Sakai's Algorithm |
| | DY Lee & JC Na's Algorithm |
| | HJ Shin & JS Sim's Algorithm |
| Find all MCSs of multiple strings | |
- Sakai's algorithm, Lee's algorithm, and Shin's algorithm use queries on the data structure of Beame and Fich as the operation that finds a common character. Because that data structure is very hard to implement, however, I use the function find_first_of of the standard C++ library instead (see the sketch below).
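A minimal sketch of that substitution, assuming the query 'first position at or after a given index where a character occurs'; the helper name nextOccurrence is hypothetical.

```cpp
#include <string>

// Stand-in for the O(sqrt(log n / log log n))-time query of Beame and Fich:
// first position >= from where character c occurs in s, or npos if none.
// std::string::find_first_of scans linearly, so each query costs O(n) here.
std::size_t nextOccurrence(const std::string& s, char c, std::size_t from) {
    return s.find_first_of(c, from);
}
```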
### Y. Sakai's MCS
- Contributor: unsik6
- Reference: Sakai, Yoshifumi. "Maximal common subsequence algorithms." Theoretical Computer Science 793 (2019): 132-139.
- Language used to implement: C++
- Abstract:
Yoshifumi Sakai proposed a (sub)linearithmic-time, linear-space algorithm for finding a maximal common subsequence of two strings, and a linear-time algorithm for determining whether a common subsequence of two strings is maximal. I think this algorithm uses a greedy strategy.
[Outline]
(1) Find a common character from front to back: choose a character of one string, and find the position of the same character in the other string. Execute this operation alternately on the two strings.
(2) If there is no character left from which to find a common character, remove from both strings the suffix that starts at the last common character found, and push that common character onto the front of the MCS. Repeat (1).
- Time complexity:
This algorithm uses the data structure of Beame and Fich, which is constructible in O(n) time and supports O(sqrt(log n / log log n))-time queries that find the first (resp. last) position where a given character occurs after (resp. before) a given index. In this algorithm the queries are used O(n) times. So,$$O(n \sqrt{\log n / \log \log n})$$
- Space complexity:
This algorithm stores the positions, in both strings, of all common characters found, and how many times the find-a-common-character operation was executed to find each of them. There are O(n) common characters. So,$$O(n)$$
### DY Lee & JC Na's MCS
- Contributor: unsik6
- Reference: DongYeop Lee & Joong Chae Na, "An Improved Algorithm for Finding a Longer Maximal Common Subsequence", Journal of KIISE, 49(7), 2022, 507-513
- Language used to implement: C++
- Abstract:
DY Lee & JC Na reported experimental results showing that an MCS computed by Sakai's algorithm can be much shorter than the LCS. So they proposed a strategy that makes the MCS computed by Sakai's algorithm longer. In Sakai's algorithm the find-a-common-character operation is executed alternately on the two strings; their strategy is:
(1) Execute the operation on both strings at the same time.
(2) For the two common characters found in (1), compare the maximal numbers of times the operation can still be executed to find the next common characters. If the index of a common character is i in string1 and j in string2, its maximal number of executions is min(|range in string1| - i, |range in string2| - j).
(3) Choose the common character with the greater maximal number of executions as the next character of the MCS.
This algorithm computes a longer MCS than Sakai's.
- Time complexity:
In this algorithm the find-a-common-character operation executes once more per character. So,$$O(n \sqrt{\log n / \log \log n})$$
- Space complexity:
The space complexity of this algorithm is the same as Sakai's. So,$$O(n)$$
### HJ Shin & JS Sim's MCS
- Contributor: unsik6
- Reference: Hyeonjun Shin & Jeon Seop Sim, "A New Algorithm of Finding a Maximal Common Subsequence of Two Strings", Korea Software Congress 2022, 1212-1214
- Language used to implement: C++
- Abstract:
Lee's algorithm computes a longer MCS than Sakai's, but it faces two problems. First, if an infrequent character is chosen as a character of the MCS, the strategy does not work well. Second, the larger the alphabet of the strings, the shorter the computed MCS becomes.
So HJ Shin & JS Sim proposed a strategy that combines Lee's algorithm and Sakai's algorithm:
* Find a common character as in Sakai's algorithm.
Case 1) If the distance between the common character found in the current stage and the last common character is greater than a given positive constant k, postpone choosing it as a character of the MCS: store it in the array indexed by the last common character.
Case 2) If the distance is k or less, choose the common character as a character of the MCS.
Case 3, base) Fall back to the pending common characters when there is no common character whose distance is k or less, or when no more characters within distance k can serve as targets of the find operation. Using a linear-time algorithm, find, in the array indexed by the last common character, the pending character that maximizes the number of times the find operation can still be executed, and choose it as a character of the MCS.
This algorithm computes a longer MCS than Lee's and Sakai's. Moreover, the larger the alphabet of the strings, the longer the MCS it computes relative to the others. Although it does not always compute a longer MCS, it always can if Lee's strategy is applied to the common characters whose distance is k or less.
- Time complexity:
The time complexity of this algorithm is k^2 times Sakai's. First, it spends O(k^2 n) time finding the best common character in all the arrays: there are O(n) arrays, and the maximal size of an array is O(k). Second, the find-a-common-character operation executes up to k more times per pending common character, which adds O(k^2 n sqrt(log n / log log n)) time. However, this is the worst case, and it is not usual. Because k is a constant,$$O(n \sqrt{\log n / \log \log n})$$
- Space complexity:
The space complexity of this algorithm is k times Sakai's: the number of arrays storing pending common characters is O(n), and the size of each array is O(k). However, k is a constant. So,$$O(n)$$
[1] MST
- The Minimum Spanning Tree (MST) problem is one of the famous graph problems: find a spanning tree with minimum weight of a given connected, undirected graph. It does not matter whether the given graph is weighted; if it is unweighted, we can solve the problem by treating all edge weights as equal. A spanning tree of a connected, undirected graph is an undirected tree, a subgraph that includes all vertices of the graph. The weight of a spanning tree is the sum of the weights of its edges; a spanning tree of least weight is an MST.
- Here are two MST algorithms, Prim's and Kruskal's. The algorithms are implemented in one namespace, MST.
- They return different datatypes. So, if you want to test an algorithm, do NOT use the function named after the algorithm; please use the function whose name starts with print.
### Prim's Algorithm
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
Prim's algorithm finds an MST. Its idea is like BFS. We insert the root vertex into the FRINGE priority queue, which orders vertices by the minimum weight of their incident edges. In a loop, pop a vertex from the priority queue and put its minimum-weight incident edge into the MST edge set. (All vertices of the given graph belong to the MST, so the output is a set of edges.) The next step in the loop inserts into the FRINGE priority queue all vertices adjacent to that vertex, the destination of the edge just inserted. Prim's algorithm concentrates on the information of vertices: though it handles edges, it considers vertices first.
The algorithm needs a priority queue as its data structure, and the input is a graph. I use my heap implementation for the priority queue and my adjacency list representation for the input graph. There are two main implementations: one implements the priority queue as an unordered array, the other uses a heap. Since an ordered array has no advantage, it is excluded.
- Time complexity: (discussed in theory, not about my implementation)
In the first implementation the algorithm runs in O(|V|^2) time, where V is the set of vertices. The dominant operation is searching FRINGE for the vertex with the minimum-weight incident edge; it takes O(|V|) time per iteration, and there are O(|V|) iterations, so the time complexity is O(|V|^2).
The second implementation uses a heap, where the search operation runs in O(log n) time. Since each edge can be considered from both its source and its destination, the keys recording the minimum weight of the edges from each vertex may be updated; DecreaseKey on a heap runs in O(log n) time, so the total cost of key updates is O(|E| log n), where E is the set of edges. Since the number of vertices of a connected graph is O(|E|),$$using\ unordered\ array=O(|V|^2)$$ $$using\ heap=O(|V|\log |V| + |E|\log |V|) = O(|E|\log |V|)$$
- Space complexity:
However the FRINGE priority queue is implemented, it holds O(|V|) elements.$$O(|V|)$$
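A minimal sketch of the heap-based variant, assuming std::priority_queue with lazy deletion in place of a heap with DecreaseKey; the adjacency-list format and names are illustrative, not the repository's classes.

```cpp
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Prim sketch: stale FRINGE entries are skipped lazily instead of being
// updated with DecreaseKey. adj[v] lists (neighbor, weight) pairs.
int primMstWeight(const std::vector<std::vector<std::pair<int, int>>>& adj,
                  int root = 0) {
    using WV = std::pair<int, int>;                // (key = edge weight, vertex)
    std::priority_queue<WV, std::vector<WV>, std::greater<WV>> fringe;
    std::vector<bool> inTree(adj.size(), false);
    fringe.push({0, root});
    int total = 0;
    while (!fringe.empty()) {
        auto [w, v] = fringe.top(); fringe.pop();
        if (inTree[v]) continue;                   // stale entry: already TREE
        inTree[v] = true;
        total += w;                                // weight of the chosen edge
        for (auto [u, wu] : adj[v])                // scan neighbors of v
            if (!inTree[u]) fringe.push({wu, u});
    }
    return total;
}
```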
### Kruskal's Algorithm
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
Kruskal's algorithm finds an MST. Unlike Prim's algorithm, Kruskal's algorithm focuses on the information of edges. Its main idea is that we can construct an MST by selecting the minimum-weight edge at each moment. So, at first, Kruskal's algorithm puts all edges into the REMAINING priority queue. In a loop, pop the minimum-weight edge from the priority queue and, if the edge does not make a cycle in the tree constructed so far, put it in the tree; after the loop we have the MST. With a Disjoint Set we can easily check whether the selected edge makes a cycle: make a Disjoint Set over all vertices and, in the loop, check whether the source and the destination of the selected edge are in the same set. If they are not in the same set, they have not been connected yet, so unify the sets they belong to; otherwise they are already connected, so just skip the edge. The algorithm needs a priority queue as its data structure, and the input is a graph. I use my heap implementation for the priority queue and my adjacency list representation for the input graph; I also use my disjoint set implementation. There are two main implementations: one implements the REMAINING priority queue as a sorted array, the other uses a heap.
- Time complexity: (discussed in theory, not about my implementation)
The implementations above do not affect the time complexity of this algorithm. The dominant operations are building the priority queue and popping the minimum-weight edge from REMAINING. The first runs in O(|E| log |E|) time (sorted array) or O(|E|) time (heap). The second runs once per iteration, and the loop runs O(|E|) times, so its total cost is O(|E|) (sorted array) or O(|E| log |E|) (heap). The total time complexity of this algorithm is$$O(|E|\log |E|)$$
The number of edges is O(|V|^2), since the input graph is a connected, undirected graph. So the time complexity of Kruskal's algorithm is the same as Prim's:$$O(|E|\log|E|)=O(|E|\log|V|^2)=O(|E|\log|V|)$$
- Space complexity:
However the REMAINING priority queue is implemented, it holds O(|E|) elements, and the disjoint set holds O(|V|) elements. So,$$O(|V|+|E|)$$
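A minimal sketch of Kruskal's algorithm with REMAINING as a sorted array and a small disjoint set for the cycle check; all names are illustrative, not the repository's classes.

```cpp
#include <algorithm>
#include <numeric>
#include <tuple>
#include <vector>

// Tiny disjoint set with path compression for the cycle test.
struct Dsu {
    std::vector<int> parent;
    explicit Dsu(int n) : parent(n) { std::iota(parent.begin(), parent.end(), 0); }
    int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
    bool unite(int a, int b) {
        a = find(a); b = find(b);
        if (a == b) return false;      // same set: the edge would close a cycle
        parent[a] = b;
        return true;
    }
};

// Edges are (weight, source, destination); returns the total MST weight.
int kruskalMstWeight(int nVertices, std::vector<std::tuple<int, int, int>> edges) {
    std::sort(edges.begin(), edges.end());        // REMAINING, cheapest first
    Dsu dsu(nVertices);
    int total = 0;
    for (auto [w, u, v] : edges)
        if (dsu.unite(u, v)) total += w;          // edge joins two components
    return total;
}
```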
[2] Shortest Path
- The Shortest Path problem is one of the famous graph problems: find a minimum-weight path from a source vertex (or all vertices) to a destination vertex (or all vertices). In general the problem does not depend on properties of the given graph, such as being (un)directed or acyclic, but some of the algorithms that solve it restrict their input.
- There are four types of shortest path problems.
- Single-Source Shortest Paths (SSSP): finding the shortest paths from a given specified vertex (the source vertex) to all vertices.
- Single-Destination Shortest Paths (SDSP): finding the shortest paths from all vertices to a given specified vertex (the destination vertex). This problem equals SSSP: reverse the direction of every edge of the given graph (unnecessary if the graph is undirected), input the given destination vertex into SSSP as the source, and output the paths with their directions reversed (the distances are the same).
- Single-Pair Shortest Path (SPSP): finding the shortest path from a given specified vertex (the source vertex) to a given specified vertex (the destination vertex). This problem equals SSSP, because we get the shortest path we want by expanding the shortest paths of each vertex; in the worst case we have to check all shortest paths from the source vertex to all vertices.
- All-Pairs Shortest Paths (APSP): finding the shortest paths from all vertices to all vertices.
- The algorithms are implemented in one namespace, ShortestPath.
- They return different datatypes. So, if you want to test an algorithm, do NOT use the function named after the algorithm; please use the function whose name starts with print.
| Problem | Algorithms |
|---|---|
| Single-Source Shortest Paths (SSSP) | Dijkstra's Algorithm |
| | Bellman-Ford's Algorithm |
| Single-Destination Shortest Paths (SDSP) | |
| Single-Pair Shortest Path (SPSP) | |
| All-Pair Shortest Path (APSP) | Floyd-Warshall's Algorithm |
### Dijkstra's Algorithm
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
Dijkstra's algorithm is one of the algorithms solving the Single-Source Shortest Paths (SSSP) problem. Its mechanism equals Prim's, except that, when selecting a new TREE vertex, it considers the distance from the source vertex that we have been updating. When a new TREE vertex is selected, we insert its adjacent vertices into the FRINGE priority queue and update the distances of the FRINGE vertices already present: if the known distance of an adjacent vertex is longer than the new distance (the distance of the new TREE vertex plus the weight of the edge from the new TREE vertex to that adjacent FRINGE vertex), update the adjacent FRINGE vertex's distance to the new distance. If the distance of an adjacent vertex has not been defined yet, the vertex is UNSEEN, so insert it into the FRINGE priority queue and set its distance to the new distance. If the distances of all vertices are initialized to infinity, both cases work by the same mechanism.
Dijkstra's algorithm works correctly only when all edges of the given graph have nonnegative weight. Because the distance of a vertex is fixed when the vertex becomes TREE, the algorithm cannot update the distance of a TREE vertex when a negative edge would decrease it; note that in this algorithm distances only increase as we extend paths from a TREE vertex.
The algorithm needs a priority queue as its data structure, and the input is a graph. I use my heap implementation for the priority queue and my adjacency list representation for the input graph. There are two main implementations: one implements the priority queue as an unordered array, the other uses a heap. Since an ordered array has no advantage, it is excluded.
- Time complexity: (discussed in theory, not about my implementation)
In the first implementation the algorithm runs in O(|V|^2) time, where V is the set of vertices. The dominant operation is searching FRINGE for the vertex with the minimum distance; it takes O(|V|) time per iteration, and there are O(|V|) iterations, so the time complexity is O(|V|^2).
The second implementation uses a heap, where the search operation runs in O(log n) time. Since each edge can be considered from both its source and its destination, the keys recording the current distances may be updated; DecreaseKey on a heap runs in O(log n) time, so the total cost of key updates is O(|E| log n), where E is the set of edges. Since the number of vertices of a connected graph is O(|E|),$$using\ unordered\ array=O(|V|^2)$$ $$using\ heap=O(|V|\log |V| + |E|\log |V|) = O(|E|\log |V|)$$ And the number of vertices on a shortest path cannot exceed the number of vertices of the given graph, so printing a path runs in O(|V|) time.
- Space complexity:
However the FRINGE priority queue is implemented, it holds O(|V|) elements.$$O(|V|)$$
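A minimal sketch of the heap-based variant, assuming std::priority_queue with lazy deletion instead of DecreaseKey; names are illustrative, not the repository's classes.

```cpp
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Dijkstra sketch: adj[v] lists (neighbor, weight) pairs with nonnegative
// weights; UNSEEN vertices start at "infinity".
std::vector<long long> dijkstra(const std::vector<std::vector<std::pair<int, int>>>& adj,
                                int source) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(adj.size(), INF);
    using DV = std::pair<long long, int>;           // (distance, vertex)
    std::priority_queue<DV, std::vector<DV>, std::greater<DV>> fringe;
    dist[source] = 0;
    fringe.push({0, source});
    while (!fringe.empty()) {
        auto [d, v] = fringe.top(); fringe.pop();
        if (d > dist[v]) continue;                  // stale entry: v already TREE
        for (auto [u, w] : adj[v])
            if (dist[v] + w < dist[u]) {            // relax edge (v, u)
                dist[u] = dist[v] + w;
                fringe.push({dist[u], u});
            }
    }
    return dist;
}
```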
### Bellman-Ford's algorithm
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
Bellman-Ford's algorithm is one of the algorithms solving the Single-Source Shortest Paths (SSSP) problem. The given graph is a directed, weighted graph and, unlike with Dijkstra's algorithm, it can have negative-weight edges. But if there is a negative-weight cycle, this algorithm cannot work correctly, since paths that pass through the negative-weight cycle keep getting lighter and lighter, so no path can be called the 'shortest'.
Bellman-Ford's algorithm uses a property of shortest paths: since a shortest path has to be a simple path, the maximum number of vertices on a path is |V|. So, excluding the source vertex, it updates the distances of the vertices |V| - 1 times using all edges. Like Dijkstra's algorithm, it uses relaxation.
The input is a graph. I use my adjacency list representation for the input graph.
- Time complexity: (discussed in theory, not about my implementation)
This algorithm repeats the update of the distances of all vertices (except the source vertex) |V| - 1 times. Since each round of updates uses all edges, |E| relaxations run in each outer iteration. So the time complexity of this algorithm is$$O(|V||E|)$$
- Space complexity:
There are arrays storing the distance of every vertex and the parent (the previous vertex on the shortest path) of every vertex. You do not have to use both; just use the arrays for the output you want. Each element of an array holds the information of one vertex, so each array has |V| elements.$$O(|V|)$$
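A minimal sketch over an edge list, with an extra pass that detects a negative-weight cycle; names are illustrative, not the repository's interface.

```cpp
#include <limits>
#include <tuple>
#include <vector>

// Bellman-Ford sketch: edges are (source, destination, weight). Returns the
// distances from src, or an empty vector when a reachable negative-weight
// cycle is detected.
std::vector<long long> bellmanFord(int nVertices,
                                   const std::vector<std::tuple<int, int, int>>& edges,
                                   int src) {
    const long long INF = std::numeric_limits<long long>::max();
    std::vector<long long> dist(nVertices, INF);
    dist[src] = 0;
    for (int round = 0; round < nVertices - 1; ++round)   // |V| - 1 passes
        for (auto [u, v, w] : edges)
            if (dist[u] != INF && dist[u] + w < dist[v])  // relax edge (u, v)
                dist[v] = dist[u] + w;
    for (auto [u, v, w] : edges)                          // one extra pass:
        if (dist[u] != INF && dist[u] + w < dist[v])      // still relaxable =>
            return {};                                    // negative cycle
    return dist;
}
```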
### Floyd-Warshall's algorithm
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
Floyd-Warshall's algorithm is one of the algorithms solving the All-Pairs Shortest Paths (APSP) problem. It uses a |V| by |V| matrix. The algorithm works by inserting a vertex as a waypoint into the path between two other vertices whenever the new path through the waypoint is shorter than the path that already exists. So it checks every vertex as a waypoint of the path between every pair of vertices, and updates whenever a new path is shorter.
As with Bellman-Ford's algorithm, the graph given to Floyd-Warshall's algorithm can have negative-weight edges, and the algorithm cannot work correctly if there is a negative-weight cycle.
The input is a graph. I use my adjacency list representation for the input graph.
- Time complexity: (discussed in theory, not about my implementation)
This algorithm runs as a triply nested iteration: the outermost loop varies which vertex is the waypoint, the middle loop varies which vertex is the source of the path, and the inner loop varies which vertex is the destination of the path. So the time complexity of this algorithm is$$O(|V|^3)$$
- Space complexity:
There is a two-dimensional array storing the distances of the paths: the element in the ith row and the jth column is the length of the shortest path from vertex i to vertex j. So the space complexity of this algorithm is$$O(|V|^2)$$
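A minimal sketch of the triply nested loop described above, assuming the distance matrix is pre-initialized; the names and the INF convention are illustrative.

```cpp
#include <vector>

// dist[i][j] starts as the weight of edge (i, j), 0 on the diagonal, and INF
// where no edge exists; after the loops it holds all-pairs shortest distances.
const long long INF = 1e18;

void floydWarshall(std::vector<std::vector<long long>>& dist) {
    const int n = dist.size();
    for (int k = 0; k < n; ++k)                // waypoint vertex
        for (int i = 0; i < n; ++i)            // source of the path
            for (int j = 0; j < n; ++j)        // destination of the path
                if (dist[i][k] != INF && dist[k][j] != INF &&
                    dist[i][k] + dist[k][j] < dist[i][j])
                    dist[i][j] = dist[i][k] + dist[k][j];
}
```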
Sorry. This part is currently being prepared.
### Heap
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
A heap is one of the most famous data structures: a left-complete binary tree whose main property, for a Min-Heap, is par.key() <= children.key() (reversed for a Max-Heap). It is outstanding for implementing a priority queue, since it keeps the optimal value (minimum or maximum) at its root. Since the time complexity of insert and removeOptimalValue (with getOptimalValue) is O(log n), where n is the number of elements, a heap can also be used to sort in O(n log n) time.
- Time complexity:$$Build\ Heap=O(n)$$ Since a heap is a left-complete binary tree, its height is O(log n). So,
$$Insert\ and\ removeOptimalValue=O(\log n)$$
- Space complexity: (implementation: array)
Since a heap is a left-complete binary tree, it needs no empty slots even when implemented as an array. So,$$O(n)$$
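A minimal sketch of an array-based Min-Heap with the operations named above; the class is illustrative, not the repository's heap.

```cpp
#include <utility>
#include <vector>

// Array layout: parent of index i is (i-1)/2; children are 2i+1 and 2i+2.
class MinHeap {
    std::vector<int> a;
    void siftUp(std::size_t i) {
        while (i > 0 && a[(i - 1) / 2] > a[i]) {
            std::swap(a[(i - 1) / 2], a[i]);
            i = (i - 1) / 2;
        }
    }
    void siftDown(std::size_t i) {
        for (;;) {
            std::size_t smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < a.size() && a[l] < a[smallest]) smallest = l;
            if (r < a.size() && a[r] < a[smallest]) smallest = r;
            if (smallest == i) return;
            std::swap(a[i], a[smallest]);
            i = smallest;
        }
    }
public:
    void insert(int key) { a.push_back(key); siftUp(a.size() - 1); }  // O(log n)
    int  getOptimalValue() const { return a.front(); }                // O(1), non-empty
    int  removeOptimalValue() {                                       // O(log n), non-empty
        int top = a.front();
        a.front() = a.back();
        a.pop_back();
        if (!a.empty()) siftDown(0);
        return top;
    }
};
```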
- A Binary Search Tree (BST) is a useful data structure. Its main property is the node property: left_child.element <= parent.element < right_child.element. It does not matter whether the equality with the parent is allowed on the left child or the right child.
- In a balanced BST, the heights of the left and right subtrees of every node differ only a little. So while some queries on a general BST run in O(n) time, where n is the number of nodes, the same queries on a balanced BST are more efficient.
### Red-Black Tree
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
A Red-Black Tree is one of the balanced binary search trees. Since its height is O(log n), where n is the number of nodes holding a key, the queries 'search', 'predecessor', 'successor', 'minimum (maximum)', 'insert', and 'delete' run in O(h) = O(log n) time, whereas the height of a general binary search tree is O(n). Moreover, a Red-Black Tree is more efficient than an AVL Tree, another balanced binary search tree: in an AVL Tree the process that restores the balance property after an insertion or deletion proceeds recursively in units of one level, while the corresponding process in a Red-Black Tree proceeds in units of two levels.
A Red-Black Tree class implemented with templates is in "RB_Tree.h" of this repository. A node of the class holds an element declared via the template. When a node is deleted, my Red-Black Tree uses the successor of the node to be deleted; you can instead use the predecessor, as in the pseudocode of CLRS. And you can see how a Red-Black Tree works here (using the predecessor)!
- Time complexity:
All queries run in O(h) = O(log n) time.$$The\ time\ complexity\ of\ all\ queries=O(\log n)$$
- Space complexity:
All leaves of a Red-Black Tree are NIL. I implement every NIL as a distinct node, but they can be implemented as one shared NIL, since no operation of a Red-Black Tree calls or references the members of NIL.$$O(n)$$
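Not the full tree, but a minimal sketch of the left-rotation primitive that the O(log n) insert and delete fix-ups rely on; the Node layout is illustrative, not the repository's class.

```cpp
// Hypothetical node layout with parent links and a color bit.
struct Node {
    int key;
    bool red;                           // true = red, false = black
    Node *left, *right, *parent;
};

// Rotate the subtree rooted at x to the left: y = x->right becomes its root.
void leftRotate(Node*& root, Node* x) {
    Node* y = x->right;
    x->right = y->left;                 // y's left subtree becomes x's right
    if (y->left) y->left->parent = x;
    y->parent = x->parent;
    if (!x->parent)                 root = y;          // x was the tree root
    else if (x == x->parent->left)  x->parent->left = y;
    else                            x->parent->right = y;
    y->left = x;
    x->parent = y;
}
```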
### DisjointSet
- Contributor: unsik6
- Reference: Techie Delight
- Language used to implement: C++
- Abstract:
A Disjoint Set (Union-Find Set) is a data structure implemented as a forest. It maintains sets, each identified by a representative. So, as I did, a Disjoint Set can be implemented using an unordered_map<key, representative of the set to which the key belongs>. It has two main operations, Find and Union. Find is passed a key and returns the representative of the set to which the key belongs. Union is passed two keys; if the keys do not belong to the same set, it merges their sets.
- Time complexity:
The time complexity of Find and Union is O(n) in the naive implementation, where n is the number of elements. Using union by rank it becomes O(log n), and adding path compression makes it amortized nearly constant. So,$$Find\ and\ Union=O(\log n)$$
- Space complexity: (implementation: unordered_map)
We just need one (key, representative) pair per element. And if path compression and union by rank are applied, we also need one pair <representative, height of the tree representing the set> per set; the number of sets is at most the number of elements. So,$$O(n)$$
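A minimal sketch following the unordered_map layout described above, with path compression and union by rank; names are illustrative, not the repository's class, and makeSet must be called before Find or Union on a key.

```cpp
#include <string>
#include <unordered_map>
#include <utility>

class DisjointSet {
    std::unordered_map<std::string, std::string> parent;  // key -> representative
    std::unordered_map<std::string, int> rank_;           // representative -> rank
public:
    void makeSet(const std::string& key) { parent[key] = key; rank_[key] = 0; }

    std::string find(const std::string& key) {            // path compression
        if (parent[key] == key) return key;
        return parent[key] = find(parent[key]);
    }

    void unite(const std::string& a, const std::string& b) {
        std::string ra = find(a), rb = find(b);
        if (ra == rb) return;                              // already in one set
        if (rank_[ra] < rank_[rb]) std::swap(ra, rb);      // union by rank
        parent[rb] = ra;
        if (rank_[ra] == rank_[rb]) ++rank_[ra];
    }
};
```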
### Adjacency List representation
- Contributor: unsik6
- Reference: Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein, "Introduction to Algorithms", 3rd ed., 2009
- Language used to implement: C++
- Abstract:
An adjacency list is one of the representations of a graph. The idea of this representation is to store all edges leaving each vertex in the same linked list. It needs less space than the adjacency matrix representation.
I implement it as std::vector<std::list<Edge*>*>. Edge is a custom class that holds pointers to the source vertex and the destination vertex, and a weight. I implement the adjacency list as a template class with two user-defined types: the type of the elements of the vertices and the type of the weights of the edges.
- Time complexity: !!! This time complexity is about my implementation, not the Adjacency List ADT. !!!
Insert vertex appends one element to each of the two arrays, so it runs in amortized O(1) time. Delete vertex and find vertex by the element of a vertex run in O(|V|) time, where V is the set of vertices, since they have to find the vertex holding the given element. Insert edge, given a weight and the elements of the source and destination vertices, runs in O(|V|) time, since it has to find the index of the source vertex. Delete edge with the same parameters runs in O(|V|+|E|) time, where E is the set of edges, since it may visit the adjacency lists of all vertices and edges. Find edge with the same parameters runs in O(degree of the source vertex) time.
- Space complexity: !!! This space complexity is about my implementation, not the Adjacency List ADT. !!!
I added one more array (std::vector<Vertex*>) holding pointers to all vertices; it needs O(|V|) space. The adjacency lists of all vertices hold pointers to all edges, and all adjacency lists are stored in the same array, so we need O(|V| + |E|) space. Since we do not count the elements stored in vertices and edges, the total space complexity is O(|V| + |E|).
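A minimal sketch of the layout described above, simplified so that vertices are indices and the Vertex pointer array is omitted; names are illustrative, not the repository's template class.

```cpp
#include <list>
#include <vector>

// One list of outgoing edges per vertex, templated on the weight type.
template <typename W>
struct Edge {
    int src, dst;   // endpoints, as indices into the vertex array
    W weight;
};

template <typename W>
class AdjacencyList {
    std::vector<std::list<Edge<W>>> adj;   // adjacency list of each vertex
public:
    int insertVertex() {                   // amortized O(1)
        adj.emplace_back();
        return (int)adj.size() - 1;
    }
    void insertEdge(int src, int dst, W w) {           // O(1) once src is known
        adj[src].push_back({src, dst, w});
    }
    const Edge<W>* findEdge(int src, int dst) const {  // O(degree of src)
        for (const Edge<W>& e : adj[src])
            if (e.dst == dst) return &e;
        return nullptr;
    }
};
```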