Abstract data types (ADTs) refer to classes of objects whose operations and properties are formally defined, but are not restricted to specific implementations.
This is a repository of ADTs written in Ruby and, more recently, Python (CPython). It is an exercise in implementing an ADT with different data structures and a guide for practical application in the future.
It is not intended to cover all the API variations and implementations for a particular abstract data type.
I define the following ADTs, including their specifications, common operations in their API, and possible implementations with comparisons. I also link my code at the bottom of each section.
Note that many of the ADTs have their own nomenclature, so the same methods may have various names across the data types.
A Set is an unordered collection of unique elements.
- Unordered (no promises regarding insertion order)
- Unique collection of elements (no duplicates)
- Immutable elements: Although not a strict requirement, it is generally a good practice to use immutable elements in a set to ensure the integrity of the set's structure.
Basic operations
insert(el)
: inserts a new elementinclude?(el)
: queries for an elementdelete(el)
: removes an elementsize()
: Returns the number of elements in the set.
Set operations
union(set)
: Returns a new set containing all elements from the original set and the input set, without duplicates.intersection(set)
: Returns a new set containing the elements present in both the original set and the input set.difference(set)
: Returns a new set containing the elements present in the original set but not in the input set.subset(set)
: Checks if the original set is a subset of the input set, meaning all elements of the original set are also present in the input set.
- Hash function: A function that generates a value of fixed length for each input it receives. For any given input, the output will always be the same.
- Data is not contiguously stored
- Store elements in sub-arrays (buckets) in an array: when we insert an element into the set, use the modulo operator to deterministically assign every element to a bucket:
index = hash(value) % num_buckets
- Elements are required to be hashable
- Hashing the element's value returns an integer that is a valid index, allowing the set to handle keys of any data type that can be hashed.
- e.g., { 2, 4, 8, 16, “hello”, “dolly” }
- Don't allow it to be indexed into
- Use the same hashing function to compute an index into it.
- Array should be dynamic: resize the array by a constant multiple that scales with the number of buckets. The goal is to have
buckets.length > N
at all times.- This ensures that as our sample size increases, we don't rely more and more on a linear scan to find elements
Note: in Ruby and Python, Sets are implemented as a hash table
Avg. case is
O(n/k)
, wheren
is the number of elements in the hash set andk
is the number of buckets. When the hash set is properly managed, the average-case lookup time complexity ofO(n/k)
will be close toO(1)
.
Method | Amortized | Worst case | Notes |
---|---|---|---|
include? |
O(1) |
O(n) |
Worst case is in rare case when all keys collide into a single bucket, making it a linear search. |
insert |
O(1) |
O(n) |
Worst case is in rare case of a hash collision |
delete |
O(1) |
O(n) |
Worst case is in rare case of a hash collision |
SPACE | O(n) |
---|
-
Ruby handles a hash collision with separate chaining.
-
Python handles a hash collision with open addressing.
-
The maximum
density
(# of items chained at a location in memory) Ruby allows beforerehashing
is 5, which isO(n)
time complexity. -
A Hash Set is the fastest implementation of a Set: the hash uses a hashing function to store elements in memory and to later access where they are stored.
- This creates the highest chance of uniform distribution because there is no pattern to the output, although there is still the chance of a hash collision.
- We want to use a Hash Set over an array-based set most of the time.
- Useful if you want to ensure absolutely no duplicates - Hash Maps can have duplicate values (but not keys).
-
Although it isn’t particularly compact (requires pre-allocation of memory), it provides near constant time insertion and removal in the average case.
- Sets are most commonly used for:
- Membership testing
- Eliminating duplicate entries (cleanup)
- Not ideal for runtime operations: unfortunately, the edge cases of Hash Set (and Hash Table - up next) performance are significant, the cases are inevitable, and the methods for dealing with them require some runtime tuning.
- They are comprised of pre-allocated buckets of fixed size, and due to speed requirements of typical non-cryptographic hash functions, collisions are inevitable.
-
There are a number of different ways of handling collisions
- Chaining: In this case, collisions are resolved by using a second data structure (such as a linked list) to compare elements when a collision is found.
- This method “degrades gracefully,” but requires a good idea of the bucket size to do so.
- Open addressing: Entries are all stored within the hash buckets, and when a collision is found, some probing algorithm is used to find the next free bucket. When free slots run low, the buckets are resized.
- Cuckoo hashing: Multiple hash functions are used to insert into different places in the bucket. If all hash functions collide, the bucket is resized.
- Might give us better general performance than linear probing on a busy / full table
- Several other methods also exist.
- Chaining: In this case, collisions are resolved by using a second data structure (such as a linked list) to compare elements when a collision is found.
- When the buckets require resizing, every element stored in the bucket must be re-hashed to find its new place. With millions of keys, this resize operation becomes prohibitively expensive. (We basically have to double the table size when we run out of space).
- The fundamental problem with these methods is that they require some level of runtime tuning.
- For ex., for the Fastly CDN, the time required to complete this operation could cause it to pause for a significant time period — significant enough to cause it to appear to “miss” purge requests for surrogate keys.
Ruby
Python
At its essence, a Map is just an unordered collection of key-value pairs.
- Also called a Dictionary and Associative Array
- Hash Table: conceptually the same as a Hash Map, though not in practice - Java has both
HashMap
andHashtable
classes with different underlying implementations / use cases due to historical reasons (HashMap
is newer).- It is a data strucure that implements a Map. It allows for the efficient storage and retrieval of values based on associated keys.
- A hash table uses a hash function to compute an index, also called a hash code, into an array of buckets or slots, from which the desired value can be found.
- From the client code's perspective, elements are accessed using a key rather than an index number.
- Unordered (no promises regarding insertion order)
- Duplicate values are allowed, but not duplicate keys
get(k)
: queries for a key-value pair and returns the valueset(k, v)
: creates a new key-value pair or updates the value for a pre-existing keydelete(k)
: deletes a key-value pair
The underlying implementations below are similar to the ones for the Set. The main difference is that the Map implementations allow for duplicate values (but not keys) and the Set implementations do not.
- Also called a
touple
. - The array will contain several subarrays, and each subarray will contain a k,v pair.
Method | Avg. Case | Worst Case | Best Case | Notes |
---|---|---|---|---|
get |
O(n) |
O(n) |
O(1) |
|
set |
O(n) |
O(n) |
O(1) |
|
delete |
O(n) |
O(n) |
O(1) |
SPACE | O(n) |
---|
- Note: the time and space complexities are the same as for the array-based set.
- For the Hash Map implementation, the internals will basically be the same, but we will use a Doubly Linked List for our buckets instead of sub-arrays so that we can use link objects that store both a key and a value in one node together.
- We could also just use touples, but the Linked List is the classic, canonical way to implement a Hash Map.
- A hash table is a form of list where elements are accessed by a keyword rather than an index number. At least, this is how the client code will see it. Internally, it will use a slightly modified version of our hashing function in order to find the index position in which the element should be inserted. This gives us fast lookups, since we are using an index number which corresponds to the hash value of the key.
Method | Amortized | Worst Case | Notes |
---|---|---|---|
get |
O(1) |
O(n) |
Worst case is in rare case of a hash collision |
set |
O(1) |
O(n) |
Worst case is in rare case of a hash collision |
delete |
O(1) |
O(n) |
Worst case is in rare case of a hash collision |
SPACE | O(n) |
---|
- Note: the time and space complexities are the same as for the Hash Set.
- Useful when you want to store values associated with keys.
- Dictionaries provide constant time (amortized) performance for assignments and accesses, which means they are ideal for bookkeeping dynamic information.
Ruby
Python
A Linked List is a linear collection of data elements of any type, called nodes, where each node itself has a value, and points to the next node in the list.
Linked List vs Array:
While arrays have contiguously stored data, Linked Lists spread out their data across many cells across the computer's memory. These cells are nodes. In addition to the data stored within the node, each node also stores the memory address of the next node in the Linked List. These pointers are a link.
- Insertion: Unlike arrays, the insertion of new elements at the beginning of the list is very cheap (
O(1)
), since it only requires us to change one reference or pointer- No random access and no access to len (number of elements in list)
- Searching: for an element is the same as an array (
O(n)
)
Head
: first nodeTail
: last node- Points to
nil
/None
- Points to
- Each node has a value
- Each node has a pointer to the next node in the list
- Sequential/Ordered (i.e. consistent element ordering based on collection population)
- Duplicates permitted
- No indexing (meaning no random access)
include?(key)
: returns boolean indicating whether the node is in the Listappend(key, value)
: appends a nodeprepend(key, value)
: prepends a noderemove(key)
: removes a nodeempty?
: returns boolean indicating whether the List is empty[](index)
: returns the node at a specified index or nil if the node is not in the Listget(key)
: returns the value of the node at a specified index or nil if the node is not in the Listupdate(key, value)
: updates the value for a node and returns that node or nil if the node is not in the Listfirst
: returns the first nodelast
: returns the last node
- Singly-Linked List with augmentations: add a
previous
attribute for the nodes and keep track of thetail
- This implementation applies to both Linked List types mentioned above.
- Implemented using a
Node
class.- A Linked List is a
node-based data structure
.- Trees also fall into this category.
- A Linked List is a
- The time and space complexities below refer to this implementation.
- Note that the time and space complexities for the Singly-Linked List and Doubly-Linked List will be the same asymptotically, so the following complexities are for both types.
- Time Complexity
Method | Avg. Case | Worst Case | Best Case | Notes |
---|---|---|---|---|
include? |
O(n) |
O(n) |
O(1) |
|
append |
O(1) |
O(1) |
O(1) |
Assuming access to tail |
prepend |
O(1) |
O(1) |
O(1) |
Assuming access to head |
remove |
O(n) |
O(n) |
O(1) |
Deletion is more efficient in a Doubly-Linked List than Singly because we have a previous pointer, so we can easily access the node before the node being deleted to change its next pointer |
empty? |
O(1) |
O(1) |
O(1) |
Only have to check if the head's next pointer is the tail |
[] |
O(n) |
O(n) |
O(1) |
|
get |
O(n) |
O(n) |
O(1) |
|
update |
O(n) |
O(n) |
O(1) |
|
first |
O(1) |
O(1) |
O(1) |
|
last |
O(1) |
O(1) |
O(1) |
Assuming access to tail |
-
Note: insertion / deletion anywhere other than the beginning or end would only be
O(n)
(assuming you don't have a pointer / reference to the node) because it has to search the LinkedList for that node. But, that's not what we often use the Linked List for. The LinkedList is best for adding and removing elements sequentially.- So insertion and deletion are summarized as
O(1)
assuming the LinkedList is used in its optimal way (e.g., LRU cache)
- So insertion and deletion are summarized as
-
Space Complexity:
O(n)
- Space complexity for a Doubly-Linked List is more than a Singly-Linked List because were storing an extra link to a node, but it doesn't change it asymptotically (still linear).
- Circular
- Cycle: when a node points back to a previous node. Begins at the last node of the linked list
- The advantage of the Linked List is that the values are stable: they don't correspond to indices, so you never need to re-index.
- If you are building an
LRU Cache
the Linked List is essential as an auxiliary structure for its quick insertion and deletion timeO(1)
. You never need to index because always delete from the beginning or end. The Hash Map on its own would be slower:O(n)
because it would have to iterate to find the key with the most recent timestamp value since it is unordered. With an array as the auxiliary structure these operations would still takeO(n)
time because you would be stuck re-indexing. - Also, unlike lists, Linked Lists don't preallocate memory
- If you are building an
- Useful in general if you are deleting many items: it will be faster than with an array, but doesn't make a difference in the time complexity asymptotically.
Ruby
Python
A Stack stores data in the same way that arrays do - it is simply a list of elements - but stacks have added on constraints.
- LIFO
- Data can only be inserted at the end of a stack
- Data can only be removed from the end of a stack
- Data can only be read from the end of a stack
- Sequential/Ordered (i.e. consistent element ordering based on collection population)
- Duplicates permitted
- Addition and removal of a single item is an
O(1)
algorithm
push(el)
: adds an element to the top of the Stackpop
: removes the top element in the Stack and returns itpeek
: returns the top element in the Stackempty
: check whether the stack is empty
1) Array
- Time Complexity
Method | Worst Case | Notes |
---|---|---|
push |
O(1) |
Array#push is O(1) time |
pop |
O(1) |
Array#pop is O(1) time |
peek |
O(1) |
Array#last is O(1) time |
empty |
O(1) |
- Space Complexity:
O(n)
2) Linked List
-
A Stack can be implemented as a Doubly Linked List by just enforcing constraints on it that only allow insertion, removal, and peeking at the tail.
-
Initialize and track a
self.size
variable so thatempty()
remainsO(1)
-
Analysis:
- Implementing a Stack with an array or Linked List will have the same space and time complexities asymptotically.
- Recursion (stack frame)
- The 'back' button of our web browser.
- Web browsers, often use two stacks to move backward and forward through your browsing history.
- Syntax checking; matching opening and closing parentheses (e.g., in regex, math equations, compilers etc.) to be specific.
- Backtracking algorithms
- Ideal as a temporary container for temporary data
A monotonic stack is a stack whose elements are monotonically increasing or decreasing. It contains all qualities that a typical stack has.
Ruby
Python
Like a Stack, a Queue handles temporary data elegantly and is also essentially an array with restrictions.
- Stacks and Queues can be effectively implemented by dynamic arrays or linked lists. However, unlike for stacks, a simple underlying array won't cut it. The implementation would involve using 2 arrays implemented as stacks (
inbound_stack
andoutbound_stack
)- Queues are harder to implement and appropriate when order is important.
- FIFO
- Data can only be inserted at the end of a Queue (same as Stack)
- Data can only be read from the front of a Queue (opposite behavior of Stack)
- Data can only be removed from the front of a Queue (opposite behavior of Stack)
- Sequential/Ordered (i.e. consistent element ordering based on collection population)
- Duplicates permitted
- Unlike a stack, tracking the tail in a queue is not optional
enqueue(el)
: adds an element to the back of the Queuedequeue
: removes the element at the front of the Queue and returns itfirst
: returns the element at the front of the Queueempty
: return true if the Queue does not contain any elements.
1) Array
- Naive implementation
- Note that it is your choice which ends of the array will be the "front" and "back".
- This will change which array methods you use in your implementation.
- Time complexity
Method | Worst Case | Best case | Notes |
---|---|---|---|
enqueue |
O(1) |
O(1) |
Array#push has O(1) runtime |
dequeue |
O(n) |
O(n) |
Array#shift has O(n) runtime |
first |
O(1) |
O(1) |
|
empty |
O(1) |
O(1) |
- Analysis
- Could do better: is there a way to implement a Queue using an array in constant time?
- Yes, implement it using 2 stacks. (Amortized constant time)
- See
NaiveArrayQueue
vsBetterArrayQueue
in the code files linked at the end of this section.
- Could still do better with Doubly-Linked List.
- Could do better: is there a way to implement a Queue using an array in constant time?
2) Doubly Linked List
- A Queue can be implemented as a Doubly-Linked List by just enforcing constraints on it that only allow you to add to the back, delete from the front and peek at the first element in the front of the Queue.
Method | Worst Case | Best case | Notes |
---|---|---|---|
enqueue |
O(1) |
O(1) |
|
dequeue |
O(1) |
O(1) |
|
first |
O(1) |
O(1) |
|
empty |
O(1) |
O(1) |
- Analysis
- Doubly-Linked List is the superior implementation - pure constant runtime.
- Printing queues
- Handling asynchronous requests - queues ensure that requests are processed in the order in which they are received
- Operating systems use queues to handle requests to write data to a hard disk drive, stream audio and video, and send and receive network packets
- Web servers, on the other hand, use queues to handle incoming requests
- Whenever you see a “buffering” message, it means the software system you are using is probably waiting for incoming data to add to its queue for processing.
- Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move). If you need to add/remove at both ends, consider using a
collections.deque
instead.- A deque (double-ended queue) is represented internally as a doubly linked list. (Well, a list of arrays rather than objects, for greater efficiency.) Both ends are accessible, but even looking at the middle is slow, and adding to or removing from the middle is slower still.
Ruby
Python
A Deque (not to be confused with the "dequeue" operation in queues), or double-ended queue, is a data structure that is closely related to queues. Deques allow us to add and remove items at both sides of a queue.
It is usually pronounced “deck” to avoid confusion with the dequeue method of the regular queue ADT, which is pronounced like the abbreviation “D.Q.”
Method | Worst Case | Best case | Notes |
---|---|---|---|
add_first |
O(1) |
O(1) |
|
add_last |
O(1) |
O(1) |
|
delete_first |
O(1) |
O(1) |
|
delete_last |
O(1) |
O(1) |
|
first |
O(1) |
O(1) |
|
last |
O(1) |
O(1) |
|
empty |
O(1) |
O(1) |
1) Doubly Linked List
- Builds on the Doubly Linked List Queue implementation to add additional methods.
The deque abstract data type is more general than both the stack and the queue ADTs. The extra generality can be useful in some applications. For example, a restaurant using a queue to maintain a waitlist. Occassionally, the first person might be removed from the queue only to find that a table was not available. Typically, the restaurant will re-insert the person at the first position in the queue.
Python
A priority queue is somewhat similar to a queue, with an important distinction: each item is added to a priority queue with a priority level, and will be later removed from the queue with the highest priority element first. That is, the items are (conceptually) stored in the queue in priority order instead of in insertion order.
- Must support at least two operations:
- Insertion: An element is added to the queue with a priority (a numeric value).
- Top item removal: Deletes the element or one of the elements with the current top priority and return it.
A circular queue, also called a ring buffer, is linear data structure in which the operations are performed based on FIFO principle, and the last position is connected back to the first position to make a circle.
Circular queue makes it possible to use a fixed array storage, as the data structure to efficiently implement a queue,
such that the enqueue()
and dequeue()
operations
have O(1)
performance...
enqueue()
and dequeue()
operations
have O(1)
performance......This approach is known as a circular queue, which makes the novel suggestion that the first value in the array isn’t always storage[0]
. Instead, keep track of first, the index for the oldest value in the queue, and last, the index where the next enqueued value will be placed
- In a circular queue, we use a fixed array and two pointers,
head
andtail
.head
indicates the start position of the queue whiletail
indicates the ending position of the queue. - The first value in the array isn’t always at index 0.
- Standard array-based queue structure modified with a modulo index system in order to reuse the cleared space at the beginning of the queue without the need to reallocate space with
push
andshift
operations.
Method | Worst Case | Best case | Notes |
---|---|---|---|
enqueue |
O(1) |
O(1) |
If the array is fixed. |
dequeue |
O(1) |
O(1) |
If the array is fixed. By keeping a variable to track the head index, can be performed in O(1) time. |
A Tree data structure is used to store hierarchical data (it is non-linear). It contains nodes where each node has a parent-child relationship.
- Root: top-level node
- Leaf (or Terminal, External) node: a node with no children
- Internal node: a node with at least 1 child
- Parent (or superior) node: has a child
- child only has 1 parent, but possibly many ancestors, such as the parent's parent
- Siblings: child nodes with the same parent (same level doesn't necessarily mean siblings)
- Neighbor: parent or child
- Subtree: consists of a node and all of its descendants
- Size of a tree: the number of nodes it has
- Path: if
n_1, n_2, ..., n_k
is a sequence of nodes in the tree such thatn_i
is the parent ofn_i + 1
for1 <= i < k
, then this sequence is called a path fromn_1
ton_k
- The length of the path is
k - 1
- The length of a path is the number of edges
- The length of the path is
- Edge (or Link, Line): connection between one node to another. (two vertices).
- All nodes can be reached from root by following these
- Traversal of a tree refers to visiting or performing an operation, at each node.
- Searching a tree refers to traversing all the nodes of a tree where each node is visited only once.
Depth and height are properties of a node:
The depth of a node is the number of edges on the path from the root of the tree to the node
- 0 indexed - the root is the only node at level 0, and its depth is 0
The height of a node is the number of edges on the longest path from the node to a leaf
- 0 indexed - a leaf node will have a height of 0
- Conventionally, an empty tree (tree with no nodes, if such are allowed) has height −1
Degree of a node is the number of children it has
- A leaf has degree 0.
The height of a tree is the depth of its deepest node
- Or equivalently, the height of its root node
- A tree with only a single node (hence both a root and leaf) has depth and height zero.
Level: All nodes of depth
d
are at leveld
in the tree
- 0 indexed - since levels are 0 indexed, the below tree whose last level is 3 has 4 total levels
- The below tree with 4 total levels also has a height of 3 - a tree's last level is represented by
h
, its heightThe diameter (or width) of a tree is the number of nodes on the longest path between any two leaf nodes.
- The tree below has a diameter of 6 nodes.
Degree or (order) of a tree: the maximum degree of a node in the tree
- The degree of a binary tree is 2
- Directional (root to leaves) - Any node can have either zero, one, or two children
- Each child node must have a parent and only one parent
add_child(node)
: adds a child node to a parentremove_child(node)
: removes a child node from a parentcount
: returns the count of nodes starting from the passed in node and including all of its descendants
- When creating a tree object, you can keep all the business logic in a
Node
class.- You just create a root node and build on it: once you have the root you have the whole tree.
- The tree is uni-directional, so you can only build children off of the root.
- And the tree itself is recursive: the BFS and DFS algorithms apply the same way to the root as they do to any child node.
These sub-types are determined by the maximum number of children, not the number of children at any particular node.
The implementation above is generalized to apply to all types of the tree data structure. For specific implementations, augment to include the below restrictions for each type.
A binary tree is a tree data structure in which each node has at most two children, which are referred to as the left child and the right child.
- One of the most typical tree structure
- Any node can have either zero, one, or two children.
- No notion of order
- Note: time complexities assume you are calling these methods on the root node
Avg. Case | Worst Case | Best Case | Notes | |
---|---|---|---|---|
add_child |
O(1) |
O(1) |
O(1) |
|
remove_child |
O(n) |
O(n) |
O(n) |
|
count |
O(n) |
O(n) |
O(1) |
SPACE | O(n) |
---|
Balanced (or height-balanced): The left and right subtrees of every node differ in height by no more than 1. Otherwise, the binary tree is unbalanced.
Full (or Proper, Plane): All nodes except leaf nodes have 2 children.
Complete:
- All nodes except for the level before the last must have 2 children.
- All nodes in the last level are as far left as possible.
Perfect: All interior nodes have two children and all leaves have the same depth.
Left or Right Skewed (or Pathological, degenerate): Each node has 1 child node only.
- Summary:
- Complete trees: must be balanced; can be full
- Full trees: can be balanced; can be complete
- Balanced trees: can be complete; can be full
- Perfect trees: must be balanced, complete, and full
- Balanced, complete, and full trees can be perfect
Examples:
Full, balanced & complete - All nodes have 0 or 2 children, 3 - 2 <= 1
, last level nodes are as far left as possible:
1 --- LEVEL 0
/ \
1 1 --- LEVEL 1
/\ /\
1 1 1 1 --- LEVEL 2
/\ - - -
1 1 --- LEVEL 3
- -
Full & balanced - All nodes have 0 or 2 children, 3 - 2 <= 1
, (Not complete - last level nodes are not as far left as possible)
1 --- LEVEL 0
/ \
1 1 --- LEVEL 1
/\ /\
1 1 1 1 --- LEVEL 2
- /\ - -
1 1 --- LEVEL 3
x x - -
Full - All nodes have 0 or 2 children (Unbalanced - 3 - 1 > 1
, Not complete - level 2 has a node with 0 children):
1 --- LEVEL 0
/ \
1 1 --- LEVEL 1
/ \ -
1 1 --- LEVEL 2
/ \ - x x
1 1 --- LEVEL 3
- -
Complete & balanced - Last level nodes are as far left as possible, 3 - 3 <= 1
(Not full - there is a level 2 node with 1 child):
1 --- LEVEL 0
/ \
1 1 --- LEVEL 1
/\ /\
1 1 1 1 --- LEVEL 2
/\ /\ /\ /x
1 1 1 11 1 1 --- LEVEL 3
- - - -- - -
Balanced - 3 - 3 <= 1
, (Not full - there is a level 2 node with 1 child, Not complete - last level nodes are not as far left as possible)
1 --- LEVEL 0
/ \
1 1 --- LEVEL 1
/\ /\
1 1 1 1 --- LEVEL 2
/\ /\ /x /\
1 11 11 1 1 --- LEVEL 3
- -- -- x - -
A binary search tree (BST) is an ordered or sorted binary tree.
- Search tree: an ordered tree data structure used to store a dynamic set or associative array where the keys are usually strings
- An extension of the Binary Tree with the addition of a restriction:
- The child nodes are stored in a specific order: the left subtree of a node only contains values less than itself and the right subtree only contains values greater than it.
graph searchtree {
size=6;
5 -- 2;
5 -- 7;
2 -- 1;
2 -- 3;
7 -- 6;
7 -- 9;
9 -- 8;
9 -- 13;
}
- Key operations, namely
#find
,#insert
, and#delete
, should run fast on this data structure.
- Note: time complexities assume you are calling these methods on the root node.
Time Avg. | Time Worst | Time Best | Notes | |
---|---|---|---|---|
INSERTION | O(logn) |
O(n) |
O(1) |
Refer to this method by the Avg. Case: you can only add from the root so it has to find the correct node to add to either from the left or right. O(logn) where n is the height of the tree. Worst Case: the tree is one-sided (most extreme case of unbalanced - at this point it's pretty much just a linked list) so you can't take advantage of the logarithmic property of this tree type. |
DELETION | O(logn) |
O(n) |
O(1) |
|
SEARCH | O(logn) |
O(n) |
O(1) |
|
TRAVERSAL | O(n) |
O(n) |
O(n) |
It's often the case that we want to get out all the values held in a binary search tree. And, it's often the case that we want these results to be sorted -- after all, that's one of the main benefits of creating a BST in the first place. To do this, we perform an in-order traversal of the tree. |
SPACE | O(n) |
---|
- Used for searching: the structure of the BST makes looking for the node with the maximum and minimum values very easy.
There is an auto-balancing, or self-balancing, BST called AVL tree where every node also stores the number of children to its left and right and it uses that as a sort of balancing metric. When you insert, you can get it down to the worst case being
O(logn)
, and so you can actually get your inserts to maintain a balanced tree inlog(n)
time every time
Time Avg. | Time Worst | Time Best | Notes | |
---|---|---|---|---|
INSERTION | O(logn) |
O(logn) |
O(1) |
|
DELETION | O(logn) |
O(1) |
||
SEARCH | O(logn) |
O(1) |
||
TRAVERSAL | O(n) |
O(n) |
O(n) |
SPACE | O(n) |
---|
Red black tree is a height-balanced bst data structure which slightly relief on its height balance restriction by introducing red and black color as a balance factor. Unlike AVL tree which strictly require its balance factor(height) should keep the same in each node, red black tree require 5 properties and provide less rotation in both insertion and deletion operations. Below list 5 properties and inspections about this data structure.
It guarantee each insertion at most need 2 rotations and 3 rotations for each deletion.
5 properties are:
- A node must be colored as RED or BLACK.
- Root must be BLACK color.
- Null node must be colored as BLACK.
- There has no two consecutive RED nodes connected. Which means if a node is RED, then its father and children should be BLACK. If a node is RED, its left or right node must BOTH be BLACK.
- Any node has the same BLACK count from itself to the every leaf node.
Due to above properties, red black tree guarantee the height would be at most 2*log(n+1)
, you can prove it by mathematical induction.
Time Avg. | Time Worst | Time Best | Notes | |
---|---|---|---|---|
INSERTION | O(logn) |
O(logn) |
O(logn) |
|
DELETION | O(logn) |
O(logn) |
O(logn) |
|
SEARCH | O(logn) |
O(logn) |
O(logn) |
|
TRAVERSAL | O(n) |
O(n) |
O(n) |
SPACE | O(n) |
---|
- A kind of search tree.
- Also called
digital tree
orprefix tree
. - Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated.
- All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string.
- Keys tend to be associated with leaves, though some inner nodes may correspond to keys of interest. Hence, keys are not necessarily associated with every node.
- Also called a
radix trie
orcompact prefix tree
. - Represents a space-optimized trie in which each node that is the only child is merged with its parent.
- A condensed radix trie, generally yielding better performance due to the reduced number of pointers involved in maintaining nodes of the tree (and they’re significantly more compact than trie implementations requiring more pointers).
- Binary Search Tree: maintains order and has fast search, insertion and deletion.
- Used on databases to perform quick searches (e.g., indexing in Rails used a sorted tree to make lookup time go from linear to logarithmic)
- The others are useful for storing data:
- Operating Systems use a tree structure to store files.
- HTML
DOM
uses a tree data structure to represent the hierarchy of elements. - Expression trees: The tree structure is also used to parse arithmetic and Boolean expressions.
- Crit-bit Tree: used by the Fastly CDN for its surrogate key functionality.
A tree data structure in which each node has at most three children
- Any node can have either zero, one, or two children.
- No notion of order
A tree data structure in which each node has at most one child
- Any node can have either zero or one children.
- No notion of order
A tree data structure in which each node can have an arbitrary number of children
- Any node can have from 0 to
n-1
children. - No notion of order
- Note: complexities for the Poly Tree and Binary Tree are the same.
- Note: Time complexities assume you are calling these methods on the root node.
Avg. Case | Worst Case | Best Case | Notes | |
---|---|---|---|---|
add_child |
O(1) |
O(1) |
O(1) |
|
remove_child |
O(n) |
O(n) |
O(n) |
|
count |
O(n) |
O(n) |
O(1) |
SPACE | O(n) |
---|
Ruby
Python
A graph is a set of vertices and edges that form connections between the vertices.
- Node or vertex: A point, usually represented by a dot in a graph
- Edge: This is a connection between two vertices
- There are two well-known implementations of a graph, the adjacency matrix and the adjacency list.
- Graphs can be classified based on whether they are undirected or directed.
- An undirected graph simply represents edges as lines between the nodes. There is no additional information about the relationship between the nodes than the fact that they are connected
- In a directed graph, the edges provide orientation in addition to connecting nodes. That is, the edges, which will be drawn as lines with an arrow, will point in which direction the edge connects the two nodes
- weighted graphs associate edges with a numerical value indicating some extra information