Most of this notes are from the book:
C++ Data structures and algorithm design Principles. Author: John Carey, Shreyans Doshi and Payas Rajan
The code here is very similar with the solutions proposed from the book.GitHub
The main differeces:
- Use of smart pointer
- Google Style
- Use of headers files
Perhaps the mos important principle for the good algorithm designer is to refuse to be content. — Aho, Hopcroft, and Ullman
NO.
Time complexity is commonly estimated by counting the number of elementray operations performed by the algorithm, supposing that each elementary operation takes a fixed amount of time to perform.
Since an algorithm’s time may vary amont different inputs of the same size, one commonly considers the worst-case time complexity. Therefore, the time complexity is commonlu expressed unsing big O notation O(n), where n is the input size.
Time Name | Complexity Class | Running Time | Examples algorithms |
---|---|---|---|
constant | O(1) | find median value in a sorted array | |
inverse Ackermann | O(a(n)) | Amortized time per operation using disjoint set | |
Iterated logarithmic | O(log* n) | Distributed coloring of cycles | |
Log logarithmic | O(log log n) | Amortized tiem per opration using a bounded priority queue | |
logarithmic | DLOGTIME | O(log n) | Binary search |
Polylogarithmic | poly(log n) | ||
Fractional power | O(n^c) | seaching in a Kd-tree | |
Linear | O(n) | Finding the smallest or larges item in a unsorted array | |
n log-start n | O(n log* n) | Seidel polygon triangulation algorithm | |
linearithmic | O(n log n) | Fastes possible comparison sort; FFT | |
quasilinear | n poly(log n) | ||
Quadratic | O(n^2) | Bubble sort; Insertion sort; Direct Convolution | |
Cubic | O(n^3) | Naive mulplcation of two nxn matrices’ calculation partial correlation | |
Polynomial | P | 2^(O(log n)) | karmarkar’s algorithm for linear programming` |
Quasipolynomial | QP | 2^(poly(log n)) | Bestknown approximantion algorithm for the steiner tree problem |
sub exponetial | SUBEXP | O(2^n^E) | Constais BPP |
sub exponetial 2Def. | 2^O(n) | Best-known algorithm for integer factorization | |
Exponetial | E | 2^o(n) | Solving traveling salesman problem using Dynamic programming |
Exponetial | EXPTIME | 2^poly(n) | Solving matrix chain multiplication via bruteforce search |
Factorial | O(n!) | Solving traveling salesman problem via brute force | |
Double exponetial | 2-EXPTIME | 2^2^(poly(N)) | Deciding the truth of given statement in presburger arithmetic |
The choice of the right structure for holding data, also known as a data structure, is crucial for ensuring reliability, performance, and enabling the required functionalities in the application.
Time complexity indicates the relative amount of time required, in proportion to the size of the data, to perform a certain operation.
Store all the elements in a single chunk of memory.
To access any element at index i, we can get it with the generic formula: BA + i * (sizeof(type)). In this case, the access time is always constant. O(1).
The two main typoes of arrays are:
- Static: has a lifetime only inside its declaration block.
- Dynamic: provides better flexibility since the programmer can determine when it should be allocated and whe it should be deallocated.
// Static array
int arr[size];
// Dynamic array
int* arr = new int[size];
//or
std::unique_ptr<int> arr = std::make_unique<int>();
std::shared_ptr<int> arr = std::make_shared<int>();
A static array is aggregated, which means that it is allocated on the stack, and hence gets deallocated when the flow goes out of the function. On the other hand, a dynamic array is allocated on a heap and stays there until the memory is freed manually. (not true for smart pointers) Since all the elements are present next to each other, when one of the elements is acessed, a few elements next to it are also brought into the cache. Hence, if you want to access those elements, it is a very fast operation as the data is already present in the cache. This property is alo known as cache locality.
hold the data in multiple chunks of memory, also known as nodes, which may be placed at different places in the memory.
Each node contains the data to be stored in that node and a ponter to the next node. The last node contains a NULL pointer to indicate the end of the list. To reach any element, we must start from the beginning of the linked list, that is, the head, and follow the next pointer until we reach the intended element. To access the element present at index i, we need to traverse through the linked list and iterate i times. O(n).
To insert or delete any element the operation is really small and quite fast for a linked list compared to arrays.
A linked list can’t provide cache locality at all since the elements are not stored contiguously in memory.
Contiguous Data Structures | Linked Data Structures | |
---|---|---|
All the data is stored nex to one another in memory | Data is stored in nodes, which may be scattered across the memory | |
Accessing random element is immediate due to contiguous data | Accessing random elements is linear and slower | |
Since data is stored contiguous, traversal is faster due to cache locality | traversal is a bit slower as there is no cache locality | |
No memory overhead on top of the memory required for storing the elements themselves | Extra memory is required to sotre pointers in each node | |
Parameter | Array | Linked List |
Random access | O(1) | O(n) |
Insertion at end | O(1) | O(1) |
Insertion in the middle | O(n) | O(1) |
Cache locality | Yes | No |
Drawbacks of std::array:
- The size of std::array must be constant and provided at compile time, and fixed. So, we can’t change it at runtime.
- Due to size limitation, we can’t insert or remove elements from the array.
- No custom allocation is possible for std::array. It always uses stack memory
std::vector does not require us to provide its length during initialization.
We can insert elements using:
- push_back it’ll insert elements at the end. If there’s enough space, push_back only takes O(1) time to insert something at the back. If there’s not enough space, it will have to copy/move all the elements, which will take O(n) time.
- insert it takes the iterator as the first parameter for the position, and it can be used to insert the element in any location. insert need to shift the elements that come after the given iterator, So it takes O(n) time.
push_back and insert, they first construct the element, and then either copy or move the element to its new location inside the vector’s buffer. emplace_back and emplace, they forward the arguments to the constructor at the appropriate location.
We can remove elements using:
- pop_back it removes the last element from the vector, effectively reducing the size by one. O(1)
- erase it can remove a single element provided by the iterator pointing to it, and to remove a range of elements provided by the iterator, where the range is defined by the element to be removed (inclusive) and the last element to be removed (exclusive). O(n)
Insertion and deletion in the middle of the data strucures are very inefficient operation for contiguous data structures. The purpose of std::forward_list is to provide some additional functionality without compromising performance compared to a basic linked list.
We can insert elements using:
- push_fron it’ll insert a element at the front. Since forward_list doesn’t have direct acess to the last element. O(1)
- insert_after Insert an new elemeent in a linked list requires updating the next pointer of the element, since traversing backward is not allowed in forward_list. O(1)
forward_list also provides emplace_front and emplace_after, which is similar to emplace to a vector.
We can remove elements using:
- pop_front it rmeoves the first element. O(1)
- erase_after it can remove a single element, taking an iterator to its previos element. it can remove multiple elements in a range, taking an iterator to the element before the first eleemnt of the range and another iterator to the las element. The time complexity is linear to the number of elements that are erased because the deletion of elements can’t be done via deallocating just a single chunk of memory. the function needs to deallocate each of them separately.
functions to remove elements based on their values.
- remove It removes all the elements that match the given element based on the equality operator defined for the type of the value. O(n * logn)
- remove_if it takes a predicate as a parameter, which is a function taking an element of the value type as a parameter, and a Boolean as the return value. O(n)
There is no operator[] int the *forward_list. There are other utility functions that we can use, such:
- advance
- next
- prev
it’s a bidirectional linked list, also known as a doubly-linked list. it’s one extra pointer to point to the previous element. Thus, it provides us with a way in which to traverse backward. We can also store the size and the last element to support fast push_back and size operation.
although push_front, insert, pop_front and erase have the same time complexity as equivalent function for std::forward_list, these are slightly more expensive for std::list.
As we can see, the number of operation is constant even in the case of std::list; however, compared to forward_list, we have to fix both the Prev and next pointers
Unlike vector, linked list-based iterators ares safer for insertion and deletion operation because the element will not be shifted or moved.
std::vector<int> vec = {1,2,3,4,5};
auto it4 = vec.begin() + 4;
// it4 now points to vec[4]
vec.insert(vec.begin() + 2,0);
// vec becomes {1,2,0,3,4,5}
it4 is invalid now, since it comes after the insertion position. Acessing it will lead to undefined behavior.
std::list<int> lst = {1,2,3,4,5};
auto l_it4 = next(lst.begin(), 4);
lst.insert(next(vec.begin(), 2),0);
// l_it4 remains valid
A lot of operation, such as size, push_back, and pop_back, are provided, which operate with a time complexity of O(1).
it’s short for double ended queue. try to overcome the costly push_front and pop_front
The STL guarantees the following time complexities for different operation of deque:
- O(1) push_front pop_front push_back pop_back
- O(1) random access to all the elements
- Maximum of N/2 steps in the case of insertion or deletion in the middle
Follow LIFO Last In First Out.
Provides:
- empty
- size
- top call back
- push call *push*_back
- pop call *pop_*back
- emplace
Follow FIFO First In First Out.
- push means push_back
- pop means pop_front
Provides a heap, fast access to the minimum or maximum element from the container with O(1).
Insert has O(log n), while delection can only be performd for the min/max element, which always stays on the top.
- Problems solved with trees and graphs
- B-trees
- Huffman Tree
- graph coloring
- assignemnet problems
- minimum distance
This data is inherently hierarchical in nature. This type of data is difficult to manage using simple arrays, vectors or linked lists.
This problem can solved using a data structure called a tree. All of the objects are known as the nodes of a tree, while the paths leading from one node to another are known as edges.
This structure is called a graph. Such structures commonly used by varioys social networks to represents theris users and the connections between them.
The main node, which is not dependent on any other node, is also known as root node and is usually representeded at the top.
we’ll assume that any person can have, at most, two subordinates.
struct Node {
std::string position;
std::shared_ptr<Node> left;
std::shared_ptr<Node> right;
explicit Node(const std::string& pos):position(pos), left(nullptr), right(nullptr) {}
};
The Tree need to have a root and to make easier to create a tree we add a static fuction to create on
class Tree {
private:
using node_ptr = std::shared_ptr<Node>;
node_ptr root;
static Tree create_tree(std::string pos) {
Tree tree;
tree.insert("", pos);
return tree;
}
To add a subordinate of an employeem, you need a function that will us find this particular node based on a value.
Two functions was declared, the first one start looking in the root node. The second one look recursively until find a nullptr or the desirable value.
node_ptr find_helper(node_ptr node, std::string pos) {
if (node == nullptr)
return nullptr;
if (node->position == pos)
return node;
auto left_found = this->find_helper(node->left, pos);
if (left_found)
return left_found;
return this->find_helper(node->right, pos);
}
node_ptr find(const std::string& pos) {
return find_helper(this->root, pos);
}
To add a new function. First we make sure that the root has a value This part of the code is used in the Tree::create_tree
bool insert(const std::string& parent_pos, const std::string& pos) {
if (root == nullptr) {
root = node_ptr(new Node(pos));
return true;
After that we find the parent_node and look in the two branches
} else {
auto parent_node = this->find(parent_pos);
if (!parent_node) {
std::cout << parent_pos << " not in the tree" << std::endl;
return false;
}
if (parent_node->left && parent_node->right) {
std::cout << parent_pos << "already has 2 subordinates" << std::endl;
return false;
} else {
if (!parent_node->left)
parent_node->left = node_ptr(new Node(pos));
else
parent_node->right = node_ptr(new Node(pos));
}
}
return true;
}
};
in the main function we created a function to make the insert easier
void try_to_add(Tree t, const std::string& f, const std::string& s) {
if (t.insert(f, s))
std::cout << "Added " << s << " in the tree" << std::endl;
else
std::cout << "Couldn't add " << s << " in the tree" << std::endl;
}
int main() {
Tree tree = Tree::create_tree("CEO");
tree.insert("CEO", "Deputy Director");
try_to_add(tree, "Deputy Director", "IT Head");
try_to_add(tree, "Deputy Director", "Marketing Head");
try_to_add(tree, "IT Head", "Security Head");
- Preorder Traversal
The prefix Pre indicates that the parent node is visited before its children.
Traversing the tree above in this method goes like this:
CEO, Deputy Director, IT Division, Marketing Head, Security Head, App Development, Logistics Head, Public Relations Head.
- In-order traversal
we visit the more left and below node, the the parent node, and finally the
right node.
Security Head, IT Division, App Development, Deputy Director, Logistics Head, Marketing Head, Public Relations Head, CEO.
- Post-order traversal
We visit both the children followed by the parent node.
Security Head, App Development, IT Division, Logistics Head, Public Relations Head, Marketing Head, Deputy Director, CEO.
- Level order traversal
we visit each level of the tree, from top to bottom, and from left to right.
CEO, Deputy Director, IT Division, Marketing Head, Security Head, App Development, Logistics Head, Public Relations Head.
*
In order to optimize the time complextity, we need to optimize the height of the tree. With a Balanced Search Tree the height is guaranteed to be no larger than O(2lg N).
A 2-3 search treee is a tree that either a node is empty or:
- A 2-node, with one key and two links, a left link with smaller keys, and a right link with larger keys (common node)
- A 3-node, with two keys and three links a left link with smaller keys, a
middle link with keys between the node’s and righ link with larger keys.
Search and insert operation in a 203 tre with N keys are guaranteed to bisit at most lg N nodes. However, most of the implementetation is inconvenient.
Introduction to Algorithm The idea behind red-black BST’s is to encode 2-3 trees bt starting with standard BST and adding extra informatio to encode 3-nodes. We think of the links as being of two different types: red links, which bind together two 2nodes to represent 3-nodes, and black links, which bind together 2-3 trees.
For most of scenarios, just going through all the elemenst linearly and matching the values would be extremely time-cosuming, especially considering the vast amount of records that are stored. it will have a time complexity of O(n).
The idea is to represent each value with a possibly unique key and, later on, use the same key to check for the presence of the key or to retrive a corresponding value.
we use a hash function to transoform any value of any data type to an integer in the desired range. This fuction can be simple a modulo function, as we’ll demonstrate now.
#include <vector>
using uint = unsigned int;
class hash_map {
public:
explicit hash_map(size_t n) {
data = std::vector<int>(n, -1);
}
As show here, we’re using -1 to indicate the absence of an element. N is the size o the table and it’ll be used in the module function.
void insert(uint value) {
int n = data.size();
data[value % n] = value;
std::cout << "Inserted" << value << std::endl;
}
We can’t insert two elements with the same hash value, we must to drop one of them. This is called a collision.
We a collision when multiple keys had the same hash value.
In this method instead of storing a single key in the hash table, we’ll store a linked list for each index.
private:
std::vector<std::list<int>> data;
};
void insert(uint value) {
int n = data.size();
data[value % n].push_back(value);
std::cout << "Inserted" << value << std::endl;
}
bool find(uint value) {
int n = data.size();
auto& entries = data[value % n];
return std::find(entries.begin(), entries.end(), value) != entries.end();
}
void erase(uint value) {
int n = data.size();
auto& entries = data[value % n];
auto iter = std::find(entries.begin(), entries.end(), value);
if (iter != entries.end()) {
entries.erase(iter);
std::cout << "Removed" << value << std::endl;
}
If the has table is very small compared to the number of keys to be stored, there will be a lot of collision, and the list will belonger on average. On the oher hand, if we keep a very big hash table, we may end up having very sparse data and end up wasting memory.
The load factor indicates the average number of keys present per list in our hash table. It can be computed using the following formula:
\begin{equation} load\ factor = \frac{number\ of\ keys}{size\ of\ hash\ table} \end{equation}
- if load factor == 1 This is an ideal scenario, this has O(1) for all the operations and the space will be utilized properly.
- if load factor < 1 We are not storing one key per list and wasting some space.
- if load factor > 1 We are storing more than one key per hash. and out find and removal function will be a bit slower on average.
We store all elements inside the hash table instead of chaining the elements. To accommodate all the elements, the size of the has table must be greater that the number of elements. The idea is probe if a cell correspoing to aparticilar hash value is already occupied.
If there is a collision at a particular has value, we simply look at the subsequent has value for an empty cell and insert our element once we find room for it.
if the data is clustered, there is, the data is distributed in such a way that there are some groups around which the frequency of value is very high. Most of the keys will be probed to some consecutively after that, and it will slow down our searching.
if there is a collision at a particula hash value, we go to the position hash(x + 12), and then hash(x + 22), and so on.
We keep two (it’s can have more than two) hash tables of the same size, each with their own unique hash fuction. Any element can be present in either of the hash tables, and its position is based on the correspoing hash function.
- Any element can be present in any of the two hash tables.
- Any element can be moved to another location in the future, even after insertion.
For lookup, O(1).
However, An insertin function, first checks whether it is possible to insert the new element in the first hash table. If so, inserts the elements there. but if that position is occupied by a preexisting element, we move this to the next table. And so on, until we are able to find empty slots for all elements
This could end up in a cycle and the recusion may lead to an infinite loop. To address this, once we’ve identified the cycle, we need to rehash everthing with new hash functions.
They are extremely space-efficient compared to hash tables, but at the cost of deterministic answers.
We will use multple hash functions. The fundamental idea is that instead of storing the avtual values, we store an array of Booleans indicating whether or not a value is present.
A divide-and-conquer type algorithm breaks the given problem into smaller parts, tries to solve the problem for each part, and, finally combines the solution for each part into the solution for the whole problem.
We are given a sorted sequence of positive integers and required to find out if a number, N, exists in the sequence.
We could iterate over the entire sequence, checking whether each element is equal to N. this is called a linear search. However, it is inefficient and does not take into account that the given array is sorted. O(n) algorithm.
An alternative solution that explits the fact that the sequence is sorted is as follow.
- Start with the whole sequence in range
- Compare the middle eleemnt of the current range with N, Let this middle
element be M.
- If M == N, we found N in the sequence and, therefore, the search stops.
- Otherwise, we modify the range:
- If N < M, it means that if N were to be present in the range, it would be to the left of M and, therefore, we can safely remove all the elemts to the right of M from the range.
- If N > M, the algorithm removes all the elements to the left of M form the range.
- If more than 1 element remains in the range, go to step 2.
- Otherwise, N does not exists in the sequence and search stops.
1. Algorithmic Thinking, Peak Finding - YouTube
A posistion 2 is a peek if and only if 2 >= 1 and 2 >= 3. in edge you just need to look to one side. 9 >= 8.
a | b | c | d | e | f | g | h | i |
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
1 | 2 | … | n/2 | … | n-1 | n |
we could iterate over the entire sequence, checking each element. O(n) algorithm.
- Start with the whole sequence in range
1 2 … n/2 -1 n/2 n/2 +1 … n-1 n - Compare the middle eleemnt of the current range with the neighbors
- we modify the range:
- If a[N/2] < a[n/2 -1], the only look at left half
1 2 … n/2 - If a[N/2] < a[n/2 + 1], the only look at right half
n/2 … n-1 n - Otherwise, a[n/2] is a peek
- If more than 1 element remains in the range, go to step 2.
O(log2 n) algorithm
O(n*m) complexity, n and m size of the matrix
- Pick middle column j = m/2
- Find gloval maximum on clumn j at (i,j)
- Compare (i, j-1), (i,j), (i, j+1)
- Pick left columns of (i, j-1) > (i,j)
- Similarly to right
- (i,j) is a 2d peak if neither condition
- Solve the new proble with half number of columns
- When you have a single column, find global maximum and you’re done
[[file:/mnt/hdd/nardo/Documents/Books/Data and Algorithm/[Thomas-H.-Cormen,-Charles-E.-Leiserson,-Ronald-L.(z-lib.org).pdf][MIT Introduction to Algorithms]]
- Divide the problem into a number of subproblems that are smaller instaces of the same problem.
- Conquer the subproblems by solving them recusively. If the subproblem sizes are small enough, however, just solve the subproblems in a traightforward manner.
- Combine the solutions to the subproblems into the solution for the original problem.
3. Insertion Sort, Merge Sort - YouTube
- Divide: Divide the n-elemnt sequence to be sorted into two subsequences of n/2
elements each
- Conquer Sort the two subsequences recusibely using merge sort
- Combine Merge the two sorted subsequences to produce the sorted answer
Introduction to Algorithm Algorithm design 5 1 Quicksort Overview 12 min - YouTube
Algorithm | Best case | Average case | Worst case |
---|---|---|---|
Merge sort | O(n log n) | O(n log n) | O(n log n) |
Quicksort | O(n log n) | O(n log n) | O(n^2) |
The advantage between Merge sort and quicksort, is the memory usage.
Given an array and a pivot element, P, in the array, the partition operation does two things.
- It divides the original array int two subarrays, L and R, where L contains all the elements of the given array that are less than or equal to P, and R contains all elements of the given array that are greater than P.
- It reorganizes the elements in the array in the order L, P, R.
Algorithm:
The worst case secnario for Quick sort is a sorted list, in this case Quick sort operate in O(n^2).
Given a randomly ordered set of elements, S, you are asked to find ith smallest element in S.
Sorte S and then selecting the ith element. O(nlogn).
O(n)
- Divide the input vector, V, into V1, V2, V3 .. Vn/5, each containing five eleemnts (the last can have less)
- We sort each Vi
- For each Vi, find the median, mi, and collect all medians into a set, M.
- Find the median element, q, of M.
- Use the partition operation on V using q as the pivot to get two vectors, L and R. L contains all the elements less than q and R contains all the elements greater than q.
- L has (k -1) elements:
- if k = i, then q is the ith eleemnt in V
- if k < i, then set V = L and go to step 1.
- if k > i, the set V = R and go to step 1.
Assumes that each of the n input eleemnts is an integer in the range 0 to k, for some integer k.
std::vector C(k, 0);
for (auto i : A) {
++C[i];
}
// C[i] now contains the number of elements equal to i
for (auto i : C) {
if (i == C.begin())
continue
C[i] = C[i] + C[i - 1];
}
// C[i] now contains the number of elements less than or equato to i
for (auto i = A.rbegin(); i != A.rend(); ++i) {
B[C[A[i]]] = A[i];
--C[A[i]];
}
Sorting the least significant digit first. then is sorts the entire deck again on the second-leas significant digit and so on.
Assumes that the input is generated by a random process that distributes elements uniformly and independently over the interval [0,1)