Giter Club home page Giter Club logo

algorithms-hw1's Introduction

Algorithms-HW1

Build Status

Minumum Spanning Tree with Kruskal and Prim algorithms.

An hands-on experience about algorithms over graph and graph theory with the focus on efficiency and, obviously, correctness.

Usage

The repositores comes with some utility commands that allows you to compile, test and bench owr algorithms. We use make to automatize this process.

Available commands

  • make all, to compile all the algorithm sources in this project.
  • make ALG, where ALG is one of KruskalNaive, KruskalUnionFind, KruskalUnionFindCompressed, PrimBinaryHeap, PrimKHeap to compile given algorithm sources.
  • make testall, to run tests on our algorithms.
  • make testall_explicit, to run tests with verbose output on our algoritms.

Within the Makefile we provieded some variables to modify our pipeline. In particular you can use your own compiler rewriting the CXX flag. Other variables can be modified, for example CXXFLAGS, OUT_DIR and EXT.

Example

make CXX="g++" CXXFLAGS="-O3 -std=c++17 -I Shared" OUT_DIR="build" EXT="exe" all

Scripts

If you are a Windows user you can look at test.ps1, testall.ps1 and time.ps1 scripts in order to run tests and bench algorithms.

If you are a Linux user, we have created a porting of the above scripts. You can look at test.sh, testall.sh and time.sh. Note that these Linux scripts have less features than their Windows counterpart.

Project Structure

The project is structured as a unique Visual Studio solution containing multiple subprojects, one for every implemented algorithm. The code for each project is stored in a folder with the same name of the related algorithm. These projects are:

  • KruskalNaive: Kruskal MST with simple DFS cycle detection;
  • KruskalUnionFind: Kruskal MST implemented with Disjoint-Set (Union-Find) data structure, with union-by-size policy;
  • KruskalUnionFindCompressed: Kruskal MST implemented with Disjoint-Set (Union-Find) data structure, with union-by-rank policy and path-compression;
  • PrimBinaryHeap: Prim MST with a Priority Queue based on a Binary Heap;
  • PrimKHeap: Prim MST with a Priority Queue based on a K-ary Heap.

The shared data structures and utils are stored in the Shared folder.

The project comes with some extra folders:

  • benchmark: it contains CSV benchmarks of the algorithm as well as the script used to analyze them (analysis.py);
  • datasets: it contains the input data for the graphs given by our professor, i.e. 68 random connected, weighted and non-directed graphs up to 100K nodes and ~130K edges;
  • test: it contains 68 test graphs and their exact MST value, used test our algorithms' correctness.

Related Projects

Some of the data-structures created for this project are general enough that we were able to copy their source to separate, indipendent repositories, each with its own documentation and unit tests. These repositories are:

Authors

Bryan Lucchetta

Luca Parolari

Alberto Schiabel

License

This project is MIT licensed. See LICENSE file.

algorithms-hw1's People

Contributors

jkomyno avatar lparolari avatar

Watchers

 avatar  avatar  avatar

Forkers

lparolari

algorithms-hw1's Issues

Implement a k-ary Heap

Implement a k-ary Heap data structure to find it out if it's possible to further optimize our Prim algorithm implementation. BinaryHeap.h should be used as a template for creating Shared/KHeap.h. The value of k should be passed as template parameter (template <..., int k>).

K-ary Heaps might be slightly worse than Binary Heaps theoretically (most operations become O(k*log_k(n)) instead of O(log_2(n))) but in general they are known to be more performant in practice, due to cache locality. They are considered better than Fibonacci Heaps too in practice.

Useful reference: https://www.geeksforgeeks.org/k-ary-heap/

Create Python script for analyzing the benchmark CSV files

We should decide what kind of analysis we should perform on the CSV files collected by the benchmark.ps1 script in the benchmark folder.
Ideally, we should be able to gather 20 benchmarks each for KruskalUnionFind and PrimBinaryHeap (they take roughly 1 minute to complete) and 8 benchmarks for KruskalSimple (at the moment, this takes ~12.5 hours, see issue #2).

These benchmarks should be run on the same machine with no other foreground process in execution but the benchmark script. The WiFi card should be disabled as well.

The CSV files should be imported in pandas dataframes. Their execution time should be averaged out between the different CSV files that belong to the same executable. For each executable, we should show how the time grows with the number of nodes, with the number of edges, and maybe with the combination of the two (both nodes and edges).

Other visualizations can be proposed and discussed before implementing them

Refactor AdjListGraph.h

We should remove duplicate edges directly when the graph is constructed, not when the edges are returned via a getter method.

We should:

  • remove the hacky maintain_duplicates boolean flag behaviour in get_edges, making the method O(1)
  • store the edges in std::pair directly in add_edge
  • update the other parts of the code in which AdjListGraph is used accordingly

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.