
Group Holycow 955648 - Operating Systems


Final Submission

The following features are implemented in our final submission to the OS challenge:

  • Multithreading with a thread pool (6 threads)
  • A priority-based cost function for scheduling tasks
  • Multiprocessing
  • A hash table for caching

Hardware Specification

All tests have been run on the same computer (using Vagrant). The specifications of the computer are listed below:

| Specification    | Value           |
|------------------|-----------------|
| CPU              | Intel i7-9750H  |
| CPU clock speed  | 2.6 GHz         |
| No. of CPU cores | 4               |
| RAM amount       | 32 GB           |
| RAM type         | 2666 MHz DDR4   |
| OS               | Windows 11 22H2 |

Although all tests were conducted on the same machine, background OS activity may still introduce some measurement error. The tests were run back-to-back.

Experiments

Sequential Model vs Multithread Model

Author: Thibaud Bourgeois, s221592
Branch: MultiThread-Basic, Thread-Pool

Experiment Motivation

This experiment seeks to determine how much (if anything) we can gain from multithreading, i.e. handling multiple client requests simultaneously with threads. The reason we bother with these experiments at all is that the computer running the tests has a multi-core CPU with 4 cores; splitting the workload between these cores should yield a speed-up.

It was clear that we needed a first parallelization method to drastically improve the performance of our server. To start, we decided to implement a multithreaded version of it.

In this experiment, we want to compare an implementation using only one process (sequential) with an implementation using multiple threads. The motivation is to address the weakness described above. The thread model is somewhat similar to the process model, but threads are more lightweight than processes: switching between threads is much faster than switching between processes, and creating and terminating threads is also much faster. Furthermore, threads share the same address space, as opposed to processes.

The multithreading implementation was done in several steps, to improve its performance and to make it safe:

Thread pool

First of all, we had to limit the number of threads created in order to bound CPU usage; beyond a certain point, adding more threads no longer helps. This is why a thread pool was necessary: it ensures we don't create an unbounded number of threads. We used first-in-first-out (FIFO) scheduling to start and wait for threads, i.e. once we had started the maximum number of threads we called join() on the thread that was started first, and started the new thread when join() returned.
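
A minimal sketch of this FIFO start/join scheme (MAX_THREADS, serve_forever() and handle_connection() are illustrative names, not necessarily our exact code):

```c
#include <stdlib.h>
#include <pthread.h>
#include <sys/socket.h>

#define MAX_THREADS 6               /* bound on concurrently running worker threads */

/* Hypothetical worker: solves one client request, then frees its argument. */
void *handle_connection(void *arg);

void serve_forever(int listen_fd) {
    pthread_t pool[MAX_THREADS];
    long started = 0;               /* total number of worker threads started so far */

    for (;;) {
        int *conn_fd = malloc(sizeof *conn_fd);
        *conn_fd = accept(listen_fd, NULL, NULL);

        if (started >= MAX_THREADS)
            /* FIFO: wait for the oldest running thread before starting a new one */
            pthread_join(pool[started % MAX_THREADS], NULL);

        pthread_create(&pool[started % MAX_THREADS], NULL,
                       handle_connection, conn_fd);
        started++;
    }
}
```

One weakness of the strict FIFO join is that if the first-started thread gets a hard request, its slot blocks new work even when other threads have already finished; this is the "FIFO problem" referred to in the later experiments.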

Thread-Safe Implementation (Locks and Semaphore)

But even though we obtained comfortable performance (see results below), we got lucky: there was still a major bug in the code. The shared data structure we created to queue the connections was not thread safe; we were still subject to race conditions. Hence, we needed to protect those calls (enqueue() and dequeue()) with a mutex lock.
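
A sketch of the guarded queue operations, assuming a simple singly linked list of accepted connections (the node and queue types here are illustrative, not necessarily our exact code):

```c
#include <stdlib.h>
#include <pthread.h>

typedef struct node {
    int conn_fd;                    /* the accepted connection to hand to a worker */
    struct node *next;
} node_t;

typedef struct {
    node_t *head, *tail;
    pthread_mutex_t lock;           /* guards every access to head/tail */
} queue_t;

void enqueue(queue_t *q, int conn_fd) {
    node_t *n = malloc(sizeof *n);
    n->conn_fd = conn_fd;
    n->next = NULL;

    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = n; else q->head = n;
    q->tail = n;
    pthread_mutex_unlock(&q->lock);
}

/* Returns -1 when the queue is empty. */
int dequeue(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    node_t *n = q->head;
    if (n) {
        q->head = n->next;
        if (!q->head) q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);

    if (!n) return -1;
    int fd = n->conn_fd;
    free(n);
    return fd;
}
```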

Notes about condition variables and semaphores

At the beginning of the experiment I had (mistakenly) added condition variables to avoid threads busy-waiting on the queue while waiting for it to fill up. But in our situation this was not really useful: when a worker released the CPU, there was no other thread to hand it to, since all the workers were in the same waiting situation. This would have paid off if several threads were actually cooperating on shared work while others waited.
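
For reference, the condition-variable variant that was tried and then dropped looks roughly like this sketch (it reuses the queue_t from the previous sketch, extended with a hypothetical pthread_cond_t not_empty field):

```c
/* Assumed addition to queue_t: pthread_cond_t not_empty; */

int dequeue_blocking(queue_t *q) {
    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)                     /* no busy waiting: sleep until signalled */
        pthread_cond_wait(&q->not_empty, &q->lock);
    node_t *n = q->head;
    q->head = n->next;
    if (!q->head) q->tail = NULL;
    pthread_mutex_unlock(&q->lock);

    int fd = n->conn_fd;
    free(n);
    return fd;
}

/* enqueue() would call pthread_cond_signal(&q->not_empty) after inserting. */
```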

Setup

All tests regarding this experiment have been executed on the same computer. We tried to keep the workload constant during the test session; however, this is nearly impossible, and background activity could introduce errors. The configuration parameters of the client were the following:

Run Configuration

| Setting          | Value          |
|------------------|----------------|
| SERVER           | 192.168.101.10 |
| PORT             | 5003           |
| SEED             | 3435245        |
| TOTAL            | 100            |
| START            | 0              |
| DIFFICULTY       | 30000000       |
| REP_PROB_PERCENT | 20             |
| DELAY_US         | 600000         |
| PRIO_LAMBDA      | 1.50           |

Results

Below are the results of the tests:

| Run        | Sequential | Threads    |
|------------|------------|------------|
| First run  | 95.220.820 | 24.671.300 |
| Second run | 97.668.541 | 25.518.803 |
| Third run  | 92.979.926 | 24.280.959 |
| Average    | 95.289.762 | 24.823.687 |

Discussion and Conclusion

Looking at the results, there is a noticeable difference between the averages: the multithreaded model is clearly faster than the sequential model. However, there is still a problem with the thread model: it starts more threads than we have CPU cores, which results in context switches. We will now try to fix this.

Optimizing the Maximum Number of Threads

Author: Thibaud Bourgeois, s221592
Branch: MultiThread-Basic, Thread-Pool

In this experiment we address the problem of having more threads than CPU cores. The reason this is a problem is that if we start more threads than we have cores, the scheduler has to perform context (thread) switches, which take time even though thread switching is very efficient and much cheaper than process switching. The motivation is to avoid these context switches. This should be as simple as setting the number of threads equal to the number of cores; however, we will try several different numbers of threads to see which one actually gives the best performance.

Run Configuration

| Setting          | Value          |
|------------------|----------------|
| SERVER           | 192.168.101.10 |
| PORT             | 5003           |
| SEED             | 3435245        |
| TOTAL            | 100            |
| START            | 0              |
| DIFFICULTY       | 30000000       |
| REP_PROB_PERCENT | 20             |
| DELAY_US         | 600000         |
| PRIO_LAMBDA      | 1.50           |

Results

We ran the experiment with the maximum number of threads set to 3, 4, 5, 6, 7, 8 and 20. The results can be found in the table below. The listed number of threads counts only the threads actively computing client requests; in addition to those there is the main thread, which manages the workers and initially sets up the server.

| Run        | 3 Threads  | 4 Threads  | 5 Threads  | 6 Threads  | 7 Threads  | 8 Threads  | 20 Threads |
|------------|------------|------------|------------|------------|------------|------------|------------|
| First Run  | 29.801.572 | 24.671.300 | 20.899.400 | 19.063.160 | 20.425.256 | 21.532.380 | 23.463.315 |
| Second Run | 29.603.170 | 25.518.803 | 21.005.840 | 23.730.163 | 20.681.164 | 21.664.842 | 22.048.472 |
| Third Run  | 28.995.181 | 24.280.959 | 20.995.528 | 20.129.427 | 22.083.564 | 21.868.455 | 22.498.278 |
| Average    | 29.464.635 | 24.818.339 | 20.966.867 | 20.882.172 | 21.050.894 | 21.688.119 | 22.662.390 |

Below is a graphical representation of the results:

Discussion and Conclusion

Looking at the graph, the best number of threads seems to be 6. We initially expected the best number to be 3 (4 including the main thread), because this would match the number of cores and thereby avoid expensive context switches. A fair counterpoint is that 5 (6 including the main thread) could perform better, because the main thread is idle most of the time, so switching between the main thread and a thread handling client requests can be worthwhile. Why 6 is the best is harder to answer; a guess is that starting a few extra threads mitigates the FIFO problem described in the thread-pool section above (idle capacity while the first-started thread takes a long time).

Scheduling Threads with a Priority-based Cost Function

Author: Thibaud Bourgeois, s221592
Branch: Priority-Queue

In this experiment, the goal is to optimize the score by choosing tasks based on their priority. First we sort tasks by their priority value alone, and then by the cost function priority / (end − start), i.e. priority per unit of search work.

Elements are put into a priority queue, implemented as a sorted linked list. Worker threads then draw from this queue, taking the highest-value computations first, and run them to completion. For these tests, I set the number of threads in the worker pool to 6.
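
A sketch of the sorted insert implementing the cost function priority/(end − start); the task fields and function names are illustrative:

```c
#include <stdlib.h>

typedef struct task {
    unsigned char hash[32];   /* SHA-256 digest to reverse */
    unsigned long start, end; /* search interval for the original value; end > start */
    int priority;             /* priority taken from the request packet */
    double cost;              /* priority / (end - start): value per unit of work */
    struct task *next;
} task_t;

/* Insert in descending cost order: O(n) per insert, O(n^2) to build the list.
 * Like enqueue()/dequeue(), these calls must be guarded by the queue mutex. */
void pq_insert(task_t **head, task_t *t) {
    t->cost = (double)t->priority / (double)(t->end - t->start);
    while (*head && (*head)->cost >= t->cost)
        head = &(*head)->next;
    t->next = *head;
    *head = t;
}

/* Workers pop from the front, i.e. the highest value-per-work task. */
task_t *pq_pop(task_t **head) {
    task_t *t = *head;
    if (t) *head = t->next;
    return t;
}
```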

Run Configuration

| Setting          | Value          |
|------------------|----------------|
| SERVER           | 192.168.101.10 |
| PORT             | 5003           |
| SEED             | 3435245        |
| TOTAL            | 100            |
| START            | 0              |
| DIFFICULTY       | 30000000       |
| REP_PROB_PERCENT | 20             |
| DELAY_US         | 600000         |
| PRIO_LAMBDA      | 1.50           |

Results

| Run        | FIFO       | Priority queue, sorted by p | Priority queue, sorted by p/(end − start) |
|------------|------------|-----------------------------|--------------------------------------------|
| First Run  | 19.063.160 | 18.648.245                  | 17.955.248                                 |
| Second Run | 23.730.163 | 19.413.889                  | 17.968.412                                 |
| Third Run  | 20.129.427 | 18.515.116                  | 18.045.548                                 |
| Average    | 20.882.172 | 18.854.961                  | 17.989.692                                 |

Below is a graphical representation of the results:

Using the thread pool of 6 threads, we achieve a noticeably better score than with FIFO. Moreover, taking the distance between start and end into account yields a further slight gain.

Discussion

Due to the structure of the priority queue (a linked list), this code is most efficient when requests arrive at roughly the speed they can be solved, or slower. If requests come in too fast, each one must be inserted into the linked list individually, giving the O(n²) behaviour of insertion sort. This overhead bogs down the main thread and can interfere with network receive operations.

Conclusion

The result of this experiment is as expected: a more intelligent scheduling algorithm outperforms simple FIFO.

Multiprocess Model vs Multithread Model

Author: Ghalia Bennani, s221649
Branch: Multiprocess-Basic

Experiment Motivation

The aim of this experiment is to test another type of parallelization: we want to compare concurrent implementations using multiple threads and multiple processes. Even though we believe the multiprocess version will be slower than the multithreaded one, we want to give it a try. Of course, in this experiment we keep exactly the same configuration as the last kept version (priority queue...). We only replace the multithreading with multiprocessing.

The process model works as follows: every time the server receives a client request, a new process is created using the fork system call. We need to choose the number of processes that can live simultaneously wisely; in our case it is four, since the computer running the tests has a multi-core CPU with 4 cores. This way there is no shared memory (no critical sections) and thus no need for synchronisation. If we want to increase the number of processes, we will need to protect the critical sections. We will see later what the optimal choice is.
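
Sketched in C, the fork-per-request loop looks roughly like this (MAX_PROCS, serve_forever() and handle_connection() are illustrative names, not necessarily our exact code):

```c
#include <unistd.h>
#include <sys/wait.h>
#include <sys/socket.h>

#define MAX_PROCS 4                 /* one worker per CPU core; no shared memory, no locks */

/* Hypothetical: solves one request and writes the reply on conn_fd. */
void handle_connection(int conn_fd);

void serve_forever(int listen_fd) {
    int alive = 0;                  /* number of child processes currently running */

    for (;;) {
        int conn_fd = accept(listen_fd, NULL, NULL);

        if (alive >= MAX_PROCS) {   /* reap one child before forking another */
            wait(NULL);
            alive--;
        }
        if (fork() == 0) {          /* child: gets its own copy of the address space */
            close(listen_fd);
            handle_connection(conn_fd);
            _exit(0);
        }
        close(conn_fd);             /* parent only keeps the listening socket */
        alive++;
    }
}
```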

Optimizing the number of processes

In this section we seek to find the optimal number of processes.

Setup

All tests regarding this experiment have been executed on the same computer. We tried to keep the workload constant during the test session; however, this is nearly impossible, and background activity could introduce errors. The configuration parameters of the client were the following:

Run Configuration

| Setting          | Value          |
|------------------|----------------|
| SERVER           | 192.168.101.10 |
| PORT             | 5003           |
| SEED             | 3435245        |
| TOTAL            | 100            |
| START            | 0              |
| DIFFICULTY       | 30000000       |
| REP_PROB_PERCENT | 20             |
| DELAY_US         | 600000         |
| PRIO_LAMBDA      | 1.50           |

Results

Below are the results of the tests. Note that when the number of processes exceeds 4, protection of the critical sections has been added.

| Run        | 4 Processes | 5 Processes | 10 Processes | 20 Processes | 50 Processes | 100 Processes |
|------------|-------------|-------------|--------------|--------------|--------------|---------------|
| First Run  | 43.490.995  | 56.853.092  | 28.273.468   | 25.301.062   | 32.465.741   | 38.502.073    |
| Second Run | 28.869.131  | 32.835.505  | 33.762.657   | 29.910.688   | 39.938.347   | 33.823.337    |
| Third Run  | 32.406.480  | 33.714.794  | 26.862.934   | 37.785.497   | 34.472.936   | 33.482.823    |
| Average    | 34.922.202  | 41.134.463  | 29.633.019   | 30.999.082   | 35.625.674   | 35.269.411    |

Below is a graphical representation of the results:

Discussion and Conclusion

Looking at the graph, the best number of processes seems to be between 10 and 20. We initially expected the best number to be 3 (4 including the parent process), since that would match the number of cores. A possible explanation is that with more processes, some can make progress while others are blocked on I/O or waiting to be scheduled.

Comparison between multithreading and multiprocessing

In this section we compare the two parallelization techniques: we run both models three times, each with its respective optimal number of threads/processes.

Setup

All tests regarding this experiment have been executed on the same computer. We tried to keep the workload constant during the test session; however, this is nearly impossible, and background activity could introduce errors. The configuration parameters of the client were the following:

Run Configuration

| Setting          | Value          |
|------------------|----------------|
| SERVER           | 192.168.101.10 |
| PORT             | 5003           |
| SEED             | 3435245        |
| TOTAL            | 100            |
| START            | 0              |
| DIFFICULTY       | 30000000       |
| REP_PROB_PERCENT | 20             |
| DELAY_US         | 600000         |
| PRIO_LAMBDA      | 1.50           |

Results

Below are the results of the tests. Note that we used 6 threads and 10 processes.

| Run        | Processes  | Threads    |
|------------|------------|------------|
| First run  | 28.273.468 | 18.689.768 |
| Second run | 33.762.657 | 23.543.790 |
| Third run  | 26.862.934 | 20.635.433 |
| Average    | 29.633.019 | 20.956.330 |

Below is a graphical representation of the results:

Discussion and Conclusion

Looking at the results, there is a noticeable difference between the averages: the thread model is approximately 1.5 times faster than the process model. This was expected, since threads are more lightweight than processes: switching between threads is much faster than switching between processes, and creating and terminating threads is also much faster. Furthermore, threads share the same address space, as opposed to processes. Therefore, we will not keep the multiprocess option in the final configuration.

Caching the requests

Author: Ons Riahi, s221565
Branch: hashTables

Experiment Motivation

Repetition of tasks or events is common in our daily lives, for example accessing the DTU web page several times during the same day. For this purpose web caching was invented: storing data for reuse, such as a copy of a web page served by a web server. The page is cached the first time a user visits it, and the next time a user requests the same page the cache serves the copy. This keeps the origin server from getting overloaded, speeds up page delivery significantly and reduces the work the server has to do.

The analogy between this example and our OS challenge is clear: the server can receive a duplicate of a previously sent request. So a way to avoid computing the same reverse hash twice is to cache previous hashes together with their original values. Whenever a request is repeated, there is no need to waste time on brute-forcing; we can return the right answer much faster.

In practice, several data structures can be used for caching, but in our project we implemented a hash table, since its insert and search operations have an average time complexity of O(1). The idea is an array of structs containing both the hash and its corresponding value. We also use closed hashing (open addressing), a collision-resolution method that searches through alternative slots in the array until either the target record is found or an unused slot is found, which indicates that the key is not in the table. (The key here is the hash.)
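
A minimal sketch of such a cache with linear probing; the table size and the slot function are illustrative assumptions, and the sketch assumes the table never fills up:

```c
#include <string.h>
#include <stdint.h>

#define TABLE_SIZE 65536                 /* illustrative; must exceed the number of entries */

typedef struct {
    uint8_t  key[32];                    /* the SHA-256 digest from the request */
    uint64_t value;                      /* the reversed (original) number */
    int      used;                       /* slot occupancy flag */
} entry_t;

static entry_t table[TABLE_SIZE];

static unsigned slot_of(const uint8_t key[32]) {
    uint64_t h;
    memcpy(&h, key, sizeof h);           /* the digest is already uniform: reuse its bits */
    return h % TABLE_SIZE;
}

void cache_put(const uint8_t key[32], uint64_t value) {
    unsigned i = slot_of(key);
    while (table[i].used && memcmp(table[i].key, key, 32) != 0)
        i = (i + 1) % TABLE_SIZE;        /* linear probing to the next matching/free slot */
    memcpy(table[i].key, key, 32);
    table[i].value = value;
    table[i].used = 1;
}

/* Returns 1 and fills *value on a hit, 0 on a miss. */
int cache_get(const uint8_t key[32], uint64_t *value) {
    unsigned i = slot_of(key);
    while (table[i].used) {
        if (memcmp(table[i].key, key, 32) == 0) { *value = table[i].value; return 1; }
        i = (i + 1) % TABLE_SIZE;
    }
    return 0;
}
```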

Setup

All tests regarding this experiment have been executed on the same machine.

Run Configuration

| Setting          | Value          |
|------------------|----------------|
| SERVER           | 192.168.101.10 |
| PORT             | 5003           |
| SEED             | 3435245        |
| TOTAL            | 100            |
| START            | 0              |
| DIFFICULTY       | 30000000       |
| REP_PROB_PERCENT | 20             |
| DELAY_US         | 600000         |
| PRIO_LAMBDA      | 1.50           |

Results

Below are the results of the tests:

| Run        | Sequential without hash table | Sequential with hash table |
|------------|-------------------------------|----------------------------|
| First run  | 95.220.820                    | 71.792.487                 |
| Second run | 97.668.541                    | 72.894.615                 |
| Third run  | 92.979.926                    | 70.524.722                 |
| Average    | 95.289.762                    | 71.737.275                 |

Below is a graphical representation of the results:

Discussion and Conclusion

We can see that using a hash table is faster than reverse-hashing every request from scratch. The difference is not very large, since the repetition probability is only 20%, but it is still better than the plain sequential model. NB: below we increased the repetition probability; as expected, the results improve.

| Run        | Repetition probability 20% | Repetition probability 50% |
|------------|----------------------------|----------------------------|
| First run  | 71.792.487                 | 41.191.500                 |
| Second run | 72.894.615                 | 43.275.962                 |
| Third run  | 70.524.722                 | 40.760.749                 |
| Average    | 71.737.275                 | 41.742.737                 |

And this graph is a representation of the results:

Multi-threading and Caching

Author: Ons Riahi, s221565
Branch: Final-Version

Experiment Motivation

As we saw previously, both multithreading and caching have a positive impact on the server's performance: each of them makes the server faster at answering client requests. That is why we decided to combine these two features for our final version of the OS challenge.

Since the hash table is used by multiple threads at the same time, it needs some form of synchronization, which has to guarantee consistency of the hash table under parallel inserts, updates and queries. To do so, we use locks.
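
A sketch of this locking, assuming the cache_get()/cache_put() functions from the previous section and a single global mutex (per-bucket locks would reduce contention, but one lock is the simplest safe choice):

```c
#include <pthread.h>
#include <stdint.h>

/* From the caching sketch in the previous section. */
int  cache_get(const uint8_t key[32], uint64_t *value);
void cache_put(const uint8_t key[32], uint64_t value);

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread-safe wrappers: every table read and write goes through one mutex. */
int cache_get_locked(const uint8_t key[32], uint64_t *value) {
    pthread_mutex_lock(&cache_lock);
    int hit = cache_get(key, value);
    pthread_mutex_unlock(&cache_lock);
    return hit;
}

void cache_put_locked(const uint8_t key[32], uint64_t value) {
    pthread_mutex_lock(&cache_lock);
    cache_put(key, value);
    pthread_mutex_unlock(&cache_lock);
}
```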

Run Configuration

| Setting          | Value          |
|------------------|----------------|
| SERVER           | 192.168.101.10 |
| PORT             | 5003           |
| SEED             | 3435245        |
| TOTAL            | 100            |
| START            | 0              |
| DIFFICULTY       | 30000000       |
| REP_PROB_PERCENT | 20             |
| DELAY_US         | 600000         |
| PRIO_LAMBDA      | 1.50           |

Results

Below are the results of the tests:

| Run        | Sequential | Multithreading without hash table | Multithreading with hash table |
|------------|------------|-----------------------------------|--------------------------------|
| First run  | 95.220.820 | 18.689.768                        | 18.958.127                     |
| Second run | 97.668.541 | 23.543.790                        | 17.198.736                     |
| Third run  | 92.979.926 | 20.635.433                        | 17.763.928                     |
| Average    | 95.289.762 | 20.956.330                        | 17.973.597                     |

Below is a graphical representation of the results:

Discussion and Conclusion

The results show that a performance boost can be achieved by combining multithreading and caching. The improvement seems slight because the repetition probability is not high, but it is still worth having. We also tested with a repetition probability of 50%; these are the results:

| Run        | Repetition probability 20% | Repetition probability 50% |
|------------|----------------------------|----------------------------|
| First run  | 18.958.127                 | 4.855.266                  |
| Second run | 17.198.736                 | 4.974.597                  |
| Third run  | 17.763.928                 | 4.873.498                  |
| Average    | 17.973.597                 | 4.901.121                  |

And the corresponding graphical representation:

Final Conclusion

Even though other experiments could also have a positive impact on the server, multithreading with an optimized number of threads is the feature that gives the best results.
