\documentclass{article}
\usepackage{amsmath}
\title{COMP26120 Lab 5}
\author{Joshua Payapulli}

\begin{document}
\maketitle

% PART 2 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\section{Complexity Analysis}
\label{sec:complexity}
\subsection{Insertion Sort}
\textbf{How does Insertion sort work?}
\newline\newline
In each iteration, insertion sort takes the current element and compares it against the elements of the already-sorted prefix to its left, checking whether it is at least as large as the element it is compared with. In other words, it works by maintaining a sorted subset of the array that grows by one element on each iteration of the main loop.
\newline\newline
If the current element is already at least as large as the last element of the sorted prefix, it is left in place and the algorithm moves on to the next element. Otherwise, its correct position in the sorted prefix is found by shifting all the larger elements one position to the right, and the element is inserted there. This is typically done from the start of the array, moving right until the whole array is sorted, although it can be implemented in other ways as well.
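\newline\newline
For illustration, a minimal C sketch of this process is given below. It is only a sketch under the assumptions stated here (an \texttt{int} array sorted into ascending order) and is not necessarily identical to the implementation used in the lab.
\begin{verbatim}
#include <stddef.h>

/* Illustrative insertion sort on an int array. */
void insertion_sort(int a[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        int key = a[i];      /* element to insert into sorted prefix a[0..i-1] */
        size_t j = i;
        /* shift larger elements of the sorted prefix one place right */
        while (j > 0 && a[j - 1] > key) {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;          /* drop the element into its correct slot */
    }
}
\end{verbatim}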
\newline\newline
\textbf{Best Case:}
\newline\newline
Let us first consider what the best case is for the insertion sort algorithm. It is easy to see that the best case for this algorithm is when the input is already in sorted order, because then no elements ever need to be shifted and the inner loop is never executed.
\begin{equation}
T = t_2 + t_3 + \dots + t_n
\end{equation}
\begin{equation}
\sum_{j=2}^{n} t_j = t_2 + t_3 + \dots + t_n = 1 + 1 + \dots + 1 = n - 1
\end{equation}
\begin{equation}
T(n) = c_1 n + c_2(n - 1) + c_3(n - 1) + c_4(n - 1) + \dots + c_7(n - 1)
\end{equation}
\begin{equation}
T(n) = (c_1 + c_2 + c_3 + c_4 + \dots + c_7) \cdot n - (c_2 + c_3 + c_4 + \dots + c_7)
\end{equation}
\begin{equation}
T(n) = an - b
\end{equation}
From this analysis we can see that in the best case insertion sort runs with a Big O complexity of $O(n)$, as the constants do not affect the Big O time complexity.
\newline\newline
\textbf{Worst Case:}
\newline\newline
The worst case for insertion sort is when the input list is in decreasing order. In this case both the outer and inner loops run in their entirety, which gives the worst case complexity. On the $j$-th iteration of the outer loop the inner loop executes $j - 1$ times, so across the whole run the inner loop performs $1 + 2 + \dots + (n - 1)$ shifts, which results in a quadratic time complexity. The mathematical analysis is as follows.
\newline\newline
Outer loop execution:
\begin{equation}
\sum_{i=0}^{n} i = \frac{n(n + 1)}{2}
\end{equation}
Inner loop execution:
\begin{equation}
\sum_{j=2}^{n} (j - 1) = \frac{n(n - 1)}{2}
\end{equation}
\begin{multline}
T(n) = c_1 n + c_2(n - 1) + c_3(n - 1) + c_4\left(\frac{n(n + 1)}{2} - 1\right) \\
+ c_5\left(\frac{n(n - 1)}{2}\right) + c_6\left(\frac{n(n - 1)}{2}\right) + c_7(n - 1)
\end{multline}
\begin{multline}
= \left(\frac{c_4}{2} + \frac{c_5}{2} + \frac{c_6}{2}\right)n^2 \\
+ \left(c_1 + c_2 + c_3 + \frac{c_4}{2} - \frac{c_5}{2} - \frac{c_6}{2} + c_7\right)n - (c_2 + c_3 + c_4 + \dots + c_7)
\end{multline}
\begin{equation}
T(n) = an^2 + bn - c
\end{equation}
\newline\newline
From this analysis we can see that in the worst case insertion sort runs with a Big O complexity of $O(n^2)$, as the constants do not affect the Big O time complexity.
\newline\newline
\textbf{Average Case:}
\newline\newline
The outer loop always does $\Theta(n)$ work. The inner loop, however, does an amount of work proportional to the total number of swaps made across the entire run of the algorithm. To see how much work that loop will do, we need to determine how many swaps are made.
\newline\newline
On any given iteration the inner loop can execute anywhere from once up to about $n$ times, but on average the number of executions is linear in $n$. Therefore both loops combined run with a time complexity of $\Theta(n^2)$ in the average case.


\subsection{Quick Sort}
\textbf{How does quick sort work?}
\newline\newline
Quicksort uses a divide and conquer approach. It first divides a large array into two smaller sub-arrays, the low elements and the high elements, around a chosen pivot. Quicksort then recursively sorts the sub-arrays; this is usually implemented via a helper function which creates the partitions. The steps are as follows:
\newline\newline
1. Pick an element, called a pivot, from the array.\\
2. Partitioning: reorder the array so that all elements with values less than the pivot come before the pivot, while all elements with values greater than the pivot come after it. After this partitioning, the pivot is in its final position. This is called the partition operation.\\
3. Recursively apply the above steps to the sub-array of elements with smaller values and separately to the sub-array of elements with greater values. The base case of the recursion is arrays of size zero or one, which are in order by definition, so they never need to be sorted.\\
\newline
The pivot selection and partitioning steps can be done in several different ways; the choice of specific implementation scheme greatly affects the algorithm's performance.
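\newline\newline
The C sketch below shows one common way to realise the partition step, a Lomuto-style partition around the last element. It is an illustrative assumption rather than the exact scheme used in the lab code.
\begin{verbatim}
#include <stddef.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* Lomuto-style partition: pivot is the last element of a[lo..hi]. */
static size_t partition(int a[], size_t lo, size_t hi)
{
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++)
        if (a[j] < pivot)               /* smaller elements go to the left side */
            swap(&a[i++], &a[j]);
    swap(&a[i], &a[hi]);                /* pivot lands in its final position */
    return i;
}

/* Sorts a[lo..hi] inclusive; call as quick_sort(a, 0, n - 1) for n >= 1. */
void quick_sort(int a[], size_t lo, size_t hi)
{
    if (lo >= hi) return;               /* base case: zero or one element */
    size_t p = partition(a, lo, hi);
    if (p > lo) quick_sort(a, lo, p - 1);
    quick_sort(a, p + 1, hi);
}
\end{verbatim}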
\newline\newline
\textbf{Best Case:}
\newline\newline
Quicksort's best case occurs when the partitions are as evenly balanced as possible: their sizes are either equal or almost equal. The former case occurs if the subarray has an odd number of elements and the pivot ends up right in the middle after partitioning; the latter case occurs if the subarray has an even number of elements. In either case the runtime complexity is $n \log n$, because we halve the input array at each level of recursion and do a linear amount of partitioning work at each level.
Let $a = 2$ and $b = 2$ in the master theorem recurrence:
\begin{equation}
T(n) = 2T(n/2) + \Omega(n)
\end{equation}
\begin{equation}
\Omega(n) = \Omega(n^{\log_2 2})
\end{equation}
\begin{equation}
T(n) = \Omega(n \log n)
\end{equation}
\newline
From the appropriate case of the master theorem we can derive the best case runtime complexity to be $\Omega(n \log n)$. This is as we expected from the reasoning given above.
\newline\newline
\textbf{Worst Case:}
\newline\newline
The worst case for quicksort is when the partition is as unequal as possible, so that one side of the partition contains a single element and the other side contains the rest of the array. Each recursive call then does a linear amount of partitioning work, and the size of the subproblem decreases by only one at each level. Therefore the worst case runtime complexity is quadratic.
\begin{equation}
cn + c(n - 1) + c(n - 2) + \dots + 2c = c(n + (n - 1) + (n - 2) + \dots + 2)
\end{equation}
\begin{equation}
= c\left(\frac{n(n + 1)}{2} - 1\right)
\end{equation}
\newline
From here we can see that the dominant term in the expression is $n^2$, so the worst case runtime complexity is quadratic, i.e. $O(n^2)$.
\newline\newline
\textbf{Average Case:}
\newline\newline
Suppose the pivot is equally likely to end up anywhere in an $n$-element subarray after partitioning. To get a split that is 3-to-1 or better, the pivot has to land somewhere in the middle half, so there is a 50\% chance of getting at worst a 3-to-1 split. In other words, we expect a split of 3-to-1 or better about half the time. Therefore the runtime complexity is $n \log n$ in the average case, although in practice the runtime may be better or worse than this; it depends entirely on the partitions you end up getting.
\newline\newline


\subsection{Merge Sort}
\textbf{How does merge sort work?}
\newline\newline
Merge sort is a commonly used sorting algorithm in computer science. Like quicksort, it is a divide and conquer algorithm. It works by recursively breaking the problem down into two or more sub-problems of the same or related type, until these become simple enough to be solved directly. In other words, it keeps halving the array and then merges the halves back together into the correct sorted order.
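\newline\newline
A minimal C sketch of this halve-and-merge process is given below; it is an illustrative assumption (sorting an \texttt{int} array with a temporary buffer), not necessarily the lab's implementation.
\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Merge the sorted runs a[lo..mid) and a[mid..hi) into one sorted run. */
static void merge(int a[], size_t lo, size_t mid, size_t hi)
{
    size_t n = hi - lo, i = lo, j = mid, k = 0;
    int *tmp = malloc(n * sizeof *tmp);              /* assumes malloc succeeds */
    while (i < mid && j < hi)
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++]; /* take the smaller head */
    while (i < mid) tmp[k++] = a[i++];               /* copy any leftovers */
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp, n * sizeof *tmp);            /* write merged run back */
    free(tmp);
}

void merge_sort(int a[], size_t lo, size_t hi)       /* sorts a[lo..hi) */
{
    if (hi - lo < 2) return;                         /* size 0 or 1: sorted */
    size_t mid = lo + (hi - lo) / 2;                 /* halve the problem */
    merge_sort(a, lo, mid);
    merge_sort(a, mid, hi);
    merge(a, lo, mid, hi);
}
\end{verbatim}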
\newline\newline
\textbf{Best Case + Worst Case + Average Case:}
\newline\newline
For merge sort, the runtime complexity is the same in every case. This is because, no matter what data set we are given, it always finds the middle index, partitions around it and then merges the halves back together in the correct order. For example, if the array is already sorted it will still keep halving itself until the sub-problems are simple enough to solve directly, and then merge them back in; this is exactly the same process as for a reverse-sorted array. We can therefore treat the best case, worst case and average case together, as they all have the same runtime complexity. The halving of the subarrays produces about $\log n$ levels of recursion, and the merging at each level takes time proportional to $n$, resulting in a total runtime complexity of $n \log n$.
Let $a = 2$ and $b = 2$ in the master theorem recurrence:
\begin{equation}
T(n) = 2T(n/2) + \Theta(n)
\end{equation}
\begin{equation}
\Theta(n) = \Theta(n^{\log_2 2})
\end{equation}
\begin{equation}
T(n) = \Theta(n \log n)
\end{equation}
\newline
From the appropriate case of the master theorem we can derive the runtime complexity in every case to be $\Theta(n \log n)$. This is as we expected from the reasoning given above.


\subsection{Bubble Sort}
\textbf{How does Bubble sort work?}
\newline\newline
Bubble sort works by comparing each element to its adjacent element and swapping them if they are out of order. The outer loop repeats this process as many times as there are elements in the list, and the inner loop likewise passes over the list on every outer iteration. In other words, both the inner and outer loops start at the first element and terminate at the last element.
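\newline\newline
For illustration, a minimal C sketch of this unoptimised bubble sort is shown below; it is an assumed version matching the description above, not necessarily the exact lab implementation.
\begin{verbatim}
#include <stddef.h>

/* Unoptimised ("dumb") bubble sort: there is no check for a pass with no
   swaps, so both loops always run their full length and the work is
   Theta(n^2) regardless of the input. */
void bubble_sort(int a[], size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j + 1 < n; j++) {
            if (a[j] > a[j + 1]) {      /* adjacent elements out of order */
                int t = a[j];
                a[j] = a[j + 1];
                a[j + 1] = t;
            }
        }
    }
}
\end{verbatim}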
\newline\newline
\textbf{Best Case + Worst Case + Average Case:}
\newline\newline
My bubble sort implementation is the `dumb' version of bubble sort. The optimised version checks whether any values were swapped during a pass and terminates early if none were, since the list must then already be sorted. Because my implementation does not do this, it runs with $n^2$ runtime complexity regardless of the data it is given as input.
\newline\newline
Outer loop execution:
\begin{equation}
\sum_{i=0}^{n} i = \frac{n(n + 1)}{2}
\end{equation}
Inner loop execution:
\begin{equation}
\sum_{j=0}^{n - 1} j = 1 + 2 + 3 + 4 + \dots + (n - 2) + (n - 1)
\end{equation}
\begin{equation}
= \frac{(n - 1)(n - 1 + 1)}{2}
\end{equation}
\begin{equation}
= \frac{n(n - 1)}{2}
\end{equation}
\newline\newline
From the above analysis we can see that in all cases my bubble sort algorithm has a runtime complexity of order $n^2$. This is because the algorithm does essentially the same work no matter what type of input it is given.
\newline\newline
%====================================
\section{Experimental Analysis}
\label{sec:initialExperiments}

In this section we consider the question
\begin{quote}
Under what conditions is it better to perform linear search rather than binary search?
\end{quote}

\subsection{Experimental Design}

The variables I will be changing in my experiment are: \\\\
- Size of dictionary\\
- Structure of dictionary, e.g. sorted, random\\
I have chosen these factors as they should help me to see which algorithms perform best on which data for sorting, and which of binary and linear search performs better in general.\\\\
I plan to vary the structure and size of the dictionary that I pass to the tests, to ensure there is a difference in sorting speed and complexity.\\
I have written a script that will produce results for the following test cases.\\
\newline\newline
1. Small dictionary with sorted order\\
2. Small dictionary with random order\\
3. Small dictionary with reverse-sorted order\\
4. Medium dictionary with sorted order\\
5. Medium dictionary with random order\\
6. Medium dictionary with reverse-sorted order\\
7. Big dictionary with sorted order\\
8. Big dictionary with random order\\
9. Big dictionary with reverse-sorted order\\
\newline\newline
A small dictionary has under 10 elements, a medium one has 1,000 elements and a large one has above 20,000 elements.\\
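As an illustration of how a single test case can be timed, a minimal C sketch is shown below. The function name and the idea of passing the test as a callback are assumptions made purely for illustration; they do not reflect the actual script used.
\begin{verbatim}
#include <time.h>

/* Hypothetical timing wrapper: runs one test case (e.g. sort a dictionary
   and perform the searches) and returns the elapsed CPU time in seconds. */
double time_one_run(void (*run_test)(void))
{
    clock_t start = clock();
    run_test();
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
\end{verbatim}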
\subsection{Experimental Results}
\begin{table}[h]
\begin{tabular}{llllllllll}
& & & & & & & & & \\
Algorithm & \multicolumn{3}{l}{Sorted} & \multicolumn{3}{l}{Random} & \multicolumn{3}{l}{Reversed} \\
& Small & Med & Large & Small & Med & Large & Small & Med & Large \\
Linear Search & 0.003s & 0.009s & 3.604s & 0.002s & 0.011s & 3.660s & 0.002s & 0.016s & 2.632s \\
Insertion & 0.002s & 0.008s & 3.653s & 0.002s & 0.023s & 6.146s & 0.003s & 0.024s & 5.390s \\
Quick & 0.002s & 0.011s & 3.456s & 0.004s & 0.004s & 0.085s & 0.002s & 0.014s & 3.753s \\
Bucket & 0.003s & 0.019s & 0.234s & 0.003s & 0.021s & 0.343s & 0.002s & 0.028s & 0.323s \\
Merge & 0.003s & 0.003s & 0.071s & 0.002s & 0.006s & 0.080s & 0.001s & 0.006s & 0.079s \\
Bubble & 0.002s & 0.014s & 4.343s & 0.001s & 0.025s & 5.854s & 0.002s & 0.026s & 4.578s
\end{tabular}
\end{table}
\textbf{Conclusion:}
\newline\newline
From this experiment I would have to deduce that merge sort is the best sorting algorithm, as it generally has the lowest execution times. This is as expected: in my earlier analysis I stated that merge sort has $n \log n$ complexity regardless of the input, so the fact that it runs consistently fast for all types of input is consistent with that analysis. Other sorting algorithms, such as insertion sort, perform much worse on reversed arrays, which is also expected, since the reverse-sorted array is precisely insertion sort's worst case.
\newline\newline
With regard to which search implementation is better in which situation, my results suggest that linear search is probably best for very small inputs, as it does not take long to iterate through a small number of elements. However, on a big data set binary search runs much faster than linear search, since it has a runtime complexity of $\log n$ versus $n$ for linear search. Therefore, when dealing with large sets of data it is worth paying the cost of sorting the data in order to use binary search rather than linear search.
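\newline\newline
To make the comparison concrete, here is a minimal C sketch of the two searches over an array of strings. The function names and types are assumptions for illustration only; binary search additionally requires the array to already be sorted.
\begin{verbatim}
#include <string.h>

/* O(n): check every entry until the key is found. */
long linear_search(char *words[], long n, const char *key)
{
    for (long i = 0; i < n; i++)
        if (strcmp(words[i], key) == 0)
            return i;
    return -1;
}

/* O(log n): repeatedly halve the search range; needs sorted input. */
long binary_search(char *words[], long n, const char *key)
{
    long lo = 0, hi = n - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        int cmp = strcmp(words[mid], key);
        if (cmp == 0) return mid;
        if (cmp < 0)  lo = mid + 1;
        else          hi = mid - 1;
    }
    return -1;
}
\end{verbatim}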

\section{Extending Experiment to Data Structures}
\label{sec:part3}

Now we consider the question:
\begin{quote}
Under what conditions are different implementations of the dictionary data structure preferable?
\end{quote}
We must design another experiment, this time for trees and hash tables.\\

It will go through the same nine tests as before.

LP - Linear Probing, QP - Quadratic Probing, DH - Double Hashing, SC - Separate Chaining.
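
To make the four collision-resolution strategies concrete, the sketch below shows the probe index each open-addressing scheme would compute on the $i$-th attempt. The hash values and table size $m$ are placeholders; this is illustrative only and not the lab's actual hash table code.
\begin{verbatim}
#include <stddef.h>

/* Probe index on the i-th attempt for a key whose hashes are h1 (and h2),
   in a table of m buckets. Illustrative formulas only. */
size_t probe_lp(size_t h1, size_t i, size_t m)            { return (h1 + i) % m; }
size_t probe_qp(size_t h1, size_t i, size_t m)            { return (h1 + i * i) % m; }
size_t probe_dh(size_t h1, size_t h2, size_t i, size_t m) { return (h1 + i * h2) % m; }
/* Separate chaining (SC) does not probe: colliding keys are appended to a
   linked list stored in bucket h1 % m. */
\end{verbatim}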

The table below reports, for each data structure and each implementation method, the time taken to run the test.
The sizes are Small = 10, Medium = 1000 and Large = 20000 elements.

\begin{table}[h]
\begin{tabular}{llllllllll}
& & & & & & & & & \\
Algorithm & \multicolumn{3}{l}{Sorted} & \multicolumn{3}{l}{Random} & \multicolumn{3}{l}{Reverse} \\
& S & M & L & S & M & L & S & M & L \\
Binary Tree & 0.004s & 0.027s & 7.548s & 0.010s & 0.021s & 7.352s & 0.004s & 0.023s & 7.265s \\
Hashtable LP$_1$ & 0.006s & 0.007s & 0.004s & 0.004s & 0.007s & 10.326s & 0.003s & 0.007s & 10.227s \\
Hashtable QP$_1$ & 0.005s & 0.005s & 0.003s & 0.003s & 0.004s & 0.176s & 0.003s & 0.005s & 0.175s \\
Hashtable DH$_1$ & 0.003s & 0.007s & 0.002s & 0.003s & 0.006s & 0.390s & 0.003s & 0.007s & 0.386s \\
Hashtable SC$_1$ & 0.003s & 0.010s & 0.002s & 0.002s & 0.005s & 0.054s & 0.003s & 0.005s & 0.054s \\
Hashtable LP$_2$ & 0.004s & 0.011s & 0.002s & 0.002s & 0.007s & 9.897s & 0.002s & 0.007s & 9.710s \\
Hashtable QP$_2$ & 0.003s & 0.006s & 0.004s & 0.002s & 0.007s & 0.190s & 0.002s & 0.005s & 0.176s \\
Hashtable DH$_2$ & 0.002s & 0.005s & 0.003s & 0.002s & 0.005s & 0.323s & 0.003s & 0.006s & 0.323s \\
Hashtable SC$_2$ & 0.003s & 0.005s & 0.003s & 0.009s & 0.005s & 0.056s & 0.003s & 0.005s & 0.058s \\
\end{tabular}
\end{table}
\textbf{Conclusion:}
\newline\newline
From this experiment I would have to deduce that the hash table performs worst when using linear probing and, of the probing strategies, best when using quadratic probing. As for which data structure should be used where, my data makes it evident that the hash table is better in most cases, with a few exceptions involving linear probing. For small inputs my data suggests there is very little difference in time, so trees are perhaps more suitable for small inputs, since a tree keeps the keys in sorted order, which may be useful depending on what you want to implement. In general, though, hash tables are much faster to search and insert into, with the exception of linear probing.
\end{document}