\section{Milestone 1 Design}

The team has considered solutions for each component of the design. The advantages and disadvantages of each solution are identified below; these determine the candidates for the final design.

\subsection{Data structure}

The team has decided to use B+ trees as the primary data structure for representing database tables. The deciding factors were their widespread use in other database software and their efficient CRUD operations. B+ trees are self-indexing, so records can be located quickly, and retrieving a record requires fewer operations than in a binary search tree. The team estimates the implementation complexity of this data structure to be medium-high compared to the alternatives below.

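The height advantage over a binary search tree can be sketched numerically. The sketch below is illustrative only (Python is used here for brevity, not as the project's implementation language), and the fan-out of 128 is an assumed value; real node capacity depends on page size and key width.

```python
import math

# Estimated height of a balanced search tree over n records:
# a node with fan-out b gives height ~ log_b(n). A balanced BST
# has fan-out 2; a B+ tree node commonly holds dozens to hundreds
# of keys (128 here is an illustrative assumption).
def est_height(n: int, fanout: int) -> int:
    return math.ceil(math.log(n, fanout))

print(est_height(1_000_000, 2))    # balanced BST: 20 levels
print(est_height(1_000_000, 128))  # B+ tree, fan-out 128: 3 levels
```

Since each level visited is one node access (often a disk page), the B+ tree needs far fewer operations per lookup, which is the efficiency claim above.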
\subsubsection{Alternatives}

These are the alternative data structures that the team has considered.

\begin{description}
  \setlength{\itemsep}{1pt}
  \setlength{\parskip}{0pt}
  \setlength{\parsep}{0pt}
    \item[Heaps] \hfill \\ Heaps are tree-based data structures. They are not efficient for accessing a specific row of a table or a random row, nor are they efficient for returning results in order. This data structure is good for small tables and tables where changes are infrequent. \cite{data_structure_1}
    \item[Hash Buckets] \hfill \\ Hash buckets are a variant of the hash table data structure. Their implementation complexity is high, but they are efficient for finding records by specific keys. Disadvantages include hash collisions and the lack of key-range retrieval. Avoiding collisions requires detection and resolution functions, which add overhead and hinder performance.

    \item[Binary Search Tree] \hfill \\ The implementation complexity of the binary search tree (BST) is medium. BSTs are not suitable for large tables that require a large number of CRUD operations, as this creates overhead. An unbalanced BST may degenerate into a linked list, in which case its performance falls to that of a linked list, the worst case for a BST.

    \item[Linked List] \hfill \\ The implementation complexity of the linked list is low. On average, half the records must be traversed to find a specific record, which hurts performance on large tables. In addition, sorting the list creates performance overhead.

    \item[Arrays] \hfill \\ The implementation complexity of arrays is very low since they are built in. One major disadvantage is that the size must be known ahead of time: resizing the array to insert new records requires deleting and re-inserting every record, which creates a massive amount of overhead.

\end{description}
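The key-range limitation of hash buckets noted above can be illustrated with a small sketch (Python here is purely illustrative, and the sample keys are made up):

```python
from bisect import bisect_left, bisect_right

# Keys kept in sorted order (as in a B+ tree's leaf chain) support
# range retrieval; a hash table supports only point lookups.
keys = [3, 8, 15, 23, 42, 57]
lo, hi = bisect_left(keys, 8), bisect_right(keys, 42)
print(keys[lo:hi])          # all keys in [8, 42]: [8, 15, 23, 42]

records = {k: f"record-{k}" for k in keys}  # hash-bucket analogue
print(records[23])          # O(1) point lookup: record-23
# There is no efficient way to ask the hash table for "all keys in
# [8, 42]" without scanning every entry.
```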
\subsection{Configuration file parser}

Writing a configuration file parser from scratch could take too long, so the team has decided to use an existing library to parse the configuration file. The chosen format is YAML (YAML Ain't Markup Language), a human-friendly data serialization standard for all programming languages \cite{alt_solution_1}.
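As a sketch of what the configuration might look like (the keys and values below are hypothetical, not part of the design):

```yaml
# Hypothetical server configuration; key names are illustrative only.
server:
  port: 4000
  max_connections: 32
storage:
  data_dir: ./data
```

In Python, for example, such a file could be read with PyYAML's \verb|yaml.safe_load|; comparable YAML libraries exist for most languages.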
  81.  
\subsection{Table records}

Table records will be stored as strings, with each field value of a record separated from the next by a delimiting character. The team will adopt the comma-separated values (CSV) standard \cite{alt_solution_2}. One problem that may occur is delimiter collision, in which a delimiter appears inside a field's text without being intended as a boundary between fields \cite{alt_solution_3}.
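Standard CSV handles this collision by quoting any field that contains the delimiter, as this sketch with Python's standard csv module shows (illustrative only; the sample record is made up):

```python
import csv
import io

# A field containing the delimiter (the comma in "Smith, John") would
# corrupt a naive string join; CSV quoting keeps boundaries unambiguous.
row = ["42", "Smith, John", "active"]
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue().strip()
print(line)                    # 42,"Smith, John",active
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)                  # ['42', 'Smith, John', 'active']
```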
  85.  
\subsection{Data compression and cryptography}

The team wishes to add security and low-bandwidth characteristics to the connection pipes between the client and the server. Data compression will help save bandwidth, but the compression and decompression operations must be very fast so that they add negligible overhead. The team has identified the Lempel-Ziv-Oberhumer (LZO) library as suitable for this task, since it is a lossless data compression algorithm focused on decompression speed \cite{alt_solution_4}.

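The intended round trip can be sketched with Python's standard zlib module standing in for LZO (LZO itself has no standard-library binding, and the payload below is made up):

```python
import zlib

# Compress before sending over the client-server pipe, decompress on
# receipt; zlib stands in here for LZO, which favors decompression speed.
payload = b"SELECT name, email FROM users WHERE active = 1;" * 50
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed)
assert restored == payload       # lossless round trip
print(f"{len(payload)} bytes -> {len(compressed)} bytes")
```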
The team has looked into numerous cryptography libraries; overall security, performance, and efficiency were the deciding factors in choosing the best candidates. The team has chosen the Advanced Encryption Standard (AES) as the best candidate for encrypting and decrypting data. The deciding factor is that AES has been adopted by the U.S. government to protect sensitive data \cite{alt_solution_5}, and the team wants to protect communication data in the same manner.