\section{Milestone 1 Design}

The team has considered solutions for each component of the design. The advantages and disadvantages of each solution are identified below; these determine the candidates for the final design.

\subsection{Data structure}

The team has decided to use B+ trees as the primary data structure for representing database tables. The deciding factors were their widespread use in other database software and their efficient CRUD operations. B+ trees are self-indexing, so records can be located quickly, and retrieving a record requires fewer operations than in a binary search tree. The team estimates the implementation complexity of this data structure to be medium-high compared to the alternatives below.

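The height advantage over a binary search tree can be sketched numerically. The sketch below is illustrative only (Python is used here for brevity, not as the project's implementation language), and the fan-out of 128 is an assumed value; real node capacity depends on page size and key width.

```python
import math

# Estimated height of a balanced search tree over n records:
# a node with fan-out b gives height ~ log_b(n). A balanced BST
# has fan-out 2; a B+ tree node commonly holds dozens to hundreds
# of keys (128 here is an illustrative assumption).
def est_height(n: int, fanout: int) -> int:
    return math.ceil(math.log(n, fanout))

print(est_height(1_000_000, 2))    # balanced BST: 20 levels
print(est_height(1_000_000, 128))  # B+ tree, fan-out 128: 3 levels
```

Since each level visited is one node access (often a disk page), the B+ tree needs far fewer operations per lookup, which is the efficiency claim above.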
\subsubsection{Alternatives}

These are the alternative data structures that the team has considered.

\begin{description}
  \setlength{\itemsep}{1pt}
  \setlength{\parskip}{0pt}
  \setlength{\parsep}{0pt}
    \item[Heaps] \hfill \\ Heaps are tree-based data structures. They are not efficient for accessing a specific row of a table or a random row, nor are they efficient for returning results in order. This data structure is good for small tables and tables where changes are infrequent. \cite{data_structure_1}
    \item[Hash Buckets] \hfill \\ Hash buckets are a variant of the hash table data structure. Their implementation complexity is high, but they are efficient for finding records by specific keys. Disadvantages include hash collisions and the lack of key-range retrieval. Avoiding collisions requires detection and resolution functions, which add overhead and hinder performance.

    \item[Binary Search Tree] \hfill \\ The implementation complexity of the binary search tree (BST) is medium. BSTs are not suitable for large tables that require a large number of CRUD operations, as this creates overhead. An unbalanced BST may degenerate into a linked list, in which case its performance falls to that of a linked list, the worst case for a BST.

    \item[Linked List] \hfill \\ The implementation complexity of the linked list is low. On average, half the records must be traversed to find a specific record, which hurts performance on large tables. In addition, sorting the list creates performance overhead.

    \item[Arrays] \hfill \\ The implementation complexity of arrays is very low since they are built in. One major disadvantage is that the size must be known ahead of time: resizing the array to insert new records requires deleting and re-inserting every record, which creates a massive amount of overhead.

\end{description}
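The key-range limitation of hash buckets noted above can be illustrated with a small sketch (Python here is purely illustrative, and the sample keys are made up):

```python
from bisect import bisect_left, bisect_right

# Keys kept in sorted order (as in a B+ tree's leaf chain) support
# range retrieval; a hash table supports only point lookups.
keys = [3, 8, 15, 23, 42, 57]
lo, hi = bisect_left(keys, 8), bisect_right(keys, 42)
print(keys[lo:hi])          # all keys in [8, 42]: [8, 15, 23, 42]

records = {k: f"record-{k}" for k in keys}  # hash-bucket analogue
print(records[23])          # O(1) point lookup: record-23
# There is no efficient way to ask the hash table for "all keys in
# [8, 42]" without scanning every entry.
```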
\subsection{Configuration file parser}

Writing a configuration file parser from scratch could take too long, so the team has decided to use an existing library to parse the configuration file. The chosen format is YAML (YAML Ain't Markup Language), a human-friendly data serialization standard for all programming languages \cite{alt_solution_1}.
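As a sketch of what the configuration might look like (the keys and values below are hypothetical, not part of the design):

```yaml
# Hypothetical server configuration; key names are illustrative only.
server:
  port: 4000
  max_connections: 32
storage:
  data_dir: ./data
```

In Python, for example, such a file could be read with PyYAML's \verb|yaml.safe_load|; comparable YAML libraries exist for most languages.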
  81.  
\subsection{Table records}

Table records will be stored as strings, with each field value of a record separated from the next by a delimiting character. The team will adopt the comma-separated values (CSV) standard \cite{alt_solution_2}. One problem that may occur is delimiter collision, in which a delimiter appears inside a field's text without being intended as a boundary between fields \cite{alt_solution_3}.
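Standard CSV handles this collision by quoting any field that contains the delimiter, as this sketch with Python's standard csv module shows (illustrative only; the sample record is made up):

```python
import csv
import io

# A field containing the delimiter (the comma in "Smith, John") would
# corrupt a naive string join; CSV quoting keeps boundaries unambiguous.
row = ["42", "Smith, John", "active"]
buf = io.StringIO()
csv.writer(buf).writerow(row)
line = buf.getvalue().strip()
print(line)                    # 42,"Smith, John",active
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)                  # ['42', 'Smith, John', 'active']
```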
  85.  
\subsection{Data compression and cryptography}

The team wishes to add security and low-bandwidth characteristics to the connection pipes between the client and the server. Data compression will help save bandwidth, but the compression and decompression operations must be very fast so that they add negligible overhead. The team has identified the Lempel-Ziv-Oberhumer (LZO) library as suitable for this task, since it is a lossless data compression algorithm focused on decompression speed \cite{alt_solution_4}.

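The intended round trip can be sketched with Python's standard zlib module standing in for LZO (LZO itself has no standard-library binding, and the payload below is made up):

```python
import zlib

# Compress before sending over the client-server pipe, decompress on
# receipt; zlib stands in here for LZO, which favors decompression speed.
payload = b"SELECT name, email FROM users WHERE active = 1;" * 50
compressed = zlib.compress(payload)
restored = zlib.decompress(compressed)
assert restored == payload       # lossless round trip
print(f"{len(payload)} bytes -> {len(compressed)} bytes")
```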
The team has looked into numerous cryptography libraries; overall security, performance, and efficiency were the deciding factors in choosing the best candidates. The team has chosen the Advanced Encryption Standard (AES) as the best candidate for encrypting and decrypting data. The deciding factor is that AES has been adopted by the U.S. government to protect sensitive data \cite{alt_solution_5}, and the team wants to protect communication data in the same manner.