Advertisement
esimran

Untitled

Apr 24th, 2017
71
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 6.78 KB | None | 0 0
  1. Name: Simran Singh partnered with Trishul Nagenalli
  2. NetID: ss810 and tn74
  3. Hours Spent: 6.0
  4. Consulted With: Ramil
  5. Resources Used: NONE
  6. Impressions: I thought the assignment was really cool.
  7. ----------------------------------------------------------------------
  8. Problem 1: Describe testing
  9. We tested our files using the tester tab provided in the GUI to ensure that all provided files could be properly compressed
  10. and decompressed. We checked the frequency count of different characters in the data and the compression rate to ensure that
  11. a more highly concentrated frequency count for the file's characters did lead to better compression rates. We also used this
  12. information when testing if tiff images or text files compressed better.
  13.  
  14. Following testing with given files, we ran our own tests with edge cases and ascertained compression rates
  15. to see if they made sense. We compressed/decompressed a file that consisted entirely of a single repeated character. Because
  16. there is only one character repeated several times, we expect compression to be very high since the Huffman tree is essentially
  17. encoding that character with one bit. The compression rate for such a file should approach 88.5% (8:1 compression) as the length
  18. of the file approaches infinity, we were getting compression rates just above above 80%
  19.  
  20. We next tested with files that had 0 and 1 character(s) respectively. We discovered that a file with one character worked fine
  21. in our program though it did have a terrible compression rate. Rather than have the single character be represented by one
  22. byte, it had to be represented by 7 so the Huffman tree could be stored before it. This is expected as Huffman compression
  23. is rather unnecessary for a file that is only one character long. The empty file initially crashed our program, which we
  24. resolved by treating an empty file as an outlier case that would throw a HuffException.
  25.  
  26.  
  27. Problem 2: Benchmark and analyze your code
  28.  
  29. For Calgary:
  30. AlphFreq: 82
  31. AlphFreq: 83
  32. AlphFreq: 97
  33. AlphFreq: 257
  34. AlphFreq: 99
  35. AlphFreq: 257
  36. AlphFreq: 257
  37. AlphFreq: 96
  38. AlphFreq: 92
  39. AlphFreq: 85
  40. AlphFreq: 94
  41. AlphFreq: 160
  42. AlphFreq: 93
  43. AlphFreq: 88
  44. AlphFreq: 90
  45. AlphFreq: 100
  46. File Length: 111261
  47. Compresion Rate: 0.34496364404418445
  48. File Length: 768771
  49. Compresion Rate: 0.41891317588451327
  50. File Length: 610856
  51. Compresion Rate: 0.40987183477229683
  52. File Length: 102400
  53. Compresion Rate: 0.4020340327674595
  54. File Length: 377109
  55. Compresion Rate: 0.39135717319910657
  56. File Length: 21504
  57. Compresion Rate: 0.38968904579093033
  58. File Length: 246814
  59. Compresion Rate: 0.3701141056364924
  60. File Length: 53161
  61. Compresion Rate: 0.37011862770935255
  62. File Length: 82199
  63. Compresion Rate: 0.37181512799721994
  64. File Length: 46526
  65. Compresion Rate: 0.3725706962857571
  66. File Length: 38105
  67. Compresion Rate: 0.3724690955323654
  68. File Length: 513216
  69. Compresion Rate: 0.44490770619148146
  70. File Length: 39611
  71. Compresion Rate: 0.44355947618704494
  72. File Length: 71646
  73. Compresion Rate: 0.4425078790430267
  74. File Length: 49379
  75. Compresion Rate: 0.44160906198704064
  76. File Length: 93695
  77. Compresion Rate: 0.43756642767941634
  78.  
  79.  
  80.  
  81. For Waterloo:
  82. AlphFreq: 231
  83. AlphFreq: 156
  84. AlphFreq: 231
  85. AlphFreq: 257
  86. AlphFreq: 254
  87. AlphFreq: 21
  88. AlphFreq: 257
  89. AlphFreq: 19
  90. AlphFreq: 250
  91. AlphFreq: 117
  92. AlphFreq: 186
  93. AlphFreq: 227
  94. AlphFreq: 25
  95. AlphFreq: 227
  96. AlphFreq: 227
  97. AlphFreq: 254
  98. AlphFreq: 118
  99. AlphFreq: 256
  100. AlphFreq: 252
  101. AlphFreq: 238
  102. AlphFreq: 249
  103. AlphFreq: 21
  104. AlphFreq: 18
  105. AlphFreq: 254
  106. AlphFreq: 51
  107. AlphFreq: 188
  108. File Length: 262274
  109. Compresion Rate: 0.06235844956038339
  110. File Length: 65666
  111. Compresion Rate: 0.07893517106787828
  112. File Length: 262274
  113. Compresion Rate: 0.0905214041008855
  114. File Length: 65666
  115. Compresion Rate: 0.08469689577361716
  116. File Length: 65666
  117. Compresion Rate: 0.08725708409443056
  118. File Length: 65666
  119. Compresion Rate: 0.14319395537669655
  120. File Length: 2149096
  121. Compresion Rate: 0.07761447368600294
  122. File Length: 65666
  123. Compresion Rate: 0.09495118878444653
  124. File Length: 333442
  125. Compresion Rate: 0.10647037730825781
  126. File Length: 309388
  127. Compresion Rate: 0.12919953994782707
  128. File Length: 3706306
  129. Compresion Rate: 0.2706247083773743
  130. File Length: 65666
  131. Compresion Rate: 0.26873833050910534
  132. File Length: 65666
  133. Compresion Rate: 0.2738776992858748
  134. File Length: 163458
  135. Compresion Rate: 0.2736700715416105
  136. File Length: 262274
  137. Compresion Rate: 0.2671126103193986
  138. File Length: 1179784
  139. Compresion Rate: 0.240192791383939
  140. File Length: 307330
  141. Compresion Rate: 0.23946940210880174
  142. File Length: 786568
  143. Compresion Rate: 0.22394345392431403
  144. File Length: 1179784
  145. Compresion Rate: 0.20904772550441664
  146. File Length: 1498414
  147. Compresion Rate: 0.21357569727156667
  148. File Length: 65666
  149. Compresion Rate: 0.21275712300746108
  150. File Length: 65666
  151. Compresion Rate: 0.21587134736817148
  152. File Length: 65666
  153. Compresion Rate: 0.21911408486387285
  154. File Length: 1179784
  155. Compresion Rate: 0.20408930748376952
  156. File Length: 262274
  157. Compresion Rate: 0.21189652330384412
  158. File Length: 262274
  159. Compresion Rate: 0.20967661297883722
  160.  
  161. For a fixed alphabet size and varying file length, the compression rate was largely unchanged. This data suggests
  162. that file length does not heavily influence the compression rate. This held true for waterloo and calgary.
  163. The alphabet size, however, did result in changes to the compression rate in both tests. The greater the alphabet size,
  164. the slower the compression. This makes sense as file length does not influence the tree size and hence the compression,
  165. but a wide range of characters increases the tree size, thus decreasing compression rate.
  166. Conversely, for time (not shown, but was evaluated in huffviewer), the larger the file length,
  167. the greater the time, and time was indifferent with respect to alphabet size. For time, file length matters as it takes
  168. more time to traverse the bit stream, but the time to traverse to 0s and 1s is the same regardless of the actual value
  169. of the bits.
  170.  
  171. Problem 3: Text vs. Binary
  172.  
  173. The Binary files compressed about 21%, while the text compressed about 44%. The Binary files have a wider variety of
  174. character use than text files. This results in more branches in the tree and therefore less compression. As the data
  175. clearly shows, the more variety in characters, the worse the compression.
  176.  
  177. Problem 4: Compressing compressed files
  178.  
  179. Generally, no extra compression is achived by compressing a compressed file again, for our case, it was consistently worse
  180. by about .65% every time we tried to comrpess again for the Melville text specifically.
  181. For Huffman coding, compressing an already compressed file is inefficient as the second compression will try to
  182. compress the header that was written for the first compression. This results in larger files than the original compression.
  183. This trend continues as you constantly attempt to compress files that are already compressed by a Huffman method.
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement