\documentclass{article}

\usepackage[french]{babel}
\usepackage[utf8]{inputenc}

\author{Matthieu DURAND, Antoine TOMBU, Kevin DAVY, Estelle BECUE, Fanomezantsoa RANDRIANARIDERA, Carole WRIGHT}

\title{Design report}

\begin{document}

\maketitle

\section{Context}

We are going to create a piece of software that deletes duplicate files from a hard drive in order to save disk space. The idea comes from a system administrator who noticed that the employees of his company were wasting too much space on duplicated files.

Several functionalities were decided on. The software will be divided into two parts: the first finds all the duplicate files, and the second deletes those duplicates according to different options.

Now that the functionalities have been clearly defined, this report describes the chosen solutions.

As a reminder, the main functionalities are:

\begin{itemize}
\item Hard drive scan.
\item Detection of potential duplicate files.
\item Deletion of duplicate files.
\end{itemize}

\section{Detection of duplicated files}

\subsection{Hard drive scan}

The hard drive will be scanned in its entirety, and all files will be compared to each other to find potential duplicates. Only system files will be excluded from the scan: since system files must not be deleted, they cannot be candidate duplicates. They can be identified through the system flags, and most programming languages also provide classes that detect whether a file is a system file. Every file will therefore be tested, and if it is a system file, nothing will be done with it.

The scan will be recursive, so every folder under the root will be visited in turn.

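The scan described above could be sketched as follows in Python (an illustrative sketch, not the project's actual code; the function names are hypothetical, and on platforms that do not expose the Windows system flag, no file is treated as a system file):

```python
import os
import stat

def is_system_file(path):
    """Return True if the file carries the Windows 'system' flag.

    st_file_attributes only exists on Windows; elsewhere we fall back
    to 0, so no file is considered a system file.
    """
    st = os.stat(path)
    attrs = getattr(st, "st_file_attributes", 0)
    return bool(attrs & stat.FILE_ATTRIBUTE_SYSTEM)

def scan_drive(root):
    """Recursively walk `root`, yielding paths of non-system files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not is_system_file(path):
                yield path
```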
\subsection{Detection of potential duplicated files}

Some files are saved several times on the hard drive even though their content is the same; these are duplicate files. We identified two criteria to detect them.

The first criterion is that two files with approximately the same name are potential duplicates. It stems from the fact that when we version a file, we duplicate it each time while changing the name only slightly; we also sometimes copy and paste the same file all over the hard drive.

We are therefore going to compare the names of all files. If the longest common sequence of two names is longer than half of each name, the files are considered potential duplicates. For example, for two files named projet1.txt and projetL2.txt, the longest common sequence is ``projet'', and ``projet'' is longer than half of ``projet1'' and half of ``projetL2'', so they are potential duplicates.

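The name criterion above can be sketched as follows (an illustrative Python version built on the standard \texttt{difflib} module; \texttt{names\_match} is a hypothetical helper name, not part of the project):

```python
import os
from difflib import SequenceMatcher

def names_match(file_a, file_b):
    """Apply the report's name criterion: the longest common substring
    of the two base names must be longer than half of each name."""
    a = os.path.splitext(os.path.basename(file_a))[0]
    b = os.path.splitext(os.path.basename(file_b))[0]
    m = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return m.size > len(a) / 2 and m.size > len(b) / 2
```

With the example from the text, ``projet'' (6 characters) exceeds half of both ``projet1'' and ``projetL2'', so the pair is flagged.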
\subsection{Transmission of the duplicated files}

To transmit file information we will use the JSON format, storing for each file its name, type, modification date, key, and path, which we will need for the deletion criteria. JSON (JavaScript Object Notation) is a textual data format derived from the notation of JavaScript objects.

The records in JSON format are first stored in a file called DuplicateSaveFile. We chose this format because it can represent structured information, which is useful for building our backup file. The search for duplicates then starts from these JSON records.

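A record in DuplicateSaveFile might look like the following sketch (only the field names come from the report; all the values, and the exact layout of the file, are invented for illustration):

```python
import json

# One record per scanned file; field names follow the report
# ("key" is left as an opaque identifier here).
record = {
    "name": "projet1.txt",
    "type": ".txt",
    "modified": "2016-10-24T10:30:00",
    "key": "0001",
    "path": "/home/user/docs/projet1.txt",
}

# Save the records into the DuplicateSaveFile described above...
with open("DuplicateSaveFile", "w", encoding="utf-8") as f:
    json.dump([record], f, indent=2)

# ...and read them back when the duplicate search starts.
with open("DuplicateSaveFile", encoding="utf-8") as f:
    files = json.load(f)
```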
\section{Deletion of duplicated files}

We chose the decorator pattern to add different filters (behaviours) to the deletion object, in order to make deleting duplicates easier. One or more filters can be combined; the filters are the following:
\begin{itemize}
\item Same name
\item Same content
\item Type of file
\item Date of file (recent or old)
\end{itemize}

For example, we can delete all the older files with the same content to keep only the latest copy.

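A minimal sketch of this decorator stack (illustrative Python; the class names are hypothetical and only two of the four filters are shown, a file-type filter and a keep-the-newest-copy filter):

```python
import os

class Deleter:
    """Base component: selects every candidate it is given for deletion."""
    def filter(self, candidates):
        return list(candidates)

class FilterDecorator:
    """Base decorator: wraps another deleter and narrows its selection."""
    def __init__(self, inner):
        self.inner = inner
    def filter(self, candidates):
        return self.inner.filter(candidates)

class SameExtension(FilterDecorator):
    """Keep only candidates of a given file type."""
    def __init__(self, inner, ext):
        super().__init__(inner)
        self.ext = ext
    def filter(self, candidates):
        return [c for c in self.inner.filter(candidates)
                if os.path.splitext(c["path"])[1] == self.ext]

class KeepNewest(FilterDecorator):
    """Drop the most recently modified copy, so only older files are deleted."""
    def filter(self, candidates):
        kept = self.inner.filter(candidates)
        if not kept:
            return kept
        newest = max(kept, key=lambda c: c["modified"])
        return [c for c in kept if c is not newest]

# Stack the filters: delete old .txt duplicates, keeping the newest copy.
policy = KeepNewest(SameExtension(Deleter(), ".txt"))
```

Because each decorator wraps another deleter, filters can be stacked in any order and any number, which is exactly what combining the criteria above requires.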
\end{document}