Mar 16th, 2016
(spamfilter)
tasdik at Acer in ~/Dropbox/projects/spamfilter on master [!]
$ make run
python test.py
Enter Corpus Directory['eg: corpus2'] : corpus2
Enter Spam Sub Directory[eg : 'spam']: spam
Enter Clean Emails Sub Directory[eg :'ham']: ham
Enter Limit of files per class(spam/ham)[eg: 1000]: 1500
Training 1500 emails in spam class
Skipping file 2912.2005-06-30.SA_and_HP.spam.txt because of bad coding
Skipping file 3620.2005-07-06.SA_and_HP.spam.txt because of bad coding
Skipping file 1669.2005-06-22.SA_and_HP.spam.txt because of bad coding
Skipping file 3635.2005-07-06.SA_and_HP.spam.txt because of bad coding
Skipping file 0075.2002-04-24.SA_and_HP.spam.txt because of bad coding
Skipping file 1981.2005-06-24.SA_and_HP.spam.txt because of bad coding
Skipping file 1934.2005-06-24.SA_and_HP.spam.txt because of bad coding
Skipping file 3556.2005-07-05.SA_and_HP.spam.txt because of bad coding
Skipping file 2406.2005-06-27.SA_and_HP.spam.txt because of bad coding
Skipping file 1280.2002-09-22.SA_and_HP.spam.txt because of bad coding
Skipping file 3977.2005-07-14.SA_and_HP.spam.txt because of bad coding
Skipping file 2432.2005-06-27.SA_and_HP.spam.txt because of bad coding
Skipping file 5070.2005-07-21.SA_and_HP.spam.txt because of bad coding
Skipping file 3554.2005-07-05.SA_and_HP.spam.txt because of bad coding
Skipping file 3050.2005-07-01.SA_and_HP.spam.txt because of bad coding
Skipping file 2132.2005-06-25.SA_and_HP.spam.txt because of bad coding
Skipping file 0085.2002-05-05.SA_and_HP.spam.txt because of bad coding
Skipping file 3739.2005-07-06.SA_and_HP.spam.txt because of bad coding
Skipping file 2789.2005-06-29.SA_and_HP.spam.txt because of bad coding
Skipping file 4575.2005-07-19.SA_and_HP.spam.txt because of bad coding
Skipping file 1161.2002-09-08.SA_and_HP.spam.txt because of bad coding
Skipping file 2928.2005-06-30.SA_and_HP.spam.txt because of bad coding
Skipping file 1988.2005-06-24.SA_and_HP.spam.txt because of bad coding
Skipping file 3205.2005-07-02.SA_and_HP.spam.txt because of bad coding
Skipping file 1563.2005-06-21.SA_and_HP.spam.txt because of bad coding
Skipping file 0018.2001-07-13.SA_and_HP.spam.txt because of bad coding
Skipping file 4968.2005-07-20.SA_and_HP.spam.txt because of bad coding
Skipping file 2798.2005-06-29.SA_and_HP.spam.txt because of bad coding
Skipping file 1053.2002-08-25.SA_and_HP.spam.txt because of bad coding
Skipping file 0982.2002-08-08.SA_and_HP.spam.txt because of bad coding
Skipping file 2661.2005-06-29.SA_and_HP.spam.txt because of bad coding
Skipping file 1244.2002-09-19.SA_and_HP.spam.txt because of bad coding
No of Features : 14492,
Number of spam email : 1468,
Number of ham email : 0,
Total number of emails:  1468
Training 1500 emails in ham class
No of Features : 18792,
Number of spam email : 1468,
Number of ham email : 1500,
Total number of emails:  2968
Training took 5.0 minutes : 15.292683 seconds


########################################
Testing the classifier on Test Dataset

Enter corpus directory: [eg: corpus3] corpus3
Enter spam directory: spam
Enter number of files to be scanned [defaults to 1000 files]: 1000
Testing spam files
Skipping file: '3898.2005-03-13.BG.spam.txt'' due to bad encoding!
Classified 5130.2005-06-05.BG.spam.txt incorrectly
Skipping file: '1472.2004-11-07.BG.spam.txt'' due to bad encoding!
Skipping file: '5834.2005-07-19.BG.spam.txt'' due to bad encoding!
Classified 5065.2005-05-31.BG.spam.txt incorrectly
Skipping file: '4809.2005-05-14.BG.spam.txt'' due to bad encoding!
Skipping file: '0452.2004-09-03.BG.spam.txt'' due to bad encoding!
Classified 2480.2005-01-03.BG.spam.txt incorrectly
Classified 2425.2004-12-30.BG.spam.txt incorrectly
Skipping file: '0314.2004-08-23.BG.spam.txt'' due to bad encoding!
Skipping file: '2572.2005-01-07.BG.spam.txt'' due to bad encoding!
Skipping file: '2947.2005-01-23.BG.spam.txt'' due to bad encoding!
Classified 0969.2004-10-08.BG.spam.txt incorrectly
Skipping file: '3956.2005-03-17.BG.spam.txt'' due to bad encoding!
Skipping file: '2697.2005-01-13.BG.spam.txt'' due to bad encoding!
Classified 0841.2004-09-29.BG.spam.txt incorrectly
Skipping file: '1796.2004-11-24.BG.spam.txt'' due to bad encoding!
Classified 4608.2005-04-30.BG.spam.txt incorrectly
Skipping file: '2271.2004-12-22.BG.spam.txt'' due to bad encoding!
Skipping file: '0036.2004-08-03.BG.spam.txt'' due to bad encoding!
Classified 0299.2004-08-21.BG.spam.txt incorrectly
Classified 0442.2004-09-02.BG.spam.txt incorrectly
Skipping file: '1292.2004-10-28.BG.spam.txt'' due to bad encoding!
Skipping file: '0104.2004-08-08.BG.spam.txt'' due to bad encoding!
Skipping file: '3356.2005-02-10.BG.spam.txt'' due to bad encoding!
Skipping file: '2817.2005-01-19.BG.spam.txt'' due to bad encoding!
Skipping file: '0018.2004-08-02.BG.spam.txt'' due to bad encoding!
Skipping file: '3358.2005-02-10.BG.spam.txt'' due to bad encoding!
Skipping file: '2173.2004-12-18.BG.spam.txt'' due to bad encoding!
Skipping file: '0911.2004-10-04.BG.spam.txt'' due to bad encoding!
Classified 4787.2005-05-12.BG.spam.txt incorrectly
Skipping file: '3897.2005-03-13.BG.spam.txt'' due to bad encoding!
Classified 2995.2005-01-24.BG.spam.txt incorrectly
Classified 5869.2005-07-20.BG.spam.txt incorrectly
Skipping file: '2087.2004-12-13.BG.spam.txt'' due to bad encoding!
Skipping file: '2268.2004-12-22.BG.spam.txt'' due to bad encoding!
Classified 0915.2004-10-05.BG.spam.txt incorrectly
Skipping file: '1925.2004-12-02.BG.spam.txt'' due to bad encoding!
Skipping file: '0680.2004-09-19.BG.spam.txt'' due to bad encoding!
Classified 1942.2004-12-03.BG.spam.txt incorrectly
Classified 4857.2005-05-16.BG.spam.txt incorrectly
Skipping file: '1336.2004-10-30.BG.spam.txt'' due to bad encoding!
Classified 0827.2004-09-29.BG.spam.txt incorrectly
Skipping file: '5740.2005-07-14.BG.spam.txt'' due to bad encoding!
Skipping file: '0425.2004-09-01.BG.spam.txt'' due to bad encoding!
Skipping file: '2381.2004-12-28.BG.spam.txt'' due to bad encoding!
Skipping file: '1268.2004-10-27.BG.spam.txt'' due to bad encoding!
Skipping file: '4023.2005-03-20.BG.spam.txt'' due to bad encoding!
Skipping file: '0089.2004-08-07.BG.spam.txt'' due to bad encoding!
Skipping file: '1201.2004-10-24.BG.spam.txt'' due to bad encoding!
Skipping file: '5743.2005-07-15.BG.spam.txt'' due to bad encoding!
Skipping file: '0793.2004-09-27.BG.spam.txt'' due to bad encoding!
Classified 3801.2005-03-07.BG.spam.txt incorrectly
Classified 0842.2004-09-29.BG.spam.txt incorrectly
Files classified correctly : 946 out of 964
Precision : 0.98132780083


########################################
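
Note on the summary figures: the totals in the log can be re-derived from the per-file messages. The sketch below is not taken from test.py; the variable names are hypothetical and the counts are simply tallied from the "Skipping file" and "Classified ... incorrectly" lines in the session, assuming each requested file produced at most one such line.

```python
# Counts tallied by hand from the log; names are illustrative, not from test.py.
train_requested = 1500     # limit of files per class entered at the prompt
train_skipped = 32         # "Skipping file ... because of bad coding" lines
train_spam_loaded = train_requested - train_skipped

test_requested = 1000      # number of spam files requested for testing
test_skipped = 36          # "Skipping file ... due to bad encoding!" lines
misclassified = 18         # "Classified ... incorrectly" lines

tested = test_requested - test_skipped
correct = tested - misclassified
ratio = correct / float(tested)  # float() keeps this correct on Python 2 as well

print("Number of spam email : %d" % train_spam_loaded)
print("Files classified correctly : %d out of %d" % (correct, tested))
print("Ratio : %.11f" % ratio)
```

Since only spam files were scored in this run, the ratio appears to be the share of spam messages that were caught (recall on the spam class) rather than precision in the usual sense; a precision figure would also need the number of ham messages wrongly flagged as spam, which this log does not show.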