Untitled

a guest
Feb 2nd, 2017
library(tm)
library(ggplot2)

# tm is R's text mining package
# ggplot2 is used for visualization
# each mail type has two sets of files: one is used for training, the other for testing

spam.path<-"data/spam/"
spam2.path<-"data/spam_2/"
easyham.path<-"data/easy_ham/"
easyham2.path<-"data/easy_ham_2/"
hardham.path<-"data/hard_ham/"
hardham2.path<-"data/hard_ham_2/"

get.msg<-function(path){
  print(path)
  connection<-file(path, open="rt", encoding="latin1")
  # close the connection even if reading fails
  on.exit(close(connection))

  text<-readLines(connection)
  # the message body begins after the first blank line

  t<-which(text=="")[1]+1
  print(length(text))
  print(t)
  # no blank line, or nothing after it: the message has no body
  if (is.na(t) || t>length(text)) return("")
  msg<-text[seq(t, length(text), 1)]

  return(paste(msg, collapse="\n"))
}


# create a vector of emails
# use an apply function

spam.docs<-dir(spam.path)
# dir() returns a character vector of the file names in the directory
# the cmds file is a UNIX file which we don't need
spam.docs<-spam.docs[spam.docs!="cmds"]

all.spam<-sapply(spam.docs, function(p) get.msg(paste(spam.path, p, sep="")))

# use the command below for inspection
head(all.spam)

Return-Path: ler@lerami.lerctr.org
Delivery-Date: Sat Sep 7 06:14:26 2002
Received: via dmail-2002(12) for +lists/freebsd/ports; Sat, 7 Sep 2002 06:14:26 -0500 (CDT)
Return-Path: <owner-freebsd-ports@FreeBSD.ORG>
Received: from mx2.freebsd.org (mx2.FreeBSD.org [216.136.204.119])
    by lerami.lerctr.org (8.12.2/8.12.2/20020902/$Revision: 1.30 $) with ESMTP id g87BEIX2008158
    for <ler@lerctr.org>; Sat, 7 Sep 2002 06:14:19 -0500 (CDT)
Received: from hub.freebsd.org (hub.FreeBSD.org [216.136.204.18])
    by mx2.freebsd.org (Postfix) with ESMTP
    id D6B0B5548A; Sat, 7 Sep 2002 04:14:14 -0700 (PDT)
    (envelope-from owner-freebsd-ports@FreeBSD.ORG)
Received: by hub.freebsd.org (Postfix, from userid 538)
    id 448B137B401; Sat, 7 Sep 2002 04:14:14 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
    by hub.freebsd.org (Postfix) with SMTP
    id 33D712E8016; Sat, 7 Sep 2002 04:14:14 -0700 (PDT)
Received: by hub.freebsd.org (bulk_mailer v1.12); Sat, 7 Sep 2002 04:14:14 -0700
Delivered-To: freebsd-ports@freebsd.org
Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
    by hub.freebsd.org (Postfix) with ESMTP id 31C6137B400
    for <ports@freebsd.org>; Sat, 7 Sep 2002 04:14:13 -0700 (PDT)
Received: from mgate.netpath.ne.jp (lilac.netpath.ne.jp [210.253.168.208])
    by mx1.FreeBSD.org (Postfix) with SMTP id 4816543E65
    for <ports@freebsd.org>; Sat, 7 Sep 2002 04:14:12 -0700 (PDT)
    (envelope-from joko@rs.128.ne.jp)
Received: (qmail 46020 invoked from network); 7 Sep 2002 09:24:03 -0000
Received: from p6044-ipad22marunouchi.tokyo.ocn.ne.jp (HELO D) (61.214.35.44)
    by lilac.netpath.ne.jp with SMTP; 7 Sep 2002 09:24:03 -0000
From: =?iso-2022-jp?B?am9rb0Bycy4xMjgubmUuanA=?=@FreeBSD.ORG
To: =?iso-2022-jp?B?MTIx?=@FreeBSD.ORG
Reply-To: joko@rs.128.ne.jp
Date: Sat, 07 Sep 2002 18:24:06 +0900
Subject: =?iso-2022-jp?B?GyRCJDckOCRfJEgkYiRiJE4lMyVpJVwlbCE8JTclZyVzGyhK?=
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
MIME-Version: 1.0
Message-Id: <20020907111412.4816543E65@mx1.FreeBSD.org>
Sender: owner-freebsd-ports@FreeBSD.ORG
List-ID: <freebsd-ports.FreeBSD.ORG>
List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-ports>
List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-ports>
X-Loop: FreeBSD.org
Precedence: bulk
X-Virus-Scanned: by amavisd-milter (http://amavis.org/)
Status: RO
X-Status:
X-Keywords:
X-UID: 8

ààªÍ¶¯ÄÔǤªäêé->line 52
µ¶ÝÆààÌR{[V
[^rfIicucjêå
¢ÂÜÅcÆÅ«é©í©èܹñ
²¶Í¨ßÉI
http://www.transrave.com/PC/rori
ìiá
­`à@¼Ã®cn9@­Ì¹
ÈÇÈÇ132ìiBD]­I
(^-^)/~Fg[
To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-ports" in the body of the message

msg<-text[seq(t, length(text),1)]
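
The indexing line `msg<-text[seq(t, length(text),1)]` only works when `t` points at a real line. If a file has no blank line, `which(text=="")[1]` is `NA`, and if the blank line is the last line, `t` is one past the end; in both cases `seq()` appears to stop with an error instead of returning an empty body. Below is a minimal sketch of a guarded split, run on in-memory lines rather than on the corpus files. The helper name `get.body.lines` and the sample messages are hypothetical and not part of the original script.

```r
# Hypothetical helper (not in the original paste): given all lines of a
# message, return only the lines that follow the first blank line.
get.body.lines <- function(text) {
  blank <- which(text == "")
  # No blank line at all: there is no header/body separator.
  if (length(blank) == 0) return(character(0))
  start <- blank[1] + 1
  # The blank line is the last line: the body is empty.
  if (start > length(text)) return(character(0))
  text[start:length(text)]
}

# Made-up sample messages covering the three cases.
normal.msg    <- c("From: a@example.org", "Subject: hi", "", "first", "second")
no.separator  <- c("From: a@example.org", "Subject: hi")
empty.body    <- c("From: a@example.org", "")

print(get.body.lines(normal.msg))
print(length(get.body.lines(no.separator)))
print(length(get.body.lines(empty.body)))
```

Returning `character(0)` keeps the later `paste(msg, collapse="\n")` call well defined, since pasting an empty vector with `collapse` yields an empty string.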