- library(tm)
- library(ggplot2)
- #tm is R's text mining package
- #ggplot2 is used for visualization
- #each type of mail comes in two sets of files: one is used for training, the other for testing
- spam.path<-"data/spam/"
- spam2.path<-"data/spam_2/"
- easyham.path<-"data/easy_ham/"
- easyham2.path<-"data/easy_ham_2/"
- hardham.path<-"data/hard_ham/"
- hardham2.path<-"data/hard_ham_2/"
- get.msg<-function(path){
- print(path)
- connection<-file(path,open="rt", encoding="Latin1")
- text<-readLines(connection)
- #the message body begins after the first empty line, which separates it from the headers
- t<-which(text=="")[1]+1
- print(length(text))
- print(t)
- msg<-text[seq(t, length(text),1)]
- close(connection)
- return(paste(msg, collapse="\n"))
- }
- #create a vector of emails, one element per file, using sapply
- spam.docs<-dir(spam.path)
- #dir() returns a character vector of the file names in the directory
- #the cmds file is a UNIX file which we don't need
- #1:length(spam.docs)-1 parses as (1:length(spam.docs))-1, so filter by name instead
- spam.docs<-spam.docs[spam.docs!="cmds"]
- all.spam<-sapply(spam.docs, function(p) get.msg(paste(spam.path,p, sep="")))
- #use the command below for inspection
- head(all.spam)
- Return-Path: ler@lerami.lerctr.org
- Delivery-Date: Sat Sep 7 06:14:26 2002
- Received: via dmail-2002(12) for +lists/freebsd/ports; Sat, 7 Sep 2002 06:14:26 -0500 (CDT)
- Return-Path: <owner-freebsd-ports@FreeBSD.ORG>
- Received: from mx2.freebsd.org (mx2.FreeBSD.org [216.136.204.119])
- by lerami.lerctr.org (8.12.2/8.12.2/20020902/$Revision: 1.30 $) with ESMTP id g87BEIX2008158
- for <ler@lerctr.org>; Sat, 7 Sep 2002 06:14:19 -0500 (CDT)
- Received: from hub.freebsd.org (hub.FreeBSD.org [216.136.204.18])
- by mx2.freebsd.org (Postfix) with ESMTP
- id D6B0B5548A; Sat, 7 Sep 2002 04:14:14 -0700 (PDT)
- (envelope-from owner-freebsd-ports@FreeBSD.ORG)
- Received: by hub.freebsd.org (Postfix, from userid 538)
- id 448B137B401; Sat, 7 Sep 2002 04:14:14 -0700 (PDT)
- Received: from localhost (localhost [127.0.0.1])
- by hub.freebsd.org (Postfix) with SMTP
- id 33D712E8016; Sat, 7 Sep 2002 04:14:14 -0700 (PDT)
- Received: by hub.freebsd.org (bulk_mailer v1.12); Sat, 7 Sep 2002 04:14:14 -0700
- Delivered-To: freebsd-ports@freebsd.org
- Received: from mx1.FreeBSD.org (mx1.FreeBSD.org [216.136.204.125])
- by hub.freebsd.org (Postfix) with ESMTP id 31C6137B400
- for <ports@freebsd.org>; Sat, 7 Sep 2002 04:14:13 -0700 (PDT)
- Received: from mgate.netpath.ne.jp (lilac.netpath.ne.jp [210.253.168.208])
- by mx1.FreeBSD.org (Postfix) with SMTP id 4816543E65
- for <ports@freebsd.org>; Sat, 7 Sep 2002 04:14:12 -0700 (PDT)
- (envelope-from joko@rs.128.ne.jp)
- Received: (qmail 46020 invoked from network); 7 Sep 2002 09:24:03 -0000
- Received: from p6044-ipad22marunouchi.tokyo.ocn.ne.jp (HELO D) (61.214.35.44)
- by lilac.netpath.ne.jp with SMTP; 7 Sep 2002 09:24:03 -0000
- From: =?iso-2022-jp?B?am9rb0Bycy4xMjgubmUuanA=?=@FreeBSD.ORG
- To: =?iso-2022-jp?B?MTIx?=@FreeBSD.ORG
- Reply-To: joko@rs.128.ne.jp
- Date: Sat, 07 Sep 2002 18:24:06 +0900
- Subject: =?iso-2022-jp?B?GyRCJDckOCRfJEgkYiRiJE4lMyVpJVwlbCE8JTclZyVzGyhK?=
- Content-Type: text/plain
- Content-Transfer-Encoding: 7bit
- MIME-Version: 1.0
- Message-Id: <20020907111412.4816543E65@mx1.FreeBSD.org>
- Sender: owner-freebsd-ports@FreeBSD.ORG
- List-ID: <freebsd-ports.FreeBSD.ORG>
- List-Archive: <http://docs.freebsd.org/mail/> (Web Archive)
- List-Help: <mailto:majordomo@FreeBSD.ORG?subject=help> (List Instructions)
- List-Subscribe: <mailto:majordomo@FreeBSD.ORG?subject=subscribe%20freebsd-ports>
- List-Unsubscribe: <mailto:majordomo@FreeBSD.ORG?subject=unsubscribe%20freebsd-ports>
- X-Loop: FreeBSD.org
- Precedence: bulk
- X-Virus-Scanned: by amavisd-milter (http://amavis.org/)
- Status: RO
- X-Status:
- X-Keywords:
- X-UID: 8
- ààªÍ¶¯ÄÔǤªäêé->line 52
- µ¶ÝÆààÌR{[V
- [^rfIicucjêå
- ¢ÂÜÅcÆÅ«é©í©èܹñ
- ²¶Í¨ßÉI
- http://www.transrave.com/PC/rori
- ìiá
- `à@¼Ã®cn9@̹
- ÈÇÈÇ132ìiBD]I
- (^-^)/~Fg[
- To Unsubscribe: send mail to majordomo@FreeBSD.org
- with "unsubscribe freebsd-ports" in the body of the message
- msg<-text[seq(t, length(text),1)]
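A note on the line above: if a file contains no empty line, `which(text=="")[1]` is `NA` and `seq()` stops with an error. Below is a minimal sketch of a more defensive reader, assuming it is acceptable to return `NA` for such files. The name `get.msg.safe` and the sample file contents are hypothetical and not part of the original script.

```r
# Hypothetical defensive variant of get.msg(); the name is an assumption.
get.msg.safe <- function(path) {
  connection <- file(path, open = "rt", encoding = "latin1")
  # Close the connection even if readLines() fails.
  on.exit(close(connection))
  text <- readLines(connection, warn = FALSE)
  # Index of the first empty line, which separates headers from the body.
  first.blank <- which(text == "")[1]
  # No separator, or nothing after it: report a missing body instead of failing.
  if (is.na(first.blank) || first.blank >= length(text)) {
    return(NA_character_)
  }
  paste(text[(first.blank + 1):length(text)], collapse = "\n")
}

# Usage on throwaway files with made-up content.
with.body <- tempfile()
writeLines(c("Subject: hi", "", "line one", "line two"), with.body)
body <- get.msg.safe(with.body)

no.body <- tempfile()
writeLines(c("Subject: hi", "X-UID: 8"), no.body)
missing.body <- get.msg.safe(no.body)
```

Files that come back as `NA` can then be dropped before building the corpus, for example with `all.spam[!is.na(all.spam)]`.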