sujitpal

JUnit/OpenNLP code to extract Noun Phrases from text

Nov 10th, 2012
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.chunker.ChunkerME;
import opennlp.tools.chunker.ChunkerModel;
import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;
import opennlp.tools.util.Span;

import org.apache.commons.io.IOUtils;
import org.junit.Test;

// The enclosing class name is illustrative; the original paste shows only the method.
public class NounPhraseExtractionTest {

  @Test
  public void testNounPhraseExtractionStandalone() throws Exception {
    SentenceDetectorME sentenceDetector;
    TokenizerME tokenizer;
    POSTaggerME posTagger;
    ChunkerME chunker;
    InputStream smis = null;
    InputStream tmis = null;
    InputStream pmis = null;
    InputStream cmis = null;
    try {
      // Load the four OpenNLP models: sentence detector, tokenizer, POS tagger, chunker.
      smis = new FileInputStream(new File("/Users/sujit/Projects/tgni/src/main/resources/models/en_sent.bin"));
      tmis = new FileInputStream(new File("/Users/sujit/Projects/tgni/src/main/resources/models/en_token.bin"));
      pmis = new FileInputStream(new File("/Users/sujit/Projects/tgni/src/main/resources/models/en_pos_maxent.bin"));
      cmis = new FileInputStream(new File("/Users/sujit/Projects/tgni/src/main/resources/models/en_chunker.bin"));
      SentenceModel smodel = new SentenceModel(smis);
      sentenceDetector = new SentenceDetectorME(smodel);
      TokenizerModel tmodel = new TokenizerModel(tmis);
      tokenizer = new TokenizerME(tmodel);
      POSModel pmodel = new POSModel(pmis);
      posTagger = new POSTaggerME(pmodel);
      ChunkerModel cmodel = new ChunkerModel(cmis);
      chunker = new ChunkerME(cmodel);
    } finally {
      // The models are fully read by now, so the streams can be closed.
      IOUtils.closeQuietly(cmis);
      IOUtils.closeQuietly(pmis);
      IOUtils.closeQuietly(tmis);
      IOUtils.closeQuietly(smis);
    }
    String text = "This article provides a review of the literature on clinical correlates of awareness in dementia. Most inconsistencies were found with regard to an association between depression and higher levels of awareness. Dysthymia, but not major depression, is probably related to higher levels of awareness. Anxiety also appears to be related to higher levels of awareness. Apathy and psychosis are frequently present in patients with less awareness, and may share common neuropathological substrates with awareness. Furthermore, unawareness seems to be related to difficulties in daily life functioning, increased caregiver burden, and deterioration in global dementia severity. Factors that may be of influence on the inconclusive data are discussed, as are future directions of research.";
    // Sentence spans are character offsets into the full text.
    Span[] sentSpans = sentenceDetector.sentPosDetect(text);
    for (Span sentSpan : sentSpans) {
      String sentence = sentSpan.getCoveredText(text).toString();
      int start = sentSpan.getStart();
      // Token spans are character offsets relative to the sentence, not the text.
      Span[] tokSpans = tokenizer.tokenizePos(sentence);
      String[] tokens = new String[tokSpans.length];
      for (int i = 0; i < tokens.length; i++) {
        tokens[i] = tokSpans[i].getCoveredText(sentence).toString();
      }
      String[] tags = posTagger.tag(tokens);
      // Chunk spans are token indices with an exclusive end.
      Span[] chunks = chunker.chunkAsSpans(tokens, tags);
      for (Span chunk : chunks) {
        if ("NP".equals(chunk.getType())) {
          // Map token indices back to character offsets in the original text.
          int npstart = start + tokSpans[chunk.getStart()].getStart();
          int npend = start + tokSpans[chunk.getEnd() - 1].getEnd();
          System.out.println(text.substring(npstart, npend));
        }
      }
    }
  }
}

This produces the following noun phrases:

    [junit] This article
    [junit] a review
    [junit] the literature
    [junit] clinical correlates
    [junit] awareness
    [junit] dementia
    [junit] Most inconsistencies
    [junit] regard
    [junit] an association
    [junit] depression and higher levels
    [junit] awareness
    [junit] Dysthymia
    [junit] not major depression
    [junit] higher levels
    [junit] awareness
    [junit] Anxiety
    [junit] higher levels
    [junit] awareness
    [junit] psychosis
    [junit] patients
    [junit] less awareness
    [junit] common neuropathological substrates
    [junit] awareness
    [junit] unawareness
    [junit] difficulties
    [junit] daily life functioning
    [junit] caregiver burden
    [junit] deterioration
    [junit] global dementia severity
    [junit] Factors
    [junit] that
    [junit] influence
    [junit] the inconclusive data
    [junit] future directions
    [junit] research
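
The offset arithmetic in the inner loop is the easiest part of the test to get wrong, so here is a minimal sketch of it that runs without OpenNLP or model files. Everything in it is an assumption made for illustration: the `NounPhraseOffsets` class, its `Range` helper, the whitespace tokenizer, and the hand-written B-NP/I-NP/O tags all stand in for what the real sentence detector, tokenizer and chunker would produce. It shows only how NP chunks expressed as token indices map back to character offsets in the full text.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of OpenNLP: maps NP chunks back to text offsets. */
class NounPhraseOffsets {

  /** Half-open [start, end) range; a simplified stand-in for a span object. */
  static final class Range {
    final int start;
    final int end;
    Range(int start, int end) { this.start = start; this.end = end; }
  }

  /** Splits on whitespace and records each token's offsets within the sentence. */
  static List<Range> whitespaceTokenSpans(String sentence) {
    List<Range> spans = new ArrayList<>();
    int i = 0;
    while (i < sentence.length()) {
      while (i < sentence.length() && Character.isWhitespace(sentence.charAt(i))) i++;
      int start = i;
      while (i < sentence.length() && !Character.isWhitespace(sentence.charAt(i))) i++;
      if (i > start) spans.add(new Range(start, i));
    }
    return spans;
  }

  /** Groups B-NP/I-NP tags into token-index ranges; a stray I-NP with no B-NP is ignored. */
  static List<Range> npChunks(String[] bioTags) {
    List<Range> chunks = new ArrayList<>();
    int open = -1; // index of the first token of the chunk being built, or -1
    for (int i = 0; i <= bioTags.length; i++) {
      String tag = i < bioTags.length ? bioTags[i] : "O"; // sentinel closes a trailing chunk
      boolean continues = open >= 0 && tag.equals("I-NP");
      if (open >= 0 && !continues) {
        chunks.add(new Range(open, i));
        open = -1;
      }
      if (tag.equals("B-NP")) open = i;
    }
    return chunks;
  }

  /** Returns the NP substrings of one sentence, sliced out of the full text. */
  static List<String> extract(String text, int sentStart, int sentEnd, String[] bioTags) {
    String sentence = text.substring(sentStart, sentEnd);
    List<Range> tokSpans = whitespaceTokenSpans(sentence);
    List<String> phrases = new ArrayList<>();
    for (Range chunk : npChunks(bioTags)) {
      // Token offsets are relative to the sentence, so add the sentence start.
      int npStart = sentStart + tokSpans.get(chunk.start).start;
      // The chunk end is exclusive, so the last token is at index end - 1.
      int npEnd = sentStart + tokSpans.get(chunk.end - 1).end;
      phrases.add(text.substring(npStart, npEnd));
    }
    return phrases;
  }

  public static void main(String[] args) {
    String text = "Intro here. Anxiety affects daily life functioning today.";
    // Hand-written tags for the second sentence, which starts at offset 12.
    String[] tags = {"B-NP", "O", "B-NP", "I-NP", "I-NP", "O"};
    for (String np : extract(text, 12, text.length(), tags)) {
      System.out.println(np);
    }
  }
}
```

Running `main` prints `Anxiety` and `daily life functioning`. Dropping the `sentStart` term would slice from the first sentence instead, which is why the test adds `start` to both ends of every chunk.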