
nlp_tokenization_spacy

Mar 8th, 2024 (edited)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Filename: nlp_tokenization_spacy.py
# Author: Jeoi Reqi

"""
This script performs tokenization on a given text using spaCy.

Requirements:
- Python 3
- spaCy library with the 'en_core_web_sm' model installed

Usage:
- Run the script, and it will print the tokenized words and sentences of the provided text.

Example:
python nlp_tokenization_spacy.py

Output:
Tokenized Words: ['Natural', 'Language', 'Processing', 'is', 'a', 'fascinating', 'field', '.', 'It', 'involves', 'the', 'use', 'of', 'computers', 'to', 'understand', 'and', 'process', 'human', 'language', '.']
Tokenized Sentences: ['Natural Language Processing is a fascinating field.', 'It involves the use of computers to understand and process human language.']
"""

import spacy

# Load the small English spaCy pipeline (tokenizer, tagger, parser, etc.)
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "Natural Language Processing is a fascinating field. It involves the use of computers to understand and process human language."

# Process the text with the pipeline to get a Doc object
doc = nlp(text)

# Word-level tokens come from iterating the Doc;
# sentence boundaries come from doc.sents
tokens = [token.text for token in doc]
sentences = [sent.text for sent in doc.sents]

print("Tokenized Words:", tokens)
print("Tokenized Sentences:", sentences)