Untitled

a guest
Aug 16th, 2018
import re

# Maps each speaker name to the list of words they spoke.
speaker_words = {}
# A speaker line looks like "Name: text"; group 1 is the name, group 2 the text.
speaker_pattern = re.compile(r'^(\w+?):(.*)$')

with open("transcript.txt", "r") as f:
    lines = f.readlines()

current_speaker = None
for line in lines:
    line = line.strip()
    match = speaker_pattern.match(line)
    if match is not None:
        # A new speaker starts here; lines without a "Name:" prefix
        # are attributed to the most recent speaker.
        current_speaker = match.group(1)
        line = match.group(2).strip()
        if current_speaker not in speaker_words:
            speaker_words[current_speaker] = []
    if current_speaker:
        # you may want to do some sort of punctuation filtering too
        words = [word.strip() for word in line.split(' ') if len(word.strip()) > 0]
        speaker_words[current_speaker].extend(words)

print(speaker_words)
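
The comment in the script mentions punctuation filtering without showing it. One possible approach, as a minimal sketch: strip punctuation from both ends of each token and drop tokens that end up empty. The helper name clean_words is hypothetical and not part of the original paste. Punctuation inside a word (such as the apostrophe in "It's") is left alone on purpose.

```python
import string

# Hypothetical helper (not in the original paste): split a line on whitespace,
# strip leading/trailing ASCII punctuation from each token, and drop tokens
# that become empty (for example a bare "--").
def clean_words(line):
    words = []
    for token in line.split():
        word = token.strip(string.punctuation)
        if word:
            words.append(word)
    return words

print(clean_words("Hello, world! It's -- fine."))
```

To try it in the script, the list comprehension that builds words could be replaced with words = clean_words(line).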