a guest
Dec 19th, 2014
You could do this through a positive lookahead:

>>> import re
>>> s = "My name is really nice. This is so awesome."
>>> m = re.findall(r'(?=(\b\w+\b \S+))', s)
>>> m
['My name', 'name is', 'is really', 'really nice.', 'This is', 'is so', 'so awesome.']
Pattern explanation:

(?=...) Lookaheads are zero-length assertions, just like the start and end of a line or the start and end of a word. A lookahead does not consume characters in the string; it only asserts whether a match is possible at the current position.
() Capturing group, used to capture the characters matched by the pattern inside the parentheses.
\b Word boundary. It matches between a word character and a non-word character.
\w+ Matches one or more word characters.
\S+ Matches one or more non-space characters. The literal space before it in the pattern matches a single space.
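As a quick check of the zero-length claim, here is a minimal sketch (the shortened sample string and the variable names are only for illustration). It shows that the overall match of a lookahead is empty, while the group inside it still captures text.

```python
import re

# A lookahead matches at a position without consuming any characters.
m = re.search(r'(?=(\w+ \S+))', "My name is")

overall = m.group(0)   # text consumed by the whole match
captured = m.group(1)  # text captured by the group inside the lookahead
span = m.span()        # start and end offsets of the overall match

print(repr(overall), repr(captured), span)
```

The overall match starts and ends at the same offset, which is why the regex engine is free to try again from the very next character.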
findall normally returns the text captured by the groups. If the pattern has no capturing groups, it returns the whole matches instead. In our case it returns the characters captured by group 1. Because a normal match consumes the text it matches, the next search starts after it and matches cannot overlap; to get overlapping matches, you need to put the pattern inside a lookahead.
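To see why the lookahead and the group are both needed, the following sketch compares three variants of the same pattern on the string from the answer. The variable names are illustrative, and the behaviour described for empty matches assumes a reasonably recent Python 3.

```python
import re

s = "My name is really nice. This is so awesome."

# Without a lookahead, each match consumes both words, so pairs never overlap.
plain = re.findall(r'\b\w+\b \S+', s)

# With the pattern wrapped in a lookahead, nothing is consumed, so the
# next attempt starts one character later and overlapping pairs are found.
overlapping = re.findall(r'(?=(\b\w+\b \S+))', s)

# Without the capturing group, findall returns the (empty) overall matches.
no_group = re.findall(r'(?=\b\w+\b \S+)', s)

print(plain)
print(overlapping)
print(no_group)
```

The first list skips every second pair, the second is the overlapping result shown in the answer, and the third contains only empty strings, since a lookahead on its own matches no text.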