
Untitled

By: a guest on May 4th, 2012
Context dependent split of a string in Python

# input string
s = '2-Methyl-3-phythyl-1,4-naphthochinon,Vitamin, K1,Antihemorrhagic vitamin'

# desired result of the split
splitS = ['2-Methyl-3-phythyl-1,4-naphthochinon', 'Vitamin, K1', 'Antihemorrhagic vitamin']

>>> import re
>>> s = '2-Methyl-3-phythyl-1,4-naphthochinon,Vitamin, K1,Antihemorrhagic vitamin'
>>> pat = re.compile(r"([^\d\s],[^\d\s])|([^\s],[^\d\s])|([^\d\s],[^\s])")
>>> re.split(pat, s)
['2-Methyl-3-phythyl-1,4-naphthochino', 'n,V', None, None, 'itamin, K', None, '1,A', None, 'ntihemorrhagic vitamin']
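The `None` entries appear because `re.split` also returns the text of every capturing group in the pattern, and a group that did not take part in a match comes back as `None`. As a minimal sketch (the name `consuming` is only illustrative), the same alternatives with non-capturing groups get rid of the `None`s, but each match still swallows the character on either side of the comma:

```python
import re

s = '2-Methyl-3-phythyl-1,4-naphthochinon,Vitamin, K1,Antihemorrhagic vitamin'

# Same three alternatives as the attempt above, but with (?:...) groups,
# so re.split no longer inserts group text (or None) into the result.
consuming = re.compile(r'(?:[^\d\s],[^\d\s])|(?:[^\s],[^\d\s])|(?:[^\d\s],[^\s])')

parts = re.split(consuming, s)
print(parts)
# ['2-Methyl-3-phythyl-1,4-naphthochino', 'itamin, K', 'ntihemorrhagic vitamin']
```

The neighbouring letters are lost because they are part of the match itself. Lookbehind and lookahead assertions test the neighbours without consuming them.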

(?<!\d),(?! )|(?<=\d),(?![\d ])

>>> re.split(r'(?<!\d),(?! )|(?<=\d),(?![\d ])', s)
['2-Methyl-3-phythyl-1,4-naphthochinon', 'Vitamin, K1', 'Antihemorrhagic vitamin']

(?<!\d),    # match a comma that is not preceded by a digit...
(?! )       # ... as long as it is not followed by a space
|           # OR
(?<=\d),    # match a comma that is preceded by a digit...
(?![\d ])   # ... as long as it is not followed by a digit or a space
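The commented breakdown can be kept in real code with `re.VERBOSE`. This is a sketch, and the name `splitter` is only illustrative. One assumption to watch: in verbose mode, whitespace outside a character class is ignored, so the literal space in `(?! )` has to be written as `[ ]` or it silently disappears from the pattern.

```python
import re

s = '2-Methyl-3-phythyl-1,4-naphthochinon,Vitamin, K1,Antihemorrhagic vitamin'

splitter = re.compile(r"""
    (?<!\d),    # a comma that is not preceded by a digit...
    (?![ ])     # ... as long as it is not followed by a space ([ ] keeps the space significant)
    |           # OR
    (?<=\d),    # a comma that is preceded by a digit...
    (?![\d ])   # ... as long as it is not followed by a digit or a space
""", re.VERBOSE)

parts = splitter.split(s)
print(parts)
```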

(?<!\d),(?! )|,(?![\d ])
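The shorter pattern drops the `(?<=\d)` lookbehind. The second alternative is only tried when the first one fails, and it already refuses a comma followed by a space, so the only commas that reach it and pass are the ones preceded by a digit. A quick sanity check, with illustrative names `full` and `short`, compares both patterns over every small context around a comma:

```python
import re
from itertools import product

s = '2-Methyl-3-phythyl-1,4-naphthochinon,Vitamin, K1,Antihemorrhagic vitamin'

full = re.compile(r'(?<!\d),(?! )|(?<=\d),(?![\d ])')
short = re.compile(r'(?<!\d),(?! )|,(?![\d ])')

# Every combination of "nothing / letter / digit / space" before and after a comma.
contexts = ['', 'a', '1', ' ']
cases = [before + ',' + after for before, after in product(contexts, repeat=2)]

# Inputs on which the two patterns split differently (expected to be empty).
mismatches = [t for t in cases if full.split(t) != short.split(t)]
print(mismatches)
print(short.split(s))
```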

import re

s = '2-Methyl-3-phythyl-1,4-naphthochinon,Vitamin, K1,Antihemorrhagic vitamin'

# index of every comma in the string
all_commas = [match.start() for match in re.finditer(r',', s)]
# index of each comma that must not split: between two digits, or followed by whitespace
special_commas = [match.start()+1 for match in re.finditer(r'\d,\d|.,\s', s)]

split_commas = set(all_commas) - set(special_commas)

# slice the string between consecutive split positions
splitS = []
start = -1
for end in sorted(split_commas) + [None]:
    splitS.append(s[start+1:end])
    start = end

>>> splitS
['2-Methyl-3-phythyl-1,4-naphthochinon', 'Vitamin, K1', 'Antihemorrhagic vitamin']
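One edge case of the index-based approach: `re.finditer` returns non-overlapping matches, so in a run such as `'1,2,3'` the pattern `\d,\d` consumes the `2`, and the second comma is then treated as a split point. A possible variant, sketched here as a hypothetical helper `split_names`, moves the trailing character into a lookahead so it stays available for the next match:

```python
import re

def split_names(text):
    """Split on commas, except those between two digits or followed by whitespace."""
    all_commas = {m.start() for m in re.finditer(r',', text)}
    # The lookaheads leave the character after the comma unconsumed, so it can
    # still act as the leading character of the next protected comma.
    protected = {m.end() - 1 for m in re.finditer(r'\d,(?=\d)|.,(?=\s)', text)}

    parts, start = [], 0
    for end in sorted(all_commas - protected):
        parts.append(text[start:end])
        start = end + 1
    parts.append(text[start:])
    return parts

s = '2-Methyl-3-phythyl-1,4-naphthochinon,Vitamin, K1,Antihemorrhagic vitamin'
print(split_names(s))
print(split_names('1,2,3'))
```

On the sample string this gives the same three names as before, and `'1,2,3'` stays in one piece.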