Untitled

a guest
Mar 21st, 2018
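# Column indices of the ten tab-separated CoNLL-U fields.
ID,FORM,LEMMA,UPOS,XPOS,FEATS,HEAD,DEPREL,DEPS,MISC=range(10)

def read_conll(inp,max_sent=0,drop_tokens=True):
    """Yield (sent, comments) pairs from an iterable of CoNLL-U lines.

    sent is a list of token rows (each a list of column strings) and
    comments is the list of "#" lines seen before the sentence.
    max_sent > 0 stops after that many sentences; drop_tokens skips
    multiword-token range rows whose ID contains "-" (e.g. "1-2").
    """
    comments=[]
    sent=[]
    yielded=0
    for line in inp:
        line=line.strip()
        if line.startswith("#"):
            comments.append(line)
        elif not line:
            # A blank line ends the current sentence.
            if sent:
                yield sent,comments
                yielded+=1
                if max_sent>0 and yielded==max_sent:
                    break
                sent,comments=[],[]
        else:
            cols=line.split("\t")
            if drop_tokens and "-" in cols[ID]:
                continue
            sent.append(cols)
    else:
        # Input ended without a break: flush a last sentence that had
        # no trailing blank line.
        if sent:
            yield sent,comments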
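
A minimal sketch of how the yielded pairs might be consumed, assuming CoNLL-U input with ten tab-separated columns per token row. The sample sentence and the helper names `upos_counts` and `root_form` are made up for illustration and are not part of the paste. The rows are built by hand in the same shape `read_conll` yields (a list of column lists plus a list of comment lines) so that the block runs on its own.

```python
from collections import Counter

# Column indices, repeated from the paste so this block is self-contained.
ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC = range(10)

# Hypothetical sample: one (sent, comments) pair shaped like read_conll output.
SAMPLE_TEXT = (
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_\n"
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_"
)
sample_sent = [line.split("\t") for line in SAMPLE_TEXT.split("\n")]
sample_comments = ["# text = Dogs bark."]


def upos_counts(sentences):
    """Count UPOS tags over an iterable of (sent, comments) pairs."""
    counts = Counter()
    for sent, _comments in sentences:
        counts.update(cols[UPOS] for cols in sent)
    return counts


def root_form(sent):
    """Return the FORM of the first token whose HEAD is "0", or None."""
    for cols in sent:
        # Columns are kept as strings, so HEAD is compared as text.
        if cols[HEAD] == "0":
            return cols[FORM]
    return None


counts = upos_counts([(sample_sent, sample_comments)])
print(counts["NOUN"], counts["VERB"], root_form(sample_sent))
```

On a real file the same helpers can be fed straight from the generator, for example `upos_counts(read_conll(f))` with `f` an open file handle. Note that every column, including ID and HEAD, stays a string; convert with `int()` where numeric values are needed.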