Untitled

a guest
Nov 18th, 2017
Python code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
from nltk.tokenize import sent_tokenize  # requires NLTK's Punkt tokenizer data

response_pl = urlopen("http://www.staff.amu.edu.pl/~rjawor/index.php")
response_en = urlopen("http://www.staff.amu.edu.pl/~rjawor/index_en.php")

page_pl = BeautifulSoup(response_pl, 'html.parser')
page_en = BeautifulSoup(response_en, 'html.parser')

# Collect every non-empty text fragment from each page.
text_pl = list(page_pl.stripped_strings)
text_en = list(page_en.stripped_strings)

# sent_tokenize returns a list of sentences for each fragment,
# so extend (not append) to keep one flat list of sentences.
sentences_pl = []
for s in text_pl:
    sentences_pl.extend(sent_tokenize(s, language='polish'))

sentences_en = []
for s in text_en:
    sentences_en.extend(sent_tokenize(s, language='english'))

# Write one sentence per line; every sentence is kept, not only the
# first one of each fragment.
with open('sentences_pl.txt', 'w', encoding='utf-8') as f_pl:
    for s in sentences_pl:
        f_pl.write(s + "\n")

with open('sentences_en.txt', 'w', encoding='utf-8') as f_en:
    for s in sentences_en:
        f_en.write(s + "\n")

hunalign:

Command run from the hunalign root folder, after running make and moving the sentence files generated by the Python script into the examples folder:

src/hunalign/hunalign data/english.dic examples/sentences_pl.txt examples/sentences_en.txt -text > ./align.txt
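
The same invocation could also be scripted from Python, for example to rerun the alignment after regenerating the sentence files. The sketch below is only one possible way to wrap it: the helper names build_hunalign_command and run_hunalign are hypothetical, and the paths are simply the ones from the command above, so it assumes the script is started from the hunalign root folder.

```python
import subprocess


def build_hunalign_command(dictionary, source, target,
                           binary="src/hunalign/hunalign"):
    """Return the argument list for the hunalign call shown above."""
    # -text asks for text output instead of the default numeric ladder format.
    return [binary, dictionary, source, target, "-text"]


def run_hunalign(command, output_path):
    """Run hunalign and write its standard output to output_path."""
    with open(output_path, "w", encoding="utf-8") as out:
        # check=True raises CalledProcessError if hunalign exits with an error.
        subprocess.run(command, stdout=out, check=True)


cmd = build_hunalign_command(
    "data/english.dic",
    "examples/sentences_pl.txt",
    "examples/sentences_en.txt",
)
print(" ".join(cmd))
```

Calling run_hunalign(cmd, "align.txt") afterwards would play the role of the shell redirect to ./align.txt.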

The align.txt file: https://pastebin.com/c5w9VA9H
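
To inspect the alignment programmatically, the file can be read back in Python. The following is a minimal sketch, assuming the -text output holds one aligned pair per line in three tab-separated columns (source segment, target segment, confidence score). The function name read_alignment, the min_score threshold, and the sample lines are illustrative and are not taken from the actual align.txt.

```python
import io


def read_alignment(lines, min_score=0.0):
    """Yield (source, target, score) for pairs scoring at least min_score."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not match the assumed layout
        source, target, score_text = parts
        try:
            score = float(score_text)
        except ValueError:
            continue  # skip lines whose last column is not a number
        if score >= min_score:
            yield source, target, score


# Made-up sample in the assumed format (not real hunalign output).
sample = io.StringIO(
    "Dzień dobry.\tGood morning.\t0.9\n"
    "Kontakt\tContact\t0.2\n"
    "broken line\n"
)
pairs = list(read_alignment(sample, min_score=0.5))
print(pairs)
```

With the real file, open("align.txt", encoding="utf-8") can be passed in place of the sample; the threshold is a free choice and would need tuning against the actual scores.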