# Results
In the following section I present the results of the previously described procedures.
First I will show some statistics on my generated corpus and compare its characteristics to the source material.
The next part contains the results of the model training and some exemplary detailed metrics.
I will then rank the training results and compare the runs with regard to the effectiveness of side constraints.
Lastly I will compare the impact of side constraints between the German-English and the Czech-English language pair.

## Data Selection and Preparation
### Selection
I chose to include the three domains FINANCE, LAW and MEDICAL.
For the FINANCE domain I used version 1 of the website and documentation of the European Central Bank \TODO{reference j tiedemann}. This corpus will be referred to as ECB.
For the LAW domain I used version 7 of a parallel corpus extracted from the European Parliament web site \TODO{reference philipp koehn}. This corpus will be referred to as Europarl.
For the last domain (MEDICAL) I used version 3 of a corpus made of PDF documents from the European Medicines Agency, which will be referred to as EMEA.

All corpora were made available through the OPUS project \TODO{reference} and were downloaded in pairs of English-German and English-Czech.

### Slicing
All corpora needed to be reduced and merged in equal parts.
For the preprocessing steps I used my own framework \TODO{link for framework} with the following configuration.

The corpora were split into smaller shards. Each shard represents one unit in the context of the document.
The boundaries of the units were found by analyzing the German-English text and applying the closest possible splits to both language pairs.

The shards were then merged into a training and a validation set, so that each corpus contributed roughly the same number of example sentences to the new corpus.

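To make the merge concrete, here is a minimal sketch of how such an equal-parts merge could look; the function name, the shard representation and the target size are illustrative assumptions, not the actual framework code.

```python
# Hypothetical sketch of the equal-parts merge, not the actual framework
# code. Shards are lists of sentences, grouped by domain.
import random

def merge_equal(shards_by_domain, target_per_domain, seed=42):
    """Merge whole shards so that every domain contributes roughly
    target_per_domain sentences to the resulting corpus."""
    rng = random.Random(seed)
    merged = []
    for domain, shards in shards_by_domain.items():
        shards = shards[:]            # do not mutate the caller's lists
        rng.shuffle(shards)           # pick shards in random order
        taken = 0
        for shard in shards:
            if taken >= target_per_domain:
                break
            merged.extend(shard)      # keep shards intact (document units)
            taken += len(shard)
    rng.shuffle(merged)               # mix the domains in the final corpus
    return merged
```
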
Figure \TODO{include corpus_stats image} compares the number of words and the word lengths for each data set.

### Preparation
#### BPE
I used Byte Pair Encoding (BPE) to equalize the number of tokens per text.
For that I ran the implementation of \TODO{sennrich bpe} and reduced the text to a vocabulary of 32,000 \TODO{double check} tokens as suggested by \TODO{Kobus reference}.

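As a rough illustration, learning and applying the merges could look like the following sketch, assuming the subword-nmt package; the file names are placeholders.

```python
# Minimal sketch using the subword-nmt package (pip install subword-nmt);
# file names are placeholders, not the actual corpus files.
from subword_nmt.learn_bpe import learn_bpe
from subword_nmt.apply_bpe import BPE

# Learn 32,000 merge operations from the training text.
with open("train.de", encoding="utf-8") as infile, \
        open("bpe.codes", "w", encoding="utf-8") as outfile:
    learn_bpe(infile, outfile, num_symbols=32000)

# Apply the learned merges to every corpus file.
with open("bpe.codes", encoding="utf-8") as codes:
    bpe = BPE(codes)
with open("train.de", encoding="utf-8") as src, \
        open("train.bpe.de", "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(bpe.process_line(line))
```
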
#### Prefix Constraints
After the BPE processing I prefixed one copy of the data set with domain tags on the sentence level.
I used @EMEA@, @Europarl@ and @ECB@ as tokens.
Since the format of these tokens differs from the format produced by the BPE algorithm, all tags can be considered unique tokens.

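A minimal sketch of this tagging step, assuming the BPE-processed files from above; the function and file names are illustrative.

```python
# Hypothetical sketch of the domain tagging; file names are placeholders.
DOMAIN_TAGS = {"emea": "@EMEA@", "europarl": "@Europarl@", "ecb": "@ECB@"}

def tag_file(path_in, path_out, domain):
    """Prefix every sentence of a BPE-processed file with its domain tag."""
    tag = DOMAIN_TAGS[domain]
    with open(path_in, encoding="utf-8") as src, \
            open(path_out, "w", encoding="utf-8") as dst:
        for line in src:
            # The @...@ format never appears in the BPE output, so the
            # tag stays a single unique token.
            dst.write(f"{tag} {line}")

tag_file("train.bpe.de", "train.tagged.de", "emea")
```
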
After the generation I had the following four corpora: unmodified data in DE-EN and CS-EN, and tagged data in DE-EN and CS-EN.
In the following I will refer to them as Clean-de-en, Clean-cs-en, Tagged-de-en and Tagged-cs-en.

## Training and Model Selection
### Training
I trained a bidirectional LSTM with the configuration shown in \TODO{add config}.
I ran a hyperparameter optimization over the optimizer (sgd, adam, adadelta), the learning rate (1, 0.1, 0.001, 0.0001) and the beginning of the learning rate decay (off, 5 epochs, 10 epochs).
The OpenNMT framework was used to train the models and to translate the scoring test sets.
In the OpenNMT context one epoch over the corpus translates to roughly 2,000 train steps.
All models were trained for 18 epochs.
I used a small MQTT scheduler to coordinate the runs on a mixture of NVIDIA GTX 980, 1080 and 1080Ti cards; a sketch of the dispatch follows after this paragraph.
One run takes between 2 and 3.5 hours depending on the GPU.
Most of the models showed a decent training curve \TODO{image good curve}; this example run plateaued after about 5 epochs.
Some hyperparameter configurations learned comparably slowly \TODO{image bad curve} and some configurations like \TODO{show super mad config in table} produced unusable results.

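The search grid and the dispatch over MQTT could look roughly like the following sketch; the broker address, the topic name and the job format are assumptions on my part, not the actual scheduler.

```python
# Hypothetical sketch of the run dispatch: enumerate the search grid
# and publish each configuration as a job; workers on the GPU machines
# subscribe to the topic and pull their next run from it.
import json
from itertools import product
import paho.mqtt.client as mqtt

OPTIMIZERS = ["sgd", "adam", "adadelta"]
LEARNING_RATES = [1.0, 0.1, 0.001, 0.0001]
DECAY_STARTS = [None, 5, 10]   # epoch where the learning rate decay begins

client = mqtt.Client()
client.connect("localhost", 1883)   # broker address is a placeholder

for optim, lr, decay in product(OPTIMIZERS, LEARNING_RATES, DECAY_STARTS):
    job = {"optim": optim, "learning_rate": lr,
           "start_decay_at": decay, "epochs": 18}
    client.publish("nmt/jobs", json.dumps(job), qos=1)
```
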
All models were trained multiple times to account for the random initialization of the start vectors.

### Selection of Hyperparameters
The resulting models were used to translate 1,000 examples from the validation data.
The generated translations were stripped of the BPE segmentation and scored using BLEU \TODO{fancy optim}.
From the ranking \TODO{include table} the best model was chosen for each corpus.

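For illustration, the stripping and scoring could be done as in the following sketch, assuming the common @@-style BPE markers and the sacrebleu package; file names are placeholders.

```python
# Hypothetical sketch of the scoring step; assumes @@-style BPE markers.
import re
import sacrebleu

def strip_bpe(line):
    """Undo the subword segmentation, e.g. 'Ban@@ king' -> 'Banking'."""
    return re.sub(r"(@@ )|(@@ ?$)", "", line.strip())

with open("hypotheses.bpe.en", encoding="utf-8") as f:
    hypotheses = [strip_bpe(line) for line in f]
with open("references.en", encoding="utf-8") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```
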
## Scoring and Comparison
Figure \TODO{side constraint comparison} shows the performance of the best model on the domain test sets, calculated with BLEU.
As expected, the related language pair achieved better scores across all domains.

\TODO{write something about the different domains}

However, depending on the scoring metric there were large differences in the actual performance change between the pairs.
The models that were trained with prefix constraints achieved a higher training and validation accuracy, however only in the CS-EN pair.
In the DE-EN pair there was no noticeable difference in the training statistics.

For more content-focused scores like BLEU or ROUGE \TODO{add table with scores} the prefix constraints impacted the score slightly negatively.

While the difference is fairly small for the related language pair, it is notably larger for the distant language pair.