# -*- coding: utf-8 -*-
"""
Created on Mon Jan 25 16:54:35 2021
@author: Ntsoa
"""


def identifier(k, tr_gr_text, gr_text, it_text):
    """Identifies the passages in common"""
    raw_greek_expression = []
    it_expression = ''
    gr_expression = ''
    # step 0.9 complete, now onto step 1
    # gr_text and tr_gr_text are treated as parallel lists: paragraph i of
    # one corresponds to paragraph i of the other.
    for para_index, tr_gr_para in enumerate(tr_gr_text):
        # Every Greek paragraph except the current one (a new list; the
        # string elements are shared, which is fine since strings are
        # immutable). enumerate() gives the right position even when two
        # paragraphs are identical.
        other_paragraphs = gr_text[:para_index] + gr_text[para_index + 1:]
        # now we build a superstring made of all other paragraphs
        o_p = ' '.join(other_paragraphs)
        # we have to pick up k words, starting from each Greek word every
        # time: range() restarts at 0 for each paragraph, advances one word
        # per pass and includes the last full window of k words
        for wordcount in range(len(tr_gr_para) - k + 1):
            raw_greek_expression = tr_gr_para[wordcount : wordcount + k]
            # do not forget that we're looking for k+ words, not just k, so
            # we'll have to keep incrementing it IF the conditions are right
            """
            Step 1: We build an expression of k words of the Greek paragraph
            and check whether it is in another paragraph, in which case we
            abort.
            Step 2: We pick the Italian translations and check whether they
            are in the Italian text. If they aren't, abort.
            Step 3: If they are, see whether the translation of the next word
            is in there too, and in that case extend the expression.
            Otherwise, keep only the previous expression.
            Good, now on to the substeps.
            """
            '''
            Step 0.9: How do we check whether the expression is in another
            paragraph? Obviously we'll have to make use of tr_gr_text.
            We will have to go through each paragraph: do we build a
            superstring made up of only the other paragraphs? Is there enough
            time for that? Regardless, it does seem like the most efficient
            solution. The text made only of Greek words (those that can be
            translated) should be pre-built, so we don't have to build it on
            every run here.
            In other words, it will be an output of gr_para_div.
            Also, disregard the punctuation when making that text.
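            A minimal sketch of the Step 0.9 check follows. It assumes the
            paragraphs are plain strings with punctuation already removed, as
            gr_para_div is meant to produce. The helper names
            other_paragraph_texts and appears_elsewhere and the '|' separator
            are hypothetical.
            Two details matter here. Padding with spaces makes `in` compare
            whole words only. The separator keeps an expression from matching
            across the end of one paragraph and the start of the next, which
            joining with a single space would allow.

```python
def other_paragraph_texts(paragraphs, separator='|'):
    """Build once, for each paragraph index i, one string made of every
    *other* paragraph, so nothing is rebuilt inside the main loop.

    Paragraphs are joined with a separator token that is assumed never to be
    a word (punctuation has already been removed), so a match cannot span
    two paragraphs. The string is padded with spaces so that matches are
    tested on whole words only.
    """
    normalized = [' '.join(p.split()) for p in paragraphs]
    joiner = ' ' + separator + ' '
    result = []
    for i in range(len(normalized)):
        others = normalized[:i] + normalized[i + 1:]
        result.append(' ' + joiner.join(others) + ' ')
    return result


def appears_elsewhere(expression, padded_others):
    """True if the word list `expression` occurs as whole, consecutive words
    in one of the strings built by other_paragraph_texts()."""
    needle = ' ' + ' '.join(expression) + ' '
    return needle in padded_others


# Usage with placeholder words.
others = other_paragraph_texts(['a b c d', 'b c x', 'y a b'])
print(appears_elsewhere(['b', 'c'], others[0]))  # True
print(appears_elsewhere(['x', 'y'], others[0]))  # False: would cross paragraphs
```

            A plain substring test would wrongly find 'ab c' inside 'xab cd';
            the padded version does not.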
            Step 1.9: We now have an expression of k words. We will have to
            build every possible translation of it, won't we? Actually, no:
            the moment we find a suitable translation we stop, right? Things
            are never that easy, are they? What a pain. Well, we'll go through
            each possibility, then.
            Do keep in mind that there won't actually be that many iterations
            in the end, since most of them will be refuted right away for not
            being present in the Italian text.
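            A minimal sketch of both ideas follows. It compares the full
            enumeration (itertools.product over each word's candidates) with a
            word-by-word build. The word-by-word build drops a partial phrase
            as soon as it no longer occurs in the Italian text, which is
            exactly why "most of them will be refuted right away".
            The dictionary format (Greek word to list of Italian candidates),
            the transliterated keys and the function names are hypothetical.
            The Italian text is assumed lowercase and punctuation-free.

```python
from itertools import product


def all_translations(expression, dictionary):
    """Every combination of candidate translations, one per Greek word.
    The count is the product of the per-word candidate counts."""
    options = [dictionary.get(word, []) for word in expression]
    return [' '.join(combo) for combo in product(*options)]


def pruned_translations(expression, dictionary, italian_text):
    """Build translations word by word, dropping a partial phrase as soon as
    it no longer occurs (as whole words) in the Italian text."""
    padded_text = ' ' + ' '.join(italian_text.split()) + ' '
    prefixes = ['']
    for word in expression:
        extended = []
        for prefix in prefixes:
            for candidate in dictionary.get(word, []):
                phrase = (prefix + ' ' + candidate).strip()
                if ' ' + phrase + ' ' in padded_text:
                    extended.append(phrase)
        prefixes = extended
        if not prefixes:  # every branch was refuted: abort early
            break
    return prefixes


# Hypothetical dictionary with transliterated Greek keys.
dictionary = {
    'athene': ['atena'],
    'kore': ['figlia', 'fanciulla'],
    'dios': ['di zeus', 'divina'],
}
expression = ['athene', 'kore', 'dios']
italian = 'disse atena figlia di zeus agli dei'
print(len(all_translations(expression, dictionary)))          # 4
print(pruned_translations(expression, dictionary, italian))  # ['atena figlia di zeus']
```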
            About that: it's possibly the most difficult part of this
            homework, so how do we plan on dealing with it?
            The difficult part is where we 1) find the parts that match,
            ignoring punctuation, and 2) report those same parts, but with
            their punctuation.
            Instinctively, I want to say that this will be easier if we have
            the Italian text divided into sublists: each paragraph is a list,
            and each of these lists is made up of words as elements.
            e.g.: [['io', 'vado'], ['ciao,', 'sono', 'andry.']]
            If we have that, things will be much easier.
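            A minimal sketch of that layout follows. It assumes it_text
            arrives as a list of paragraph strings; the notes do not say.
            The original words, punctuation included, stay in it_paras for
            reporting. A parallel normalized view with the same indices is
            used only for comparison, so position i in one always matches
            position i in the other.
            split_paragraphs, normalize and the extra quote marks in
            PUNCTUATION are hypothetical.

```python
import string

# Characters stripped from both ends of a word before comparing. The extra
# Italian quotation marks are an assumption about how the text is typed.
PUNCTUATION = string.punctuation + '«»“”'


def split_paragraphs(it_text):
    """List of paragraph strings -> list of word lists, punctuation kept."""
    return [paragraph.split() for paragraph in it_text]


def normalize(word):
    """Comparison form of a word: lowercase, no leading/trailing punctuation."""
    return word.strip(PUNCTUATION).lower()


it_paras = split_paragraphs(['io vado', 'ciao, sono andry.'])
print(it_paras)  # [['io', 'vado'], ['ciao,', 'sono', 'andry.']]
plain = [[normalize(w) for w in para] for para in it_paras]
print(plain)     # [['io', 'vado'], ['ciao', 'sono', 'andry']]
```

            str.strip only removes characters at the ends of a word, so
            internal apostrophes (as in dell'uomo) are kept.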
            Say the translated Greek text gives us: 'atena, figlia di zeus.'
            If 'atena' is in the Italian paragraph, we use
            it_para.index('atena') to find where it starts and check whether
            the next elements match 'figlia di zeus'; if an element carries
            punctuation, we disregard it temporarily.
            Seems like a good deal; we'll go with it.
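            A minimal sketch of this matching step follows. It assumes each
            Italian paragraph is a list of words, as in the e.g. above, and
            that the phrase to look for is a plain string.
            find_with_punctuation and normalize are hypothetical names.
            It departs slightly from the plan above. it_para.index('atena')
            raises ValueError when the element is actually 'atena,' (with
            punctuation attached), and it only returns the first occurrence.
            The sketch therefore compares normalized copies and tries every
            start position, then reports the original words.

```python
import string

PUNCTUATION = string.punctuation + '«»“”'


def normalize(word):
    """Lowercase a word and strip leading/trailing punctuation."""
    return word.strip(PUNCTUATION).lower()


def find_with_punctuation(phrase, it_para):
    """Look for `phrase` in the word list `it_para`, ignoring punctuation and
    case. Return the matching words as they appear in the text (punctuation
    included), or None if the phrase is not there."""
    target = [normalize(w) for w in phrase.split()]
    plain = [normalize(w) for w in it_para]
    n = len(target)
    if n == 0:
        return None
    # Every start position is tried, so a later occurrence is still found
    # when an earlier one only matches partially.
    for start in range(len(plain) - n + 1):
        if plain[start:start + n] == target:
            return ' '.join(it_para[start:start + n])
    return None


it_para = ['Disse', 'Atena,', 'figlia', 'di', 'Zeus.', 'Poi', 'tacque.']
print(find_with_punctuation('atena, figlia di zeus.', it_para))
# Atena, figlia di Zeus.
print(find_with_punctuation('atena figlia di era', it_para))  # None
```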
            '''