Andry41

Identifier function

Jan 25th, 2021
# -*- coding: utf-8 -*-
"""
Created on Mon Jan 25 16:54:35 2021

@author: Ntsoa
"""

def identifier(k, tr_gr_text, gr_text, it_text):
    """Identify the passages in common."""

    raw_greek_expression = []
    it_expression = ''
    gr_expression = ''

    # Step 0.9 complete, now on to step 1.
    # enumerate() gives the paragraph position directly; list.index() returns
    # the first match, which is wrong when a paragraph is repeated.
    for para_index, tr_gr_para in enumerate(tr_gr_text):

        # First we make a (shallow) copy, then drop the current paragraph.
        other_paragraphs = list(gr_text)
        other_paragraphs.pop(para_index)
        # Now we build a superstring made of all other paragraphs.
        o_p = ' '.join(other_paragraphs).lstrip()

        # We have to pick up k words, starting from each Greek word in turn.
        # wordcount restarts at 0 for every paragraph and advances on each
        # pass, so the loop terminates; <= keeps the final k-word window.
        wordcount = 0
        while wordcount <= len(tr_gr_para) - k:
            raw_greek_expression = tr_gr_para[wordcount : wordcount + k]
            # Do not forget that we're looking for k+ words, not just k, so
            # we'll have to keep extending it IF the conditions are right.
            wordcount += 1

    """
    Step 1: We build an expression of k words from the Greek paragraph and
    check whether it is in another paragraph, in which case we abort mission.
    Step 2: We pick the Italian translations and check whether they are in
    the Italian text. If they aren't, abort.
    Step 3: If they are, see if the translation of the next word is in too,
    and in that case compile an expression. Otherwise, keep only the previous
    expression.

    Good, now on to the substeps.
    """

    '''
    Step 0.9: How do we check whether the expression is in another paragraph?
    Obviously we'll have to make use of tr_gr_text.
    We will have to go through each paragraph. Do we make a superstring made
    up of only the other paragraphs? Is there enough time for that?
    Regardless, it does seem like the most efficient solution. The text made
    only of Greek words (that can be translated) should be pre-built, so we
    don't have to build it at every run here.
    In other words, it will be an output of gr_para_div.
    Also, do disregard the punctuation when making that text.

    Step 1.9: We now have an expression of k words. We will have to build
    each possible translation of it, will we not? Actually, no, the moment we
    find a suitable translation we stop, right? Things are never that easy,
    are they? What a pain, well, we'll go through each possibility then.
    Do keep in mind that there actually won't be that many iterations in the
    end, since most of them will get refuted right away for not being present
    in the Italian text.
    About that, it's possibly the most difficult part of this homework, so
    how do you plan on dealing with it?
    The difficult part being where we 1) find the parts that match,
    punctuation notwithstanding, and 2) have to report those same parts, but
    with punctuation.
    Instinctively, I want to say that this will be easier if we have the
    Italian text divided into sublists, as in each paragraph is a list,
    and each of these lists is made up of words as elements.
    e.g.: [['io', 'vado'], ['ciao,', 'sono', 'andry.']]
    If we have that, things will be much easier.
    Say we have, from the translated Greek text: 'atena, figlia di zeus.'
    if 'atena' in italian paragraph:
        with it_para.index('atena'), you search if the next elements match
        'figlia di zeus', and if an element has punctuation, we disregard it
        temporarily.
    Seems like a good deal, we'll go with it.
    '''
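
Steps 0.9 and 1 from the notes could be sketched roughly as follows. This is only an illustration, assuming every paragraph is already a list of punctuation-free words. The helpers `k_word_windows` and `appears_elsewhere` and the toy data are hypothetical and not part of the original paste. The sketch also compares word lists directly instead of building the superstring, which avoids matches that start or end in the middle of a word.

```python
def k_word_windows(words, k):
    """Yield (start, window) for every run of k consecutive words."""
    for start in range(len(words) - k + 1):
        yield start, words[start:start + k]


def appears_elsewhere(window, paragraphs, skip_index):
    """Return True if `window` occurs as a word run in any other paragraph."""
    k = len(window)
    for index, para in enumerate(paragraphs):
        if index == skip_index:
            continue  # never compare a paragraph with itself
        if any(para[i:i + k] == window for i in range(len(para) - k + 1)):
            return True
    return False


# Toy data, made up for illustration: each paragraph is a list of words.
paragraphs = [
    ['a', 'b', 'c', 'd'],
    ['x', 'b', 'c', 'y'],
]
unique = [window for _, window in k_word_windows(paragraphs[0], 2)
          if not appears_elsewhere(window, paragraphs, 0)]
print(unique)  # [['a', 'b'], ['c', 'd']]
```

The window `['b', 'c']` is dropped because it also occurs in the second paragraph, which is the "abort mission" case of step 1.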
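
The matching idea from step 1.9 (compare without punctuation, report with punctuation) might look like the sketch below. It assumes an Italian paragraph stored as a list of words, as in the notes. `strip_punct` and `find_phrase` are hypothetical names, and `string.punctuation` only covers ASCII punctuation, so typographic quotes would need extra handling. Instead of `it_para.index('atena')`, which only finds the first occurrence, the sketch tries every start position.

```python
import string


def strip_punct(word):
    """Drop leading and trailing ASCII punctuation from a word."""
    return word.strip(string.punctuation)


def find_phrase(it_para, phrase):
    """Return the matching slice of `it_para` (punctuation kept), or None.

    `it_para` is a list of words as they appear in the text; `phrase` is a
    list of words to look for. The comparison ignores punctuation.
    """
    target = [strip_punct(word) for word in phrase]
    if not target:
        return None
    cleaned = [strip_punct(word) for word in it_para]
    n = len(target)
    for start in range(len(cleaned) - n + 1):
        if cleaned[start:start + n] == target:
            # Report the original words, punctuation included.
            return it_para[start:start + n]
    return None


it_para = ['ciao,', 'sono', 'atena,', 'figlia', 'di', 'zeus.']
match = find_phrase(it_para, ['atena', 'figlia', 'di', 'zeus'])
print(' '.join(match))  # atena, figlia di zeus.
```

Keeping the cleaned list aligned index by index with the original list is what lets the match be found without punctuation and still reported with it.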