Untitled

a guest
Aug 26th, 2024
Python

# s:(str)        - the string to break into ngrams (it is lowercased first),
# w:(int)        - width of each ngram,
# stride:(int)   - how many indexes to advance between ngrams (defaults to w),
# discard:(bool) - whether to discard trailing ngrams that are shorter than the specified width w.
def ngrams(s, w=1, stride=None, discard=False):
  # example usage #000:
  # ngrams("foobarbazk", 3, 3)
  # expected output:
  #   ['foo', 'bar', 'baz', 'k']

  # example usage #001:
  # ngrams("foobarbazk", 3, 2)
  # expected output:
  #   ['foo', 'oba', 'arb', 'baz', 'zk']

  # example usage #002:
  # ngrams("foobarbazk", 3, 3, True)
  # expected output:
  #   ['foo', 'bar', 'baz']

  if stride is None:
    stride = w
  if w < 1 or stride < 1:
    # a stride of 0 would never advance, so the loop below would not terminate
    raise ValueError("w and stride must be positive integers")
  results = []
  s = s.lower()
  if len(s) <= w:
    # the whole input fits in one ngram; honor discard when it is too short
    if len(s) < w and discard:
      return []
    return [s]

  i = 0
  while i < len(s):
    result = s[i:i+w]  # Python slices clamp at the end of the string, so this cannot overflow
    if len(result) == w or not discard:
      results.append(result)
    i = i + stride

  return results

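The same windowing can probably be written more compactly by letting `range` generate the start indexes. The block below is only a sketch under that assumption: `ngrams_compact` is a hypothetical name, not part of the original paste, and its handling of edge cases (an empty string gives an empty list, non-positive `w` or `stride` raises `ValueError`) is a choice made here rather than something the paste specifies. It reproduces the three documented examples.

```python
def ngrams_compact(s, w=1, stride=None, discard=False):
    """Hypothetical compact variant; same parameters as ngrams()."""
    if stride is None:
        stride = w
    if w < 1 or stride < 1:
        # Assumption: reject values that would produce empty ngrams
        # or make range() raise on a zero step.
        raise ValueError("w and stride must be positive integers")
    s = s.lower()
    # range() yields the start index of every window; slicing past the
    # end of a str simply returns a shorter piece.
    chunks = [s[i:i + w] for i in range(0, len(s), stride)]
    if discard:
        # Keep only full-width ngrams.
        chunks = [c for c in chunks if len(c) == w]
    return chunks


print(ngrams_compact("foobarbazk", 3, 3))        # ['foo', 'bar', 'baz', 'k']
print(ngrams_compact("foobarbazk", 3, 2))        # ['foo', 'oba', 'arb', 'baz', 'zk']
print(ngrams_compact("foobarbazk", 3, 3, True))  # ['foo', 'bar', 'baz']
```

Building every slice first and filtering afterwards keeps the windowing and the discard rule separate, at the cost of creating a few short strings that may be thrown away.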