```python
# s:(str)        - the string to break down into ngrams (it is lowercased first)
# w:(int)        - width of each ngram
# stride:(int)   - how many characters to advance between successive ngrams (defaults to w)
# discard:(bool) - whether to discard any trailing ngrams that are shorter than the width w
def ngrams(s, w=1, stride=None, discard=False):
    # example usage #000:
    # ngrams("foobarbazk", 3, 3)
    # expected output:
    # ['foo', 'bar', 'baz', 'k']
    # example usage #001:
    # ngrams("foobarbazk", 3, 2)
    # expected output:
    # ['foo', 'oba', 'arb', 'baz', 'zk']
    # example usage #002:
    # ngrams("foobarbazk", 3, 3, True)
    # expected output:
    # ['foo', 'bar', 'baz']
    if stride is None:
        stride = w
    if w < 1 or stride < 1:
        # a non-positive stride would never advance i and the loop would never end
        raise ValueError("w and stride must be positive integers")
    s = s.lower()
    if w >= len(s):
        # the whole string fits in one ngram; drop it only when it is
        # shorter than w and discard is set
        return [] if (discard and len(s) < w) else [s]
    results = []
    i = 0
    while i < len(s):
        # Python slices stop at len(s), so near the end this just returns a
        # shorter string; in a language without clamped slicing it would overflow
        result = s[i:i + w]
        if len(result) == w or not discard:
            results.append(result)
        i += stride
    return results
```
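The while loop visits the start indexes 0, stride, 2*stride, ... below len(s), which is exactly what `range(0, len(s), stride)` produces. A compact sketch along those lines follows. It assumes the same behaviour as the loop (lowercasing, the early return when w >= len(s), trailing short ngrams kept unless discard is set); the name `ngrams_compact` is hypothetical, and rejecting a non-positive w or stride is an added assumption, not something the original does.

```python
def ngrams_compact(s, w=1, stride=None, discard=False):
    """Hypothetical range-based variant of ngrams(); same arguments."""
    if stride is None:
        stride = w
    if w < 1 or stride < 1:
        # assumption: reject values that would make no progress
        raise ValueError("w and stride must be positive integers")
    s = s.lower()
    if w >= len(s):
        # the whole string is a single (possibly short) ngram
        return [] if (discard and len(s) < w) else [s]
    # one slice per start index 0, stride, 2*stride, ... below len(s)
    grams = [s[i:i + w] for i in range(0, len(s), stride)]
    if discard:
        # keep only full-width ngrams
        grams = [g for g in grams if len(g) == w]
    return grams


if __name__ == "__main__":
    print(ngrams_compact("FooBarBazK", 3, 2))  # ['foo', 'oba', 'arb', 'baz', 'zk']
```

Building every slice first and filtering afterwards keeps the stride logic in one place (the `range` step), at the cost of materialising the short trailing slices even when they are about to be discarded.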