- Sorry, this is the code:
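- from os import listdir
-
- B = 65521  # largest prime below 2**16; presumably the number of hash buckets
- T = {}     # hash table filled by hashFileNgramsIntoDictionary
- # N (the N-gram length), directories, readFile, byteSequenceToNgrams and
- # hashFileNgramsIntoDictionary are assumed to be defined elsewhere.
- for datasetPath in directories:
-     samples = listdir(datasetPath)
-     for file in samples:
-         filePath = datasetPath + "/" + file
-         fileByteSequence = readFile(filePath)
-         fileNgrams = byteSequenceToNgrams(fileByteSequence, N)
-         hashFileNgramsIntoDictionary(fileNgrams, T)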
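- import heapq
-
- K1 = 1000
- # Assuming T maps each hashed N-gram to its count, rank by count:
- # heapq.nlargest(K1, T) alone would return the K1 largest keys instead.
- K1_most_common_Ngrams_Using_Hash_Grams = heapq.nlargest(K1, T, key=T.get)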
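
The helpers used above are not shown in the paste, so here is a minimal, self-contained sketch of how the pipeline might fit together. Everything in it is an assumption made for illustration: the function names, the choice of N = 3, the in-memory byte strings standing in for files, and the bucket scheme (the N-gram read as a big-endian integer, modulo B). It is not the original implementation; it only shows counting into hash buckets and then selecting the most frequent ones with `heapq.nlargest` and a `key`.

```python
import heapq
from collections import Counter

B = 65521  # number of buckets, value taken from the paste
N = 3      # assumed N-gram length (not given in the paste)


def byte_sequence_to_ngrams(data: bytes, n: int):
    """Yield every overlapping n-byte window of data (hypothetical helper)."""
    for i in range(len(data) - n + 1):
        yield data[i:i + n]


def hash_ngrams_into_table(ngrams, table: Counter, buckets: int = B) -> None:
    """Increment the count of the bucket each n-gram hashes to.

    The hash (big-endian integer value modulo buckets) is an assumed,
    deterministic stand-in for whatever the original code uses.
    """
    for gram in ngrams:
        bucket = int.from_bytes(gram, "big") % buckets
        table[bucket] += 1


def top_k_buckets(table, k: int):
    """Return the k bucket indices with the highest counts."""
    return heapq.nlargest(k, table, key=table.get)


# Two in-memory samples stand in for the files read from disk.
T = Counter()
for sample in (b"abcabcabc", b"abcxyz"):
    hash_ngrams_into_table(byte_sequence_to_ngrams(sample, N), T)

top = top_k_buckets(T, 2)
```

Passing `key=table.get` is what makes the selection depend on the counts; without it, `heapq.nlargest` compares the dictionary keys themselves. `Counter.most_common(k)` would give the same ranking together with the counts.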