Jan 20th, 2019
import numpy
import theano


def prepare_data(seqs, labels):
    """
    Create the matrices from the dataset.

    This pads each sequence to the same length: the length of the
    longest sequence (maxlen). This version takes no maxlen argument,
    so sequences are never cut to a shorter maximum length.

    This swaps the axes: the result has shape (maxlen, n_samples).
    """
    # seqs: a list of sentences, each a sequence of integer word indices

    lengths = [len(s) for s in seqs]
    n_samples = len(seqs)
    maxlen = numpy.max(lengths)

    x = numpy.zeros((maxlen, n_samples)).astype('int64')
    x_mask = numpy.ones((maxlen, n_samples)).astype(theano.config.floatX)

    # Copy each sequence into its own column; unused rows stay 0 (padding).
    for idx, s in enumerate(seqs):
        x[:lengths[idx], idx] = s

    # A neat trick for building the mask matrix: 1 where x is non-zero,
    # 0 where x is padding. This assumes index 0 never occurs as a real token.
    x_mask *= (1 - (x == 0))

    return x, x_mask, labels
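
The padding and masking steps above can be tried without Theano. The following is a minimal sketch, not the original function: `prepare_data_np` is a hypothetical name, its `dtype` parameter is an assumption standing in for `theano.config.floatX`, and the mask is built from the sequence lengths instead of the `x == 0` trick, so it stays correct even if index 0 is used as a real token.

```python
import numpy


def prepare_data_np(seqs, labels, dtype="float32"):
    """NumPy-only sketch of prepare_data (hypothetical helper).

    dtype is an assumed stand-in for theano.config.floatX.
    """
    lengths = [len(s) for s in seqs]
    n_samples = len(seqs)
    maxlen = max(lengths)

    # Time-major layout: one column per sequence, zero-padded at the bottom.
    x = numpy.zeros((maxlen, n_samples), dtype="int64")
    x_mask = numpy.zeros((maxlen, n_samples), dtype=dtype)

    for idx, s in enumerate(seqs):
        x[:lengths[idx], idx] = s
        # Mark the real tokens explicitly instead of testing x == 0.
        x_mask[:lengths[idx], idx] = 1.0

    return x, x_mask, labels


# Three toy sequences of lengths 3, 2 and 1 (made-up word indices).
seqs = [[4, 7, 2], [9, 5], [3]]
labels = [1, 0, 1]
x, x_mask, y = prepare_data_np(seqs, labels)

print(x)
# [[4 9 3]
#  [7 5 0]
#  [2 0 0]]
print(x_mask)
# [[1. 1. 1.]
#  [1. 1. 0.]
#  [1. 0. 0.]]
```

For this input, which contains no zero indices, the length-based mask is identical to the one produced by `1 - (x == 0)`. The two differ only when 0 appears inside a real sequence.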