Untitled

a guest
Dec 11th, 2017

from __future__ import print_function

from numpy import array
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical


def create_sequences(tokenizer, max_length, descriptions, photos):
    """Creates sequences of images, input sequences and output words for an image.

    Each description is split into one sample per word after the first:
    X1 is the photo feature, X2 is the padded word sequence so far, and
    y is the one-hot encoded next word.

    X1      X2 (text sequence)                           y (word)
    photo   startseq                                     little
    photo   startseq, little                             girl
    photo   startseq, little, girl                       running
    photo   startseq, little, girl, running              in
    photo   startseq, little, girl, running, in          field
    photo   startseq, little, girl, running, in, field   endseq

    :param tokenizer: fitted Keras Tokenizer that maps words to integers.
    :param max_length: length to which every input sequence is padded.
    :param descriptions: dict mapping image identifier to a list of
        description strings.
    :param photos: dict mapping image identifier to its feature array;
        the code uses ``photos[key][0]``, i.e. the first row of that array.
    :return: tuple of numpy arrays (X1, X2, y).
    """
    # Tokenizer indices start at 1 (0 is used for padding), so the
    # one-hot width is the number of known words plus one.
    vocab_size = len(tokenizer.word_index) + 1
    X1, X2, y = [], [], []
    # Walk through each image identifier.
    for desc_key, desc_list in descriptions.items():
        # Walk through each description for the image.
        for desc in desc_list:
            # Encode the sequence.
            seq = tokenizer.texts_to_sequences([desc])[0]
            # Split one sequence into multiple X, y pairs.
            for i in range(1, len(seq)):
                # Split into input and output pair.
                in_seq, out_seq = seq[:i], seq[i]
                # Pad input sequence.
                in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
                # One-hot encode the output word.
                out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
                # Store.
                X1.append(photos[desc_key][0])
                X2.append(in_seq)
                y.append(out_seq)
    print(len(X1), len(X2), len(y))
    print(type(X1[0]))
    return array(X1), array(X2), array(y)

Output:

Dataset: 6000 train images.
Descriptions: train=6000
Vocabulary Size: 7579
Photos: train=6000
Description Length: 34
Preparing text sequences for training.
306404 306404 306404
<type 'numpy.ndarray'>
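
The splitting scheme from the docstring can be reproduced without Keras to see how each description turns into several samples. This is a minimal pure-Python sketch: the word index is hypothetical, and `pad_pre` only imitates the default pre-padding and pre-truncation of Keras `pad_sequences`; it is not the Keras function.

```python
from __future__ import print_function

# Hypothetical word index for illustration only; a real Keras Tokenizer
# assigns indices from 1 upward and leaves 0 for padding.
WORD_INDEX = {"startseq": 1, "little": 2, "girl": 3, "running": 4,
              "in": 5, "field": 6, "endseq": 7}


def encode(desc):
    """Map each word of a description to its integer index."""
    return [WORD_INDEX[w] for w in desc.split()]


def pad_pre(seq, maxlen):
    """Keep the last maxlen items, then left-pad with zeros to length maxlen."""
    seq = seq[-maxlen:]
    return [0] * (maxlen - len(seq)) + seq


def split_pairs(desc, maxlen):
    """Return (padded input sequence, next word index) pairs for one description."""
    seq = encode(desc)
    return [(pad_pre(seq[:i], maxlen), seq[i]) for i in range(1, len(seq))]


def count_pairs(descriptions):
    """Total samples produced: (number of tokens - 1) for every description."""
    return sum(len(encode(d)) - 1
               for desc_list in descriptions.values()
               for d in desc_list)


if __name__ == "__main__":
    desc = "startseq little girl running in field endseq"
    for x, y in split_pairs(desc, maxlen=8):
        print(x, "->", y)
```

Run as a script, it prints six pairs, from `[0, 0, 0, 0, 0, 0, 0, 1] -> 2` to `[0, 0, 1, 2, 3, 4, 5, 6] -> 7`. The same counting rule explains why 6000 images yield 306404 samples in the log: every description contributes one sample per word after `startseq`.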
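
The counts in the log also give a rough idea of how large the returned arrays are. A minimal sketch, assuming dense arrays: `y` has one row of 7579 values per sample and `X2` one row of 34 integers. The element size of `y` depends on the dtype `to_categorical` returns in the installed Keras version, so both float64 and float32 are shown as assumptions; `pad_sequences` returns int32 by default.

```python
# Rough memory estimate for the arrays returned by create_sequences,
# using the numbers printed in the log.
N_SAMPLES = 306404   # samples per array (from the log)
VOCAB_SIZE = 7579    # width of each one-hot y row (from the log)
MAX_LENGTH = 34      # width of each padded X2 row (from the log)


def array_bytes(rows, cols, itemsize):
    """Bytes needed for a dense rows x cols array with the given element size."""
    return rows * cols * itemsize


def gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / float(1024 ** 3)


if __name__ == "__main__":
    # One-hot targets: the dtype (and so the itemsize) is an assumption.
    for name, itemsize in [("float64", 8), ("float32", 4)]:
        size = array_bytes(N_SAMPLES, VOCAB_SIZE, itemsize)
        print("y as %s: %.1f GiB" % (name, gib(size)))
    # Padded inputs: int32, the pad_sequences default.
    x2 = array_bytes(N_SAMPLES, MAX_LENGTH, 4)
    print("X2 as int32: %.1f MiB" % (x2 / float(1024 ** 2)))
```

This gives about 17.3 GiB for `y` as float64 (8.7 GiB as float32) against roughly 39.7 MiB for `X2`, so the one-hot targets dominate the memory needed when all samples are built at once. `X1` is left out because the log does not report the photo feature size.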