ceterumcenseo

Markov Chain Text Generator in Python

Oct 18th, 2019
#!/usr/bin/python3

import random
import re

# Length of the "state" (the sequence of characters) from which the next character is predicted.
STATE_LEN = 4

# Maximum number of characters in one generated sentence.
MAX_CHARACTERS = 1000

# The contents of INPUT_TEXT are used to train the Markov model. The longer the provided text is,
# the better or more interesting the results become.
INPUT_TEXT = """
Romanesque architecture is an architectural style of medieval Europe characterized by semi-circular arches. There is no consensus for the beginning date of the Romanesque style, with proposals ranging from the 6th to the 11th century, this later date being the most commonly held. In the 12th century it developed into the Gothic style, marked by pointed arches. Examples of Romanesque architecture can be found across the continent, making it the first pan-European architectural style since Imperial Roman architecture. The Romanesque style in England is traditionally referred to as Norman architecture.
Combining features of ancient Roman and Byzantine buildings and other local traditions, Romanesque architecture is known by its massive quality, thick walls, round arches, sturdy pillars, barrel vaults, large towers and decorative arcading. Each building has clearly defined forms, frequently of very regular, symmetrical plan; the overall appearance is one of simplicity when compared with the Gothic buildings that were to follow. The style can be identified right across Europe, despite regional characteristics and different materials.
Many castles were built during this period, but they are greatly outnumbered by churches. The most significant are the great abbey churches, many of which are still standing, more or less complete and frequently in use. The enormous quantity of churches built in the Romanesque period was succeeded by the still busier period of Gothic architecture, which partly or entirely rebuilt most Romanesque churches in prosperous areas like England and Portugal. The largest groups of Romanesque survivors are in areas that were less prosperous in subsequent periods, including parts of southern France, rural Spain and rural Italy. Survivals of unfortified Romanesque secular houses and palaces, and the domestic quarters of monasteries are far rarer, but these used and adapted the features found in church buildings, on a domestic scale.
"""
# source: https://en.wikipedia.org/wiki/Romanesque_architecture

# learn records in the _model_ one occurrence of the character _char_ after the sequence _state_.
# The model maps each state to a dict of {following character: number of occurrences}.
def learn(model, state, char):
    if state not in model:
        model[state] = {}
    if char not in model[state]:
        model[state][char] = 0
    model[state][char] += 1
    return model
  30.  
  31. # train creates a model based on the input text in _data_.
  32. def train(data):
  33.     model = {}
  34.     # remove everything except letters, numbers and punctuation.
  35.     data = re.sub("[^A-Za-z0-9 .,-]+", "", data)
  36.    
  37.     state = ""
  38.     skipWhitespace = True
  39.     for i in range(len(data)):
  40.         next = data[i]
  41.  
  42.         if skipWhitespace and next == ' ':
  43.             continue
  44.         skipWhitespace = False
  45.    
  46.         model = learn(model, state, next)
  47.         if next == ".":
  48.             state = ""
  49.             skipWhitespace = True
  50.             continue
  51.        
  52.         state += next
  53.         if len(state) > STATE_LEN:
  54.             state = state[1:]
  55.     return model
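
# To illustrate the sliding window that train uses, here is a simplified sketch. It is not
# part of the original script: state_pairs is a hypothetical helper, and it leaves out the
# regex cleanup, the reset after "." and the whitespace skipping.
def state_pairs(text, state_len):
    # Yield (state, next character) pairs; state_len is assumed to be at least 1.
    state = ""
    for char in text:
        yield state, char
        # Keep only the last state_len characters as the new state.
        state = (state + char)[-state_len:]

# Usage sketch:
#   list(state_pairs("abc.", 2))
#   gives [("", "a"), ("a", "b"), ("ab", "c"), ("bc", ".")]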
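
# choose picks the next character from the possibilities given in _d_ by random weighted choice.
# This is done by adding up all the weights, drawing a random integer between 1 and the resulting
# sum (inclusive), and subtracting each weight in turn until the number drops to 0 or below.
def choose(d):
    total = sum(d.values())
    # Start at 1, not 0: randint includes both ends, so starting at 0 would give the
    # first entry one extra chance.
    r = random.randint(1, total)
    for char, count in d.items():
        r -= count
        if r <= 0:
            return char
    # Python 3 cannot raise a plain string, so raise a real exception.
    raise RuntimeError("no character chosen; all counts must be positive")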
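
# The same weighted choice can probably be delegated to the standard library. This is a
# sketch under that assumption, not the author's code: choose_with_stdlib is a hypothetical
# name, and random.choices (available since Python 3.6) does the weighting.
import random

def choose_with_stdlib(d):
    chars = list(d.keys())
    counts = list(d.values())
    # random.choices returns a list of k picks; take the single element.
    return random.choices(chars, weights=counts, k=1)[0]

# Usage sketch:
#   choose_with_stdlib({"a": 3, "e": 1})  returns "a" about three times as often as "e"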
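
# sentence returns a sentence generated randomly by the model.
# It stops after the first period or after MAX_CHARACTERS characters. It can raise a KeyError
# if it reaches a state that was never followed by a character in training, which happens
# when the training text does not end with a period.
def sentence(model):
    state = ""
    text = ""
    for _ in range(MAX_CHARACTERS):
        char = choose(model[state])
        text += char
        state += char
        if len(state) > STATE_LEN:
            state = state[1:]
        if char == ".":
            break
    return text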
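
# If the training text might not end with a period, generation can reach a state with no
# known successor. One way to handle that is sketched below under stated assumptions:
# sentence_safe is a hypothetical variant that takes the limits as parameters and stops
# early instead of raising a KeyError. It uses random.choices for the weighted pick.
import random

def sentence_safe(model, state_len, max_characters):
    state = ""
    text = ""
    for _ in range(max_characters):
        options = model.get(state)
        if not options:
            # Unknown state: end the sentence here rather than failing.
            break
        char = random.choices(list(options), weights=list(options.values()), k=1)[0]
        text += char
        if char == ".":
            break
        state = (state + char)[-state_len:]
    return text

# Usage sketch:
#   sentence_safe({"": {"a": 1}, "a": {".": 1}}, 4, 10)  returns "a."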


# Train the model based on the provided text, then
# create 3 sentences based on it.
model = train(INPUT_TEXT)
for i in range(3):
    print(sentence(model))