- let english_words be "the 20,000 most frequent words taken from all available manuscripts on Google Books from 1600 to 1850"
- for each language in all_languages:
    - let phonemes[language] be "a list of phonemes and their associated ligatures for language, generated according to hidden parameters based on the language"
    - initialize an empty generated_words[language] collection
    - for each word in english_words:
        - let expected_length be "the length of word in English, divided by the number of letters in the English alphabet, times the number of letters in language's alphabet"
        - if word shares a stem with an already generated word:
            - new_word = combine(language, shared_stem, generate_word(word, language, expected_length - stem_length))
        - else:
            - new_word = generate_word(word, language, expected_length)
        - generated_words[language].add(new_word)
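The driver loop above can be sketched as follows. The stem detection, `generate_word`, and `combine` are stand-in stubs passed as parameters here; the real versions would depend on the hidden per-language parameters described in the pseudocode, and `alphabet_sizes` is an assumed input.

```python
def shared_stem(word, seen):
    """Longest prefix (>= 3 letters) this word shares with a prior word.

    Returns (english_stem, generated_stem) or None. This is one plausible
    reading of "shares a stem", not a confirmed detail of the source.
    """
    for prior in seen:
        for n in range(min(len(word), len(prior)), 2, -1):
            if word[:n] == prior[:n]:
                return prior[:n], seen[prior][:n]
    return None

def build_lexicon(english_words, all_languages, alphabet_sizes,
                  generate_word, combine):
    generated_words = {}
    for language in all_languages:
        generated_words[language] = {}
        seen = {}  # English word -> generated word, for stem reuse
        for word in english_words:
            # Scale the target length by the ratio of alphabet sizes.
            expected_length = round(len(word) / 26 * alphabet_sizes[language])
            hit = shared_stem(word, seen)
            if hit:
                eng_stem, gen_stem = hit
                tail = generate_word(word, language,
                                     expected_length - len(gen_stem))
                new_word = combine(language, gen_stem, tail)
            else:
                new_word = generate_word(word, language, expected_length)
            seen[word] = new_word
            generated_words[language][word] = new_word
    return generated_words
```

With trivial stubs (reverse-and-truncate generation, stem-as-prefix combination), "running" and "runner" end up sharing the generated stem, which is the point of the stem check.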
- def combine(language, stem, partial_word):
    - look up language's rule for prefix, infix, or postfix attachment
    - combine stem with partial_word according to that rule
    - return the result
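A minimal sketch of the combine step, assuming each language's rule is a simple prefix/infix/postfix tag. The `AFFIX_RULES` table and the infix behavior (splitting the partial word at its midpoint) are illustrative assumptions, not details from the source.

```python
# Hypothetical per-language attachment rules.
AFFIX_RULES = {"elvish": "prefix", "dwarvish": "postfix", "gnomish": "infix"}

def combine(language, stem, partial_word):
    rule = AFFIX_RULES.get(language, "prefix")
    if rule == "prefix":
        return stem + partial_word
    if rule == "postfix":
        return partial_word + stem
    # infix: drop the stem into the middle of the partial word
    mid = len(partial_word) // 2
    return partial_word[:mid] + stem + partial_word[mid:]
```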
- def generate_word(word, language, expected_length):
    - generated_word = ""
    - for each phoneme in word:
        - select a phoneme from phonemes[language] whose frequency is approximately that of the word's phoneme in English
        - generated_word += the ligature associated with the selected phoneme
    - apply truncation/contraction rules according to hidden language parameters
    - apply prefix/suffix rules according to hidden language parameters
    - if generated_word is longer than expected_length, or contains too many repetitions of the same symbol:
        - regenerate the word, with probability proportional to the misfit
    - return generated_word
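The generation routine can be sketched like this. English letter frequencies stand in for phoneme frequencies, and the "hidden language parameters" are reduced to a fixed phoneme table plus a crude truncation rule; all of that is assumption rather than the source's actual data, and the regeneration probability is one simple reading of "proportional to misfit".

```python
import random

# Abridged English letter frequencies (%), standing in for phoneme data.
ENGLISH_FREQ = {"e": 12.7, "t": 9.1, "a": 8.2, "o": 7.5, "n": 6.7}

def generate_word(word, language, expected_length, phonemes, rng=None):
    """phonemes: list of (ligature, frequency) pairs for this language."""
    rng = rng or random.Random(0)
    generated = ""
    for _ in range(10):  # bounded number of regeneration attempts
        out = []
        for ch in word:
            target = ENGLISH_FREQ.get(ch, 1.0)
            # Pick the language phoneme whose frequency is closest to the
            # English letter's frequency, breaking ties randomly.
            best = min(phonemes, key=lambda p: (abs(p[1] - target), rng.random()))
            out.append(best[0])
        generated = "".join(out)[:expected_length + 2]  # crude truncation rule
        # Misfit grows with excess length and with symbol repetition.
        misfit = max(0, len(generated) - expected_length)
        if generated and max(generated.count(c) for c in set(generated)) > 3:
            misfit += 1
        # Keep the word with probability 1 / (misfit + 1); otherwise retry.
        if rng.random() >= misfit / (misfit + 1):
            return generated
    return generated
```

Because the phoneme choice is frequency-matched rather than random, the same English word maps to a stable foreign word for a given phoneme table, which keeps the generated lexicon consistent across runs.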