- #!/usr/bin/env ruby
- # This is a game that helps people understand genome assembly. Given a string,
- # it generates sequence reads giving perfect coverage of the string and with a
- # fixed overlap. The idea is to print the generated reads, cut them out, and
- # have learners assemble them by hand. Different difficulties can be
- # demonstrated by using a string with repeats, low complexity, etc., to mimic
- # real assembly problems, or by adjusting the parameters (overlap and number
- # of fragments).
- # Generate n_fragments approximately equal-sized fragments, with each
- # consecutive pair of fragments overlapping by k characters, so that the
- # fragments together cover the entire quote.
- def sequence_quote_with_overlap(quote, n_fragments, k)
-   quote = quote.downcase
-   # the quote length may not be divisible by the number of fragments,
-   # so we distribute the remainder across the fragments randomly.
-   remainder = quote.length % n_fragments
-   bump = ([1] * remainder + [0] * (n_fragments - remainder)).shuffle
-   # every fragment carries k extra characters for the overlap
-   fraglen = quote.length / n_fragments + k
-   # the final fragment has no successor to overlap with, so the fragments
-   # are k characters too long in total; shorten k randomly chosen fragments
-   # by one character each. Note: sample(k) picks at most n_fragments
-   # indices, so if k > n_fragments the leftover excess is simply clipped
-   # from the final fragment when the slice runs past the end of the quote.
-   (0...n_fragments).to_a.sample(k).each { |i| bump[i] -= 1 }
-   # for each fragment, adjust the fragment length by the bump
-   # and sequence the fragment from the quote
-   fragments = []
-   firstchar = 0
-   n_fragments.times do |i|
-     lastchar = firstchar + fraglen + bump[i] - 1
-     fragments << quote[firstchar..lastchar]
-     # the next fragment starts k characters before this one ends
-     firstchar = lastchar - k + 1
-   end
-   fragments
- end
- # demos
- # simple assembly
- quote = "Try a thing you haven’t done three times. Once, to get over the fear of doing it. Twice, to learn how to do it. And a third time, to figure out whether you like it or not."
- n_fragments = 18
- k = 6
- frags = sequence_quote_with_overlap quote, n_fragments, k
- frags.shuffle.each { |f| puts "\"#{f}\"" }
- # with repeats (reuses n_fragments = 18 from above, with a shorter overlap)
- quote = "Happiness resides not in possessions, and not in gold, happiness dwells in the soul."
- k = 3
- frags = sequence_quote_with_overlap quote, n_fragments, k
- frags.shuffle.each { |f| puts "\"#{f}\"" }
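Before printing and cutting out the reads, it can help to confirm that they really do tile the quote. The sketch below merges fragments that are still in their original (unshuffled) order and checks each k-character overlap along the way. The name `reassemble` is a hypothetical helper introduced here for illustration; it is not part of the original script.

```ruby
# Hypothetical helper (not in the original paste): merge fragments that are
# already in their original order, verifying each k-character overlap.
def reassemble(ordered_fragments, k)
  return "" if ordered_fragments.empty?

  ordered_fragments.drop(1).reduce(ordered_fragments.first.dup) do |assembled, frag|
    # the text assembled so far must end with the first k characters
    # of the next fragment, otherwise the fragments do not chain
    unless assembled.end_with?(frag[0, k])
      raise ArgumentError, "overlap mismatch before #{frag.inspect}"
    end
    # append only the part of the fragment that lies past the overlap
    assembled + frag[k..-1].to_s
  end
end

# hand-made fragments of "hello world" with an overlap of 2
puts reassemble(["hello ", "o wor", "orld"], 2)  # prints "hello world"
```

With the script above loaded, `reassemble(frags, k) == quote.downcase` should hold for the unshuffled return value of `sequence_quote_with_overlap`. Repeats in the quote do not affect this check, because the fragments are merged in their known order rather than matched by content.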