Untitled

a guest
May 30th, 2016

Write a Python command-line program that reads a file of FASTA sequences from STDIN and finds, for each tRNA, the set of unique subsequences: those that occur in that tRNA and in none of the other tRNAs. Each of those 22 sets should then be minimized, such that no member of that tRNA set is found as a substring of any other member of that set.

For example, suppose ACG is in the set, and so is AAACGA. Since ACG is found in AAACGA, we would remove AAACGA.

Use Python sets for this assignment. Not only will your code be smaller, but it will be more likely to work. The union, intersection and difference operators will be extraordinarily useful.
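A quick illustration of the three set operators, using two small made-up substring sets (toy data only, not real tRNA content):

```python
# Two toy substring sets (made-up data, for illustration only).
a = {"A", "C", "G", "AC", "CG", "ACG"}
b = {"A", "G", "GA", "AG"}

print(sorted(a | b))  # union: members of either set
print(sorted(a & b))  # intersection: members of both sets -> ['A', 'G']
print(sorted(a - b))  # difference: members of a not in b -> ['AC', 'ACG', 'C', 'CG']
```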

Rough design plan:

1) Compute the set of all substrings of each tRNA sequence.
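A minimal sketch of step 1, assuming each sequence is a plain Python string; the function name `all_substrings` is hypothetical, not part of any provided class:

```python
def all_substrings(seq):
    """Return the set of every non-empty substring of seq."""
    # i is the start index, j the (exclusive) end index of each slice.
    return {seq[i:j] for i in range(len(seq)) for j in range(i + 1, len(seq) + 1)}


print(sorted(all_substrings("ACG")))  # ['A', 'AC', 'ACG', 'C', 'CG', 'G']
```

A sequence of length n has at most n(n+1)/2 non-empty substrings (2,926 for a 76-nucleotide sequence), so keeping them all in a set is cheap.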

2) For each tRNA set, compute the union of all the other tRNA sets, and then remove that union from the current tRNA set. Notice that the union collects every subsequence found in any other tRNA. If any of those are present in your current tRNA, they are not unique!
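Step 2 can be sketched with the union and difference operators. This assumes the per-tRNA substring sets are kept in a dict keyed by tRNA name (a hypothetical layout; `unique_sets` and the toy data are made up). It returns new sets and leaves the inputs untouched, in line with the hint about not altering sets whose original contents you still need. Written for Python 3:

```python
def unique_sets(substring_sets):
    """Return a new dict mapping each tRNA name to substrings found in no other tRNA.

    substring_sets maps tRNA name -> set of substrings and is not modified.
    """
    unique = {}
    for name, subs in substring_sets.items():
        # Union of every *other* tRNA's set; set().union() builds a fresh set,
        # so none of the input sets is altered along the way.
        others = set().union(*(s for n, s in substring_sets.items() if n != name))
        # Difference also returns a new set instead of editing subs in place.
        unique[name] = subs - others
    return unique


toy = {
    "tRNA-1": {"A", "C", "AC"},
    "tRNA-2": {"A", "G", "AG"},
}
for name, subs in sorted(unique_sets(toy).items()):
    print(name, sorted(subs))
# tRNA-1 ['AC', 'C']
# tRNA-2 ['AG', 'G']
```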

3) Each tRNA set now contains the truly unique subsequences, along with any extensions of them. If, for example, G was found to occur in only a single tRNA, then adding an A onto that G must also give a unique subsequence, because it contains G. We only want the minimal form, G, so remove every member that contains another member of the set.
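Step 3 as a minimal sketch: the hypothetical `minimize` function drops every member that contains another member, collecting removals in a separate set instead of deleting while iterating:

```python
def minimize(subs):
    """Return subs without any member that contains another member as a substring."""
    to_remove = set()  # collect deletions here instead of editing subs mid-loop
    for s in subs:
        for t in subs:
            # Members of a set are distinct, so t != s and t in s means
            # t is a proper (shorter) substring of s: s is not minimal.
            if t != s and t in s:
                to_remove.add(s)
                break
    return subs - to_remove


print(sorted(minimize({"ACG", "AAACGA"})))        # ['ACG']
print(sorted(minimize({"G", "GA", "AG", "CU"})))  # ['CU', 'G']
```

This compares every pair, so it is quadratic in the set size. Because a tRNA's unique set is closed under extension (any substring of that tRNA containing a unique member is itself unique), testing only `s[1:]` and `s[:-1]` against the set would also suffice; the pairwise form is shown because it is easier to check.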

Finally, print a report containing the following items.

Line 1: the tRNA name

Line 2: the tRNA sequence

Lines 3-80 or so: each unique element, one per line.

These unique elements need to be ordered by their starting position in the tRNA sequence, and should be indented so that each sits directly under its subsequence in the original tRNA. This looks like an alignment, but you can find where each element belongs by using the str.find() method. Fill all positions to the left with dots, which serve as alignment guides for your reader.
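A sketch of the per-tRNA report, assuming the minimized unique set has already been computed. It places each element under its first occurrence, as found by `str.find()`, and pads the left with dots; the function name `report` and the sample data are made up:

```python
def report(name, seq, unique):
    """Return the report lines for one tRNA as a list of strings."""
    lines = [name, seq]
    # Sort by starting position; find() gives the first occurrence in seq.
    positioned = sorted((seq.find(sub), sub) for sub in unique)
    for pos, sub in positioned:
        lines.append("." * pos + sub)  # dots fill every position to the left
    return lines


for line in report("tRNA-example", "GCAUUCGA", {"UUC", "GCA", "GA"}):
    print(line)
# tRNA-example
# GCAUUCGA
# GCA
# ...UUC
# ......GA
```

An element that occurs more than once in the sequence is aligned under its first occurrence, since that is what `find()` returns.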

Do this for all 22 tRNA sequences.

Print the tRNAs as above, in sorted order.

Hints:

Use sets!

Your final code will be under 100 lines.

Do most of your coding using class methods.

When removing items from a set, don't do it while iterating through the sets. You will frequently be affecting sets whose original contents you still need. So, build a new set of the items you will delete, and delete them after you are done iterating. This hint especially applies when finding elements that are common across tRNAs. Notice that you build the union of all other tRNAs, and this happens 22 times. For example, consider 4 sets, A, B, C, D: we would compute the union of B, C, D to use against set A, and the union of A, C, D to use against B. You will get bad results if you alter A in this process.

This is a command-line program, though it does not take any arguments. You really don't need the commandline class. Your input comes from STDIN and output goes to STDOUT. Use the FastaReader for input and the print statement for output.
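The FastaReader is course-provided and its interface is not shown in this assignment. As a hypothetical stand-in only (the name `read_fasta` is made up), a minimal generator that reads FASTA text from any file-like object, such as `sys.stdin`, might look like this:

```python
import io


def read_fasta(stream):
    """Yield (header, sequence) pairs from FASTA text.

    The header is the whole line after '>'; sequence lines are stripped
    and joined. Any sequence lines before the first header are ignored.
    """
    header, parts = None, []
    for line in stream:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:].strip(), []
        else:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


sample = io.StringIO(">tRNA-a\nGCAU\nUCGA\n>tRNA-b\nGGAC\n")
print(list(read_fasta(sample)))  # [('tRNA-a', 'GCAUUCGA'), ('tRNA-b', 'GGAC')]
```

In the real program you would pass `sys.stdin` instead of the `StringIO` sample, and use the provided FastaReader wherever the assignment requires it.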