hiddenGem

DNA Strand to Amino Acids: Comments

Jun 29th, 2020
%% DNA Strand to Amino Acids: Comments

clear

%% DNA Replication

% This section of code models DNA replication. We'll start with a single
% strand of DNA to use as our template. By the end, we want a sequence for
% the strand that's synthesized from this template. Refer to the diagram
% for help visualizing everything.

% Import the template strand sequence, which comes from the human gene for
% insulin. Note that this sequence is written in the 5'->3' direction, as
% is standard. Take a second to look at the sequence after it's imported.
seq_file = fopen('insulinDNAseq.txt');
template5_3 = fscanf(seq_file,'%s'); % read the sequence, skipping whitespace
fclose(seq_file); % release the file handle once the sequence is read
seq_len = length(template5_3); % length of imported sequence [bp]
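
% A quick sanity check on the import, as a minimal sketch. It assumes the
% file holds only lowercase a/c/g/t, which is what the base-pairing tables
% below expect; chk_bases and chk_counts are illustrative names that are
% not part of the original assignment.
chk_bases = 'acgt';
chk_counts = arrayfun(@(b) sum(template5_3 == b), chk_bases); % count of each base
fprintf('a=%d c=%d g=%d t=%d (total %d)\n', chk_counts, seq_len);
if sum(chk_counts) ~= seq_len
    warning('Sequence contains characters other than a, c, g, t.');
end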

% Now we'll model DNA replication, using this sequence as a template.

%%%

% Look at the diagram and notice that when DNA is being replicated, the
% newly-synthesized strand is synthesized 5'->3', BUT the template strand
% is actually read 3'->5'. We'll mimic this property in our code. First
% define a variable template3_5 that gives the sequence of the template
% strand in the 3'->5' direction. Hint: look into the MATLAB command 'flip'
% for help with this. Look at template5_3 and template3_5 in the command
% window and make sure the result makes sense.
template3_5 = flip(template5_3);
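
% The reversal can be checked on a toy sequence, as a minimal sketch
% (chk_toy is an illustrative name, not part of the original assignment).
% flip only reverses the order of the bases; it does not complement them.
chk_toy = 'aacgt';
assert(strcmp(flip(chk_toy), 'tgcaa'));
% The 3' end of the 5'->3' string should be the first base read 3'->5'.
assert(template3_5(1) == template5_3(end));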

% Now we'll write code that mimics the function of the DNA replication
% machinery. Below is a matrix that gives the base-pairing rules followed
% for DNA replication. The letter in the first column gives the base of the
% template strand; the letter in the second column gives the corresponding
% base of the newly-synthesized strand. (Don't worry too much about the
% curly braces - it just means we're defining a cell array.)
A = {'a' 't'
     'c' 'g'
     'g' 'c'
     't' 'a'};

% Now initialize a variable that will be used to store the sequence of the
% newly-synthesized strand (synth). Since we're synthesizing this strand in
% the same order/direction as the DNA replication machinery, should we name
% this variable synth3_5 or synth5_3?
synth5_3 = [];

% The for loop below reads each base on the template strand and chooses the
% appropriate base to be added to the newly-synthesized strand, but it's
% incomplete. Fill in the gaps to complete the loop.
for i = 1:seq_len

    template_base = template3_5(i); % pull the ith base of the template strand

    % Identify the row in matrix A corresponding to template_base. The
    % template base is in the first column; row_A is a boolean vector.
    row_A = strcmp(template_base,A(:,1));

    % Define the base added to the synthesized strand using matrix 'A' and
    % vector 'row_A'
    synth_base = A(row_A,2);

    % Store synth_base in the ith element of your storage vector. You'll
    % need to use the command 'char' to store the character properly.
    synth_base = char(synth_base);
    synth5_3(i) = synth_base;

end

% The values saved in your storage vector are likely ASCII values. To
% convert them into a character string, again use the MATLAB command
% 'char'. Example: x = char(x), where x is the name of your storage
% variable.
synth5_3 = char(synth5_3);
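
% The same pairing can be done without a loop, as a minimal cross-check
% sketch. It assumes the template holds only lowercase a/c/g/t; chk_from,
% chk_to, chk_idx and chk_synth5_3 are illustrative names that are not
% part of the original assignment.
chk_from = 'acgt'; % template bases
chk_to   = 'tgca'; % partner base at the same position in chk_from
[~, chk_idx] = ismember(template3_5, chk_from); % position of each base in chk_from
chk_synth5_3 = chk_to(chk_idx);
assert(isequal(chk_synth5_3, synth5_3), 'Loop and lookup results differ.');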

% Look at the result of the storage variable in your command window. Is
% this written in the 3'->5' or 5'->3' direction? Should we use the 'flip'
% command to get it in the standard 5'->3' direction?

%%% This is already written in the 5'->3' direction. The template was
%%% flipped first and read 3'->5', so the new strand comes out 5'->3' and
%%% no further flip is needed.

%%%

%% Transcription

% This section of code models transcription, the process of synthesizing
% RNA from a DNA template. We'll use our sequences that we've generated so
% far. In the context of DNA replication, we generally think in terms of
% 'template' and 'newly-synthesized' strands (as above); in the context of
% transcription, however, we think in terms of 'sense' and 'anti-sense'
% strands. Note that these are two different ideas! A 'template' strand in
% the case of DNA replication could be either a 'sense' or 'anti-sense'
% strand in the case of transcription, and vice versa.

% In this case, we'll say that the strand we've been calling 'template'
% in the DNA replication section is our 'anti-sense' strand when it comes
% to transcribing this gene. So that means the 'synthesized' strand is now
% our 'sense' strand. Let's go ahead and define new variables to reflect
% this.
antisense5_3 = template5_3;
antisense3_5 = template3_5;
sense5_3 = synth5_3;

%%%

% We'll assume that the entire DNA sequence that we've been looking at is
% going to be transcribed into mRNA.

%%%

% Now we'll write code that mimics the function of this transcription
% machinery. Below is a matrix that gives the base-pairing rules followed
% for transcription. The letter in the first column gives the base of the
% DNA strand; the letter in the second column gives the corresponding
% base of the RNA strand.
B = {'a' 'u'
     'c' 'g'
     'g' 'c'
     't' 'a'};

% Now initialize a variable that will be used to store the sequence of the
% newly-synthesized RNA strand.
RNA5_3 = [];


% The for loop below reads each base on the DNA strand and chooses the
% appropriate base to be added to the RNA strand being synthesized, but
% it's incomplete. Fill in the gaps to complete the loop.
for i = 1:length(antisense3_5)

    % Pull the ith base of the DNA strand. We want to synthesize our
    % virtual RNA strand 5'->3', just like a cell does. So which is the
    % appropriate DNA strand to use here? Should we use antisense5_3,
    % antisense3_5, or sense5_3? Hint: use matrix 'B' to help make your
    % decision.
    % Matrix B pairs each DNA base with its complementary RNA base, so the
    % DNA strand read here is the anti-sense strand, read 3'->5'.
    DNA_base = antisense3_5(i);

    % Identify the row in matrix B corresponding to DNA_base; note that
    % the variable row_B is a boolean vector
    row_B = strcmp(DNA_base,B(:,1));

    % Define the synthesized base using matrix 'B' and vector 'row_B'
    RNA_base = B(row_B,2);

    % Store RNA_base in the ith element of your storage vector. Again,
    % we'll need to use 'char'
    RNA_base = char(RNA_base);
    RNA5_3(i) = RNA_base;

end

% Again, use the MATLAB command 'char' to convert the vector into a
% character string.
RNA5_3 = char(RNA5_3);
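
% A quick consistency check, as a minimal sketch (chk_RNA is an
% illustrative name, not part of the original assignment). An mRNA written
% 5'->3' should read the same as the sense strand written 5'->3', with
% every t replaced by u.
chk_RNA = strrep(sense5_3, 't', 'u');
if isequal(chk_RNA, RNA5_3)
    disp('RNA5_3 matches the sense strand with u in place of t.');
else
    warning('RNA5_3 does not match the sense strand; check which DNA strand was read.');
end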

%% Translation

% This section of code models translation, the process of synthesizing a
% polypeptide from an RNA template.

%%%

% Now we'll write code that mimics the function of this translation
% machinery. Below is a matrix that gives the amino acid-codon rules
% followed for translation. The letters in the first column give the codon
% of the RNA strand; the letter in the second column gives the
% corresponding single-letter amino acid code. Note that asterisks '*' are
% used to denote stop codons.
C = {'uuu' 'F'
     'uuc' 'F'
     'uua' 'L'
     'uug' 'L'
     'cuu' 'L'
     'cuc' 'L'
     'cua' 'L'
     'cug' 'L'
     'auu' 'I'
     'auc' 'I'
     'aua' 'I'
     'aug' 'M'
     'guu' 'V'
     'guc' 'V'
     'gua' 'V'
     'gug' 'V'
     'ucu' 'S'
     'ucc' 'S'
     'uca' 'S'
     'ucg' 'S'
     'ccu' 'P'
     'ccc' 'P'
     'cca' 'P'
     'ccg' 'P'
     'acu' 'T'
     'acc' 'T'
     'aca' 'T'
     'acg' 'T'
     'gcu' 'A'
     'gcc' 'A'
     'gca' 'A'
     'gcg' 'A'
     'uau' 'Y'
     'uac' 'Y'
     'uaa' '*'
     'uag' '*'
     'cau' 'H'
     'cac' 'H'
     'caa' 'Q'
     'cag' 'Q'
     'aau' 'N'
     'aac' 'N'
     'aaa' 'K'
     'aag' 'K'
     'gau' 'D'
     'gac' 'D'
     'gaa' 'E'
     'gag' 'E'
     'ugu' 'C'
     'ugc' 'C'
     'uga' '*'
     'ugg' 'W'
     'cgu' 'R'
     'cgc' 'R'
     'cga' 'R'
     'cgg' 'R'
     'agu' 'S'
     'agc' 'S'
     'aga' 'R'
     'agg' 'R'
     'ggu' 'G'
     'ggc' 'G'
     'gga' 'G'
     'ggg' 'G'};
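
% The table can be sanity-checked and wrapped in a lookup, as a minimal
% sketch. chk_codon_map is an illustrative name that is not part of the
% original assignment; containers.Map is a built-in MATLAB class.
assert(numel(unique(C(:,1))) == 64); % 4^3 codons, each listed once
chk_codon_map = containers.Map(C(:,1), C(:,2));
assert(strcmp(chk_codon_map('aug'), 'M')); % start codon codes for methionine
assert(strcmp(chk_codon_map('uga'), '*')); % one of the three stop codons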

% The start codon for this transcript is at positions [60 61 62]. Double
% check that you see a start codon at those positions in the string
% variable RNA5_3 - if you do, you're probably on the right track so far!
% Note that the sequence before the start codon is still part of the mRNA
% transcript, but it's not translated. What name do we give to this part of
% the gene?

%%% The 5' untranslated region (5' UTR), also called the leader sequence
%%%

% Let's cut off the beginning of the sequence so we start at the start
% codon.
ORF = RNA5_3(60:end);
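
% The start-codon check described above can be automated, as a minimal
% sketch (no new variables are introduced). 'aug' is the standard start
% codon, listed as 'M' in table C.
if numel(ORF) >= 3 && strcmp(ORF(1:3), 'aug')
    disp('Start codon found at positions 60-62 of RNA5_3.');
else
    warning('No start codon at positions 60-62; recheck the transcription step.');
end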

% The while loop below reads each codon on the mRNA and chooses the
% appropriate amino acid to be added to the synthesized polypeptide, but
% it's incomplete. Fill in the gaps to complete the loop. NOTE: if you get
% to this line of code and it takes more than a few seconds to run, there's
% probably a bug in the while loop. Use ctrl+C to stop the code and try to
% troubleshoot.
AA_seq = NaN;
AA_pos = 1;
ORF_pos = 1:3;
while ~strcmp(char(AA_seq(end)),'*') % continue until we've reached a stop codon

    % Use the indexing variable ORF_pos to grab the codon from ORF
    codon = ORF(ORF_pos);

    % Use the codon to identify the corresponding amino acid using
    % reference table
    row_C = strcmp(codon,C(:,1));

    % Define the synthesized amino acid using matrix 'C' and vector 'row_C'
    AA = C(row_C,2);

    % Store AA in the AA_pos element of the storage vector AA_seq. You'll
    % need to use the 'char' command here.
    AA = char(AA);
    AA_seq(AA_pos) = AA;

    % Update the AA_pos and ORF_pos for the next codon. Think about what
    % these values should be, and check that they match your expectation
    % for a couple iterations of the while loop. Each codon adds one amino
    % acid, so AA_pos advances by 1 while ORF_pos advances by 3.
    AA_pos = AA_pos + 1;
    ORF_pos = ORF_pos + 3;
end

% As before, convert the stored ASCII values into a character string.
AA_seq = char(AA_seq);
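
% The loop can be cross-checked by splitting the ORF into codons all at
% once, as a minimal sketch. It assumes every codon appears in table C;
% the chk_ names are illustrative and not part of the original assignment.
chk_n = floor(numel(ORF)/3);                            % number of whole codons
chk_codons = cellstr(reshape(ORF(1:3*chk_n), 3, []).'); % one codon per cell
[~, chk_rows] = ismember(chk_codons, C(:,1));           % table row of each codon
chk_AA = [C{chk_rows,2}];                               % one letter per codon
chk_stop = find(chk_AA == '*', 1);                      % first in-frame stop codon
if ~isempty(chk_stop)
    chk_AA = chk_AA(1:chk_stop);                        % keep up to and including the stop
end
if isequal(chk_AA, char(AA_seq))
    disp('Codon-by-codon and reshaped translations agree.');
else
    warning('Translations differ; check how AA_pos and ORF_pos are updated.');
end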

%{
This code is virtually the same as the other code on my page called 'DNA Strand to
    Amino Acids', except this has comments to help understand what is actually going on
    and uses a different scanned file.
This was a homework assignment, so there might be some random questions throughout. Yeah, have fun.
%}