- %% DNA Strand to Amino Acids: Comments
- clear
- %% DNA Replication
- % This section of code models DNA replication. We'll start with a single
- % strand of DNA to use as our template. By the end, we want a sequence for
- % the strand that's synthesized from this template. Refer to the diagram
- % for help visualizing everything.
- % Import the template strand sequence, which comes from the human gene for
- % insulin. Note that this sequence is written in the 5'->3' direction, as
- % is standard. Take a second to look at the sequence after it's imported.
- seq_file = fopen('insulinDNAseq.txt');
- template5_3 = fscanf(seq_file,'%s');
- fclose(seq_file); % close the file once the sequence has been read
- seq_len = length(template5_3); % length of imported sequence [bp]
- % Now we'll model DNA replication, using this sequence as a template
- %%%
- % Look at the diagram and notice that when DNA is being replicated, the
- % newly-synthesized strand is synthesized 5'->3', BUT the template strand
- % is actually read 3'->5'. We'll mimic this property in our code. First
- % define a variable template3_5 that gives the sequence of the template
- % strand in the 3'->5' direction. Hint: look into the MATLAB command 'flip'
- % for help with this. Look at template5_3 and template3_5 in the command
- % window and make sure the result makes sense.
- template3_5 = flip(template5_3);
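- % Optional sanity check, not part of the original assignment: a minimal
- % sketch of what 'flip' does to a character vector. The sequence and the
- % demo_ variable names are made up for illustration only.
- demo_5_3 = 'atgc';
- demo_3_5 = flip(demo_5_3); % reverses the order of the characters: 'cgta'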
- % Now we'll write code that mimics the function of the DNA replication
- % machinery. Below is a matrix that gives the base-pairing rules followed
- % for DNA replication. The letter in the first column gives the base of the
- % template strand; the letter in the second column gives the corresponding
- % base of the newly-synthesized strand. (Don't worry too much about the
- % curly braces - it just means we're defining a cell array.)
- A = {'a' 't'
- 'c' 'g'
- 'g' 'c'
- 't' 'a'};
- % Now initialize a variable that will be used to store the sequence of the
- % newly-synthesized strand (synth). Since we're synthesizing this strand in
- % the same order/direction as the DNA replication machinery, should we name
- % this variable synth3_5 or synth5_3?
- synth5_3 = [];
- % The for loop below reads each base on the template strand and chooses the
- % appropriate base to be added to the newly-synthesized strand, but it's
- % incomplete. Fill in the gaps to complete the loop.
- for i = 1:seq_len
- template_base = template3_5(i); % pull the ith base of the template strand
- row_A = strcmp(template_base,A(:,1)); % identify the row in matrix A corresponding to template_base (column 1 holds the template bases); note that the variable row_A is a boolean vector
- % Define the base added to the synthesized strand using matrix 'A' and
- % vector 'row_A'
- synth_base = A(row_A,2); % column 2 holds the base of the newly-synthesized strand
- % Store synth_base in the ith element of your storage vector. You'll
- % need to use the command 'char' to store the character properly.
- synth_base = char(synth_base);
- synth5_3(i) = (synth_base);
- end
- % The values saved in your storage vector are likely ASCII values. To
- % convert them into a character string, again use the MATLAB command
- % 'char'. Example: x = char(x), where x is the name of your storage
- % variable.
- synth5_3 = char(synth5_3);
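- % For comparison, a loop-free sketch of the same base-pairing lookup on a
- % short made-up sequence. This is one possible alternative, not the method
- % the assignment asks for; all demo_ names are hypothetical.
- demo_pairs = {'a' 't'; 'c' 'g'; 'g' 'c'; 't' 'a'}; % same rules as matrix A
- demo_template3_5 = 'cgta';
- % Find the row of the pairing table that matches each template base
- [~,demo_rows] = ismember(cellstr(demo_template3_5'),demo_pairs(:,1));
- % Collect the paired bases from column 2 into one character string
- demo_synth5_3 = [demo_pairs{demo_rows,2}]; % 'gcat'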
- % Look at the result of the storage variable in your command window. Is
- % this written in the 3'->5' or 5'->3' direction? Should we use the 'flip'
- % command to get it in the standard 5'->3' direction?
- %%% This is already written in the 5'->3' direction. The template was
- %%% flipped first so that it is read 3'->5', which means the new strand is
- %%% built 5'->3' and no further 'flip' is needed.
- %%%
- %% Transcription
- % This section of code models transcription, the process of synthesizing
- % RNA from a DNA template. We'll use our sequences that we've generated so
- % far. In the context of DNA replication, we generally think in terms of
- % 'template' and 'newly-synthesized' strands (as above); in the context of
- % transcription, however, we think in terms of 'sense' and 'anti-sense'
- % strands. Note that these are two different ideas! A 'template' strand in
- % the case of DNA replication could be either a 'sense' or 'anti-sense'
- % strand in the case of transcription, and vice versa.
- % In this case, we'll say that the strand we've been calling 'template'
- % in the DNA replication section is our 'anti-sense' strand when it comes
- % to transcribing this gene. So that means the 'synthesized' strand is now
- % our 'sense' strand. Let's go ahead and define new variables to reflect
- % this.
- antisense5_3 = template5_3;
- antisense3_5 = template3_5;
- sense5_3 = synth5_3;
- %%%
- % We'll assume that the entire DNA sequence that we've been looking at is
- % going to be transcribed into mRNA.
- %%%
- % Now we'll write code that mimics the function of this transcription
- % machinery. Below is a matrix that gives the base-pairing rules followed
- % for transcription. The letter in the first column gives the base of the
- % DNA strand; the letter in the second column gives the corresponding
- % base of the RNA strand.
- B = {'a' 'u'
- 'c' 'g'
- 'g' 'c'
- 't' 'a'};
- % Now initialize a variable that will be used to store the sequence of the
- % newly-synthesized RNA strand.
- RNA5_3 = [];
- % The for loop below reads each base on the DNA strand and chooses the
- % appropriate base to be added to the RNA strand being synthesized, but
- % it's incomplete. Fill in the gaps to complete the loop.
- for i = 1:length(antisense3_5)
- % Pull the ith base of the DNA strand. We want to synthesize our
- % virtual RNA strand 5'->3', just like a cell does. So which is the
- % appropriate DNA strand to use here? Should we use antisense5_3,
- % antisense3_5, or sense5_3? Hint: use matrix 'B' to help make your
- % decision.
- DNA_base = antisense3_5(i); % matrix B pairs complementary bases, so read the anti-sense strand 3'->5' to build the RNA 5'->3'
- row_B = strcmp(DNA_base,B(:,1)); % identify the row in matrix B corresponding to DNA_base; note that the variable row_B is a boolean vector
- % Define the synthesized base using matrix 'B' and vector 'row_B'
- RNA_base = B(row_B,2);
- % Store RNA_base in the ith element of your storage vector. Again,
- % we'll need to use 'char'
- RNA_base = char(RNA_base);
- RNA5_3(i) = (RNA_base);
- end
- % Again, use the MATLAB command 'char' to convert the vector into a
- % character string.
- RNA5_3 = char(RNA5_3);
- %% Translation
- % This section of code models translation, the process of synthesizing a
- % polypeptide from an RNA template.
- %%%
- % Now we'll write code that mimics the function of this translation
- % machinery. Below is a matrix that gives the amino acid-codon rules
- % followed for translation. The letters in the first column give the codon
- % of the RNA strand; the letter in the second column gives the
- % corresponding single-letter amino acid code. Note that asterisks '*' are
- % used to denote stop codons.
- C = {'uuu' 'F'
- 'uuc' 'F'
- 'uua' 'L'
- 'uug' 'L'
- 'cuu' 'L'
- 'cuc' 'L'
- 'cua' 'L'
- 'cug' 'L'
- 'auu' 'I'
- 'auc' 'I'
- 'aua' 'I'
- 'aug' 'M'
- 'guu' 'V'
- 'guc' 'V'
- 'gua' 'V'
- 'gug' 'V'
- 'ucu' 'S'
- 'ucc' 'S'
- 'uca' 'S'
- 'ucg' 'S'
- 'ccu' 'P'
- 'ccc' 'P'
- 'cca' 'P'
- 'ccg' 'P'
- 'acu' 'T'
- 'acc' 'T'
- 'aca' 'T'
- 'acg' 'T'
- 'gcu' 'A'
- 'gcc' 'A'
- 'gca' 'A'
- 'gcg' 'A'
- 'uau' 'Y'
- 'uac' 'Y'
- 'uaa' '*'
- 'uag' '*'
- 'cau' 'H'
- 'cac' 'H'
- 'caa' 'Q'
- 'cag' 'Q'
- 'aau' 'N'
- 'aac' 'N'
- 'aaa' 'K'
- 'aag' 'K'
- 'gau' 'D'
- 'gac' 'D'
- 'gaa' 'E'
- 'gag' 'E'
- 'ugu' 'C'
- 'ugc' 'C'
- 'uga' '*'
- 'ugg' 'W'
- 'cgu' 'R'
- 'cgc' 'R'
- 'cga' 'R'
- 'cgg' 'R'
- 'agu' 'S'
- 'agc' 'S'
- 'aga' 'R'
- 'agg' 'R'
- 'ggu' 'G'
- 'ggc' 'G'
- 'gga' 'G'
- 'ggg' 'G'};
- % The start codon for this transcript is at positions [60 61 62]. Double
- % check that you see a start codon at those positions in the string
- % variable RNA5_3 - if you do, you're probably on the right track so far!
- % Note that the sequence before the start codon is still part of the mRNA
- % transcript, but it's not translated. What name do we give to this part of
- % the gene?
- %%% The 5' untranslated region (5' UTR), also called the leader sequence
- %%%
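- % A small sketch of how the start codon check could be done in code, shown
- % on a made-up RNA string (demo_ names are hypothetical). For this script,
- % the equivalent check would be strcmp(RNA5_3(60:62),'aug').
- demo_RNA = 'ccaugggcuaa';
- demo_starts = strfind(demo_RNA,'aug'); % index where each 'aug' begins: 3
- demo_has_start = strcmp(demo_RNA(3:5),'aug'); % true if positions 3-5 spell 'aug'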
- % Let's cut off the beginning of the sequence so we start at the start
- % codon.
- ORF = RNA5_3(60:end);
- % The while loop below reads each codon of the ORF and chooses the
- % appropriate amino acid to be added to the synthesized polypeptide, but
- % it's incomplete. Fill in the gaps to complete the loop. NOTE: if you get
- % to this line of code and it takes more than a few seconds to run, there's
- % probably a bug in the while loop. Use ctrl+C to stop the code and try to
- % troubleshoot.
- AA_seq = NaN;
- AA_pos = 1;
- ORF_pos = 1:3;
- while ~strcmp(char(AA_seq(end)),'*') % continue until we've reached a stop codon
- % Use the indexing variable ORF_pos to grab the codon from ORF
- codon = ORF(ORF_pos);
- % Use the codon to identify the corresponding amino acid using
- % reference table
- row_C = strcmp(codon,C(:,1));
- % Define the amino acid using matrix 'C' and vector 'row_C'
- AA = C(row_C,2);
- % Store AA in the AA_pos element of the storage vector AA_seq. You'll
- % need to use the 'char' command here.
- AA = char(AA);
- AA_seq(AA_pos) = AA;
- % Update the AA_pos and ORF_pos for the next codon. Think about what
- % these values should be, and check that they match your expectation
- % for a couple iterations of the while loop.
- AA_pos = AA_pos + 1; % one amino acid is added per codon
- ORF_pos = ORF_pos + 3;
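- end
- % As in the earlier sections, the values stored in AA_seq are ASCII codes;
- % convert them into a character string.
- AA_seq = char(AA_seq);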
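- % For comparison, a loop-free sketch of codon lookup on a short made-up ORF.
- % It assumes the ORF length is a multiple of 3 and that every codon appears
- % in the table; demo_table is a three-row excerpt of matrix C, and all
- % demo_ names are hypothetical. Unlike the while loop, this version
- % translates every codon and does not stop at the first stop codon.
- demo_table = {'aug' 'M'; 'gcc' 'A'; 'uaa' '*'};
- demo_ORF = 'auggccuaa';
- demo_codons = cellstr(reshape(demo_ORF,3,[])'); % one codon per cell
- [~,demo_rows] = ismember(demo_codons,demo_table(:,1));
- demo_AA = [demo_table{demo_rows,2}]; % 'MA*'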
- %{
- This code is virtually the same as the other code on my page called 'DNA Strand to
- Amino Acids', except this version has comments to help explain what is actually going on
- and uses a different scanned file.
- This was a homework assignment, so there might be some random questions throughout. Yeah, have fun.
- %}