%% DNA Strand to Amino Acids: Comments

clear

%% DNA Replication

% This section of code models DNA replication. We'll start with a single
% strand of DNA to use as our template. By the end, we want a sequence for
% the strand that's synthesized from this template. Refer to the diagram
% for help visualizing everything.

% Import the template strand sequence, which comes from the human gene for
% insulin. Note that this sequence is written in the 5'->3' direction, as
% is standard. Take a second to look at the sequence after it's imported.
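seq_file = fopen('insulinDNAseq.txt');
assert(seq_file ~= -1, 'Could not open insulinDNAseq.txt; is it in the current folder?');
template5_3 = fscanf(seq_file,'%s'); % '%s' skips whitespace, so the sequence comes back as one row of characters
fclose(seq_file); % release the file handle
seq_len = length(template5_3); % length of imported sequence [bp]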
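
% A minimal sketch, assuming a messier input than the assignment provides:
% the lookup tables below use lowercase letters, so a sequence that arrived
% in uppercase or with spaces and line breaks could be normalized like this
% first. The demo_ names are hypothetical and are not used elsewhere.
demo_raw = sprintf('ATG cc\nGTa\n');             % made-up messy input
demo_seq = lower(regexprep(demo_raw, '\s', ''));  % strip all whitespace, then force lowercase
% demo_seq is now 'atgccgta'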

% Now we'll model DNA replication, using this sequence as a template.

%%%

% Look at the diagram and notice that when DNA is being replicated, the
% newly-synthesized strand is synthesized 5'->3', BUT the template strand
% is actually read 3'->5'. We'll mimic this property in our code. First
% define a variable template3_5 that gives the sequence of the template
% strand in the 3'->5' direction. Hint: look into the MATLAB command 'flip'
% for help with this. Look at template5_3 and template3_5 in the command
% window and make sure the result makes sense.
template3_5 = flip(template5_3);
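
% A small worked example on a made-up 4-base strand (not from the insulin
% file; demo_ names are hypothetical): flip reverses the order of the
% characters, so the same strand is now listed starting from its 3' end.
demo5_3 = 'aacg';
demo3_5 = flip(demo5_3);   % 'gcaa': the 3' end now comes first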

% Now we'll write code that mimics the function of the DNA replication
% machinery. Below is a matrix that gives the base-pairing rules followed
% for DNA replication. The letter in the first column gives the base of the
% template strand; the letter in the second column gives the corresponding
% base of the newly-synthesized strand. (Don't worry too much about the
% curly braces - it just means we're defining a cell array.)
A = {'a' 't'
     'c' 'g'
     'g' 'c'
     't' 'a'};
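
% A quick illustration of the lookup used in the loop below, run on a copy
% of the table (demo_ names are hypothetical): strcmp compares one
% character against every entry of a cell column and returns a logical
% vector, and that vector then selects the matching row of the table.
demo_A = {'a' 't'; 'c' 'g'; 'g' 'c'; 't' 'a'};
demo_row = strcmp('g', demo_A(:,1));   % logical [0; 0; 1; 0]
demo_base = char(demo_A(demo_row,2));  % 'c', the base paired with 'g'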

% Now initialize a variable that will be used to store the sequence of the
% newly-synthesized strand (synth). Since we're synthesizing this strand in
% the same order/direction as the DNA replication machinery, should we name
% this variable synth3_5 or synth5_3?
synth5_3 = [];

% The for loop below reads each base on the template strand and chooses the
% appropriate base to be added to the newly-synthesized strand, but it's
% incomplete. Fill in the gaps to complete the loop.
for i = 1:seq_len

    template_base = template3_5(i); % pull the ith base of the template strand

    row_A = strcmp(template_base,A(:,1)); % identify the row of A whose first column (template base) matches template_base; note that row_A is a boolean vector

    % Define the base added to the synthesized strand using matrix 'A' and
    % vector 'row_A' (column 2 holds the complementary base)
    synth_base = A(row_A,2);

    % Store synth_base in the ith element of your storage vector. You'll
    % need to use the command 'char' to store the character properly.
    synth_base = char(synth_base);
    synth5_3(i) = synth_base;

end

% The values saved in your storage vector are likely ASCII values. To
% convert them into a character string, again use the MATLAB command
% 'char.' Example: x = char(x), where x is the name of your storage
% variable.
synth5_3 = char(synth5_3);

% Look at the result of the storage variable in your command window. Is
% this written in the 3'->5' or 5'->3' direction? Should we use the 'flip'
% command to get it in the standard 5'->3' direction?

%%% It is already written 5'->3'. Because we read the flipped template
%%% (template3_5) starting from its 3' end, each new base is added to the
%%% 3' end of the growing strand, so no flip is needed.
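
% A worked check of the direction argument, done without a loop on a
% made-up strand (demo_ names are hypothetical). ismember looks up each
% base's position in 'acgt', and indexing into 'tgca' returns its
% complement. Complementing the flipped template base by base gives the
% reverse complement of the 5'->3' template, i.e. the partner strand 5'->3'.
demo_template5_3 = 'aacg';
demo_template3_5 = flip(demo_template5_3);            % 'gcaa'
[~, demo_idx] = ismember(demo_template3_5, 'acgt');   % position of each base in 'acgt'
demo_pairs = 'tgca';                                  % complements of 'a','c','g','t'
demo_synth5_3 = demo_pairs(demo_idx);                 % 'cgtt'
% 5'-aacg-3' pairs with 3'-ttgc-5', which reads 5'-cgtt-3', as expected.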

%%%

%% Transcription

% This section of code models transcription, the process of synthesizing
% RNA from a DNA template. We'll use the sequences we've generated so
% far. In the context of DNA replication, we generally think in terms of
% 'template' and 'newly-synthesized' strands (as above); in the context of
% transcription, however, we think in terms of 'sense' and 'anti-sense'
% strands. Note that these are two different ideas! A 'template' strand in
% the case of DNA replication could be either a 'sense' or 'anti-sense'
% strand in the case of transcription, and vice versa.

% In this case, we'll say that the strand we've been calling 'template'
% in the DNA replication section is our 'anti-sense' strand when it comes
% to transcribing this gene. So that means the 'synthesized' strand is now
% our 'sense' strand. Let's go ahead and define new variables to reflect
% this.
antisense5_3 = template5_3;
antisense3_5 = template3_5;
sense5_3 = synth5_3;

%%%

% We'll assume that the entire DNA sequence that we've been looking at is
% going to be transcribed into mRNA.

%%%

% Now we'll write code that mimics the function of this transcription
% machinery. Below is a matrix that gives the base-pairing rules followed
% for transcription. The letter in the first column gives the base of the
% DNA strand; the letter in the second column gives the corresponding
% base of the RNA strand.
B = {'a' 'u'
     'c' 'g'
     'g' 'c'
     't' 'a'};

% Now initialize a variable that will be used to store the sequence of the
% newly-synthesized RNA strand.
RNA5_3 = [];


% The for loop below reads each base on the DNA strand and chooses the
% appropriate base to be added to the RNA strand being synthesized, but
% it's incomplete. Fill in the gaps to complete the loop.
for i = 1:length(antisense3_5)

    % Pull the ith base of the DNA strand. We want to synthesize our
    % virtual RNA strand 5'->3', just like a cell does. So which is the
    % appropriate DNA strand to use here? Should we use antisense5_3,
    % antisense3_5, or sense5_3? Hint: use matrix 'B' to help make your
    % decision. (B gives the complementary RNA base, and RNA polymerase
    % reads the anti-sense strand 3'->5', so we use antisense3_5.)
    DNA_base = antisense3_5(i);

    row_B = strcmp(DNA_base,B(:,1)); % identify the row in matrix B corresponding to DNA_base; note that the variable row_B is a boolean vector

    % Define the synthesized base using matrix 'B' and vector 'row_B'
    RNA_base = B(row_B,2);

    % Store RNA_base in the ith element of your storage vector. Again,
    % we'll need to use 'char'
    RNA_base = char(RNA_base);
    RNA5_3(i) = RNA_base;

end

% Again, use the MATLAB command 'char' to convert the vector into a
% character string.
RNA5_3 = char(RNA5_3);
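
% Because the mRNA is complementary to the anti-sense strand, it has the
% same sequence as the sense strand, with u in place of t. A sketch of both
% routes on a made-up strand (demo_ names are hypothetical): the table
% lookup from B, done without a loop, and the strrep shortcut agree.
demo_sense5_3 = 'atgcat';
demo_antisense3_5 = 'tacgta';                       % base-by-base complement of the sense strand
[~, demo_idx] = ismember(demo_antisense3_5, 'acgt');
demo_rna_pairs = 'ugca';                            % RNA bases paired with 'a','c','g','t' (as in B)
demo_RNA5_3 = demo_rna_pairs(demo_idx);             % 'augcau'
demo_shortcut = strrep(demo_sense5_3, 't', 'u');    % also 'augcau'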

%% Translation

% This section of code models translation, the process of synthesizing a
% polypeptide from an RNA template.

%%%

% Now we'll write code that mimics the function of this translation
% machinery. Below is a matrix that gives the amino acid-codon rules
% followed for translation. The first column gives a codon on the RNA
% strand; the second column gives the corresponding single-letter amino
% acid code. Note that asterisks '*' are used to denote stop codons.
C = {'uuu' 'F'
     'uuc' 'F'
     'uua' 'L'
     'uug' 'L'
     'cuu' 'L'
     'cuc' 'L'
     'cua' 'L'
     'cug' 'L'
     'auu' 'I'
     'auc' 'I'
     'aua' 'I'
     'aug' 'M'
     'guu' 'V'
     'guc' 'V'
     'gua' 'V'
     'gug' 'V'
     'ucu' 'S'
     'ucc' 'S'
     'uca' 'S'
     'ucg' 'S'
     'ccu' 'P'
     'ccc' 'P'
     'cca' 'P'
     'ccg' 'P'
     'acu' 'T'
     'acc' 'T'
     'aca' 'T'
     'acg' 'T'
     'gcu' 'A'
     'gcc' 'A'
     'gca' 'A'
     'gcg' 'A'
     'uau' 'Y'
     'uac' 'Y'
     'uaa' '*'
     'uag' '*'
     'cau' 'H'
     'cac' 'H'
     'caa' 'Q'
     'cag' 'Q'
     'aau' 'N'
     'aac' 'N'
     'aaa' 'K'
     'aag' 'K'
     'gau' 'D'
     'gac' 'D'
     'gaa' 'E'
     'gag' 'E'
     'ugu' 'C'
     'ugc' 'C'
     'uga' '*'
     'ugg' 'W'
     'cgu' 'R'
     'cgc' 'R'
     'cga' 'R'
     'cgg' 'R'
     'agu' 'S'
     'agc' 'S'
     'aga' 'R'
     'agg' 'R'
     'ggu' 'G'
     'ggc' 'G'
     'gga' 'G'
     'ggg' 'G'};

% The start codon for this transcript is at positions [60 61 62]. Double
% check that you see a start codon at those positions in the string
% variable RNA5_3 - if you do, you're probably on the right track so far!
% Note that the sequence before the start codon is still part of the mRNA
% transcript, but it's not translated. What name do we give to this part of
% the gene?

%%% The 5' untranslated region (5' UTR), also called the leader sequence.
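
% A sketch for locating candidate start codons instead of reading the
% position off by eye (demo_ names are hypothetical): strfind returns the
% index of the first letter of every 'aug'. The first AUG is often, but
% not always, the real start codon, so the result still needs checking.
demo_mRNA = 'gcaugaaug';
demo_starts = strfind(demo_mRNA, 'aug');   % [3 7]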
%%%

% Let's cut off the beginning of the sequence so we start at the start
% codon.
ORF = RNA5_3(60:end);
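
% The ORF is read three bases at a time. A sketch of that framing on a
% made-up ORF (demo_ names are hypothetical): reshape fills columns first,
% so each column holds one codon, and transposing gives one codon per row.
% Leftover bases that do not make a full codon are dropped before reshaping.
demo_ORF = 'augcccuaagc';
demo_n = 3*floor(length(demo_ORF)/3);                % length covered by complete codons
demo_codons = reshape(demo_ORF(1:demo_n), 3, []).';  % ['aug'; 'ccc'; 'uaa']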

% The while loop below reads each codon of the mRNA (ORF) and chooses the
% appropriate amino acid to be added to the synthesized polypeptide, but
% it's incomplete. Fill in the gaps to complete the loop. NOTE: if you get
% to this line of code and it takes more than a few seconds to run, there's
% probably a bug in the while loop. Use ctrl+C to stop the code and try to
% troubleshoot.
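AA_seq = NaN; % placeholder so AA_seq(end) exists for the first check; overwritten in the first iteration
AA_pos = 1;
ORF_pos = 1:3;
while ~strcmp(char(AA_seq(end)),'*') && ORF_pos(end) <= length(ORF) % continue until we reach a stop codon or run out of complete codons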

    % Use the indexing variable ORF_pos to grab the codon from ORF
    codon = ORF(ORF_pos);

    % Use the codon to identify the corresponding amino acid using
    % reference table
    row_C = strcmp(codon,C(:,1));

    % Define the amino acid using matrix 'C' and vector 'row_C'
    AA = C(row_C,2);

    % Store AA in the AA_pos element of the storage vector AA_seq. You'll
    % need to use the 'char' command here.
    AA = char(AA);
    AA_seq(AA_pos) = AA;

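    % Update the AA_pos and ORF_pos for the next codon. Think about what
    % these values should be, and check that they match your expectation
    % for a couple iterations of the while loop.
    AA_pos = AA_pos + 1;    % one amino acid per codon
    ORF_pos = ORF_pos + 3;  % advance to the next codon (3 bases)
end

% As before, convert the stored ASCII values into a character string.
AA_seq = char(AA_seq);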
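
% For comparison, a loop-free sketch of translation (demo_ names and the
% three-codon table are hypothetical, not the full table C): ismember finds
% each codon's row in the lookup table, and the protein is cut at the first
% stop codon, keeping the '*' as the while loop does. This assumes a stop
% codon is present; find would return empty otherwise.
demo_table = {'aug' 'M'; 'ccc' 'P'; 'uaa' '*'};
demo_codon_list = {'aug'; 'ccc'; 'uaa'; 'ccc'};     % note the codon after the stop
[~, demo_rows] = ismember(demo_codon_list, demo_table(:,1));
demo_AA = [demo_table{demo_rows,2}];                % 'MP*P'
demo_stop = find(demo_AA == '*', 1);                % index of the first stop
demo_AA = demo_AA(1:demo_stop);                     % 'MP*'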

%{
This code is virtually the same as the other code on my page called 'DNA Strand to
    Amino Acids', except that this version has comments to help explain what is actually
    going on, and it uses a different scanned file.
This was a homework assignment, so there might be some random questions throughout. Yeah, have fun.
%}