/*
Write a function named dnaErrors that accepts two strings representing DNA
sequences as parameters and returns an integer representing the number of
errors found between the two sequences, using a formula described below.

DNA contains nucleotides, which are represented by four different letters
A, C, T, and G. DNA is made up of a pair of nucleotide strands, where a letter
from the first strand is paired with a corresponding letter from the second.
The letters are paired as follows:
  - A is paired with T and vice-versa.
  - C is paired with G and vice-versa.

Below are two perfectly matched DNA strands. Notice how the letters are paired
up according to the above rules.
    "GCATGGATTAATATGAGACGACTAATAGGATAGTTACAACCCTTACGTCACCGCCTTGA"
     |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
    "CGTACCTAATTATACTCTGCTGATTATCCTATCAATGTTGGGAATGCAGTGGCGGAACT"

In some cases, errors occur within DNA molecules; the task of your function is
to find two particular kinds of errors:
  - Unmatched nucleotides, in which one strand contains a dash ('-') at a given
    index, or does not contain a nucleotide at the given index (if the strings
    are not the same length). Each of these counts as 1 error.
  - Point mutations, in which a letter from one strand is matched against the
    wrong letter in the other strand. For example, A might accidentally pair
    with C, or G might pair with G. Each of these counts as 2 errors.

For example, consider these two DNA strands:
    index  01234567890123456789012
          "GGGA-GAATCTCTGGACT"
          "CTCTACTTA-AGACCGGTACAGG"

This pair of strands has three point mutations (at indexes 1, 15, and 17),
and seven unmatched nucleotides (dashes at indexes 4 and 9, and nucleotides
in the second string with no match at indexes 18-22). The point mutations
count as a total of 3 * 2 = 6 errors, and the unmatched nucleotides count as
7 * 1 = 7 errors, so your function would return an error count of 6 + 7 = 13
total errors if passed the two above strands.

You may assume that each string consists purely of the characters A, C, T, G,
and - (the dash character), but the letters could appear in either upper or
lowercase. The strings might be the same length, or the first or second might
be longer than the other. Either string could be very long, very short, or
even the empty string. If the strings match perfectly with no errors as
defined above, your function should return 0.
*/
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>
using namespace std;

// True if c is one of the letters A, C, G, T in either case (i.e. not a dash).
bool isNucleotide(char c)
{
    char u = static_cast<char>(toupper(static_cast<unsigned char>(c)));
    return u == 'A' || u == 'C' || u == 'G' || u == 'T';
}

// True if a and b form a valid pair (A-T or C-G, in either order), ignoring case.
bool isPair(char a, char b)
{
    a = static_cast<char>(toupper(static_cast<unsigned char>(a)));
    b = static_cast<char>(toupper(static_cast<unsigned char>(b)));
    return (a == 'A' && b == 'T') || (a == 'T' && b == 'A') ||
           (a == 'C' && b == 'G') || (a == 'G' && b == 'C');
}

// Returns 1 error per unmatched nucleotide and 2 errors per point mutation.
// Works whichever string is longer, and for empty strings.
int dnaErrors(const string &dna1, const string &dna2)
{
    size_t shorter = min(dna1.size(), dna2.size());
    size_t longer = max(dna1.size(), dna2.size());

    // Characters past the end of the shorter strand have no partner.
    int unmatched = static_cast<int>(longer - shorter);
    int mutations = 0;
    for (size_t i = 0; i < shorter; i++)
    {
        bool n1 = isNucleotide(dna1[i]);
        bool n2 = isNucleotide(dna2[i]);
        if (!n1)
            unmatched++;          // dash in the first strand
        if (!n2)
            unmatched++;          // dash in the second strand
        if (n1 && n2 && !isPair(dna1[i], dna2[i]))
            mutations++;          // two letters that do not pair
    }
    return unmatched + mutations * 2;
}

int main()
{
    string dna1, dna2;
    getline(cin, dna1);
    getline(cin, dna2);
    // dnaErrors accepts the strands in either order, so no swap is needed.
    cout << "Total Errors: " << dnaErrors(dna1, dna2) << endl;
    return 0;
}