Untitled

##Alphabet α##

Lets consider the most general case of IUPAC encoding. Our alphabet α is:

*α = { T, K, H, Y, G, C, W, V, A, D, S, M, B, R, N }*

symbol|reverse complement|options|full name
------|------------------|-------|---------|
**T**|A|T|Thymine
**K**|M|G, T|Keto
**H**|D|A, C, T|Not G
**Y**|R|T, C|Pyrimidine
**G**|C|G|Guanine
**C**|G|C|Cytosine
**W**|W|A, T|Weak bonds
**V**|B|G, C, A|Not T
**A**|T|A|Adenine
**D**|H|G, A, T|Not C
**S**|S|G, C|Strong bonds
**M**|K|A, C|Amino
**B**|V|G, T, C|Not A
**R**|Y|G, A|Purine
**N**|N|A, G, C, T|Any

##Fragment ε##
A fragment **ε** is defined as the triplet *ε = { n, o, l }* where:
 * **n** is the nibble number
 * **o** is the offset in nibble **n** to the start of the fragment
 * **l** is the length of the fragment in nucleotides

##Barcode set β##
Each **b** in the barcode set **β** is a pair *{ s, t }* where:
   * **s** is a word over the alphabet **α** of some length **l**
   * **t** is an ordered set of fragments who's total concatenated length is **l**

## Read and Nibble ##
Each read **r** in R is a set of nibbles. Each nibble is a nucleotide sequence with corresponding Phred quality scores.

##Quality scores##
Lets assume those are encoded in the *Illumina 1.8 Phred+33* so the value is encoded in ASCII.
To get the *Phred* score we first get the ordinal of the character and than remove 33 from it.

*Phred* is *-10 * log base 10 of p*, where *p* is the probability of an error.

To get *p* we take *10 ^ -(Phred / 10)*.

for instance:
```
Ordinal('+') = 43, Phred = 43 - 33 = 10, p = 10 ^ -1 = 0.1
Ordinal('5') = 53, Phred = 53 - 33 = 20, p = 10 ^ -2 = 0.01
Ordinal(';') = 53, Phred = 59 - 33 = 26, p = 10 ^ -2.6 = 0.00251188643151
Ordinal('A') = 65, Phred = 65 - 33 = 32, p = 10 ^ -3.2 = 0.00063095734448
```

For each read **r** in **R** we can calculate the word for each barcode.
So we get the vector **Br** which is a list of words over **α** of length **card(β)**,
and the vector **Qr** which is a list of corresponding quality scores, or probabilities of error for each base in each element of the vector **Br**.

##Score##
For each read and each barcode we calculate the score of the barcode for that read.