Infection of biological DNA with digital Computer Code

a guest Dec 15th, 2013 11,248 Never

            ************************************************************
                 Infection of biological DNA with digital Computer Code
                                        by Second Part To Hell
            ************************************************************

     Biological life spreads in the biological-chemical world; computer code
     spreads in the digital computer world. So far this has been a rule: no
     self-replicator has ever overcome the digital-biological barrier. Until
     today. Here I show a method by which digital computer code can infect
     biological DNA and thus spread in the biological-chemical world. The
     method is mainly based on the fantastic research by the J. Craig Venter
     Institute on synthetic life, and might raise new questions about the
     definition of life itself.


  1) Introduction

  2) Craig Venter's synthetic cell
     2.1) General concept
     2.2) Watermarks and DNA encoding language

  3) FASTA file format
     3.1) plain fasta-files
     3.2) xml fasta-files

  4) Infection scenario
     4.1) Stealth technique in the DNA

  5) Conclusion



  1) Introduction

     In 2010, the J. Craig Venter Institute (JCVI) reported the creation of a
     bacterial cell with a chemically synthesized genome [1]. They sequenced
     the DNA of a bacterium (M.mycoides), modified several parts of it on the
     computer, synthesized the new genome and transplanted it into the cell
     of a different bacterium (M.capricolum). They observed that the cell was
     controlled only by the new DNA. For verification, they added artificial
     "watermark" sequences (non-coding parts of the DNA) to the genome, which
     contain, among other things, the names of the scientists involved
     (written in a specially designed DNA encoding alphabet). The artificially
     created genome was capable of continuous self-replication. They call
     their new artificial bacterium Mycoplasma mycoides JCVI-syn1.0.

     This is in my opinion one of the greatest scientific achievements of
     recent years.

     In this text I explain the implementation of a computer code that makes
     the step from the digital to the biological world.
     The computer code, written in C++, carries the DNA sequence of
     M.mycoides JCVI-syn1.0. At runtime it acts as follows:

     1) It prepares the DNA sequence of M.mycoides JCVI-syn1.0 in memory
        (with slightly modified watermarks).
     2) It encodes its own file content in base32. The base32 text is then
        encoded in JCVI's DNA encoding alphabet.
     3) This representation of its digital form is copied into a watermark
        of the bacterial genome in memory. The result is a fully functional
        bacterial DNA sequence that includes the digital code.
     4) It searches the computer for FASTA files, which are text-based
        representations of DNA sequences commonly used by many DNA sequence
        libraries.
     5) In each FASTA file, it replaces the original DNA with the bacterial
        DNA containing the digital form of the computer code.

     The code has a classical self-replication mechanism as well, so that it
     may eventually end up on a computer in a microbiology laboratory that is
     able to create DNA from digital genomes (such as the laboratories of the
     JCVI).

     If the scientists are incautious, the computer code's genome (instead of
     the intended original DNA) might be written to the biological cell.
     The new cell will start replicating in the biological world, and with it
     the representation of the digital computer code.



  2) Craig Venter's synthetic cell

     2.1) General concept

          The team of Craig Venter has demonstrated how to create bacteria
          controlled by artificially designed and synthesized DNA. For that,
          they used the sequenced DNA of the bacterium M.mycoides, whose
          genome is about 1 mega-base pair long. They modified the genome on
          the computer: they deactivated several genes and introduced
          watermarks (artificial non-coding parts of the DNA). A company
          called Blue Heron synthesized 1000 bp fragments of the full DNA.
          With a three-step procedure, the team assembled these fragments
          into the full DNA. This was transplanted into a recipient cell of
          the bacterium M.capricolum.

          Amazingly, the cell with the new genome booted up and was able to
          self-replicate. To verify that the expected genome was replicating,
          they gave the watermarks special features that can be detected
          with chemical methods.

          In their article [1] they write:
              "This work provides a proof of principle for producing cells
               based on computer-designed genome sequences. DNA sequencing
               of a cellular genome allows storage of the genetic instructions
               for life as a digital file."

          The project described here uses the method of their
          proof of principle.


     2.2) Watermarks and DNA encoding language

          The watermarks are parts of the genome that are not translated into
          functional proteins. That means: they are part of the DNA, but have
          no functional effect on the behaviour of the cell.

          The watermarks are written in the nucleotides A, C, G, T. JCVI
          developed an encoding from DNA to human-readable characters: three
          nucleotides (one codon) represent one letter or ASCII symbol. With
          this encoding method, they wrote readable information into the
          cell: it contains the names of the scientists involved,
          philosophical quotes and a piece of HTML code with an e-mail
          address.

          The mapping from codons to characters has never been documented
          explicitly, but can be deduced mainly from the implicit information
          given in the article. The known alphabet looks like this:

          TAG = a        GCA = k        TCC = u        AGA = 4        CAC = /
          AGT = b        AAC = l        TTG = v        GCG = 5        CCA = =
          TTT = c        CAA = m        GTC = w        GCC = 6        CGA = .
          ATT = d        TGC = n        GGT = x        TAT = 7        GAG = !
          TAA = e        CGT = o        CAT = y        CGC = 8        CAG = :
          GGC = f        ACA = p        TGG = z        GTA = 9        GGA = "
          TAC = g        TTA = q        TCT = 0        ATA = space    GTG = ,
          TCA = h        CTA = r        CTT = 1        GGG = chr(10)  TCG = @
          CTG = i        GCT = s        ACT = 2        AGC = >        CCC = -
          GTT = j        TGA = t        AAT = 3        CGG = <

          Four watermarks have been introduced to the modified bacterial DNA
          on the computer.

          As an example, a part of the DNA sequence of one watermark is:

              GCTTAATAAATATGATCACTGTGCTACGCTATATGCCGTTGAATATAGGCTATATGATC
              ATAACATATATAGCTATAAGTGATAAGTTCCTGAATATAGGCTATATGATCATAACATA
              TACAACTGTACTCATGAATAAGTTAACGA

          The sequence is divided into three-nucleotide parts (codons):

              GCT TAA TAA ATA TGA TCA CTG TGC TAC GCT ATA TGC CGT TGA ATA
              TAG GCT ATA TGA TCA TAA CAT ATA TAG CTA TAA GTG ATA AGT TCC
              TGA ATA TAG GCT ATA TGA TCA TAA CAT ATA CAA CTG TAC TCA TGA
              ATA AGT TAA CGA

          We can see from the alphabet above that GCT stands for "s", TAA
          stands for "e", ATA is a space, TGA stands for "t" ... and so on.

          In the end we can extract the sentence:
                 "see things not as they are, but as they might be."

          Of course, we can also write our own text in this encoding:

          "hello vxers!" ->
                TCA TAA AAC AAC CGT ATA TTG GGT TAA CTA GCT GAG

          The full structure of the alphabet is not known, therefore only 49
          of the 64 codons are listed here. However, all 64 codons are used
          in the watermarks (i.e. there is no biological reason for not
          using specific codons).
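
          The decoding walked through above can be sketched in a few lines
          of Python. This is only an illustration and not part of the code
          described in this text: the table is the 49-codon alphabet listed
          above, the names CODON_TO_CHAR, decode_watermark and
          unassigned_codons are hypothetical, and showing unknown codons as
          "?" is an assumption made for the sketch.

```python
from itertools import product

# The 49 known codon -> character assignments from the table above.
CODON_TO_CHAR = {
    "TAG": "a", "AGT": "b", "TTT": "c", "ATT": "d", "TAA": "e",
    "GGC": "f", "TAC": "g", "TCA": "h", "CTG": "i", "GTT": "j",
    "GCA": "k", "AAC": "l", "CAA": "m", "TGC": "n", "CGT": "o",
    "ACA": "p", "TTA": "q", "CTA": "r", "GCT": "s", "TGA": "t",
    "TCC": "u", "TTG": "v", "GTC": "w", "GGT": "x", "CAT": "y",
    "TGG": "z", "TCT": "0", "CTT": "1", "ACT": "2", "AAT": "3",
    "AGA": "4", "GCG": "5", "GCC": "6", "TAT": "7", "CGC": "8",
    "GTA": "9", "ATA": " ", "GGG": "\n", "AGC": ">", "CGG": "<",
    "CAC": "/", "CCA": "=", "CGA": ".", "GAG": "!", "CAG": ":",
    "GGA": '"', "GTG": ",", "TCG": "@", "CCC": "-",
}


def decode_watermark(dna: str) -> str:
    """Decode a nucleotide string codon by codon; whitespace is ignored."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    codons = (dna[i:i + 3] for i in range(0, len(dna), 3))
    # Codons without a known assignment are rendered as '?'.
    return "".join(CODON_TO_CHAR.get(codon, "?") for codon in codons)


def unassigned_codons() -> list:
    """Return the codons (out of all 4**3 = 64) that the table does not cover."""
    all_codons = ("".join(triple) for triple in product("ACGT", repeat=3))
    return sorted(codon for codon in all_codons if codon not in CODON_TO_CHAR)
```

          Called on the watermark fragment shown above, decode_watermark
          gives back the quoted sentence, and unassigned_codons lists the
          codons whose meaning is still unknown.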



  3) FASTA file format

     FASTA files are text-based representations of nucleotide sequences,
     commonly used in microbiology sequence libraries. There are two FASTA
     file types that I will describe here. The first one is the plain FASTA
     format (these files usually have the file extension .fasta or .fas);
     the second one wraps the same sequence in a small XML file.
     Both are available from the genome database
     http://www.ncbi.nlm.nih.gov/.

     For example, if you want to see the DNA of Mycoplasma mycoides JCVI-syn1.0:
         http://www.ncbi.nlm.nih.gov/nuccore/296455217

     or something more common, E.coli:
         http://www.ncbi.nlm.nih.gov/nuccore/BA000007.2


     3.1) plain fasta-files

          The plain FASTA files have a small header line, followed by a
          plain representation of the DNA in terms of the nucleotide bases
          (A, T, G, C). Two examples:

          a) Mycoplasma mycoides JCVI-syn1.0
          This is about 1MB of data

- - - - - - - - Mycoplasma mycoides JCVI-syn1.0.fasta - - - - - - - -
>gi|296455217|gb|CP002027.1| Synthetic Mycoplasma mycoides JCVI-syn1.0 clone sMmYCp235-1, complete sequence
ATGAACGTAAACGATATTTTAAAAGAACTTAAACTAAGTTTAATGGCTAATAAAAATATTGATGAATCCG
TGTATAACGACTATATAAAGACAATAAATATTCATAAAAAGGGGTTTTCTGATTATATTGTTGTTGTTAA
ATCACAATTTGGTTTGTTAGCTATAAAACAGTTTCGTCAAACTATTGAAAATGAGATAAAAAATATTTTA
AAAGAACCTGTAAATATTAGTTTTACATACGAACAAGAATATAAAAAACAACTAGAAAAAGATGAATTAA
TTAATAAAGATCATTCTGATATCATTACTAAAAAAGTTAAAAAAACTAATGAAAACACTTTTGAAAATTT
...
- - - - - - - - Mycoplasma mycoides JCVI-syn1.0.fasta - - - - - - - -

          b) Escherichia coli
          This is about 5.5MB of data

- - - - - - - - - - - - - - - E.coli.fasta - - - - - - - - - - - - - - -
>gi|47118301|dbj|BA000007.2| Escherichia coli O157:H7 str. Sakai DNA, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCACCACCATCACCATTACCATTACCACAGGTAACGGTGCGGGCTGACGCGTAC
AGGAAACACAGAAAAAAGCCCGCACCTGACAGTGCGGGCTTTTTTTTCGACCAAAGGTAACGAGGTAACA
...
- - - - - - - - - - - - - - - E.coli.fasta - - - - - - - - - - - - - - -
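
          To make the layout of the plain format concrete, here is a minimal
          Python reader for files like the two above. It is a sketch under
          assumptions, not a reference parser: the names read_fasta and
          split_header are hypothetical, and the header split assumes the
          "gi|<number>|<database>|<accession>| <description>" pattern seen
          in the two examples.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a plain FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over many lines; join them.
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def split_header(header: str) -> Dict[str, str]:
    """Split an NCBI-style header as seen in the examples (assumed layout)."""
    fields = header.split("|", 4)
    if len(fields) == 5 and fields[0] == "gi":
        return {
            "gi": fields[1],
            "database": fields[2],
            "accession": fields[3],
            "description": fields[4].strip(),
        }
    # Anything else is kept as a free-text description.
    return {"description": header.strip()}


if __name__ == "__main__":
    sample = io.StringIO(
        ">gi|296455217|gb|CP002027.1| Synthetic Mycoplasma mycoides JCVI-syn1.0\n"
        "ATGAACGTAAACGATATTTTAAAAGAACTT\n"
        "AAACTAAGTTTAATGGCTAATAAAAATATT\n"
    )
    for head, seq in read_fasta(sample):
        print(split_header(head)["accession"], len(seq))
```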



     3.2) xml fasta-files

          The second form contains the pure DNA as well, but wrapped in a
          small XML file. Two examples again:

- - - - - - - - Mycoplasma mycoides JCVI-syn1.0.fasta.xml - - - - - - - -
<?xml version="1.0"?>
 <!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
 <TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>296455217</TSeq_gi>
  <TSeq_accver>CP002027.1</TSeq_accver>
  <TSeq_taxid>766747</TSeq_taxid>
  <TSeq_orgname>synthetic Mycoplasma mycoides JCVI-syn1.0</TSeq_orgname>
  <TSeq_defline>Synthetic Mycoplasma mycoides JCVI-syn1.0 clone sMmYCp235-1, complete sequence</TSeq_defline>
  <TSeq_length>1078809</TSeq_length>
  <TSeq_sequence>ATGAACGTAAACGATATTTTAAAAGAACTTAAACTAAGTTTAATGGCTAATAAAAATATTGATGAATCCGTGTATAACGACTATATAAAGACAATAAATATTCATAAAAAGGGGTTTTCTGATTATATTGTTGTTGTTAAATCA...</TSeq_sequence>
</TSeq>

</TSeqSet>
- - - - - - - - Mycoplasma mycoides JCVI-syn1.0.fasta.xml - - - - - - - -

     or E.coli again:

- - - - - - - - - - - - - - - E.coli.fasta.xml - - - - - - - - - - - - - - -
<?xml version="1.0"?>
 <!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_TSeq.dtd">
 <TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>47118301</TSeq_gi>
  <TSeq_accver>BA000007.2</TSeq_accver>
  <TSeq_taxid>386585</TSeq_taxid>
  <TSeq_orgname>Escherichia coli O157:H7 str. Sakai</TSeq_orgname>
  <TSeq_defline>Escherichia coli O157:H7 str. Sakai DNA, complete genome</TSeq_defline>
  <TSeq_length>5498450</TSeq_length>
  <TSeq_sequence>AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTCTCTGACAGCAGCTTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAATATA...</TSeq_sequence>
</TSeq>

</TSeqSet>
- - - - - - - - - - - - - - - E.coli.fasta.xml - - - - - - - - - - - - - - -
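
          The XML form can be read with Python's standard library. The
          sketch below is an illustration only: the function names read_tseq
          and length_matches are hypothetical, and it relies on nothing but
          the element names visible in the two examples (TSeq, TSeq_accver,
          TSeq_orgname, TSeq_length, TSeq_sequence). Comparing the declared
          TSeq_length with the actual sequence length is a simple sanity
          check that such a file offers for free.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def read_tseq(xml_text: str) -> List[Dict[str, object]]:
    """Return one dictionary per <TSeq> record found in a TSeqSet document."""
    root = ET.fromstring(xml_text)
    records = []
    for tseq in root.iter("TSeq"):
        records.append({
            "accession": tseq.findtext("TSeq_accver", default=""),
            "organism": tseq.findtext("TSeq_orgname", default=""),
            # Length as declared by the file, 0 if the element is missing.
            "declared_length": int(tseq.findtext("TSeq_length", default="0")),
            "sequence": tseq.findtext("TSeq_sequence", default="").strip(),
        })
    return records


def length_matches(record: Dict[str, object]) -> bool:
    """True if the declared length equals the number of stored nucleotides."""
    return record["declared_length"] == len(record["sequence"])
```

          Note that the excerpts above would fail length_matches, because
          their sequences are shortened with "..." for display.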



  4) Infection scenario

     The strategy of this digitally and biologically self-replicating code is
     the following:

     It starts as a digital computer file and replicates itself via local
     networks, USB sticks and other removable devices.

     There are two potential scenarios for the step from the digital to the
     biological world:

     1) The self-replicating code might end up on the USB stick of a
        microbiologist. (S)he runs it unintentionally on a computer that
        hosts DNA sequences (stored in the common FASTA file format) which
        are to be synthesized and transplanted into receiving cells (such as
        in the labs of the JCVI). The computer code will find these FASTA
        files and replace their DNA sequences with the bacterial genome of
        M.mycoides. This genome contains a watermark with the DNA
        representation of the file content of the computer code. When the
        DNA files are synthesized, the computer code is synthesized as well,
        and will continuously self-replicate in the biological world in the
        form of a bacterium.

     2) In this scenario, the code gets onto a computer of a genome library
        (such as NCBI, the National Center for Biotechnology Information).
        The computer code will search for FASTA files and replace their DNA
        content with its own DNA code. An employee will unintentionally
        upload the computer code's DNA instead of the original DNA.
        Then - back in a laboratory like that of the JCVI - scientists will
        download the modified DNA sequence. When they synthesize the wrong
        DNA sequence, the computer code lands in a bacterial cell again,
        again capable of continuous self-replication in the biological
        world.

     There is a different, interesting scenario: Mycoplasma mycoides
     bacteria usually infect cattle and goats. Imagine an unexplained
     outbreak of the bacterium presented here. Goats or cattle get sick,
     and microbiologists want to know the exact reason. They take samples of
     the infectious cells and sequence them in their laboratories.
     Now they see the DNA and find that the bacterium contains a rather
     big non-coding sequence - the watermark. They find this very unnatural
     and analyse the watermark, also by applying Craig Venter's DNA encoding
     alphabet (because it is very famous due to JCVI's first fascinating
     results). After decoding, they see that the text only contains the
     characters
     a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,2,3,4,5,6,7

     This is a curious structure; they research a bit and see that it is
     base32 encoding. They decode it and see 'M','Z',0x90,0x0,...
     They immediately see that it is a Windows executable, and I guess they
     would be surprised :)
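
     The analysts' last two steps can be sketched with Python's standard
     library. This is a hedged illustration of the analysis side only: the
     function names are hypothetical, the padding handling assumes that the
     trailing "=" characters may be missing from the decoded text, and
     checking for the two bytes "MZ" is merely a first hint that the data is
     a Windows executable, not a proof.

```python
import base64

# The 32 characters the analysts observe: a-z and 2-7 (lower-case base32).
BASE32_CHARS = set("abcdefghijklmnopqrstuvwxyz234567")


def looks_like_base32(text: str) -> bool:
    """True if the text uses only base32 characters (padding ignored)."""
    body = "".join(text.split()).rstrip("=").lower()
    return bool(body) and set(body) <= BASE32_CHARS


def decode_base32(text: str) -> bytes:
    """Decode lower- or upper-case base32, restoring missing '=' padding."""
    body = "".join(text.split()).rstrip("=")
    padded = body + "=" * (-len(body) % 8)
    return base64.b32decode(padded, casefold=True)


def starts_with_mz(data: bytes) -> bool:
    """True if the data begins with the 'MZ' magic of DOS/Windows executables."""
    return data[:2] == b"MZ"
```

     For a decoded watermark text, looks_like_base32 answers the "curious
     structure" question, and starts_with_mz applied to the output of
     decode_base32 gives the first hint about what kind of file it is.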



     4.1) Stealth technique in the DNA

          In their modified genome of M.mycoides JCVI-syn1.0, the JCVI team
          introduced four watermarks. Every watermark contains a special
          sequence which is used to test whether a cell has the intended
          genome or is a contamination (for example, from the receiving
          cell).

          In the supplementary material of their article [1], they describe
          the exact representation of these sequences (primers). Each of the
          four watermarks contains one primer. When they perform a multiplex
          PCR, each watermark produces a specific characteristic.

          In my code, I removed the entire original content of all
          watermarks, except for the identified primer sequences. As a
          result, when a team tests the bacterial cell with the
          representation of the digital code, it will show the same
          characteristic as their originally designed DNA.
          Thus the computer code's DNA will pass this test.



  5) Conclusion

     I've shown the implementation of a technique that allows a digital
     computer code to make the step to the biological world. This is done by
     infecting a DNA file with the genome of a self-replicating bacterium.
     The bacterium's genome contains the digital code of the self-replicator
     in the form of a base32 representation encoded via Craig Venter's DNA
     encoding alphabet.
     The bacterium will self-replicate in the biological world, and so will
     the representation of the digital computer code.

     The outbreak probability of such cross-domain infectors is very low. The
     researchers in [1] have made ethical studies, and I'm convinced that they
     came up with perfect protections against potential attacks such as this.

     Finally, digital self-replicators are usually not considered a form of
     life, even though they fulfill the most important characteristics of
     life: the capability of self-replication and being subject to
     evolution [2].
     I wonder whether this computer code can count as a form of life - if so,
     I would call it

                             Mycoplasma mycoides SPTH-syn1.0
                                                                :)



                                                           Second Part To Hell
                                                                 October 2013

                                                          http://spth.virii.lu/
                                                          sperl.thomas@gmail.com
                                                          twitter: @SPTHvx


[1] Daniel G. Gibson et al., "Creation of a Bacterial Cell Controlled by a
    Chemically Synthesized Genome", Science 329, 52 (2010).

[2] SPTH, "Taking the redpill: Artificial Evolution in native x86 systems",
    http://vxheaven.org/lib/vsp26.html (2010).
    SPTH, "Imitation of Life: Advanced system for native Artificial
    Evolution", in valhalla#1, http://vxheaven.org/lib/vsp37.html (2011).


PS: Thanks to hh86 for motivation. Thanks to the JCVI team for their awesome
    research; I am looking forward to reading about more discoveries on the
    border between dead and living material!


PPS: I'm not a microbiologist (or a biologist at all). Even though I tried as
     hard as possible, I cannot rule out that some assumptions are wrong and
     that I have misunderstood some things.
     In any case, the main idea should be valid.