The MLST website will be closing down in the next few days or weeks. It has been running continuously since 2004, having moved with me from the Max Planck Institute for Infection Biology in Berlin to UCC, Cork in 2007, and then to the University of Warwick in 2013. The basic software running it also dates from 2004 and cannot readily be updated to newer versions, so it is not readily portable to a new computer. When it moved to the University of Warwick, we paid an IT specialist to install it on a new UNIX cluster, but that cluster is now endangered and will probably die in the next few months. We are not willing to port MLST to another cluster because all the functionality and data of the MLST website are available in EnteroBase. So for those of you who continue to use the MLST website because you are used to it, I recommend that you forget it now and move to EnteroBase. As long as the MLST website is running, you can use your old username and password to migrate your identity to EnteroBase. Once it has closed, this will no longer be possible and you will have to create a new identity.
There will be no further support for the MLST website. Once it closes, its pages will redirect automatically to the EnteroBase website. We have continued to receive requests asking how to obtain new STs or allele numbers from ABI sequences, even though the MLST website clearly states that this is no longer possible. I hereby reiterate that the only way to obtain new 7-gene ST numbers or allele numbers for Escherichia, Salmonella, Yersinia or Moraxella is by uploading Illumina short reads to EnteroBase.
Improvements in EnteroBase since the last updates:
1. The Salmonella databases now contain >100,000 entries, most of which consist of genomic assemblies. EnteroBase has almost 1,000 users and is being cited by others even though we have yet to publish a peer-reviewed paper. However, an unreviewed preprint now exists which shows what you can do with EnteroBase for Salmonella (http://biorxiv.org/content/early/2017/02/14/105759). It presents a reasonably reliable SNP-based tree of 50,000 representative genomes of subspecies I, a pan-genome for the deep lineage we call the Para C Lineage, and publicly accessible workspaces where you can look at all the representative genomes, their relationships and the pan-genome. It also discusses how long serovar Paratyphi C has been infecting humans, and provides a genome called Ragna from a 20-year-old woman who died in Trondheim in 1200 CE.
2. Martin Sergeant has implemented user-defined fields. These can hold copies of the experimental fields that are visible under the different experiment types. For example, for Salmonella you could have the 7-gene ST designation in one field, the 51-gene rMLST designation in a second, the 7-gene eBG in a third, the 51-gene reBG in a fourth and the serovar predicted by SISTR in a fifth. User-defined fields appear in a third window between the metadata and the experimental data and, like any other field, can be used to colour-code nodes in a tree. They are stored permanently, and you can also add user-defined fields for your own private data that you wish to share with your designated buddies. You can associate them with a workspace, which you can also share with your designated buddies.
3. Martin Sergeant, Zhemin Zhou and Nabil Alikhan have implemented a GUI which generates GrapeTrees using the neighbor-joining (NJ) algorithm, the classical minimal spanning tree algorithm (MSTree) similar to PhyloViz, or an improved minimal spanning tree algorithm which we call MSTree V2. MSTree V2 builds its tree with Edmonds' algorithm, a directed version of the minimal spanning tree algorithm that can use directional distances and therefore accounts for missing data correctly (https://en.wikipedia.org/wiki/Edmonds%27_algorithm).
Edmonds' algorithm is very important for core genome MLST (cgMLST) because we estimate that, on average, each entry lacks an allele number for one or more of the 3002 cgMLST loci in Salmonella, owing to genes that were not assembled or are disrupted or defective. The same applies to individual SNPs in a core genome SNP comparison. As a result, tree topologies are distorted unless directional distances, from profiles with more data to profiles with less data, are used.
GrapeTree is our name for the cluster of related bacterial strains that tends to be presented in minimal spanning trees. Our GrapeTree GUI is available within EnteroBase once you have created a workspace or connected to somebody else's workspace. It will also soon be available as a stand-alone version. The EnteroBase version interacts directly with EnteroBase data, whereas you need to provide your own data for the stand-alone version. People who are interested in becoming beta-testers of the GrapeTree GUI should get in touch with Martin by email. Beta-testers who comment on the existing version(s) will also receive advance access to a publication on this GUI.
4. We intend to make an EnteroBase for Clostridioides public soon. It already contains 6571 genomes and a 2556-gene cgMLST scheme, and is being curated by Ulrich Nuebel at the DSMZ, Braunschweig. If you are working with C. difficile and want access to this database, please contact him.
5. Zhemin and Nabil will focus on the population genomics revealed by the additional 10,000 genomes we are uploading. They will no longer continue to develop classical EnteroBase, because their salaries come from a Wellcome Trust grant dedicated to ancient DNA and reconstructing ancient history. Instead, William Nicholson has joined the EnteroBase team and has been improving and updating the help files and documentation. He is also handling user requests, so if you are struggling with EnteroBase or need help, look at the improved documentation or contact him.
6. We intend to continue developing EnteroBase at Warwick until September 2019. Until then, our group anticipates uploading 10,000 Salmonella genomes, including 100 completed genomes, and a second group has the funding and intends to upload another 10,000 Salmonella genomes. We do not intend to apply for renewal funding for EnteroBase beyond September 2019, and our active participation will cease at that time. However, we now have verbal agreements from another institution that is willing to take over responsibility for further development and maintenance once binding legal agreements have been signed. It is still possible for other institutions to jump on board, commit to setting up an EnteroBase mirror and participate in its development.
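Item 3 above explains that MSTree V2 combines directional distances with Edmonds' algorithm, so that missing loci do not distort the tree. The following is a minimal Python sketch of that idea. It is not GrapeTree's code. The distance rule is an assumption made for illustration: loci missing in the target are ignored, while loci missing only in the source count as a difference. Also assumed are the `MISSING` marker, all function names, and the choice to root the tree at the most complete profile. The tree itself is built with the textbook Chu-Liu/Edmonds contraction algorithm for a minimum spanning arborescence.

```python
# A minimal, hypothetical sketch of directed-MST tree building on allele
# profiles with missing data. Not GrapeTree's code: the distance rule and
# the choice of root are assumptions made for illustration.

MISSING = None  # hypothetical marker for a locus with no allele assigned


def directed_distance(src, dst):
    """Assumed asymmetric distance from allele profile src to profile dst.

    Loci missing in dst are ignored; loci missing only in src count as a
    difference, so edges from more complete to less complete profiles are
    cheaper than edges in the opposite direction.
    """
    d = 0
    for a, b in zip(src, dst):
        if b is MISSING:
            continue  # no penalty for data the target simply lacks
        if a is MISSING or a != b:
            d += 1  # differing allele, or an allele src cannot account for
    return d


def _find_cycle(parent, root):
    """Return the nodes of one cycle in a {child: parent} map, or None."""
    for start in parent:
        seen, v = [], start
        while v != root and v not in seen:
            seen.append(v)
            v = parent[v]
        if v != root:  # the walk revisited a node, so it closed a cycle
            return seen[seen.index(v):]
    return None


def edmonds(nodes, edges, root):
    """Chu-Liu/Edmonds minimum spanning arborescence.

    nodes: list of int ids; edges: {(u, v): weight} for directed edges u -> v.
    Returns {child: parent} for every node except root.
    """
    # 1. Pick the cheapest incoming edge for every non-root node.
    best_in = {}
    for (u, v), w in edges.items():
        if v != root and u != v and (v not in best_in or w < edges[(best_in[v], v)]):
            best_in[v] = u
    cycle = _find_cycle(best_in, root)
    if cycle is None:
        return best_in  # the chosen edges already form an arborescence
    # 2. Contract the cycle into a new node c, reweighting edges that enter it.
    c, on_cycle = max(nodes) + 1, set(cycle)
    new_edges, origin = {}, {}
    for (u, v), w in edges.items():
        if u in on_cycle and v in on_cycle:
            continue
        if v in on_cycle:
            key, nw = (u, c), w - edges[(best_in[v], v)]
        elif u in on_cycle:
            key, nw = (c, v), w
        else:
            key, nw = (u, v), w
        if key not in new_edges or nw < new_edges[key]:
            new_edges[key], origin[key] = nw, (u, v)
    sub = edmonds([x for x in nodes if x not in on_cycle] + [c], new_edges, root)
    # 3. Expand: map contracted edges back to the original ones, then keep the
    #    cycle's own edges for every cycle node except the one entered from outside.
    parent = {}
    for v, u in sub.items():
        orig_u, orig_v = origin[(u, v)]
        parent[orig_v] = orig_u
    for v in cycle:
        parent.setdefault(v, best_in[v])
    return parent


def directed_mstree(profiles):
    """Root at the most complete profile (an assumption) and run Edmonds."""
    n = len(profiles)
    root = min(range(n), key=lambda i: sum(a is MISSING for a in profiles[i]))
    edges = {(i, j): directed_distance(profiles[i], profiles[j])
             for i in range(n) for j in range(n) if i != j}
    return root, edmonds(list(range(n)), edges, root)


if __name__ == "__main__":
    toy = [[1, 1, 1, 1], [1, 1, 1, 2], [1, 1, MISSING, 2]]
    print(directed_mstree(toy))  # → (0, {1: 0, 2: 1})
```

In the toy run, profile 2 lacks its third allele. It hangs off profile 1 at distance 0, whereas the reverse edge from profile 2 to profile 1 would cost 1, so the incomplete profile is not chosen as an ancestor of the complete ones. This simple recursive version can contract up to one cycle per node and rescans every edge each time, so it is only meant for small examples.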
Best wishes to all users,
Mark Achtman