Enterobase update June 2017

Jun 28th, 2017

The MLST website will be closing down in the next few days or weeks. It has been running continuously since 2004, having moved with me from the
Max Planck Institute for Infection Biology in Berlin to University College Cork (UCC) in 2007, and then to the University of Warwick in 2013. The basic
software running it also dates from 2004 and cannot readily be updated to newer versions, so it cannot easily be moved to a new computer. When the
website moved to the University of Warwick, we paid an IT specialist to install it on a new UNIX cluster, but that cluster is now at risk and will
probably fail in the next few months. We are not willing to port the MLST website to another cluster, because all of its functionality and data are
available in EnteroBase. So if you still use the MLST website because you are used to it, I recommend that you stop now and move to EnteroBase.
As long as the MLST website is running, you can use your old username and password to migrate your identity to EnteroBase. Once it has closed, this
will no longer be possible and you will have to create a new identity.

There will be no further support for the MLST website. Once it closes, its pages will automatically redirect to the EnteroBase website. We have
continued to receive requests asking how to obtain new STs or allele numbers from ABI (Sanger) sequences, even though the MLST website clearly states
that this is no longer possible. I therefore reiterate that the only way to obtain new 7-gene ST numbers or allele numbers for Escherichia,
Salmonella, Yersinia or Moraxella is to upload Illumina short reads to EnteroBase.

Improvements in EnteroBase since the last updates:
1. The Salmonella databases now contain more than 100,000 entries, most of which are genome assemblies. EnteroBase has almost 1,000 users and is
already being cited by others, even though we have not yet published a peer-reviewed paper on it. However, a preprint (not yet peer reviewed) now
shows what you can do with EnteroBase for Salmonella (http://biorxiv.org/content/early/2017/02/14/105759). It provides a reasonably reliable SNP-based
tree of 50,000 representative genomes of subspecies I, a pan-genome for the deep lineage we call the Para C Lineage, and publicly accessible workspaces
in which you can examine all of the representative genomes, their relationships and the pan-genome. It also discusses how long serovar Paratyphi C
has been infecting humans, and presents a genome called Ragna, from a 20-year-old woman who died in Trondheim in 1200 CE.

2. Martin Sergeant has implemented user-defined fields. These can hold copies of experimental fields that are otherwise shown under the different
experiment types. For example, for Salmonella you could put the 7-gene ST designation in one field, the 51-gene rMLST designation in a second, the
7-gene eBG in a third, the 51-gene reBG in a fourth and the serovar predicted by SISTR in a fifth. User-defined fields appear in a third window,
between the metadata and the experimental data, and, like any other field, can be used to colour-code nodes in a tree. They are stored permanently,
and you can also create user-defined fields for your own private data that you wish to share with your designated buddies. You can associate them
with a workspace, which you can also share with your designated buddies.

3. Martin Sergeant, Zhemin Zhou and Nabil Alikhan have implemented a GUI that generates GrapeTrees using the neighbor-joining (NJ) algorithm, the
classical minimal spanning tree algorithm (MSTree) similar to that in PhyloViz, or an improved minimal spanning tree algorithm that we call MSTree V2.
MSTree V2 builds the tree with Edmonds' algorithm, a directed version of the minimal spanning tree algorithm (it finds a minimum spanning
arborescence), which allows directional distances to account for missing data correctly (https://en.wikipedia.org/wiki/Edmonds%27_algorithm).
Edmonds' algorithm is very important for core genome MLST (cgMLST), because we estimate that, on average, each Salmonella entry lacks an allele number
for one or more of the 3002 cgMLST loci, where the gene was not assembled or is disrupted or defective. The same applies to individual SNPs in a core
genome SNP comparison. As a result, tree topologies are distorted unless directional distances, from the entry with more data to the entry with less
data, are used.
GrapeTree is our name for a cluster of related bacterial strains of the kind that tends to be presented as a minimal spanning tree. Our GrapeTree GUI
is available within EnteroBase once you have created a workspace or connected to somebody else's workspace. A stand-alone version will also be
available soon: the EnteroBase version works directly with EnteroBase data, whereas you will need to provide your own data for the stand-alone version.
Anyone interested in beta-testing the GrapeTree GUI should contact Martin by email. Beta-testers who comment on the existing version(s) will also
receive advance access to a publication on this GUI.

4. We intend soon to make public an EnteroBase for Clostridioides, which already contains 6,571 genomes and a 2,556-gene cgMLST scheme. It is being
curated by Ulrich Nuebel at the DSMZ, Braunschweig. If you are working with C. difficile and want access to this database, please contact him.

5. Zhemin and Nabil will focus on the population genomics revealed by the additional 10,000 genomes we are uploading. They will no longer develop
the classical EnteroBase, because their salaries come from a Wellcome Trust grant dedicated to ancient DNA and the reconstruction of ancient history.
Instead, William Nicholson has joined the EnteroBase team and has been improving and updating the help files and documentation. He is also handling
user requests, so if you are struggling with EnteroBase or need help, consult the improved documentation or contact him.

6. We intend to continue developing EnteroBase at Warwick until September 2019. Until then, our group anticipates uploading 10,000 Salmonella
genomes, including 100 complete genomes, and a second group has the funding to upload another 10,000 Salmonella genomes. We do not intend to apply
for renewal funding for EnteroBase beyond September 2019, and our active participation will cease at that time. However, we now have verbal
agreements from a separate institution that is willing to take over responsibility for further development and maintenance once binding legal
agreements have been signed. It is still possible for other institutions to jump on board, commit to setting up an EnteroBase mirror and participate
in its development.
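Item 3 describes MSTree V2 as building trees with Edmonds' algorithm over directional distances, so that uncalled cgMLST loci do not distort the topology. The Python sketch below illustrates that general idea on toy allele profiles; it is not GrapeTree's code. The distance rule in the hypothetical `directed_distance` function (a locus missing only in the target is free, while a locus called in the target but missing in the source counts as one difference) and the choice of the most complete profile as the root are assumptions made for illustration. The tree is built with the textbook Chu-Liu/Edmonds contraction algorithm for a minimum spanning arborescence.

```python
from typing import Dict, List, Optional, Tuple

# Toy allele profile: one allele number per locus, or None when the locus
# could not be called (gene not assembled, disrupted or defective).
Profile = List[Optional[int]]
Edge = Tuple[int, int, int]  # (source node, target node, weight)


def directed_distance(src: Profile, dst: Profile) -> int:
    """Hypothetical asymmetric distance for the edge src -> dst (assumption).

    A locus missing in dst costs nothing; a locus called in dst counts 1 if
    it differs from src or is missing in src. Edges from more complete to
    less complete profiles are therefore cheaper.
    """
    diffs = 0
    for a, b in zip(src, dst):
        if b is None:
            continue  # missing in the target: not penalised
        if a is None or a != b:
            diffs += 1  # a real allele difference, or data the source lacks
    return diffs


def _find_cycle(n: int, edges: List[Edge], best: Dict[int, int],
                root: int) -> Optional[List[int]]:
    """Return the nodes of a cycle formed by the chosen parent edges, if any."""
    visited = [-1] * n  # start node of the walk that first reached each node
    for start in range(n):
        v = start
        while v != root and visited[v] == -1:
            visited[v] = start
            v = edges[best[v]][0]  # step to the chosen parent
        if v != root and visited[v] == start:  # this walk closed a loop
            cycle, u = [v], edges[best[v]][0]
            while u != v:
                cycle.append(u)
                u = edges[best[u]][0]
            return cycle
    return None


def _edmonds(n: int, edges: List[Edge], root: int) -> List[int]:
    """Chu-Liu/Edmonds: indices of edges forming a minimum arborescence."""
    best: Dict[int, int] = {}  # node -> index of its cheapest incoming edge
    for i, (u, v, w) in enumerate(edges):
        if v != root and u != v and (v not in best or w < edges[best[v]][2]):
            best[v] = i
    if len(best) != n - 1:
        raise ValueError("some node cannot be reached from the root")
    cycle = _find_cycle(n, edges, best, root)
    if cycle is None:
        return list(best.values())
    # Contract the cycle into one node c and renumber the remaining nodes.
    cyc = set(cycle)
    new_id: Dict[int, int] = {}
    for v in range(n):
        if v not in cyc:
            new_id[v] = len(new_id)
    c = len(new_id)
    for v in cyc:
        new_id[v] = c
    sub_edges: List[Edge] = []
    sub_origin: List[int] = []  # original index of each contracted edge
    for i, (u, v, w) in enumerate(edges):
        if new_id[u] == new_id[v]:
            continue  # edge inside the cycle
        if v in cyc:
            w -= edges[best[v]][2]  # extra cost of replacing v's cycle edge
        sub_edges.append((new_id[u], new_id[v], w))
        sub_origin.append(i)
    chosen = [sub_origin[j] for j in _edmonds(c + 1, sub_edges, new_id[root])]
    # Expand: the single edge entering the cycle breaks it at node `broken`.
    broken = next(edges[i][1] for i in chosen if edges[i][1] in cyc)
    chosen += [best[v] for v in cyc if v != broken]
    return chosen


def min_spanning_arborescence(n: int, edges: List[Edge], root: int) -> List[Edge]:
    """Minimum-weight directed tree rooted at `root`, sorted by target node."""
    return sorted((edges[i] for i in _edmonds(n, edges, root)), key=lambda e: e[1])


# Toy example with 5 loci; None marks an uncalled locus.
profiles: List[Profile] = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 2],
    [1, 1, None, 1, 2],
    [1, 2, 1, None, None],
]
n_nodes = len(profiles)
# Assumption: root the tree at the profile with the fewest missing loci.
root = min(range(n_nodes), key=lambda i: sum(a is None for a in profiles[i]))
edges = [(u, v, directed_distance(profiles[u], profiles[v]))
         for u in range(n_nodes) for v in range(n_nodes) if u != v]
for u, v, w in min_spanning_arborescence(n_nodes, edges, root):
    print(f"{u} -> {v} (distance {w})")
```

On this toy data the script prints `0 -> 1 (distance 1)`, `1 -> 2 (distance 0)` and `0 -> 3 (distance 1)`: profile 2 hangs off profile 1 at distance 0 because its only uncalled locus is not penalised, whereas the reverse edge 2 -> 1 would cost 1. An undirected minimal spanning tree would have to use a single symmetric distance for both directions, which is the distortion item 3 describes.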

Best wishes to all users,
Mark Achtman