May 21st, 2018
Get the summary file for NCBI RefSeq genomes.

wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt

Grab the `ftp_path` field (and/or any other fields you need) from this file for each genome. The `ftp_path` field is column 20; the `!/^#/` pattern skips the header lines, which start with `#`.

awk -F '\t' '!/^#/ {print $20}' assembly_summary_refseq.txt > ftp_paths

You should get something like this:

ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/599/545/GCF_000599545.1_ASM59954v1
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/599/565/GCF_000599565.1_TruePyoMS2391.0
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/599/605/GCF_000599605.1_HJM029

From each FTP path directory you should be able to get the `*genomic.gff.gz` file for that genome.
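
The last step can be sketched as a small shell helper that turns each directory path into the URL of its GFF file. This is only a sketch: it assumes the file inside each directory is named after the directory itself with `_genomic.gff.gz` appended, and the function name `build_gff_url` and the output file `gff_urls` are made up for illustration.

```shell
#!/bin/sh
# Print the GFF URL for one assembly directory.
# Assumption: the file is named <directory name>_genomic.gff.gz.
build_gff_url() {
    dir=${1%/}    # drop a trailing slash, if any
    printf '%s/%s_genomic.gff.gz\n' "$dir" "${dir##*/}"
}

# Convert every path in ftp_paths (if the file exists) into a URL list.
# gff_urls is a hypothetical output file name.
if [ -f ftp_paths ]; then
    while IFS= read -r path || [ -n "$path" ]; do
        case $path in
            ftp://*) build_gff_url "$path" ;;    # ignore anything that is not an FTP path
        esac
    done < ftp_paths > gff_urls
fi
```

The resulting list can then be passed to `wget -i gff_urls` to download the files. Writing the URLs to a file first makes it easy to inspect them before starting a large download.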