- Get the assembly summary file for NCBI RefSeq genomes.
- wget ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/assembly_summary_refseq.txt
- Grab the `ftp` path (and/or any other fields you need) from this file for each genome.
- awk -F '\t' '!/^#/ {print $20}' assembly_summary_refseq.txt > ftp_paths
- You should get something like this:
- ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/599/545/GCF_000599545.1_ASM59954v1
- ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/599/565/GCF_000599565.1_TruePyoMS2391.0
- ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/599/605/GCF_000599605.1_HJM029
- Each of these FTP directories should contain the `*genomic.gff.gz` file for that genome.
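
The last step can be scripted. The following is a minimal sketch, not a tested pipeline. It assumes the GFF file in each directory is named after the directory itself (`<directory name>_genomic.gff.gz`), and that genomes without a path are listed as `na`. The function name `gff_url` and the output file `gff_urls` are made up for this example.

```shell
# Build the expected GFF URL for one directory path from ftp_paths.
# Assumption: the file is named <directory name>_genomic.gff.gz.
gff_url() {
    dir="${1%/}"        # drop a trailing slash, if any
    name="${dir##*/}"   # last path component, e.g. GCF_000599545.1_ASM59954v1
    printf '%s/%s_genomic.gff.gz\n' "$dir" "$name"
}

# Write one GFF URL per genome to gff_urls (hypothetical file name),
# skipping empty lines and entries without a path ("na").
if [ -f ftp_paths ]; then
    while IFS= read -r path; do
        case "$path" in ''|na) continue ;; esac
        gff_url "$path"
    done < ftp_paths > gff_urls
fi

# Then download everything in the list, for example:
# wget -i gff_urls
```

Writing the URLs to a file first lets you inspect or subsample the list before starting a large download.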