
How do I download GenBank (INSDC) genome assemblies for viruses?

NCBI RefSeq staff select among the submitted (GenBank/INSDC) viral genomes to create one or more RefSeq assemblies for each viral species. We also curate other submitted genomes from the same species as genome neighbors*, which represent the diversity within the species. We are in the process of making all neighbor assemblies available in the Assembly database and on the GenBank genomes FTP site, but this work was not yet complete as of November 2018. Thus, the Assembly database and FTP site currently contain all RefSeq viral and viroid assemblies, but not the entire set of GenBank/INSDC genome neighbors.
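For the assemblies that are already on the FTP site, one way to locate the genome files is to read the assembly summary table for the viral directory. The Python below is a minimal sketch, assuming the standard `assembly_summary.txt` layout (tab-delimited, `#`-prefixed header lines, `ftp_path` in the 20th column) and the usual `<ftp_path>/<basename>_genomic.fna.gz` naming of the genomic FASTA file. The function names are illustrative, not part of any NCBI tool. Replacing `refseq` with `genbank` in the URL lists GenBank viral assemblies, which, as noted above, may not yet include every genome neighbor.

```python
import posixpath
import urllib.request

# Assumed location of the RefSeq viral assembly summary; replace "refseq"
# with "genbank" to list GenBank (INSDC) viral assemblies instead.
SUMMARY_URL = "https://ftp.ncbi.nlm.nih.gov/genomes/refseq/viral/assembly_summary.txt"

# 0-based index of the ftp_path column (20th column) in the assumed layout.
FTP_PATH_COLUMN = 19


def genomic_fasta_urls(summary_text):
    """Yield (assembly_accession, genomic FASTA URL) pairs from summary text."""
    for line in summary_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank, comment, and column-header lines
        fields = line.split("\t")
        if len(fields) <= FTP_PATH_COLUMN:
            continue  # malformed or truncated row
        ftp_path = fields[FTP_PATH_COLUMN].rstrip("/")
        if ftp_path in ("", "na"):
            continue  # no files have been released for this assembly
        base = posixpath.basename(ftp_path)
        yield fields[0], f"{ftp_path}/{base}_genomic.fna.gz"


def fetch_summary(url=SUMMARY_URL):
    """Download the assembly summary table as text (network access required)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")
```

For example, `for acc, url in genomic_fasta_urls(fetch_summary()): print(acc, url)` prints one download URL per assembly; the URLs can then be passed to any HTTPS client.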

To obtain the indexed genome neighbors, follow these steps on the web:

Alternatively, you can use the Genome database, as described in the FAQ "How to retrieve non-RefSeq (DDBJ/EMBL/GenBank) nucleotide sequences of complete viral genomes".

Additionally, the Viruses directory on the genomes FTP site provides two separate lists, one for viruses and one for viroids, that contain the accessions of RefSeq genomes and their GenBank neighbors.
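Once one of these lists is downloaded, the neighbor sequences themselves can be retrieved by accession. The Python below is a minimal sketch, assuming the list is tab-delimited text with `#`-prefixed header lines, the RefSeq accession in the first column, and the neighbor accession in the second (the exact column layout is an assumption; check the header of the file you download). It also assumes a single row may list several comma-separated accessions, as can happen for segmented genomes. It collects the unique neighbor accessions and builds NCBI E-utilities `efetch` URLs that return them as FASTA from the `nuccore` database. The function names and the batch size of 200 are illustrative choices.

```python
import urllib.parse

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def neighbor_accessions(list_text):
    """Return unique neighbor accessions, in file order.

    Assumed layout: '#' lines are headers, column 1 is the RefSeq
    representative, column 2 holds one or more comma-separated neighbors.
    """
    found = []
    for line in list_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and header lines
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # no neighbor column on this row
        found.extend(a.strip() for a in fields[1].split(","))
    # dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(a for a in found if a))


def efetch_urls(accessions, batch_size=200):
    """Yield efetch URLs that return FASTA for batches of accessions."""
    for start in range(0, len(accessions), batch_size):
        params = {
            "db": "nuccore",
            "id": ",".join(accessions[start:start + batch_size]),
            "rettype": "fasta",
            "retmode": "text",
        }
        yield f"{EFETCH}?{urllib.parse.urlencode(params)}"
```

Each URL can be fetched with `urllib.request.urlopen` and the responses concatenated into one FASTA file. Without an API key, E-utilities accept about three requests per second, so pause between batches.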

*Exception: Genome neighbors are not calculated for influenza sequences, so you cannot obtain sequences for this taxon as described above. GenBank influenza sequences are distributed separately through the INFLUENZA FTP site, which is maintained as part of the Influenza Virus Resource.