
How do I download RefSeq genome data for viruses?

NCBI RefSeq staff create reference sequence records (RefSeqs) only from selected GenBank (INSDC) records, producing one or more RefSeq genomes (assemblies) for each viral species. You can access and download viral RefSeqs through the NCBI FTP site or the web. Choose your access and download path according to your goal:

1. Accessing viral data that are organized in individual RefSeq assemblies
Assembly records aggregate all segments of segmented viral genomes as a single genome assembly. On the web, search the Assembly database for all viral entries or for a smaller taxonomic group (example):
  • On the search results page, select Latest RefSeq within the Status facet on the left side of the screen.
  • Use additional facets/filters to narrow your search results to the set that you want. (Tip: A statement above the records indicates which filters are active and lets you Clear all before a new search or selection.)
  • Use the blue Download Assemblies button at the top of the page and select the format of your choice.
  • Note the estimated size of the data (uncompressed). The data will download as a single tar archive file.
Alternatively, use the /genomes/refseq/viral path on the NCBI FTP site and refer to the assembly_summary file, which lists various metadata that you can use to determine your set of assemblies to download. For additional help on downloading genome assembly data see the Genome Download (FTP) FAQ.
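
As a rough illustration of the FTP route, the sketch below parses the text of an assembly_summary file and builds a download URL for each selected assembly. It rests on several assumptions that are not stated on this page: that the file is tab-delimited, that the last "#" comment line holds the column names, that the columns include assembly_accession, organism_name and ftp_path, and that each assembly directory contains a file named after the directory with a _genomic.fna.gz suffix. The function names and the summary URL in the usage note are illustrative; check the README on the FTP site before relying on any of them.

```python
import urllib.request


def fetch_text(url):
    """Download a text file and return it as a string (needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def parse_assembly_summary(text):
    """Parse assembly_summary text into a list of dicts, one per assembly.

    Assumption: the last '#' comment line before the data rows lists the
    tab-separated column names.
    """
    header = None
    rows = []
    for line in text.splitlines():
        if line.startswith("#"):
            header = line.lstrip("#").strip().split("\t")
            continue
        if not line.strip() or header is None:
            continue
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def select_by_organism(rows, term):
    """Keep rows whose organism_name contains term (case-insensitive)."""
    term = term.lower()
    return [r for r in rows if term in r.get("organism_name", "").lower()]


def genomic_fasta_url(row):
    """Build the assumed URL of the genomic FASTA file for one assembly row."""
    base = row.get("ftp_path", "").rstrip("/")
    if base in ("", "na"):
        return None
    if base.startswith("ftp://"):
        # Assumption: the same path is also served over HTTPS.
        base = "https://" + base[len("ftp://"):]
    directory_name = base.rsplit("/", 1)[-1]
    return f"{base}/{directory_name}_genomic.fna.gz"
```

A possible use, assuming the summary file sits at https://ftp.ncbi.nlm.nih.gov/genomes/refseq/viral/assembly_summary.txt: call fetch_text on that URL, pass the result to parse_assembly_summary, narrow the rows with select_by_organism, and pass each remaining row to genomic_fasta_url to get the files to download.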
 

2. Accessing individual RefSeq genome records for viruses (not organized in individual assemblies)

NCBI creates an individual RefSeq sequence record for each viral segment. Use the links under the Explore Viral Genome Sequences section of the Viral Genomes page (part of the Genome resource) for convenient access to, and selection of, the data that you want.
The Viral Genomes resource page also provides a direct link (under the Download Viral Genome Data section) to the complete RefSeq release of viral and viroid sequences:
  • RefSeq collection releases occur every two months.
  • There is no archive of previous releases.
  • You can update the records between releases by parsing the daily files.