Use one of these three approaches:
(1) Directly from the web; suitable only for organisms or taxonomic groups that have a relatively small number of sequence records in the Nucleotide or Protein database:
- Access the sequence database that you want on the web, for example Nucleotide.
- Search for your organism by entering its name restricted to the organism field, for example:
Salarchaeum japonicum[organism]
- Use the Send to link (located top right above the results on the search results page) and select File.
- Select either Accession List or GI List as your Format and use the Create File button to download the list.
(2) E-utilities; use the NCBI E-utilities API for organisms or taxonomic groups that have a large number of sequence records in the Nucleotide or Protein database:
- Use esearch to search, for example, for all Archaea sequence records in the Nucleotide database. Search URL example:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=Archaea[organism]&usehistory=y
Setting usehistory=y makes esearch return a Web environment string (WebEnv) and a query key (query_key), which together identify the location of the retrieved GIs on the Entrez history server.
- Follow with efetch. Your URL should include the query key number and the web environment (WebEnv) string generated by esearch. Specify the rettype as uilist and retmode as text. Example:
efetch.fcgi?db=nucleotide&query_key=<key>&WebEnv=<webenv string>&rettype=uilist&retmode=text
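The esearch and efetch steps above can be sketched as a small shell script. This is a minimal illustration, not an official NCBI client: it assumes curl and sed are available, that the esearch XML reports the values in QueryKey and WebEnv elements, and that the search term is already URL-encoded. The function names (xml_value, efetch_url, fetch_uid_list) are hypothetical helpers introduced only for this example.

```shell
#!/bin/sh
# Sketch of the esearch -> efetch history workflow (assumes curl and sed).
BASE="https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Print the text of the first <NAME>...</NAME> element found on stdin.
# Hypothetical helper; a real XML parser would be more robust.
xml_value() {
    sed -n "s:.*<$1>\([^<]*\)</$1>.*:\1:p" | head -n 1
}

# Build the efetch URL for a result set stored on the history server.
# $1 = database, $2 = query key, $3 = WebEnv string
efetch_url() {
    printf '%s/efetch.fcgi?db=%s&query_key=%s&WebEnv=%s&rettype=uilist&retmode=text\n' \
        "$BASE" "$1" "$2" "$3"
}

# Run esearch with usehistory=y, then efetch the UID list (needs network access).
# $1 = database, $2 = URL-encoded search term
fetch_uid_list() {
    result=$(curl -s "$BASE/esearch.fcgi?db=$1&term=$2&usehistory=y")
    key=$(printf '%s\n' "$result" | xml_value QueryKey)
    webenv=$(printf '%s\n' "$result" | xml_value WebEnv)
    curl -s "$(efetch_url "$1" "$key" "$webenv")"
}

# Example call (not run automatically):
# fetch_uid_list nucleotide 'Archaea%5Borganism%5D' > archaea.uids
```

For very large result sets, a single efetch request may not return every record; retrieving the list in batches with the retstart and retmax parameters is the usual workaround.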
(3) EDirect; use Entrez Direct (EDirect) as the UNIX command line alternative to E-utilities:
EDirect is a relatively new method for searching and accessing records in NCBI databases. It is run from the UNIX command line, so you need access to a UNIX/Linux terminal. EDirect will run on UNIX and Macintosh computers that have the Perl language installed, and under the Cygwin UNIX-emulation environment on Windows PCs.
Here are command line examples that would generate the GI list or the accession list for all Archaea records in the Protein database:
esearch -db protein -query "Archaea[organism]" | efetch -db protein -format uid > archaea.gis
esearch -db protein -query "Archaea[organism]" | efetch -db protein -format acc > archaea.acc
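The same commands can be wrapped in small shell functions so that the database, organism, and output file become parameters. This is only a sketch: organism_query and download_acc_list are hypothetical names introduced for illustration, and the second function assumes the EDirect tools (esearch, efetch) are installed and on your PATH.

```shell
# Build an Entrez query restricted to the organism field.
# $1 = organism or taxonomic group name
organism_query() {
    printf '%s[organism]' "$1"
}

# Write the accession list for an organism to a file (needs EDirect and network access).
# $1 = database, $2 = organism name, $3 = output file
download_acc_list() {
    esearch -db "$1" -query "$(organism_query "$2")" |
        efetch -db "$1" -format acc > "$3"
}

# Example call (not run automatically):
# download_acc_list protein "Archaea" archaea.acc
```

Replacing -format acc with -format uid in the function would produce the GI list instead, matching the first command above.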