- Each sequence in the FASTA file contains a Definition Line followed by the sequence data.
- The Definition Line for each sequence begins with a ">" followed by a Sequence_ID (SeqID). The SeqID identifies the same specimen in all the steps of a submission (for example, in the nucleotide FASTA file, in a protein FASTA file, or in a Source Modifier file).
- Each SeqID must be unique within the file.
- SeqIDs may not contain spaces.
- SeqIDs may contain only the following characters: letters, digits, hyphens (-), underscores (_), periods (.), colons (:), asterisks (*), and number signs (#).
- SeqIDs must be 25 or fewer characters.
- The SeqID must be separated by a space from the rest of the Definition Line text.
- It is recommended that the Definition Line include the organism name. If organism names are not included in the FASTA Definition Lines, they must be provided in a separate table on a subsequent page of the submission process.
- The Organism Name must be provided in this format:
[organism=Organism Name]
(an opening square bracket, the word "organism", an equals sign, the Organism Name, and a closing square bracket).
- Source Modifiers provided in the FASTA file Definition Line must follow the same format as Organism Name. Examples: [isolate=mosquito12] [clone=AC3] [strain=BuzzLY]
- A brief, free-text description of the sequence may follow the formatted Organism Name and Source Modifiers. Examples: 'cytochrome oxidase I, partial CDS' 'trnH-psbA intergenic spacer'
- The FASTA Definition Line may not contain any internal hard returns.
- However, the FASTA Definition Line must be separated from the actual sequence by a hard return.
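
The Definition Line rules above can be checked mechanically before a file is submitted. The following Python sketch is illustrative only and is not part of any submission tool: the function names `check_seqid` and `parse_definition_line` are hypothetical, and "letters" is assumed to mean ASCII letters.

```python
import re

MAX_SEQID_LENGTH = 25
# Allowed SeqID characters from the rules above.
# Assumption: "letters" means ASCII letters A-Z and a-z.
ALLOWED_SEQID = re.compile(r"[A-Za-z0-9\-_.:*#]+")
# Bracketed modifiers such as [organism=Carpodacus mexicanus] or [clone=6b].
MODIFIER = re.compile(r"\[([^=\[\]]+)=([^\[\]]*)\]")


def check_seqid(seqid):
    """Return a list of problems found in a SeqID (empty list if none)."""
    problems = []
    if not seqid:
        problems.append("empty")
        return problems
    if len(seqid) > MAX_SEQID_LENGTH:
        problems.append("longer than 25 characters")
    if not ALLOWED_SEQID.fullmatch(seqid):
        problems.append("contains a character outside the allowed set")
    return problems


def parse_definition_line(line):
    """Split a Definition Line into (SeqID, modifiers dict, free-text description)."""
    if not line.startswith(">"):
        raise ValueError("Definition Line must begin with '>'")
    # The SeqID is everything between '>' and the first space.
    seqid, _, rest = line[1:].rstrip("\r\n").partition(" ")
    modifiers = {key.strip(): value.strip() for key, value in MODIFIER.findall(rest)}
    # Whatever remains after removing the bracketed modifiers is the description.
    description = MODIFIER.sub("", rest).strip()
    return seqid, modifiers, description


seqid, modifiers, description = parse_definition_line(
    ">Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds"
)
print(seqid, check_seqid(seqid), modifiers, description)
```

A check of this kind covers only the syntax of a single Definition Line. It does not verify that an organism name is valid or that SeqIDs are unique across the file.
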
The placement of spaces and hard returns within a FASTA file is critical for the FASTA information and sequences to be read correctly. The following sample FASTA file shows Definition Lines and sequences:
>Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds
CCTTTATCTAATCTTTGGAGCATGAGCTGGCATAGTTGGAACCGCCCTCAGCCTCCTCATCCGTGCAGAA
TAATAATTTTCTTTATAGTAATACCAATCATGATCGGTGGTTTCGGAAACTGACTAGTCCCACTCATAAT
>Seq2 [organism=uncultured bacillus sp.] [isolate=A2] corticotropin (CT) gene, complete cds
GGTAGGTACCGCCCTAAGNCTCCTAATCCGAGCAGAACTANGCCAACCCGGAGCCCTTCTGGGAGACGAC
TCAACACCACCTTCTTTGACCCAGCAGGAGGAGGAGACCCAGTACTATACCAGCACCTATTCTGATTCTT
>Seq3 [organism=Phalaenopsis equestris var. leucaspis]
CCTATACCTAATTTTCGGCGCATGAGCCGGAATGGTGGGTACCGCTCTAAGCCTCCTCATTCGAGCAGAA
CTAGGCCAACCCGGAGCCCTTCTGGGAGACGACCAAGTCTACAACGTGGTTGTCACGGCCCATGCCTTCG
>Seq9 [organism=Petunia integrifolia subsp. inflata]
TAGTTGGAACAGCCCTCAGCCTACTCATCCGAGCAGAACTAGGCCAACCCGGAACCCTCCTGGGAGATGA
CCAAATCTACAATGTAATCGTCACTGCCCATGCCTTCGTAATAATCTTCTTCATAGTAATACCAGTCATA
Sample nucleotide FASTA
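
To illustrate why hard returns matter, here is a minimal Python sketch of how a reader might split such a file into records: each line beginning with ">" starts a new record, and every following line up to the next ">" is joined into one sequence. The helper names `read_fasta` and `duplicate_seqids` are hypothetical, and the sketch makes no claim about how the submission system itself parses files. The sample text is taken from the first and third records above.

```python
def read_fasta(text):
    """Return a list of (definition_line, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new Definition Line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before any Definition Line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def duplicate_seqids(records):
    """Return SeqIDs that appear more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for header, _ in records:
        seqid = header.split(" ", 1)[0]  # SeqID ends at the first space
        if seqid in seen and seqid not in duplicates:
            duplicates.append(seqid)
        seen.add(seqid)
    return duplicates


sample = """>Seq1 [organism=Carpodacus mexicanus] [clone=6b] actin (act) mRNA, partial cds
CCTTTATCTAATCTTTGGAGCATGAGCTGGCATAGTTGGAACCGCCCTCAGCCTCCTCATCCGTGCAGAA
TAATAATTTTCTTTATAGTAATACCAATCATGATCGGTGGTTTCGGAAACTGACTAGTCCCACTCATAAT
>Seq3 [organism=Phalaenopsis equestris var. leucaspis]
CCTATACCTAATTTTCGGCGCATGAGCCGGAATGGTGGGTACCGCTCTAAGCCTCCTCATTCGAGCAGAA
CTAGGCCAACCCGGAGCCCTTCTGGGAGACGACCAAGTCTACAACGTGGTTGTCACGGCCCATGCCTTCG
"""

records = read_fasta(sample)
for header, sequence in records:
    print(header.split(" ", 1)[0], len(sequence))
print("duplicate SeqIDs:", duplicate_seqids(records))
```

If a Definition Line were broken by an internal hard return, a reader like this one would treat the second half of the line as sequence data, which is why the Definition Line must stay on a single line.
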