A nucleotide sequence may represent a 5’ (five prime) partial coding region (CDS), which encodes a protein with an incomplete N-terminus. Such a sequence can start either with the first nucleotide of a complete codon (coding triplet) or with an incomplete codon (one lacking its first nucleotide, or its first and second nucleotides). The position of the first complete codon determines the reading frame for translating a 5’ partial CDS into protein. GenBank uses the term “codon_start” as a synonym for the reading frame.
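To make the effect of codon_start concrete, the minimal Python sketch below translates one partial sequence under each of the three possible values. The function name `translate_partial`, the example sequence, and the choice of the standard genetic code are assumptions made for illustration; they are not part of BLAST or GenBank.

```python
# Standard genetic code, built in TCAG order (assumed here for illustration;
# some organisms and organelles use other translation tables).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate_partial(sequence, codon_start):
    """Translate a 5' partial CDS starting at codon_start (1, 2, or 3).

    codon_start is 1-based: a value of 2 skips the first nucleotide and
    a value of 3 skips the first two. A trailing incomplete codon is ignored.
    """
    if codon_start not in (1, 2, 3):
        raise ValueError("codon_start must be 1, 2, or 3")
    seq = sequence.upper()
    offset = codon_start - 1
    protein = []
    for i in range(offset, len(seq) - 2, 3):
        # Codons containing ambiguous bases are reported as X.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    example = "CTGGCTGGA"  # hypothetical partial sequence
    for frame in (1, 2, 3):
        print(frame, translate_partial(example, frame))
```

The same nucleotides give a different protein for each value, which is why the reading frame has to be established from evidence such as an annotated alignment rather than guessed.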
Nucleotide BLAST (blastn) can help determine the correct reading frame of a 5’ partial CDS through the CDS feature display on the BLAST search results page. See the article on blastn and CDS feature setup.
To determine the reading frame for a 5’ partial CDS:
- Perform a blastn search with your sequence.
- On the search results page, click the Alignments tab to view pairwise alignments.
- Check the CDS feature box to display the CDS feature on the alignments.
- Select an alignment.
- Verify the following for the alignment:
  - Subject has an annotated coding region in the aligned region.
  - Query (your sequence) aligns to Subject across its entire length.
  - Query represents the coding strand. (See the article on determining the coding strand for help.)
- Note the placement of the first amino acid (AA) code on Query from the 5’ end.
- To learn how to do the above checks, see the article on interpreting pairwise alignments.
You can determine the reading frame from the placements as follows:
AA code placed on the 2nd nucleotide: reading frame (codon_start) is 1
Explanation: BLAST places the single letter AA codes in the middle of the complete codons. In this case, nucleotides 1, 2, and 3 represent a complete codon. The translation therefore starts with nucleotide 1.
AA code placed on the 3rd nucleotide: reading frame (codon_start) is 2
Explanation: The translation skips the first nucleotide of the sequence to start at the first complete codon (nucleotides 2, 3, and 4).
AA code placed on the 4th nucleotide: reading frame (codon_start) is 3
Explanation: The translation skips the first two nucleotides of the sequence to start at the first complete codon (nucleotides 3, 4, and 5).
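The three cases above reduce to one rule: because the AA code sits on the middle nucleotide of its codon, codon_start equals the position of the first AA code minus 1. The sketch below expresses that rule in Python. Both function names are hypothetical, and `first_aa_position` assumes you have copied the AA annotation line as plain text, column-aligned with Query so that its first character corresponds to the first nucleotide of Query; it does not parse actual BLAST output.

```python
def codon_start_from_aa_position(aa_position):
    """Return codon_start given the 1-based Query position of the first AA code.

    The AA code is placed on the middle nucleotide of a complete codon,
    so the codon begins one nucleotide earlier.
    """
    if aa_position not in (2, 3, 4):
        raise ValueError("the first AA code is expected on nucleotide 2, 3, or 4")
    return aa_position - 1


def first_aa_position(aa_track):
    """Return the 1-based column of the first non-blank character.

    aa_track is a hypothetical plain-text copy of the AA annotation line,
    with column 1 aligned to the first nucleotide of Query.
    """
    for column, char in enumerate(aa_track, start=1):
        if not char.isspace():
            return column
    raise ValueError("no AA code found in the annotation line")


if __name__ == "__main__":
    query = "GCTGGCTGGA"      # hypothetical Query
    aa_track = "  L  A  G "   # hypothetical AA codes aligned under Query
    position = first_aa_position(aa_track)
    print(position, codon_start_from_aa_position(position))
```

In this made-up example the first AA code falls on the 3rd nucleotide, so the function reports a codon_start of 2, matching the second case above.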
See Figures 1, 2, and 3 for examples of the three reading frames.
Figure 1: A pairwise BLAST alignment with the CDS feature display. Query aligns to Subject from base 1. The lack of an initiation codon (ATG) indicates a 5’ partial CDS. The first complete codon on Query (underlined in red) consists of bases 1, 2, and 3, with the AA residue “L” in the middle of the codon. Query's reading frame is 1.
Figure 2: A pairwise BLAST alignment with the CDS feature display. Query aligns to Subject from base 1. The lack of an initiation codon (ATG) indicates a 5’ partial CDS. The first complete codon on Query (underlined in red) consists of bases 2, 3, and 4, with the AA residue “A” in the middle of the codon. Query's reading frame is 2.
Figure 3: A pairwise BLAST alignment with the CDS feature display. Query aligns to Subject from base 1. The lack of an initiation codon (ATG) indicates a 5’ partial CDS. The first complete codon on Query (underlined in red) consists of bases 3, 4, and 5, with the AA residue “G” in the middle of the codon. Query's reading frame is 3.