
How do I use Nucleotide BLAST (blastn) and CDS feature display to determine if I have nucleotide insertions or deletions in my sequence that may cause reading frame shifts?

A sequence may contain insertions (extra nucleotides) or deletions (missing nucleotides) within a coding region (CDS). For example, when a single nucleotide is inserted within a codon (coding triplet), the reading frame of all the following codons shifts. Every subsequent codon then contains the wrong set of three bases: the extra base causes a frameshift. The rest of the CDS after the frameshift translates incorrectly, and the resulting protein may even contain internal stop codons. Insertions and/or deletions within a CDS cause annotation problems during GenBank submissions.
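The effect of a single inserted base can be illustrated with a short Python sketch. The 18-base sequence and the `translate` helper below are made up for illustration and are not part of BLAST; the translation uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(seq, table=STANDARD):
    """Translate seq in frame 1; a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))

# Hypothetical CDS fragment: ATG GCT GAA TTT AAA CGT
cds = "ATGGCTGAATTTAAACGT"

# Insert one extra T after the second codon.
shifted = cds[:6] + "T" + cds[6:]

print(translate(cds))      # MAEFKR
print(translate(shifted))  # MA*I*T
```

After the inserted base, every codon is read from the wrong position, and in this example two internal stop codons (asterisks) appear in the translation.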

To detect frameshifts in a CDS, run a Nucleotide BLAST (blastn) search and use the CDS feature display on the BLAST search results page. See the article on blastn and CDS feature setup.

Follow these steps:

  • Perform a blastn search.
  • On the search results page, click the Alignments tab to view pairwise alignments.
  • Check the CDS feature box to display the CDS feature on the alignments.
  • To learn more about the display, see the article on interpreting pairwise alignments.
  • Select an alignment to view.
  • To identify frameshift(s) in the alignment, look for:
    • Nucleotide insertions or deletions in the CDS indicated with dashes in the alignment
    • Amino acid (AA) codes translated from the Query sequence that do not match the Subject codes. 
    • Staggered protein translation with Subject AA codes in pink after the frameshift
    • Internal stop codons in the Query translation indicated with asterisks. 
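
The first check in the list above (dashes in the alignment) can also be sketched in Python for alignment rows copied out of the pairwise view. This is only a rough illustration: `find_frameshift_gaps` is a hypothetical helper, not something BLAST provides, and it assumes the two strings are the aligned Query and Subject rows of equal length with `-` marking gaps. It treats each gap on its own, so two nearby gaps that together restore the frame are still both reported.

```python
import re

def find_frameshift_gaps(query_aln, subject_aln):
    """Return (column, length, kind) for gaps whose length is not a multiple of 3.

    column is the 0-based alignment column where the gap starts.
    A gap in the Query row means the Query lacks bases (deletion);
    a gap in the Subject row means the Query has extra bases (insertion).
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned rows must have the same length")
    events = []
    for row, kind in ((query_aln, "deletion"), (subject_aln, "insertion")):
        for match in re.finditer(r"-+", row):
            length = match.end() - match.start()
            if length % 3 != 0:  # gaps of 3, 6, ... bases keep the frame
                events.append((match.start(), length, kind))
    return sorted(events)

# Made-up example rows: the Query has one extra base relative to the Subject.
query_row   = "ATGGCTTGAATTTAAACGT"
subject_row = "ATGGCT-GAATTTAAACGT"
print(find_frameshift_gaps(query_row, subject_row))  # [(6, 1, 'insertion')]
```

Gaps reported by this sketch still need to be checked against the CDS feature display, because only gaps that fall inside the CDS affect the reading frame.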

Figure 1 illustrates a Query with a single extra base that causes a frameshift in the CDS.


 

Figure 1: Alignment of a 631 bp Query to the MG999754.1 sequence. The Query contains an extra base (red arrow) that causes a frameshift. A dash (yellow arrow) in the Subject indicates a gap in the alignment. The altered Query frame (underlined in red) is staggered and does not match that of the Subject (underlined in yellow). The correct Subject translation is shown in pink AA codes. The Query translation after the frameshift is wrong, and the resulting protein contains an internal stop codon, TAA, indicated with an asterisk (red rectangle). Two additional asterisks (blue arrows) located before the frameshift falsely indicate internal stop codons. They result from translating the Query with the standard genetic code instead of the correct invertebrate mitochondrial code.
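
The false stop codons described in the caption can be reproduced with a small Python sketch. The 12-base sequence is made up for illustration and is not taken from the figure. The sketch builds the standard genetic code and then applies the four codon reassignments that define the invertebrate mitochondrial code (NCBI translation table 5): TGA codes for Trp, ATA for Met, and AGA and AGG for Ser.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Invertebrate mitochondrial code (table 5) differs at four codons.
INVERT_MITO = dict(STANDARD)
INVERT_MITO.update({"TGA": "W", "ATA": "M", "AGA": "S", "AGG": "S"})

def translate(seq, table):
    """Translate seq in frame 1; a trailing partial codon is ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(table.get(seq[i:i + 3], "X") for i in range(0, usable, 3))

# Hypothetical in-frame fragment: ATG TGA ATA AGA
fragment = "ATGTGAATAAGA"
print(translate(fragment, STANDARD))     # M*IR
print(translate(fragment, INVERT_MITO))  # MWMS
```

With the standard code, the TGA codon appears as a stop (asterisk) even though the sequence has no frameshift; with the invertebrate mitochondrial code it is read as tryptophan.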