
How do I use Nucleotide BLAST (blastn) and the CDS feature display to determine the coding locations for eukaryotic genes (those with intron/exon structure)?

Nucleotide BLAST (blastn) can help you find coding regions (CDS) on your sequence through the CDS feature display on the BLAST search results page. See the article on blastn and CDS feature setup.

Many genes from eukaryotic genomes contain exons and introns. Only exons contribute to the coding region (CDS). To find exon locations on your sequence, follow these steps:

  • Perform a blastn search.
  • On the search result page, click the Alignments tab to view pairwise alignments.
  • Check the CDS feature box to display the CDS feature on the alignments.
  • Select an alignment to view.
  • Verify the following for the alignment:
    • Subject has annotated coding region in the aligned region.
    • Query (your sequence) aligns to Subject across its entire length.
    • The alignment has no gaps.
  • To learn how to verify the above items, see the article on interpreting pairwise alignments.
  • Click the GenBank link in the Range row above the alignment. The link will display only the aligned region of the Subject record.
  • Infer the CDS locations on Query from the FEATURES section in the Subject record. The feature locations shown there are adjusted to the aligned region.
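
For a gapless alignment, the last step amounts to shifting each Subject interval by a constant offset. The following is a minimal sketch of that arithmetic, not an NCBI tool. It assumes that the feature locations are given in Subject numbering, that Query and Subject align plus/plus, and that Query position 1 aligns to the first Subject position of the range. The function name and the exon intervals are hypothetical. Only the range 2651 to 2924 and the Query length of 274 bp come from the example in this article.

```python
def subject_to_query(intervals, subject_start, query_start=1):
    """Map 1-based inclusive Subject intervals to Query coordinates.

    Valid only for a gapless, plus/plus alignment (an assumption).
    """
    offset = query_start - subject_start
    return [(start + offset, end + offset) for start, end in intervals]


# Hypothetical exon intervals, already clipped to the aligned range 2651..2924.
exons_on_subject = [(2651, 2700), (2790, 2850), (2900, 2924)]

exons_on_query = subject_to_query(exons_on_subject, subject_start=2651)
print(exons_on_query)  # [(1, 50), (140, 200), (250, 274)]
```

The last mapped position equals the Query length (274), which is a quick sanity check that the offset is right.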

Any gaps in the alignment will affect CDS locations. Gaps within the CDS may alter the reading frame. In addition, make sure you use the correct coding strand. See the article on determining the coding strand for more information.
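
As a rough illustration of why gaps matter, the sketch below scans the two aligned strings for runs of the gap character and reports those whose length is not a multiple of three, since such a run shifts the reading frame if it falls inside the CDS. It assumes that gaps are written as "-" and that you have copied the aligned Query and Subject strings out of the alignment. The function name and the short sequences are hypothetical, and the check does not test whether a gap actually lies within a CDS interval.

```python
import re


def frameshift_gaps(aligned_query, aligned_subject):
    """Return (sequence name, 1-based alignment column, gap length)
    for every gap run whose length is not a multiple of three."""
    hits = []
    for name, seq in (("Query", aligned_query), ("Subject", aligned_subject)):
        for match in re.finditer(r"-+", seq):
            length = match.end() - match.start()
            if length % 3 != 0:
                hits.append((name, match.start() + 1, length))
    return hits


# Hypothetical aligned strings: one single-base gap in Query.
print(frameshift_gaps("ATG-CCGTAA", "ATGACCGTAA"))  # [('Query', 4, 1)]
```

An empty result means that no gap run of frame-shifting length was found. It does not mean that the CDS locations can be transferred unchanged, because even a gap of three bases moves the downstream coordinates.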

Figures 1 and 2 below illustrate an example of the method.

Figure 1: A pairwise alignment of a 274 bp Query to the KC333362.1 (Subject) sequence. Query aligns along its entire length and the alignment is gapless, so the aligned regions of the two sequences have one-to-one correspondence. The translation shows a partial CDS (the start and stop codons are missing). Intronic sequence, indicated by strings of the tilde symbol (~~~~~), interrupts the CDS. The CDS translation has three separate regions (three exons). The first and third exons are partial (missing the 5’ and 3’ ends, respectively). The middle exon is complete. The GenBank link in the Range row (yellow rectangle) above the alignment (Range 1: 2651 to 2924 GenBank) displays the aligned part of the KC333362.1 record (locations 2651 to 2924).

Figure 2: An excerpt from the FEATURES section of the KC333362.1 record, adjusted to the locations from the aligned region in Figure 1. The join statements for both the mRNA and the CDS list each of the CDS intervals. The angle bracket symbols (< >) indicate that the sequence is 5’ and 3’ partial.
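
If you want to read the intervals from a join statement programmatically, the sketch below shows one way to do it. It assumes a simple plus-strand location of the form join(a..b,c..d) with optional < and > partial markers, and it does not handle complement(), single-base positions, or remote accessions. The function name and the location string are hypothetical and are not copied from the KC333362.1 record.

```python
import re


def parse_join(location):
    """Parse a simple GenBank join() location.

    Returns (intervals, five_prime_partial, three_prime_partial), where
    intervals is a list of 1-based inclusive (start, end) tuples.
    """
    pairs = re.findall(r"<?(\d+)\.\.>?(\d+)", location)
    intervals = [(int(start), int(end)) for start, end in pairs]
    return intervals, "<" in location, ">" in location


# Hypothetical location string in the style of a partial CDS.
intervals, partial_5, partial_3 = parse_join(
    "join(<2651..2700,2790..2850,2900..>2924)"
)
print(intervals)             # [(2651, 2700), (2790, 2850), (2900, 2924)]
print(partial_5, partial_3)  # True True
```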