Nucleotide BLAST (blastn) can help you detect possible poor-quality data at the ends of a sequence. This article provides steps for checking sequences from protein-coding genes, including a step that uses the CDS feature display on the BLAST search results pages. To enable that display, see the article on blastn and CDS feature setup. If you are checking non-coding sequences, skip the step on displaying the CDS feature.
To check for poor-quality data or other errors at the ends of a sequence:
The Query ends may contain poor-quality data or other errors if:
To remedy the problem, check your sequencing reads. Trim the sequence ends if you aren’t confident that the reads are correct.
Figure 1 illustrates a sequence with possible sequencing errors at its 5' end.
Figure 1: Query containing poor-quality sequence at its 5' end. The Query alignment starts at position 24 (red rectangle) and matches the Subject starting at position 4840 (orange rectangle). The first 23 bases of the Query remain unaligned, even though the Subject extends 4839 bases past the 5' end of the alignment. There are several mismatches at the 5' end of the alignment (blue rectangle), but none in the rest of the aligned bases between Subject and Query. Together, the unaligned bases and the mismatches indicate possible sequencing errors at the 5' end of the Query sequence.
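The reasoning in Figure 1 can be expressed as a small check. The Python sketch below is illustrative only and is not part of BLAST: the function name `check_five_prime_end`, the `window` cutoff, and the rule for setting `suspicious` are assumptions made for this example. It assumes a plus/plus alignment with 1-based coordinates and takes the alignment start positions plus the alignment columns where Query and Subject differ.

```python
from dataclasses import dataclass


@dataclass
class FivePrimeCheck:
    unaligned_query_bases: int    # Query bases before the alignment starts
    subject_bases_upstream: int   # Subject bases before the alignment starts
    mismatches_near_end: int      # mismatches within the first `window` columns
    mismatches_elsewhere: int     # mismatches in the rest of the alignment
    suspicious: bool


def check_five_prime_end(query_start, subject_start, mismatch_columns, window=30):
    """Flag a possible poor-quality 5' end (hypothetical helper, not a BLAST API).

    query_start, subject_start: 1-based positions where the alignment begins.
    mismatch_columns: 1-based alignment columns where Query and Subject differ.
    window: assumed cutoff for "near the 5' end"; choose a value for your data.
    """
    unaligned = query_start - 1
    upstream = subject_start - 1

    near = sum(1 for col in mismatch_columns if col <= window)
    elsewhere = len(mismatch_columns) - near

    # Unaligned Query bases are only a warning sign if the Subject had
    # enough upstream sequence for them to align to.
    unaligned_unexplained = unaligned > 0 and upstream >= unaligned

    # Assumed rule: flag the end if either signal is present.
    suspicious = unaligned_unexplained or near > elsewhere

    return FivePrimeCheck(unaligned, upstream, near, elsewhere, suspicious)


# Coordinates from Figure 1; the mismatch columns are invented for illustration.
result = check_five_prime_end(query_start=24, subject_start=4840,
                              mismatch_columns=[2, 5, 9])
print(result)
```

With the Figure 1 coordinates, the check reports 23 unaligned Query bases and 4839 Subject bases upstream of the alignment, and flags the 5' end as suspicious. A flag is only a prompt to inspect the sequencing reads; it does not prove that the bases are wrong.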