Your GenBank submission may include nucleotide sequences from protein-coding genes. In BankIt, you must annotate the coding region (CDS) on such sequences. GenBank indexers can't verify your sequences without the correct CDS annotation.
BankIt will report annotation errors if you do any of the following:
- Enter incorrect CDS locations
- Select the wrong coding strand (for example, the plus strand when the CDS is on the minus strand)
- Select the wrong reading frame (codon_start) for a 5’ partial CDS
- Submit a poor-quality sequence
You can avoid annotation problems by analyzing your sequences first, for example with Nucleotide BLAST (blastn). The blastn search results page offers a CDS feature display option, which shows protein translations of coding regions. It can help you find the correct locations, reading frame, and strand for your CDS. In the same analysis, you can check for sequencing errors. We recommend fixing any errors before you start working in BankIt.
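To get a rough first idea of the strand and reading frame before running blastn, you can translate your sequence in all six frames and count internal stop codons. The Python sketch below is only an illustration and is not part of BankIt or BLAST. The function names (`reverse_complement`, `translate`, `scan_frames`) are hypothetical, and the sketch assumes the standard genetic code (translation table 1) and an intronless sequence.

```python
from itertools import product

# Standard genetic code (translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement (minus strand) of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq, codon_start=1):
    """Translate seq starting at codon_start (1, 2, or 3).

    Codons containing ambiguous bases are translated as 'X'.
    """
    seq = seq.upper()
    protein = []
    for i in range(codon_start - 1, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


def scan_frames(seq):
    """Translate all six frames and count internal stop codons in each.

    Returns a list of (strand, codon_start, internal_stops, protein) tuples.
    """
    results = []
    for strand, s in (("plus", seq), ("minus", reverse_complement(seq))):
        for codon_start in (1, 2, 3):
            protein = translate(s, codon_start)
            # A stop codon at the very end is expected for a 3' complete CDS.
            internal_stops = protein[:-1].count("*")
            results.append((strand, codon_start, internal_stops, protein))
    return results


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    for strand, codon_start, stops, protein in scan_frames("ATGGCCAAATGA"):
        print(strand, codon_start, stops, protein)
```

A frame with no internal stop codons is a candidate for the coding frame, but short sequences often have more than one open frame. Confirm the result against the protein translations in the blastn CDS feature display.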
See these two articles on how to:
- Set up Nucleotide BLAST (blastn) and the CDS feature display
- Interpret pairwise alignments with the CDS feature display
See these articles for blastn methods to determine CDS properties:
- CDS locations: prokaryotic/intronless genes
- CDS locations: eukaryotic genes (intron/exon structure)
- The coding strand (plus or minus)
- The reading frame for translation: for 5’ partial CDS
See these articles for blastn methods to check for sequencing errors in CDS:
- Sequence errors: frameshifts in CDS
- Sequence errors: wrong bases in CDS
- Sequence errors: poor quality ends
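As a quick local pre-check, separate from the blastn methods in the articles above, you can count ambiguous bases near each end of a sequence. The Python sketch below is an assumption-laden illustration. The function name `ambiguous_end_counts` and the default window of 20 bases are hypothetical choices, and a count of zero does not prove the ends are correct.

```python
def ambiguous_end_counts(seq, window=20):
    """Count bases other than A, C, G, T in the first and last `window` bases.

    Returns a (five_prime_count, three_prime_count) tuple.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    seq = seq.upper()

    def count(part):
        return sum(1 for base in part if base not in "ACGT")

    return count(seq[:window]), count(seq[-window:])


if __name__ == "__main__":
    # Hypothetical toy sequence, for illustration only.
    print(ambiguous_end_counts("NNACGTACGTACGTRN", window=4))
```

High counts at either end suggest trimming or resequencing before you annotate the CDS. Miscalled bases that are not marked as ambiguous still need an alignment-based check.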
Try other BLAST tools if your blastn results are not adequate:
- blastx to check for frameshifts in CDS
- genomic blast for organisms with annotated assemblies