WGS Example Files
Remember that the columns in a .tbl file must be tab-delimited. If the examples that include the complete sequence do not work, check that the columns are separated by tabs rather than spaces.
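If the tabs were lost when copying an example, they can be restored mechanically. The Python sketch below assumes the standard five-column feature table layout (start, stop and feature key in columns 1 to 3; qualifier key and value in columns 4 and 5), and that every line is either a >Features header, a feature or extra-interval line, or a qualifier line. The function name to_tab_delimited is hypothetical and is not part of table2asn.

```python
import re
import sys

# A feature table coordinate such as 10400, <1 or >497.
COORD = re.compile(r"^[<>]?\d+$")


def to_tab_delimited(text: str) -> str:
    """Rebuild tab-delimited .tbl lines from a copy whose tabs became spaces.

    Assumes each line is a '>Feature(s) SeqId' header, a feature line
    (start stop key), an extra interval line (start stop), or a qualifier
    line (key value...). Qualifier values may themselves contain spaces.
    """
    out = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            out.append(line)  # header lines have no tab columns
            continue
        parts = line.split()
        if len(parts) >= 2 and COORD.match(parts[0]) and COORD.match(parts[1]):
            # Columns 1-3: start, stop and (first interval only) the feature key.
            out.append("\t".join(parts[:3]))
        else:
            # Qualifiers sit in columns 4 and 5, after three empty columns.
            key, _, value = line.partition(" ")
            value = value.strip()
            fields = ["", "", "", key] + ([value] if value else [])
            out.append("\t".join(fields))
    return "\n".join(out) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python fix_tbl_tabs.py copied.tbl > sample.tbl
    with open(sys.argv[1]) as handle:
        sys.stdout.write(to_tab_delimited(handle.read()))
```

For example, the line 10400 12512 gene becomes 10400, 12512 and gene separated by tabs, and locus_tag CCC_03116 becomes three empty columns followed by locus_tag and CCC_03116.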
- Simple file
- Multiple sequences in a file
- Partial coding regions
- Features on the complementary strand
Simple file
sample.fsa
>Cont54 [organism=Homo sapiens] [chromosome=5]
acaagcgctgctgtcgatgcaaactttagcttttaaacaagtgcaaacgcacgctgtctc
acatgataacacacattatcagaatactttccatgcaatatgaaaccatagcaagctacg
....
sample.tbl
>Features Cont54
10400 12512 gene
locus_tag CCC_03116
10400 10462 mRNA
10533 10577
10651 11098
11182 11642
11716 12512
product aldolase
protein_id gnl|dbname|CCC_03116
transcript_id gnl|dbname|mrna.CCC_03116
10450 10462 CDS
10533 10577
10651 11098
11182 11642
11716 12233
product aldolase
protein_id gnl|dbname|CCC_03116
transcript_id gnl|dbname|mrna.CCC_03116
inference profile:Genscan:2.0
15801 17688 gene
locus_tag CCC_03118
15801 16607 mRNA
16750 17688
product hypothetical protein
protein_id gnl|dbname|CCC_03118
transcript_id gnl|dbname|mrna.CCC_03118
15840 16607 CDS
16750 17610
product hypothetical protein
protein_id gnl|dbname|CCC_03118
transcript_id gnl|dbname|mrna.CCC_03118
inference similar to RNA sequence, mRNA:INSD:AY123456.2
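Neither CDS in sample.tbl carries a < or > marker or a codon_start qualifier, so each one normally spans a whole number of codons, stop codon included. A quick Python check of the spliced CDS lengths, using the intervals copied from sample.tbl (the helper name spliced_length is illustrative):

```python
def spliced_length(intervals):
    """Total length, in nt, of a feature made of inclusive (start, stop) intervals."""
    return sum(abs(stop - start) + 1 for start, stop in intervals)


# CDS intervals copied from sample.tbl
cds_03116 = [(10450, 10462), (10533, 10577), (10651, 11098), (11182, 11642), (11716, 12233)]
cds_03118 = [(15840, 16607), (16750, 17610)]

for name, cds in [("CCC_03116", cds_03116), ("CCC_03118", cds_03118)]:
    length = spliced_length(cds)
    # A complete CDS should leave no remainder when divided into codons.
    print(name, length, "nt,", length // 3, "codons, remainder", length % 3)
```

Both come out as exact multiples of 3: 1485 nt (495 codons) for CCC_03116 and 1629 nt (543 codons) for CCC_03118.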
Here is the definition line of the flatfile view of the final record made with these files:
DEFINITION Homo sapiens chromosome 5 Cont54, whole genome shotgun sequence.
Files for multiple sequences
multiple.fsa
>Cont348.225 [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
TTGAAGCAAGGCATTAGGCGAACCACTGCCTCTCTTTTACCTTCTTTTTTTTCCACCATTATTACTTTACTTTACATACGTTTAGGATCTGG
CGAGCAGCCCAGGCGAGTGTTTTGTAGTTTTCTCGGGGCTGCCTTTTTTTCTCTCTGTGGATGTGTGTGTGGGTATGGGCTGTATTTTCCTG
>Cont442.125 [organism=Helicobacter pylori ABC1] [strain=ABC1] [host=Homo sapiens] [isolation-source=blood]
TTGAAGCAAGGCATTAGGCGAACCACTGCCTCTCTTTTACCTTCTTTTTTTTCCACCATTATTACTTTACTTTACATACGTTTAGGATCTGG
CGAGCAGCCCAGGCGAGTGTTTTGTAGTTTTCTCGGGGCTGCCTTTTTTTCTCTCTGTGGATGTGTGTGTGGGTATGGGCTGTATTTTCCTG
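Source information such as organism, strain, host and isolation source is carried as [key=value] modifiers on each definition line, after the SeqId. A minimal Python sketch of splitting one of these lines apart; it is a simplification for illustration, not table2asn's own parser, and it assumes modifier values never contain "]". The function name parse_defline is hypothetical.

```python
import re

# One [key=value] pair; values may contain spaces, e.g. [geo_loc_name=USA: Maryland].
MODIFIER = re.compile(r"\[\s*([^=\]]+?)\s*=\s*([^\]]*?)\s*\]")


def parse_defline(defline: str):
    """Split a FASTA definition line into its SeqId and a dict of source modifiers."""
    if not defline.startswith(">"):
        raise ValueError("a definition line must start with '>'")
    seq_id = defline[1:].split()[0]  # the SeqId is the first word after '>'
    mods = {key: value for key, value in MODIFIER.findall(defline)}
    return seq_id, mods


seq_id, mods = parse_defline(
    ">Cont348.225 [organism=Helicobacter pylori ABC1] [strain=ABC1] "
    "[host=Homo sapiens] [isolation-source=blood]"
)
print(seq_id, mods)
```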
multiple.tbl
>Features Cont348.225
11 109 gene
locus_tag HPC_002564
gene cheA
11 109 CDS
product CheA
protein_id gnl|dbname|HPC_002564
>Features Cont442.125
15 113 gene
locus_tag HPC_003020
15 113 CDS
product TPR repeat-containing protein
protein_id gnl|dbname|HPC_003020
experiment Northern blot
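Each >Features block applies to the sequence whose SeqId follows >Features, so every ID in multiple.tbl should also appear on a definition line in multiple.fsa. A hedged Python sketch for catching mismatches before running table2asn; the function names are hypothetical.

```python
def fasta_ids(fsa_text: str):
    """SeqIds from FASTA definition lines (the first word after '>')."""
    return [line[1:].split()[0] for line in fsa_text.splitlines() if line.startswith(">")]


def table_ids(tbl_text: str):
    """SeqIds from '>Feature' or '>Features' header lines of a feature table."""
    ids = []
    for line in tbl_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in (">Feature", ">Features"):
            ids.append(parts[1])
    return ids


def unmatched_tables(fsa_text: str, tbl_text: str):
    """Feature table SeqIds that have no matching sequence in the FASTA file."""
    known = set(fasta_ids(fsa_text))
    return [seq_id for seq_id in table_ids(tbl_text) if seq_id not in known]
```

An empty result means every feature table has a sequence to attach to; any ID that is returned points to a typo in either file.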
Partial coding regions
The first coding region is partial at its 5' end, and nucleotide 3 begins the first complete codon. The "<" before the start coordinate therefore marks the feature as 5' partial, and codon_start 3 indicates that the first complete codon begins at the third nucleotide of the feature.
The second coding region is partial at its 3' end, so ">" before the stop coordinate marks it as 3' partial.
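The partial markers and codon_start can be read off mechanically. This Python sketch assumes, as in the examples on this page, that "<" appears only on the start column and ">" only on the stop column, and that a start coordinate greater than the stop coordinate means the feature is on the minus strand. The names parse_location and first_codon_position are illustrative.

```python
from typing import NamedTuple


class Location(NamedTuple):
    start: int
    stop: int
    partial5: bool  # '<' on the start coordinate
    partial3: bool  # '>' on the stop coordinate
    strand: str     # '+' if start <= stop, '-' otherwise


def parse_location(start_col: str, stop_col: str) -> Location:
    """Parse the first two .tbl columns, e.g. '<1' and '497'."""
    start = int(start_col.lstrip("<>"))
    stop = int(stop_col.lstrip("<>"))
    return Location(
        start=start,
        stop=stop,
        partial5=start_col.startswith("<"),
        partial3=stop_col.startswith(">"),
        strand="+" if start <= stop else "-",
    )


def first_codon_position(loc: Location, codon_start: int = 1) -> int:
    """Sequence position where the first complete codon begins."""
    offset = codon_start - 1
    # On the minus strand the feature runs toward lower coordinates.
    return loc.start + offset if loc.strand == "+" else loc.start - offset
```

For the first CDS in partial.tbl (<1 497, codon_start 3) this gives a first complete codon at nt 3; for the minus-strand CDS in complementary.tbl (<1516 1020, codon_start 3) it gives nt 1514.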
partial.fsa
>Cont3 [organism=Mus musculus] [strain=BALB/c] [chromosome=2]
TGcaaagtGGAATTCCAATTTCAACACCAGTTTTTGATGGCGCAAAAGAGCAAGATGTAACAAATATGTTAGAGCTTGCATCATTACCAAAATCTGG
TCAAACAAAATTGTGGGATGGTAGAACAGGTGAAAAATTTGATAGAGAAGTCACAGTTGGCACTATTTATATGTTAAAATTACACCATCTTGTAGAA
GATAAAATACACGCAAGATCTACAGGTCCTTATAGTTTAGTTACACAACAACCTCTTGGTGGTAAGGCTCAATTGGGAGGTCAACGATTTGGAGAAA
TGGAAGTTTGGGCTCTGGAAGCTTATGGGGCTTCTTATACTTTACAAGAAATTTTAACAGTAAAATCTGATGATGTTGCTGGTAGAGTTAAAGTTTA
TGAAACAATAGTAAAAGGTGAAGAGAATTTCGAGTCAGGAATACCTGAGTCATTTAATGTTTTAGTAAAAGAAATCAAAGCGCTAGCTCTTAATGTG
GAGTTAAATTAAAATGAAAAAAGATATTAAAGATTTTTTTAAAGAAACTGCCATATCAGACTCTCAAAATTTTAATAGTATTAAAATTACTTTAGCA
AGCCCTGAAAAGATAAAGTCATGGACTTATGGAGAAATAAAAAAACCCGAAACTATTAATTATAGAACTTTCAGACCTGAAAAAGACGGCCTATTTT
GTGCGAGAATATTTGGTCCAATAAAAGATTACGAATGTTTATGTGGAAAATATAAAAGAATGAAGTTCAGAGGAATTATTTGTGAGAAGTGTGGCGT
AGAGGTTACTAAATCAAATGTTCGTAGAGAAAGAATGGGGCACATCAATTTATCAACCCCAGTTGCACATATTTGGTTTTTAAAATCTTTACCAAGT
AGAATTTCACTAGCTATTGATATGAAGCTTAAAGAGGTTGAAAGAGTTCTATACTTTGAAAGTTTTATTGTTATAGAGCCTGGATTAACTAGTCTTA
AAAAAAATCAACTTTTAAACGAAGATGAATTAAATAAATATCAAGAGGAGTTTGGTGAAGAATCCTTTACTGCAGGAATAGGAGCAGAGGCGATACT
AGAGATTTTAAAATCTATAGACTTGAATAAAGAGAGAGAAATTTTATTAAAAAATATAAATGAGACAAAATCAAAGGTTGCTGAAGAAAGATCTATA
AAAAGATTAAAACTGATCGATTCATTTATTGAAACTGGTAACAAACCAGAATGGATGATTTTAACTACTATACCTGTAATACCACCAGAGTTAAGGC
CACTTGTTCCTCTAGATGGAGGTAGATTTGCAACATCAGATCTAAACGATTTGTATAGAAGAGTTATAAATAGAAATAATAGATTGAAAAGATTAAT
GGATCTTAAAGCTCCAGATATAATTATTAGAAATGAAAAACGAATGTTGCAAGAGTCAGTGGATGCTTTATTCGATAATGGCAGAAGAGGCAGAGTA
ATTACAGGAACTGGTAAACGTCCATTAAAATCTTTGGCTGAAATGCTTAAAGGAaaacaaG
partial.tbl
>Feature Cont3
<1 >497 gene
locus_tag KCS_111011
<1 497 CDS
note similar to Bacillus subtilis aldolase
product aldolase-like protein
codon_start 3
protein_id gnl|dbname|KCS_111011
transcript_id gnl|dbname|mrna.KCS_111011
<1 >497 mRNA
product aldolase-like protein
protein_id gnl|dbname|KCS_111011
transcript_id gnl|dbname|mrna.KCS_111011
<499 >1516 gene
locus_tag KCS_111012
499 >1516 CDS
product actin-like protein
protein_id gnl|dbname|KCS_111012
transcript_id gnl|dbname|mrna.KCS_111012
<499 >1516 mRNA
product actin-like protein
protein_id gnl|dbname|KCS_111012
transcript_id gnl|dbname|mrna.KCS_111012
Features on the complementary strand
Both genes are on the minus strand, so each feature is written with its start coordinate greater than its stop coordinate. The first CDS begins at nt 1018 and is 3' partial. The second CDS is 5' partial: it begins at the end of the sequence, at nt 1516, and ends at nt 1020. Its first complete codon begins at nt 1514, so it has codon_start 3.
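On a sequence of length L, position p counted on one strand corresponds to position L - p + 1 counted from the other end. The coordinates in complementary.tbl agree with those in partial.tbl under this mapping with L = 1516, which suggests the two examples describe the same region read from opposite strands. A small Python check; the helper name to_other_strand is hypothetical:

```python
SEQ_LEN = 1516  # the sequence ends at nt 1516


def to_other_strand(pos: int, seq_len: int = SEQ_LEN) -> int:
    """Map a 1-based position to the same base counted from the other end."""
    return seq_len - pos + 1


# partial.tbl CDS intervals (plus strand) -> complementary.tbl intervals (minus strand)
for start, stop in [(1, 497), (499, 1516)]:
    print((start, stop), "->", (to_other_strand(start), to_other_strand(stop)))
```

It prints (1, 497) -> (1516, 1020) and (499, 1516) -> (1018, 1), matching the CDS intervals in complementary.tbl; the 5' and 3' partial markers carry over unchanged.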
complementary.fsa
>AMCont1022 [organism=Burkholderia terrae DEF2] [strain=DEF2] [isolation-source=soil in forest] [geo_loc_name=USA: Maryland]
CTTGTTTTCCTTTAAGCATTTCAGCCAAAGATTTTAATGGACGTTTACCAGTTCCTGTAATTACTCTGCC
TCTTCTGCCATTATCGAATAAAGCATCCACTGACTCTTGCAACATTCGTTTTTCATTTCTAATAATTATA
TCTGGAGCTTTAAGATCCATTAATCTTTTCAATCTATTATTTCTATTTATAACTCTTCTATACAAATCGT
TTAGATCTGATGTTGCAAATCTACCTCCATCTAGAGGAACAAGTGGCCTTAACTCTGGTGGTATTACAGG
TATAGTAGTTAAAATCATCCATTCTGGTTTGTTACCAGTTTCAATAAATGAATCGATCAGTTTTAATCTT
TTTATAGATCTTTCTTCAGCAACCTTTGATTTTGTCTCATTTATATTTTTTAATAAAATTTCTCTCTCTT
TATTCAAGTCTATAGATTTTAAAATCTCTAGTATCGCCTCTGCTCCTATTCCTGCAGTAAAGGATTCTTC
ACCAAACTCCTCTTGATATTTATTTAATTCATCTTCGTTTAAAAGTTGATTTTTTTTAAGACTAGTTAAT
CCAGGCTCTATAACAATAAAACTTTCAAAGTATAGAACTCTTTCAACCTCTTTAAGCTTCATATCAATAG
CTAGTGAAATTCTACTTGGTAAAGATTTTAAAAACCAAATATGTGCAACTGGGGTTGATAAATTGATGTG
CCCCATTCTTTCTCTACGAACATTTGATTTAGTAACCTCTACGCCACACTTCTCACAAATAATTCCTCTG
AACTTCATTCTTTTATATTTTCCACATAAACATTCGTAATCTTTTATTGGACCAAATATTCTCGCACAAA
ATAGGCCGTCTTTTTCAGGTCTGAAAGTTCTATAATTAATAGTTTCGGGTTTTTTTATTTCTCCATAAGT
CCATGACTTTATCTTTTCAGGGCTTGCTAAAGTAATTTTAATACTATTAAAATTTTGAGAGTCTGATATG
GCAGTTTCTTTAAAAAAATCTTTAATATCTTTTTTCATTTTAATTTAACTCCACATTAAGAGCTAGCGCT
TTGATTTCTTTTACTAAAACATTAAATGACTCAGGTATTCCTGACTCGAAATTCTCTTCACCTTTTACTA
TTGTTTCATAAACTTTAACTCTACCAGCAACATCATCAGATTTTACTGTTAAAATTTCTTGTAAAGTATA
AGAAGCCCCATAAGCTTCCAGAGCCCAAACTTCCATTTCTCCAAATCGTTGACCTCCCAATTGAGCCTTA
CCACCAAGAGGTTGTTGTGTAACTAAACTATAAGGACCTGTAGATCTTGCGTGTATTTTATCTTCTACAA
GATGGTGTAATTTTAACATATAAATAGTGCCAACTGTGACTTCTCTATCAAATTTTTCACCTGTTCTACC
ATCCCACAATTTTGTTTGACCAGATTTTGGTAATGATGCAAGCTCTAACATATTTGTTACATCTTGCTCT
TTTGCGCCATCAAAAACTGGTGTTGAAATTGGAATTCCACTTTGCA
complementary.tbl
>Feature AMCont1022
1018 >1 gene
locus_tag AMt_11123
1018 >1 CDS
product hypothetical protein
protein_id gnl|dbname|AMt_11123
<1516 1020 gene
locus_tag AMt_11124
<1516 1020 CDS
product oxidase
codon_start 3
protein_id gnl|dbname|AMt_11124