Protein database searches using compositionally adjusted substitution matrices
- PMID: 16218944
- PMCID: PMC1343503
- DOI: 10.1111/j.1742-4658.2005.04945.x
Abstract
Almost all protein database search methods use amino acid substitution matrices for scoring, optimizing, and assessing the statistical significance of sequence alignments. Much care and effort have therefore gone into constructing substitution matrices, and the quality of search results can depend strongly upon the choice of the proper matrix. A long-standing problem has been the comparison of sequences with biased amino acid compositions, for which standard substitution matrices are not optimal. To address this problem, we have recently developed a general procedure for transforming a standard matrix into one appropriate for the comparison of two sequences with arbitrary, and possibly differing, compositions. Such adjusted matrices yield, on average, improved alignments and alignment scores when applied to the comparison of proteins with markedly biased compositions. Here we review the application of compositionally adjusted matrices and consider whether they may also be applied fruitfully to general-purpose protein sequence database searches, in which related sequence pairs do not necessarily have strong compositional biases. Although it is not advisable to apply compositional adjustment indiscriminately, we describe several simple criteria under which invoking such adjustment is on average beneficial. In a typical database search, at least one of these criteria is satisfied by over half the related sequence pairs. Compositional substitution matrix adjustment is now available in NCBI's protein-protein version of BLAST.
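To make the idea of adjusting a matrix to two compositions concrete, the following is a minimal sketch and not the authors' procedure. It assumes a substitution matrix is a log-odds matrix built from joint target frequencies, and it uses iterative proportional fitting as a stand-in technique to rescale those target frequencies until their marginals match two given compositions. The three-letter alphabet, the numbers in `TARGET`, `ROW_COMP` and `COL_COMP`, and the function names are all hypothetical and chosen only for illustration.

```python
import math

def adjust_target_frequencies(q, row_comp, col_comp, iterations=1000, tol=1e-12):
    """Rescale joint frequencies q so that row sums match row_comp and
    column sums match col_comp (iterative proportional fitting).
    Illustrative stand-in only; requires all entries of q to be positive."""
    n = len(q)
    adj = [row[:] for row in q]  # work on a copy
    for _ in range(iterations):
        # Scale each row to the composition of the first sequence.
        for i in range(n):
            factor = row_comp[i] / sum(adj[i])
            for j in range(n):
                adj[i][j] *= factor
        # Scale each column to the composition of the second sequence.
        for j in range(n):
            factor = col_comp[j] / sum(adj[i][j] for i in range(n))
            for i in range(n):
                adj[i][j] *= factor
        # Columns are now exact; stop once the rows also agree.
        err = max(abs(sum(adj[i]) - row_comp[i]) for i in range(n))
        if err < tol:
            break
    return adj

def log_odds_scores(q, row_comp, col_comp, scale=2.0):
    """Log-odds scores s_ij = scale * log2(q_ij / (p_i * p'_j))."""
    n = len(q)
    return [[scale * math.log2(q[i][j] / (row_comp[i] * col_comp[j]))
             for j in range(n)] for i in range(n)]

# Hypothetical symmetric target frequencies over a toy 3-letter alphabet.
TARGET = [[0.30, 0.05, 0.05],
          [0.05, 0.20, 0.05],
          [0.05, 0.05, 0.20]]
# Hypothetical biased, differing compositions of the two sequences.
ROW_COMP = [0.6, 0.2, 0.2]
COL_COMP = [0.5, 0.3, 0.2]

adjusted = adjust_target_frequencies(TARGET, ROW_COMP, COL_COMP)
scores = log_odds_scores(adjusted, ROW_COMP, COL_COMP)
for row in scores:
    print(" ".join(f"{s:7.2f}" for s in row))
```

Because the two compositions differ, the resulting score matrix is in general not symmetric, which is consistent with the abstract's point that the adjusted matrix is specific to the pair of sequences being compared.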
Comment in
- Identifying protein interactions. FEBS J. 2005 Oct;272(20):5099-100. doi: 10.1111/j.1742-4658.2005.04944.x. PMID: 16218943.