Homology modeling, also known as comparative protein modeling, refers to the construction of an atomic-resolution model of the "target" protein from its amino acid sequence and an experimental three-dimensional structure of a related homologous protein (the "template"). Homology modeling relies on identifying one or more known protein structures likely to resemble the structure of the query sequence, and on producing an alignment that maps residues in the query sequence to residues in the template sequence. Protein structures have been shown to be more conserved than protein sequences among homologues, but sequences falling below 20% sequence identity can have very different structures.
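As a rough illustration of the sequence identity figure quoted above, the following sketch computes percent identity from a pairwise alignment. The function name `percent_identity`, the `-` gap character, and the example sequences are assumptions made for illustration. Conventions for the denominator also vary between tools; this version counts only columns where both sequences have a residue.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where both sequences have a residue.

    Hypothetical helper: the denominator convention (aligned, non-gap columns)
    is one of several in common use.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    aligned = 0    # columns with a residue in both sequences
    identical = 0  # aligned columns where the residues match
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue
        aligned += 1
        if a == b:
            identical += 1

    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


# Made-up example alignment: 8 aligned columns, 6 of them identical.
identity = percent_identity("MKT-AYIAK", "MKSGAYLAK")
print(identity)  # 75.0
```

A value well below the 20% mark mentioned above would suggest that the candidate template may not share the target's fold.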
Evolutionarily related proteins have similar sequences, and naturally occurring homologous proteins have similar structures. It has been shown that three-dimensional protein structure is evolutionarily more conserved than would be expected on the basis of sequence conservation alone.
The sequence alignment and the template structure are then used to produce a structural model of the target. Because protein structures are more conserved than DNA sequences, detectable levels of sequence similarity usually imply significant structural similarity.
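One way to picture how an alignment feeds into model building is as a residue-to-residue mapping: each target position either has a template counterpart, whose coordinates can guide the model, or it does not. The sketch below is a minimal illustration under assumed conventions (the hypothetical function `map_residues`, zero-based indices, `-` as the gap character); it is not how any particular modeling program represents alignments.

```python
def map_residues(aligned_target: str, aligned_template: str, gap: str = "-"):
    """Return (mapping, unmatched) for a pairwise alignment.

    mapping:   dict of target residue index -> template residue index
    unmatched: list of target residue indices aligned to a template gap
    Indices are zero-based positions in the ungapped sequences (an assumption).
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")

    mapping = {}
    unmatched = []
    t_idx = 0  # position in the ungapped target sequence
    p_idx = 0  # position in the ungapped template sequence
    for a, b in zip(aligned_target, aligned_template):
        if a != gap and b != gap:
            mapping[t_idx] = p_idx
        elif a != gap:
            # Target residue with no template counterpart.
            unmatched.append(t_idx)
        if a != gap:
            t_idx += 1
        if b != gap:
            p_idx += 1
    return mapping, unmatched


# Made-up example: two target residues fall opposite a template gap.
mapping, unmatched = map_residues("MKTGAY-AK", "MK--AYLAK")
print(mapping)    # {0: 0, 1: 1, 4: 2, 5: 3, 6: 5, 7: 6}
print(unmatched)  # [2, 3]
```

Target positions listed in `unmatched` have no template coordinates to copy from and would have to be built by some other means.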
The quality of the homology model depends on the quality of the sequence alignment and of the template structure. The approach can be complicated by alignment gaps (commonly called indels), which indicate a structural region present in the target but not in the template, and by structural gaps in the template that arise from poor resolution in the experimental procedure (usually X-ray crystallography) used to solve the structure. Model quality declines with decreasing sequence identity: a typical model has a root mean square deviation (RMSD) of about 1-2 Å between matched Cα atoms at 70% sequence identity, but only 2-4 Å agreement at 25% sequence identity. However, errors are significantly larger in loop regions, where the amino acid sequences of the target and template proteins may be completely different.
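The Cα deviation quoted above is the square root of the mean squared distance between matched atom pairs. The sketch below shows that calculation under two stated assumptions: the two coordinate lists are already matched residue by residue, and the structures have already been optimally superposed (no fitting step is performed here). The function name `ca_rmsd` and the coordinates are hypothetical.

```python
import math


def ca_rmsd(model_ca, reference_ca):
    """RMSD in the input units (e.g. Å) between matched C-alpha coordinates.

    Assumes the two structures are already superposed and that
    model_ca[i] corresponds to reference_ca[i]; no fitting is done.
    """
    if len(model_ca) != len(reference_ca):
        raise ValueError("coordinate lists must have the same length")
    if not model_ca:
        raise ValueError("at least one atom pair is required")

    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model_ca, reference_ca):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model_ca))


# Made-up coordinates: every atom is displaced by 2 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(ca_rmsd(model, reference))  # 2.0
```

In practice the superposition is usually computed first, for example with a least-squares fit, so that the reported RMSD reflects differences in shape and not in placement.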
Regions of the model built without a template, usually by loop modeling, are generally much less accurate than the rest of the model. Errors in side-chain packing and positioning also increase with decreasing identity, and variations in these packing configurations have been suggested as a major reason for poor model quality at low identity. Taken together, these atomic-position errors are significant and impede the use of homology models for purposes requiring atomic-resolution data, such as drug design and protein-protein interaction prediction; even the quaternary structure of a protein may be difficult to predict from homology models of its subunit(s). Nevertheless, homology models can be useful for reaching qualitative conclusions about the biochemistry of the query sequence, especially in formulating hypotheses about why certain residues are conserved, which may in turn lead to experiments to test those hypotheses. For example, the spatial arrangement of conserved residues may suggest whether a particular residue is conserved to stabilize the fold, to participate in binding a small molecule, or to promote association with another protein or a nucleic acid.
Homology modeling can produce high-quality structural models when the target and template are closely related, which has inspired the formation of a structural genomics consortium dedicated to producing representative experimental structures for all classes of protein folds. The main inaccuracies in homology modeling, which worsen with lower sequence identity, stem from errors in the initial sequence alignment and from improper template selection. Like other methods of structure prediction, current practice in homology modeling is assessed in a biennial large-scale experiment known as the Critical Assessment of Techniques for Protein Structure Prediction, or CASP.