HOMOLOGY MODELING

The most successful protein structure prediction method to date is homology modeling (also known as comparative modeling). The approach relies on the structural conservation of framework regions among the members of a protein family. Because 3D structure is more conserved in evolution than sequence, even the best sequence alignment methods frequently fail to identify the regions that share the required level of structural similarity. The quality of the alignment is therefore the single most important factor determining the accuracy of the 3D model.
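Because the alignment drives model accuracy, a common first check on a candidate template is how much of the target it covers and how many aligned positions are identical. The following is a minimal sketch of those two measures, not part of any particular modeling package. It assumes the alignment is given as two equal-length strings with "-" as the gap character; the function names and the example sequences are hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    aligned_cols = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue  # gapped columns do not count as aligned positions
        aligned_cols += 1
        if a.upper() == b.upper():
            matches += 1
    if aligned_cols == 0:
        return 0.0
    return 100.0 * matches / aligned_cols


def target_coverage(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Fraction of target residues that are aligned to a template residue."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    target_residues = sum(1 for a in aligned_target if a != gap)
    covered = sum(
        1 for a, b in zip(aligned_target, aligned_template) if a != gap and b != gap
    )
    return covered / target_residues if target_residues else 0.0


# Hypothetical toy alignment (not real protein sequences).
example_target = "MKT-AYIAK"
example_template = "MKTWAY-AR"
example_identity = percent_identity(example_target, example_template)
example_coverage = target_coverage(example_target, example_template)
```

High identity over a short aligned stretch can still give a poor model, which is why coverage is reported alongside identity here. Neither number captures whether the aligned regions are structurally equivalent, which is the harder problem described above.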

TEMPLATE-BASED MODELING FOR PROTEIN-PROTEIN INTERACTIONS

Relatively few protein complexes are available in the Protein Data Bank (PDB) compared with structures of their individually crystallized components. Because the prediction of protein complexes remains a challenging and active field, there is a continued need for docking and modeling tools that offer high-quality predictions for a variety of target complexes. Strategies for predicting protein complexes typically fall into two categories: free docking and template-based modeling.

A general comparison between template-based and free docking approaches to the prediction of an example heterodimer target. The template-based method in (a) begins with the target sequence, which is used to perform a template search in hopes of identifying an existing heterodimer template in the PDB. Homology modeling is then used to graft the target sequence onto the structure of the template (shown in red and blue). The final homology model is shown in cyan and magenta. The free docking method in (b) starts with the individually crystallized component proteins. After billions of protein conformations are evaluated using an FFT-based algorithm, the final models of the heterodimer are ranked and minimized. Figure from K. Porter, I. Desta, et al. (2019).

When good templates are available, homology modeling provides higher-accuracy predictions than free docking. When no templates are available, however, free docking is necessary. To complement ClusPro's free docking capabilities, we have developed ClusPro TBM, which allows users to input target sequences for both homomeric and heteromeric complexes. A database of template structures is searched, and templates that meet the target's stoichiometric criteria are modeled, offering users multiple potential structures for their desired targets. The approach was first implemented and tested during the CASP13/CAPRI competition, where it performed well compared to other template-based modeling approaches. The addition of ClusPro TBM greatly expands the ability of ClusPro to produce accurate predictions when only the sequence of a target is known. We continue to explore methods for identifying and scoring candidate templates, as well as for improving the docking of homology models, which is necessary when no template complex is available.
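The stoichiometry check described above can be sketched as a simple filter over template search hits. This is an illustration only, not the ClusPro TBM implementation: the `TemplateHit` record, the function names, and the template identifiers are hypothetical, and the rule used here (a template qualifies if it has at least the required number of chains for each target subunit) is an assumption about what "meeting the stoichiometric criteria" means.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TemplateHit:
    """Hypothetical record for one template complex returned by a search."""
    template_id: str
    # Maps each template chain ID to the name of the target subunit it matched.
    chain_to_target: Dict[str, str]


def meets_stoichiometry(hit: TemplateHit, required: Dict[str, int]) -> bool:
    """Assumed rule: the template has at least the required copies of each subunit."""
    counts = Counter(hit.chain_to_target.values())
    return all(counts.get(subunit, 0) >= n for subunit, n in required.items())


def filter_templates(hits: List[TemplateHit], required: Dict[str, int]) -> List[TemplateHit]:
    """Keep only the hits that satisfy the target stoichiometry, preserving order."""
    return [hit for hit in hits if meets_stoichiometry(hit, required)]


# Hypothetical heterodimer target: one copy each of subunits "A" and "B".
required_stoichiometry = {"A": 1, "B": 1}
example_hits = [
    TemplateHit("tmpl_1", {"H": "A", "L": "B"}),          # one A, one B: kept
    TemplateHit("tmpl_2", {"X": "A"}),                    # no chain matching B: dropped
    TemplateHit("tmpl_3", {"P": "A", "Q": "A", "R": "B"}),  # extra A chain: kept
]
kept_ids = [hit.template_id for hit in filter_templates(example_hits, required_stoichiometry)]
```

Under the "at least" rule, a template with extra chains (such as `tmpl_3`) is retained, on the assumption that the surplus chains could be discarded before modeling. An exact-match rule would replace `>=` with `==` and also reject templates containing unmatched subunits.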