Protein Folds and Protein Fold Classification
Before going into the details of protein fold classification, one could ask: why do we need to bother with this? In general terms, fold assignment is one of the basic steps in any effort towards understanding protein structure and function. Protein fold assignment will often reveal evolutionary relationships that are difficult to detect at the sequence level. In turn, this may lead to a better understanding of a protein's function, its biological activity and its role in a living organism. From the study of the relationships between sequence and structure we may also gain deeper insight into the basic principles of protein structure and function, and we may learn how to design new proteins with pre-defined activity, how to modify existing proteins in a desired direction, and so on.
I mentioned in the introduction to this chapter that the amino acid sequence determines the three-dimensional structure of a protein. I also mentioned that this relationship is not unique: different sequences, sometimes totally unrelated, may have similar 3D structures. In other words, the degree of conservation of the three-dimensional structure is much higher than the degree of conservation of the amino acid sequence. Earlier I discussed sequence conservation and the methods we use to assess the similarity between two sequences. But how do we compare protein 3D structures to find out whether they are similar? And how different can similar protein structures be? Obviously, there should be some criteria that can be used to judge the degree of similarity between protein structures, which is an important step in any protein fold assignment. Some discussion of this subject may be found in the homology modeling chapter.
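To make the idea of a similarity criterion concrete, here is a minimal sketch of one commonly used measure, the distance RMSD (dRMSD), which compares the internal distances of two structures and therefore needs no prior superposition. It is only an illustration, not necessarily the measure used elsewhere in this text. It assumes that the two coordinate lists hold C-alpha positions in Angstrom and that the residues have already been matched one-to-one; the function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return the distances between all pairs of (x, y, z) points."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(coords_a, coords_b):
    """Root-mean-square difference between the internal distances
    of two equally long, residue-matched coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("structures must have the same number of points")
    if len(coords_a) < 2:
        raise ValueError("at least two points are needed")
    dist_a = pairwise_distances(coords_a)
    dist_b = pairwise_distances(coords_b)
    squared = sum((x - y) ** 2 for x, y in zip(dist_a, dist_b))
    return math.sqrt(squared / len(dist_a))


# Toy example (made-up coordinates, not taken from a real protein).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 1.0)]
# The same shape, rotated by 90 degrees about z and then translated.
moved = [(-y + 5.0, x - 2.0, z + 1.0) for x, y, z in reference]
# A uniformly stretched copy, which changes every internal distance.
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in reference]

print(distance_rmsd(reference, moved))      # essentially zero
print(distance_rmsd(reference, stretched))  # clearly larger than zero
```

A value close to zero means that the two sets of points have the same internal geometry regardless of how they are oriented in space. Note that a measure based only on distances cannot tell a structure from its mirror image, and that any threshold for calling two structures "similar" has to be chosen separately.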
For now I would like to switch from the general term "3D structure" to the more specific term "fold". Three-dimensional structures may sometimes differ substantially from each other, at the sequence level and even at the structural level, but still have the same type of fold. I have noticed that students sometimes have difficulty understanding what a fold is. A simple definition would be that a fold is a particular arrangement of secondary structure elements in space. I have actually mentioned some folds in the previous section on super-secondary structures (protein motifs). The 4-helix bundle, for example, is a fold. So is the TIM barrel, a fold of alternating helices and strands. The figure below shows the coenzyme-binding domain of some dehydrogenases, one of the most common protein folds, also called the Rossmann fold. Michael G. Rossmann is a protein crystallographer who solved the first structure with this type of fold. It is also the only protein fold named after the person who was first to discover it:
In this figure, the left panel gives a schematic representation of the Rossmann fold, and the right panel shows the nucleotide-binding domain of liver alcohol dehydrogenase. Notice the parallel beta-sheet (shown in yellow).
Growth of unique protein classifications per year, as folds defined by SCOP and as topologies defined by CATH
SCOP and CATH are the two databases generally accepted as the main authorities in the world of fold classification. According to SCOP there are 1393 different folds. Notice also in the graph that the last time a new fold was identified was in 2008:
The next graph shows the folds identified by the CATH database, a total of 1282 folds: http://www.pdb.org/pdb/statistics/contentGrowthChart.do?content=fold-cath
Apparently the two databases use slightly different fold definitions and classification schemes, which results in different total numbers of protein folds. It is also interesting to note that in recent years essentially no new folds have emerged. Have we reached the limit? I don't know for sure, but I think there is still a chance that some new folds will be found.
Since many proteins contain several domains with different 3D structures, one could ask: what is actually being classified by these databases? The answer is the "simplest", or as it is sometimes called, "independent" folding unit in a protein: a domain. Knowing the protein fold is important in many cases, for example during homology modeling. There we need to have a clear idea of the fold of the protein, and if the protein contains several domains with different folds, we need to know the fold of each domain to be able to model it properly. But before that we will have a brief discussion on how to define a domain.