Protein folds and protein fold classification
Before going into the details of protein fold classification one could ask: why do we need to bother with this? In general terms, fold assignment is one of the basic steps in any effort towards understanding protein structure and function. Protein fold assignment will often reveal evolutionary relationships that are difficult to detect at the sequence level. In turn, this may lead to a better understanding of a protein's function, its biological activity and its role in a living organism. From the study of the relationships between sequence and structure we may also gain deeper insight into the basic principles of protein structure and function, and we may learn how to design new proteins with pre-defined activity, how to modify existing proteins in a direction we need, and so on.
I mentioned in the introduction to this chapter that the amino acid sequence determines the three-dimensional structure of a protein. I also mentioned that this relationship is not unique: different sequences, sometimes totally unrelated, may have similar 3D structures. In other words, the three-dimensional structure is much more conserved than the amino acid sequence. Earlier I discussed sequence conservation and the methods we use to assess the similarity between two sequences. But how do we compare protein 3D structures to find out whether they are similar? And how different can similar protein structures be? Obviously, there should be some criteria for judging the degree of similarity between protein structures, which is an important step in any protein fold assignment. A discussion of this subject will appear shortly in the homology modeling chapter.
For now I would like to switch from talking about 3D structures to talking about folds. Three-dimensional structures may sometimes differ substantially from each other and still have the same type of protein fold. I have noticed that students sometimes have difficulty understanding what a fold is. I would define a protein fold as a certain type of arrangement of secondary structure elements in space. I have actually already mentioned some folds on the previous page on super-secondary structures (protein motifs). The 4-helix bundle, for example, is a fold. So is the TIM barrel, a fold of alternating helices and strands.
The figure below shows the coenzyme-binding domain of some dehydrogenases, one of the most common protein folds, also called the Rossmann fold. Michael G. Rossmann is a protein crystallographer who solved the first structure with this type of fold. It is also the only protein fold named after the person who was first to discover it:


In this figure, the left panel is a schematic presentation of the Rossmann fold. The right panel shows the nucleotide-binding domain of liver alcohol dehydrogenase. Notice the parallel beta-sheet (shown in yellow).
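The definition of a fold as an arrangement of secondary structure elements can be made a little more concrete with a small sketch. The two per-residue strings below are invented for illustration (H = helix, E = strand, - = coil, the kind of codes a secondary structure assignment program such as DSSP produces), and the function names are hypothetical. The sketch only compares the order of the elements along the chain. A real fold assignment must also consider how the elements are packed in space, so this is a simplification and not the method used by any fold database.

```python
from itertools import groupby


def sse_string(per_residue: str) -> str:
    """Collapse a per-residue string into an ordered list of elements.

    Each run of 'H' becomes one helix, each run of 'E' becomes one strand;
    every other character is treated as coil and dropped.
    """
    elements = []
    for state, _run in groupby(per_residue):
        if state in ("H", "E"):
            elements.append(state)
    return "".join(elements)


def is_alternating_strand_helix(elements: str) -> bool:
    """True if the elements alternate strand, helix, strand, ... (e.g. EHEHE)."""
    return len(elements) >= 3 and all(
        e == ("E" if i % 2 == 0 else "H") for i, e in enumerate(elements)
    )


# Two made-up assignments with different lengths and element sizes.
protein_a = "--EEEEE---HHHHHHHHHH--EEEE---HHHHHHHH---EEEEE-"
protein_b = "EEEE-HHHHHHHHHHHHH----EEEEEE--HHHHHHHHHHH-EEE"

print(sse_string(protein_a), sse_string(protein_b))
print(is_alternating_strand_helix(sse_string(protein_a)))
```

Both strings collapse to the same strand-helix-strand-helix-strand order even though the individual elements differ in length, which is the sense in which quite different structures can still share one type of fold.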
There are many types of protein folds, of course, but how many? Taking into account the huge number of amino acid sequences, one would expect a large number of different folds. In reality this is not the case: the number of folds is limited. Nature has re-used the same fold types again and again to perform totally new functions. Some people would point to the common ancestor from which all other organisms originated. However, I am not going to discuss this now; maybe sometime in the future. To find out how many folds are out there, we can simply go to the Protein Data Bank (PDB) and click PDB Statistics in the upper right corner. At the end of the page that appears you may click one of the following two:
Growth Of Unique Protein Classifications Per Year
• As Folds Defined By SCOP
• As Topologies Defined By CATH
SCOP and CATH are generally accepted as the two main authorities in the world of fold classification. According to SCOP there are 1393 different folds. Also notice in the graph that the last time a new fold was identified was in 2008:

The next graph shows the folds identified by the CATH database, a total of 1233 folds:

Apparently the two databases use slightly different fold definitions and classification schemes, which results in a different number of protein folds for the same set of protein structures. In any case, as mentioned in the outline of this chapter, knowing the protein fold is important in many situations, for example during homology modeling of a protein structure. The question now is: how do we identify a protein fold? What is the main folding unit? It is the domain, discussed on the next page.