Protein Domains and Domain Classification
When discussing protein folds and evolution, we first need to identify the folding unit. This unit is the protein domain, so when we talk about fold classification we actually mean DOMAIN CLASSIFICATION. A domain is the basic building block of protein structure; many proteins contain only a single domain, while others are built from several. Some protein domains have a clearly defined function associated with them, such as the Rossmann-type domain, also called the coenzyme-binding domain, shown on the previous page. They “carry” this function with them when they are inserted into different proteins during evolution. A domain may be characterized by the following:
1- It is a spatially separated unit of the protein structure.
2- It may have sequence and/or structural resemblance to another protein structure or domain.
3- It may have a specific function associated with it.
To characterize the folding of protein domains, we need to discuss the details of fold classification: how folds are defined by different databases, how a fold relates to a protein family, and so on. Classification follows this scheme:
1- Assignment of secondary structure.
2- Assignment of independent folding units - domains.
3- Assignment of a structural class to each domain.
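4- Assignment of architecture (the overall spatial arrangement of secondary structure elements, as defined in the CATH database).
5- Assignment of topology (the fold level in CATH), followed by the homologous superfamily.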
Secondary structure is usually assigned automatically by dedicated software. For example, most protein structure visualization programs, such as Swiss-PdbViewer, will do it, and most PDB files already contain a secondary structure assignment (shown in a later section).
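As a rough illustration of where this assignment lives in a PDB file, the sketch below reads residue ranges from HELIX and SHEET records. It assumes the standard fixed-column PDB layout; the function name and the two sample records are made up for illustration and are not taken from a real entry.

```python
def parse_secondary_structure(pdb_lines):
    """Return (kind, chain, first_residue, last_residue) for each HELIX/SHEET record."""
    elements = []
    for line in pdb_lines:
        record = line[0:6].strip()
        if record == "HELIX":
            # Chain ID in column 20, start residue in columns 22-25, end residue in 34-37.
            elements.append(("helix", line[19], int(line[21:25]), int(line[33:37])))
        elif record == "SHEET":
            # Chain ID in column 22, start residue in columns 23-26, end residue in 34-37.
            elements.append(("strand", line[21], int(line[22:26]), int(line[33:37])))
    return elements


# Illustrative records (hypothetical, not taken from a real PDB entry).
sample = [
    "HELIX    1  HA GLY A   86  GLY A   94  1",
    "SHEET    1   A 5 THR A 107  ARG A 110  0",
]

for element in parse_secondary_structure(sample):
    print(element)
```

Fixed-column slicing is used instead of splitting on whitespace because PDB fields can run together or be blank, so column positions are the more reliable guide.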
For information on the number of domains and their folding class we need to consult CATH or SCOP, two databases dedicated to fold classification. Be aware, however, that CATH and SCOP use slightly different terminology in domain assignment. The name CATH comes from the first letters of Class, Architecture, Topology and Homologous superfamily. As of the end of September 2012, CATH contained 173536 domains. For clarity I show below examples of two proteins: one consists of a single domain, while the other has 3 domains. On the left is the structure of one of the subunits of hemoglobin and on the right is the structure of pyruvate kinase:


The domains in pyruvate kinase are well separated from each other. The top domain in the figure is built from beta-sheets, while the other two domains contain a mixture of helices and strands. For illustration, the figure below shows the quaternary structure of pyruvate kinase (right) and hemoglobin (left):


From the table above we can see that there are 4 chains, designated A, B, C and D (4 identical subunits in the quaternary structure). Each domain has its own ID. In 1e0tA01, for example, the PDB entry code (1e0t) comes first, followed by the chain identifier (A) and the domain number (01). You may notice that there are 3 domains in total in each chain: 01, 02 and 03. Small icons with CATH-generated ribbon representations are also shown in the column on the right.
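The naming convention described here can be sketched in a few lines of Python. The sketch assumes the seven-character form used in the text (four-character PDB code, one-character chain, two-digit domain number); the class and function names are hypothetical and not part of any CATH tool.

```python
from typing import NamedTuple


class CathDomainId(NamedTuple):
    pdb_code: str  # four-character PDB entry code, e.g. "1e0t"
    chain: str     # one-character chain identifier, e.g. "A"
    domain: int    # domain number within the chain, e.g. 1 for "01"


def parse_cath_domain_id(domain_id: str) -> CathDomainId:
    """Split an ID such as '1e0tA01' into PDB code, chain and domain number."""
    if len(domain_id) != 7 or not domain_id[5:].isdigit():
        raise ValueError(f"unexpected domain ID format: {domain_id!r}")
    return CathDomainId(domain_id[:4], domain_id[4], int(domain_id[5:]))


# The three domains of chain A, using the IDs mentioned in the text.
for name in ("1e0tA01", "1e0tA02", "1e0tA03"):
    print(parse_cath_domain_id(name))
```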
If we click on one of the IDs, for example the first one for domain 1, we get information about its classification: Class: Alpha Beta, Architecture: 3-Layer(ABA) Sandwich, Topology: Pyruvate Kinase. This information is highly valuable in homology modeling, especially when different domains need to be modeled using different templates, the so-called multi-template homology modeling. Click on the images and try it yourself!
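CATH also writes this hierarchy as a dotted numeric code, one number per level. The sketch below splits such a code into its four levels. The code 3.40.50.720 is used only as an example, and the function, class and dictionary names are assumptions made for illustration.

```python
from typing import NamedTuple

# Names of the main CATH classes, keyed by the first number of the code.
CATH_CLASS_NAMES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathCode(NamedTuple):
    cath_class: int    # C level, e.g. 3
    architecture: str  # C.A level, e.g. "3.40"
    topology: str      # C.A.T level, e.g. "3.40.50"
    superfamily: str   # full C.A.T.H code, e.g. "3.40.50.720"


def split_cath_code(code: str) -> CathCode:
    """Split a dotted CATH code into its four hierarchical levels."""
    parts = code.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(int(parts[0]), ".".join(parts[:2]), ".".join(parts[:3]), code)


example = split_cath_code("3.40.50.720")
print(CATH_CLASS_NAMES[example.cath_class], example.architecture,
      example.topology, example.superfamily)
```

Comparing two codes level by level gives a quick indication of how closely two domains are related, which may help when choosing a separate template for each domain.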
In the next section I will give a short overview of some protein databases that we will use later in the homology modeling project.

