Protein Domains and Domain Classification



When discussing protein folds and evolution, we first need to identify the folding unit. That unit is the protein domain, so when we talk about fold classification we actually mean DOMAIN CLASSIFICATION. A domain is the basic building block of a protein structure; many proteins contain only a single domain, while others are built from several. Certain protein domains have a clearly defined function associated with them, like the Rossmann-type domain, also called the coenzyme-binding domain, shown on the previous page. They “carry” this function with them when they are inserted into different proteins during evolution. A domain may be characterized by the following:

1- It is a spatially separated unit of the protein structure.
2- It may have sequence and/or structural resemblance to another protein structure or domain.
3- It may have a specific function associated with it.

To characterize the folding of protein domains, we need to discuss the details of fold classification: how folds are defined by different databases, what the relationship is between a fold and a protein family, etc. Classification follows this scheme:

1- Assignment of secondary structure.
2- Assignment of independent folding units - domains.
3- Assignment of a structural class to each domain.
4- Assignment of architecture (the overall arrangement of secondary structure elements, as defined in the CATH database).
5- Assignment of topology (the fold) and homologous superfamily.
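
Step 3 of the scheme can be illustrated with a small Python sketch that guesses a structural class from a per-residue secondary structure string. This is only a toy example: the 15% threshold and the function name are assumptions made for illustration, and real databases such as CATH and SCOP use their own, partly manual, criteria.

```python
from collections import Counter

# Hypothetical threshold chosen for illustration only; it is NOT the
# criterion used by CATH or SCOP.
MIN_FRACTION = 0.15  # minimum share of residues for helix or strand to count


def structural_class(ss: str) -> str:
    """Guess a structural class from a secondary structure string.

    `ss` has one character per residue: 'H' for helix, 'E' for strand,
    and any other character for coil.
    """
    if not ss:
        raise ValueError("empty secondary structure string")
    counts = Counter(ss)
    has_helix = counts["H"] / len(ss) >= MIN_FRACTION
    has_strand = counts["E"] / len(ss) >= MIN_FRACTION
    if has_helix and has_strand:
        return "alpha beta"
    if has_helix:
        return "mainly alpha"
    if has_strand:
        return "mainly beta"
    return "few secondary structures"


# Made-up secondary structure strings, not taken from real proteins.
print(structural_class("HHHHHHHH--HHHHHHH---HHHHHH"))
print(structural_class("-EEEEE--EEEEEE---EEEE-"))
print(structural_class("HHHHHH--EEEEE--HHHHH--EEEE"))
```

The class names follow the four CATH classes (mainly alpha, mainly beta, alpha beta, few secondary structures); everything else in the sketch is a simplification.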

Secondary structure is usually assigned automatically by dedicated software. For example, most protein structure visualization programs, like SwissPDB Viewer, will do it, and almost all PDB files already contain a secondary structure assignment (shown in a later section).
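
As a minimal sketch of how the assignment stored in a PDB file can be read, the Python snippet below collects helices and strands from the HELIX and SHEET records, using the fixed column positions of the PDB file format. The function and class names are hypothetical, and the two sample records are illustrative lines, not taken from the pyruvate kinase entry discussed below.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    kind: str    # "helix" or "strand"
    chain: str   # chain identifier, e.g. "A"
    start: int   # number of the first residue
    end: int     # number of the last residue


def secondary_structure_records(pdb_lines: List[str]) -> List[Element]:
    """Collect helices and strands from HELIX and SHEET records."""
    elements = []
    for line in pdb_lines:
        if line.startswith("HELIX "):
            # HELIX: chain in column 20, first residue in columns 22-25,
            # last residue in columns 34-37 (1-based columns).
            elements.append(
                Element("helix", line[19], int(line[21:25]), int(line[33:37]))
            )
        elif line.startswith("SHEET "):
            # SHEET: chain in column 22, first residue in columns 23-26,
            # last residue in columns 34-37 (1-based columns).
            elements.append(
                Element("strand", line[21], int(line[22:26]), int(line[33:37]))
            )
    return elements


# Illustrative records only (format examples, not from a specific entry).
sample = [
    "HELIX    1  HA GLY A   86  GLY A   94  1",
    "SHEET    1   A 5 THR A 107  ARG A 110  0",
]
for element in secondary_structure_records(sample):
    print(element)
```

To use it on a real file, pass the lines of the file, for example `secondary_structure_records(open(path).read().splitlines())`, where `path` is wherever you saved the PDB file.
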
For information on the number of domains and their folding class we need to consult CATH or SCOP, two databases dedicated to fold classification. Be aware, however, that CATH and SCOP use slightly different terminology in domain assignment. The name CATH comes from the first letters of Class, Architecture, Topology and Homologous superfamily. According to the latest information (end of September 2012), there are currently 173536 domains in CATH. For clarity, I show below examples of two proteins: one consists of a single domain, while the other has 3 domains. On the left is the structure of one of the subunits of hemoglobin and on the right is the structure of pyruvate kinase:

Figure: hemoglobin subunit (left) and pyruvate kinase (right)

Interestingly, the functional units of both proteins consist of 4 subunits; in other words, they have a quaternary structure. In the case of hemoglobin this makes 4 domains, while for pyruvate kinase there are 12 protein domains in the functional unit (4 subunits × 3 domains). A subunit of hemoglobin consists of a single α-helical domain. You can also see the heme molecule (in stick representation) bound within a pocket created by the α-helices.
The domains in pyruvate kinase are well separated from each other. The top domain in the figure is built from β-sheets, while the other two domains contain a mixture of helices and strands. For illustration, the figure below shows the quaternary structure of hemoglobin (left) and pyruvate kinase (right):


Figure: quaternary structures of hemoglobin (left) and pyruvate kinase (right)

Protein domains may be assigned using automatic procedures, often in combination with manual inspection. In pyruvate kinase, for example, the domains are well separated from each other, but in many cases an untrained person may find it difficult to separate them visually. In such cases the easiest approach is to consult the CATH database, which gives a clear definition of the domains. For example, when I search with the PDB ID I am using for pyruvate kinase (1e0t), I get the following result:

Figure: CATH search result for pyruvate kinase (PDB ID 1e0t)

From the table above we can see that there are 4 chains, designated A, B, C and D (4 identical subunits in the quaternary structure). Each domain identifier, for example 1e0tA01, starts with the PDB entry code (1e0t), followed by the chain identifier (A) and the domain number (01). You may notice that there are 3 domains in total in each chain: 01, 02 and 03. Small icons with CATH-generated ribbon representations are also shown in the column on the right.
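
The identifier layout described above can be sketched in a few lines of Python. The regular expression is an assumption based only on that description (a four-character PDB code, one chain character, a two-digit domain number), and the function names are hypothetical. The list of identifiers is rebuilt from the 4 chains and 3 domains mentioned in the text rather than downloaded from CATH.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class DomainId(NamedTuple):
    pdb_code: str  # four-character PDB entry code, e.g. "1e0t"
    chain: str     # chain identifier, e.g. "A"
    domain: int    # domain number within the chain, e.g. 1


# Assumed layout: PDB code (digit + 3 alphanumerics), chain, 2-digit domain.
_PATTERN = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])([0-9]{2})$")


def parse_domain_id(identifier: str) -> DomainId:
    """Split an identifier such as '1e0tA01' into its three parts."""
    match = _PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"unexpected domain identifier: {identifier!r}")
    pdb_code, chain, domain = match.groups()
    return DomainId(pdb_code, chain, int(domain))


def domains_per_chain(identifiers: Iterable[str]) -> Dict[str, List[int]]:
    """Group domain numbers by chain identifier."""
    grouped = defaultdict(list)
    for identifier in identifiers:
        parsed = parse_domain_id(identifier)
        grouped[parsed.chain].append(parsed.domain)
    return dict(grouped)


# Identifiers for chains A-D with domains 01-03, as described in the text.
ids = [f"1e0t{chain}{number:02d}" for chain in "ABCD" for number in (1, 2, 3)]
by_chain = domains_per_chain(ids)
print(by_chain)
print(sum(len(domains) for domains in by_chain.values()))  # 12 domains in total
```

The total of 12 matches the count given earlier for the functional unit of pyruvate kinase: 4 subunits with 3 domains each.
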
If we click on one of the IDs, for example the first one for domain 1, we get information about its classification: Class: Alpha Beta, Architecture: 2-Layer(ABA) Sandwich, Topology: Pyruvate Kinase. This information is highly valuable in homology modeling, especially when we need to model different domains using different templates, the so-called multi-template homology modeling. Click on the images and try it yourself!

Figure: CATH classification page for a pyruvate kinase domain

In the next section I will give a short overview of some protein databases that we are going to use later in the homology modeling project.
