Protein Domains and protein domain classification, the CATH database
When discussing protein folds, we first need to identify the folding unit. Such a unit is called a protein domain. This means that when we talk about fold classification we actually mean DOMAIN CLASSIFICATION. Protein domains are the basic building blocks of protein structure, and they are also its basic evolutionary units. A protein may consist of a single domain or of several domains (a multi-domain protein). Certain protein domains have a function associated with them, like the Rossmann-type domain, also called the coenzyme-binding domain, shown on the previous page. They “carry” this function with them when they are inserted into different proteins during evolution. A domain may be characterized by the following:
1- A spatially separated unit of the protein structure
2- May have sequence and/or structural resemblance to another protein structure or domain.
3- May have a specific function associated with it.
To characterize the folding of protein domains, we need to discuss the details of fold classification: how folds are defined by different databases, what the relationship is between a fold and a protein family, and so on. Classification usually follows this scheme:
1- Assignment of secondary structure.
2- Assignment of independent folding units: Domains.
3- Assignment of a structural class.
4- Assignment of architecture (the overall arrangement of the secondary structure elements).
5- Assignment of topology (the fold), followed by the homologous superfamily.
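The last levels of such a scheme are usually written compactly. As a minimal sketch, assuming CATH's dotted four-number codes (Class.Architecture.Topology.Homologous superfamily), the Python below splits a code into its levels. The function names are my own and are not part of any CATH software. The example code 3.40.50.720 is, to my knowledge, the superfamily of the NAD(P)-binding Rossmann-like domain.

```python
from __future__ import annotations

from typing import NamedTuple

# Names of the four main CATH classes (the "C" level of the hierarchy).
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathCode(NamedTuple):
    klass: int          # C: class
    architecture: int   # A: architecture
    topology: int       # T: topology (fold)
    superfamily: int    # H: homologous superfamily


def parse_cath_code(code: str) -> CathCode:
    """Split a dotted code such as '3.40.50.720' into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four dot-separated integers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def lineage(code: str) -> list[str]:
    """Return the code truncated at each level: C, C.A, C.A.T, C.A.T.H."""
    parts = code.strip().split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


if __name__ == "__main__":
    parsed = parse_cath_code("3.40.50.720")
    print(parsed, CATH_CLASSES.get(parsed.klass, "other class"))
    print(lineage("3.40.50.720"))
```

Two domains that share the first three numbers have the same topology (fold), even if they belong to different homologous superfamilies.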
Secondary structure is usually assigned automatically by dedicated computer programs. For example, most protein structure visualization programs will do it, and most PDB files already contain a secondary structure assignment.
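To see what such an assignment looks like in practice, here is a minimal sketch in Python that reads the HELIX and SHEET records of a legacy PDB-format file. It assumes the fixed-column layout of the PDB file format (chain identifier and residue numbers at fixed positions) and ignores insertion codes. The names Element and secondary_structure are my own. Files in mmCIF format store the same information differently and are not handled here.

```python
from __future__ import annotations

from typing import NamedTuple


class Element(NamedTuple):
    kind: str    # "helix" or "strand"
    chain: str   # chain identifier, e.g. "A"
    start: int   # first residue number of the element
    end: int     # last residue number of the element


def secondary_structure(pdb_text: str) -> list[Element]:
    """Collect helices and strands from the HELIX and SHEET records."""
    elements = []
    for line in pdb_text.splitlines():
        if line.startswith("HELIX "):
            # HELIX: chain in column 20, residue numbers in columns 22-25 and 34-37
            elements.append(
                Element("helix", line[19], int(line[21:25]), int(line[33:37]))
            )
        elif line.startswith("SHEET "):
            # SHEET: chain in column 22, residue numbers in columns 23-26 and 34-37
            elements.append(
                Element("strand", line[21], int(line[22:26]), int(line[33:37]))
            )
    return elements
```

Calling secondary_structure(open("1e0t.pdb").read()) on a downloaded file (the file name is only an example) would return one entry per helix or strand, which is enough to estimate whether a region is mainly alpha, mainly beta or mixed.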
For information on the fold of a protein domain we can simply consult the CATH and SCOP databases, although one needs to be aware that the two use slightly different terminology in domain assignment. The name CATH comes from the first letters of Class, Architecture, Topology and Homologous superfamily. For clarity, I show below some examples of proteins consisting of one or several domains:
On the left is the structure of one of the subunits of hemoglobin and on the right is the structure of pyruvate kinase. The functional units of both proteins consist of 4 subunits; in other words, they have a quaternary structure. For hemoglobin this makes 4 domains in the functional unit, while for pyruvate kinase, with 3 domains per subunit, there are 12. A subunit of hemoglobin consists of a single alpha-helical domain. You can also see the heme molecule (in stick representation) bound within a pocket created by the helices.
The domains in pyruvate kinase are well separated from each other. The top domain in the figure is built up of beta-sheets, while the other two domains contain a mixture of helices and sheets.
For illustration, the figure below shows the quaternary structure of pyruvate kinase:
Protein domains may be assigned using automatic procedures, often in combination with manual inspection. In pyruvate kinase, for example, the domains are well separated from each other, but in many cases it may be difficult for an untrained person to separate them visually. In such cases the easiest approach is to consult the CATH database, which gives a clear definition of the domains. For example, when I search with the PDB ID of the pyruvate kinase structure I am using (1e0t), I get the following result:
The protein domains are organized in rows in the table above. There are 4 subunits (4 separate polypeptide chains) in the quaternary structure, which is why the table shows the chain identifiers A, B, C and D. For example, in 1e0tA01 we first have the PDB entry code (1e0t), followed by the chain identifier (A) and the domain number (01), as numbered by the database. You may also notice that each chain has 3 domains: 01, 02 and 03. A CATH-generated ribbon representation of the structure of the 3 protein domains is shown below:
There is also a table telling us which amino acid residues each domain consists of (start PDB residue-stop PDB residue), together with a schematic presentation of the domain composition. This information is very valuable, for example when you make a homology model of a multidomain protein.
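As a minimal sketch of how the domain identifiers and the residue table can be used together, the Python below splits an identifier such as 1e0tA01 into its parts and looks up which domain a given residue belongs to. The function names are my own, and the residue boundaries in EXAMPLE_SEGMENTS are HYPOTHETICAL placeholders, not the real CATH boundaries for 1e0t; copy the actual start and stop residues from the CATH table before using it. A domain is stored as a list of segments because a domain may be built from more than one stretch of the chain.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class DomainId(NamedTuple):
    pdb_id: str   # four-character PDB entry code, e.g. "1e0t"
    chain: str    # chain identifier, e.g. "A"
    domain: str   # two-digit domain number kept as text, e.g. "01"


# PDB code (digit + 3 alphanumerics), one chain character, two digits.
_DOMAIN_ID = re.compile(r"^([0-9][A-Za-z0-9]{3})([A-Za-z0-9])([0-9]{2})$")


def parse_domain_id(identifier: str) -> DomainId:
    """Split an identifier such as '1e0tA01' into entry, chain and domain."""
    match = _DOMAIN_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"not a domain identifier: {identifier!r}")
    return DomainId(*match.groups())


# HYPOTHETICAL boundaries, for illustration only (not taken from CATH).
EXAMPLE_SEGMENTS = {
    "01": [(1, 70), (171, 345)],   # a domain made of two segments
    "02": [(71, 170)],
    "03": [(346, 470)],
}


def domain_for_residue(pdb_id: str, chain: str, residue: int,
                       segments: dict) -> str | None:
    """Return the domain identifier containing the residue, or None."""
    for domain, ranges in segments.items():
        if any(start <= residue <= stop for start, stop in ranges):
            return f"{pdb_id}{chain}{domain}"
    return None


if __name__ == "__main__":
    print(parse_domain_id("1e0tA01"))
    print(domain_for_residue("1e0t", "A", 200, EXAMPLE_SEGMENTS))
    # 4 chains x 3 domains gives the 12 domains of the functional unit.
    print([f"1e0t{c}{d}" for c in "ABCD" for d in EXAMPLE_SEGMENTS])
```

With the real boundaries filled in, a lookup like this makes it easy to cut a multidomain sequence into per-domain pieces, for example to choose a separate template for each domain in a homology model.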
To keep the length of this page within reasonable Web limits, I will continue the discussion of protein domains later and show some interesting examples. As usual, you may always go back to the outline of the protein structures chapter if you want to jump to other pages.