Protein Three-Dimensional Structure: Structural Levels, Motifs, Folds and Protein Databases

Short introduction
To understand the basic principles of protein three-dimensional structure and its potential uses in academic and industrial research, for example in the pharmaceutical and biotech industries, we first need to look at how the four levels of protein structure are related to each other. The structural levels depend on one another, and together they create an extremely complex network of interactions between hundreds or thousands of atoms, packed into secondary, tertiary and quaternary structure according to certain rules.

The first level of structure is the amino acid sequence. There are 20 amino acids most commonly found in proteins. The sequence of these amino acids in a polypeptide chain determines the folding of the chain: which types of secondary structure elements are present and how they are arranged in space, creating structural motifs and folds. An independent folding unit of a three-dimensional protein structure is called a domain. It is independent because domains can often be cloned and expressed separately from the rest of the protein, and they may even show activity, if any known activity is ascribed to them. Some proteins contain a single domain, while others contain several. Each protein domain is assigned a certain type of fold. Domains with the same fold may or may not be evolutionarily related to each other, even though their three-dimensional structures are similar. This is simply because Nature appears to have re-used the same fold many times in totally different contexts.

The next level is the quaternary structure. The quaternary structure, sometimes called oligomeric structure, consists of several polypeptide chains (subunits), which may be identical or different. The subunits within such structures interact with each other, may contribute to an active site (or sites) within the complex, contribute to the dynamics of the complex, and may interact with target proteins that are not themselves part of the complex.

The protein three-dimensional structures currently available in the Protein Data Bank have been classified into more than 1000 unique folds, and discussing all of them here is impossible. However, once we understand the basic principles, the rest will be easier. In general, the relationship between the amino acid sequence and the tertiary structure is rather complex, since huge variation in the amino acid sequence can be tolerated within a particular type of three-dimensional structure. In other words, tertiary (3D) structure is much more highly conserved than the amino acid sequence. In many cases, determining the structure of a protein may help reveal its function by allowing the protein to be assigned to a certain protein family. An interesting example from my own work was provided by the anaerobic cobaltochelatase, an enzyme active in vitamin B12 synthesis. Although in this case the function of the protein was known before the structure determination (Schubert et al., 1999), its structural similarity to ferrochelatase (Al-Karadaghi et al., 1997), an enzyme active in heme biosynthesis, could only be revealed after the structure of cobaltochelatase had been determined. The reason is that there is only 11% sequence identity between the two proteins, well below the so-called "homology threshold" (around 20-25%), the level of identity normally taken in sequence alignment as an indication of homology, that is, of a common evolutionary origin.
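To make the percent identity figure concrete, here is a minimal Python sketch of how such a number could be computed from two sequences that have already been aligned. The function name `percent_identity` and the toy sequences are hypothetical, and the choice of denominator (only columns where neither sequence has a gap) is an assumption; alignment programs differ on this point, so the same pair of proteins can yield slightly different percentages.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Both inputs must come from the same pairwise alignment, so they have
    equal length. Assumption: gapped columns are excluded from the count.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns holding the same residue
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Made-up aligned fragments, for illustration only.
toy_a = "MKT-AYIAK"
toy_b = "MRTQAY-SK"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

A value computed this way is only as meaningful as the alignment behind it. At identities as low as the 11% mentioned above, the alignment itself becomes unreliable, which is one reason the structural similarity could not be detected from sequence alone.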

Jöns Jacob Berzelius (b. 1779), the most famous Swedish scientist, coined the word "protein".



In the following pages of this chapter I will discuss the details and determinants of the different levels of protein structure. These may be divided into three groups:

Amino acid sequence, torsion angles, the Ramachandran plot and protein secondary structure
Here I briefly discuss the different classes of amino acids and their characteristic properties, and the torsion angles, which essentially determine the secondary structure elements and the folding of the polypeptide chain. I will also discuss the importance of the Ramachandran plot in assessing the quality of protein structures.
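As a small preview of what a torsion angle is in practice, the sketch below computes the dihedral angle defined by four points in space, using the standard atan2 formulation. The function name `dihedral_degrees` and the test coordinates are hypothetical; real coordinates would come from a structure file.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Sequence[float], b: Sequence[float]) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3) -> float:
    """Torsion angle (degrees, in the range -180 to 180) about the p1-p2 bond."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: the fourth point is rotated a quarter turn about the
# central bond relative to the first.
print(dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For the polypeptide backbone, phi of residue i is the torsion over the atoms C(i-1), N(i), CA(i), C(i), and psi is the torsion over N(i), CA(i), C(i), N(i+1). A Ramachandran plot is a scatter of these (phi, psi) pairs for all residues of a structure.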

Protein motifs, folds and domains
In this section I will discuss the next level of complexity in a protein structure: the arrangement of secondary structure elements into certain patterns in space, called protein motifs or structural motifs. The same motif may be found in different protein families, and its presence requires neither conservation of the amino acid sequence within the motif nor an evolutionary relationship between the proteins. I will also discuss protein folds and protein domains. Protein domains form the basis for protein structure classification. Although newly determined protein structures are usually deposited with the Protein Data Bank (PDB), two separate databases, SCOP and CATH, are dedicated to fold classification.

Protein databases, overview of protein databank files
Here I will give an introduction to the Protein Data Bank, where all known protein three-dimensional structures are deposited, as well as to some other protein structure databases. From teaching experience I find that it is better to keep the number of databases discussed initially to a minimum, to give students the opportunity to get acquainted with the basic ideas and concepts. You can always find other resources on the Internet once you know what you are looking for.
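To give a first idea of what a Protein Data Bank file contains, the sketch below reads a single ATOM record from a legacy PDB-format file. In that format each field occupies fixed columns, so the record is parsed by slicing rather than by splitting on whitespace. The `Atom` type, the function name `parse_atom_line` and the example record are hypothetical, and only a subset of the fields is read.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using fixed PDB column positions."""
    if line[:6] not in ("ATOM  ", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),       # columns 7-11: atom serial number
        name=line[12:16].strip(),     # columns 13-16: atom name
        res_name=line[17:20].strip(), # columns 18-20: residue name
        chain_id=line[21],            # column 22: chain identifier
        res_seq=int(line[22:26]),     # columns 23-26: residue number
        x=float(line[30:38]),         # columns 31-38: x coordinate
        y=float(line[38:46]),         # columns 39-46: y coordinate
        z=float(line[46:54]),         # columns 47-54: z coordinate
    )


# A made-up record, assembled field by field so that the columns line up.
example_line = (
    f"{'ATOM':<6}{1:>5} {' N  '}{' '}{'MET'} {'A'}{1:>4}{' '}   "
    f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}{1.00:6.2f}{9.67:6.2f}"
)

print(parse_atom_line(example_line))
```

For real work an established parser is a better choice than hand-written slicing; the point here is only to show that each atom in a deposited structure is stored as a name, a residue, a chain and three coordinates.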

An example of a quaternary protein structure. The figure shows the complex of two of the subunits of the enzyme magnesium chelatase. The structure was obtained by single-particle reconstruction from cryo-electron microscopy (cryo-EM) images of the complex. Where appropriate, the available X-ray 3D structure of subunit BchI of the enzyme was docked into the EM density (shown in ribbon representation). Other domains were homology-modeled based on known 3D structures of other proteins. Published in Lundqvist et al., Structure 2010.

Basics of Protein Structure