Protein sequence, sequence alignment and sequence analysis, amino acid substitution and replacement (substitution) matrices
Amino acid sequence alignment and analysis are central to most applications in biochemistry and molecular biology. Sequence analysis addresses questions such as how to reveal and interpret an observed conservation pattern, what the consensus sequence of a protein family is and how it relates to function, how to locate functionally important residues, how sequence relates to 3D structure and what 3D structural information can be extracted from the amino acid sequence, and many more. Some of these questions are discussed in the chapter on protein structure, while this chapter provides an overview of the basic ideas and methods of protein sequence analysis. In the tutorials I will give examples of sequence alignment techniques using the resources available at the Expasy server.
Introduction
Because an evolutionary relationship implies that a certain number of amino acid residues within a family are conserved, we need a way to assess the degree of conservation when we make a sequence alignment. Scoring schemes for sequence alignment have been developed to assist in this process. Here I discuss the basic ideas behind them.
Alignment
When making a sequence alignment, we need to understand the effect of amino acid substitutions, that is, cases where one amino acid is replaced by another in the sequence. These effects must be taken into account when calculating the alignment score. Some substitutions are conservative, i.e., they do not substantially disturb the protein structure. Other substitutions may have a dramatic effect on the structure, and for this reason they are rare. Here I will discuss how we take these effects into account when making a sequence alignment.
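As a rough illustration of how substitutions can enter an alignment score, here is a minimal Python sketch. The score values, the gap penalty and the function names (`pair_score`, `alignment_score`) are assumptions made up for this example; they are not taken from PAM, BLOSUM or any other published matrix. The sketch only shows the principle: conservative replacements receive a positive score, unfavourable ones a negative score, and the alignment score is the sum over aligned positions.

```python
# Toy substitution scores: illustrative values only, NOT from a published matrix.
TOY_SCORES = {
    ("L", "I"): 2,   # conservative: both residues are hydrophobic
    ("D", "E"): 2,   # conservative: both residues are acidic
    ("G", "W"): -3,  # non-conservative: very different size and properties
}
MATCH = 4              # assumed score for identical residues
DEFAULT_MISMATCH = -1  # assumed score for pairs not listed above
GAP = -4               # assumed penalty for a residue aligned to a gap ("-")


def pair_score(a, b):
    """Score one aligned pair of symbols (residue or gap)."""
    if a == "-" or b == "-":
        return GAP
    if a == b:
        return MATCH
    # The table is symmetric, so look the pair up in both orders.
    return TOY_SCORES.get((a, b), TOY_SCORES.get((b, a), DEFAULT_MISMATCH))


def alignment_score(seq1, seq2):
    """Sum the pair scores over two already-aligned, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    print(alignment_score("LDG", "IEG"))  # two conservative substitutions
    print(alignment_score("LDG", "IEW"))  # same, plus one unfavourable substitution
```

With these assumed values, the alignment containing only conservative substitutions scores higher than the one that also contains a non-conservative replacement, which is the behaviour a scoring scheme is meant to capture.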
Substitution matrices
I will also provide two guided examples of using the resources available at the Expasy server for sequence alignment and for the analysis of sequence alignments. In some cases the alignment is straightforward to perform, while in others extra attention is needed, for example when we align multidomain proteins. Three-dimensional structural information may also be used to correct the sequence alignment. In the second tutorial I will discuss the use of protein secondary structure in the alignment of a multidomain protein.
Tutorial 1
Tutorial 2