Sequence alignment tutorial: Part I

As mentioned earlier, amino acid sequence alignment may be rather simple to perform, but it may also need some extra attention. This is the case, for example, when the proteins have diverged considerably and there are many insertions and deletions, or with multidomain proteins, especially if not all domains are present in one of the proteins being compared, something which can happen during homology modeling. Information on the tertiary structure is of course of great help for obtaining a correct alignment. In this first tutorial I will show you an easy way of making a sequence alignment. I will focus on the tools available at the Expasy and EBI servers, although there are of course many others which will do exactly the same job. We start with a protein of highly conserved sequence: subunit BchI of the enzyme magnesium chelatase. It is one of three subunits required for this enzyme to catalyze the first committed step in chlorophyll biosynthesis, the insertion of Mg2+ into protoporphyrin IX. In the second tutorial we will go through a slightly more complicated case: we will first identify the part of the second subunit of magnesium chelatase, BchD, which is homologous to subunit BchI, and then make an alignment so that we can examine the conservation pattern in the two proteins more closely. A proper sequence alignment is central to homology modeling, and the alignments made here will be used later in the homology modeling tutorial.

To perform the alignment we first need to choose and retrieve the sequences. For this purpose we will use the UniProtKB database within the Expasy group of servers. To start, simply type the name of the protein (BchI) into the UniProt or Expasy search window, and you will be taken to a list of BchI sequences from different organisms:


I am just showing the first few sequences; the list contained a total of 295 sequences when I did the search. There you need to click on BCHI_RHOCB (entry P26239), which is subunit BchI from Rhodobacter capsulatus. The page which opens is almost like a tutorial on magnesium chelatase. There you will find information on the biological function (photosynthesis, magnesium chelatase activity), the type of ligand/substrate it binds (ATP), the catalytic function (ATP hydrolysis), Protein Data Bank (PDB) entries, if available, links to published works, links to entries related to this particular protein in other databases, and of course the amino acid sequence of the protein. One of the links which I find very useful is the one to the InterPro database. It provides plenty of information about the protein and the family to which it belongs:


For sequence alignment we first need to retrieve the sequences of BchI from different organisms. Normally the sequence is presented in the following format:
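If you prefer to retrieve and inspect a sequence from a script rather than from the web page, the short Python sketch below may help. It rests on two assumptions which are not part of this tutorial: that the format meant here is FASTA (a header line starting with ">" followed by lines of one-letter residue codes), and that the UniProt REST service returns a single entry as FASTA at the URL pattern used in the code. The function names fasta_url, fetch_fasta and parse_fasta are my own, hypothetical helpers, and the example record at the bottom is an invented placeholder, not the real BchI sequence.

```python
from urllib.request import urlopen

# Assumption: UniProt's REST service serves one entry as FASTA at this pattern.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fasta_url(accession):
    """Build the (assumed) download URL for one UniProtKB accession."""
    return UNIPROT_FASTA_URL.format(accession=accession)


def fetch_fasta(accession, timeout=30):
    """Download the FASTA text for one entry (needs network access)."""
    with urlopen(fasta_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Split FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are wrapped; join them back into one string.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Invented placeholder record, NOT the real BchI sequence.
    example = ">placeholder_id made-up description\nMTTAVARLQP\nSASGAKTRPV\n"
    for header, sequence in parse_fasta(example):
        print(header, len(sequence))
```

To try it on the entry used in this tutorial you would call parse_fasta(fetch_fasta("P26239")). Several sequences pasted one after another into a single file can be read by the same parser, which is the form most alignment servers expect as input.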


To make an alignment we need to choose some additional sequences (and sometimes also a server on which to make the alignment). Normally one needs to spend a few minutes thinking about which sequences to include in the alignment. Then we can run a Blast search (at the top right in the above figure). The results are shown in the image below:


To keep the size of this page at a reasonable level (with too many images it would take longer to download), I will continue on the next page.

Sequence alignment and substitution matrices