Protein Homology Modeling Using the Swiss Model Server
As mentioned earlier, it is an advantage to have an idea of the complexity of a protein homology modeling project before you start. This can be done by making and analyzing a multiple sequence alignment of your protein with some homologues, including the amino acid sequence of the modeling template (or templates) identified by the server. As a rule of thumb, a sequence identity above 50% means a relatively straightforward modeling project, while anything below that requires careful planning. There are three modeling alternatives available at the Swiss-Model server:
◦ Automatic mode
◦ Alignment mode
◦ Project mode
Easy modeling, which is possible when the percentage sequence identity is high and there are no large insertions or deletions, may be done using the automatic mode of the server. Alignment mode lets you start with your own manually adjusted alignment. This is often the method of choice for getting the best quality models, as the automatic procedure often fails to place the gaps for insertions and deletions correctly. For those new to modeling I would suggest working in project mode using the DeepView (SwissPDB Viewer, or SPDBV) modeling software. This will ensure better control of the homology modeling process and may promote a deeper understanding of various features of the protein structure you are going to model. The following tutorial demonstrates the technique of modeling using the Swiss-Model server. We use this tutorial in our teaching. To follow it you will need some basic skills in using DeepView. There are a couple of tutorials, one by Simon Andrews and an older one by Gale Rhodes. The tutorial may be downloaded as a PDF file, if you prefer to work with it that way.
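The two criteria used above to judge how hard a project will be (percentage sequence identity and the size of insertions and deletions) can be estimated from a pairwise alignment before anything is submitted. The following is a minimal sketch in Python, not part of the Swiss-Model server or DeepView. It assumes the two sequences are already aligned and use "-" for gaps, and it counts identity only over columns where both sequences have a residue, which is one of several conventions in use. The function names and the toy sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip insertion/deletion columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def longest_gap(aligned):
    """Length of the longest run of '-' characters in one aligned sequence."""
    longest = current = 0
    for ch in aligned:
        current = current + 1 if ch == "-" else 0
        longest = max(longest, current)
    return longest


# Toy alignment (invented sequences, not the real target or template).
target = "MKT-AYIAKQR"
template = "MKTLAYLAK--"
print(percent_identity(target, template))  # 87.5
print(longest_gap(template))               # 2
```

A result well above 50% with only short gaps suggests the automatic mode may be enough; anything else is a reason to inspect the alignment by hand.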
Before starting the tutorial, please make sure to register at the Swiss-Model server (myWorkspace). You will receive a login password by e-mail, which can later be used to submit the modeling request.
Step by step modeling
In this example, we will make a model of the enzyme magnesium chelatase subunit BchI from the cyanobacterium Synechocystis (SWISSPROT entry P51634). You can also learn about the enzyme from the UniProt page. To start the modeling:
• Get the UniProt entry P51634 in FASTA format, save it in a text file and call it, for example, SynChlI.txt. If you are using MS Word or WordPad (on PC) make sure that when you save the file you choose the option "save as" Plain text (.txt).
• Launch Swiss-PdbViewer and choose the "Load Raw Sequence from amino acids" item of the "SwissModel" Menu to load the file. The sequence is presented as a long perfect alpha helix (an impossibility, of course).
• Now go to the “Select” menu of the SwissPDB Viewer and choose “All” to select all the amino acids in the sequence. When you receive the password for logging in to MyWorkspace on the SwissModel site, choose the "Swiss-Model" item of the "Preferences" menu, and enter your name and e-mail address.
• Now that everything is set up properly, choose the "Submit Template Search Request against ExPDB for current layer" item of the "SwissModel" menu. This will ask you for your password and automatically submit your request. The search results will appear within "Workspace", so you will need to log in to your account to see them.
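Before loading the saved sequence file, it can help to confirm that it really is plain-text FASTA, since a file saved in a word-processor format will not load as a raw sequence. The sketch below is a hypothetical helper, not a feature of Swiss-PdbViewer: it reads the first record of a FASTA text and reports any characters that are not among the 20 standard one-letter amino acid codes. The record shown is an invented toy example, not the P51634 entry.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def read_fasta_sequence(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = ""
    chunks = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; ignore the rest
            seen_header = True
            header = line[1:]
        else:
            chunks.append(line.upper())
    return header, "".join(chunks)


def unexpected_letters(sequence):
    """Sorted list of characters that are not standard amino acid codes."""
    return sorted(set(sequence) - VALID_RESIDUES)


# Invented toy record for illustration only.
toy = ">toy_record hypothetical example\nMKTAY\nIAKQR\n"
header, sequence = read_fasta_sequence(toy)
print(sequence)                      # MKTAYIAKQR
print(unexpected_letters(sequence))  # []
```

To check your own file, pass its contents instead of the toy string, for example `read_fasta_sequence(open("SynChlI.txt").read())`. An empty list from `unexpected_letters` means every character is a standard residue code.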
The Blast search is run against a database of protein sequences for which there are known three-dimensional structures. Please be aware that the search may take a while. After logging in, choose the “workunit” with a number corresponding to your job number (it is shown when you use SwissPDB Viewer for submission), and the page listing the possible modeling templates identified by the Blast search will appear.
If, for some reason, you encounter any problems with this step, you may also go directly to the Swiss Model site and choose the Template identification option of the server. In this case the search parameters are slightly different and you will get a more extensive output, like the one shown in the image below. You just need to paste the UniProt identification code of your sequence (P51634) and run the search. Here is a screenshot showing the list of potential templates:
Obviously the first protein in the list is R. capsulatus BchI. But if you check the list you will find something interesting: a lot of other proteins with a relatively high sequence identity with BchI. This shows the evolutionary history of the AAA+ domain, which is reused by Nature many times in very different functional contexts.
This imported file contains the SyncChlI sequence threaded onto the X-ray structure of R. capsulatus BchI (PDB code 1g8p), as well as the coordinates of the actual 1g8p entry. It is essentially a half-ready homology model of the SyncChlI subunit (if you are not familiar with the content of PDB files, please refer to the tutorial on PDB). Until we check the sequence alignment to ensure that everything was done correctly, the model imported at this stage cannot be considered the final homology model. The imported file is a normal text file and may be opened with WordPad, MS Word or any other text editing program. The x,y,z coordinates assigned to the atoms of the SyncChlI sequence are of course the same as those for BchI, provided that the corresponding amino acids exist. It is also possible to view the sequence alignment in SwissPDB Viewer by clicking a small icon in the left corner of the alignment window (you need to open the alignment window from the "windows" menu of the program):
You may notice that there are some insertions and deletions in the sequences, so we need to make sure that they are placed correctly.
First of all, there are some small problems with the PDB file. If you open it in MS Word you will discover that the first 11 amino acid residues of the SyncChlI sequence have x,y,z coordinates 9999.999, 9999.999, 9999.999, which simply means that they lack any coordinates:
This happened because the corresponding residues of 1g8p are not included in the PDB file; the structure starts from Arg13. The reason is that there was no electron density to guide the building of the model for this part of the protein. There is another part of the sequence which had no electron density and which is missing from the 1g8p structure: residues 328-340 (in R. capsulatus BchI). If you open the PDB file in a text editor, you will notice that these residues are missing, while in the modeled SyncChlI structure the corresponding region from Lys331 to Ala357 has coordinates 9999.999, 9999.999, 9999.999. The modeling program has apparently decided to skip the few remaining residues at the C-terminus, which were still present in 1g8p (341-350), while SwissPDB Viewer ignores residues with these coordinates and does not display them.
Generally, flexible regions in proteins are not unusual, and they result in weak electron density, making model building difficult or impossible. In other words, don't be surprised in the future if you discover that some parts of a protein structure are missing from the PDB file.
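Instead of scrolling through the file in a text editor, the residues carrying the placeholder coordinates 9999.999, 9999.999, 9999.999 can be listed with a short script. The sketch below is an illustration only, not something SwissPDB Viewer provides. It assumes the standard fixed-column layout of PDB ATOM records (chain in column 22, residue number in columns 23-26, x, y, z in columns 31-54). The helper `format_atom` and the three example atoms are invented to keep the example self-contained.

```python
PLACEHOLDER = 9999.999  # value used in the project file for atoms without coordinates


def residues_without_coordinates(pdb_lines):
    """List (chain, residue number, residue name) for residues whose atoms
    all three coordinates equal the placeholder value."""
    missing = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if all(abs(v - PLACEHOLDER) < 1e-3 for v in xyz):
            key = (line[21], int(line[22:26]), line[17:20].strip())
            if key not in missing:
                missing.append(key)
    return missing


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a simplified fixed-column ATOM record (hypothetical helper for the demo)."""
    return (f"ATOM  {serial:5d}  {name:<3s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Invented atoms: one residue without coordinates, one with.
example = [
    format_atom(1, "N", "MET", "A", 1, 9999.999, 9999.999, 9999.999),
    format_atom(2, "CA", "MET", "A", 1, 9999.999, 9999.999, 9999.999),
    format_atom(3, "CA", "ARG", "A", 13, 12.345, -6.789, 101.5),
]
print(residues_without_coordinates(example))  # [('A', 1, 'MET')]
```

For a real file, pass the lines of the saved project file, for example `residues_without_coordinates(open(path))` with your own `path`.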
In SwissPDB Viewer, have a look at the 1g8p structure, find residue Arg328 and check for yourself that there is no residue 329. The next amino acid in the structure is separated from Arg328 by a long distance. The first residue of the last helix in the structure is Val341, while the last residue is Pro350. In the sequence alignment Arg328 is aligned with SyncChlI Arg329. As mentioned above, this is the last residue with assigned coordinates in the SyncChlI preliminary homology model.
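The same check can be sketched in code: a break in the chain shows up either as a jump in residue numbering or as an unusually long distance between consecutive C-alpha atoms (neighbouring residues joined by a normal trans peptide bond have C-alpha atoms about 3.8 Å apart). The function below is a minimal illustration under stated assumptions. The 4.2 Å cutoff is an arbitrary tolerance chosen for this example, and the coordinates are invented; only the residue numbers 328 and 341 come from the text.

```python
import math


def chain_breaks(ca_atoms, max_ca_distance=4.2):
    """Find breaks between consecutive C-alpha atoms.

    ca_atoms: list of (residue_number, (x, y, z)) in chain order.
    Returns (number_before, number_after, distance) for each break.
    """
    breaks = []
    for (num_a, xyz_a), (num_b, xyz_b) in zip(ca_atoms, ca_atoms[1:]):
        distance = math.dist(xyz_a, xyz_b)
        if num_b - num_a > 1 or distance > max_ca_distance:
            breaks.append((num_a, num_b, round(distance, 1)))
    return breaks


# Invented coordinates; residue numbers mimic the gap described for 1g8p.
toy_ca = [
    (327, (0.0, 0.0, 0.0)),
    (328, (3.8, 0.0, 0.0)),
    (341, (3.8, 20.0, 0.0)),
]
print(chain_breaks(toy_ca))  # [(328, 341, 20.0)]
```

Checking both numbering and distance is a deliberate choice: numbering alone misses breaks in files that have been renumbered, and distance alone says nothing about how many residues are absent.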
We still need to check the correctness of the alignment, possibly edit it by moving misplaced gaps, remove residues with unknown coordinates and save the project. We can then submit it again to the SwissModel server in project mode for model optimization. Thus, as mentioned in the outline, we now need to make a sequence alignment which includes our template.
Again, to keep this page at a reasonable length, the tutorial continues on the next page.