Google

Molecular Replacement Tutorial

This tutorial solves one protein structure using the structure of a similar protein.

The Problem

The example is to solve cardiotoxin which is:
 
  • a small protein of 60 residues
  • space group C2
  • data to 1.5Angstrom resolution
  • We  have the structure of the same cardiotoxin in a different space group but residues 6-11 and 28-31 are missing.
  • This protein is now in the PDB as 1tgx.pdb (solved by A.Bilwes,B.Rees,D.Moras J.Mol.Biol. 1994, V239, p-122).

    Outline of the Method

    1) Make an estimate of the number of molecules in the asymmetric unit so we know how many molecules to look for with the molecular replacement programs.

    2) Look at our experimental data - are there any problems?

    3) Run molecular replacement program to find solutions.

    4) Refine the phases using NCS (non-cystallographic symmetry) phased refinement.

    The Data Files

    Files in directory $DATA

    model.pdb  -    contains coordinates of the model we will use to solve cardiotoxin.
    cardiotoxin.mtz  - contains the experimental data.

    Files in the directory $RESULTS

    matthews.log    - the log file from Cell Content Analysis
    mr_analyse.log    - the log file from Analyse Data/FONT>
    molrep.log - the log file from Molrep
    model_molrep1.pdb - the output coordinates from Molrep
    mr_refmac.log - the log file from Refmac5 refinement

    Stage 1) Estimate the Number of Molecules in the Asymmetric Unit

    Most protein crystals contain about 50% water.  We will calculate the number of protein molecules in the asymmetric unit of our crystal which will give a water content of about 50%.

    For next steps see the picture here

    1.1 Change to the Coordinate Utilities module.  Select the Cell Content Analysis task.

    1.2  Select the MTZ file - the program will read the space group and cell dimensions from the MTZ file (so you do not need to type them in):

    MTZ file             DATA         cardiotoxin.mtz.

    1.3 Enter the molecular weight of the protein.  The protein has 60 residues and we say average residue weight is 100 Dalton.  So

    Molecular weight of protein 6000.

    1.4 Click the Run Now button.

    1.5 Look at the output in the window - it shows a table of the Matthew's coefficient and percentage solvent content dependent on the number of molecules that are in the asymmetric unit.
     

    For estimated protein molecular weight 6000
    Nmol/asym  Matthew's Coeff %solvent
      1         6.6             81.2
      2         3.3             62.5
      3         2.2             43.7
      4         1.6             24.9
      5         1.3              6.2
    We are looking for the right number of molecules to give about 50% solvent.   It looks as if our crystal will have three molecules in the asymmetric unit but two molecules is also possible.

    1.6) Close the Cell Content Analysis window.

    Stage 2)  Look at the Experimental Data

    We will do two things:

    a) Create a Patterson map and search it for peaks.  We expect a big peak at the origin (position 0,0,0) but if there is another big peak (perhaps about 0.25 the size of the origin peak) then perhaps there is  translation between the molecules in the asymmetric unit and it will be more difficult to solve.

    (The theory behind this is explained on the web site of Bernhard Rupp:

    http://www-structure.llnl.gov/xray/101index.html

    For more information, go to the section on Phasing Techniques on this website, and click on NCS with native Patterson maps)

    b) Create a Wilson plot which is an indication of the self consistency of the data.  Also find the average B-value of the data -  this can be used to help the molecular replacement program.

    For the next steps look at picture here

    2.1) Change to the Molecular Replacement module and select the Analyse Data for MR task.

    2.2) Select the input experimental data

    MTZ in      DATA      cardiotoxin.mtz

    2.3)  Select input model:

    PDB in   DATA     model.pdb

    2.4)  Enter the Number of residues in the asymmetric unit - this is:

      number of molecules in asymmetric unit  *  number of residues per molecule

     =    3  *  60   = 180

    Number of residues in asymmetric unit  180

    2.5)   Run the job.  You can now Close the Analyse Data window.

    2.6)  Look at the log file when the job has finished.  In the main CCP4i window click on the job called mr_analyse and then from menu View Files from Job select View Log File.  In the log file is output from the programs FFT which created the Patterson map and Peakmax which searched for  peaks in the map.  To find what we want click on the Find button and enter the text List of peaks.  You can now see table like this:

     Order No. Site Height/Rms    Grid      Fractional coordinates   Orthogonal coordinates
         1    1    1   38.66     0   0   0   0.0000  0.0000  0.0000     0.00   0.00   0.00
         2    2    2    3.34     4   0   0   0.0731  0.0000  0.0000     5.75   0.00   0.00
         3    7    3    3.10    25  16   5   0.4403  0.5000  0.1174    31.66  20.20   5.85
    The biggest peak is size Height/Rms  38.66 and is at position x=0,y=0,z=0 - this is as we expect.  The next biggest peak is 3.34  It is much smaller so there is no translation (good!).

    Hints

    In fact the values you get may be different, for example:

    
     Order No. Site Height/Rms    Grid      Fractional coordinates   Orthogonal coordinates
    
         1    1    1   58.43     0   0   0   0.0000  0.0000  0.0000     0.00   0.00   0.00
         2    4    2    3.47     4   0  55   0.0546  0.0000  0.9821   -20.75   0.00  48.96
         3   10    2    3.47    36  20   1   0.4454  0.5000  0.0253    34.40  20.20   1.26
    
    but the comments above still apply.

    You may also see a different number of peaks, for example:

    
     Order No. Site Height/Rms    Grid      Fractional coordinates   Orthogonal coordinates
    
         1    6    1   38.66    28  16   0   0.5000  0.5000  0.0000    39.35  20.20   0.00
         2    8    1   24.28    28  16  43   0.5000  0.5000  0.9773    14.42  20.20  48.72
         3    5    2    3.34    24  16   0   0.4269  0.5000  0.0000    33.60  20.20   0.00
         4    3    3    3.10     3   0  39   0.0597  0.0000  0.8826   -17.82   0.00  44.00
    
    In this case x=0.5,y=0.5,z=0.0 is the centring operation of the spacegroup of the data (C2). It is a crystallographic translation of the origin peak (as opposed to a non-crystallographic translation).

    The difference is an effect of the width of the Patterson origin peak being related to the resolution range of data included when generating the Patterson map. At lower resolutions the origin peak may overlap neighbouring grid points in the map, and result in apparent extra peaks in these adjacent positions.

    Including higher resolution data narrows the origin peak and reduces the effect; try changing the high resolution limit from 4.0 A to 3.0 A in the Define Map folder, and re-run.

    2.7)  Now go to the bottom of the log file where you will see:

    Average B value for experimental data = 18.178
    Average B value for model = 20.000
    Running aMoRe: set the Tabling parameter BADD
    (the amount to add to the Bvalue) to -1.822
    2.8 Look at the graphs in the log file. From the View Files from Job menu select View Log Graphs. There are three tables in the log file - look at then in turn:

    Wilson Plot

    This is a usual Wilson plot - no problems here!

    Amplitude Analysis v. Resolution

    This plot is usual shape for amplitude versus resolution plot with 'water' peak at about 4A.

    Average B v. Residue

    This shows the difference from the mean - in this PDB file all the B values are set to 20.   This is not interesting for this protein.

    Quit from the two windows which display the log file and the graphs.
     

    Stage 3) Run MolRep Molecular Replacement Program

    This program will solve the structure - you must input a coordinate file for a protein similar to the protein in the crystal and the program will output a coordinate file with the molecule moved to the right position(s) in the crystal.

    For the next few steps look at picture here

    3.1)  From the Molecular Replacement menu select MolRep - auto MR

    3.2)  The default mode for running MolRep is good:

    Do molecular replacement performing rotation and translation function

    3.3) Select the input experimental data file.

    MTZ in DATA    cardiotoxin.mtz.

    3.4) Select the input model.

    Model in   DATA     model.pdb

    3.5) Look down the window a little: set the

    Search for 3 monomers in the asymmetric unit.

    3.6) From the Run menu select Run Now.

    MolRep will take a long time to run - if it is too long you can see the output files: $RESULTS/molrep.log  and $RESULTS/mr_model_molrep1.pdb.
    Look at the log file by selecting View Any File from the right side of the main window, then select

    Go to directory RESULTS

    then

    File molrep.log

    and the click on Display and Exit

    The log file lists many possible solutions.  After the rotation function:
     

     Number of peaks :      50
                 alpha    beta   gamma   theta    phi    chi        Rf    Rf/sigma
     Sol_RF  1    28.27   60.29  182.91  148.91 -167.32  153.12    0.3796E+09  5.31
     Sol_RF  2    40.54   72.23  275.07  117.37  152.74   83.17    0.3249E+09  4.54
     Sol_RF  3   162.83   58.08  180.35  104.76  -98.76   60.26    0.3060E+09  4.28
     Sol_RF  4   325.42   64.83  249.36  146.36  128.03  150.77    0.3013E+09  4.21
     Sol_RF  5    64.03   63.07  262.23  115.31  170.90   70.70    0.2954E+09  4.13
    This shows the possible rotation of the molecule: alpha beta gamma (or theta phi chi in polar coordinates).  The score for the solution is the Rfactor.

    After a translation function:
     

                 alpha   beta    gamma   Xfrac  Yfrac  Zfrac Dens/sig R-fac   Corr
     Sol_TF_1  1   28.27   60.29 -177.09  0.091  0.000  0.297    3.91  0.598  0.139
     Sol_TF_1  2   28.27   60.29 -177.09  0.611  0.000  0.342    3.35  0.596  0.137
     Sol_TF_1  3   28.27   60.29 -177.09  0.174  0.000  0.105    3.23  0.602  0.125
     Sol_TF_1  4   28.27   60.29 -177.09  0.870  0.000  0.349    2.82  0.606  0.107
     Sol_TF_1  5   28.27   60.29 -177.09  0.268  0.000  0.471    2.67  0.607  0.120
    This shows the rotation (alpha beta gamma) and translation as fractional coordinates (Xfrac Yfrac Zfrac).  There are different ways to score the solutions density/sigma, Rfactor, Correlation coefficient - the correlation coefficient is the best - this is bigger for good solutions.

    The program runs the translation function three times to find three different molecules.  For the second run of the translation function it will take the best solution from the first run and try to find another molecule which will fit well with the first solution.  For the third run of the translation function it will keep the best solution from the first run and the second run and try to find molecule number three.

    It is not possible to say what is a good score - this will depend on many things but it is good if the best score (the correlation factor in the column labelled Corr) is much bigger than the second best score.  When you are looking for several molecules the best score for the first (and perhaps second) molecule will not be very good but you hope that the best score for all three molecules is much better than for other possible solutions.
     
    There are three molecules in the output PDB file - these have chain names: A, a and b.    At the bottom of the log file is some information about how these three molecules fit together.
     

    Mol_1 Mol_2 Direction_Cosine_of_Axis      Angle     Rotation  Translation
    A     a     0.0       0.1       1.0       86.8      124.3     1.3       0.1       0.0
    A     b     -0.1      -0.1      -1.0      91.8      104.8     0.5       1.1       -0.1
    a     b     0.0       0.0       1.0       85.6      131.3     0.9       0.8       0.0
    The diagram below shows what this means: there are three molecules and the vectors between the centres of the molecules.  The direction of the axis of rotation to map one molecule onto another is shown; it is at 90 degrees to the vector between the molecule centres.  To map one molecule onto the other needs some translation and a rotation of approximately 120 degrees (in fact the angles are 131.3 degrees, 104 degrees and 131 degrees which are surprisingly different from 120 degrees). This is right for a three-fold rotation axis and this shows there is a rotation axis and not a screw axis. This helps to confirm that the solution is correct because MolRep did not use this information to find the solution.

     

    3.7)  The output file mr_model_molrep1.pdb contains three copies of the input model moved to the right positions in the asymmetric unit.   The three molecules will pack together something like this.
     


     
     

    Alternative Stage 3) Run aMoRe Molecular Replacement Program

    Stage 4) Refine the Structure

    It is not always certain that the molecular replacement is correct - the best way to test it is to refine the model.  The best way to start refinement when there is more than one molecule in the asymmetric unit is to use the non-crystallographic symmetry to restrain the refinement - that is the refinement program must keep all the molecules the similar.

    For the next few steps look at the picture here

    4.1) Go to the Refinement module and chose  Run Refmac5

    4.2) Enter the name of the data file:

    MTZ in        DATA            cardiotoxin.mtz

    Enter the name of the input files - this is the coordinate file output from MolRep:

    PDB in         RESULTS     model_molrep1.pdb

    If you have not run molrep, use the model_molrep1.pdb file found in DATA:

    PDB in         DATA     model_molrep1.pdb
     

    4.3)  Now you must tell the program that it must keep the  non-crystallographic symmetry by keeping the three chains similar.  Click on the line with the folder title

    Non-crystallographic Symmetry

    4.4) Click on the button

    Add non-X restraint

    4.5) Click three times on the button

    Add chain

    4.6) Now set up like this:

    Restrain together chain A
    and B
    and C
     

    4.7) Make sure you are not creating a data harvesting file. Select:

    Do not create harvest file

    from the Data Harvesting section.
     

    4.8) Run the program by clicking Run and Run Now.
     

    4.9) You can look at a log file by using View Any File and selecting

    Go to  directory        RESULTS

    File type     log CCP4 log    filename filter   *.log

    Viewer  View Log Graphs

    and then select file:

    File        mr_refmac.log

    and then click on Display and Exit

    Go to the bottom of the list of Tables in File and select

    Rfactor analysis, stats vs cycle

     
     
    The graph shows the Rfactor (red) and the Free Rfactor (blue) for 6 cycles of refinement.  The Free Rfactor goes down from 52% to 46%   The  Rfactor is high - this is normal after molecular replacement because we do not have a good model yet  but it goes down so we probably have a good solution.



    To find out more:
    MolRep: http://www.ysbl.york.ac.uk/~alexei/molrep.html
    CCP4: http://www.dl.ac.uk/CCP/CCP4

    Prepared by Liz Potterton (lizp@ysbl.york.ac.uk) & Eleanor Dodson, July 2000
    Additional material: Peter Briggs & Martyn Winn, February 2001