CCP4 Tutorial: Session 2

See also the accompanying document giving background information.

2a) Data processing

The Problem

This example will start with intensity data that has already been scaled and merged (e.g. with scala or scalepack). The data is from the crystal structure of GerE, a transcription activator from Bacillus subtilis, which was solved by MAD phasing using the Se signal (V.M.A. Ducros, R.J. Lewis, C.S. Verma, E.J. Dodson, G. Leonard, J.P. Turkenburg, G.N. Murshudov, A.J. Wilkinson and J.A. Brannigan, J. Mol. Biol. (2001) 306 759-771).

We are going to convert the intensities to structure factor amplitudes, and discuss some of the statistics that are generated along the way. These statistics are essential for assessing the quality of the data and for detecting anisotropy and twinning.
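As a rough illustration of the conversion, the sketch below takes the naive route F = sqrt(I), with the standard deviation propagated to first order. This is not what the task itself does: the program behind it (truncate, judging by the output file name) is understood to apply the French and Wilson statistical treatment, which also gives sensible amplitudes for weak and negative intensities. The function name and the sample values are hypothetical.

```python
import math

def naive_amplitude(i_mean, sig_i):
    """Naive conversion F = sqrt(I); sigma(F) = sigma(I) / (2 F) to first order.

    Returns None for zero or negative intensities, where this simple
    formula breaks down (the case a statistical treatment is meant to handle).
    """
    if i_mean <= 0.0:
        return None
    f = math.sqrt(i_mean)
    sig_f = sig_i / (2.0 * f)
    return f, sig_f

# Hypothetical measurements, not taken from gere_nat.mtz
print(naive_amplitude(400.0, 20.0))  # (20.0, 0.5)
print(naive_amplitude(-5.0, 10.0))   # None
```

The second call shows why the naive formula is not enough on its own: a weak reflection measured as slightly negative would simply be lost.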

Exercise

2.1 Select the Data Reduction module, and open the Convert Intensities to SFs task window.

2.2 On the first line, enter a suitable job title such as

Job title generate SFs for GerE native data.

2.3 Change the next line to:

Input data no anomalous data.

de-select

keep the input intensities in the output MTZ file

and de-select

Ensure unique data & add FreeR column for 0.05 fraction of data.

using the radiobuttons.

2.4 Now enter the input file as:

MTZ in DATA gere_nat.mtz

The input column labels should be correctly chosen as

Imean I SigImean SIGI

(if not, use the pull-down menus to select these column names). The output file will be automatically set to:

MTZ out TEST gere_nat_truncate1.mtz

Enter an identifier as:

Identifier to append to column labels nat

2.5 In the section Data Harvesting, leave as:

Do not create harvest file

2.6 In the section Required Parameters, we need to enter an estimate of the number of residues in the asymmetric unit. This is used in Wilson scaling, which allows one to put the data on an approximate absolute scale. Enter:

For solvent content analysis enter unit cell contents as number of residues

Number of residues in asymmetric unit 444

(there are 6 chains of 74 residues each in the asymmetric unit).
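To show why the cell contents matter for Wilson scaling, here is a minimal sketch of a Wilson plot fit. It assumes the usual model <I> = k * sum_f2 * exp(-2 B sin^2(theta)/lambda^2), where sum_f2 is the summed squared scattering factors of the atoms in the cell; this is the quantity that an estimate of the number of residues feeds into. The function name, the resolution bins and the value of sum_f2 are all hypothetical, and the data are synthetic rather than taken from the GerE file.

```python
import math

def wilson_fit(bins):
    """Fit a straight line to the Wilson plot.

    bins: list of (d_spacing, mean_intensity, sum_f2) per resolution shell.
    x = sin^2(theta)/lambda^2 = 1 / (4 d^2), y = ln(<I> / sum_f2).
    For the model <I> = k * sum_f2 * exp(-2 B x), the slope is -2B and
    the intercept is ln(k). Returns (k, B).
    """
    xs = [1.0 / (4.0 * d * d) for d, _, _ in bins]
    ys = [math.log(mean_i / sum_f2) for _, mean_i, sum_f2 in bins]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope / 2.0

# Synthetic shells generated from k = 0.5 and B = 30 (hypothetical values)
true_k, true_b, sum_f2 = 0.5, 30.0, 1000.0
shells = []
for d in (4.0, 3.5, 3.0, 2.5, 2.15):
    x = 1.0 / (4.0 * d * d)
    shells.append((d, true_k * sum_f2 * math.exp(-2.0 * true_b * x), sum_f2))

k, b = wilson_fit(shells)
print(round(k, 3), round(b, 3))
```

With noise-free synthetic shells the fit recovers the k and B used to generate them. Real data scatter about the line, so the resulting scale is only approximate.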

2.7 Do not change any of the remaining options, and click on Run -> Run Now.

2.8 When the job has finished, return to the main window, highlight the job in the Job List, and select View Files from Job -> View Log Graphs. This opens up the loggraph viewer. Graphs are selected by first clicking in the middle window to select a group of graphs, and then clicking in the bottom window to select a particular graph.

2.9 The graphs in Acentric Moments of ..., Centric Moments of ... and Cumulative intensity distribution are useful for deciding whether twinning is present. Have a look at these graphs. Use the cross-wires to estimate values. Compare the plotted values of the moments with the Expected values shown at the top of the window. These plots confirm there is no problem with twinning. The graph of the 2nd moment is the clearest.

(See the accompanying document for an example where twinning occurs.)
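The 2nd moment plotted in these graphs is <I^2>/<I>^2. A minimal sketch of the calculation follows, assuming the standard result that acentric intensities from an untwinned crystal follow an exponential distribution (expected 2nd moment 2.0), while a perfect twin averages two independent intensities (expected 2nd moment 1.5). The data are simulated, not taken from the GerE file, and the function and variable names are hypothetical.

```python
import random

def second_moment(intensities):
    """Return <I^2> / <I>^2 for a list of intensities."""
    n = len(intensities)
    mean_i = sum(intensities) / n
    mean_i2 = sum(i * i for i in intensities) / n
    return mean_i2 / (mean_i * mean_i)

rng = random.Random(1)  # fixed seed so the simulation is repeatable

# Untwinned acentric data: exponentially distributed intensities
untwinned = [rng.expovariate(1.0) for _ in range(200000)]

# Perfect twin: each observation is the average of two independent intensities
partner = [rng.expovariate(1.0) for _ in range(200000)]
twinned = [(a + b) / 2.0 for a, b in zip(untwinned, partner)]

print(round(second_moment(untwinned), 2))  # should come out near 2.0
print(round(second_moment(twinned), 2))    # should come out near 1.5
```

A 2nd moment that falls clearly below the expected untwinned value across the resolution range is the pattern to look out for in the loggraph plots.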

2.10 Next, look at the graphs in Anisotropy analysis (FALLOFF). The graph of Mn(F/sd) v. resolution suggests that the data is slightly poorer along direction 3, which is defined to be perpendicular to a* and b*, i.e. there is some anisotropy in the data.

2.11 Close the loggraph window using File -> Exit. Close all other windows except the main window.

2b) Standardise MTZ file

The Problem

You now have a file of structure factor amplitudes for the reflections that were collected. It is considered good practice to add in all other reflections appropriate to the spacegroup and resolution, even if there is no data for them ("completing the dataset"). It is also good practice to add a column of free-R flags at this stage.
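The two operations can be sketched as follows. This is only an outline of the idea, not the procedure used by the task: the list of all possible reflections is simply passed in (generating it properly depends on the spacegroup and resolution), and flags are assigned at random as integers from 0 to 19, which for a 0.05 fraction gives 20 groups of roughly equal size. By the usual CCP4 convention the group with flag 0 is taken as the free set. All names and values below are hypothetical.

```python
import random

def complete_and_flag(observed, all_hkl, free_fraction=0.05, seed=0):
    """Complete a reflection list and attach a free-R flag to every entry.

    observed: dict mapping (h, k, l) -> (F, SIGF) for measured reflections.
    all_hkl:  every unique (h, k, l) expected for the spacegroup/resolution.
    Reflections without data get F = SIGF = None (i.e. flagged as missing),
    but still receive a free-R flag.
    """
    rng = random.Random(seed)
    n_groups = round(1.0 / free_fraction)  # 0.05 -> 20 groups, flags 0..19
    rows = []
    for hkl in all_hkl:
        f, sig_f = observed.get(hkl, (None, None))
        rows.append((hkl, f, sig_f, rng.randrange(n_groups)))
    return rows

# Hypothetical toy data: three possible reflections, one of them unmeasured
observed = {(1, 0, 0): (120.5, 3.1), (1, 1, 0): (85.2, 2.7)}
all_hkl = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]

for row in complete_and_flag(observed, all_hkl):
    print(row)
```

Giving a flag to the unmeasured reflections as well means the free set stays consistent if data for them are added later.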

Exercise

2.20 Select the Reflection Data Utilities module, and open the Convert to MTZ and Standardise task window.

2.21 On the first line, enter a suitable job title such as

Job title standardise GerE native data.

2.22 On the second line, select MTZ from the pull-down menu:

Import reflection file in MTZ format and create MTZ file

2.23 On the 3rd line, select:

Create full unique set of reflections and Generate FreeR data

by clicking on the radiobutton on the left-hand side, and selecting from the pull-down menu.

2.24 Now enter the input file as:

In TEST gere_nat_truncate1.mtz

The output file will be automatically set to:

Out TEST gere_nat_unique1.mtz

2.25 In the section MTZ Project & Dataset Names, the correct project and dataset names will have been inherited from the input file, so do not change these.

2.26 The remainder of the task window can be left unchanged, so go to the bottom of the task window and click on Run -> Run Now.

2.27 When the job has finished, view the output file by selecting in the main window View Files from Job -> gere_nat_unique1.mtz. First, notice that there is now an extra column holding FreeR flags:


     * Column Labels :
 
      H K L F_nat SIGF_nat FreeR_flag

2.28 Click on List More Info at the bottom of the display window. Accept the defaults and click Apply&Exit. Now look at the table of statistics at the bottom of the display window:


 OVERALL FILE STATISTICS for resolution range   0.000 -   0.216
 ======================= 


 Col Sort    Min    Max    Num      %     Mean     Mean   Resolution   Type Column
 num order               Missing complete          abs.   Low    High       label 

   1 ASC    -50      50      0  100.00     -1.9     19.0  70.71   2.15   H  H
   2 NONE     0      28      0  100.00     10.5     10.5  70.71   2.15   H  K
   3 NONE     0      33      0  100.00     12.5     12.5  70.71   2.15   H  L
   4 NONE   16.0  1682.1   298   98.84   189.72   189.72  14.95   2.15   F  F_nat
   5 NONE    2.3    91.7   298   98.84    11.33    11.33  14.95   2.15   Q  SIGF_nat
   6 NONE    0.0    19.0     0  100.00     9.47     9.47  70.71   2.15   I  FreeR_flag


 No. of reflections used in FILE STATISTICS    25769

The standardise procedure has added 298 reflections, for which the structure factor amplitude is labelled as missing. The completeness is thus calculated as (25769 - 298)/25769 = 98.84%.
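The same arithmetic as a short sketch, using the reflection counts from the table above. The function name is hypothetical.

```python
def completeness(n_total, n_missing):
    """Percentage of reflections in the file that have a measured value."""
    return 100.0 * (n_total - n_missing) / n_total

n_total = 25769    # reflections in the standardised file
n_missing = 298    # entries with F_nat flagged as missing

print(n_total - n_missing)                          # 25471 measured reflections
print(f"{completeness(n_total, n_missing):.2f}%")   # 98.84%
```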

2.29 Close all windows except the main window.

2c) Combine native data with MAD data

The Problem

You now have a file of structure factors suitable for use in structure solution. Often you will have several files, obtained from different crystals or heavy atom derivatives. It is usually convenient to combine all these files into one MTZ file. In this example, we will combine the native data we have just processed with some MAD data for a selenomethionine derivative of GerE.

Exercise

2.40 Select the Reflection Data Utilities module, and open the Merge MTZ Files (Cad) task window.

2.41 On the first line, enter a suitable job title such as

Job title Merge native with MAD data.

2.42 Now enter the first input file as:

MTZ in TEST gere_nat_unique1.mtz

This contains the standardised native data. Leave the next line as:

Input all columns from this file

2.43 Click on Add input MTZ file. Enter the second input file as:

MTZ in DATA gere_MAD_only.mtz

This contains the MAD data. Again, leave the next line as:

Input all columns from this file

2.44 Enter the output file as:

Output MTZ TEST gere_MAD.mtz

2.45 In the File completion and freeR extension section, make sure that FreeR_flag is declared:

Complete reflection list and extend freeR column FreeR_flag

If this is not declared, add it manually by selecting Enter label from the pull-down menu and declaring FreeR_flag in the dialogue box.

2.46 The remainder of the task window can be left unchanged, so go to the bottom of the task window and click on Run -> Run Now.

2.47 When the job has finished, view the output file by selecting in the main window View Files from Job -> gere_MAD.mtz. The output file has 38 columns:


 * Column Labels :
 
 H K L F_nat SIGF_nat FreeR_flag FSEinfl SIGFSEinfl DSEinfl SIGDSEinfl F(+)SEinfl
 SIGF(+)SEinfl F(-)SEinfl SIGF(-)SEinfl FSElrm SIGFSElrm DSElrm SIGDSElrm
 F(+)SElrm SIGF(+)SElrm F(-)SElrm SIGF(-)SElrm FSEpeak SIGFSEpeak DSEpeak
 SIGDSEpeak F(+)SEpeak SIGF(+)SEpeak F(-)SEpeak SIGF(-)SEpeak FSEhrm SIGFSEhrm
 DSEhrm SIGDSEhrm F(+)SEhrm SIGF(+)SEhrm F(-)SEhrm SIGF(-)SEhrm

These are 3 columns for the hkl indices, 3 data columns from the native data file and 32 data columns from the MAD data file. For the MAD data, there are 8 columns for each of 4 wavelengths. These 8 columns are the average structure factor amplitude FSE, the anomalous difference DSE, the amplitudes of the Friedel pair F(+)SE and F(-)SE, and the corresponding standard deviations.
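The column layout can be reproduced with a short sketch, which also serves as a check on the count of 38. The wavelength suffixes (infl, lrm, peak, hrm) are taken from the column listing above. The function name is hypothetical.

```python
def merged_labels(wavelengths=("infl", "lrm", "peak", "hrm")):
    """Build the expected column labels of the merged MTZ file."""
    # 3 index columns plus 3 columns from the native file
    labels = ["H", "K", "L", "F_nat", "SIGF_nat", "FreeR_flag"]
    # 8 columns per wavelength: each quantity followed by its standard deviation
    for w in wavelengths:
        for base in ("F", "D", "F(+)", "F(-)"):
            labels.append(f"{base}SE{w}")
            labels.append(f"SIG{base}SE{w}")
    return labels

labels = merged_labels()
print(len(labels))   # 38
print(labels[6:14])  # the 8 columns for the first wavelength
```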

2.48 Close all windows except the main window.