CCP4 Tutorial: Session 1

See also the accompanying document giving background information.

In the following instructions, when you need to type something, or click on something, it will be shown in red. Output from the programs or text from the interface is given in green.

Directory DATA contains input files. Directory RESULTS contains selected output files (you can look at these if you have problems, or the job is too slow). You will work in your own directory TEST.

If you have problems following the instructions, then you can use def files in directory DATA which contain the necessary parameters. You can load these files into the interface using the option at the bottom of the task window Save&Restore -> Restore from File -> select the file.

Often you will use the output file of one job as the input file for the next job. However, if you do not have the output file, then it will also be available in directory DATA.

1a) Setting up Project and Directory Aliases

The Problem

When using ccp4i for the first time, you need to set up a project to work in. You also need to define directories so that ccp4i knows where to find files.

Exercise

1.1 In your home directory, make a subdirectory TEST:

> mkdir TEST

1.2 Start ccp4i:

> ccp4i

The Main Window will appear.

1.3 Click on the Directories&ProjectDir button in the main window.

1.4 In the new window, click on Add Project. In the new line, enter the project alias TEST and then the full path name of the subdirectory TEST that you have just made:

Project TEST uses directory: $HOME/TEST

1.5 Select this new project on the next line:

Project for this session of CCP4Interface. TEST which uses:

1.6 Click on Add Directory Alias and in the new line add the directory alias DATA and the path name:

Alias: DATA for directory: $CEXAM/tutorial2000/data

1.7 Repeat 1.6 to add the alias RESULTS:

Alias: RESULTS for directory: $CEXAM/tutorial2000/results

1.8 Click on Apply&Exit.
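
The effect of the aliases can be pictured with a short Python sketch. This is only an illustration of the idea, not how ccp4i is implemented: the function resolve and the table ALIASES are hypothetical names, and the environment variable CEXAM is assumed to have been set by your CCP4 installation.

```python
import os

# Alias table mirroring the entries made in steps 1.4 to 1.7.
ALIASES = {
    "TEST": "$HOME/TEST",
    "DATA": "$CEXAM/tutorial2000/data",
    "RESULTS": "$CEXAM/tutorial2000/results",
}


def resolve(alias, filename):
    """Return the full path for a file given as (alias, filename)."""
    # Environment variables are expanded at call time, so a change to
    # CEXAM or HOME is picked up without editing the table.
    directory = os.path.expandvars(ALIASES[alias])
    return os.path.join(directory, filename)


if __name__ == "__main__":
    # Placeholder value (an assumption), used only if CEXAM is not set.
    os.environ.setdefault("CEXAM", "/path/to/ccp4/examples")
    print(resolve("DATA", "toxd.hkl"))
```

An entry such as In DATA toxd.hkl in a task window then corresponds to the file toxd.hkl in the directory that the alias DATA points to.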

1b) Introduction to the MTZ format

The Problem

The MTZ file format is central to running the CCP4 programs. When using CCP4 for the first time, you will usually have to convert an external file to MTZ format. You also need to understand how information is arranged in an MTZ file. In this example, we convert a CNS reflection file for the protein toxd to MTZ format, and briefly examine the MTZ file.

Exercise

1.20 Find the choose module pull-down menu in the main window, and select Reflection Data Utilities.

1.21 In the tasks menu below, click on Convert to MTZ and Standardise. This will open a task window.

1.22 On the first line, enter a suitable job title such as

Job title Importing CNS file for toxd.

1.23 On the second line, select X-PLOR/CNS from the pull-down menu. Wait while the task window re-draws itself.

1.24 On the 3rd line, select:

Create full unique set of reflections and keep existing FreeR data.

by clicking on the radiobutton on the left-hand side, and selecting from the pull-down menu.

1.25 Now enter the input CNS file as:

In DATA toxd.hkl

(you can use the Browse button after selecting DATA from the pull-down menu). The output file should be automatically set to:

Out TEST toxd.mtz

(if it is not, type this yourself).

1.26 Now look at the section MTZ Project & Dataset Names. These names will be used to identify the data for Data Harvesting and other purposes. Enter

Harvest project name toxd and dataset name native

1.27 Now look at the section Extra information to be saved in MTZ file. We need to supply the spacegroup and cell dimensions, since these are not included in the input CNS file. Enter:

Space group name or number 19

Cell dimensions a 73.582 b 38.733 c 23.189 alpha 90 beta 90 gamma 90

1.28 The remainder of the task window can be left unchanged, so go to the bottom of the task window and click on Run -> Run Now.

Look at the main window of the interface again, and look at the Job List. The current import job should be at the top. The status will be given as STARTING, RUNNING and then FINISHED. This job is very quick, so you may only see the FINISHED status.

1.29 When the job has finished, highlight the job in the job list by clicking on it. Then select View Files from Job -> toxd.mtz in the main window. A window will open displaying the contents of the MTZ file that you have created (the MTZ file is a binary file, so you are actually just seeing the output of a viewer program). The information that is displayed comes from the header of the MTZ file. Look for the following:

      * Dataset ID, project name, dataset name:

            1 toxd
              native

Information about the datasets included in the file is given here. In this example, the file just contains one dataset.

      * Column Labels :

            H K L FP SIGFP FreeRflag

The file contains 6 columns: 3 holding the hkl indices and 3 containing data. The names of these columns are given here. In the MTZ format, the column names are not fixed, and neither is the order of the columns. Programs use these names to identify the columns that are to be used.


     * Column Types :

           H H H F Q I

Each column has an associated type. For example, F refers to a structure factor amplitude: the column FP has this type. Similarly, H denotes a Miller index, Q a standard deviation and I an integer.


     * Associated datasets :

           1 1 1 1 1 1

This is a list of the datasets associated with each column. In this example, all columns belong to dataset 1.

     * Cell Dimensions :

          73.5820 38.7330 23.1890 90.0000 90.0000 90.0000

     * Resolution Range :

          0.00074 0.18900 ( 36.761 - 2.300 A )

     * Sort Order :

          1 2 3 0 0

     * Space group = P212121 (number 19)

The cell dimensions, resolution range and space group are carried in the MTZ file header, so that you do not normally need to enter them explicitly when running programs.
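
The header items above can be tied together with a small Python sketch. It is an illustration only and does not read the MTZ file: the values are typed in from the listing, and the function names columns_of_type and to_angstrom are hypothetical. The first two numbers of the resolution range are treated as 1/d^2, which is consistent with the limits in Angstroms printed in brackets.

```python
import math

# Header values copied from the listing above.
labels = ["H", "K", "L", "FP", "SIGFP", "FreeRflag"]
types = ["H", "H", "H", "F", "Q", "I"]
datasets = [1, 1, 1, 1, 1, 1]
inv_d2_min, inv_d2_max = 0.00074, 0.18900  # resolution limits as 1/d^2


def columns_of_type(wanted):
    """Return the labels of all columns whose type code matches."""
    return [label for label, t in zip(labels, types) if t == wanted]


def to_angstrom(inv_d2):
    """Convert 1/d^2 (in 1/A^2) to d (in A)."""
    return 1.0 / math.sqrt(inv_d2)


print(columns_of_type("F"))                 # amplitude columns
print(round(to_angstrom(inv_d2_min), 2))    # low-resolution limit
print(round(to_angstrom(inv_d2_max), 2))    # high-resolution limit
```

The low-resolution limit computed this way differs slightly in the last decimal places from the printed 36.761, because the header value 0.00074 is itself rounded.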

1.30 By default, only the header information from the MTZ file is displayed. To see more, click on List More Info at the bottom of the display window. A dialogue box will appear. Accept the defaults and click Apply&Exit. Extra information is now displayed at the bottom of the display window. Scroll down and look at the table:


    OVERALL FILE STATISTICS

Each line corresponds to a column of data in the MTZ file, and for each line various statistics are given. For example, Num Missing gives the number of reflections in that column which have been flagged as missing data (e.g. a structure factor amplitude which wasn't measured in the diffraction experiment).

1.31 At the bottom of the display, the first 10 reflections are listed (more can be listed via the List More Info option):

The rows correspond to different reflections, and the columns correspond to the 6 columns of data described in the header. Some entries are given as "?". This represents missing data, and the total number of such entries for each column is listed in the table OVERALL FILE STATISTICS.
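
As a rough illustration of how the Num Missing figures relate to the "?" entries, the following Python sketch counts missing values per column. The reflection records are hypothetical and are not taken from toxd.mtz; None stands in for "?", and count_missing is an illustrative name.

```python
labels = ["H", "K", "L", "FP", "SIGFP", "FreeRflag"]

# Hypothetical reflection records (NOT from toxd.mtz); None stands for "?".
rows = [
    (0, 0, 4, None, None, 3),
    (0, 0, 6, 120.5, 2.1, 7),
    (0, 1, 1, 85.2, 1.4, 0),
    (0, 1, 2, None, None, 12),
]


def count_missing(rows, labels):
    """Count, for each column, the records in which the value is missing."""
    counts = {label: 0 for label in labels}
    for row in rows:
        for label, value in zip(labels, row):
            if value is None:
                counts[label] += 1
    return counts


print(count_missing(rows, labels))
```

In this made-up example the indices and the FreeRflag column are always present, while FP and SIGFP are each missing in two records.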

1.32 When you have finished examining the file, click on Quit. Close all other windows except the main window.

1c) MTZ format: unmerged files

This exercise is optional. Do not worry if you do not have enough time to do this.

The Problem

The previous example looked at a so-called merged MTZ file. This type of file has only one record for each set of hkl indices, and is the type of file one has after merging together all the different observations of a particular reflection. In the early stages of data processing, however, one has several observations of each reflection (for example, from different images or from symmetry-related reflections), and such reflection data are held in an unmerged MTZ file. In this exercise, we examine an unmerged MTZ file.
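
The difference between the two kinds of file can be sketched in Python as follows. This is a deliberately simplified picture, not the procedure used by the CCP4 programs: the observations are invented, they are assumed to be already scaled and reduced to a common set of hkl indices, and the inverse-variance weighted mean is just one plausible way of combining them.

```python
from collections import defaultdict

# Hypothetical unmerged observations: (h, k, l, I, sigI). Values are made up.
observations = [
    (1, 2, 3, 100.0, 10.0),
    (1, 2, 3, 110.0, 10.0),
    (2, 0, 0, 50.0, 5.0),
]


def merge(observations):
    """Combine repeated observations into one record per hkl,
    using an inverse-variance weighted mean."""
    groups = defaultdict(list)
    for h, k, l, intensity, sigma in observations:
        groups[(h, k, l)].append((intensity, sigma))

    merged = {}
    for hkl, obs in groups.items():
        weights = [1.0 / sigma**2 for _, sigma in obs]
        mean = sum(w * i for w, (i, _) in zip(weights, obs)) / sum(weights)
        merged_sigma = (1.0 / sum(weights)) ** 0.5
        merged[hkl] = (mean, merged_sigma)
    return merged


for hkl, (mean, sigma) in merge(observations).items():
    print(hkl, round(mean, 2), round(sigma, 2))
```

The unmerged list has three records, two of them for the same hkl; the merged result has one record per hkl.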

Exercise

1.40 Open the Convert to MTZ and Standardise task window again (see 1.20 and 1.21 above).

1.41 On the first line, change the job title to:

Job title Importing unmerged DMSO data.

1.42 On the second line, select ascii MTZ from the pull-down menu.

1.43 On the 3rd line, turn off Create full unique set of reflections using the radiobutton. This is not appropriate for unmerged data.

1.44 Now enter the input file as:

In DATA aucn.na4

The output file is set automatically to:

Out TEST aucn.mtz

1.45 Now look at the section MTZ Project & Dataset Names. Enter:

Harvest project name dmso and dataset name red_aucn

1.46 Cell and symmetry information is obtained from the input file and doesn't need to be entered. So click on Run -> Run Now.

1.47 When the job has finished, inspect the contents of the output unmerged file using View Files from Job -> aucn.mtz. Much of the information is the same as for the previous example, but there is some extra information specific to unmerged MTZ files.

1.48 Unmerged MTZ files have a standard set of column labels:

    * Column Labels :

       H K L M/ISYM BATCH I SIGI IPR SIGIPR FRACTIONCALC XDET YDET ROT WIDTH LP MPART

These will normally be the same for all unmerged files.

1.49 Reflection records are grouped into batches: a batch corresponds to an image (or group of images) upon which a subset of the reflections was recorded. The same hkl triplet may occur several times, with different instances being distinguished by different batch numbers. A list of batches is given at the end of the default display:


    Batch number:
    5
    Batch number:
    6
    Batch number:
    7
    Batch number:
    8
    Batch number:
    9
    Batch number:
    10
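
The role of the batch number can be illustrated with a few lines of Python. The records below are hypothetical and only reuse batch numbers from the range listed above; the sketch counts the records in each batch and picks out the hkl triplets that occur in more than one batch.

```python
from collections import Counter, defaultdict

# Hypothetical unmerged records: (h, k, l, batch). Not taken from aucn.mtz.
records = [
    (1, 2, 3, 5),
    (1, 2, 3, 6),
    (4, 0, 1, 5),
    (2, 2, 0, 7),
]

# Number of reflection records in each batch.
per_batch = Counter(batch for _, _, _, batch in records)

# Batches in which each hkl triplet was recorded.
batches_by_hkl = defaultdict(list)
for h, k, l, batch in records:
    batches_by_hkl[(h, k, l)].append(batch)

# Triplets observed more than once, distinguished by batch number.
repeated = {hkl: b for hkl, b in batches_by_hkl.items() if len(b) > 1}

print(dict(per_batch))
print(repeated)
```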

1.50 Click on List More Info, and this time select batch headers for multi-record MTZ before clicking Apply&Exit. In the main display window, the batch header for each batch is displayed.

Orientation data for batch 5 oscillation data

 Crystal number ................... 0
 Associated dataset ID ............ 1
 Cell dimensions .................. 88.91 88.91 229.22 90.00 90.00 90.00
 Cell fix flags ................... -1 1 -1 0 0 0
 Orientation matrix U ............. 1.0000 0.0000 0.0000
 (including setting angles)         0.0000 1.0000 0.0000
                                    0.0000 0.0000 1.0000
 Reciprocal axis nearest .. c*
 Mosaicity ........................ 0.020
 Datum goniostat angles (degrees).. 0.000
 Start & stop Phi angles (degrees). 343.000 344.000
 Range of Phi angles (degrees)..... 0.000
 Start & stop time (minutes)....... 0. 0.
 Crystal goniostat information :-
 Number of goniostat axes.......... 1
 Goniostat vectors..... .... 0.0000 0.0000 1.0000
                  ..... .... 0.0000 0.0000 0.0000
                  ..... .... 0.0000 0.0000 0.0000
 Beam information :-
 Idealized X-ray beam vector....... -1.0000 0.0000 0.0000
 X-ray beam vector with tilts...... -1.0000 0.0000 0.0000
 Wavelength and dispersion ........ 0.88000 0.00120 0.00010
 Divergence ....................... 0.120 0.020
 Detector information :-
 Number of detectors............... 0
 Crystal to Detector distance (mm). 0.000
 Detector swing angle.............. 0.000
 Pixel limits on detector.......... 0.0 0.0 0.0 0.0

The batch header contains information on how the corresponding image was recorded, and this information is used by certain programs such as SCALA.
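
If you want to pull numbers out of such a listing yourself, a minimal Python sketch is given below. It works on text copied from the display window, not on the MTZ file itself, and it assumes the simple layout of a label, a run of dots and then numbers; lines that do not fit this layout (such as the matrix rows) are skipped. The names LINE and parse_header are illustrative.

```python
import re

# Three lines copied from the batch 5 header shown above.
header_text = """\
Mosaicity ........................ 0.020
Start & stop Phi angles (degrees). 343.000 344.000
Wavelength and dispersion ........ 0.88000 0.00120 0.00010
"""

# A label, then a run of leader dots, then whitespace-separated numbers.
LINE = re.compile(r"^(.*?)\s*\.+\s+([-\d. ]+)$")


def parse_header(text):
    """Return a dict mapping each label to its list of numeric values."""
    fields = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            label, values = match.groups()
            fields[label] = [float(v) for v in values.split()]
    return fields


fields = parse_header(header_text)
phi_start, phi_stop = fields["Start & stop Phi angles (degrees)"]
print(fields["Mosaicity"])
print(phi_stop - phi_start)  # rotation covered by this batch, in degrees
```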

1.51 When you have finished examining the file, click on Quit. Close all other windows except the main window.