ARP/wARP Web Services Tutorial

Introduction

ARP/wARP builds macromolecular models from X-ray crystallography data. This tutorial describes the steps needed to employ ARP/wARP to build protein models on a computational cluster at EMBL Hamburg using a web interface.

This tutorial uses the Leishmanolysin example found in the examples/Tracing/PSP/ subdirectory within the ARP/wARP distribution. However, you don't need to download ARP/wARP to use this tutorial. The data is made available through the web interface if you click on the 'use example data' checkbox in Step 1.

Step 1 : Upload Experimental Data

Go to the ARP/wARP web application and click on the 'continue' button. This takes you to a form where you can upload experimental data.

Because ARP/wARP can sometimes run for several hours, we ask that you provide an email address so that details of how to access the logs and the download area can be sent to you. You will also be contacted at this address if an error occurs during model building.

The minimum input is an MTZ file containing experimental phases. If you wish to start from an existing protein model, select 'existing model' in the 'Run ARP/wARP starting from' drop-down menu; in that case the MTZ file does not need to contain phase information.

Choose the MTZ file to upload from your local machine. If you do not have one available, tick the 'use example data' checkbox.

Step 2 : Set Options and Parameters

Click on submit and you will be taken to a form where options and parameters for the model building can be set.

If a sequence file is available, it will be used in a model-building step known as 'sequence docking'. You can upload the sequence as a plain-text file in PIR format, or select the "Paste sequence file into text box" option and paste the sequence directly. Enter the number of residues in the asymmetric unit; for a monomer, this equals the length of the sequence you uploaded. Optionally, you can select which structure factor and phase labels of the MTZ file to use for model building. In most cases, the automatically recognised defaults are acceptable.
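A PIR entry consists of a header line starting with '>' (typically '>P1;' followed by an identifier), a free-text description line, and the sequence terminated by '*'. The sketch below, assuming a single-entry file, extracts the sequence and computes the residue count you would enter for a given number of copies in the asymmetric unit (AU). The function names are hypothetical and this is not how ARP/wARP itself reads sequence files.

```python
def parse_pir(text):
    """Return (identifier, description, sequence) from a single PIR entry."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3 or not lines[0].startswith(">"):
        raise ValueError("expected a '>P1;ID' header, a description line and a sequence")
    # Header such as '>P1;ID': keep whatever follows the semicolon.
    header = lines[0]
    identifier = header.split(";", 1)[1] if ";" in header else header[1:]
    description = lines[1]
    # The sequence may span several lines and ends at the '*' terminator.
    body = "".join(lines[2:])
    sequence = body.split("*", 1)[0].replace(" ", "").upper()
    return identifier, description, sequence


def residues_in_au(sequence, copies_in_au=1):
    """Residue count for the form: sequence length times the number of copies in the AU."""
    return len(sequence) * copies_in_au


# Made-up example entry, not the Leishmanolysin sequence.
entry = """>P1;TEST
made-up example sequence
MKTAYIAK
QRQISFVK*
"""
ident, desc, seq = parse_pir(entry)
print(ident, len(seq), residues_in_au(seq, copies_in_au=2))  # TEST 16 32
```

For a dimer in the AU you would therefore enter twice the sequence length.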

The dissemination level controls who has access to your data. If 'Confidential' is selected, the main input and output files are kept private and automatically deleted after a week. If 'ARPwARP-AutoRickshaw-Refmac developers' is selected, the data is accessible only to members of the ARP/wARP, AutoRickshaw and Refmac development teams, for the purpose of improving their software. If 'World' is selected, the data is made available to software developers who request it.

To continue click on the submit button.

Step 3 : View the Running Application

An email will be sent to the address you gave in Step 1. It contains a link to a password-protected area and the password needed to enter it. Enter your email address and this password to view the logs of your running model-building job.

Step 4 : Interpreting the Log File

A short log file is displayed during model building, and a more detailed log file is available for download. To meet the needs of expert users and to aid troubleshooting, the detailed log files produced by ARP/wARP are quite verbose, but they are human-readable. What follows is an overview of their most important parts.
Checking the estimated content
If the solvent content implied by your input is too high or too low (e.g. because you mistyped the total number of residues expected in the asymmetric unit, AU), ARP/wARP resets it to approximately 50% and adjusts the target number of residues accordingly.
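The tutorial does not say how ARP/wARP estimates solvent content. A common approach, used here purely as an assumption, is the Matthews relation: V_M = V_AU / M_AU (in Å³/Da) and solvent fraction = 1 − 1.23 / V_M, with an average residue mass of about 110 Da. The sketch shows how a mistyped residue count produces an implausible solvent content and how a target count for about 50% solvent follows. The acceptable range 0.3 to 0.8 is a hypothetical placeholder, not ARP/wARP's actual threshold.

```python
AVG_RESIDUE_MASS_DA = 110.0  # assumed average mass of one amino-acid residue
MATTHEWS_FACTOR = 1.23       # from V_solv = 1 - 1.23 / V_M


def solvent_fraction(cell_volume_a3, n_symops, n_residues_au):
    """Estimate the solvent fraction from the Matthews coefficient."""
    v_au = cell_volume_a3 / n_symops                 # volume of one AU in A^3
    mass_au = n_residues_au * AVG_RESIDUE_MASS_DA    # protein mass in the AU in Da
    v_m = v_au / mass_au                             # Matthews coefficient, A^3/Da
    return 1.0 - MATTHEWS_FACTOR / v_m


def residues_for_solvent(cell_volume_a3, n_symops, target_solvent=0.5):
    """Number of residues in the AU that gives the requested solvent fraction."""
    v_m = MATTHEWS_FACTOR / (1.0 - target_solvent)
    v_au = cell_volume_a3 / n_symops
    return round(v_au / (v_m * AVG_RESIDUE_MASS_DA))


def check_content(cell_volume_a3, n_symops, n_residues_au, low=0.3, high=0.8):
    """Return (residues, solvent); reset to ~50% solvent if outside [low, high].

    The low/high bounds are hypothetical, not ARP/wARP's real thresholds.
    """
    solvent = solvent_fraction(cell_volume_a3, n_symops, n_residues_au)
    if low <= solvent <= high:
        return n_residues_au, solvent
    n_new = residues_for_solvent(cell_volume_a3, n_symops)
    return n_new, solvent_fraction(cell_volume_a3, n_symops, n_new)


# 20 residues typed instead of ~200 gives ~97% solvent, so the count is reset.
print(check_content(400000.0, 4, 20))
```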
Checking the provided sequence file
If the sequence length, the number of molecules in the AU and the total number of residues in the AU are inconsistent with each other, the number of molecules in the AU is reset accordingly. If the sequence file cannot be interpreted (e.g. it contains unexpected characters), an error message is given.
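The consistency check can be pictured with the minimal sketch below, assuming that only the 20 standard one-letter amino-acid codes are valid and that the number of molecules is taken as the whole number of copies closest to the requested residue total. Both rules are assumptions for illustration; ARP/wARP's exact logic and accepted alphabet may differ, and the function name is hypothetical.

```python
# Assumed alphabet: the 20 standard one-letter amino-acid codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def molecules_in_au(sequence, total_residues_au):
    """Validate a sequence and return a number of molecules in the AU consistent with it."""
    if not sequence:
        raise ValueError("empty sequence")
    unexpected = sorted(set(sequence.upper()) - STANDARD_AA)
    if unexpected:
        raise ValueError("unexpected characters in sequence: " + "".join(unexpected))
    # Whole number of copies closest to the requested residue total (at least one).
    return max(1, round(total_residues_au / len(sequence)))


print(molecules_in_au("MKTAYIAK", 17))  # 2 (17 / 8 = 2.125)
```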
Input MTZ file
MTZ files sometimes lack proper headers, e.g. they have a non-standard space group name or a space group number of zero. ARP/wARP therefore always runs the input through the CCP4 program CAD to fix the header, so the processed MTZ file has the extension .mtz.cad.
Space group number
ARP/wARP version 7.2 supports all standard non-centrosymmetric space groups, P1bar, and several non-standard space groups (e.g. numbers 1017 or 2017). The space group is determined solely from the symmetry operators stored in the MTZ file header.
Input files
ASCII input files (the sequence, the input PDB file or the heavy-atom file) are always converted to Unix line endings, so the converted files have the suffix _lf.
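Files prepared on Windows end each line with CR LF, and some older tools use a bare CR. A minimal sketch of the conversion, assuming the converted copy is named by appending '_lf' to the original file name (the exact naming rule and the helper names are hypothetical):

```python
from pathlib import Path


def normalize_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix LF."""
    # CRLF must be handled first so it does not become two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def lf_name(filename: str) -> str:
    """Hypothetical naming rule: append '_lf' to the original file name."""
    return filename + "_lf"


def convert_to_lf(path: str) -> Path:
    """Write an LF-only copy of `path` next to it and return the new path."""
    src = Path(path)
    dst = src.with_name(lf_name(src.name))
    dst.write_bytes(normalize_newlines(src.read_bytes()))
    return dst
```

Working on bytes rather than text avoids any dependence on the file's character encoding.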
Checking whether input PDB contains ligands
This check is performed if an initial model is available. If the model contains ligands unknown to the Refmac library, they are converted to free atoms named DUM. This should not affect model-building performance, but a warning is printed.
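The effect of this check can be illustrated with the sketch below. It assumes a tiny hypothetical stand-in for the Refmac monomer library (the standard amino acids plus water) and only changes the residue name of unknown HETATM records to DUM; the exact renaming ARP/wARP performs is not documented here.

```python
# Hypothetical stand-in for the Refmac monomer library: standard amino acids and water only.
KNOWN_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL", "HOH",
}


def rename_unknown_ligands(pdb_lines, known=KNOWN_RESIDUES):
    """Rename HETATM residues not in `known` to DUM; return (new_lines, renamed_names)."""
    out, renamed = [], set()
    for line in pdb_lines:
        # In PDB format the residue name occupies columns 18-20 (slice 17:20).
        if line.startswith("HETATM") and len(line) >= 20:
            resname = line[17:20].strip()
            if resname not in known:
                renamed.add(resname)
                line = line[:17] + "DUM" + line[20:]
        out.append(line)
    if renamed:
        print("WARNING: unknown ligands converted to DUM:", ", ".join(sorted(renamed)))
    return out, renamed
```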
R factor after Refmac before model building
If an initial model is available, cycles of restrained refinement with Refmac are carried out until the R factor converges.
Building cycle zero
Normally a considerable part of the structure is already built in the first building cycle. If this is not the case, monitor progress over a few more building cycles. If essentially nothing is autotraced in the subsequent cycles either, check whether the initial phases are good enough.
Search for helices and strands
The module for building helical and beta-stranded fragments is invoked on request, or by default for data at 2.7 Å resolution or lower (i.e. a resolution limit of 2.7 Å or worse). The number of helical and stranded residues built and the number of chain fragments are printed.
Non-Crystallographic Symmetry (NCS)
If the resolution is worse than 2.3 Å, chains are extended using information from NCS-related parts of the structure. Non-crystallographic symmetry is also used to provide restraints during refinement.
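The two resolution thresholds quoted in the sections on helices/strands and on NCS can be summarised in a small decision function. Here d_min is the resolution limit in Å, so a larger value means lower resolution. This is a sketch of the rules as stated in this tutorial, assuming the structure actually contains NCS copies; it is not ARP/wARP code and the function name is hypothetical.

```python
def modules_for_resolution(d_min, helix_strand_requested=False):
    """Which resolution-dependent modules run, per the thresholds quoted in this tutorial."""
    return {
        # Helix/strand building: on request, or by default at 2.7 A or lower resolution.
        "helices_strands": helix_strand_requested or d_min >= 2.7,
        # NCS-based chain extension: when the resolution is worse than 2.3 A.
        "ncs_extension": d_min > 2.3,
    }


print(modules_for_resolution(2.5))  # NCS extension only
```

Data at 2.5 Å therefore fall between the two thresholds: NCS is exploited, but helix and strand building runs only if requested.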
Rounds within building cycle
Each cycle of main-chain tracing is carried out in several rounds. Normally each successive round should yield more residues in fewer fragments. The length of the longest traced fragment and the model-building score are also printed for information.
Chains, residues and estimated correctness of the model
The output from the best tracing round is processed further. Fragments of 4 residues or shorter are converted to free atoms, and the terminal residues of the remaining fragments are removed. The rest is kept and used to provide restraints for subsequent ARP/REFMAC cycles. If model building is successful, the estimated correctness of the model should steadily approach 100%.
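The selection and post-processing described for the tracing rounds can be sketched as follows. Fragments are modelled as plain lists of residue identifiers, the best round is assumed to be the one with the highest score, and "removing the terminal residues" is assumed to mean trimming one residue from each end. All three are simplifying assumptions, and the function names are hypothetical.

```python
def best_round(rounds):
    """rounds: list of (score, fragments) pairs; return the fragments of the top-scoring round."""
    _score, fragments = max(rounds, key=lambda r: r[0])
    return fragments


def postprocess(fragments, max_short=4):
    """Return (kept_fragments, free_residues) following the rules in this tutorial."""
    kept, free = [], []
    for frag in fragments:
        if len(frag) <= max_short:
            free.extend(frag)        # 4 residues or shorter: converted to free atoms
        else:
            kept.append(frag[1:-1])  # assumption: trim one terminal residue at each end
    return kept, free


rounds = [(0.5, [list(range(6))]), (0.8, [list(range(1, 11)), [20, 21, 22]])]
kept, free = postprocess(best_round(rounds))
print(kept, free)  # [[2, 3, 4, 5, 6, 7, 8, 9]] [20, 21, 22]
```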
Residues docked into sequence
If the sequence is provided, the autotraced chain fragments are docked into it, and the side chains are built and refined in real space; the results are printed. If the sequence is not provided, only side-chain guesses (GLY/ALA/SER/VAL) are built and refined.
Loop building
Loop building is invoked if the sequence is available and the tracing score is above 0.85. It is also invoked after the last building cycle.
R factor after Refmac during the iterations
The value of the R factor typically oscillates. It goes up after each tracing cycle (because the model is entirely rebuilt) and then decreases during the ARP/REFMAC refinement and update cycles. At the end of the procedure it should reach a value typical for a restrained refinement.
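For reference, the crystallographic R factor is R = Σ | |Fobs| − |Fcalc| | / Σ |Fobs|, summed over all reflections. The minimal sketch below assumes the calculated amplitudes have already been scaled to the observed ones, which real refinement programs such as Refmac do as part of the calculation; the function name is hypothetical.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R factor for already-scaled amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


print(r_factor([100.0, 200.0], [90.0, 210.0]))  # 20 / 300, about 0.067
```

Rebuilding the model from scratch changes Fcalc substantially, which is why R rises after each tracing cycle before refinement brings it down again.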
Sequence coverage
If the sequence is provided, the ratio of the number of docked residues to the total number of traced residues is printed. A value higher than 0.8 is considered good convergence: all free atoms are then removed from the file and the job proceeds to a few cycles of restrained refinement with solvent search. If, however, the sequence coverage is lower than 0.8, the free atoms (DUM) are left in the file. You can then inspect the density maps and start editing the model in a molecular graphics program or, alternatively, submit another model-building job using the output of this one.
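The sequence-coverage decision amounts to a ratio and a threshold. A minimal sketch, assuming that "higher than 0.8" is a strict comparison (the tutorial does not say what happens at exactly 0.8); the function names are hypothetical:

```python
def sequence_coverage(n_docked, n_traced):
    """Fraction of traced residues that were docked into the sequence."""
    return n_docked / n_traced if n_traced else 0.0


def remove_free_atoms(n_docked, n_traced, threshold=0.8):
    """True when coverage is above the threshold, i.e. DUM atoms are removed before final refinement."""
    return sequence_coverage(n_docked, n_traced) > threshold


print(sequence_coverage(180, 200), remove_free_atoms(180, 200))  # 0.9 True
```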
Job termination
The statement 'Task completed successfully' indicates that the job finished without errors. An error statement of the form:
QUITTING ... ARP/wARP module stopped with an error message: name_of_the_program
indicates that one of the task's modules terminated with an error; name_of_the_program stands for the name of the failing module. When this happens, you will normally be contacted by one of the ARP/wARP developers.
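If you download the detailed log, you can check its status automatically by looking for the two statements quoted above. The sketch below assumes the wording appears exactly as shown in this tutorial; spacing or wording may differ between ARP/wARP versions, and the function name is hypothetical.

```python
SUCCESS = "Task completed successfully"
ERROR_PREFIX = "QUITTING ... ARP/wARP module stopped with an error message:"


def job_status(log_text):
    """Return ('error', module), ('success', None) or ('running', None) for a log."""
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(ERROR_PREFIX):
            # Whatever follows the prefix names the module that failed.
            return "error", stripped[len(ERROR_PREFIX):].strip()
    if SUCCESS in log_text:
        return "success", None
    return "running", None  # neither statement found yet: still running or incomplete


print(job_status("...\nTask completed successfully\n"))  # ('success', None)
```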