ARP/wARP Web Services Tutorial
Introduction
ARP/wARP builds macromolecular models from X-ray crystallography data. This tutorial describes the steps
needed to use ARP/wARP, through a web interface, to build protein models on a computational cluster at EMBL Hamburg.
This tutorial uses the Leishmanolysin example found
in the examples/Tracing/PSP/ subdirectory within the ARP/wARP distribution.
However, you don't need to download ARP/wARP to follow this tutorial: the example data are available through
the web interface if you tick the 'use example data' checkbox in Step 1.
Step 1 : Upload Experimental Data
Go to the ARP/wARP web application and click on the 'continue' button. This takes you to a form where you can
upload experimental data.
Because ARP/wARP can sometimes run for several hours, we ask you to provide an email address to which details
of the log and download area can be sent. You will also be contacted at this address if an error occurs during
model building.
At a minimum, you must upload an MTZ file that contains experimental phases. Alternatively, if you wish to start
from an existing protein model, select 'existing model' in the 'Run ARP/wARP starting from' pulldown menu; in that
case the MTZ file does not need to contain phase information.
Select the MTZ file to upload from your local machine. If you do not have one available, tick the
'use example data' checkbox.
Step 2 : Set Options and Parameters
Click on 'submit' to go to a form where you can set options and parameters for model building.
If a sequence file is available, it is used during model building in a step known as 'sequence docking'.
You can either upload the sequence as a plain text file in PIR format or select the "Paste sequence file into
text box" option and paste your sequence.
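A PIR (NBRF/PIR) entry consists of a header line that starts with '>P1;' followed by an identifier, a free-text
description line, and the one-letter sequence terminated by '*'. The Python sketch below writes a sequence in that
layout. The helper name 'to_pir', its 60-residue line width and the decision to reject anything other than the
letters A-Z are illustrative assumptions, not ARP/wARP behaviour; the example sequence is an arbitrary placeholder.

```python
import string
from typing import List

ALLOWED = set(string.ascii_uppercase)  # assumption: accept only one-letter codes A-Z


def to_pir(identifier: str, sequence: str, description: str = "", width: int = 60) -> str:
    """Return 'sequence' formatted as a PIR/NBRF entry (hypothetical helper)."""
    # Drop whitespace (spaces, newlines) and normalise to upper case.
    residues = "".join(sequence.split()).upper()
    bad = sorted(set(residues) - ALLOWED)
    if not residues or bad:
        raise ValueError(f"unexpected characters in sequence: {bad}")
    # Split the sequence into fixed-width lines and terminate the last one with '*'.
    chunks: List[str] = [residues[i:i + width] for i in range(0, len(residues), width)]
    chunks[-1] += "*"
    return "\n".join([f">P1;{identifier}", description] + chunks) + "\n"


if __name__ == "__main__":
    print(to_pir("example", "MKTAYIAKQR QISFVKSHFS", "placeholder test sequence"), end="")
```

Rejecting unexpected characters up front mirrors the check described in Step 4, where an uninterpretable
sequence file produces an error message.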
Enter the total number of residues in the asymmetric unit (AU). For a monomer this equals the length of the
uploaded sequence; for n identical copies of the chain in the AU it is n times the sequence length.
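The number of residues you enter also determines the estimated solvent content that ARP/wARP checks when the job
starts. The Python sketch below shows the usual arithmetic, assuming the Matthews relation (Vm = V_cell / (Z x M)
and solvent fraction approximately 1 - 1.23/Vm, where Z is the number of asymmetric units in the cell and M the
mass of the AU content in Da) and an average residue mass of 110 Da. The function names and constants are
assumptions for illustration; ARP/wARP's own estimate may be computed differently.

```python
# Assumption: typical mean mass of an amino-acid residue in a protein, in daltons.
AVERAGE_RESIDUE_MASS_DA = 110.0
# Assumption: Matthews relation constant, solvent fraction ~ 1 - 1.23 / Vm.
MATTHEWS_CONSTANT = 1.23


def residues_in_au(sequence_length: int, molecules_in_au: int) -> int:
    """Total residues in the asymmetric unit (AU) for identical copies of one chain."""
    return sequence_length * molecules_in_au


def solvent_fraction(cell_volume_a3: float, au_per_cell: int, residues: int) -> float:
    """Estimate the solvent fraction from the Matthews coefficient Vm (in A^3/Da)."""
    mass_da = residues * AVERAGE_RESIDUE_MASS_DA
    vm = cell_volume_a3 / (au_per_cell * mass_da)
    return 1.0 - MATTHEWS_CONSTANT / vm


if __name__ == "__main__":
    # 250-residue monomer, 400,000 A^3 cell, four asymmetric units per cell.
    print(round(solvent_fraction(400000.0, 4, residues_in_au(250, 1)), 2))
```

For the numbers above the estimate is about 0.66. Mistyping 2500 instead of 250 residues gives a negative
value, the kind of implausible estimate that the solvent-content check in Step 4 is meant to catch.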
Optionally, you can select which structure factor and phase column labels of the MTZ file are used for model
building. In most cases, the automatically recognised defaults are appropriate.
The dissemination level controls who has access to your data. If 'Confidential' is selected, the main input and output files are kept private and automatically deleted after a week. If 'ARPwARP-AutoRickshaw-Refmac developers' is selected, the data are accessible only to members of the ARP/wARP, AutoRickshaw and Refmac development teams, for the purpose of improving their software. If 'World' is selected, the data are made available to software developers who request them.
To continue, click on the 'submit' button.
Step 3 : View the Running Application
An email will be sent to the address you gave in Step 1. It contains a link to a password-protected area
and the password needed to enter it. Log in with your email address and this password to view the logs of
your running model-building job.
Step 4 : Interpreting the Log File
A short log file is displayed during model building, and a more detailed log file is available for download. To meet the needs of expert users and to aid troubleshooting, the long log file produced by ARP/wARP is quite verbose.
It is, however, human-readable. What follows is an overview of its most important parts.
- Checking the estimated content
- If the solvent content is too high or too low (e.g. because you mistyped the total number of residues expected in the AU), ARP/wARP resets it to approximately 50% and adjusts the target number of residues accordingly.
- Checking the provided sequence file
- If the sequence length, the number of molecules in the AU and the total number of residues in the AU are inconsistent with each other, the number of molecules in the AU is reset accordingly. If the sequence file cannot be interpreted (e.g. it contains unexpected characters), an error message is given.
- Input MTZ file
- MTZ files sometimes lack proper headers, e.g. they contain a non-standard space group name or a space group number of zero. ARP/wARP therefore always runs the CCP4 program CAD to fix the header, so the processed MTZ file has the extension .mtz.cad.
- Space group number
- ARP/wARP version 7.2 supports all standard non-centrosymmetric space groups, P1bar and several non-standard space groups (e.g. 1017 or 2017). The space group is determined solely from the symmetry operators stored in the MTZ file header.
- Input files
- ASCII input files (sequence, input PDB or heavy-atom file) are always converted to Unix line endings, so the converted files have the extension _lf.
- Checking whether input PDB contains ligands
- This check is performed if an initial model is provided. If the model contains ligands unknown to the Refmac library, they are renamed as free DUM atoms. This should not affect model-building performance, but a warning is printed.
- R factor after Refmac before model building
- If an initial model is provided, several cycles of restrained refinement with Refmac are carried out until the R factor converges.
- Building cycle zero
- Normally, a considerable part of the structure should already be built in the first building cycle. If this is not the case, follow the next few building cycles. If essentially nothing has been autotraced after several further cycles, check whether the initial phases are good enough.
- Search for helices and strands
- The module for building helical and beta-stranded fragments is invoked if requested, or by default for data at 2.7 Å resolution or worse. The number of built helical/stranded residues and chain fragments is printed.
- Non-Crystallographic Symmetry (NCS)
- If the resolution is worse than 2.3 Å, chains are extended using information from other NCS-related parts of the structure. Non-crystallographic symmetry is also used to provide restraints during refinement.
- Rounds within building cycle
- Each main-chain tracing cycle is carried out in several rounds. Normally, each successive round should yield more residues in fewer fragments. The length of the longest traced fragment and the model-building score are also printed for information.
- Chains, residues and estimated correctness of the model
- The output from the best tracing round is processed further. Fragments of 4 residues or fewer are converted to free atoms, and the terminal residues of the remaining fragments are removed. The rest of the model is kept and used to provide restraints for subsequent ARP/REFMAC cycles. If model building is successful, the estimated correctness of the model should steadily approach 100%.
- Residues docked into sequence
- If a sequence is provided, the autotraced chain fragments are docked into it, and the side chains are built and refined in real space; the results are printed. Without a sequence, only side-chain guesses (GLY/ALA/SER/VAL) are built and refined.
- Loop building
- This is invoked if the sequence is available and if the tracing score is above 0.85. It is also invoked after the last building cycle.
- R factor after Refmac during the iterations
- The value of the R factor typically oscillates. It goes up after each tracing cycle (because the model is entirely rebuilt) and then decreases during the ARP/REFMAC refinement and update cycles. At the end of the procedure it should reach a value typical for a restrained refinement.
- Sequence coverage
- If a sequence is provided, the ratio of the number of docked residues to the total number of traced residues is printed. A value higher than 0.8 indicates good convergence: all free atoms are then removed from the file, and the job continues with a few cycles of restrained refinement with solvent search. If, however, the sequence coverage is lower than 0.8, the free atoms (DUM) are left in the file. You can then inspect the density maps and start modifying the model in a molecular graphics program or, alternatively, submit another model-building job using the output of this one.
- Job termination
- The statement 'Task completed successfully' indicates that the job finished without errors. An error statement such as:
QUITTING ... ARP/wARP module stopped with an error message: name_of_the_program
indicates that one of the modules of the task has terminated with an error message. When this happens you will normally be contacted by one of the ARP/wARP developers.
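The line-ending conversion described under 'Input files' amounts to replacing Windows (CRLF) and classic Mac (CR)
line endings with Unix line feeds. The Python sketch below illustrates it; the function names, and the way
'write_lf_copy' appends '_lf' to the file name, are hypothetical and not ARP/wARP's actual code.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF (Windows) and lone CR (classic Mac) line endings to LF."""
    # Replace CRLF first so that the CR of a CRLF pair does not become an extra LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def write_lf_copy(path: str) -> Path:
    """Write a Unix-line-ending copy of 'path' next to it, named '<name>_lf' (hypothetical)."""
    src = Path(path)
    dst = src.with_name(src.name + "_lf")
    dst.write_bytes(to_unix_line_endings(src.read_bytes()))
    return dst
```

Working on bytes rather than decoded text leaves every other character untouched, which matters for
fixed-column formats such as PDB.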
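For reference when reading the R factor lines in the log, the conventional crystallographic R factor measures the
agreement between observed and calculated structure-factor amplitudes over all reflections hkl, with k a scale
factor. Programs differ in details such as which reflections are included, so treat this as the textbook
definition rather than Refmac's exact implementation.

```latex
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - k\,|F_{\mathrm{calc}}(hkl)| \,\bigr|}{\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}
```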
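The post-processing rules described under 'Chains, residues and estimated correctness of the model' and
'Sequence coverage' can be summarised in a short Python sketch. Fragments are represented as plain lists of
residue names; this representation, the function names, trimming exactly one residue from each end of a fragment,
and treating a coverage of exactly 0.8 as not converged are all assumptions. Only the thresholds (fragments of 4
residues or fewer become free atoms; coverage above 0.8 counts as good convergence) come from the text above.

```python
from typing import List, Tuple

MAX_FREE_FRAGMENT = 4  # fragments of this length or shorter become free atoms
GOOD_COVERAGE = 0.8    # sequence coverage above this is treated as good convergence


def prune_fragments(fragments: List[List[str]]) -> Tuple[List[List[str]], int]:
    """Return (kept fragments, number of residues converted to free atoms)."""
    kept: List[List[str]] = []
    freed = 0
    for frag in fragments:
        if len(frag) <= MAX_FREE_FRAGMENT:
            freed += len(frag)  # short fragment: its residues are handed back as free atoms
            continue
        kept.append(frag[1:-1])  # assumption: drop one terminal residue at each end
    return kept, freed


def sequence_coverage(docked: int, traced: int) -> float:
    """Ratio of docked residues to all traced residues (0.0 if nothing was traced)."""
    return docked / traced if traced else 0.0


def next_step(docked: int, traced: int) -> str:
    """Pick the follow-up action; exactly 0.8 is treated as not converged (assumption)."""
    if sequence_coverage(docked, traced) > GOOD_COVERAGE:
        return "remove free atoms; restrained refinement with solvent search"
    return "keep free (DUM) atoms; inspect maps or resubmit"
```

For example, 85 docked out of 100 traced residues gives a coverage of 0.85, so the free atoms would be removed.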