DipCheck is a validation tool for protein backbone geometry, developed by Joana Pereira and Victor Lamzin, EMBL Hamburg.

The tool uses a Euclidean 3D space (DipSpace) of orthogonal descriptors of the geometry of a 5-atom dipeptide unit.

The DipSpace database contains 1,024,000 data points derived from a selected set of 1,300,000 dipeptide fragments taken from well-refined structures deposited in the PDB.

DipCheck classifies the geometry of the middle CA(i) atom into four categories:

Region               Cumulative share of the set   DipScore
Favoured             98.00%                        above 0.243
Allowed              99.80%                        0.033 to 0.243
Generously allowed   99.95%                        0.010 to 0.033
Disallowed           remaining 0.05%               below 0.010
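
The per-residue thresholds listed above can be expressed as a simple lookup. The following is a minimal Python sketch, not DipCheck's own code: the function name classify_dipscore is hypothetical, and the treatment of a score that falls exactly on a boundary (assigned here to the lower category) is an assumption, since the text does not specify it.

```python
# Lower bounds of each region, taken from the thresholds listed above.
# Boundary handling (strict ">") is an assumption, not documented behaviour.
DIPSCORE_THRESHOLDS = [
    (0.243, "favoured"),
    (0.033, "allowed"),
    (0.010, "generously allowed"),
]


def classify_dipscore(score: float) -> str:
    """Return the region name for a single residue's DipScore."""
    for lower_bound, region in DIPSCORE_THRESHOLDS:
        if score > lower_bound:
            return region
    return "disallowed"


if __name__ == "__main__":
    # Illustrative values only, not taken from a real structure.
    for value in (0.5, 0.1, 0.02, 0.005):
        print(value, classify_dipscore(value))
```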

DipCheck also classifies the overall geometry of a protein model, according to its DipScore distribution, into four categories:

Model category       Cumulative share of a random set of protein models   Chi-score
Favoured             98.00%                                               above -2.15
Allowed              99.80%                                               -2.97 to -2.15
Generously allowed   99.95%                                               -3.38 to -2.97
Outlier              remaining 0.05%                                      below -3.38
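
The whole-model categories follow the same pattern, with the Chi-score in place of the DipScore. A minimal Python sketch under the same caveats: classify_chi_score is a hypothetical name, and assigning a value that lies exactly on a boundary to the lower category is an assumption.

```python
# Lower bounds of each model category, taken from the thresholds listed above.
# Boundary handling (strict ">") is an assumption, not documented behaviour.
CHI_SCORE_THRESHOLDS = [
    (-2.15, "favoured"),
    (-2.97, "allowed"),
    (-3.38, "generously allowed"),
]


def classify_chi_score(chi_score: float) -> str:
    """Return the model category for an overall Chi-score."""
    for lower_bound, category in CHI_SCORE_THRESHOLDS:
        if chi_score > lower_bound:
            return category
    return "outlier"


if __name__ == "__main__":
    # Illustrative values only, not taken from a real structure.
    for value in (0.8, -2.5, -3.1, -4.0):
        print(value, classify_chi_score(value))
```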

The output of dipcheck, as of the version of 06.05.2016, provides the following:

  1. The DipScore value for each residue and the region to which it is assigned.
  2. The number of CA atoms contained in the input file and the number of CA atoms evaluated. Dipeptide units containing atoms with partial occupancies are ignored.
  3. A summary table of the number of residues in each of the four regions.
  4. The first four central moments of the DipScore distribution with Z-scores.
  5. The overall Chi-score. This is the most important 'single-number' result.
    • The Chi-score is similar to a conventional Z-score, but its distribution is not Gaussian; it resembles a Rayleigh distribution. In addition, the Chi-score is signed: a negative value indicates a structure that is worse than average, while a positive value indicates a structure that is better than average.
    • For good structures, an overall Chi-score below -3.38 should statistically occur about once in 2000 structures (0.05%).
    • A structure with an overall Chi-score below -3.38 can therefore be deemed an outlier and is worth inspecting.
    • A percentile is also printed for the whole structure. If this value is higher than 2.0, the model is regarded as 'favoured'. If it is lower than that but higher than 0.2, the model is 'allowed'. If it is lower still but higher than 0.05, the model is 'generously allowed'. Otherwise the model is 'disallowed', i.e. an outlier.
    • The overall Chi-score is an average indicator of the whole model. With all other conditions being equal, it will be the same for a 100-residue model with 1 outlying CA and a 1000-residue model with 10 outliers.
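
The percentile rule described in the list above can be sketched in the same way. The cut-offs 2.0, 0.2 and 0.05 are those quoted in the text; the function name classify_percentile and the handling of a value exactly equal to a cut-off (assigned to the lower category) are assumptions made for illustration, not part of DipCheck.

```python
# Percentile cut-offs quoted in the text; each model category applies when
# the printed percentile is strictly higher than the cut-off (assumption for
# exact ties, which the text does not specify).
PERCENTILE_CUTOFFS = [
    (2.0, "favoured"),
    (0.2, "allowed"),
    (0.05, "generously allowed"),
]


def classify_percentile(percentile: float) -> str:
    """Map the whole-structure percentile to a model category."""
    for cutoff, category in PERCENTILE_CUTOFFS:
        if percentile > cutoff:
            return category
    return "outlier"


if __name__ == "__main__":
    # Illustrative values only, not taken from a real structure.
    for value in (45.0, 1.0, 0.1, 0.01):
        print(value, classify_percentile(value))
```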