DipCheck is a validation tool for protein backbone geometry, developed by Joana Pereira and Victor Lamzin, EMBL Hamburg.
The tool uses a Euclidean 3D space (DipSpace) of orthogonal descriptors of the geometry of a 5-atom dipeptide unit.
The DipSpace database contains 1,024,000 data points derived from a selected set of 1,300,000 dipeptide fragments taken from well-refined structures deposited in the PDB.
DipCheck classifies the geometry of the middle CA(i) atom into four categories:
|Favoured region||98.00% of the set, DipScore above 0.243|
|Allowed region||99.80% of the set, DipScore between 0.243 and 0.033|
|Generously allowed region||99.95% of the set, DipScore between 0.033 and 0.010|
|Disallowed region||the remaining 0.05%, DipScore below 0.010|
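The region thresholds above can be expressed as a simple lookup. The following is a minimal sketch, not DipCheck's own code: the function name `classify_dipscore` is hypothetical, and since the table does not say how a score exactly on a boundary is treated, assigning it to the better region is an assumption.

```python
# DipScore lower bounds taken from the table above, best region first.
# Assumption: a score exactly on a boundary falls into the better region.
DIPSCORE_REGIONS = [
    (0.243, "favoured"),
    (0.033, "allowed"),
    (0.010, "generously allowed"),
]


def classify_dipscore(score: float) -> str:
    """Return the region name for a single residue's DipScore."""
    for lower_bound, region in DIPSCORE_REGIONS:
        if score >= lower_bound:
            return region
    # Anything below the last bound is in the disallowed region.
    return "disallowed"
```

For instance, `classify_dipscore(0.1)` falls between 0.243 and 0.033 and so is reported as allowed.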
DipCheck also classifies the overall geometry of a protein model, according to its DipScore distribution, into four categories:
|Favoured model||98.00% of a random set of protein models, Chi-score above -2.15|
|Allowed model||99.80% of a random set of protein models, Chi-score between -2.97 and -2.15|
|Generously allowed model||99.95% of a random set of protein models, Chi-score between -3.38 and -2.97|
|Outlier model||the remaining 0.05%, Chi-score below -3.38|
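The model-level categories follow the same pattern, keyed on the overall Chi-score. This is a sketch under the same assumptions as for the residue regions: `classify_chi_score` is a hypothetical name, and the handling of values exactly on a threshold is not specified in the table.

```python
# Chi-score lower bounds taken from the table above, best category first.
# Assumption: a value exactly on a boundary falls into the better category.
CHI_SCORE_CATEGORIES = [
    (-2.15, "favoured model"),
    (-2.97, "allowed model"),
    (-3.38, "generously allowed model"),
]


def classify_chi_score(chi_score: float) -> str:
    """Return the model category for an overall Chi-score."""
    for lower_bound, category in CHI_SCORE_CATEGORIES:
        if chi_score >= lower_bound:
            return category
    # Below -3.38 the model is treated as an outlier.
    return "outlier model"
```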
The output of the dipcheck version as of 06.05.2016 provides the following:
- The value of the DipScore for each residue and its annotation to the region.
- The number of CA atoms contained in the input file and the number of CA atoms evaluated. Dipeptide units containing atoms with partial occupancies are ignored.
- Summary table of the number of residues in each of the four regions.
- The first four central moments of the DipScore distribution with Z-scores.
- The overall Chi-score. This is the most important 'single-number' result.
- The Chi-score is similar to a conventional Z-score, but its distribution is not Gaussian; it resembles a Rayleigh distribution. In addition, the Chi-score has a sign. A negative sign indicates that the structure is worse than average, while a positive overall Chi-score indicates a structure that is better than average.
- For good structures, an overall Chi-score below -3.38 should statistically occur about once in 2000 structures (0.05%).
- Therefore, a structure with the overall Chi-score below -3.38 can be deemed an outlier and is worth inspecting.
- A percentile is also printed for the whole structure. If this value is higher than 2.0, the model is regarded as 'favoured'. If it is lower but still higher than 0.2, the model is 'allowed'. Below this but higher than 0.05, it is 'generously allowed'. Otherwise it is 'disallowed', i.e. an outlier.
- The overall Chi-score is an average indicator of the whole model. With all other conditions being equal, it will be the same for a 100-residue model with 1 outlying CA and a 1000-residue model with 10 outliers.
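The percentile rule in the list above can be sketched the same way. The function name `classify_percentile` is hypothetical, and the code assumes the percentile is given in percent, as the cut-offs 2.0, 0.2 and 0.05 suggest. It is an illustration of the stated rule, not DipCheck's implementation.

```python
def classify_percentile(percentile: float) -> str:
    """Map the whole-structure percentile (in percent) to a model category."""
    # Cut-offs are the ones quoted in the text: 2.0, 0.2 and 0.05.
    if percentile > 2.0:
        return "favoured"
    if percentile > 0.2:
        return "allowed"
    if percentile > 0.05:
        return "generously allowed"
    # At or below 0.05 the model is 'disallowed', i.e. an outlier.
    return "disallowed"
```

These cut-offs mirror the 98.00%, 99.80% and 99.95% levels used in the model table: a percentile of 0.05 corresponds to the one-in-2000 outlier rate.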