
|
Leading Edge Predictors for Drug Discovery |

CSGenoTox ...Calculation and Prediction |
|
Statistics of the CSGenoTox Predictor |
|
Development of the CSGenoTox Predictor

The CSGenoTox predictor is based on topological structural descriptors (proprietary and published) and was developed using artificial neural networks. Neural network analysis was applied to select descriptors and then to optimize the relationship between experimental mutagenic index values (MI; 0 = negative, 1 = positive) and the values calculated by the CSGenoTox predictor. MI = 1 signifies a mutagen and MI = 0 a non-mutagen, as determined by Ames testing and reported as such. The resulting predictor was cross-validated by the leave-group-out method. External validation was then performed on a large test set of compounds (new chemical entities) that were not used in either descriptor selection or predictor development.

The overall accuracy of MI (AMI) is defined as the number of correctly predicted MI values divided by the total number of experimental MI values, expressed as a percentage.

AMI = (Total MIcorr / Total MIexp) x 100

The specificity of MI (MI0) is defined as the number of correctly predicted non-mutagens divided by the total number of non-mutagens in the dataset, expressed as a percentage.

MI0 = (Total MI(0)corr / Total MI(0)exp) x 100

The sensitivity of MI (MI1) is defined as the number of correctly predicted mutagens divided by the total number of mutagens in the dataset, expressed as a percentage.

MI1 = (Total MI(1)corr / Total MI(1)exp) x 100

The percent false negatives: MI0(false) = 100 - MI1

The percent false positives: MI1(false) = 100 - MI0
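The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not ChemSilico's code: the function name `mi_statistics`, the dictionary keys, and the toy MI values are all assumptions made for the example.

```python
def mi_statistics(experimental, predicted):
    """Return AMI, MI0, MI1 and the two error rates, all as percentages.

    experimental, predicted: equal-length sequences of MI values
    (0 = non-mutagen, 1 = mutagen). Hypothetical helper for illustration.
    """
    if len(experimental) != len(predicted):
        raise ValueError("experimental and predicted must have the same length")
    pairs = list(zip(experimental, predicted))
    n_non = sum(1 for e, _ in pairs if e == 0)  # Total MI(0)exp
    n_mut = sum(1 for e, _ in pairs if e == 1)  # Total MI(1)exp
    if n_non == 0 or n_mut == 0:
        raise ValueError("need at least one mutagen and one non-mutagen")
    corr_non = sum(1 for e, p in pairs if e == 0 and p == 0)  # Total MI(0)corr
    corr_mut = sum(1 for e, p in pairs if e == 1 and p == 1)  # Total MI(1)corr
    ami = 100.0 * (corr_non + corr_mut) / len(pairs)
    mi0 = 100.0 * corr_non / n_non
    mi1 = 100.0 * corr_mut / n_mut
    return {
        "AMI": ami,
        "MI0": mi0,
        "MI1": mi1,
        "MI0_false": 100.0 - mi1,  # percent false negatives
        "MI1_false": 100.0 - mi0,  # percent false positives
    }


# Toy data (invented): 5 mutagens and 5 non-mutagens, one error in each class.
experimental = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
stats = mi_statistics(experimental, predicted)
print(stats)
```

With one mutagen and one non-mutagen misclassified out of five each, every accuracy figure in the toy example is 80% and both error rates are 20%.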
|
CSGenoTox Training and External Validation Sets

An overall dataset of 3363 compounds was split randomly into a 2963-compound training set for predictor development and a 400-compound external validation set used to assess the accuracy of the final model. The training set contained 290 commercial drugs, and the external validation set contained 39. Although the selection was random, the balance of mutagens to non-mutagens was maintained in both the training and external validation sets.
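A random split that preserves the class balance is usually called a stratified split. The sketch below shows one way to do it with the Python standard library. It is not the procedure ChemSilico used, which the page does not describe in detail. The function name `stratified_split`, the seed, and the placeholder labels (a 60/40 mix of mutagens and non-mutagens) are assumptions.

```python
import random


def stratified_split(labels, n_validation, seed=0):
    """Split indices into training and validation sets, keeping class balance.

    labels: sequence of MI values (0 = non-mutagen, 1 = mutagen).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    non_mutagens = [i for i, y in enumerate(labels) if y == 0]
    mutagens = [i for i, y in enumerate(labels) if y == 1]
    rng.shuffle(non_mutagens)
    rng.shuffle(mutagens)
    # Give each class its proportional share of the validation set.
    n_non = round(n_validation * len(non_mutagens) / len(labels))
    n_mut = n_validation - n_non
    validation = sorted(non_mutagens[:n_non] + mutagens[:n_mut])
    held_out = set(validation)
    training = [i for i in range(len(labels)) if i not in held_out]
    return training, validation


# Placeholder labels: 3363 compounds at roughly 60% mutagens (assumed counts).
labels = [1] * 2018 + [0] * 1345
training, validation = stratified_split(labels, n_validation=400)
print(len(training), len(validation))
```

Shuffling within each class before taking a proportional share keeps the selection random while fixing the mutagen to non-mutagen ratio in both sets.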
|
Cross-Validation of CSGenoTox |
|
Predicted Results from CSGenoTox Cross Validation

Cross-validation testing was conducted by setting up a series of 10 cross-validation test sets (VTS), each containing approximately 296 (~10%) of the 2963 compounds. The test sets did not overlap: each compound appeared in exactly one VTS. For each VTS, a new neural network-based QSAR model was developed on the remaining 2667 compounds in the training set and applied to the VTS to predict MI. The process was repeated 10 times, once for each VTS. The 10-fold cross-validation prediction gave the following results:

AMI = 89% (overall accuracy)
MI(0) = 94% (accuracy for non-mutagens)
MI(1) = 86% (accuracy for mutagens)
MI0(false) = 8% (percentage of false negatives)
MI1(false) = 3% (percentage of false positives)
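The partitioning step of this procedure can be sketched as follows. This is only an illustration of how 2963 compounds can be dealt into 10 non-overlapping test sets. The function name `make_validation_test_sets` and the seed are assumptions, and the model-fitting step is left as a comment because the neural network itself is not described here.

```python
import random


def make_validation_test_sets(n_compounds, n_sets=10, seed=0):
    """Deal shuffled compound indices into n_sets non-overlapping test sets."""
    indices = list(range(n_compounds))
    random.Random(seed).shuffle(indices)
    # Taking every n_sets-th index gives sets whose sizes differ by at most one.
    return [indices[k::n_sets] for k in range(n_sets)]


vts_list = make_validation_test_sets(2963)
for vts in vts_list:
    held_out = set(vts)
    training = [i for i in range(2963) if i not in held_out]
    # A new model would be fit on `training` and applied to `vts` here.

sizes = [len(vts) for vts in vts_list]
print(sizes)
```

Because 2963 is not divisible by 10, three of the sets hold 297 compounds and seven hold 296, so each model is trained on 2666 or 2667 compounds.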
|
External Validation of CSGenoTox |
|
Predicted Results from CSGenoTox External Validation

Validation of the CSGenoTox predictor used 400 unique compounds (NCEs, new chemical entities) that were randomly selected from the initial dataset of 3363 compounds and were not used in model building. The average MW of the NCEs was 237, with the preponderance of compounds containing aromatic or heteroaromatic ring systems as well as amine, nitro, epoxy, and amide groups. The validation set included:
• 39 commercial drugs
• 159 non-mutagenic NCEs
• 241 mutagenic NCEs
• 31 miscellaneous compounds from various literature sources
|
(1) RTECS (US Government)
(2) Handbook of Carcinogenic Potency and Genotoxicity Databases, L.S. Gold and E. Zeiger (CRC Press, 1996)
|
The validation set was split 60/40 between mutagens and non-mutagens, the same split as in the overall 3363-compound dataset. It included 39 commercial drugs, four of which were mutagenic (positive Ames test).
|
External validation on 338 compounds gave the following results:

AMI = 84% (overall accuracy)
MI(0) = 87% (accuracy for non-mutagens)
MI(1) = 82% (accuracy for mutagens)
MI0(false) = 11% (percentage of false negatives)
MI1(false) = 5% (percentage of false positives)

These results were excellent, as seen in the chart and ROC results below. The most significant finding is that CSGenoTox gave a low percentage of false positives and false negatives, which is evidence of its robustness for this diverse set of NCEs.

The number of commercial drugs in the validation set was 39. Of the 4 mutagenic drugs, CSGenoTox identified 3 correctly (MI1 = 75%), whereas MI0 = 100% for the non-mutagenic drugs. These results are excellent even though the majority of the entities in the dataset are mutagens.
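The external-validation figures can be cross-checked with a short calculation. Overall accuracy is always the class-weighted average of the two per-class accuracies, so AMI should follow from MI(0), MI(1), and the 60/40 split. The reading of the false-negative and false-positive percentages as shares of all compounds is an assumption made here because it reproduces the reported 11% and 5%. It is not stated on the page. The function name `pooled_figures` is hypothetical.

```python
def pooled_figures(mi0, mi1, mutagen_fraction):
    """Combine per-class accuracies (in percent) using the class mix.

    Returns overall accuracy plus false negatives and false positives,
    each expressed as a percentage of ALL compounds (an assumed reading).
    """
    non_fraction = 1.0 - mutagen_fraction
    ami = mutagen_fraction * mi1 + non_fraction * mi0
    false_neg_of_total = mutagen_fraction * (100.0 - mi1)
    false_pos_of_total = non_fraction * (100.0 - mi0)
    return ami, false_neg_of_total, false_pos_of_total


# External validation: MI(0) = 87%, MI(1) = 82%, 60% mutagens.
ami, fn, fp = pooled_figures(mi0=87.0, mi1=82.0, mutagen_fraction=0.60)
print(round(ami), round(fn), round(fp))  # prints: 84 11 5

# Commercial-drug subset: 3 of the 4 mutagenic drugs identified correctly.
drug_mi1 = 100.0 * 3 / 4
print(drug_mi1)  # prints: 75.0
```

Under these assumptions the rounded values match the reported AMI of 84% and the reported 11% and 5% error figures.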
[Figure: chart of CSGenoTox external validation results]
CSGenoTox Receiver Operating Characteristic (ROC) Curve
|
A ROC (receiver operating characteristic) curve plots the true-positive rate against the false-positive rate as the decision threshold is varied, and so measures how well a predictor separates true positives from false positives. The technique was applied to the 400-compound validation set. The area under the curve for CSGenoTox on these 400 compounds was 0.925, reported with a 95% confidence interval.
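The area under a ROC curve equals the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, which gives a simple way to compute it without drawing the curve. The sketch below uses that pairwise definition. The function name `roc_auc` and the toy scores are invented for illustration, and it assumes the predictor emits a continuous score per compound, which the page does not confirm.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: experimental MI values (0 = non-mutagen, 1 = mutagen).
    scores: predicted scores, higher meaning more likely mutagenic.
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (invented): 4 mutagens and 4 non-mutagens.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(auc)  # prints: 0.875
```

In the toy data, 14 of the 16 mutagen/non-mutagen pairs are ranked correctly, giving an area of 0.875. An area of 0.5 corresponds to random ranking and 1.0 to perfect separation.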
[Figure: ROC curve for CSGenoTox on the 400-compound validation set]
|
CSGenoTox Representative Compounds |
|
Compounds from the CSGenoTox External Validation Test Set Follow the link below to a set of 30 representative compounds of the 400 used in external validation testing of CSGenoTox. Each structure is given along with a comparison of experimental and predicted MI values. |
|
To contact us: |
Phone: 978-501-0633 Fax: 781-275-5197 Email: sales@chemsilico.com |
Copyright © 2003 ChemSilico LLC. All Rights Reserved. ChemSilico is a registered trademark of ChemSilico LLC, Tewksbury, MA 01876.