Pyrimidine Data

The pyrimidine data set was studied by Hirst et al. (1994), who used neural networks and inductive logic programming to model the quantitative structure-activity relationships (QSAR) of the inhibition of dihydrofolate reductase (DHFR) by pyrimidines. QSAR is the process of relating physicochemical and/or structural properties to some known biological or chemical process; that is, activity = f(physicochemical and structural properties). The data set contains structural information on 74 2,4-diamino-5-(substituted benzyl)pyrimidines used as inhibitors of DHFR in E. coli. There are 3 positions where chemical activity occurs and 9 attributes per position, giving 27 predictors in total. Predictor 25 (X25, p3_pi_acceptor) had no variability and was removed from the data set, leaving 26 predictors.
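The removal of a predictor with no variability can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure the original authors used: the helper names constant_columns and drop_columns are hypothetical, and the toy matrix is made up and is not the pyrimidine data.

```python
def constant_columns(rows):
    """Return 0-based indices of columns that take a single value in every row."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols) if len({row[j] for row in rows}) == 1]


def drop_columns(rows, drop):
    """Return a copy of rows without the columns whose indices are in drop."""
    drop = set(drop)
    return [[v for j, v in enumerate(row) if j not in drop] for row in rows]


# Toy data (assumption, NOT the pyrimidine measurements):
# 4 observations, 3 predictors, and the middle predictor never varies.
toy = [
    [0.1, 0.5, 2.0],
    [0.3, 0.5, 1.0],
    [0.2, 0.5, 3.0],
    [0.4, 0.5, 1.5],
]

constant = constant_columns(toy)      # indices of predictors with no variability
reduced = drop_columns(toy, constant)  # predictor matrix without those columns
```

A constant predictor carries no information about the activity and makes a regression design matrix rank deficient when an intercept is present, which is the usual reason for dropping it before modeling.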

Data set, with SAS code to access it.

Below is a description of the variables used in the study. The size of a substituent is the number of carbon, nitrogen, and oxygen atoms that it contains.


Variable  Name             Description
X1        p1_polar         Position 1 polarity
X2        p1_size          Position 1 number of carbon, nitrogen, and oxygen atoms
X3        p1_flex          Position 1 flexibility
X4        p1_h_donor       Position 1 number of hydrogen-bond donors
X5        p1_h_acceptor    Position 1 number of hydrogen-bond acceptors
X6        p1_pi_donor      Position 1 presence and strength of pi-donors
X7        p1_pi_acceptor   Position 1 presence and strength of pi-acceptors
X8        p1_polarizable   Position 1 polarizability
X9        p1_sigma         Position 1 sigma-effect
X10       p2_polar         Position 2 polarity
X11       p2_size          Position 2 number of carbon, nitrogen, and oxygen atoms
X12       p2_flex          Position 2 flexibility
X13       p2_h_donor       Position 2 number of hydrogen-bond donors
X14       p2_h_acceptor    Position 2 number of hydrogen-bond acceptors
X15       p2_pi_donor      Position 2 presence and strength of pi-donors
X16       p2_pi_acceptor   Position 2 presence and strength of pi-acceptors
X17       p2_polarizable   Position 2 polarizability
X18       p2_sigma         Position 2 sigma-effect
X19       p3_polar         Position 3 polarity
X20       p3_size          Position 3 number of carbon, nitrogen, and oxygen atoms
X21       p3_flex          Position 3 flexibility
X22       p3_h_donor       Position 3 number of hydrogen-bond donors
X23       p3_h_acceptor    Position 3 number of hydrogen-bond acceptors
X24       p3_pi_donor      Position 3 presence and strength of pi-donors
X25       p3_pi_acceptor   Position 3 presence and strength of pi-acceptors (no variability; removed from the data set)
X26       p3_polarizable   Position 3 polarizability
X27       p3_sigma         Position 3 sigma-effect
Y         activity         log(1/Ki), where Ki is the experimentally assayed inhibition constant
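
The naming scheme in the table above is regular: variable X(9(p - 1) + k) holds the k-th attribute of position p. The following Python sketch rebuilds the mapping from X labels to names under that reading. The function name predictor_names and the constants are hypothetical helpers introduced for illustration; they are not part of the distributed data set or its SAS code.

```python
POSITIONS = (1, 2, 3)
ATTRIBUTES = (
    "polar", "size", "flex", "h_donor", "h_acceptor",
    "pi_donor", "pi_acceptor", "polarizable", "sigma",
)


def predictor_names():
    """Map 'X1'..'X27' to 'p1_polar'..'p3_sigma' in table order."""
    return {
        f"X{(p - 1) * len(ATTRIBUTES) + k + 1}": f"p{p}_{attr}"
        for p in POSITIONS
        for k, attr in enumerate(ATTRIBUTES)
    }


names = predictor_names()  # all 27 predictors: 3 positions x 9 attributes

# X25 had no variability and was removed, so 26 predictors remain.
used = {label: name for label, name in names.items() if label != "X25"}
```

Generating the labels this way makes it easy to check that a loaded file has the expected columns, for example by comparing its header against the values of used plus the response name activity.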