Veterinary Epidemiologic Research: GLM (part 4) – Exact and Conditional Logistic Regressions

Next topic on logistic regression: the exact and the conditional logistic regressions.

Exact logistic regression
When the dataset is very small or severely unbalanced, maximum likelihood estimates of coefficients may be biased. An alternative is to use exact logistic regression, available in R with the elrm package. Its syntax is based on an events/trials formulation. The dataset has to be collapsed into a data frame with unique combinations of predictors.
Another possibility is to use robust standard errors, which give p-values comparable to those obtained with exact logistic regression.

### exact logistic regression
x <- xtabs(~ casecont + interaction(dneo, dclox), data = nocardia)
x
        interaction(dneo, dclox)
casecont 0.0 1.0 0.1 1.1
       0  20  15   9  10
       1   2  44   3   5
> nocardia.coll <- data.frame(dneo = rep(1:0, 2), dclox = rep(1:0, each = 2),
+                    casecont = x[1, ], ntrials = colSums(x))
nocardia.coll
    dneo dclox casecont ntrials
0.0    1     1       20      22
1.0    0     1       15      59
0.1    1     0        9      12
1.1    0     0       10      15

library(elrm)
Loading required package: coda
Loading required package: lattice
mod5 <- elrm(formula = casecont/ntrials ~ dneo,
             interest = ~dneo,
             iter = 100000, dataset = nocardia.coll, burnIn = 2000)

### robust SE
library(robust)
Loading required package: fit.models
Loading required package: MASS
Loading required package: robustbase
Loading required package: rrcov
Loading required package: pcaPP
Loading required package: mvtnorm
Scalable Robust Estimators with High Breakdown Point (version 1.3-02)

mod6 <- glmrob(casecont ~ dcpct + dneo + dclox + dneo*dclox,
+                family = binomial, data = nocardia, method= "Mqle",
+                control= glmrobMqle.control(tcc=1.2))
> summary(mod6)

Call:  glmrob(formula = casecont ~ dcpct + dneo + dclox + dneo * dclox,      family = binomial, data = nocardia, method = "Mqle", control = glmrobMqle.control(tcc = 1.2)) 


Coefficients:
             Estimate Std. Error z-value Pr(>|z|)    
(Intercept) -4.440253   1.239138  -3.583 0.000339 ***
dcpct        0.025947   0.008504   3.051 0.002279 ** 
dneo         3.604941   1.034714   3.484 0.000494 ***
dclox        0.713411   1.193426   0.598 0.549984    
dneo:dclox  -2.935345   1.367212  -2.147 0.031797 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Robustness weights w.r * w.x: 
 89 weights are ~= 1. The remaining 19 ones are summarized as
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1484  0.4979  0.6813  0.6558  0.8764  0.9525 

Number of observations: 108 
Fitted by method ‘Mqle’  (in 5 iterations)

(Dispersion parameter for binomial family taken to be 1)

No deviance values available 
Algorithmic parameters: 
   acc    tcc 
0.0001 1.2000 
maxit 
   50 
test.acc 
  "coef" 

Conditional logistic regression

Matched case-control studies analyzed with an unconditional logistic regression model produce estimates of the odds ratios that are the square of their true values. Instead, we can use conditional logistic regression to analyze matched case-control studies and obtain correct estimates: rather than estimating a parameter for each matched set, a conditional model conditions these fixed effects out of the estimation. It can be run in R with clogit from the survival package:

temp <- tempfile()
> download.file(
+   "http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip", temp)
trying URL 'http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip'
Content type 'application/zip' length 1107873 bytes (1.1 Mb)
opened URL
==================================================
downloaded 1.1 Mb

load(unz(temp, "ver2_data_R/sal_outbrk.rdata"))
unlink(temp)
### Salmonella outbreak dataset

library(survival)
mod7 <- clogit(casecontrol ~ slt_a + strata(match_grp), data = sal_outbrk)
summary(mod7)

Call:
coxph(formula = Surv(rep(1, 112L), casecontrol) ~ slt_a + strata(match_grp), 
    data = sal_outbrk, method = "exact")

  n= 112, number of events= 39 

        coef exp(coef) se(coef)     z Pr(>|z|)   
slt_a 1.4852    4.4159   0.5181 2.867  0.00415 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

      exp(coef) exp(-coef) lower .95 upper .95
slt_a     4.416     0.2265       1.6     12.19

Rsquare= 0.085   (max possible= 0.518 )
Likelihood ratio test= 10  on 1 df,   p=0.001568
Wald test            = 8.22  on 1 df,   p=0.004148
Score (logrank) test = 9.48  on 1 df,   p=0.002075

Veterinary Epidemiologic Research: GLM – Evaluating Logistic Regression Models (part 3)

Third part on logistic regression (first here, second here).
There are two steps in assessing the fit of the model: the first is to determine whether the model fits, using summary measures of goodness of fit or by assessing the predictive ability of the model; the second is to determine whether there are any observations that do not fit the model or that have an undue influence on it.

Covariate pattern
A covariate pattern is a unique combination of values of predictor variables.

mod3 <- glm(casecont ~ dcpct + dneo + dclox + dneo*dclox,
+             family = binomial("logit"), data = nocardia)
summary(mod3)

Call:
glm(formula = casecont ~ dcpct + dneo + dclox + dneo * dclox, 
    family = binomial("logit"), data = nocardia)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.9191  -0.7682   0.1874   0.5876   2.6755  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)      -3.776896   0.993251  -3.803 0.000143 ***
dcpct             0.022618   0.007723   2.928 0.003406 ** 
dneoYes           3.184002   0.837199   3.803 0.000143 ***
dcloxYes          0.445705   1.026026   0.434 0.663999    
dneoYes:dcloxYes -2.551997   1.205075  -2.118 0.034200 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 149.72  on 107  degrees of freedom
Residual deviance: 103.42  on 103  degrees of freedom
AIC: 113.42

Number of Fisher Scoring iterations: 5

library(epiR)
Package epiR 0.9-45 is loaded
Type help(epi.about) for summary information

mod3.mf <- model.frame(mod3)
(mod3.cp <- epi.cp(mod3.mf[-1]))
$cov.pattern
    id  n dcpct dneo dclox
1    1  7     0   No    No
2    2 38   100  Yes    No
3    3  1    25   No    No
4    4  1     1   No    No
5    5 11   100   No   Yes
8    6  1    25  Yes   Yes
10   7  1    14  Yes    No
12   8  4    75  Yes    No
13   9  1    90  Yes   Yes
14  10  1    30   No    No
15  11  3     5  Yes    No
17  12  9   100  Yes   Yes
22  13  2    20  Yes    No
23  14  8   100   No    No
25  15  2    50  Yes   Yes
26  16  1     7   No    No
27  17  4    50  Yes    No
28  18  1    50   No    No
31  19  1    30  Yes    No
34  20  1    99   No    No
35  21  1    99  Yes   Yes
40  22  1    80  Yes   Yes
48  23  1     3  Yes    No
59  24  1     1  Yes    No
77  25  1    10   No    No
84  26  1    83   No   Yes
85  27  1    95  Yes    No
88  28  1    99  Yes    No
89  29  1    25  Yes    No
105 30  1    40  Yes    No

$id
  [1]  1  2  3  4  5  1  1  6  5  7  5  8  9 10 11 11 12  1 12  1  5 13 14  2 15
 [26] 16 17 18  1  2 19  2 14 20 21 12 14  5  8 22 14  5  5  5  1 14 14 23  2 12
 [51] 14 12 11  5 15  2  8  2 24  2  2  8  2 17  2  2  2  2 12 12 12  2  2  2  5
 [76]  2 25  2 17  2  2  2  2 26 27 13 17 28 29 14  2 12  2  2  2  2  2  2  2  2
[101]  2  2  2  2 30  2  2  5

There are 30 covariate patterns in the dataset. The pattern dcpct=100, dneo=Yes, dclox=No appears 38 times.

Pearson and deviance residuals
Residuals represent the difference between the data and the model. The Pearson residuals are comparable to standardized residuals used for linear regression models. Deviance residuals represent the contribution of each observation to the overall deviance.

residuals(mod3) # deviance residuals
residuals(mod3, "pearson") # pearson residuals
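
If you want the standardized versions (residuals scaled by their estimated standard deviations, taking leverage into account), base R's rstandard() also works on glm objects; a quick sketch, assuming mod3 fitted above:

rstandard(mod3) # standardized deviance residuals
rstandard(mod3, type = "pearson") # standardized Pearson residuals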

Goodness-of-fit test
All goodness-of-fit tests are based on the premise that the data will be divided into subsets and within each subset the predicted number of outcomes will be computed and compared to the observed number of outcomes. The Pearson \chi^2 and the deviance \chi^2 are based on dividing the data up into the natural covariate patterns. The Hosmer-Lemeshow test is based on a more arbitrary division of the data.

The Pearson \chi^2 is similar to the residual sum of squares used in linear models. It will be close in size to the deviance, but the model is fit to minimize the deviance and not the Pearson \chi^2 . It is thus possible, even if unlikely, that the Pearson \chi^2 could increase as a predictor is added to the model.

sum(residuals(mod3, type = "pearson")^2)
[1] 123.9656
deviance(mod3)
[1] 103.4168
1 - pchisq(deviance(mod3), df.residual(mod3))
[1] 0.4699251
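
The same comparison can be made with the Pearson statistic computed above, bearing in mind the caveat below about the \chi^2 approximation; a minimal sketch:

X2 <- sum(residuals(mod3, type = "pearson")^2)
1 - pchisq(X2, df.residual(mod3)) # Pearson chi-square goodness-of-fit p-value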

The deviance-based p-value is large, indicating no evidence of lack of fit. However, when using the deviance statistic to assess the goodness of fit of a non-saturated logistic model, the \chi^2 approximation for the likelihood ratio test is questionable: when the number of covariate patterns is almost as large as N, the deviance cannot be assumed to have a \chi^2 distribution.
Now the Hosmer-Lemeshow test, which usually divides the data into 10 groups:

hosmerlem <- function (y, yhat, g = 10) {
+   cutyhat <- cut(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)),
+                  include.lowest = TRUE)
+   obs <- xtabs(cbind(1 - y, y) ~ cutyhat)
+   expect <- xtabs(cbind(1 - yhat, yhat) ~ cutyhat)
+   chisq <- sum((obs - expect)^2 / expect)
+   P <- 1 - pchisq(chisq, g - 2)
+   c("X^2" = chisq, Df = g - 2, "P(>Chi)" = P)
+ }
hosmerlem(y = nocardia$casecont, yhat = fitted(mod3))
Error in cut.default(yhat, breaks = quantile(yhat, probs = seq(0, 1, 1/g)),  (from #2) :
  'breaks' are not unique

The model used has many ties in its predicted probabilities (too few distinct covariate values?), resulting in an error when running the Hosmer-Lemeshow test. Using fewer cut-points (g = 5 or 7) does not solve the problem. This is a typical example of when not to use this test. A better goodness-of-fit test than the Hosmer-Lemeshow and Pearson / deviance \chi^2 tests is the le Cessie – van Houwelingen – Copas – Hosmer unweighted sum of squares test for global goodness of fit (also here), implemented in the rms package (but you have to fit your model with the lrm function of that package):

library(rms)
mod3b <- lrm(casecont ~ dcpct + dneo + dclox + dneo*dclox, nocardia,
+              method = "lrm.fit", model = TRUE, x = TRUE, y = TRUE,
+              linear.predictors = TRUE, se.fit = FALSE)
residuals(mod3b, type = "gof")
Sum of squared errors     Expected value|H0                    SD 
           16.4288056            16.8235055             0.2775694 
                    Z                     P 
           -1.4219860             0.1550303 

The p-value is 0.16 so there’s no evidence the model is incorrect. Even better than these tests would be to check for linearity of the predictors.

Overdispersion
Sometimes we can get a deviance that is much larger than expected if the model were correct. This can be due to the presence of outliers, sparse data, or clustering of the data. One approach to deal with overdispersion is to add a dispersion parameter \sigma^2 , which can be estimated as \hat{\sigma}^2 = \frac{\chi^2}{n - p} (where \chi^2 is the Pearson statistic, n the number of observations and p the number of parameters). A half-normal plot of the residuals can help check for outliers:

library(faraway)
halfnorm(residuals(mod1))
Half-normal plot of the residuals

The dispersion parameter of model 1 can be found as:

(sigma2 <- sum(residuals(mod1, type = "pearson")^2) / 104) # 104 = residual degrees of freedom
[1] 1.128778
drop1(mod1, scale = sigma2, test = "F")
Single term deletions

Model:
casecont ~ dcpct + dneo + dclox

scale:  1.128778 

       Df Deviance    AIC F value    Pr(>F)    
<none>      107.99 115.99                      
dcpct   1   119.34 124.05 10.9350  0.001296 ** 
dneo    1   125.86 129.82 17.2166 6.834e-05 ***
dclox   1   114.73 119.96  6.4931  0.012291 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Warning message:
In drop1.glm(mod1, scale = sigma2, test = "F") :
  F test assumes 'quasibinomial' family

The dispersion parameter is not very different from one (no overdispersion). If overdispersion were present, you could account for it in the F-tests for the predictors by passing the estimate to the scale argument of drop1.
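
An equivalent route is to let the model estimate the dispersion itself by refitting with a quasibinomial family, which scales the standard errors accordingly; a minimal sketch, using the same predictors as mod1:

mod1.quasi <- glm(casecont ~ dcpct + dneo + dclox, family = quasibinomial,
                  data = nocardia)
summary(mod1.quasi)$dispersion # estimated dispersion parameter
drop1(mod1.quasi, test = "F") # F-tests using the estimated dispersion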

Predictive ability of the model
A ROC curve can be drawn:

predicted <- predict(mod3)
library(ROCR)
library(ggplot2) # needed for the plot below
prob <- prediction(predicted, nocardia$casecont, 
+                    label.ordering = c('Control', 'Case'))
tprfpr <- performance(prob, "tpr", "fpr")
tpr <- unlist(slot(tprfpr, "y.values"))
fpr <- unlist(slot(tprfpr, "x.values"))
roc <- data.frame(tpr, fpr)
ggplot(roc) + geom_line(aes(x = fpr, y = tpr)) + 
+   geom_abline(intercept = 0, slope = 1, colour = "gray") + 
+   ylab("Sensitivity") + 
+   xlab("1 - Specificity")
ROC curve
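
The area under the curve can be extracted from the same ROCR prediction object; a short sketch, assuming prob from above:

auc <- performance(prob, "auc")
unlist(slot(auc, "y.values")) # area under the ROC curve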

Identifying important observations
As with linear regression, large positive or negative standardized residuals identify points that are not well fitted by the model. A plot of Pearson residuals as a function of the logit for model 1 is drawn below, with bubble sizes proportional to the size of the covariate pattern. The plot should show a horizontal band with observations between -3 and +3. Covariate patterns 25 and 26 are problematic.

nocardia$casecont.num <- as.numeric(nocardia$casecont) - 1
mod1 <- glm(casecont.num ~ dcpct + dneo + dclox, family = binomial("logit"),
+             data = nocardia) # "logit" can be omitted as it is the default
mod1.mf <- model.frame(mod1)
mod1.cp <- epi.cp(mod1.mf[-1])
nocardia.cp <- as.data.frame(cbind(cpid = mod1.cp$id,
+                                    nocardia[ , c(1, 9:11, 13)],
+                                    fit = fitted(mod1)))
### Residuals and delta betas based on covariate pattern:
mod1.obs <- as.vector(by(as.numeric(nocardia.cp$casecont.num),
+                          as.factor(nocardia.cp$cpid), FUN = sum))
mod1.fit <- as.vector(by(nocardia.cp$fit, as.factor(nocardia.cp$cpid),
+                          FUN = min))
mod1.res <- epi.cpresids(obs = mod1.obs, fit = mod1.fit,
+                          covpattern = mod1.cp)

mod1.lodds <- as.vector(by(predict(mod1), as.factor(nocardia.cp$cpid),
+                            FUN = min))

plot(mod1.lodds, mod1.res$spearson,
+      type = "n", ylab = "Pearson Residuals", xlab = "Logit")
text(mod1.lodds, mod1.res$spearson, labels = mod1.res$cpid, cex = 0.8)
symbols(mod1.lodds, mod1.res$pearson, circles = mod1.res$n, add = TRUE)
Bubble plot of standardized residuals

The hat matrix is used to calculate leverage values and other diagnostic parameters. Leverage measures the potential impact of an observation; points with high leverage have a large potential impact on the fit. Covariate patterns 2, 14, 12 and 5 have the largest leverage values.

mod1.res[sort.list(mod1.res$leverage, decreasing = TRUE), ]
cpid  leverage
2   0.74708052
14  0.54693851
12  0.54017700
5   0.42682684
11  0.21749664
1   0.19129427
...

The delta-beta provides an overall estimate of the effect of the j^{th} covariate pattern on the regression coefficients; it is analogous to Cook’s distance in linear regression. Covariate pattern 2 has the largest delta-beta (and represents 38 observations).

mod1.res[sort.list(mod1.res$sdeltabeta, decreasing = TRUE), ]
cpid sdeltabeta
2    7.890878470
14   3.983840529
...
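
For an observation-level (rather than covariate-pattern-level) look at the same idea, base R's influence measures can be applied directly to the glm object; a quick sketch:

head(dfbeta(mod1)) # change in each coefficient when observation i is deleted
head(dfbetas(mod1)) # same, standardized by the coefficient standard errors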

Veterinary Epidemiologic Research: GLM – Logistic Regression (part 2)

Second part on logistic regression (first one here).
In the previous post we used a likelihood ratio test to compare a full and a null model. The same can be done to compare a full and a nested model, to test the contribution of any subset of parameters:

mod1.nest <- glm(casecont ~ dcpct, family = binomial("logit"),
+             data = nocardia)
lr.mod1.nest <- -(deviance(mod1.nest) / 2)
(lr <- 2 * (lr.mod1 - lr.mod1.nest))
[1] 30.16203
1 - pchisq(lr, 2)
[1] 2.820974e-07
### or, more straightforward, using anova to compare the 2 models:
anova(mod1, mod1.nest, test = "Chisq")
Analysis of Deviance Table

Model 1: casecont ~ dcpct + dneo + dclox
Model 2: casecont ~ dcpct
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
1       104     107.99                          
2       106     138.15 -2  -30.162 2.821e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Interpretation of coefficients

### example 16.2
nocardia$dbarn <- as.factor(nocardia$dbarn)
mod2 <- glm(casecont ~ dcpct + dneo + dclox + dbarn,
+               family = binomial("logit"), data = nocardia)
(mod2.sum <- summary(mod2))

Call:
glm(formula = casecont ~ dcpct + dneo + dclox + dbarn, family = binomial("logit"), 
    data = nocardia)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6949  -0.7853   0.1021   0.7692   2.6801  

Coefficients:
               Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -2.445696   0.854328  -2.863  0.00420 ** 
dcpct          0.021604   0.007657   2.821  0.00478 ** 
dneoYes        2.685280   0.677273   3.965 7.34e-05 ***
dcloxYes      -1.235266   0.580976  -2.126  0.03349 *  
dbarntiestall -1.333732   0.631925  -2.111  0.03481 *  
dbarnother    -0.218350   1.154293  -0.189  0.84996    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 149.72  on 107  degrees of freedom
Residual deviance: 102.32  on 102  degrees of freedom
AIC: 114.32

Number of Fisher Scoring iterations: 5

cbind(exp(coef(mod2)), exp(confint(mod2)))
Waiting for profiling to be done...
                               2.5 %     97.5 %
(Intercept)    0.08666577 0.01410982  0.4105607
dcpct          1.02183934 1.00731552  1.0383941
dneoYes       14.66230075 4.33039752 64.5869271
dcloxYes       0.29075729 0.08934565  0.8889877
dbarntiestall  0.26349219 0.06729031  0.8468235
dbarnother     0.80384385 0.08168466  8.2851189

Note: Dohoo et al. do not report the profile likelihood-based confidence intervals on page 404. As said in the previous post, the profile likelihood-based CI is preferable because of the Hauck-Donner effect (overestimation of the SE).

Using neomycin in the herd increases the log odds of Nocardia mastitis by 2.7 units (or, equivalently, using neomycin multiplies the odds by 14.7). Since Nocardia mastitis is a rare condition, we can interpret the odds ratio as a risk ratio (RR) and say neomycin increases the risk of Nocardia mastitis by about 15 times. Changing the percentage of dry cows treated from 50 to 75% increases the log odds of disease by (75 - 50) \times 0.022 = 0.55 units (or 1.022^{(75-50)} = 1.72 ). An increase of 25 percentage points in the proportion of cows dry-treated increases the risk of disease by about 72% (i.e. multiplies it by 1.72). Tiestall barns and other barn types both have lower risks of Nocardia mastitis than freestall barns.
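
These numbers can be reproduced directly from the fitted model; a small sketch, assuming mod2 from above:

exp(coef(mod2)["dneoYes"]) # odds ratio for neomycin use
exp(25 * coef(mod2)["dcpct"]) # odds ratio for a 25-point increase in dcpct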

The significance of the main effects can be tested with:

drop1(mod2, test = "Chisq")
Single term deletions

Model:
casecont ~ dcpct + dneo + dclox + as.factor(dbarn)
                 Df Deviance    AIC     LRT  Pr(>Chi)    
<none>                102.32 114.32                      
dcpct             1   111.36 121.36  9.0388  0.002643 ** 
dneo              1   123.91 133.91 21.5939 3.369e-06 ***
dclox             1   107.02 117.02  4.7021  0.030126 *  
as.factor(dbarn)  2   107.99 115.99  5.6707  0.058697 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

The drop1 function tests the effect of dropping each predictor in turn from the full model.

Presenting effects of factors on the probability scale
Usually we think in terms of the probability of disease rather than the odds, but the probability of disease is not linearly related to the factor of interest: a unit increase in the factor does not increase the probability of disease by a fixed amount. The change depends on the level of the factor and on the levels of the other factors in the model.

mod1 <- glm(casecont ~ dcpct + dneo + dclox, family = binomial("logit"),
+             data = nocardia)
nocardia$neo.pred <- predict(mod1, type = "response", se.fit = FALSE)
library(ggplot2)
ggplot(nocardia, aes(x = dcpct, y = neo.pred, group = dneo,
+                      colour = factor(dneo))) + 
+   stat_smooth(method = "glm", family = "binomial", se = FALSE) +
+   labs(colour = "Neomycin", x = "Percent of cows dry treated",
+        y = "Probability of Nocardia")
Effect of dry-cow treatment
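
To see this dependence numerically, predicted probabilities can be computed at chosen covariate values with predict(); a sketch with hypothetical values of dcpct:

new.data <- data.frame(dcpct = c(50, 75, 50, 75),
                       dneo = c("Yes", "Yes", "No", "No"),
                       dclox = "No")
cbind(new.data, p = predict(mod1, newdata = new.data, type = "response"))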

Next: evaluating logistic regression models.

Veterinary Epidemiologic Research: GLM – Logistic Regression

We continue to explore the book Veterinary Epidemiologic Research and today we’ll have a look at generalized linear models (GLM), specifically the logistic regression (chapter 16). In veterinary epidemiology, often the outcome is dichotomous (yes/no), representing the presence or absence of disease or mortality. We code 1 for the presence of the outcome and 0 for its absence. This way, the mean of the outcome represents the proportion of subjects with the event, and its expectation \pi_i represents the probability for the event to occur. By definition, the expectation must lie in the range 0 to 1. We can’t use a linear regression to analyse these data because:

  • the error terms (\epsilon ) from this model will not be normally distributed. They can only take two values,
  • the probability of the outcome occurring, \pi = p(Y = 1) , follows a binomial distribution and depends on the value of the predictor variables, X . Since the variance of a binomial distribution is a function of the probability, the error variance will also vary with the level of X and as such the assumption of homoscedasticity will be violated,
  • the mean responses are constrained to lie within 0 and 1. With a linear regression model, the predicted values might fall outside of this range.

One way of getting around the problems described above is to use a suitable transformation, the logistic or logit function of \pi and model this as a linear function of a set of predictor variables. This leads to the model:

logit(\pi) = ln(\frac{\pi}{1 - \pi}) = \beta_0 + \beta_1x_1 + \ldots + \beta_jx_j

curve(log(x / (1 - x)), 0, 1)
Logit

The logit transform is ln(\frac{\pi}{1 - \pi}) . The logit of a probability is the log of the odds of the response taking the value 1. The above equation can then be rewritten as

\pi(x_1, x_2, \ldots, x_j) = \frac{exp(\beta_0 + \beta_1x_1 + \ldots + \beta_jx_j)}{1 + exp(\beta_0 + \beta_1x_1 + \ldots + \beta_jx_j)} .

While the logit function can take any real value, the associated probability always lies within the bounds of 0 and 1. Within a logistic regression model, the parameter \beta_j associated with the explanatory variable x_j is such that exp(\beta_j) is the ratio by which the odds of the response taking the value 1 are multiplied when x_j increases by 1, conditional on the other explanatory variables remaining constant.
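
In R the logit and its inverse are available as qlogis() and plogis(), which makes it easy to check that any value on the linear-predictor scale maps back into (0, 1); a small sketch:

qlogis(0.75) # logit of a probability: log(0.75 / 0.25)
plogis(c(-5, 0, 5)) # inverse logit: always between 0 and 1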

As with linear regression, the logistic regression model involves a linear combination of explanatory variables, even if the binary response is not modelled directly but through a logistic transformation. Expressing the effect of the predictors on the response through a linear combination can be extended to models for other types of response than continuous or binary, and defines the wider class of generalized linear models described by Nelder and Wedderburn. GLMs have 3 main features:

  • an error distribution giving the distribution of the response around its mean. For ANOVA and linear regression it is the normal distribution, for logistic regression it is the binomial. These distributions come from the same exponential family of probability distributions,
  • a link function: how the linear function of the explanatory variables is related to the expected value of the response. For logistic regression it is the logit function (but it can also be the probit, \Phi^{-1}(p) , where \Phi^{-1} is the inverse normal cumulative distribution function, or the complementary log-log, ln(-ln(1 - p)) ); for analysis of variance and regression it is the identity function,
  • the variance function: how the variance of the response variable depends on the mean (see the short sketch below).
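
In R these three ingredients are bundled in a family object; a minimal sketch inspecting the binomial family:

fam <- binomial("logit")
fam$linkfun(0.75) # the link: logit of a probability
fam$linkinv(1.0986) # its inverse: back to the probability scale
fam$variance(0.75) # the variance function: p * (1 - p)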

The first example is from the Nocardia dataset: a case-control study of Nocardia spp. mastitis, from 54 case herds and 54 control herds, with the predictors related to the management of the cows during the dry period.

temp <- tempfile()
download.file(
"http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip", 
temp)
load(unz(temp, "ver2_data_R/nocardia.rdata"))
unlink(temp)

library(Hmisc)
nocardia <- upData(nocardia, 
            labels = c(id = 'Herd identification number',
                       casecont = 'Case/control status of herd',
                       numcow = 'Number of cows milked',
                  prod = 'Average milk production of the herd',
bscc = 'Average bulk tank SCC over the first 6 months of 1988',
                       dbarn = 'Type of barn dry cows kept in',
                  dout = 'Type of outdoor area used for dry cows',
dcprep = 'Method of teat end preparation prior to dry cow therapy administration',
dcpct = 'Percent of dry cows treated with dry-cow therapy',
dneo = 'Dry-cow product containing neomycin used on farm in last year',
dclox = 'Dry cow product containing cloxacillin used on farm in last year',
doth = 'Other dry cow products used on farm in last year'),
      levels = list(casecont = list('Control' = 0, 'Case' = 1),
      dbarn = list('freestall' = 1, 'tiestall' = 2, 'other' = 3),
dout = list('pasture' = 1, 'yard/drylot' = 2, 'none' = 3, 'other' = 4),
dcprep = list('no prep.' = 1, 'washed only' = 2, 'washed and disinfected' = 3, 'dry cow therapy not used' = 4),
       dneo = list('No' = 0, 'Yes' = 1), 
       dclox = list('No' = 0, 'Yes' = 1)),
                 units = c(prod = "kg/cow/day", 
                           bscc = "'000s of cells/ml",
                           dcpct = "%"))

Example 16.1 is a logistic regression with 3 predictor variables. We can first have a look at a conditional density plot of the response variable given the proportion of dry cows treated with dry cow therapy.

cdplot(as.factor(casecont) ~ dcpct, data = nocardia)
Conditional density plot of case/control status of the herd given proportion of cows receiving dry cow therapy

The logistic regression model is:

mod1 <- glm(casecont ~ dcpct + dneo + dclox, 
             family = binomial("logit"),
             data = nocardia) # "logit" can be omitted as it is the default
(mod1.sum <- summary(mod1))

Call:
glm(formula = casecont ~ dcpct + dneo + dclox, family = binomial("logit"), 
    data = nocardia)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.8425  -0.8915   0.1610   0.6361   2.3745  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -2.984272   0.772196  -3.865 0.000111 ***
dcpct        0.022668   0.007187   3.154 0.001610 ** 
dneoYes      2.212564   0.578012   3.828 0.000129 ***
dcloxYes    -1.412516   0.557197  -2.535 0.011243 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 149.72  on 107  degrees of freedom
Residual deviance: 107.99  on 104  degrees of freedom
AIC: 115.99

Number of Fisher Scoring iterations: 4

c(0.022668-1.96*0.007187, 0.022668+1.96*0.007187)
[1] 0.00858148 0.03675452
### or can also do:
confint.default(mod1, parm = "dcpct")
           2.5 %     97.5 %
dcpct 0.00858227 0.03675422

All variables are significant at the 5% level. An increase of one unit in the percentage of cows receiving dry cow therapy increases the log odds of being a case herd by 0.02. The confidence interval can be constructed using a normal approximation for the parameter estimate: \hat{\beta}_i \pm z_{\alpha/2}SE(\hat{\beta}_i) . The 95% confidence interval is (0.009, 0.037). But it is better to construct a profile likelihood-based confidence interval (from the MASS package):

confint(mod1, parm = "dcpct")
Waiting for profiling to be done...
      2.5 %      97.5 % 
0.009211164 0.037694948 

We can get the odds ratios by exponentiating the estimates:

cbind(exp(coef(mod1)), exp(confint(mod1)))
                             2.5 %     97.5 %
(Intercept) 0.05057629 0.009587881  0.2007101
dcpct       1.02292712 1.009253717  1.0384144
dneoYes     9.13911806 3.139924972 31.3596110
dcloxYes    0.24352979 0.078283239  0.7094844

The statistic to determine the overall significance of a logistic model is the likelihood ratio test. It compares the likelihood of the full model (with all the predictors included) with the likelihood of the null model (which contains only the intercept). It is analogous to the overall F-test in linear regression. The likelihood ratio test statistic is: G_0^2 = 2ln\frac{L}{L_0} where L is the likelihood of the full model and L_0 is the likelihood of the null model. The likelihood ratio test statistic (G_0^2 = 41.73) can be compared to a \chi^2 distribution with 3 degrees of freedom. The predictor variables are significant predictors of case-control status.

mod1.null <- glm(casecont ~ 1, family = binomial, data = nocardia)
lr.mod1 <- -(deviance(mod1) / 2)
lr.mod1.null <- -(deviance(mod1.null) / 2)
(lr <- 2 * (lr.mod1 - lr.mod1.null))
[1] 41.73248
1 - pchisq(lr, 3) # df = 3, the number of predictors tested

We can also compare the likelihood of the model under investigation to the likelihood of a fully saturated model (one in which there would be one parameter fitted for each data point):

pchisq(deviance(mod1), df.residual(mod1), lower = FALSE)
[1] 0.3748198

The p-value is over 0.05 and we can conclude this model fits sufficiently well.

Finally, we can evaluate a single coefficient or a set of coefficients with a Wald test. If we add barn type to the model below, we can test for an overall effect of barn type: in wald.test, b gives the coefficients, Sigma the variance-covariance matrix of the coefficient estimates, and Terms indicates which terms in the model to test (in the order they appear in the table of coefficients). The overall effect of barn type is significant. We can also test the 'other' barn type against tie-stall barns:

mod1b <- glm(casecont ~ dcpct + dneo + dclox + as.factor(dbarn),
           family = binomial, data = nocardia)
summary(mod1b)

Call:
glm(formula = casecont ~ dcpct + dneo + dclox + as.factor(dbarn), 
    family = binomial, data = nocardia)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6949  -0.7853   0.1021   0.7692   2.6801  

Coefficients:
                          Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -2.445696   0.854328  -2.863  0.00420 ** 
dcpct                     0.021604   0.007657   2.821  0.00478 ** 
dneoYes                   2.685280   0.677273   3.965 7.34e-05 ***
dcloxYes                 -1.235266   0.580976  -2.126  0.03349 *  
as.factor(dbarn)tiestall -1.333732   0.631925  -2.111  0.03481 *  
as.factor(dbarn)other    -0.218350   1.154293  -0.189  0.84996    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 149.72  on 107  degrees of freedom
Residual deviance: 102.32  on 102  degrees of freedom
AIC: 114.32

Number of Fisher Scoring iterations: 5

library(aod)
wald.test(b = coef(mod1b), Sigma = vcov(mod1b), Terms = 4:5)
Wald test:
----------

Chi-squared test:
X2 = 10.2, df = 2, P(> X2) = 0.0062

comp <- cbind(0, 0, 0, 0, 1, -1)
wald.test(b = coef(mod1b), Sigma = vcov(mod1b), L = comp)
Wald test:
----------

Chi-squared test:
X2 = 1.1, df = 1, P(> X2) = 0.29

That’s it for now…

Turning an old laptop into an educational computer

The idea is to give a child a suitable computing environment with a set of educational software. The Edubuntu project seemed perfect, especially since the Quebec project EduLinux, developed at the Université de Sherbrooke, has stopped and merged into Edubuntu. The problem: the salvaged machine is an old laptop from circa 2004-2005, a Dell Inspiron 2200 with a 1.6 GHz Intel Pentium M processor and 256 MB of RAM.

Edubuntu is based on Ubuntu 12.04 and cannot be installed on this laptop, whose CPU does not support PAE. So I turned to a lightweight Linux distribution compatible with my CPU: Linux Mint Maya with the Xfce desktop environment.

After burning the distro's ISO image to a CD and booting the laptop from it, the installation would not start and I had to use the compatibility mode to install. Once the installation finished and the computer rebooted, the boot process only led to a black screen. It seems to be a problem with the laptop's old graphics card. To avoid further problems, when you reach the window with the different startup options, pressing the Tab key shows the boot options. Several lines then appear. Add "nomodeset" at the end of the line containing "quiet splash". Don't forget to make this change to the boot parameters permanent. From the command line:

sudo gedit /etc/default/grub

and change the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="quiet splash nomodeset". Save the file, then:

sudo update-grub

At this point everything runs remarkably well for an old computer (I even find it snappier than when it was new running Windows). But one more problem: no wi-fi. Easy: it is because the card is not supported out of the box (a Broadcom BCM4318). Just install the b43 firmware:

sudo apt-get install firmware-b43-installer

And voilà! Now everything works. All that's left is to install the educational software: GCompris, Childsplay, TuxMath, Tuxpaint, Etoys, gBrainy. Don't forget to add parental controls (Nanny) and even install the Firefox extensions ProCon Latte and Public Fox.

Surprising results with this "old machine"!

Veterinary Epidemiologic Research: Linear Regression Part 3 – Box-Cox and Matrix Representation

In the previous post, I forgot to show an example of a Box-Cox transformation when there is a lack of normality. The Box-Cox procedure computes the value of \lambda which best “normalises” the errors.

\lambda value   Transformed value of Y
 2              Y^2
 1              Y
 0.5            \sqrt{Y}
 0              \ln{Y}
-0.5            \frac{1}{\sqrt{Y}}
-1              \frac{1}{Y}
-2              \frac{1}{Y^2}

For example:

lm.wpc2 <- lm(wpc ~ vag_disch + milk120 + milk120.sq, data = daisy3)
library(MASS)
boxcox(lm.wpc2, plotit = TRUE)

boxcox

The plot indicates a log transformation.
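
The numerically optimal \lambda can also be pulled out of the boxcox object rather than read off the plot; a short sketch:

bc <- boxcox(lm.wpc2, plotit = FALSE)
bc$x[which.max(bc$y)] # lambda maximizing the profile log-likelihood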

Matrix Representation
We can use a matrix representation of the regression equation. The regression equation y_i = \beta_0 + \beta_{1}x_{1i} + \beta_{2}x_{2i} + \epsilon_i, i = 1, \dots, n can be written as y = X\beta + \epsilon where y = (y_1 \dots y_n)^T , \epsilon = (\epsilon_1 \dots \epsilon_n)^T , \beta = (\beta_0, \dots, \beta_2)^T and
X = \begin{pmatrix}  1 & x_{11} & x_{12} \\  1 & x_{21} & x_{22} \\  \vdots & \vdots & \vdots \\  1 & x_{n1} & x_{n2}  \end{pmatrix}  .

With the following regression:

lm.wpc3 <- lm(wpc ~ milk120 + hs100_ct, data = daisy3)
(lm.wpc3.sum <- summary(lm.wpc3))

Call:
lm(formula = wpc ~ milk120 + hs100_ct, data = daisy3)

Residuals:
Interval from wait period to conception 
   Min     1Q Median     3Q    Max 
-71.83 -36.57 -15.90  23.86 211.18 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 75.884063   6.115894  12.408   <2e-16 ***
milk120     -0.001948   0.001858  -1.049    0.294    
hs100_ct    18.530340   2.106920   8.795   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 50.53 on 1522 degrees of freedom
Multiple R-squared: 0.0495,	Adjusted R-squared: 0.04825 
F-statistic: 39.63 on 2 and 1522 DF,  p-value: < 2.2e-16 

anova(lm.wpc3)
Analysis of Variance Table

Response: wpc
            Df  Sum Sq Mean Sq F value Pr(>F)    
milk120      1    4865    4865  1.9053 0.1677    
hs100_ct     1  197512  197512 77.3518 <2e-16 ***
Residuals 1522 3886309    2553                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

We start by making the X matrix and response y :

X <- cbind(1 , daisy3[, c(8, 24)])
# or alternatively: model.matrix(lm.wpc3)
y <- daisy3$wpc

Then we get the X^{T}X :

XX <- t(X) %*% X
# or alternatively: crossprod(X, X)
XX
                     1       milk120      hs100_ct
1           1525.00000     4907834.4     -29.87473
milk120  4907834.39954 16535517546.0 -120696.45468
hs100_ct     -29.87473     -120696.5     576.60900

We calculate the inverse of X^{T}X :

XX.inv <- solve(t(X) %*% X)
# or alternatively: qr.solve(XX)
# or also: lm.wpc3.sum$cov.unscaled
XX.inv
                     1       milk120      hs100_ct
1         1.464864e-02 -4.348902e-06 -1.513557e-04
milk120  -4.348902e-06  1.351675e-09  5.761289e-08
hs100_ct -1.513557e-04  5.761289e-08  1.738495e-03

Now to get the coefficients estimates:

XX.inv %*% t(X) %*% y
# or alternatively: solve(t(X) %*% X, t(X) %*% y)
# or also: XX.inv %*% crossprod(X, y)
# or: qr.solve(X, y)
                 [,1]
1        75.884062540
milk120  -0.001948437
hs100_ct 18.530340385

The fitted values:

yhat <- crossprod(t(X), qr.solve(X, y)) # equivalent to X %*% beta.hat

SSE and MSE:

SSE <- crossprod(y, y) - crossprod(y, yhat)
SSE
        [,1]
[1,] 3886309
 
MSE <- SSE / (length(y) - 3)
MSE
         [,1]
[1,] 2553.423

The residual standard error:

sqrt(sum(lm.wpc3$res^2) / (1525 - 3))

The standard errors of the coefficients and the R^2 :

sqrt(diag(XX.inv)) * 50.5314 # 50.5314 = sqrt(MSE), the residual standard error
          1     milk120    hs100_ct 
6.115893490 0.001857794 2.106920139

1 - sum(lm.wpc3$res^2) / sum((y - mean(y))^2)
[1] 0.04949679
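
As a check, the same quantities are available directly from the fitted lm object; a quick sketch comparing the two approaches:

coef(lm.wpc3) # coefficients, as computed with XX.inv %*% t(X) %*% y
sqrt(diag(vcov(lm.wpc3))) # standard errors of the coefficients
summary(lm.wpc3)$r.squared # R-squared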

Veterinary Epidemiologic Research: Linear Regression Part 2 – Checking assumptions

We continue with the linear regression chapter of the book Veterinary Epidemiologic Research.

Using same data as last post and running example 14.12:

tmp <- tempfile()
download.file("http://ic.upei.ca/ver/sites/ic.upei.ca.ver/files/ver2_data_R.zip", tmp) # fetch the file into the temporary file
load(unz(tmp, "ver2_data_R/daisy2.rdata")) # extract the target file from the temporary file
unlink(tmp) # remove the temporary file

### some data management
daisy2 <- daisy2[daisy2$h7 == 1, ] # we only use a subset of the data
library(Hmisc)
daisy2 <- upData(daisy2, labels = c(region = 'Region', herd = 'Herd number',
                           cow = 'Cow number',
                           study_lact = 'Study lactation number',
                           herd_size = 'Herd size',
                           mwp = "Minimum wait period for herd",
                           parity = 'Lactation number',
                  milk120 = 'Milk volume in first 120 days of lactation',
                           calv_dt = 'Calving date',
                           cf = 'Calving to first service interval',
                           fs = 'Conception at first service',
                           cc = 'Calving to conception interval',
                           wpc = 'Interval from wait period to conception',
                           spc = 'Services to conception', twin = 'Twins born',
                           dyst = 'Dystocia at calving',
                           rp = 'Retained placenta at calving',
                           vag_disch = 'Vaginal discharge observed',
                           h7 = 'Indicator for 7 herd subset'),
                 levels = list(fs = list('No' = 0, 'Yes' = 1),
                   twin = list('No' = 0, 'Yes' = 1),
                   dyst = list('No' = 0, 'Yes' = 1),
                   rp = list('No' = 0, 'Yes' = 1),
                   vag_disch = list('No' = 0, 'Yes' = 1)),
                 units = c(milk120 = "litres"))

library(lubridate)
daisy2$date <- as.character(daisy2$calv_dt)
daisy2$date <- ymd(daisy2$date)
daisy2$mth <- month(daisy2$date)
# calving in autumn or not:
daisy2$aut_calv <- with(daisy2, ifelse(mth %in% c(9:12), "fall", "other"))
# centering variables:
daisy2$hs100 <- daisy2$herd_size / 100
daisy2$hs100_ct <- daisy2$hs100 - mean(daisy2$hs100)
daisy2$hs100_ctsq <- daisy2$hs100_ct^2
daisy2$parity_sc <- daisy2$parity - mean(daisy2$parity)

daisy3 <- daisy2[complete.cases(daisy2), ] # df with only complete cases

lm.wpc <- lm(wpc ~ aut_calv + hs100_ct + hs100_ctsq + parity_sc + twin +
+              dyst + rp + vag_disch + rp*vag_disch, data = daisy3)
(lm.wpc.sum <- summary(lm.wpc))

Call:
lm(formula = wpc ~ aut_calv + hs100_ct + hs100_ctsq + parity_sc + 
    twin + dyst + rp + vag_disch + rp * vag_disch, data = daisy3)

Residuals:
Interval from wait period to conception 
   Min     1Q Median     3Q    Max 
-76.40 -35.23 -15.22  22.64 210.63 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)         66.9831     2.4970  26.825  < 2e-16 ***
aut_calvother       -6.1336     2.6420  -2.322 0.020385 *  
hs100_ct            21.8485     2.2546   9.690  < 2e-16 ***
hs100_ctsq          11.7763     3.1952   3.686 0.000236 ***
parity_sc            1.1421     0.8727   1.309 0.190848    
twinYes             19.7621     9.8677   2.003 0.045387 *  
dystYes             11.0347     5.5386   1.992 0.046515 *  
rpYes                7.1111     4.8654   1.462 0.144067    
vag_dischYes         2.0977     7.2588   0.289 0.772629    
rpYes:vag_dischYes  19.9543    12.7239   1.568 0.117031    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 50 on 1515 degrees of freedom
Multiple R-squared: 0.07355,	Adjusted R-squared: 0.06805 
F-statistic: 13.36 on 9 and 1515 DF,  p-value: < 2.2e-16 

Now we can create some plots to assess the major assumptions of linear regression. First, let's have a look at homoscedasticity, or constant variance of the residuals. You can run a statistical test, the Cook-Weisberg (or Breusch-Pagan) test, available in the car and lmtest packages:

library(car)
ncvTest(lm.wpc)
Non-constant Variance Score Test 
Variance formula: ~ fitted.values 
Chisquare = 23.08052    Df = 1     p = 1.553565e-06 

library(lmtest)
bptest(lm.wpc)

	studentized Breusch-Pagan test

data:  lm.wpc 
BP = 24.3082, df = 9, p-value = 0.00384

Both tests are significant indicating the presence of heteroscedasticity but they are slightly different. Quoting John Fox from this post: “bp.test() performs the same score test as ncvTest(), except that the default alternative hypothesis is different — in bp.test() the error variance is a function of a linear combination of the regressors and in ncvTest() the error variance is a function of the fitted values (i.e., a *particular* linear combination of regressors). Testing against the fitted values with 1 df will have greater power if this is the real pattern of heteroscedasticity”.

But the evaluation of linear model assumptions relies more often on graphical methods. You can use the 6 diagnostic plots readily available after fitting your model, or try it with ggplot2 using the fortify() method, which allows combining a model with its data (“the model fortifies the data, and the data fortifies the model”):

plot.lm(lm.wpc, which = 1) # diag plot in base stats package
library(ggplot2)
ggplot(lm.wpc, aes(.fitted, .resid)) +
   geom_hline(yintercept = 0) +
   geom_point() +
   geom_smooth(se = FALSE)
Residuals vs. fitted values

We can check the normality of the residuals by a statistical test, the Shapiro-Wilk test, and a normal probability plot (Q-Q plot):

shapiro.test(resid(lm.wpc))

	Shapiro-Wilk normality test

data:  resid(lm.wpc) 
W = 0.8817, p-value < 2.2e-16

ggplot(lm.wpc, aes(sample = .stdresid)) +
   stat_qq() +
   geom_abline()
Q-Q plot

Other diagnostic plots: Scale-Location plot of sqrt(|residuals|) against fitted values, Cook’s distance plot, Residuals vs. leverages, Cook’s distances vs. leverages:

## Scale-Location plot of sqrt(|residuals|) against fitted values (which = 3)
ggplot(lm.wpc, aes(.fitted, abs(.stdresid))) +
       geom_point() +
       geom_smooth(se = FALSE) +
       scale_y_sqrt()

# Cook's distance plot (which = 4)
ggplot(lm.wpc, aes(seq_along(.cooksd), .cooksd)) +
  geom_bar(stat = "identity")

# Residuals vs. leverages (which = 5)
ggplot(lm.wpc, aes(.hat, .stdresid)) +
  geom_vline(size = 2, colour = "white", xintercept = 0) +
  geom_hline(size = 2, colour = "white", yintercept = 0) +
  geom_point() +
  geom_smooth(se = FALSE)

# Cook's distances  vs leverages/(1-leverages) (which = 6)
ggplot(lm.wpc, aes(.hat, .cooksd)) +
  geom_vline(colour = NA) +
  geom_abline(slope = seq(0, 3, by = 0.5), colour = "white") +
  geom_smooth(se = FALSE) +
  geom_point()

Scale-Location plot

Cook's distance plot

Residuals vs. Leverages

Cook's distances vs. Leverages

You could also check the linearity of the predictor-outcome association by plotting the residuals against each of the continuous predictor variables, for example here centered herd size:

ggplot(lm.wpc, aes(hs100_ct, .resid)) +
  geom_point() +
  geom_smooth(method = "loess", se = FALSE)
Linearity of predictor-outcome association

Log Transformation
Let’s do a log transformation (with a simpler model) and back-transform the coefficients:

daisy3$milk120.sq <- daisy3$milk120^2
lm.wpc2 <- lm(wpc ~ vag_disch + milk120 + milk120.sq, data = daisy3)
lm.lwpc <- lm(log(wpc) ~ vag_disch + milk120 + milk120.sq, data = daisy3)
(lm.lwpc.sum <- summary(lm.lwpc))

Call:
lm(formula = log(wpc) ~ vag_disch + milk120 + milk120.sq, data = daisy3)

Residuals:
Interval from wait period to conception 
    Min      1Q  Median      3Q     Max 
-3.9229 -0.5554  0.0231  0.5794  1.7245 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.765e+00  3.140e-01  15.175   <2e-16 ***
vag_dischYes  1.477e-01  8.761e-02   1.686   0.0921 .  
milk120      -4.664e-04  1.951e-04  -2.390   0.0170 *  
milk120.sq    6.399e-08  2.965e-08   2.159   0.0310 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.7615 on 1521 degrees of freedom
Multiple R-squared: 0.006997,	Adjusted R-squared: 0.005039 
F-statistic: 3.573 on 3 and 1521 DF,  p-value: 0.01356 

(ci.lwpc <- confint(lm.lwpc)) # confidence intervals
                     2.5 %        97.5 %
(Intercept)   4.149060e+00  5.380862e+00
vag_dischYes -2.416148e-02  3.195190e-01
milk120      -8.491117e-04 -8.367357e-05
milk120.sq    5.844004e-09  1.221453e-07
exp(lm.lwpc$coefficients[2])
vag_dischYes 
     1.15914 
exp(ci.lwpc[2, 1])
[1] 0.9761281
exp(ci.lwpc[2, 2])
[1] 1.376465

The trick to get the confidence interval is to compute it on the transformed scale and then back-transform it to the original scale. The most difficult part is interpreting the results. A property of the logarithm is that “the difference between logs is the log of the ratio”. So the mean waiting period difference between cows with and without vaginal discharge is 0.15 on the log scale, i.e. the waiting period is exp(0.15) = 1.15 times longer for cows with vaginal discharge. Better, maybe, to express it as a ratio: the ratio of waiting periods for cows with vaginal discharge versus cows without is 1.15 (95% CI: 0.98 to 1.38). Note that the back-transformed confidence interval is no longer symmetric.
So for your model \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + ... + \hat{\beta}_n x_n , or \hat{y} = e^{\hat{\beta}_0}e^{\hat{\beta}_{1}x_1} \ldots e^{\hat{\beta}_{n}x_n} on the original scale, an increase of 1 in x_1 multiplies the predicted response (on the original scale) by e^{\hat{\beta}_1} : the regression coefficients are interpreted in a multiplicative rather than additive manner. If \hat{\beta}_1 > 0 , when x_1 increases by 1, the median of \hat{y} increases by (e^{\hat{\beta}_1} - 1) \times 100\% . If \hat{\beta}_1 < 0 , when x_1 increases by 1, the median of \hat{y} decreases by (1 - e^{\hat{\beta}_1}) \times 100\% .
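
The back-transformation of all coefficients and their confidence intervals can be done in one step; a small sketch, assuming lm.lwpc from above:

exp(cbind(estimate = coef(lm.lwpc), confint(lm.lwpc))) # multiplicative effects
(exp(coef(lm.lwpc)["vag_dischYes"]) - 1) * 100 # percent change for vaginal discharge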