R, JAGS and ggplot2

Last week a question was asked on the ggplot2 list about using ggplot2 and jags in R (). Here’s what was my answer (a bit updated):

Using as an example the school dataset from R2WinBUGS package:


schoolmodel <- function(){
  for (j in 1:J)
      y[j] ~ dnorm (theta[j], tau.y[j])
      theta[j] ~ dnorm (mu.theta, tau.theta)
      tau.y[j] <- pow(sigma.y[j], -2)
  mu.theta ~ dnorm (0.0, 1.0E-6)
  tau.theta <- pow(sigma.theta, -2)
  sigma.theta ~ dunif (0, 1000)
write.model(schoolmodel, "~/schoolmodel.bug")
J <- nrow(schools)
y <- schools$estimate
sigma.y <- schools$sd
forJags <- list(J = nrow(schools), y = schools$estimate, sigma.y = schools$sd)
inits <- function(){
  list(theta = rnorm(J, 0, 100), mu.theta = rnorm(1, 0, 100),
       sigma.theta = runif(1, 0, 100))

school.sim <- jags.model(file = "schoolmodel.bug", data = forJags,
                         inits = inits, n.chains = 3, n.adapt = 500)
school.out <- coda.samples(school.sim, variable.names =
                           c("theta", "mu.theta", "sigma.theta"),
                           n.iter = 1000)

Than you can use the mcmcplots package which give a “feel” of ggplot2:

denplot(school.out, parms = c('mu.theta', collapse = FALSE, auto.layout = TRUE))
traplot(school.out, parms = c('mu.theta'))
autplot1(school.out[,"mu.theta", drop = FALSE])

Density Plot - mcmcplots package

Trace Plot - mcmcplots package

Autoplot - mcmcplots package

If you really want to use ggplot2, you have to extract the information you need, like for example:

mu.theta.out <- cbind(school.out[[1]][,1], school.out[[2]][,1], school.out[[3]][,1])
attributes(mu.theta.out) <- NULL
dens.mu.theta <- density(mu.theta.out)
q25.mu.theta <- quantile(mu.theta.out, .025)
q975.mu.theta <- quantile(mu.theta.out, .975)
dd.mu.theta <- with(dens.mu.theta, data.frame(x,y))

qplot(x, y, data = dd.mu.theta, geom = "line", ylab = "", xlab = "") +
  geom_ribbon(data = subset(dd.mu.theta, x>q25.mu.theta & x<q975.mu.theta),
              aes(ymax = y), ymin = 0, fill = "red", colour = NA, alpha = 0.5)

This would give you the density plot with its 95% credible interval.

Density Plot - ggplot2 package

Publishing in Veterinary Academic Journals

Following the post by Arthur Charpentier (Freakonometrics), I wondered what would be the outcome considering my current engagement (veterinary medicine, epidemiology, bovine mastitis). Briefly, Arthur Charpentier’s post looked at clusters of journals publishing the same kind of papers.

So I looked at 25 journals (Journal of Dairy Science, Canadian Journal of Veterinary Medicine, Preventive Veterinary Medicine, New Zealand Veterinary Journal, Veterinary Research, Acta veterinaria Scandinavica, Journal of Dairy Research, The Veterinary Record, The Veterinary Journal, Journal of Veterinary Medicine-Series A, Journal of Veterinary Medicine-Series B, Journal of Veterinary Science, Journal of the American Veterinary Medical Association, American Journal of Veterinary Research, Journal of Veterinary Internal Medicine, Canadian Journal of Veterinary Research, Journal of Veterinary Diagnostic Investigation, Veterinary Microbiology, Veterinary Research Communications, BMC Veterinary Research, Australian Veterinary Journal, Theriogenology, Zoonoses and Public Health, Research in Veterinary Science, Tijdschrift voor Diergeneeskunde). But instead of looking at word frequency in the title of the papers, I counted the frequency of the following MeSH terms for the last 20 years in each journal: public health, diagnoses, surveys, dairy product, milk, lactation, cohort studies, case-control study, infections, prevalence, statistical regression, risk factor, bovine mastitis, statistical model, longitudinal study, survival analyses, udder, incidence, zoonoses, pathogen transmission, seroprevalence, cross-sectional studies, communicable disease, Bayesian analysis, evidence-based medicine, epidemiology, lactation disorder, preventive medicine. I then ran a principal component analysis on the 20 most frequent.

estim_ncp(pubmed, ncp.min = 0, ncp.max = NULL, scale = TRUE, method = "Smooth")
pca -> PCA(pubmed, scale.unit = TRUE, ncp = 13, graph = FALSE)
plot.PCA(pca, axes = c(1, 2), choix = "ind", cex = 0.75, new.plot = FALSE)

which gave this projection for the first 2 dimensions:

The first 2 dimensions explained roughly 80% of the variation. We have Journal of Dairy Science on one corner, Vet Record and JAVMA on a second one, Preventive Veterinary Medicine in-between and Journal of Dairy Research on a third corner. All other journals are making a pack all together. Are they all the same?

Plotting the variables we have:

Two distinct groups here: the second dimension is made of statistics and lactation/bovine mastitis/dairies, and the first dimension groups the other terms.

If we had to the dataset information on journal’s impact factor and type of journal (from ISI Web of Knowledge), we get:

That’s interesting! The first dimension is related to journal from Veterinary Science, while the second dimension is related to journals from Agriculture, Dairy, Animal Science and Food Science Technology (= Journal of Dairy Science and Journal of Dairy Research). And the impact factor is in the direction of these journals…

If we look at the third and fourth dimensions, we can split the veterinary journals:

Preventive Veterinary Medicine, Vet Record and JAVMA are now more clearly separated, with the first one oriented more towards statistical models, risk factors, Vet Record for transmission, prevalence, incidence of disease and JAVMA for epidemiologic studies (cohort, case-control).

Regarding clusters of journals, we have:

distance -> dist(pubmed)
cah -> hclust(distance)

Journal of Dairy Science is clearly on its own, with Vet Record and JAVMA together, but the others cannot really be broken down.

Using the sna library:

gplot(pubmed, gmode = "twomode", displaylabels = TRUE, edge.lwd =0, edge.col = "light blue", label.cex =0.7)

where we have on the upper right corner the 2 Animal Science journal, Journal of Dairy Science and Journal of Dairy Research, with their MeSH terms related to dairy, udder and stats. Facing them we have the terms related to epidemiologic study design, zoonoses, transmission of disease, and in-between we have the terms for public health, diagnoses, prevalence, surveys and risk factors. In this center, we have our JAVMA, Vet Record and Preventive Veterinary Medicine, while the other Veterinary Science journals are spread evenly above and below this left-right diagonal. So not enough diversity in Vet Science journals?