Beeswarm Plot with ggplot2

A colleague showed me results from his study project as beeswarm plots made with GraphPad. I wondered whether they could be reproduced in R, and more specifically with ggplot2.

There is an R package for drawing such graphs: the beeswarm package, available on CRAN. An example was shown on the R-statistics blog, but not with ggplot2.

First, here’s the example from the beeswarm package:

library(beeswarm)
data(breast)
breast2 <- breast[order(breast$event_survival, breast$ER),]

beeswarm(time_survival ~ event_survival, data = breast2, pch = 16,
         pwcol = as.numeric(ER), xlab = '', 
         ylab = 'Follow-up time (months)', 
         labels = c('Censored', 'Metastasis'))
legend('topright', legend = levels(breast$ER), title = 'ER', 
       pch = 16, col = 1:2)


Or, as on Tal Galili’s blog, with a boxplot overlaid:

beeswarm(time_survival ~ event_survival, data = breast2, pch = 16,
         pwcol = as.numeric(ER), xlab = '', 
         ylab = 'Follow-up time (months)', 
         labels = c('Censored', 'Metastasis'))
boxplot(time_survival ~ event_survival, data = breast2, add = TRUE,
        names = c("", ""), col = "#0000ff22")
legend('topright', legend = levels(breast$ER), title = 'ER', 
        pch = 16, col = 1:2)

The trick is to use the beeswarm() call to get the x and y positions. beeswarm() returns a data frame from which we can extract the positions it computed.

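# Compute the swarm layout. Columns 1, 2, 4 and 6 of the returned data
# frame are x, y, col (here ER, passed via pwcol) and x.orig (the group).
# The result is stored as 'swarm' so it does not mask the beeswarm() function.
swarm <- beeswarm(time_survival ~ event_survival, 
            data = breast, method = 'swarm', 
            pwcol = ER)[, c(1, 2, 4, 6)]
colnames(swarm) <- c("x", "y", "ER", "event_survival") 

library(ggplot2)
library(plyr)  # for round_any()
beeswarm.plot <- ggplot(swarm, aes(x, y)) +
  xlab("") +
  scale_y_continuous("Follow-up time (months)")
# Boxplots first, grouped by the integer position each swarm is centred on
beeswarm.plot2 <- beeswarm.plot + geom_boxplot(aes(x, y,
  group = round_any(x, 1, round)), outlier.shape = NA)
# Then the points, coloured by ER status
beeswarm.plot3 <- beeswarm.plot2 + geom_point(aes(colour = ER)) +
  scale_colour_manual(values = c("black", "red")) + 
  scale_x_continuous(breaks = 1:2, 
         labels = c("Censored", "Metastasis"), expand = c(0, 0.5))
beeswarm.plot3  # print the final plot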
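
The group = round_any(x, 1, round) aesthetic in the boxplot layer may look odd at first. As far as I understand it, it works because each swarm is spread around an integer position (1 for the first group, 2 for the second), so rounding x to the nearest integer gives back the group. Here is a minimal sketch in base R; the x values are invented for illustration and are not taken from the breast data:

```r
# Hypothetical jittered x positions, similar in spirit to what beeswarm
# returns for two groups centred on 1 and 2 (values made up for this sketch).
x <- c(0.85, 1.00, 1.12, 1.93, 2.00, 2.21)

# plyr::round_any(x, 1, round) computes round(x / 1) * 1, i.e. round(x),
# so every point falls back onto the integer its swarm is centred on.
group <- round(x / 1) * 1
print(group)

# Number of points recovered per group
print(table(group))
```

This only holds while the horizontal offsets stay below 0.5. For a very wide swarm, grouping on the event_survival column would probably be the safer choice.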

Do not forget to hide the outliers in the boxplot (outlier.shape = NA), or they will overlap the points drawn by geom_point.
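
A possible variation on the position-extraction step: if only the coordinates are needed, beeswarm() can be asked not to draw the base-graphics plot, and the columns can be picked by name rather than by index. This is a sketch, not a tested recipe. It assumes the installed beeswarm version accepts do.plot = FALSE and returns columns named x, y, col and x.orig (check names(pos) if in doubt). The names pos and swarm_df are mine.

```r
library(beeswarm)
data(breast)

# do.plot = FALSE: compute the swarm positions without drawing anything
# (assumption: this argument is available in the installed version).
pos <- beeswarm(time_survival ~ event_survival, data = breast,
                method = "swarm", pwcol = ER, do.plot = FALSE)

# Select columns by name instead of by position (assumed column names).
swarm_df <- pos[, c("x", "y", "col", "x.orig")]
names(swarm_df) <- c("x", "y", "ER", "event_survival")

# Inspect the structure before handing it to ggplot()
str(swarm_df)
```

The resulting swarm_df can then be passed to ggplot() in the same way as the data frame built above.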

I wonder whether these plots are more useful in certain fields. If anybody has references for beeswarm plots, I would be very grateful.
