Ggplot2: how to…

  • How to draw a multi-density plot from a data frame or a list
  • How to draw a scatterplot and assign color as a function of another variable
  • How to draw a regression line on a scatterplot
  • How to draw horizontal and vertical lines on a scatterplot

How to draw a multi-density plot from a data frame or a list

First, we create a sample data set for our test. Here we use a list, but a data frame could be used instead when all elements have the same length (just replace list() with data.frame()).

testDF <- list(mean0 = rnorm(1000), mean1 = rnorm(1200, mean = 1))

The first step is to transform these data from wide format into long format. The resulting data frame has two columns named “values” (the data) and “ind” (a factor whose labels are the names of the list elements or columns).

longDF <- stack(testDF)
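To make the long format concrete, here is a minimal sketch on a tiny hand-written list. The name smallList and its values are made up for illustration; only stack() comes from the text above.

```r
# Small list with elements of different lengths (illustrative values)
smallList <- list(mean0 = c(0.1, -0.2), mean1 = c(1.3, 0.8, 1.1))

# stack() puts all values in one column and records their origin in "ind"
smallLong <- stack(smallList)
print(smallLong)   # one row per value, labelled by the element it came from
str(smallLong)     # "values" is numeric, "ind" is a factor
```

Each element name becomes a level of “ind”, which is what lets ggplot2 split the data into groups later on.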

Then we plot the densities. We load the ggplot2 library and call the qplot() function, specifying that the “values” column of longDF contains the x coordinates, that we want densities to be displayed, and that the values should be split and colored according to the categories in “ind”.

library("ggplot2")
qplot(x = values, data = longDF, geom = "density", colour = ind)
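Recent ggplot2 releases (3.4.0 and later) mark qplot() as deprecated, so it may be useful to know the layered form as well. The following is a sketch of what should be an equivalent plot built with ggplot() and geom_density(); the fixed seed is an addition made here only for reproducibility.

```r
library("ggplot2")

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
testDF <- list(mean0 = rnorm(1000), mean1 = rnorm(1200, mean = 1))
longDF <- stack(testDF)

# Same idea as the qplot() call, written with explicit layers:
# x values from "values", one coloured density curve per level of "ind"
p <- ggplot(data = longDF, aes(x = values, colour = ind)) +
    geom_density()
p
```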


One-block code:

testDF <- list(mean0 = rnorm(1000), mean1 = rnorm(1200, mean = 1))
longDF <- stack(testDF)
library("ggplot2")
qplot(x = values, data = longDF, geom = "density", colour = ind)

How to draw a scatterplot and assign color as a function of another variable

Dataset: we use the 'iris' data set. We want to plot petal width against petal length, starting from this basic graph:

library("ggplot2")
data(iris)
qplot(x = Petal.Length, y = Petal.Width, data = iris)


and color the points depending on the petal area (approximated as length*width).

petalArea <- iris$Petal.Length * iris$Petal.Width

Continuous values, or a large set of discrete values

For continuous values, use the cut() function to define intervals and create a factor.

cutPetalArea <- cut(petalArea, breaks = c(min(petalArea), mean(petalArea), max(petalArea)), 
    include.lowest = TRUE)
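Before plotting, it can help to check which intervals cut() produced and how many flowers fall in each. The sketch below repeats the call from the text and then, as an assumption of this example only, shows a three-interval variant with breaks at the terciles.

```r
data(iris)
petalArea <- iris$Petal.Length * iris$Petal.Width

# Two intervals split at the mean, as in the text
cutPetalArea <- cut(petalArea, breaks = c(min(petalArea), mean(petalArea), max(petalArea)),
    include.lowest = TRUE)
levels(cutPetalArea)   # interval labels
table(cutPetalArea)    # number of flowers per interval

# Assumption: three intervals with breaks at the terciles, as one possible alternative
tercileBreaks <- quantile(petalArea, probs = c(0, 1/3, 2/3, 1))
cutTerciles <- cut(petalArea, breaks = tercileBreaks, include.lowest = TRUE)
table(cutTerciles)
```

include.lowest = TRUE matters here: without it the smallest value falls outside the first interval and becomes NA.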

Then use this factor for the 'colour' parameter.

## Store the factor as a column so that ggplot() finds it in the data
irisArea <- cbind(iris, cutPetalArea = cutPetalArea)
p <- ggplot(data = irisArea, aes(x = Petal.Length, y = Petal.Width, colour = cutPetalArea)) + 
    geom_point()
p

Or simply:

qplot(x = Petal.Length, y = Petal.Width, data = iris, colour = cutPetalArea)
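When intervals are not needed, a numeric variable can also be mapped to colour directly, in which case ggplot2 draws a continuous gradient instead of discrete classes. A minimal sketch, computing the area inside aes() from the data frame columns (the legend title is chosen for this example):

```r
library("ggplot2")
data(iris)

# The area is computed from columns of 'iris', so no external variable is needed
p <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width,
        colour = Petal.Length * Petal.Width)) +
    geom_point() +
    labs(colour = "Petal area")  # legend title chosen for this example
p
```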


Discrete values, small finite set

## We artificially round all values to create a finite set of integers...
roundedPetalArea <- round(petalArea, digits = 0)

For a small finite set of discrete values, we can directly create a factor from the different values.

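## Store the rounded values as a column so that ggplot() finds them in the data
irisRounded <- cbind(iris, roundedPetalArea = roundedPetalArea)
p <- ggplot(data = irisRounded, aes(x = Petal.Length, y = Petal.Width, colour = factor(roundedPetalArea))) + 
    geom_point()
p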

Or simply:

qplot(x = Petal.Length, y = Petal.Width, data = iris, colour = factor(roundedPetalArea))



How to draw a regression line on a scatterplot

Dataset: we use the 'iris' data set. We want to plot petal width against petal length, starting from this basic graph:

library("ggplot2")
data(iris)
qplot(x = Petal.Length, y = Petal.Width, data = iris)


and add a regression line on the plot.

The first step is to fit the linear model, which provides the coefficients of the equation to be displayed.

model <- lm(iris$Petal.Width ~ iris$Petal.Length)
model
## 
## Call:
## lm(formula = iris$Petal.Width ~ iris$Petal.Length)
## 
## Coefficients:
##       (Intercept)  iris$Petal.Length  
##            -0.363              0.416
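The same model can also be written with column names in the formula and the data frame passed through the data argument. This is a sketch of that alternative form (the name model2 is introduced here for illustration); the coefficients should be the same, and the fit can then be reused with predict() on new data.

```r
data(iris)

# Formula refers to column names; the data frame is passed separately
model2 <- lm(Petal.Width ~ Petal.Length, data = iris)
coef(model2)

# predict() matches new data by column name, which works with this form
newFlowers <- data.frame(Petal.Length = c(2, 4, 6))  # illustrative lengths
predictedWidth <- predict(model2, newdata = newFlowers)
predictedWidth
```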

The first coefficient is the intercept of the equation and the second one, the slope.

We can also extract the R-squared value:

rSquared <- summary(model)$r.squared
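As a quick sanity check: for a regression with a single predictor and an intercept, R-squared equals the squared Pearson correlation between the two variables. A small sketch of that comparison:

```r
data(iris)
model <- lm(Petal.Width ~ Petal.Length, data = iris)
rSquared <- summary(model)$r.squared

# For one predictor plus intercept, R-squared is the squared correlation
r <- cor(iris$Petal.Length, iris$Petal.Width)
all.equal(rSquared, r^2)
```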

Then we format those values in a character string.

intercept <- coef(model)[1]
slope <- coef(model)[2]

eq <- paste0("y = ", format(intercept, digits = 2), ifelse((slope >= 0), " + ", 
    " - "), format(abs(slope), digits = 3), " . x")
r2val <- paste0("r2 = ", format(rSquared, digits = 3))
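If the same label is needed for several models, the formatting can be wrapped in a small function. lmEquation() below is a hypothetical helper written for this example, not part of R or ggplot2, and it assumes a model with exactly one predictor.

```r
# Hypothetical helper: builds the "y = a + b . x" label for a one-predictor lm fit
lmEquation <- function(model) {
    intercept <- coef(model)[[1]]
    slope <- coef(model)[[2]]
    # Choose the sign separately so a negative slope reads "a - b" rather than "a + -b"
    op <- if (slope >= 0) " + " else " - "
    paste0("y = ", format(intercept, digits = 2), op,
        format(abs(slope), digits = 3), " . x")
}

data(iris)
eqPetal <- lmEquation(lm(Petal.Width ~ Petal.Length, data = iris))
eqSepal <- lmEquation(lm(Sepal.Width ~ Sepal.Length, data = iris))  # negative slope case
eqPetal
eqSepal
```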

Finally, we draw the scatterplot. Because the text labels must be placed at specific coordinates, we use ggplot() instead of qplot() and describe the layers one after another.

p <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width)) + geom_point() + 
    geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "blue") + 
    annotate("text", x = 4, y = 0.3 + 0.1, hjust = 0, vjust = 0, label = eq) + 
    annotate("text", x = 4, y = 0.3 - 0.1, hjust = 0, vjust = 1, label = r2val)
p
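To typeset the exponent properly, annotate() can be asked to parse the label as a plotmath expression with parse = TRUE. A minimal sketch, assuming the same model as above; the label position is arbitrary and r2expr is a name introduced here.

```r
library("ggplot2")
data(iris)
model <- lm(Petal.Width ~ Petal.Length, data = iris)

# plotmath syntax: "==" draws an equals sign, "^" a superscript
r2expr <- paste0("italic(r)^2 == ", format(summary(model)$r.squared, digits = 3))

p <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width)) +
    geom_point() +
    annotate("text", x = 4, y = 0.2, hjust = 0, label = r2expr, parse = TRUE)
p
```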


One-block code:

library("ggplot2")
data(iris)
model <- lm(iris$Petal.Width ~ iris$Petal.Length)
rSquared <- summary(model)$r.squared
intercept <- coef(model)[1]
slope <- coef(model)[2]

eq <- paste0("y = ", format(intercept, digits = 2), ifelse((slope >= 0), " + ", 
    " - "), format(abs(slope), digits = 3), " . x")
r2val <- paste0("r2 = ", format(rSquared, digits = 3))

p <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width)) + geom_point() + 
    geom_smooth(method = lm, formula = y ~ x, se = FALSE, color = "blue") + 
    annotate("text", x = 4, y = 0.3 + 0.1, hjust = 0, vjust = 0, label = eq) + 
    annotate("text", x = 4, y = 0.3 - 0.1, hjust = 0, vjust = 1, label = r2val)
p

How to draw horizontal and vertical lines on a scatterplot

Dataset: we use the 'iris' data set. We want to plot petal width against petal length, starting from this basic graph:

library("ggplot2")
data(iris)
qplot(x = Petal.Length, y = Petal.Width, data = iris)


and draw one horizontal and one vertical line on the plot, here at the mean of each variable.

This can be done with the geom_hline() and geom_vline() functions.

p <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width)) + geom_point() + 
    geom_hline(yintercept = mean(iris$Petal.Width), colour = "grey") + 
    geom_vline(xintercept = mean(iris$Petal.Length), colour = "grey")
p
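Both functions also accept a vector of positions, which draws several lines in one layer. The sketch below uses the quartiles of each variable as an example; the choice of quartiles and the dashed line type are assumptions of this example.

```r
library("ggplot2")
data(iris)

# Quartiles of each variable (25%, 50%, 75%), used as line positions
xLines <- unname(quantile(iris$Petal.Length, probs = c(0.25, 0.5, 0.75)))
yLines <- unname(quantile(iris$Petal.Width, probs = c(0.25, 0.5, 0.75)))

p <- ggplot(data = iris, aes(x = Petal.Length, y = Petal.Width)) +
    geom_point() +
    geom_hline(yintercept = yLines, colour = "grey", linetype = "dashed") +
    geom_vline(xintercept = xLines, colour = "grey", linetype = "dashed")
p
```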



Creative Commons License
This work by Celine Hernandez is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.