How To Draw Box Plots In R

What is a box plot in R programming? A boxplot in R, also known as a box and whisker plot, is a graphical representation that allows you to summarize the main characteristics of the data (position, dispersion, skewness, …) and identify the presence of outliers. In this tutorial we will review how to make a base R box plot.

  • 1 How to interpret a box plot in R?
  • 2 The boxplot function in R
    • 2.1 Boxplot from vector
    • 2.2 Box plot with confidence interval for the median
    • 2.3 Boxplot by group in R
    • 2.4 Multiple boxplots
    • 2.5 Reorder boxplot in R
    • 2.6 Boxplot customization
  • 3 Add mean point to a boxplot in R
  • 4 Return values from boxplot
  • 5 Boxplot and histogram
  • 6 Boxplot in R ggplot2
    • 6.1 Boxplot in ggplot2 from vector
    • 6.2 Boxplot in ggplot2 by group
    • 6.3 Boxplot in ggplot2 from dataframe

How to interpret a box plot in R?

The box of a boxplot starts at the first quartile (25%) and ends at the third (75%). Hence, the box represents the central 50% of the data, with a line inside that represents the median. On each side of the box a segment is drawn to the furthest data point that is not an outlier; outliers, if there are any, are represented with circles.

An outlier is an observation that is very distant from the rest of the data. A data point is said to be an outlier if it is greater than $Q_3 + 1.5 \cdot IQR$ (right outlier) or less than $Q_1 - 1.5 \cdot IQR$ (left outlier), where $Q_1$ is the first quartile, $Q_3$ the third quartile and $IQR$ the interquartile range ($Q_3 - Q_1$), which represents the width of the box for horizontal boxplots.
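
As a rough sketch of the rule above, the fences can be computed by hand and compared with the points base R flags. The vector x holds sample values for illustration, and the names hinges, lower_fence and upper_fence are made up for this example. One assumption to keep in mind: boxplot takes the box limits from Tukey's hinges (fivenum) rather than from quantile, so the two may differ slightly on small samples.

```r
# Sample data (illustrative values only)
x <- c(8, 5, 14, -9, 19, 12, 3, 9, 7, 4,
       4, 6, 8, 12, -8, 2, 0, -1, 5, 3)

# Box limits as used by boxplot(): Tukey's hinges
hinges <- fivenum(x)[c(2, 4)]  # lower and upper hinge (close to Q1 and Q3)
iqr <- diff(hinges)            # width of the box

# Fences: any point outside them is drawn as an outlier
lower_fence <- hinges[1] - 1.5 * iqr
upper_fence <- hinges[2] + 1.5 * iqr

outliers_manual <- x[x < lower_fence | x > upper_fence]

# The same points as reported by base R
outliers_base <- boxplot.stats(x)$out

print(sort(outliers_manual))
print(sort(outliers_base))
```

The multiplier 1.5 corresponds to the default of the range argument of boxplot, so changing that argument moves the fences accordingly.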

The boxplot function in R

A box and whisker plot in base R can be plotted with the boxplot function. You can plot this type of graph from different inputs, like vectors or data frames, as we will review in the following subsections. When plotting boxplots for multiple groups in the same graph, you can also specify a formula as input. In addition, you can customize the resulting box plot with several arguments.

Boxplot from vector

If you are wondering how to make a box plot in R from a vector, you just need to pass the vector to the boxplot function. By default, the boxplot will be vertical, but you can change the orientation by setting the horizontal argument to TRUE.

          x <- c(8, 5, 14, -9, 19, 12, 3, 9, 7, 4,
                 4, 6, 8, 12, -8, 2, 0, -1, 5, 3)
          boxplot(x, horizontal = TRUE)
Simple boxplot in R

Note that boxplots hide the underlying distribution of the data. To address this issue, you can add points to a boxplot in R with the stripchart function (jittering the data points avoids overplotting the outliers) as follows:

          stripchart(x, method = "jitter", pch = 19, add = TRUE, col = "blue")        
Adding points to a boxplot in R with stripchart function

Since R 4.0.0 boxplots are grey by default instead of white.

Box plot with confidence interval for the median

You can represent the approximate 95% confidence interval for the median in an R boxplot by setting the notch argument to TRUE.

          boxplot(x, notch = TRUE)
Boxchart with notch

Note that if the notches of two or more boxplots don't overlap, there is strong evidence that the medians differ.
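
For reference, the notch limits can be reproduced by hand. According to the boxplot.stats documentation, the notches extend to the median plus and minus 1.58 times the IQR divided by sqrt(n), with the IQR measured between the hinges. The sketch below checks this on sample data; the vector x and the names notch_manual and notch_base are illustrative.

```r
# Sample data (illustrative values only)
x <- c(8, 5, 14, -9, 19, 12, 3, 9, 7, 4,
       4, 6, 8, 12, -8, 2, 0, -1, 5, 3)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
stats <- fivenum(x)
iqr <- stats[4] - stats[2]

# Notch limits computed by hand: median +/- 1.58 * IQR / sqrt(n)
notch_manual <- stats[3] + c(-1.58, 1.58) * iqr / sqrt(length(x))

# Notch limits as computed by base R
notch_base <- boxplot.stats(x)$conf

print(notch_manual)
print(notch_base)
```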

Boxplot by group in R

If your dataset has a categorical variable containing groups, you can create a boxplot from a formula. In this case, we are going to use the base R chickwts dataset.

          head(chickwts)

            weight      feed
          1    179 horsebean
          2    160 horsebean
          3    136 horsebean
          4    227 horsebean
          5    217 horsebean
          6    168 horsebean

Now, you can create a boxplot of the weight against the type of feed. Notice that when working with datasets you can refer to the variables by name if you specify the dataframe name in the data argument.

          boxplot(chickwts$weight ~ chickwts$feed)
          boxplot(weight ~ feed, data = chickwts) # Equivalent
Box plot by group

In addition, in this example you could add points to each boxplot by typing:

          stripchart(chickwts$weight ~ chickwts$feed, vertical = TRUE, method = "jitter",
                     pch = 19, add = TRUE, col = 1:length(levels(chickwts$feed)))
Multiple boxplots with data points

Multiple boxplots

If all the variables of your dataset are numeric, you can directly create a boxplot from a dataframe. For illustration purposes we are going to use the trees dataset.

          head(trees)

            Girth Height Volume
          1   8.3     70   10.3
          2   8.6     65   10.3
          3   8.8     63   10.2
          4  10.5     72   16.4
          5  10.7     81   18.8
          6  10.8     83   19.7

Note the difference with respect to the chickwts dataset. However, you can convert this dataset to the same format as the chickwts dataset with the stack function.

          stacked_df <- stack(trees)
          head(stacked_df)

            values   ind
          1    8.3 Girth
          2    8.6 Girth
          3    8.8 Girth
          4   10.5 Girth
          5   10.7 Girth
          6   10.8 Girth

Now, you can plot the boxplot with the original or the stacked dataframe as we did in the previous section. Note that you can change the boxplot color by group by passing a vector of colors to the col argument. Thus, each boxplot will have a different color.

          # Boxplot from the R trees dataset
          boxplot(trees, col = rainbow(ncol(trees)))

          # Equivalent to:
          boxplot(stacked_df$values ~ stacked_df$ind,
                  col = rainbow(ncol(trees)))
Creating multiple boxplots in R

You can stack dataframe columns with the stack function.

If you need to plot a different boxplot for each column of your R dataframe, you can use the lapply function and iterate over each column. In this case, we will divide the graphics area with par into one row and as many columns as the dataset has, but you could plot individual graphs. Note that the invisible function avoids displaying the output text of the lapply function.

          par(mfrow = c(1, ncol(trees)))
          invisible(lapply(1:ncol(trees), function(i) boxplot(trees[, i])))
Boxplot for each column

Reorder boxplot in R

By default, boxplots will be plotted in the order of the factor levels in the data. However, you can reorder or sort a boxplot in R by reordering the data by any metric, like the median or the mean, with the reorder function.

          par(mfrow = c(1, 2))

          # Lower to higher
          medians <- reorder(chickwts$feed, chickwts$weight, median)
          # medians <- with(chickwts, reorder(feed, weight, median)) # Equivalent

          boxplot(chickwts$weight ~ medians, las = 2, xlab = "", ylab = "")

          # Higher to lower
          medians <- reorder(chickwts$feed, -chickwts$weight, median)
          # medians <- with(chickwts, reorder(feed, -weight, median)) # Equivalent

          boxplot(chickwts$weight ~ medians, las = 2, xlab = "", ylab = "")

          par(mfrow = c(1, 1))
Reordering box graphs in R

If you want to order the boxplot by another metric, just replace median with the one you prefer.
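
For instance, a minimal sketch that orders the feed types by their mean weight could look as follows. The variable name by_mean is made up for this example; the printed group means are only there to check that the new level order is increasing.

```r
# Order the feed types by mean weight instead of the median
by_mean <- reorder(chickwts$feed, chickwts$weight, mean)

# Group means listed in the new level order (lower to higher)
print(tapply(chickwts$weight, by_mean, mean))

# Boxplot with the groups sorted by their mean
boxplot(chickwts$weight ~ by_mean, las = 2, xlab = "", ylab = "")
```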

Boxplot customization

A boxplot can be fully customized for a nice result. In the following block of code we show a broad example of how to customize an R box plot and how to add a grid. Note that there are even more arguments than the ones in the following example to customize the boxplot, like boxlty, boxlwd, medlty or staplelwd. Review the full list of graphical boxplot parameters in the pars argument of help(bxp) or ?bxp.

          plot.new()

          set.seed(1)

          # Light gray background
          rect(par("usr")[1], par("usr")[3], par("usr")[2], par("usr")[4],
               col = "#ebebeb")

          # Add white grid
          grid(nx = NULL, ny = NULL, col = "white", lty = 1,
               lwd = par("lwd"), equilogs = TRUE)

          # Boxplot
          par(new = TRUE)
          boxplot(rnorm(500), # Data
                  horizontal = FALSE, # Horizontal or vertical plot
                  lwd = 2, # Lines width
                  col = rgb(1, 0, 0, alpha = 0.4), # Color
                  xlab = "X label",  # X-axis label
                  ylab = "Y label",  # Y-axis label
                  main = "Customized boxplot in base R", # Title
                  notch = TRUE, # Add notch if TRUE
                  border = "black",  # Boxplot border color
                  outpch = 25,       # Outliers symbol
                  outbg = "green",   # Outliers color
                  whiskcol = "blue", # Whisker color
                  whisklty = 2,      # Whisker line type
                  lty = 1) # Line type (box and median)

          # Add a legend
          legend("topright", legend = "Boxplot", # Position and title
              fill = rgb(1, 0, 0, alpha = 0.4),  # Color
              inset = c(0.03, 0.05), # Change margins
              bg = "white") # Legend background color
Full customization of a boxplot

Add mean point to a boxplot in R

By default, when you create a boxplot the median is displayed. Nevertheless, you may also want to display the mean or another characteristic of the data. For that purpose, you can use the segments function if you want to display a line like the one for the median, or the points function to just add points. Note that the code is slightly different depending on whether you create a vertical or a horizontal boxplot.

In the following code block we show you how to add mean points and segments to both types of boxplots when working with a single boxplot.

          par(mfrow = c(1, 2))

          #-----------------
          # Vertical boxplot
          #-----------------

          boxplot(x)

          # Add mean line
          segments(x0 = 0.8, y0 = mean(x),
                   x1 = 1.2, y1 = mean(x),
                   col = "red", lwd = 2)
          # abline(h = mean(x), col = 2, lwd = 2) # Entire line

          # Add mean point
          points(mean(x), col = 3, pch = 19)

          #-------------------
          # Horizontal boxplot
          #-------------------

          boxplot(x, horizontal = TRUE)

          # Add mean line
          segments(x0 = mean(x), y0 = 0.8,
                   x1 = mean(x), y1 = 1.2,
                   col = "red", lwd = 2)
          # abline(v = mean(x), col = 2, lwd = 2) # Entire line

          # Add mean point
          points(mean(x), 1, col = 3, pch = 19)

          par(mfrow = c(1, 1))
Adding mean point and line to a box and whiskers plot

Note that, in this case, the mean and the median are almost equal, as the distribution is roughly symmetric.

You can replace the mean function in the previous code with another function to display other measures.

You can also add the mean point to a boxplot by group. In this case, you can make use of the lapply function to avoid for loops. In order to calculate the mean for each group you can use the apply function by columns or the colMeans function. You can follow the code block to add the lines and points for horizontal and vertical box and whiskers diagrams.

          par(mfrow = c(1, 2))

          my_df <- trees

          #--------------------------
          # Vertical boxplot by group
          #--------------------------

          boxplot(my_df, col = rgb(0, 1, 1, alpha = 0.25))

          # Add mean lines
          invisible(lapply(1:ncol(my_df),
                          function(i) segments(x0 = i - 0.4,
                                               y0 = mean(my_df[, i]),
                                               x1 = i + 0.4,
                                               y1 = mean(my_df[, i]),
                                               col = "red", lwd = 2)))

          # Add mean points
          means <- apply(my_df, 2, mean)
          means <- colMeans(my_df) # Equivalent (more efficient)

          points(means, col = "red", pch = 19)

          #----------------------------
          # Horizontal boxplot by group
          #----------------------------

          boxplot(my_df, col = rgb(0, 1, 1, alpha = 0.25),
                  horizontal = TRUE)

          # Add mean lines
          invisible(lapply(1:ncol(my_df),
                          function(i) segments(x0 = mean(my_df[, i]),
                                               y0 = i - 0.4,
                                               x1 = mean(my_df[, i]),
                                               y1 = i + 0.4,
                                               col = "red", lwd = 2)))

          # Add mean points
          means <- apply(my_df, 2, mean)
          means <- colMeans(my_df) # Equivalent (more efficient)

          points(means, 1:ncol(my_df), col = "red", pch = 19)

          par(mfrow = c(1, 1))
Learn how to add mean points and lines to a box and whiskers diagrams by groups

Return values from boxplot

If you assign the boxplot to a variable, you get a list with different components. Create a boxplot with the trees dataset and store it in a variable:

          res <- boxplot(trees)
          res

          $stats
                [,1] [,2] [,3]
          [1,]  8.30   63 10.2
          [2,] 11.05   72 19.4
          [3,] 12.90   76 24.2
          [4,] 15.25   80 37.3
          [5,] 20.60   87 58.3

          $n
          [1] 31 31 31

          $conf
                   [,1]     [,2]    [,3]
          [1,] 11.70814 73.72979 19.1204
          [2,] 14.09186 78.27021 29.2796

          $out
          [1] 77

          $group
          [1] 3

          $names
          [1] "Girth"  "Height" "Volume"

The output will contain six elements described below:

  • stats: each column represents the lower whisker, the first quartile, the median, the third quartile and the upper whisker of each group.
  • n: number of observations of each group.
  • conf: each column represents the lower and upper extremes of the confidence interval of the median (the notch).
  • out: values of the outliers, that is, the data points that lie beyond the whiskers.
  • group: for each value in out, the index of the group the outlier belongs to.
  • names: names of each group.
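
As a small sketch of how these components fit together, the out and group elements can be combined to list each outlier next to the group it came from. The plot = FALSE argument only skips the drawing; the name outliers is a made-up variable for this example.

```r
# Compute the boxplot statistics without drawing the plot
res <- boxplot(trees, plot = FALSE)

# Pair each outlier value with the name of the group it belongs to
outliers <- data.frame(group = res$names[res$group], value = res$out)
print(outliers)

# Number of outliers per group (zero for groups without outliers)
print(table(factor(res$names[res$group], levels = res$names)))
```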

It is worth mentioning that you can create a boxplot from the variable you have just created (res) with the bxp function.

          bxp(res)        

Boxplot and histogram

One limitation of box plots is that they are not designed to detect multimodality. For that reason, it is also recommended to plot a boxplot combined with a histogram or a density line.

          par(mfrow = c(1, 1))

          # Multimodal data
          n <- 20000
          ii <- rbinom(n, 1, 0.5)
          dat <- rnorm(n, mean = 110, sd = 11) * ii +
                 rnorm(n, mean = 70, sd = 5) * (1 - ii)

          # Histogram
          hist(dat, probability = TRUE, ylab = "", col = "grey",
               axes = FALSE, main = "")

          # Axis
          axis(1)

          # Density
          lines(density(dat), col = "red", lwd = 2)

          # Add boxplot
          par(new = TRUE)
          boxplot(dat, horizontal = TRUE, axes = FALSE,
                  lwd = 2, col = rgb(0, 1, 1, alpha = 0.15))
Adding a box diagram over a histogram in R

The boxplot can't find multimodality in the information.

As an alternative to overcome this problem you can use violin plots or beanplots.

Boxplot in R ggplot2

The boxplots we created in the previous sections can also be plotted with the ggplot2 library. For further details read the complete ggplot2 boxplots tutorial.

Boxplot in ggplot2 from vector

The input of the ggplot library has to be a data frame, so you will need to convert the vector to the data.frame class. Then, you can use the geom_boxplot function to create and customize the box and the stat_boxplot function to add the error bars.

          # install.packages("ggplot2")
          library(ggplot2)

          # Transform our 'x' vector
          x <- data.frame(x)

          # Boxplot with vector
          ggplot(data = x, aes(x = "", y = x)) +
                 stat_boxplot(geom = "errorbar",      # Error bars
                              width = 0.2) +
                 geom_boxplot(fill = "#4271AE",       # Box color
                              outlier.color = "red", # Outliers color
                              alpha = 0.9) +          # Box color transparency
                 ggtitle("Boxplot with vector") + # Plot title
                 xlab("") +   # X-axis label
                 coord_flip() # Horizontal boxplot
Create a boxplot in R with stat_boxplot and geom_boxplot

Boxplot in ggplot2 by group

If you want to create a ggplot boxplot by group, you will need to specify the variables in the aes argument as follows:

          # Boxplot by group
          ggplot(data = chickwts, aes(x = feed, y = weight)) +
                 stat_boxplot(geom = "errorbar", # Boxplot with error bars
                              width = 0.2) +
                 geom_boxplot(fill = "#4271AE", color = "#1F3552", # Colors
                              alpha = 0.9, outlier.colour = "red") +
                 scale_y_continuous(name = "Weight") +  # Continuous variable label
                 scale_x_discrete(name = "Feed") +      # Group label
                 ggtitle("Boxplot by groups ggplot2") + # Plot title
                 theme(axis.line = element_line(colour = "black", # Theme customization
                                                size = 0.25))
Boxplot by groups with ggplot2

Boxplot in ggplot2 from dataframe

Finally, to create a boxplot with ggplot2 from a data frame like the trees dataset, you will need to stack the data with the stack function:

          # Boxplot from dataframe
          ggplot(data = stack(trees), aes(x = ind, y = values)) +
                 stat_boxplot(geom = "errorbar", # Boxplot with error bars
                              width = 0.2) +
                 geom_boxplot(fill = "#4271AE", colour = "#1F3552", # Colors
                              alpha = 0.9, outlier.colour = "red") +
                 scale_y_continuous(name = "Value") +   # Continuous variable label
                 scale_x_discrete(name = "Variable") +  # Group label
                 ggtitle("Boxplot from data frame ggplot2") + # Plot title
                 theme(axis.line = element_line(color = "black", # Theme customization
                                                size = 0.25))
boxplot from data frame in ggplot2

Source: https://r-coder.com/boxplot-r/

Posted by: mcdonnellturper.blogspot.com
