Efficient Multivariate Summary in R: A Guide to Analyzing Multiple Independent Variables (2/2)

In my previous post, I introduced how to summarize data with the dplyr package. Let’s load a dataset and summarize it by calculating the mean, standard deviation, sample size, and standard error of Yield and Height for every combination of Fertilizer and Fungicide.

# Install readr if it is not already available, then load it
if(!require(readr)) install.packages("readr")
library(readr)
# Read the CSV file directly from GitHub into a data frame
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/fertilizer_fungicide_treatment_with_missing_values.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))

head(df, 3)
  Fertilizer Fungicide Yield Height
1    Control       Yes  12.2     45
2    Control       Yes  12.4     NA
3    Control       Yes  11.9     42
.
.
.

if(!require(dplyr)) install.packages("dplyr")
library(dplyr)
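# Summarize Yield and Height for each Fertilizer x Fungicide combination
summary= data.frame(df %>%
  group_by(Fertilizer, Fungicide) %>%
  dplyr::summarize(across(c(Yield, Height),
    .fns= list(
      Mean= ~mean(., na.rm= TRUE),   # mean, ignoring NA
      SD= ~sd(., na.rm= TRUE),       # standard deviation, ignoring NA
      n= ~length(.),                 # rows per group (rows with NA are counted too)
      # standard error = SD / sqrt(n)
      se= ~sd(., na.rm= TRUE) / sqrt(length(.))))))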

print(summary)     
  Fertilizer Fungicide Yield_Mean  Yield_SD Yield_n   Yield_se Height_Mean
1    Control        No     12.340 0.6426508       5 0.28740216       48.00
2    Control       Yes     11.920 0.4207137       5 0.18814888       40.50
3       Fast        No      9.675 0.5057997       5 0.22620050       55.20
4       Fast       Yes      9.525 0.0500000       5 0.02236068       53.20
5       Slow        No     15.800 0.1632993       5 0.07302967       48.25
6       Slow       Yes     15.675 0.6396614       5 0.28606526       49.40
  Height_SD Height_n Height_se
1 11.518102        5  5.151052
2  4.203173        5  1.879716
3 10.329569        5  4.619524
4  5.167204        5  2.310844
5  2.362908        5  1.056724
6 13.867228        5  6.201613
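One detail worth checking: the dataset contains missing values, and length(.) counts every row in a group, including rows where the value is NA. If the standard error should be based only on the observed values, the count can be taken from the non-missing entries instead. Below is a minimal sketch of that idea. The data frame `toy` and the column names `Height_rows` and `Height_n` are made up for illustration and are not part of the original dataset or code.

```r
library(dplyr)

# Hypothetical toy data (not the post's dataset): group A has one missing Height
toy <- data.frame(
  Group  = c("A", "A", "A", "B", "B", "B"),
  Height = c(45, NA, 42, 50, 52, 54)
)

toy_summary <- toy %>%
  group_by(Group) %>%
  summarize(
    Height_mean = mean(Height, na.rm = TRUE),
    Height_SD   = sd(Height, na.rm = TRUE),
    Height_rows = n(),                      # all rows in the group, NA included
    Height_n    = sum(!is.na(Height)),      # non-missing values only
    Height_SE   = Height_SD / sqrt(Height_n),
    .groups = "drop"
  )

print(toy_summary)
```

In this toy example, group A has 3 rows but only 2 observed heights, so the two counts differ and the standard error is divided by sqrt(2) rather than sqrt(3). Whether this is the right choice depends on how you want to report sample size.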

Here’s an alternative that produces the same summary by writing out each statistic explicitly inside summarize() instead of using across().

if(!require(dplyr)) install.packages("dplyr")
library(dplyr)

summary1= data.frame(df %>%
  group_by(Fertilizer, Fungicide) %>%
  summarize(
    Yield_mean = mean(Yield, na.rm = TRUE),   
    Yield_SD = sd(Yield, na.rm = TRUE),        
    Yield_n = n(),                                  
    Yield_SE = Yield_SD / sqrt(Yield_n), 
    
    Height_mean = mean(Height, na.rm = TRUE),     
    Height_SD = sd(Height, na.rm = TRUE),          
    Height_n = n(),                                 
    Height_SE = Height_SD / sqrt(Height_n),     
    .groups = "drop"
  ))

print(summary1)


  Fertilizer Fungicide Yield_mean  Yield_SD Yield_n   Yield_SE Height_mean
1    Control        No     12.340 0.6426508       5 0.28740216       48.00
2    Control       Yes     11.920 0.4207137       5 0.18814888       40.50
3       Fast        No      9.675 0.5057997       5 0.22620050       55.20
4       Fast       Yes      9.525 0.0500000       5 0.02236068       53.20
5       Slow        No     15.800 0.1632993       5 0.07302967       48.25
6       Slow       Yes     15.675 0.6396614       5 0.28606526       49.40
  Height_SD Height_n Height_SE
1 11.518102        5  5.151052
2  4.203173        5  1.879716
3 10.329569        5  4.619524
4  5.167204        5  2.310844
5  2.362908        5  1.056724
6 13.867228        5  6.201613
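The two outputs above look identical apart from the column names. If you want to confirm that programmatically rather than by eye, one option is to compare the numeric values while ignoring the names. The sketch below does this on a small made-up data frame so that it runs without downloading anything. The objects `toy`, `via_across`, `explicit`, and `same_values` are hypothetical names used only for this illustration.

```r
library(dplyr)

# Hypothetical toy data (not the post's dataset), with one NA in each variable
toy <- data.frame(
  Group  = rep(c("A", "B"), each = 3),
  Yield  = c(12.2, 12.4, 11.9, 9.5, 9.6, NA),
  Height = c(45, NA, 42, 50, 52, 54)
)

# Approach 1: across() with a named list of functions
via_across <- toy %>%
  group_by(Group) %>%
  summarize(across(c(Yield, Height),
                   .fns = list(Mean = ~mean(.x, na.rm = TRUE),
                               SD   = ~sd(.x, na.rm = TRUE),
                               n    = ~length(.x),
                               se   = ~sd(.x, na.rm = TRUE) / sqrt(length(.x)))),
            .groups = "drop")

# Approach 2: each statistic written out explicitly
explicit <- toy %>%
  group_by(Group) %>%
  summarize(
    Yield_mean  = mean(Yield, na.rm = TRUE),
    Yield_SD    = sd(Yield, na.rm = TRUE),
    Yield_n     = n(),
    Yield_SE    = Yield_SD / sqrt(Yield_n),
    Height_mean = mean(Height, na.rm = TRUE),
    Height_SD   = sd(Height, na.rm = TRUE),
    Height_n    = n(),
    Height_SE   = Height_SD / sqrt(Height_n),
    .groups = "drop"
  )

# Compare the numeric columns only, dropping the column names
same_values <- isTRUE(all.equal(
  unname(as.matrix(via_across[, -1])),
  unname(as.matrix(explicit[, -1]))
))
print(same_values)
```

The comparison relies on both summaries listing the statistics in the same column order, which holds here because across() expands Yield first and then Height.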

We aim to develop open-source code for agronomy.

Last Updated: 11/12/2023