Step-by-Step Guide to Calculating and Adding Variable Means in R

Here is an example dataset with five treatments (A to E), each with three replicates.

treatment=rep(c("A","B","C","D","E"), each=3)
rep=rep(c("I","II","III"), times=5)
yield= c(10,11,21,13,23,23,13,13,5,33,21,13,42,12,13)
dataA=data.frame(treatment,rep, yield)

print(head(dataA, 5))
  treatment rep yield
1         A   I    10
2         A  II    11
3         A III    21
4         B   I    13
5         B  II    23
.
.
.

I want to add the mean of each treatment to a new column, and I am using the following code.

dataA$mean= NA #to create an empty column

dataA$mean[dataA$treatment=="A"]=mean(dataA$yield[dataA$treatment=="A"], na.rm=TRUE)
dataA$mean[dataA$treatment=="B"]=mean(dataA$yield[dataA$treatment=="B"], na.rm=TRUE)
dataA$mean[dataA$treatment=="C"]=mean(dataA$yield[dataA$treatment=="C"], na.rm=TRUE)
dataA$mean[dataA$treatment=="D"]=mean(dataA$yield[dataA$treatment=="D"], na.rm=TRUE)
dataA$mean[dataA$treatment=="E"]=mean(dataA$yield[dataA$treatment=="E"], na.rm=TRUE)

print(head(dataA, 5))
  treatment rep yield     mean
1         A   I    10 14.00000
2         A  II    11 14.00000
3         A III    21 14.00000
4         B   I    13 19.66667
5         B  II    23 19.66667
.
.
.
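
The five assignments above repeat one pattern: select the rows of a treatment, then fill them with that treatment's mean. As an intermediate step, the same pattern can be sketched as a loop over the treatment names. This is only an illustration; the column name `mean_loop` is a hypothetical name chosen for this sketch, and the data are rebuilt so the block runs on its own.

```r
# Rebuild the example data so this block is self-contained
treatment <- rep(c("A", "B", "C", "D", "E"), each = 3)
yield <- c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33, 21, 13, 42, 12, 13)
dataA <- data.frame(treatment, yield)

# Empty numeric column (hypothetical name, for illustration only)
dataA$mean_loop <- NA_real_

# One pass per treatment: find its rows, then assign the mean of those rows
for (trt in unique(dataA$treatment)) {
  rows <- dataA$treatment == trt
  dataA$mean_loop[rows] <- mean(dataA$yield[rows], na.rm = TRUE)
}

print(head(dataA, 5))
```

The loop no longer needs one line per treatment, but it still creates the column first and fills it group by group.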


However, writing one line per treatment is lengthy. Let’s simplify it using tapply().

# tapply() is part of base R, which is always loaded,
# so no install.packages() or library() call is needed.

dataA$mean2=tapply(dataA$yield, dataA$treatment, mean, na.rm=TRUE)[dataA$treatment]

print(head(dataA, 5))
  treatment rep yield     mean    mean2
1         A   I    10 14.00000 14.00000
2         A  II    11 14.00000 14.00000
3         A III    21 14.00000 14.00000
4         B   I    13 19.66667 19.66667
5         B  II    23 19.66667 19.66667
.
.
.
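
To see why the one-line tapply() call works, it may help to split it into two steps: tapply() returns one mean per treatment, named by treatment, and indexing that result with the treatment column looks up the right mean for every row. The sketch below also compares the result with `ave()`, another base R function that returns group summaries at the original length. The names `group_means`, `mean2`, and `mean3` are chosen here for illustration.

```r
# Rebuild the example data so this block is self-contained
treatment <- rep(c("A", "B", "C", "D", "E"), each = 3)
yield <- c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33, 21, 13, 42, 12, 13)
dataA <- data.frame(treatment, yield)

# Step 1: one mean per treatment, with the treatment names as element names
group_means <- tapply(dataA$yield, dataA$treatment, mean, na.rm = TRUE)
print(group_means)

# Step 2: look up each row's treatment by name; as.vector() drops the names
dataA$mean2 <- as.vector(group_means[as.character(dataA$treatment)])

# Alternative: ave() applies FUN within each group and keeps the row order
dataA$mean3 <- ave(dataA$yield, dataA$treatment,
                   FUN = function(x) mean(x, na.rm = TRUE))

# Both approaches should agree
print(all.equal(dataA$mean2, dataA$mean3))
```

Wrapping the index in `as.character()` is a precaution: if the treatment column were stored as a factor, indexing would use the integer codes rather than the names.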


What if there are more variables?

treatment=rep(rep(c("A","B","C","D","E"), each=3),2)
rep=rep(rep(c("I","II","III"), times=5),2)
environment=rep(c("East","West","North"), each=10)
yield=c(10,11,21,13,23,23,13,13,5,33,21,13,42,12,13,10,11,54,45,39,33,29,43,55,33,24,32,42,28,43)
dataA=data.frame(treatment,rep, environment, yield)

print(head(dataA, 5))
  treatment rep environment yield
1         A   I        East    10
2         A  II        East    11
3         A III        East    21
4         B   I        East    13
5         B  II        East    23
.
.
.

Now, I want to add the mean for each combination of treatment and environment.

# tapply() is part of base R, which is always loaded,
# so no install.packages() or library() call is needed.

dataA$mean=tapply(dataA$yield, list(dataA$treatment, dataA$environment), mean, na.rm=TRUE)[cbind(dataA$treatment, dataA$environment)]

print(head(dataA, 5))
  treatment rep environment yield     mean
1         A   I        East    10 14.00000
2         A  II        East    11 14.00000
3         A III        East    21 14.00000
4         B   I        East    13 19.66667
5         B  II        East    23 19.66667
.
.
.
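
With two grouping variables, tapply() returns a matrix with treatments as rows and environments as columns. Indexing that matrix with a two-column character matrix of (treatment, environment) pairs picks one cell per data row. The sketch below separates the two steps and cross-checks the result against `ave()`; the names `cell_means`, `idx`, and `mean_ave` are chosen here for illustration.

```r
# Rebuild the example data so this block is self-contained
treatment <- rep(rep(c("A", "B", "C", "D", "E"), each = 3), 2)
environment <- rep(c("East", "West", "North"), each = 10)
yield <- c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33,
           21, 13, 42, 12, 13, 10, 11, 54, 45, 39,
           33, 29, 43, 55, 33, 24, 32, 42, 28, 43)
dataA <- data.frame(treatment, environment, yield)

# Step 1: treatment-by-environment matrix of means
# (cells with no observations are NA)
cell_means <- tapply(dataA$yield,
                     list(dataA$treatment, dataA$environment),
                     mean, na.rm = TRUE)
print(cell_means)

# Step 2: each row of idx is a (row name, column name) pair into cell_means
idx <- cbind(as.character(dataA$treatment), as.character(dataA$environment))
dataA$mean <- cell_means[idx]

# Alternative: ave() accepts several grouping variables
dataA$mean_ave <- ave(dataA$yield, dataA$treatment, dataA$environment,
                      FUN = function(x) mean(x, na.rm = TRUE))

print(all.equal(dataA$mean, dataA$mean_ave))
```

Printing `cell_means` is a quick way to check which treatment and environment combinations actually occur in the data before the means are copied into the new column.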

Next, I want to calculate the mean for one specific combination: treatment A in the North environment.
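
A minimal base R sketch of one way to look up a single combination is a logical filter on both columns; the linked notebook may take a different approach. Note that in the second dataset as constructed above, no row has both treatment A and environment North, so that filter selects zero rows and `mean()` returns `NaN`. Treatment A in West is included for comparison. The variable names are chosen here for illustration.

```r
# Rebuild the example data so this block is self-contained
treatment <- rep(rep(c("A", "B", "C", "D", "E"), each = 3), 2)
environment <- rep(c("East", "West", "North"), each = 10)
yield <- c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33,
           21, 13, 42, 12, 13, 10, 11, 54, 45, 39,
           33, 29, 43, 55, 33, 24, 32, 42, 28, 43)
dataA <- data.frame(treatment, environment, yield)

# Logical filters for single treatment-by-environment combinations
a_north <- dataA$treatment == "A" & dataA$environment == "North"
a_west  <- dataA$treatment == "A" & dataA$environment == "West"

# Count the matching rows first; a mean over zero rows is NaN
print(sum(a_north))
mean_a_north <- mean(dataA$yield[a_north], na.rm = TRUE)
print(mean_a_north)

# A combination that does occur in the data
mean_a_west <- mean(dataA$yield[a_west], na.rm = TRUE)
print(mean_a_west)
```

Counting the matching rows before averaging makes an empty combination visible instead of silently producing `NaN`.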

Full: https://github.com/agronomy4future/r_code/blob/main/Step_by_Step_Guide_to_Calculating_and_Adding_Variable_Means_in_R.ipynb

We aim to develop open-source code for agronomy.

© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.

Last Updated: 01/03/2025
