Step-by-Step Guide to Calculating and Adding Variable Means in R
Here is a sample dataset.
treatment=rep(c("A","B","C","D","E"), each=3)
rep=rep(c("I","II","III"), times=5)
yield= c(10,11,21,13,23,23,13,13,5,33,21,13,42,12,13)
dataA=data.frame(treatment,rep, yield)
print(head(dataA, 5))
treatment rep yield
1 A I 10
2 A II 11
3 A III 21
4 B I 13
5 B II 23
.
.
.
I want to add the mean of each treatment as a new column. The straightforward way is to calculate the mean for each treatment separately:
dataA$mean= NA #to create an empty column
dataA$mean[dataA$treatment=="A"]=mean(dataA$yield[dataA$treatment=="A"], na.rm=TRUE)
dataA$mean[dataA$treatment=="B"]=mean(dataA$yield[dataA$treatment=="B"], na.rm=TRUE)
dataA$mean[dataA$treatment=="C"]=mean(dataA$yield[dataA$treatment=="C"], na.rm=TRUE)
dataA$mean[dataA$treatment=="D"]=mean(dataA$yield[dataA$treatment=="D"], na.rm=TRUE)
dataA$mean[dataA$treatment=="E"]=mean(dataA$yield[dataA$treatment=="E"], na.rm=TRUE)
print(head(dataA, 5))
treatment rep yield mean
1 A I 10 14.00000
2 A II 11 14.00000
3 A III 21 14.00000
4 B I 13 19.66667
5 B II 23 19.66667
.
.
.
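The repetition above can also be removed without any new functions by looping over the treatment names. This is a minimal sketch that rebuilds the same dataset so it runs on its own; the loop variables `trt` and `rows` are illustrative names, not part of the original code.

```r
# Rebuild the example dataset so this block runs on its own
dataA <- data.frame(
  treatment = rep(c("A", "B", "C", "D", "E"), each = 3),
  rep       = rep(c("I", "II", "III"), times = 5),
  yield     = c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33, 21, 13, 42, 12, 13)
)

dataA$mean <- NA_real_  # empty numeric column to fill in
for (trt in unique(dataA$treatment)) {
  rows <- dataA$treatment == trt  # logical mask for this treatment's rows
  dataA$mean[rows] <- mean(dataA$yield[rows], na.rm = TRUE)
}
print(head(dataA, 5))
```

The loop produces the same values as the five hand-written lines (14 for A, about 19.67 for B) and keeps working if treatments are added or renamed.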
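However, this code is lengthy because it repeats the same line for every treatment. Let’s simplify it using tapply(), which calculates the mean of each group in a single call.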
# tapply() is part of base R, so no package needs to be installed or loaded
dataA$mean2=tapply(dataA$yield, dataA$treatment, mean, na.rm=TRUE)[dataA$treatment]
print(head(dataA, 5))
treatment rep yield mean mean2
1 A I 10 14.00000 14.00000
2 A II 11 14.00000 14.00000
3 A III 21 14.00000 14.00000
4 B I 13 19.66667 19.66667
5 B II 23 19.66667 19.66667
.
.
.
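To see how the one-liner works: tapply() returns one mean per treatment, named by treatment, and indexing that result with `dataA$treatment` repeats each mean once per row. Base R's ave() does the grouping and the expansion back to row length in one step. A sketch comparing the two on the same dataset; `group_means` and `mean3` are illustrative names.

```r
dataA <- data.frame(
  treatment = rep(c("A", "B", "C", "D", "E"), each = 3),
  rep       = rep(c("I", "II", "III"), times = 5),
  yield     = c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33, 21, 13, 42, 12, 13)
)

# Step 1: one mean per treatment, returned as a named 1-d array (A to E)
group_means <- tapply(dataA$yield, dataA$treatment, mean, na.rm = TRUE)
print(group_means)

# Step 2: index by treatment name to repeat each mean once per row;
# as.vector() drops the leftover names and dim attributes
dataA$mean2 <- as.vector(group_means[dataA$treatment])

# Alternative: ave() applies FUN within each group and returns a vector
# with the same length and row order as yield
dataA$mean3 <- ave(dataA$yield, dataA$treatment,
                   FUN = function(x) mean(x, na.rm = TRUE))
print(head(dataA, 5))
```

In ave(), FUN has to be passed by name, because every unnamed argument after the first is treated as a grouping variable.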
What if there are more grouping variables?
treatment=rep(rep(c("A","B","C","D","E"), each=3),2)
rep=rep(rep(c("I","II","III"), times=5),2)
environment=rep(c("East","West","North"), each=10)
yield=c(10,11,21,13,23,23,13,13,5,33,21,13,42,12,13,10,11,54,45,39,33,29,43,55,33,24,32,42,28,43)
dataA=data.frame(treatment,rep, environment, yield)
print(head(dataA, 5))
treatment rep environment yield
1 A I East 10
2 A II East 11
3 A III East 21
4 B I East 13
5 B II East 23
.
.
.
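Before averaging over two factors, it can help to count how many observations fall into each treatment and environment combination, because this design is not balanced. A quick check with base R's table(); `counts` is an illustrative name.

```r
dataA <- data.frame(
  treatment   = rep(rep(c("A", "B", "C", "D", "E"), each = 3), 2),
  rep         = rep(rep(c("I", "II", "III"), times = 5), 2),
  environment = rep(c("East", "West", "North"), each = 10),
  yield       = c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33, 21, 13, 42, 12, 13,
                  10, 11, 54, 45, 39, 33, 29, 43, 55, 33, 24, 32, 42, 28, 43)
)

# Rows = treatment, columns = environment, cells = number of observations
counts <- table(dataA$treatment, dataA$environment)
print(counts)
```

With these data, some cells hold a single observation (D in East, B in North) and some hold none (A in North, C in West, E in East), so a combination mean may rest on one value or may not exist at all.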
Now, I want to add the mean of each combination of treatment and environment.
# tapply() is part of base R, so no package needs to be installed or loaded
dataA$mean=tapply(dataA$yield, list(dataA$treatment, dataA$environment), mean, na.rm=TRUE)[cbind(dataA$treatment, dataA$environment)]
print(head(dataA, 5))
treatment rep environment yield mean
1 A I East 10 14.00000
2 A II East 11 14.00000
3 A III East 21 14.00000
4 B I East 13 19.66667
5 B II East 23 19.66667
.
.
.
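The same column can also be built with ave() by passing both grouping variables, which groups by every combination of the two and avoids the separate matrix lookup. A sketch under the same data; `cell_means` and `mean_ave` are illustrative names.

```r
dataA <- data.frame(
  treatment   = rep(rep(c("A", "B", "C", "D", "E"), each = 3), 2),
  rep         = rep(rep(c("I", "II", "III"), times = 5), 2),
  environment = rep(c("East", "West", "North"), each = 10),
  yield       = c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33, 21, 13, 42, 12, 13,
                  10, 11, 54, 45, 39, 33, 29, 43, 55, 33, 24, 32, 42, 28, 43)
)

# Matrix approach: treatment x environment table of means, then look up each
# row's cell with a two-column character matrix (row name, column name)
cell_means <- tapply(dataA$yield, list(dataA$treatment, dataA$environment),
                     mean, na.rm = TRUE)
dataA$mean <- as.vector(cell_means[cbind(dataA$treatment, dataA$environment)])

# ave() approach: group by every treatment/environment combination
dataA$mean_ave <- ave(dataA$yield, dataA$treatment, dataA$environment,
                      FUN = function(x) mean(x, na.rm = TRUE))
print(head(dataA, 5))
```

Both columns hold the same values; for example, row 10 (D in East) gets 33 because that cell has only one observation.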
Next, I want to calculate the mean for one specific combination: treatment A in the North environment.
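A single combination can be averaged directly with a logical mask that joins both conditions with &. In this dataset, treatment A was recorded only in East and West, so the A and North cell is empty; the sketch below shows what each approach returns in that case and uses A in West as a combination that does exist. `in_cell`, `a_north`, `a_north_tab`, and `a_west` are illustrative names.

```r
dataA <- data.frame(
  treatment   = rep(rep(c("A", "B", "C", "D", "E"), each = 3), 2),
  rep         = rep(rep(c("I", "II", "III"), times = 5), 2),
  environment = rep(c("East", "West", "North"), each = 10),
  yield       = c(10, 11, 21, 13, 23, 23, 13, 13, 5, 33, 21, 13, 42, 12, 13,
                  10, 11, 54, 45, 39, 33, 29, 43, 55, 33, 24, 32, 42, 28, 43)
)

# Rows that belong to treatment A AND the North environment
in_cell <- dataA$treatment == "A" & dataA$environment == "North"
n_rows  <- sum(in_cell)                # number of matching rows (0 here)
a_north <- mean(dataA$yield[in_cell])  # mean of an empty vector is NaN

# tapply() fills combinations that have no data with NA instead
cell_means <- tapply(dataA$yield, list(dataA$treatment, dataA$environment),
                     mean, na.rm = TRUE)
a_north_tab <- cell_means["A", "North"]

# A combination that does exist: treatment A in West (yields 10, 11, 54)
a_west <- mean(dataA$yield[dataA$treatment == "A" & dataA$environment == "West"])
print(c(n_rows = n_rows, a_north = a_north,
        a_north_tab = a_north_tab, a_west = a_west))
```

Checking the number of matching rows first makes it clear whether a NaN or NA comes from an empty combination rather than from missing yield values.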
Full code: https://github.com/agronomy4future/r_code/blob/main/Step_by_Step_Guide_to_Calculating_and_Adding_Variable_Means_in_R.ipynb

We aim to develop open-source code for agronomy.
© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.
Last Updated: 01/03/2025