[R package] Streamlined Mixed-Effects Analysis for Agrivoltaics Experiments (Feat. agrivoltaics)

In my previous post, I suggested different statistical models for agrivoltaics studies to explain why we should consider Linear Mixed Models in this field. In many agrivoltaics studies, researchers overlook the actual experimental layout and analyze data using split-plot or RCBD models, focusing only on treatment variables (e.g., inside vs. outside the solar panel array). However, this approach does not accurately reflect real field conditions.
■ [STAT Article] Statistical Models in Agrivoltaics: Linear Mixed Models Across Different Field Layouts
Randomization is fundamental to experimental design because it ensures that, on average, the only systematic difference between treatments is the treatment itself. In typical agrivoltaics experiments, however, the treatment (e.g., shading by solar panels) is perfectly confounded with space—meaning that any observed treatment effect may also be attributable to spatial variation, not just the treatment. As a result, standard split-plot designs are theoretically inappropriate due to a lack of randomization at the site level. Similarly, RCBD is invalid in this context because the experimental layout violates key assumptions required for statistical inference, such as homogeneity and random allocation within blocks.
Therefore, in agrivoltaics studies, Linear Mixed Models are more suitable for analyzing data, but composing the model can be complicated. To assist with this process, I recently developed a new R package called agrivoltaics(). This package helps generate statistical outputs—all we need to do is designate the variables.

1) install the package
First, we need to install agrivoltaics() package, but before installing, you might need to download Rtools (https://cran.r-project.org/bin/windows/Rtools), and the run the following code.
if(!require(remotes)) install.packages("remotes")
library(remotes)
if (!requireNamespace("agrivoltaics", quietly = TRUE)) {
remotes::install_github("agronomy4future/agrivoltaics", force= TRUE)
}
library(agrivoltaics)
After installing this package, run ?agrivoltaics and you can see the R documentation.

and also you can practice this code with actual dataset.

The basic structure of this package is below.
model= agrivoltaics(
output= independent variable (default),
treatment= treatment between AV and control (default),
genotype= genotype (optional),
plot= experimental plot (optional),
block= experimental block or replicate (default),
row= rows within experimental plot (optional),
season= different seasons (optional),
location= different location (optional),
data= dataset
)
and you can just designate each variable.
Let’s practice with an actual dataset.
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics.csv"
df= data.frame(read_csv(url(github), show_col_types=FALSE))
df$Plot= as.factor(df$Plot)
df$Yield= as.numeric (df$Yield)
set.seed(100)
print(df[sample(nrow(df),5),])
Season Location AV_Site Genotype Plot Block Row Yield
202 2015 season East AV cv1 101 I East 195.38
112 2016 season MidWest Control cv1 115 IV East 625.07
206 2015 season East AV cv2 102 II East 135.86
4 2015 season MidWest Control cv1 109 I West 384.36
311 2016 season East AV cv1 107 III East 125.11
.
.
.
This dataset is simulated agrivoltaics data collected over 2 seasons at 2 different locations, involving 2 genotypes and 4 replicates (blocks) in both the control and AV sites. Each plot consists of 3 rows.
Let’s simulate in different cases.

First, let’s analyze the yield difference between the AV and control sites. Let’s assume that it’s a single genotype and that there are no rows within the plots.
model= agrivoltaics(
output= Yield,
treatment= AV_Site,
genotype= NULL,
plot= NULL,
block= Block,
row= NULL,
season= NULL,
location= NULL,
data= df
)
MODEL FORMULA USED:
Yield ~ AV_Site + (1 | AV_Site:Block)
VARIANCE COMPONENTS:
Groups Name Variance
AV_Site:Block (Intercept) 113.57
Residual 8086.79
VARIANCE COMPONENT BREAKDOWN (%):
Random [AV_Site:Block]: 1.38%
Residual: 98.62%
TYPE III ANOVA:
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
AV_Site 5311986 5311986 1 5.972 656.87 2.455e-07 ***
agrivoltaics() provides variance component of random factors which are already designated when variables are set up. The variance component analysis shows that approximately 98.6% of the total variance is due to the residual, which reflects within-plot variability or experimental error — such as differences among individual plants, measurement error, or other uncontrolled environmental factors within plots. In contrast, only about 1.4% of the variance is attributed to the random effect of Site:Block, which captures variation among blocks within each site — i.e., spatial heterogeneity in how sites perform across different blocks.
There was a yield difference between the AV and control sites. Let’s conduct a Tukey post-hoc analysis.
post_hoc= cld (emmeans(model, ~ AV_Site), adjust= "sidak", Letters=letters, reverse= TRUE)
print(post_hoc)
AV_Site emmean SE df lower.CL upper.CL .group
Control 475 8.90 6.02 448 501 a
AV 152 8.88 5.98 126 179 b

Now, let’s assume there are rows within the plots.
model= agrivoltaics(
output= Yield,
treatment= AV_Site,
genotype= NULL,
plot= NULL,
block= Block,
row= Row,
season= NULL,
location= NULL,
data= df
)
MODEL FORMULA USED:
Yield ~ AV_Site + Row + AV_Site:Row + (1 | AV_Site:Block) + (1 | Block:Row)
VARIANCE COMPONENTS:
Groups Name Variance
Block:Row (Intercept) 228.72
AV_Site:Block (Intercept) 103.48
Residual 6525.07
VARIANCE COMPONENT BREAKDOWN (%):
Random [Block:Row]: 3.34%
Random [AV_Site:Block]: 1.51%
Residual: 95.16%
TYPE III ANOVA:
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
AV_Site 4187365 4187365 1 4.851 641.7345 2.422e-06 ***
Row 87937 43969 2 7.812 6.7384 0.01992 *
AV_Site:Row 304664 152332 2 300.813 23.3456 3.758e-10 ***
post_hoc= cld (emmeans(model, ~ AV_Site:Row), adjust= "sidak", Letters= letters, reverse= TRUE)
print(post_hoc)
AV_Site Row emmean SE df lower.CL upper.CL .group
Control East 524 13.7 14.7 483 566 a
Control West 471 13.6 14.4 429 512 a
Control Middle 385 16.9 34.0 338 433 b
AV Middle 184 16.9 34.0 136 231 c
AV East 157 13.6 14.4 116 199 c
AV West 131 13.6 14.4 90 173 c

Next, let’s assume there are rows within the plots and that different genotypes are present.
model= agrivoltaics(
output= Yield,
treatment= AV_Site,
genotype= Genotype,
plot= NULL,
block= Block,
row= Row,
season= NULL,
location= NULL,
data= df
)
MODEL FORMULA USED:
Yield ~ AV_Site + Genotype + Row + AV_Site:Row + AV_Site:Genotype + AV_Site:Genotype:Row + (1 | AV_Site:Block) + (1 | Block:Row)
VARIANCE COMPONENTS:
Groups Name Variance
Block:Row (Intercept) 244.45
AV_Site:Block (Intercept) 111.65
Residual 6097.75
VARIANCE COMPONENT BREAKDOWN (%):
Random [Block:Row]: 3.79%
Random [AV_Site:Block]: 1.73%
Residual: 94.48%
TYPE III ANOVA:
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
AV_Site 2851397 2851397 1 9.128 467.6150 3.742e-09 ***
Genotype 132953 132953 1 294.729 21.8036 4.594e-06 ***
Row 82284 41142 2 7.756 6.7470 0.02007 *
AV_Site:Row 236106 118053 2 294.756 19.3601 1.260e-08 ***
AV_Site:Genotype 2197 2197 1 294.788 0.3603 0.54882
AV_Site:Genotype:Row 50912 12728 4 294.732 2.0873 0.08247 .
Since there is no significant AV_Site:Genotype or AV_Site: Genotype:Row interaction, the AV_Site:Row post-hoc means apply similarly to both genotypes (CV1 and CV2).
post_hoc= cld (emmeans(model, ~ AV_Site:Row), adjust= "sidak", Letters=letters, reverse= TRUE)
print(post_hoc)
AV_Site Row emmean SE df lower.CL upper.CL .group
Control East 524 13.6 14.4 482 565 a
Control West 471 13.6 14.2 429 512 a
Control Middle 385 16.7 32.0 338 432 b
AV Middle 184 16.7 32.0 137 231 c
AV East 157 13.6 14.2 116 199 c
AV West 131 13.6 14.2 90 173 c
The AV_Site:Genotype interaction is not significant, and it means that the effect of AV vs. Control is the same for both genotypes. The three-way interaction (AV_Site:Genotype:Row) is also not significant, and it means that the AV_Site:Row pattern does not change depending on genotype, indicating that the ranking of means (Control East > Control West > Control Middle > AV Middle > AV East > AV West) applies to both CV1 and CV2.

How about using a standard ANOVA?
Even though a standard RCBD violates key assumptions for agrivoltaic experiments — such as homogeneity and random allocation within blocks — let’s conduct a standard ANOVA for comparison with the linear mixed model.
if(!require(agricolae)) install.packages("agricolae")
library(agricolae)
model= aov(Yield ~ AV_Site + Genotype + Row + AV_Site:Genotype:Row + AV_Site:Row + AV_Site:Genotype + Block, data=df)
summary(model)
Df Sum Sq Mean Sq F value Pr(>F)
AV_Site 1 8284302 8284302 1317.222 < 2e-16 ***
Genotype 1 114194 114194 18.157 2.72e-05 ***
Row 2 165598 82799 13.165 3.29e-06 ***
Block 3 42001 14000 2.226 0.0852 .
AV_Site:Row 2 303911 151956 24.161 1.83e-10 ***
AV_Site:Genotype 1 2052 2052 0.326 0.5683
AV_Site:Genotype:Row 4 50969 12742 2.026 0.0907 .
Residuals 304 1911923 6289
Tukey_Test= HSD.test(model, c("AV_Site","Row"))
print(Tukey_Test)
Yield groups
Control:East 523.9762 a
Control:West 470.6886 b
Control:Middle 385.2591 c
AV:Middle 183.7072 d
AV:East 157.4742 de
AV:West 131.4369 e
Let’s compare the Tukey post-hoc analyses from the two models, noting that there were some differences in the results.

code summary: https://github.com/agronomy4future/r_code/blob/main/Streamlined_Mixed_Effects_Analysis_for_Agrivoltaics_Experiments.ipynb

We aim to develop open-source code for agronomy ([email protected])
© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.
Last Updated: 03/08/2025