[R package] Streamlined Mixed-Effects Analysis for Agrivoltaics Experiments (Feat. agrivoltaics)

August 3, 2025 JK

In my previous post, I suggested different statistical models for agrivoltaics studies to explain why we should consider Linear Mixed Models in this field. In many agrivoltaics studies, researchers overlook the actual experimental layout and analyze data using split-plot or RCBD models, focusing only on treatment variables (e.g., inside vs. outside the solar panel array). However, this approach does not accurately reflect real field conditions.

■ [STAT Article] Statistical Models in Agrivoltaics: Linear Mixed Models Across Different Field Layouts

Randomization is fundamental to experimental design because it ensures that, on average, the only systematic difference between treatments is the treatment itself. In typical agrivoltaics experiments, however, the treatment (e.g., shading by solar panels) is perfectly confounded with space—meaning that any observed treatment effect may also be attributable to spatial variation, not just the treatment. As a result, standard split-plot designs are theoretically inappropriate due to a lack of randomization at the site level. Similarly, RCBD is invalid in this context because the experimental layout violates key assumptions required for statistical inference, such as homogeneity and random allocation within blocks.

Therefore, in agrivoltaics studies, Linear Mixed Models are more suitable for analyzing data, but composing the model can be complicated. To assist with this process, I recently developed a new R package called agrivoltaics(). This package helps generate statistical outputs—all we need to do is designate the variables.

1) install the package

First, we need to install agrivoltaics() package, but before installing, you might need to download Rtools (https://cran.r-project.org/bin/windows/Rtools), and the run the following code.

if(!require(remotes)) install.packages("remotes")
library(remotes)
if (!requireNamespace("agrivoltaics", quietly = TRUE)) {
  remotes::install_github("agronomy4future/agrivoltaics", force= TRUE)
}
library(agrivoltaics)

After installing this package, run ?agrivoltaics and you can see the R documentation.

and also you can practice this code with actual dataset.

The basic structure of this package is below.

model= agrivoltaics(
  output= independent variable (default),
  treatment= treatment between AV and control (default),
  genotype= genotype (optional),
  plot= experimental plot (optional),
  block= experimental block or replicate (default),
  row= rows within experimental plot (optional),
  season= different seasons (optional),
  location= different location (optional),
  data= dataset
)

and you can just designate each variable.

Let’s practice with an actual dataset.

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics.csv"
df= data.frame(read_csv(url(github), show_col_types=FALSE))
df$Plot= as.factor(df$Plot)
df$Yield= as.numeric (df$Yield)
set.seed(100)
print(df[sample(nrow(df),5),])
         Season Location AV_Site Genotype Plot Block  Row  Yield
202 2015 season     East      AV      cv1  101     I East 195.38
112 2016 season  MidWest Control      cv1  115    IV East 625.07
206 2015 season     East      AV      cv2  102    II East 135.86
4   2015 season  MidWest Control      cv1  109     I West 384.36
311 2016 season     East      AV      cv1  107   III East 125.11
.
.
.

This dataset is simulated agrivoltaics data collected over 2 seasons at 2 different locations, involving 2 genotypes and 4 replicates (blocks) in both the control and AV sites. Each plot consists of 3 rows.

Let’s simulate in different cases.

First, let’s analyze the yield difference between the AV and control sites. Let’s assume that it’s a single genotype and that there are no rows within the plots.

model= agrivoltaics(
  output= Yield,
  treatment= AV_Site,
  genotype= NULL,
  plot= NULL,
  block= Block,
  row= NULL,
  season= NULL,
  location= NULL,
  data= df
)

 MODEL FORMULA USED:
Yield ~ AV_Site + (1 | AV_Site:Block) 

 VARIANCE COMPONENTS:
 Groups        Name        Variance
 AV_Site:Block (Intercept)  113.57 
 Residual                  8086.79 

 VARIANCE COMPONENT BREAKDOWN (%):
Random [AV_Site:Block]: 1.38%
Residual: 98.62%

 TYPE III ANOVA:
Type III Analysis of Variance Table with Satterthwaite's method
         Sum Sq Mean Sq NumDF DenDF F value    Pr(>F)    
AV_Site 5311986 5311986     1 5.972  656.87 2.455e-07 ***

agrivoltaics() provides variance component of random factors which are already designated when variables are set up. The variance component analysis shows that approximately 98.6% of the total variance is due to the residual, which reflects within-plot variability or experimental error — such as differences among individual plants, measurement error, or other uncontrolled environmental factors within plots. In contrast, only about 1.4% of the variance is attributed to the random effect of Site:Block, which captures variation among blocks within each site — i.e., spatial heterogeneity in how sites perform across different blocks.

There was a yield difference between the AV and control sites. Let’s conduct a Tukey post-hoc analysis.

post_hoc= cld (emmeans(model, ~ AV_Site), adjust= "sidak", Letters=letters, reverse= TRUE)

print(post_hoc)
 AV_Site emmean   SE   df lower.CL upper.CL .group
 Control    475 8.90 6.02      448      501  a    
 AV         152 8.88 5.98      126      179   b

Now, let’s assume there are rows within the plots.

model= agrivoltaics(
  output= Yield,
  treatment= AV_Site,
  genotype= NULL,
  plot= NULL,
  block= Block,
  row= Row,
  season= NULL,
  location= NULL,
  data= df
)

MODEL FORMULA USED:
Yield ~ AV_Site + Row + AV_Site:Row + (1 | AV_Site:Block) + (1 | Block:Row)

 VARIANCE COMPONENTS:
 Groups        Name        Variance
 Block:Row     (Intercept)  228.72 
 AV_Site:Block (Intercept)  103.48 
 Residual                  6525.07 

 VARIANCE COMPONENT BREAKDOWN (%):
Random [Block:Row]: 3.34%
Random [AV_Site:Block]: 1.51%
Residual: 95.16%

 TYPE III ANOVA:
Type III Analysis of Variance Table with Satterthwaite's method
             Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
AV_Site     4187365 4187365     1   4.851 641.7345 2.422e-06 ***
Row           87937   43969     2   7.812   6.7384   0.01992 *  
AV_Site:Row  304664  152332     2 300.813  23.3456 3.758e-10 ***

post_hoc= cld (emmeans(model, ~ AV_Site:Row), adjust= "sidak", Letters= letters, reverse= TRUE)

print(post_hoc)
 AV_Site Row    emmean   SE   df lower.CL upper.CL .group
 Control East      524 13.7 14.7      483      566  a    
 Control West      471 13.6 14.4      429      512  a    
 Control Middle    385 16.9 34.0      338      433   b   
 AV      Middle    184 16.9 34.0      136      231    c  
 AV      East      157 13.6 14.4      116      199    c  
 AV      West      131 13.6 14.4       90      173    c

Next, let’s assume there are rows within the plots and that different genotypes are present.

model= agrivoltaics(
  output= Yield,
  treatment= AV_Site,
  genotype= Genotype,
  plot= NULL,
  block= Block,
  row= Row,
  season= NULL,
  location= NULL,
  data= df
)

 MODEL FORMULA USED:
Yield ~ AV_Site + Genotype + Row + AV_Site:Row + AV_Site:Genotype +      AV_Site:Genotype:Row + (1 | AV_Site:Block) + (1 | Block:Row) 

 VARIANCE COMPONENTS:
 Groups        Name        Variance
 Block:Row     (Intercept)  244.45 
 AV_Site:Block (Intercept)  111.65 
 Residual                  6097.75 

 VARIANCE COMPONENT BREAKDOWN (%):
Random [Block:Row]: 3.79%
Random [AV_Site:Block]: 1.73%
Residual: 94.48%

 TYPE III ANOVA:
Type III Analysis of Variance Table with Satterthwaite's method
                      Sum Sq Mean Sq NumDF   DenDF  F value    Pr(>F)    
AV_Site              2851397 2851397     1   9.128 467.6150 3.742e-09 ***
Genotype              132953  132953     1 294.729  21.8036 4.594e-06 ***
Row                    82284   41142     2   7.756   6.7470   0.02007 *  
AV_Site:Row           236106  118053     2 294.756  19.3601 1.260e-08 ***
AV_Site:Genotype        2197    2197     1 294.788   0.3603   0.54882    
AV_Site:Genotype:Row   50912   12728     4 294.732   2.0873   0.08247 .

Since there is no significant AV_Site:Genotype or AV_Site: Genotype:Row interaction, the AV_Site:Row post-hoc means apply similarly to both genotypes (CV1 and CV2).

post_hoc= cld (emmeans(model, ~ AV_Site:Row), adjust= "sidak", Letters=letters, reverse= TRUE)

print(post_hoc)
AV_Site Row    emmean   SE   df lower.CL upper.CL .group
 Control East      524 13.6 14.4      482      565  a    
 Control West      471 13.6 14.2      429      512  a    
 Control Middle    385 16.7 32.0      338      432   b   
 AV      Middle    184 16.7 32.0      137      231    c  
 AV      East      157 13.6 14.2      116      199    c  
 AV      West      131 13.6 14.2       90      173    c

The AV_Site:Genotype interaction is not significant, and it means that the effect of AV vs. Control is the same for both genotypes. The three-way interaction (AV_Site:Genotype:Row) is also not significant, and it means that the AV_Site:Row pattern does not change depending on genotype, indicating that the ranking of means (Control East > Control West > Control Middle > AV Middle > AV East > AV West) applies to both CV1 and CV2.

How about using a standard ANOVA?

Even though a standard RCBD violates key assumptions for agrivoltaic experiments — such as homogeneity and random allocation within blocks — let’s conduct a standard ANOVA for comparison with the linear mixed model.

if(!require(agricolae)) install.packages("agricolae")
library(agricolae)

model= aov(Yield ~ AV_Site + Genotype + Row + AV_Site:Genotype:Row + AV_Site:Row + AV_Site:Genotype + Block, data=df)

summary(model)
                      Df  Sum Sq Mean Sq  F value   Pr(>F)    
AV_Site                1 8284302 8284302 1317.222  < 2e-16 ***
Genotype               1  114194  114194   18.157 2.72e-05 ***
Row                    2  165598   82799   13.165 3.29e-06 ***
Block                  3   42001   14000    2.226   0.0852 .  
AV_Site:Row            2  303911  151956   24.161 1.83e-10 ***
AV_Site:Genotype       1    2052    2052    0.326   0.5683    
AV_Site:Genotype:Row   4   50969   12742    2.026   0.0907 .  
Residuals            304 1911923    6289                   

Tukey_Test= HSD.test(model, c("AV_Site","Row"))

print(Tukey_Test)
                  Yield groups
Control:East   523.9762      a
Control:West   470.6886      b
Control:Middle 385.2591      c
AV:Middle      183.7072      d
AV:East        157.4742     de
AV:West        131.4369      e

Let’s compare the Tukey post-hoc analyses from the two models, noting that there were some differences in the results.

code summary: https://github.com/agronomy4future/r_code/blob/main/Streamlined_Mixed_Effects_Analysis_for_Agrivoltaics_Experiments.ipynb

We aim to develop open-source code for agronomy ([email protected])

Last Updated: 03/08/2025

Agronomy4future

Stories about cereals and statistics (plus coding). We aim to develop open-source code for agronomy.

[R package] Streamlined Mixed-Effects Analysis for Agrivoltaics Experiments (Feat. agrivoltaics)

August 3, 2025 JK

■ [STAT Article] Statistical Models in Agrivoltaics: Linear Mixed Models Across Different Field Layouts

1) install the package

How about using a standard ANOVA?