[STAT Article] Statistical Models in Agrivoltaics: Linear Mixed Models Across Different Field Layouts

June 28, 2025 JK

Agrivoltaics is the study and practice of combining agriculture and solar energy production on the same land. The core idea is to install solar panels above or among crops, allowing for simultaneous food and energy production.

Generally, an agrivoltaics study investigates how this dual-use approach affects crop growth and yield (due to changes in light, temperature, and moisture), microclimate conditions under the panels, solar panel efficiency influenced by vegetation, as well as land-use efficiency, economic outcomes, and sustainability metrics.

Today, I’ll explain why standard ANOVA or split-plot models are often inadequate for agrivoltaics research, and why linear mixed models (LMMs) are recommended to account for the complex variability in treatment effects.

■ Layout for Split-Plot and RCBD Designs, and Their Limitations in Agrivoltaics Studies

The layout below represents a typical split-plot design used in field experiments. This design is fully randomized across both sites (Control and Agrivoltaics, or AV) and crops (e.g., Crop 1 to Crop 3), adhering to the principles of a split-plot structure.

However, such a design becomes impractical for agrivoltaics research due to the scale and fixed infrastructure of solar panel installations. For instance, if the agrivoltaics site spans 50 acres, implementing full randomization as per this layout would require ca. 200 acres of land to accommodate both control and treatment conditions. Establishing a control site within the AV system is not feasible, as achieving full-sun conditions would necessitate removal of existing solar panels, defeating the purpose of the study.

Source: ACT Power Services

Most researchers conducting agrivoltaics studies overlook this issue and apply a split-plot statistical model, even when the actual experimental layout does not conform to the assumptions of a true split-plot design.

Additionally, some researchers use a standard ANOVA model; however, this approach is poorly aligned with the actual layout of agrivoltaics studies. The design shown below is a randomized complete block design (RCBD), where treatments are fully randomized within each block. While RCBD is appropriate for many agricultural field trials, its assumptions and structure differ significantly from the typical constraints and spatial configurations found in agrivoltaics systems. As such, applying an RCBD model to agrivoltaics experiments can lead to misrepresentation of treatment effects and variability.

In many agrivoltaics studies, researchers overlook the actual experimental layout and analyze the data using split-plot or RCBD models by simply accounting for treatment variables (e.g., inside vs. outside the solar panel array). However, this approach does not accurately reflect real field conditions.

Randomization is fundamental to experimental design because it ensures that, on average, the only systematic difference between treatments is the treatment itself. In typical agrivoltaics experiments, however, the treatment (e.g., shading by solar panels) is perfectly confounded with space—meaning that any observed treatment effect may also be attributable to spatial variation, not just the treatment. As a result, standard split-plot designs are theoretically inappropriate due to a lack of randomization at the site level. Similarly, RCBD is invalid in this context because the experimental layout violates key assumptions required for statistical inference, such as homogeneity and random allocation within blocks.

To address these challenges, I consulted with several statisticians and developed tailored statistical models for specific cases, ensuring that the analysis aligns with the spatial and structural constraints unique to agrivoltaics research.

■ Linear Mixed Models for different field layouts

Before starting, let’s make it clear about the terminology. I defined site, plot, block and row like below.

1) Single crop (only 1 cultivar) with a single row

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/yield_shading_spatial_genotype.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Spatial= as.factor(df$Spatial)
colnames(df)[2]=c("Site")
CV1=subset(df, Genotype=="CV1")
set.seed(100)

print(CV1[sample(nrow(CV1),5),])
   Genotype    Site Block Spatial Fruit_No Yield
10      CV1 Control   III       2        1  14.0
6       CV1 Control    II       2        4  17.5
16      CV1      AV   III       2        1  10.0
3       CV1 Control     I       3        5  15.2
9       CV1 Control   III       1        4   5.8
.
.
.

This field layout is for 1 single crop (no genotypic variance) with a single row.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield ~ Site + (1|Site:Block), data=CV1)
print(VarCorr(model), comp="Variance")
 Groups     Name        Variance
 Site:Block (Intercept)  1.9747 
 Residual               42.6318 

cat("Variance [Site:Block]", round(1.9747 / (1.9747 + 42.6318) * 100, 2), "%\n")
cat("Variance [Residual]", round(42.6318 / (1.9747 + 42.6318) * 100, 2), "%\n")
Variance [Site:Block] 4.43 %
Variance [Residual] 95.57 %

The variance component analysis shows that approximately 96% of the total variance is due to the residual, which reflects within-plot variability or experimental error — such as differences among individual plants, measurement error, or other uncontrolled environmental factors within plots. In contrast, only about 4% of the variance is attributed to the random effect of Site:Block, which captures variation among blocks within each site — i.e., spatial heterogeneity in how sites perform across different blocks.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
     Sum Sq Mean Sq NumDF  DenDF F value Pr(>F)
Site 42.178  42.178     1 4.4632  0.9893 0.3707

There is no Site effect, indicating that there is no shading effect on crop yield.

if(!require(emmeans)) install.packages("emmeans")
library(emmeans)
if(!require(pbkrtest)) install.packages("pbkrtest")
library(pbkrtest)

EMMeans= emmeans(model, ~ Site)
print(summary(EMMeans, digits= 2))
 Site    emmean   SE   df lower.CL upper.CL
 AV        15.4 3.61 5.93     6.58     24.3
 Control   11.5 1.91 3.12     5.59     17.5

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 

###
pairwise= contrast(EMMeans, method= "pairwise", adjust= "none")
print(summary(pairwise))
 contrast     estimate   SE   df t.ratio p.value
 AV - Control     3.91 4.08 5.06   0.956  0.3823

Degrees-of-freedom method: kenward-roger

if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(multcompView)
library(multcomp)

mean_compared= cld (EMMeans, adjust= "none", Letters= letters, reverse= TRUE) 
print(mean_compared)
Site    emmean   SE   df lower.CL upper.CL .group
AV        15.4 3.61 5.93     6.58     24.3  a    
Control   11.5 1.91 3.12     5.59     17.5  a

There is no difference between AV and control site. This indicates that most of the variability in yield is not due to differences among sites or blocks, but rather due to unexplained, plot-level noise. As such, if no statistically significant difference is detected between sites, it is not necessarily because treatments are equivalent, but rather because the high residual variability may be masking real treatment effects. Therefore, the lack of statistical significance should be interpreted with caution, as it may reflect insufficient statistical power due to large experimental error, rather than true equivalence between sites.

2) Single crop (only 1 cultivar) with multiple rows

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_dataset_example_1.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Rows= as.factor(df$Rows)
set.seed(100)

print(df[sample(nrow(df),5),])
      Site Block Rows  Yield   AGW    GN
10      AV    II    2 343.31 28.79 11926
38 Control    II    2 496.73 26.29 19952
25 Control     I    1 293.05 28.12 13248
14      AV   III    2 442.08 25.39 17414
23 Control   III    1 542.28 20.79 30974
.
.
.

This field layout is for 1 single crop (no genotypic variance) with multiple rows.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield~ Site + Rows + Site:Rows + (1|Block), data=df)
print(VarCorr(model), comp="Variance")
 Groups   Name        Variance
 Block    (Intercept) 5645.9  
 Residual             6492.3

cat("Variance [Block]", round(5645.9 / (5645.9  + 6492.3) * 100, 2), "%\n")
cat("Variance [Residual]", round(6492.3 / (5645.9  + 6492.3) * 100, 2), "%\n")
Variance [Block] 46.51 %
Variance [Residual] 53.49 %

The model indicates that block-to-block differences account for ca. 46.5% of the total yield variation, reflecting substantial environmental or management variability across blocks. The remaining 53.5% is due to residual (plot-level) variation, which may include differences in row position, microclimate, or other unmeasured factors.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
          Sum Sq Mean Sq NumDF DenDF F value   Pr(>F)   
Site       12704   12704     1    31  1.9568 0.171783   
Rows       78006   39003     2    31  6.0076 0.006237 **
Site:Rows  25074   12537     2    31  1.9310 0.162045

In particular, row position appears to influence yield, likely due to spatial factors such as edge effects or shading gradients, which are common in agrivoltaics systems. While there are yield differences between the AV and Control sites, the Site effect is not consistently stronger than the Row effect.

if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)

EMMeans= emmeans(model, ~ Rows)
mean_compared= cld (EMMeans, adjust= "none", Letters=letters, reverse = TRUE)
print(mean_compared)
 Rows emmean   SE   df lower.CL upper.CL .group
 3       505 47.1 5.89      390      621  a    
 2       409 42.6 3.99      290      527   b   
 1       387 42.6 3.99      268      505   b

This suggests that within-site spatial variability may obscure or confound any overall Site-level differences.

3) Single crop (only 1 cultivar) with a single row in different seasons

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_dataset_example_2.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Season=as.factor(df$Season)
df=subset(df, Rows=="3")
set.seed(100)

print(head(df,5))
   Season    Site Block Rows  Yield   AGW    GN
3    2017      AV     I    3 260.44 25.47 10224
8    2017      AV    II    3 619.96 23.88 25959
13   2017      AV   III    3 544.07 23.86 22799
18   2017      AV    IV    3 644.60 20.89 30858
29   2017 Control     I    3 372.60 28.12 13248
.
.
.

This field layout is for 1 single crop (no genotypic variance) with a single row in different seasons. In this model, plot is regarded as a random effect, but there is only one genotype (single crop), plot is the same as block.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield~ Season + Site + Season:Site + (1|Block), data=df)
print(VarCorr(model), comp="Variance")
 Groups   Name        Variance
 Block    (Intercept) 4473.4  
 Residual             8121.9  

cat("Variance [Block]", round(4473.4 / (4473.4 + 8121.9) * 100, 2), "%\n")
cat("Variance [Residual]", round(8121.9 / (4473.4 + 8121.9) * 100, 2), "%\n")
Variance [Block] 35.52 %
Variance [Residual] 64.48 %

The block effect is 35.5% and about 1/3 of the total variability in yield is due to differences between blocks. This suggests that block-to-block variability (e.g., differences in soil, microclimate, or management) has a moderate but meaningful effect on yield. Including block as a random effect is appropriate and important for properly accounting for spatial/environmental heterogeneity. In the same way, residual effect is 64.5% and the majority of variation comes from within-block variability, including plot-level environmental noise, plant-to-plant differences, measurement error, and any unmeasured factors not captured by block-level grouping.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
            Sum Sq Mean Sq NumDF DenDF F value   Pr(>F)   
Season      164026  164026     1     9 20.1956 0.001502 **
Site           127     127     1     9  0.0157 0.903075   
Season:Site   1307    1307     1     9  0.1610 0.697630

There was only a main effect of Season, with higher yield observed in the 2018 growing season; however, there was no significant Site effect across seasons.

if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)

season_site= emmeans(model, ~ Season)
mean_compared= emmeans:::cld.emmGrid(season_site, adjust="none", Letters=letters, reverse=TRUE)
print(mean_compared)
 Season emmean   SE df lower.CL upper.CL .group
 2018      708 46.2  5      589      827  a    
 2017      505 46.2  5      387      624   b

4) Single crop (only 1 cultivar) with multiple rows in different seasons

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_dataset_example_2.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Rows=as.factor(df$Rows)
df$Season=as.factor(df$Season)
set.seed(100)

print(df[sample(nrow(df),5),])
   Season    Site Block Rows    Yield   AGW       GN
74   2018      AV   III    2 651.2467 27.99 23267.12
78   2018      AV    IV    3 738.5171 28.83 25616.27
23   2017 Control   III    1 542.2800 20.79 30974.00
70   2018      AV    II    2 566.2730 29.29 19333.32
4    2017      AV     I    2 375.7800 29.24 12853.00
.
.
.

This field layout is for 1 single crop (no genotypic variance) with multiple rows in different seasons. Again, in this model, plot is regarded as a random effect, but there is only one genotype (single crop), plot is the same as block.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield~ Season + Site + Rows + Season:Site + Season:Rows + (1|Block) + 
Season:Site:Rows + (1|Block:Season) + (1|Block:Rows), data=df)
print(VarCorr(model), comp="Variance")
 Groups       Name        Variance
 Block:Rows   (Intercept)    0.00 
 Block:Season (Intercept) 2139.71 
 Block        (Intercept)  523.08 
 Residual                 8184.96 

cat("Variance [Block:Season]", round(2139.71 / (2139.71  + 523.08 + 8184.96) * 100, 2), "%\n")
cat("Variance [Block]", round(523.08 / (2139.71  + 523.08 + 8184.96) * 100, 2), "%\n")
cat("Variance [Residual]", round(8184.96 / (2139.71  + 523.08 + 8184.96) * 100, 2), "%\n")
Variance [Block:Season] 19.72 %
Variance [Block] 4.82 %
Variance [Residual] 75.45 %

The majority of variation is due to Residuals, but Block:Season also contributes significantly, indicating season-specific block effects are relevant. Block effects alone are moderate.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
                 Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
Season           363930  363930     1  4.501 44.4633  0.001710 ** 
Site             184957  184957     1 62.000 22.5972 0.0000123 ***
Rows              43829   21915     2 62.000  2.6774  0.076684 .  
Season:Site       84651   84651     1 62.000 10.3422  0.002067 ** 
Season:Rows        2909    1454     2 62.000  0.1777  0.837619    
Season:Site:Rows 109298   27324     4 62.000  3.3384  0.015361 *

Both Season and Site significantly affect yield, with a significant Season × Site interaction. The significant three-way interaction implies that yield responses to specific Site × Row combinations depend on the growing Season.

if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)

EMMeans= emmeans(model, ~ Season:Site:Rows)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse = TRUE)
print(mean_compared)
 Season Site    Rows emmean   SE df lower.CL upper.CL .group
 2018   Control 1       836 41.1 20      704      968  a    
 2018   Control 2       805 41.1 20      672      937  a    
 2018   Control 3       714 52.1 40      556      872  ab   
 2018   AV      3       702 52.1 40      544      860  ab   
 2018   AV      1       600 41.1 20      468      732   bc  
 2018   AV      2       558 41.1 20      425      690   bc  
 2017   AV      3       517 52.1 40      359      675   bc  
 2017   Control 3       494 52.1 40      336      651   bc  
 2017   Control 2       462 41.1 20      330      594   bc  
 2017   Control 1       402 41.1 20      269      534    c  
 2017   AV      1       372 41.1 20      239      504    c  
 2017   AV      2       356 41.1 20      223      488    c

5) Multiple cultivars with a single row

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/yield_shading_genotypes.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
colnames(df)[2]=c("Site")
df= df[, -c(4, 5)]
set.seed(100)

print(df[sample(nrow(df),5),])
   Genotype    Site Block Yield
10      CV1 Control     I   6.0
38      CV3      AV     I  17.5
48      CV3 Control     I  22.0
51      CV3 Control    II   8.0
25      CV2      AV   III   3.0
.
.
.

This field layout is for multiple genotypes for a crop (genotypic variance) with a single row.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield~ Site + Genotype + Site:Genotype + (1|Block), data=df)
print(VarCorr(model), comp="Variance")
 Groups   Name        Variance
 Block    (Intercept)  6.1686 
 Residual             18.8841

cat("Variance [Block]", round(6.1686 / (6.1686 + 18.8841) * 100, 2), "%\n")
cat("Variance [Residual]", round(18.8841 / (6.1686 + 18.8841) * 100, 2), "%\n")
Variance [Block] 24.62 %
Variance [Residual] 75.38 %

The largest component of variance is the Residual [75.38%]. This is the unexplained variation—likely due to unmeasured factors or inherent randomness in yield not captured by your fixed or random effects.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
              Sum Sq Mean Sq NumDF DenDF F value  Pr(>F)    
Site          517.70  517.70     1    46 27.4146  <0.001 ***
Genotype        9.02    4.51     2    46  0.2388  0.7885616    
Site:Genotype 315.66  157.83     2    46  8.3577  <0.001 ***

The interaction among Site:Genotype was significant.

if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)

EMMeans= emmeans(model, ~ Site:Genotype)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse= TRUE)
print(mean_compared)
 Site    Genotype emmean   SE   df lower.CL upper.CL .group
 AV      CV1       15.66 2.04 5.83    7.720     23.6  a    
 AV      CV2       13.14 2.04 5.83    5.209     21.1  ab   
 AV      CV3       10.73 2.04 5.83    2.798     18.7  ab   
 Control CV3       10.46 2.04 5.83    2.520     18.4  ab   
 Control CV2        6.97 2.04 5.83   -0.969     14.9   bc  
 Control CV1        3.53 2.04 5.83   -4.402     11.5    c

6) Multiple cultivars with multiple rows

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_crop_yield_multiple_season.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df= subset(df, Season=="2015")
set.seed(100)

print(df[sample(nrow(df),5),])
   Season Genotype Plot Block Row1    Row    Site  Yield
74   2015      cv1  107   III    4  Side2      AV 168.64
78   2015      cv2  108    IV    3 Middle      AV 179.66
23   2015      cv2  113   III    3 Middle Control 460.51
70   2015      cv1  106    II    5  Side2      AV  86.02
4    2015      cv1  109     I    4  Side2 Control 384.36
.
.
.

This field layout is designed for evaluating multiple genotypes of a crop (genotypic variance), with a single row planted across two seasons.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield~ Site + Genotype + Row + Site:Genotype:Row + (1|Block) + (1|Plot), data=df)
print(VarCorr(model), comp="Variance")
 Groups   Name        Variance
 Plot     (Intercept)   70.929
 Block    (Intercept)    0.000
 Residual             3597.058

cat("Variance [Plot]", round(70.929 / (70.929 + 3597.058) * 100, 2), "%\n")
cat("Variance [Residual]", round(3597.058 / (70.929 + 3597.058) * 100, 2), "%\n")
Variance [Plot] 1.93 %
Variance [Residual] 98.07 %

The largest component of variance is the residual [98.07%]. This is the unexplained variation—likely due to unmeasured factors or inherent randomness in yield not captured by your fixed or random effects.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
                  Sum Sq Mean Sq NumDF  DenDF  F value  Pr(>F)    
Site              359936  359936     1 47.975 100.0641  0.001 ***
Genotype            5248    5248     1 47.975   1.4589  0.2330    
Row                20333   10167     2 56.000   2.8263  0.0677 .  
Site:Genotype:Row 198835   28405     7 62.384   7.8967  0.001 ***

The interaction among Site:Genotype:Row was significant.

if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)

EMMeans= emmeans(model, ~ Site:Genotype:Row)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse = TRUE)
print(mean_compared)
 Site    Genotype Row    emmean   SE   df lower.CL upper.CL .group
 Control cv2      Side1     477 21.6 48.0    412.6      542  a    
 Control cv1      Side1     467 21.6 48.0    402.1      532  a    
 Control cv2      Side2     441 21.6 48.0    375.6      505  a    
 Control cv1      Side2     404 21.6 48.0    338.7      468  ab   
 Control cv2      Middle    391 30.3 67.3    301.2      480  ab   
 Control cv1      Middle    301 30.3 67.3    211.5      391   bc  
 AV      cv2      Middle    179 30.3 67.3     89.1      268    cd 
 AV      cv2      Side1     151 21.6 48.0     85.9      216     d 
 AV      cv1      Middle    148 30.3 67.3     58.4      238     d 
 AV      cv2      Side2     135 21.6 48.0     69.8      200     d 
 AV      cv1      Side1     131 21.6 48.0     66.6      196     d 
 AV      cv1      Side2     102 21.6 48.0     37.5      167     d

Although the field layout violates the assumptions of a standard ANOVA, I will still use ANOVA to compare the output.

model= aov (Yield~ Site:Genotype:Row + Block, data=df)
summary(model)
                  Df  Sum Sq Mean Sq F value Pr(>F)    
Block              3    8184    2728   0.737  0.534    
Site:Genotype:Row 11 1812838  164803  44.510 <2e-16 ***
Residuals         65  240672    3703   

if(!require(agricolae)) install.packages("agricolae")
library(agricolae)
Tukey_Test= HSD.test(model, c("Site","Genotype","Row"))
print(Tukey_Test)
                      Yield groups
Control:cv2:Side1  477.4725      a
Control:cv1:Side1  467.0025      a
Control:cv2:Side2  440.5100      a
Control:cv1:Side2  403.5825     ab
Control:cv2:Middle 390.8275     ab
Control:cv1:Middle 301.0875     bc
AV:cv2:Middle      178.6750     cd
AV:cv2:Side1       150.7937      d
AV:cv1:Middle      147.9650      d
AV:cv2:Side2       134.6788      d
AV:cv1:Side1       131.4638      d
AV:cv1:Side2       102.3950      d

The result is similar.

7) Multiple cultivars with a single row in different seasons

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_crop_yield_multiple_season.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df= subset(df, Row1=="1")
df$Season=as.factor(df$Season)
set.seed(100)

print(df[sample(nrow(df),5),])
    Season Genotype Plot Block Row1   Row    Site  Yield
    Season Genotype Plot Block Row1   Row    Site  Yield
46    2015      cv2  102    II    1 Side1      AV 124.93
111   2016      cv1  115    IV    1 Side1 Control 637.70
26    2015      cv1  114   III    1 Side1 Control 522.74
76    2015      cv2  108    IV    1 Side1      AV 128.48
91    2016      cv2  111    II    1 Side1 Control 633.04
.
.
.

This field layout is designed for evaluating multiple genotypes of a crop (genotypic variance), with a single row planted across two seasons.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield~ Site + Genotype + Season + Site:Genotype:Season + 
       (1|Block) + (1|Block:Season) + (1|Plot), data=df)
print(VarCorr(model), comp="Variance")
 Groups       Name        Variance
 Plot         (Intercept) 1200.220
 Block:Season (Intercept)    0.000
 Block        (Intercept)    0.000
 Residual                   26.213

cat("Variance [Plot]", round(1200.220 / (1200.220 + 26.213) * 100, 2), "%\n")
cat("Variance [Residual]", round(26.213 / (1200.220 + 26.213) * 100, 2), "%\n")
Variance [Plot] 97.86 %
Variance [Residual] 2.14 %

The largest source of variance in the model is attributed to the plot-level random effect, which accounts for 97.86% of the total variance. This indicates that there are substantial differences among individual plots that are not explained by the fixed effects (Site, Genotype, Season, and their interaction). These differences could arise from uncontrolled micro-environmental variation, plot management, or other plot-specific factors. In contrast, the residual variance represents only 2.14% of the total variability. This relatively small residual suggests that the model effectively captures the structure in the data, with minimal unexplained random error. The negligible variance attributed to Block and Block:Season (0.00%) further supports the conclusion that plot-to-plot variation is the primary driver of random variability in this dataset.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
                      Sum Sq Mean Sq NumDF  DenDF   F value  Pr(>F) 
Site                 10095.8 10095.8     1 12.259  385.1437  <0.001 ***
Genotype                 9.4     9.4     1 12.259    0.3589  0.56
Season               30054.2 30054.2     1 12.000 1146.5331  <0.001 ***
Site:Genotype:Season 15179.1  3794.8     4 13.704  144.7668  <0.001 ***

The interaction among Site:Genotype:Season was significant.

if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)

EMMeans= emmeans(model, ~ Site:Genotype:Season)
mean_compared= cld (EMMeans, adjust= "none", Letters=letters, reverse = TRUE)
print(mean_compared)
 Site    Genotype Season emmean   SE   df lower.CL upper.CL .group
 Control cv2      2016    651.0 17.5 12.3    612.9      689  a    
 Control cv1      2016    636.1 17.5 12.3    598.1      674  a    
 Control cv2      2015    528.4 17.5 12.3    490.3      566   b   
 Control cv1      2015    516.3 17.5 12.3    478.3      554   b   
 AV      cv2      2016    165.0 17.5 12.3    126.9      203    c  
 AV      cv2      2015    133.9 17.5 12.3     95.8      172     de
 AV      cv1      2016    116.9 17.5 12.3     78.8      155    cd 
 AV      cv1      2015     94.9 17.5 12.3     56.8      133      e

Although the field layout violates the assumptions of a standard ANOVA, I will still use ANOVA to compare the output.

model= aov (Yield~ Site:Genotype:Season + Block, data=df)
summary(model)
                     Df  Sum Sq Mean Sq F value Pr(>F)    
Block                 3    3296    1099   0.883  0.466    
Site:Genotype:Season  7 1728197  246885 198.352 <2e-16 ***
Residuals            21   26138    1245      

if(!require(agricolae)) install.packages("agricolae")
library(agricolae)
Tukey_Test= HSD.test(model, c("Site","Genotype","Season"))
print(Tukey_Test)
                    Yield groups
Control:cv2:2016 650.9550      a
Control:cv1:2016 636.1200      a
Control:cv2:2015 528.3700      b
Control:cv1:2015 516.3300      b
AV:cv2:2016      164.9750      c
AV:cv2:2015      133.9075      c
AV:cv1:2016      116.8925      c
AV:cv1:2015       94.8825      c

8) Multiple cultivars with multiple rows in different seasons

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_crop_yield_multiple_season.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Row1=as.factor(df$Row1)
df$Season=as.factor(df$Season)
set.seed(100)

print(df[sample(nrow(df),5),])
    Season Genotype Plot Block Row1    Row    Site  Yield
102   2016      cv2  113   III    2 Side1 Control 513.60
112   2016      cv1  115    IV    2 Side1 Control 625.07
151   2016      cv1  107   III    1 Side1      AV 143.81
4     2015      cv1  109     I    4 Side2 Control 384.36
55    2015      cv2  103   III    5 Side2      AV  31.69
.
.
.

This field layout is designed for evaluating multiple genotypes of a crop (genotypic variance), with multiple rows planted across two seasons.

if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)

model= lmer(Yield~ Site + Genotype + Season + Row + 
            Site:Genotype:Season + Site:Genotype:Season:Row + 
            (1|Block) + (1|Plot) + (1|Block:Season) + (1|Plot:Season) + 
            (1|Plot:Row), data=df)          
print(VarCorr(model), comp="Variance")
 Groups       Name        Variance
 Plot:Row     (Intercept) 1555.80 
 Plot:Season  (Intercept)    0.00 
 Plot         (Intercept)  186.42 
 Block:Season (Intercept)    0.00 
 Block        (Intercept)    0.00 
 Residual                 3185.53

cat("Variance [Plot:Row]", round(1555.80 / (1555.80 + 186.42 + 3185.53) * 100, 2), "%\n")
cat("Variance [Plot]", round(186.42 / (1555.80 + 186.42 + 3185.53) * 100, 2), "%\n")
cat("Variance [Residual]", round(3185.53 / (1555.80 + 186.42 + 3185.53) * 100, 2), "%\n")
Variance [Plot:Row] 31.57 %
Variance [Plot] 3.78 %
Variance [Residual] 64.64 %

The largest component of variance is the residual [64.64%]. This is the unexplained variation—likely due to unmeasured factors or inherent randomness in yield not captured by your fixed or random effects. Also, a significant portion of variability is at the Row within Plot level [31.57%]. This suggests that spatial variation within plots across rows is substantial. It might point to field heterogeneity (e.g., soil variability, moisture) within individual plots. Variation between plots is much smaller [3.78%], but not negligible. Some yield variability can be attributed to differences between plots.

print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
                         Sum Sq Mean Sq NumDF  DenDF F value    Pr(>F)    
Site                      88528   88528     1 90.383 27.7906 9.142e-07 ***
Genotype                  15803   15803     1 90.383  4.9609   0.02841 *  
Season                    16441   16441     1 97.160  5.1612   0.02530 *  
Row                       18427    9213     2 42.279  2.8923   0.06648 .  
Site:Genotype:Season     335154   83788     4 41.089 26.3028 7.366e-11 ***
Site:Genotype:Season:Row  82182    5870    14 45.252  1.8427   0.06109 .

There was no significant effect of Row, nor a significant interaction for Site:Genotype:Season:Row. However, the interaction among Site:Genotype:Season was highly significant.

if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)

EMMeans= emmeans(model, ~ Site:Genotype:Season)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse = TRUE)
print(mean_compared)
 Site    Genotype Season emmean   SE   df lower.CL upper.CL .group
 Control cv2      2016      537 18.8 20.1    480.3      595  a    
 Control cv1      2016      481 18.8 20.1    424.0      538  ab   
 Control cv2      2015      436 18.8 20.1    379.1      493   bc  
 Control cv1      2015      391 18.8 20.1    333.4      448    c  
 AV      cv2      2016      190 18.8 20.1    133.1      247     d 
 AV      cv1      2016      156 18.8 20.1     98.4      213     d 
 AV      cv2      2015      155 18.8 20.1     97.5      212     d 
 AV      cv1      2015      127 18.8 20.1     70.1      184     d

I also applied a standard ANOVA model to compare it with the linear mixed model (Case 6 and 7). Interestingly, the results were quite similar. This similarity is expected when the experimental design is relatively balanced (i.e., equal replication across groups), as the fixed effects estimates (treatment means) tend to converge between the two approaches. However, despite this similarity, it is still more appropriate to use a linear mixed model for the following reasons:

Agrivoltaic systems typically lack full randomization. Solar panel and control plots are often located in spatially distinct zones (e.g., shaded versus full-sun areas), which limits the ability to fully randomize treatment assignments. This spatial segregation introduces potential correlation and variability that are not captured by fixed effects alone. Mixed models are better suited to handle this through the inclusion of random effects such as Block or Block:Season.
Mixed models account for the hierarchical and correlated structure of the data, which standard ANOVA does not. In this context, random effects like Block, Block:Season, and Block:Rows are important for modeling underlying heterogeneity due to spatial layout or repeated measures. Because the assumptions of ANOVA—particularly independence and homoscedasticity—are more easily violated under these conditions, the mixed model provides a more statistically robust framework.

We aim to develop open-source code for agronomy ([email protected])

Last Updated: 07/07/2025

Agronomy4future

Stories about cereals and statistics (plus coding). We aim to develop open-source code for agronomy.