[STAT Article] Statistical Models in Agrivoltaics: Linear Mixed Models Across Different Field Layouts

Agrivoltaics is the study and practice of combining agriculture and solar energy production on the same land. The core idea is to install solar panels above or among crops, allowing for simultaneous food and energy production.
Generally, an agrivoltaics study investigates how this dual-use approach affects crop growth and yield (due to changes in light, temperature, and moisture), microclimate conditions under the panels, solar panel efficiency influenced by vegetation, as well as land-use efficiency, economic outcomes, and sustainability metrics.
Today, I’ll explain why standard ANOVA or split-plot models are often inadequate for agrivoltaics research, and why linear mixed models (LMMs) are recommended to account for the complex variability in treatment effects.

■ Layout for Split-Plot and RCBD Designs, and Their Limitations in Agrivoltaics Studies
The layout below represents a typical split-plot design used in field experiments. This design is fully randomized across both sites (Control and Agrivoltaics, or AV) and crops (e.g., Crop 1 to Crop 3), adhering to the principles of a split-plot structure.

However, such a design becomes impractical for agrivoltaics research due to the scale and fixed infrastructure of solar panel installations. For instance, if the agrivoltaics site spans 50 acres, implementing full randomization as per this layout would require ca. 200 acres of land to accommodate both control and treatment conditions. Establishing a control site within the AV system is not feasible, as achieving full-sun conditions would necessitate removal of existing solar panels, defeating the purpose of the study.
Most researchers conducting agrivoltaics studies overlook this issue and apply a split-plot statistical model, even when the actual experimental layout does not conform to the assumptions of a true split-plot design.

Additionally, some researchers use a standard ANOVA model; however, this approach is poorly aligned with the actual layout of agrivoltaics studies. The design shown below is a randomized complete block design (RCBD), where treatments are fully randomized within each block. While RCBD is appropriate for many agricultural field trials, its assumptions and structure differ significantly from the typical constraints and spatial configurations found in agrivoltaics systems. As such, applying an RCBD model to agrivoltaics experiments can lead to misrepresentation of treatment effects and variability.

In many agrivoltaics studies, researchers overlook the actual experimental layout and analyze the data using split-plot or RCBD models by simply accounting for treatment variables (e.g., inside vs. outside the solar panel array). However, this approach does not accurately reflect real field conditions.
Randomization is fundamental to experimental design because it ensures that, on average, the only systematic difference between treatments is the treatment itself. In typical agrivoltaics experiments, however, the treatment (e.g., shading by solar panels) is perfectly confounded with space—meaning that any observed treatment effect may also be attributable to spatial variation, not just the treatment. As a result, standard split-plot designs are theoretically inappropriate due to a lack of randomization at the site level. Similarly, RCBD is invalid in this context because the experimental layout violates key assumptions required for statistical inference, such as homogeneity and random allocation within blocks.
To address these challenges, I consulted with several statisticians and developed tailored statistical models for specific cases, ensuring that the analysis aligns with the spatial and structural constraints unique to agrivoltaics research.

■ Linear Mixed Models for different field layouts
Before starting, let’s make it clear about the terminology. I defined site, plot, block and row like below.


1) Single crop (only 1 cultivar) with a single row
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/yield_shading_spatial_genotype.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Spatial= as.factor(df$Spatial)
colnames(df)[2]=c("Site")
CV1=subset(df, Genotype=="CV1")
set.seed(100)
print(CV1[sample(nrow(CV1),5),])
Genotype Site Block Spatial Fruit_No Yield
10 CV1 Control III 2 1 14.0
6 CV1 Control II 2 4 17.5
16 CV1 AV III 2 1 10.0
3 CV1 Control I 3 5 15.2
9 CV1 Control III 1 4 5.8
.
.
.

This field layout is for 1 single crop (no genotypic variance) with a single row.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield ~ Site + (1|Site:Block), data=CV1)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Site:Block (Intercept) 1.9747
Residual 42.6318
cat("Variance [Site:Block]", round(1.9747 / (1.9747 + 42.6318) * 100, 2), "%\n")
cat("Variance [Residual]", round(42.6318 / (1.9747 + 42.6318) * 100, 2), "%\n")
Variance [Site:Block] 4.43 %
Variance [Residual] 95.57 %
The variance component analysis shows that approximately 96% of the total variance is due to the residual
, which reflects within-plot variability or experimental error — such as differences among individual plants, measurement error, or other uncontrolled environmental factors within plots. In contrast, only about 4% of the variance is attributed to the random effect of Site:Block
, which captures variation among blocks within each site — i.e., spatial heterogeneity in how sites perform across different blocks.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Site 42.178 42.178 1 4.4632 0.9893 0.3707
There is no Site
effect, indicating that there is no shading effect on crop yield.
if(!require(emmeans)) install.packages("emmeans")
library(emmeans)
if(!require(pbkrtest)) install.packages("pbkrtest")
library(pbkrtest)
EMMeans= emmeans(model, ~ Site)
print(summary(EMMeans, digits= 2))
Site emmean SE df lower.CL upper.CL
AV 15.4 3.61 5.93 6.58 24.3
Control 11.5 1.91 3.12 5.59 17.5
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
###
pairwise= contrast(EMMeans, method= "pairwise", adjust= "none")
print(summary(pairwise))
contrast estimate SE df t.ratio p.value
AV - Control 3.91 4.08 5.06 0.956 0.3823
Degrees-of-freedom method: kenward-roger
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(multcompView)
library(multcomp)
mean_compared= cld (EMMeans, adjust= "none", Letters= letters, reverse= TRUE)
print(mean_compared)
Site emmean SE df lower.CL upper.CL .group
AV 15.4 3.61 5.93 6.58 24.3 a
Control 11.5 1.91 3.12 5.59 17.5 a
There is no difference between AV and control site. This indicates that most of the variability in yield is not due to differences among sites or blocks, but rather due to unexplained, plot-level noise. As such, if no statistically significant difference is detected between sites, it is not necessarily because treatments are equivalent, but rather because the high residual variability may be masking real treatment effects. Therefore, the lack of statistical significance should be interpreted with caution, as it may reflect insufficient statistical power due to large experimental error, rather than true equivalence between sites.

2) Single crop (only 1 cultivar) with multiple rows
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_dataset_example_1.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Rows= as.factor(df$Rows)
set.seed(100)
print(df[sample(nrow(df),5),])
Site Block Rows Yield AGW GN
10 AV II 2 343.31 28.79 11926
38 Control II 2 496.73 26.29 19952
25 Control I 1 293.05 28.12 13248
14 AV III 2 442.08 25.39 17414
23 Control III 1 542.28 20.79 30974
.
.
.

This field layout is for 1 single crop (no genotypic variance) with multiple rows.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield~ Site + Rows + Site:Rows + (1|Block), data=df)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Block (Intercept) 5645.9
Residual 6492.3
cat("Variance [Block]", round(5645.9 / (5645.9 + 6492.3) * 100, 2), "%\n")
cat("Variance [Residual]", round(6492.3 / (5645.9 + 6492.3) * 100, 2), "%\n")
Variance [Block] 46.51 %
Variance [Residual] 53.49 %
The model indicates that block-to-block differences account for ca. 46.5% of the total yield variation, reflecting substantial environmental or management variability across blocks. The remaining 53.5% is due to residual (plot-level) variation, which may include differences in row position, microclimate, or other unmeasured factors.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Site 12704 12704 1 31 1.9568 0.171783
Rows 78006 39003 2 31 6.0076 0.006237 **
Site:Rows 25074 12537 2 31 1.9310 0.162045
In particular, row position appears to influence yield, likely due to spatial factors such as edge effects or shading gradients, which are common in agrivoltaics systems. While there are yield differences between the AV and Control sites, the Site effect is not consistently stronger than the Row effect.
if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)
EMMeans= emmeans(model, ~ Rows)
mean_compared= cld (EMMeans, adjust= "none", Letters=letters, reverse = TRUE)
print(mean_compared)
Rows emmean SE df lower.CL upper.CL .group
3 505 47.1 5.89 390 621 a
2 409 42.6 3.99 290 527 b
1 387 42.6 3.99 268 505 b
This suggests that within-site spatial variability may obscure or confound any overall Site-level differences.

3) Single crop (only 1 cultivar) with a single row in different seasons
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_dataset_example_2.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Season=as.factor(df$Season)
df=subset(df, Rows=="3")
set.seed(100)
print(head(df,5))
Season Site Block Rows Yield AGW GN
3 2017 AV I 3 260.44 25.47 10224
8 2017 AV II 3 619.96 23.88 25959
13 2017 AV III 3 544.07 23.86 22799
18 2017 AV IV 3 644.60 20.89 30858
29 2017 Control I 3 372.60 28.12 13248
.
.
.

This field layout is for 1 single crop (no genotypic variance) with a single row in different seasons. In this model, plot is regarded as a random effect, but there is only one genotype (single crop), plot is the same as block.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield~ Season + Site + Season:Site + (1|Block), data=df)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Block (Intercept) 4473.4
Residual 8121.9
cat("Variance [Block]", round(4473.4 / (4473.4 + 8121.9) * 100, 2), "%\n")
cat("Variance [Residual]", round(8121.9 / (4473.4 + 8121.9) * 100, 2), "%\n")
Variance [Block] 35.52 %
Variance [Residual] 64.48 %
The block effect is 35.5% and about 1/3 of the total variability in yield is due to differences between blocks. This suggests that block-to-block variability (e.g., differences in soil, microclimate, or management) has a moderate but meaningful effect on yield. Including block as a random effect is appropriate and important for properly accounting for spatial/environmental heterogeneity. In the same way, residual effect is 64.5% and the majority of variation comes from within-block variability, including plot-level environmental noise, plant-to-plant differences, measurement error, and any unmeasured factors not captured by block-level grouping.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Season 164026 164026 1 9 20.1956 0.001502 **
Site 127 127 1 9 0.0157 0.903075
Season:Site 1307 1307 1 9 0.1610 0.697630
There was only a main effect of Season
, with higher yield observed in the 2018 growing season; however, there was no significant Site
effect across seasons.
if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)
season_site= emmeans(model, ~ Season)
mean_compared= emmeans:::cld.emmGrid(season_site, adjust="none", Letters=letters, reverse=TRUE)
print(mean_compared)
Season emmean SE df lower.CL upper.CL .group
2018 708 46.2 5 589 827 a
2017 505 46.2 5 387 624 b

4) Single crop (only 1 cultivar) with multiple rows in different seasons
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_dataset_example_2.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Rows=as.factor(df$Rows)
df$Season=as.factor(df$Season)
set.seed(100)
print(df[sample(nrow(df),5),])
Season Site Block Rows Yield AGW GN
74 2018 AV III 2 651.2467 27.99 23267.12
78 2018 AV IV 3 738.5171 28.83 25616.27
23 2017 Control III 1 542.2800 20.79 30974.00
70 2018 AV II 2 566.2730 29.29 19333.32
4 2017 AV I 2 375.7800 29.24 12853.00
.
.
.

This field layout is for 1 single crop (no genotypic variance) with multiple rows in different seasons. Again, in this model, plot is regarded as a random effect, but there is only one genotype (single crop), plot is the same as block.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield~ Season + Site + Rows + Season:Site + Season:Rows + (1|Block) +
Season:Site:Rows + (1|Block:Season) + (1|Block:Rows), data=df)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Block:Rows (Intercept) 0.00
Block:Season (Intercept) 2139.71
Block (Intercept) 523.08
Residual 8184.96
cat("Variance [Block:Season]", round(2139.71 / (2139.71 + 523.08 + 8184.96) * 100, 2), "%\n")
cat("Variance [Block]", round(523.08 / (2139.71 + 523.08 + 8184.96) * 100, 2), "%\n")
cat("Variance [Residual]", round(8184.96 / (2139.71 + 523.08 + 8184.96) * 100, 2), "%\n")
Variance [Block:Season] 19.72 %
Variance [Block] 4.82 %
Variance [Residual] 75.45 %
The majority of variation is due to Residuals
, but Block:Season
also contributes significantly, indicating season-specific block effects are relevant. Block
effects alone are moderate.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Season 363930 363930 1 4.501 44.4633 0.001710 **
Site 184957 184957 1 62.000 22.5972 0.0000123 ***
Rows 43829 21915 2 62.000 2.6774 0.076684 .
Season:Site 84651 84651 1 62.000 10.3422 0.002067 **
Season:Rows 2909 1454 2 62.000 0.1777 0.837619
Season:Site:Rows 109298 27324 4 62.000 3.3384 0.015361 *
Both Season
and Site
significantly affect yield, with a significant Season
× Site
interaction. The significant three-way interaction implies that yield responses to specific Site
× Row
combinations depend on the growing Season
.
if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)
EMMeans= emmeans(model, ~ Season:Site:Rows)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse = TRUE)
print(mean_compared)
Season Site Rows emmean SE df lower.CL upper.CL .group
2018 Control 1 836 41.1 20 704 968 a
2018 Control 2 805 41.1 20 672 937 a
2018 Control 3 714 52.1 40 556 872 ab
2018 AV 3 702 52.1 40 544 860 ab
2018 AV 1 600 41.1 20 468 732 bc
2018 AV 2 558 41.1 20 425 690 bc
2017 AV 3 517 52.1 40 359 675 bc
2017 Control 3 494 52.1 40 336 651 bc
2017 Control 2 462 41.1 20 330 594 bc
2017 Control 1 402 41.1 20 269 534 c
2017 AV 1 372 41.1 20 239 504 c
2017 AV 2 356 41.1 20 223 488 c

5) Multiple cultivars with a single row
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/yield_shading_genotypes.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
colnames(df)[2]=c("Site")
df= df[, -c(4, 5)]
set.seed(100)
print(df[sample(nrow(df),5),])
Genotype Site Block Yield
10 CV1 Control I 6.0
38 CV3 AV I 17.5
48 CV3 Control I 22.0
51 CV3 Control II 8.0
25 CV2 AV III 3.0
.
.
.

This field layout is for multiple genotypes for a crop (genotypic variance) with a single row.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield~ Site + Genotype + Site:Genotype + (1|Block), data=df)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Block (Intercept) 6.1686
Residual 18.8841
cat("Variance [Block]", round(6.1686 / (6.1686 + 18.8841) * 100, 2), "%\n")
cat("Variance [Residual]", round(18.8841 / (6.1686 + 18.8841) * 100, 2), "%\n")
Variance [Block] 24.62 %
Variance [Residual] 75.38 %
The largest component of variance is the Residual
[75.38%]. This is the unexplained variation—likely due to unmeasured factors or inherent randomness in yield not captured by your fixed or random effects.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Site 517.70 517.70 1 46 27.4146 <0.001 ***
Genotype 9.02 4.51 2 46 0.2388 0.7885616
Site:Genotype 315.66 157.83 2 46 8.3577 <0.001 ***
The interaction among Site:Genotype
was significant.
if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)
EMMeans= emmeans(model, ~ Site:Genotype)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse= TRUE)
print(mean_compared)
Site Genotype emmean SE df lower.CL upper.CL .group
AV CV1 15.66 2.04 5.83 7.720 23.6 a
AV CV2 13.14 2.04 5.83 5.209 21.1 ab
AV CV3 10.73 2.04 5.83 2.798 18.7 ab
Control CV3 10.46 2.04 5.83 2.520 18.4 ab
Control CV2 6.97 2.04 5.83 -0.969 14.9 bc
Control CV1 3.53 2.04 5.83 -4.402 11.5 c


6) Multiple cultivars with multiple rows
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_crop_yield_multiple_season.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df= subset(df, Season=="2015")
set.seed(100)
print(df[sample(nrow(df),5),])
Season Genotype Plot Block Row1 Row Site Yield
74 2015 cv1 107 III 4 Side2 AV 168.64
78 2015 cv2 108 IV 3 Middle AV 179.66
23 2015 cv2 113 III 3 Middle Control 460.51
70 2015 cv1 106 II 5 Side2 AV 86.02
4 2015 cv1 109 I 4 Side2 Control 384.36
.
.
.

This field layout is designed for evaluating multiple genotypes of a crop (genotypic variance), with a single row planted across two seasons.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield~ Site + Genotype + Row + Site:Genotype:Row + (1|Block) + (1|Plot), data=df)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Plot (Intercept) 70.929
Block (Intercept) 0.000
Residual 3597.058
cat("Variance [Plot]", round(70.929 / (70.929 + 3597.058) * 100, 2), "%\n")
cat("Variance [Residual]", round(3597.058 / (70.929 + 3597.058) * 100, 2), "%\n")
Variance [Plot] 1.93 %
Variance [Residual] 98.07 %
The largest component of variance is the residual [98.07%]. This is the unexplained variation—likely due to unmeasured factors or inherent randomness in yield not captured by your fixed or random effects.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Site 359936 359936 1 47.975 100.0641 0.001 ***
Genotype 5248 5248 1 47.975 1.4589 0.2330
Row 20333 10167 2 56.000 2.8263 0.0677 .
Site:Genotype:Row 198835 28405 7 62.384 7.8967 0.001 ***
The interaction among Site:Genotype:Row
was significant.
if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)
EMMeans= emmeans(model, ~ Site:Genotype:Row)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse = TRUE)
print(mean_compared)
Site Genotype Row emmean SE df lower.CL upper.CL .group
Control cv2 Side1 477 21.6 48.0 412.6 542 a
Control cv1 Side1 467 21.6 48.0 402.1 532 a
Control cv2 Side2 441 21.6 48.0 375.6 505 a
Control cv1 Side2 404 21.6 48.0 338.7 468 ab
Control cv2 Middle 391 30.3 67.3 301.2 480 ab
Control cv1 Middle 301 30.3 67.3 211.5 391 bc
AV cv2 Middle 179 30.3 67.3 89.1 268 cd
AV cv2 Side1 151 21.6 48.0 85.9 216 d
AV cv1 Middle 148 30.3 67.3 58.4 238 d
AV cv2 Side2 135 21.6 48.0 69.8 200 d
AV cv1 Side1 131 21.6 48.0 66.6 196 d
AV cv1 Side2 102 21.6 48.0 37.5 167 d
Although the field layout violates the assumptions of a standard ANOVA, I will still use ANOVA to compare the output.
model= aov (Yield~ Site:Genotype:Row + Block, data=df)
summary(model)
Df Sum Sq Mean Sq F value Pr(>F)
Block 3 8184 2728 0.737 0.534
Site:Genotype:Row 11 1812838 164803 44.510 <2e-16 ***
Residuals 65 240672 3703
if(!require(agricolae)) install.packages("agricolae")
library(agricolae)
Tukey_Test= HSD.test(model, c("Site","Genotype","Row"))
print(Tukey_Test)
Yield groups
Control:cv2:Side1 477.4725 a
Control:cv1:Side1 467.0025 a
Control:cv2:Side2 440.5100 a
Control:cv1:Side2 403.5825 ab
Control:cv2:Middle 390.8275 ab
Control:cv1:Middle 301.0875 bc
AV:cv2:Middle 178.6750 cd
AV:cv2:Side1 150.7937 d
AV:cv1:Middle 147.9650 d
AV:cv2:Side2 134.6788 d
AV:cv1:Side1 131.4638 d
AV:cv1:Side2 102.3950 d
The result is similar.


7) Multiple cultivars with a single row in different seasons
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_crop_yield_multiple_season.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df= subset(df, Row1=="1")
df$Season=as.factor(df$Season)
set.seed(100)
print(df[sample(nrow(df),5),])
Season Genotype Plot Block Row1 Row Site Yield
Season Genotype Plot Block Row1 Row Site Yield
46 2015 cv2 102 II 1 Side1 AV 124.93
111 2016 cv1 115 IV 1 Side1 Control 637.70
26 2015 cv1 114 III 1 Side1 Control 522.74
76 2015 cv2 108 IV 1 Side1 AV 128.48
91 2016 cv2 111 II 1 Side1 Control 633.04
.
.
.

This field layout is designed for evaluating multiple genotypes of a crop (genotypic variance), with a single row planted across two seasons.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield~ Site + Genotype + Season + Site:Genotype:Season +
(1|Block) + (1|Block:Season) + (1|Plot), data=df)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Plot (Intercept) 1200.220
Block:Season (Intercept) 0.000
Block (Intercept) 0.000
Residual 26.213
cat("Variance [Plot]", round(1200.220 / (1200.220 + 26.213) * 100, 2), "%\n")
cat("Variance [Residual]", round(26.213 / (1200.220 + 26.213) * 100, 2), "%\n")
Variance [Plot] 97.86 %
Variance [Residual] 2.14 %
The largest source of variance in the model is attributed to the plot-level random effect, which accounts for 97.86% of the total variance. This indicates that there are substantial differences among individual plots that are not explained by the fixed effects (Site, Genotype, Season, and their interaction). These differences could arise from uncontrolled micro-environmental variation, plot management, or other plot-specific factors. In contrast, the residual variance represents only 2.14% of the total variability. This relatively small residual suggests that the model effectively captures the structure in the data, with minimal unexplained random error. The negligible variance attributed to Block
and Block:Season
(0.00%) further supports the conclusion that plot-to-plot variation is the primary driver of random variability in this dataset.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Site 10095.8 10095.8 1 12.259 385.1437 <0.001 ***
Genotype 9.4 9.4 1 12.259 0.3589 0.56
Season 30054.2 30054.2 1 12.000 1146.5331 <0.001 ***
Site:Genotype:Season 15179.1 3794.8 4 13.704 144.7668 <0.001 ***
The interaction among Site:Genotype:Season
was significant.
if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)
EMMeans= emmeans(model, ~ Site:Genotype:Season)
mean_compared= cld (EMMeans, adjust= "none", Letters=letters, reverse = TRUE)
print(mean_compared)
Site Genotype Season emmean SE df lower.CL upper.CL .group
Control cv2 2016 651.0 17.5 12.3 612.9 689 a
Control cv1 2016 636.1 17.5 12.3 598.1 674 a
Control cv2 2015 528.4 17.5 12.3 490.3 566 b
Control cv1 2015 516.3 17.5 12.3 478.3 554 b
AV cv2 2016 165.0 17.5 12.3 126.9 203 c
AV cv2 2015 133.9 17.5 12.3 95.8 172 de
AV cv1 2016 116.9 17.5 12.3 78.8 155 cd
AV cv1 2015 94.9 17.5 12.3 56.8 133 e
Although the field layout violates the assumptions of a standard ANOVA, I will still use ANOVA to compare the output.
model= aov (Yield~ Site:Genotype:Season + Block, data=df)
summary(model)
Df Sum Sq Mean Sq F value Pr(>F)
Block 3 3296 1099 0.883 0.466
Site:Genotype:Season 7 1728197 246885 198.352 <2e-16 ***
Residuals 21 26138 1245
if(!require(agricolae)) install.packages("agricolae")
library(agricolae)
Tukey_Test= HSD.test(model, c("Site","Genotype","Season"))
print(Tukey_Test)
Yield groups
Control:cv2:2016 650.9550 a
Control:cv1:2016 636.1200 a
Control:cv2:2015 528.3700 b
Control:cv1:2015 516.3300 b
AV:cv2:2016 164.9750 c
AV:cv2:2015 133.9075 c
AV:cv1:2016 116.8925 c
AV:cv1:2015 94.8825 c


8) Multiple cultivars with multiple rows in different seasons
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/agrivoltaics_crop_yield_multiple_season.csv"
df= data.frame(read_csv(url(github), show_col_types= FALSE))
df$Row1=as.factor(df$Row1)
df$Season=as.factor(df$Season)
set.seed(100)
print(df[sample(nrow(df),5),])
Season Genotype Plot Block Row1 Row Site Yield
102 2016 cv2 113 III 2 Side1 Control 513.60
112 2016 cv1 115 IV 2 Side1 Control 625.07
151 2016 cv1 107 III 1 Side1 AV 143.81
4 2015 cv1 109 I 4 Side2 Control 384.36
55 2015 cv2 103 III 5 Side2 AV 31.69
.
.
.

This field layout is designed for evaluating multiple genotypes of a crop (genotypic variance), with multiple rows planted across two seasons.
if(!require(lme4)) install.packages("lme4")
if(!require(lmerTest)) install.packages("lmerTest")
library(lme4)
library(lmerTest)
model= lmer(Yield~ Site + Genotype + Season + Row +
Site:Genotype:Season + Site:Genotype:Season:Row +
(1|Block) + (1|Plot) + (1|Block:Season) + (1|Plot:Season) +
(1|Plot:Row), data=df)
print(VarCorr(model), comp="Variance")
Groups Name Variance
Plot:Row (Intercept) 1555.80
Plot:Season (Intercept) 0.00
Plot (Intercept) 186.42
Block:Season (Intercept) 0.00
Block (Intercept) 0.00
Residual 3185.53
cat("Variance [Plot:Row]", round(1555.80 / (1555.80 + 186.42 + 3185.53) * 100, 2), "%\n")
cat("Variance [Plot]", round(186.42 / (1555.80 + 186.42 + 3185.53) * 100, 2), "%\n")
cat("Variance [Residual]", round(3185.53 / (1555.80 + 186.42 + 3185.53) * 100, 2), "%\n")
Variance [Plot:Row] 31.57 %
Variance [Plot] 3.78 %
Variance [Residual] 64.64 %
The largest component of variance is the residual [64.64%]. This is the unexplained variation—likely due to unmeasured factors or inherent randomness in yield not captured by your fixed or random effects. Also, a significant portion of variability is at the Row within Plot level [31.57%]. This suggests that spatial variation within plots across rows is substantial. It might point to field heterogeneity (e.g., soil variability, moisture) within individual plots. Variation between plots is much smaller [3.78%], but not negligible. Some yield variability can be attributed to differences between plots.
print(anova(model, type= 3))
Type III Analysis of Variance Table with Satterthwaite's method
Sum Sq Mean Sq NumDF DenDF F value Pr(>F)
Site 88528 88528 1 90.383 27.7906 9.142e-07 ***
Genotype 15803 15803 1 90.383 4.9609 0.02841 *
Season 16441 16441 1 97.160 5.1612 0.02530 *
Row 18427 9213 2 42.279 2.8923 0.06648 .
Site:Genotype:Season 335154 83788 4 41.089 26.3028 7.366e-11 ***
Site:Genotype:Season:Row 82182 5870 14 45.252 1.8427 0.06109 .
There was no significant effect of Row
, nor a significant interaction for Site:Genotype:Season:Row
. However, the interaction among Site:Genotype:Season
was highly significant.
if(!require(emmeans)) install.packages("emmeans")
if(!require(pbkrtest)) install.packages("pbkrtest")
if(!require(multcompView)) install.packages("multcompView")
if(!require(multcomp)) install.packages("multcomp")
library(emmeans)
library(pbkrtest)
library(multcompView)
library(multcomp)
EMMeans= emmeans(model, ~ Site:Genotype:Season)
mean_compared= cld (EMMeans, adjust= "sidak", Letters=letters, reverse = TRUE)
print(mean_compared)
Site Genotype Season emmean SE df lower.CL upper.CL .group
Control cv2 2016 537 18.8 20.1 480.3 595 a
Control cv1 2016 481 18.8 20.1 424.0 538 ab
Control cv2 2015 436 18.8 20.1 379.1 493 bc
Control cv1 2015 391 18.8 20.1 333.4 448 c
AV cv2 2016 190 18.8 20.1 133.1 247 d
AV cv1 2016 156 18.8 20.1 98.4 213 d
AV cv2 2015 155 18.8 20.1 97.5 212 d
AV cv1 2015 127 18.8 20.1 70.1 184 d


I also applied a standard ANOVA model to compare it with the linear mixed model (Case 6 and 7). Interestingly, the results were quite similar. This similarity is expected when the experimental design is relatively balanced (i.e., equal replication across groups), as the fixed effects estimates (treatment means) tend to converge between the two approaches. However, despite this similarity, it is still more appropriate to use a linear mixed model for the following reasons:
- Agrivoltaic systems typically lack full randomization. Solar panel and control plots are often located in spatially distinct zones (e.g., shaded versus full-sun areas), which limits the ability to fully randomize treatment assignments. This spatial segregation introduces potential correlation and variability that are not captured by fixed effects alone. Mixed models are better suited to handle this through the inclusion of random effects such as Block or Block:Season.
- Mixed models account for the hierarchical and correlated structure of the data, which standard ANOVA does not. In this context, random effects like Block, Block:Season, and Block:Rows are important for modeling underlying heterogeneity due to spatial layout or repeated measures. Because the assumptions of ANOVA—particularly independence and homoscedasticity—are more easily violated under these conditions, the mixed model provides a more statistically robust framework.

We aim to develop open-source code for agronomy ([email protected])
© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.
Last Updated: 07/07/2025