The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide

In this post, I will introduce how to calculate the Best Linear Unbiased Estimator (BLUE). Instead of simply listing formulas, as many websites do, this post aims to help readers understand the process of calculating BLUE with an actual dataset using R. I have the following data. The dataset comprises three …

Firth’s Logistic Regression: Solving the Problem of Separation

In my previous post, I explained what logistic regression is, along with odds, odds ratios, and the model equation. Today, I’ll introduce Firth’s logistic regression. Standard logistic regression is mostly used when the sample size is sufficiently large and the outcome variable (0 and 1) is well balanced. It is also used when there is no …

Confidence interval (CI) formula for a two-sample t-test

When performing a t-test, you can also obtain a confidence interval. Below, I describe how to calculate the confidence interval manually, step by step: 1) difference in means, 2) pooled variance, 3) standard error, 4) degrees of freedom, and 5) confidence interval. Let’s calculate the confidence interval. We aim to develop open-source code for agronomy …
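The five steps can be sketched in Python as follows (this is not the post's own code). It assumes equal variances, as the pooled-variance step implies, and takes the two-sided critical value `t_crit` for df = n1 + n2 − 2 as an input, for example from a t-table or `scipy.stats.t.ppf(0.975, df)`. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_ci(x, y, t_crit):
    """Confidence interval for mean(x) - mean(y) with a pooled (equal-variance) t-test.

    t_crit is the two-sided critical t value for df = n1 + n2 - 2,
    e.g. about 2.306 for a 95% interval with df = 8.
    """
    n1, n2 = len(x), len(y)
    # 1) Difference in means
    diff = mean(x) - mean(y)
    # 2) Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    # 3) Standard error of the difference in means
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # 4) Degrees of freedom
    df = n1 + n2 - 2
    # 5) Confidence interval: difference +/- t_crit * SE
    margin = t_crit * se
    return diff, df, (diff - margin, diff + margin)


# Hypothetical yields from two treatments
diff, df, (lower, upper) = two_sample_ci([10, 12, 11, 13, 14], [8, 9, 10, 9, 9], t_crit=2.306)
print(diff, df, round(lower, 2), round(upper, 2))
```

With these made-up values the difference is 3, df = 8, the pooled variance is 1.5, and the 95% interval is roughly (1.21, 4.79).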

Understanding Bayes’ Theorem Step by Step

Recently, I’ve been focusing on Bayesian statistics. To organize the concepts for myself as well, I’m going to explain Bayes’ theorem as simply as possible. Let’s import a dataset from Kaggle (https://www.kaggle.com/datasets/cameronseamons/electronic-sales-sep2023-sep2024). This dataset contains information used to analyze customer purchasing behavior at an electronics store. You can download it from Kaggle after creating an …
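Bayes’ theorem, P(A | B) = P(B | A) · P(A) / P(B), fits in a few lines of Python. The probabilities below are made-up placeholders, not values from the Kaggle dataset, and P(B) is expanded with the law of total probability.

```python
def bayes_posterior(p_a, p_b_given_a, p_b_given_not_a):
    """Return P(A | B) using Bayes' theorem.

    P(B) is obtained from the law of total probability:
    P(B) = P(B | A) * P(A) + P(B | not A) * (1 - P(A)).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)
    return p_b_given_a * p_a / p_b


# Hypothetical example: 30% of customers belong to group A;
# event B occurs for 80% of group A and for 20% of everyone else.
posterior = bayes_posterior(0.3, 0.8, 0.2)
print(round(posterior, 3))  # 0.24 / 0.38, about 0.632
```

The prior of 0.3 is updated to about 0.63 once B is observed, because B is four times more likely within group A than outside it.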

How to analyze a quadratic plateau model in RStudio?

Previous post: “How to analyze linear plateau model in R Studio?” In my previous post, I explained how to analyze the linear plateau model. I simulated yield data for five different crop varieties with varying sulphur applications and suggested that the optimum sulphur application would be 23.3 kg/ha based on the linear plateau model. In …
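One common way to write a quadratic plateau model is y = a + b·x + c·x² up to the join point x₀ = −b / (2c), with y held at its maximum beyond x₀ (c < 0). The Python function below only evaluates that form; it is a sketch, not the post's R fitting code, and the parameter values are hypothetical. Estimating a, b, and c from data would additionally need a nonlinear least-squares routine.

```python
def quadratic_plateau(x, a, b, c):
    """Quadratic-plateau response: a + b*x + c*x**2 up to the join point, then flat.

    The join point x0 = -b / (2c) is the vertex of the parabola, so the curve
    meets the plateau with zero slope. Requires c < 0.
    """
    if c >= 0:
        raise ValueError("c must be negative so the parabola has a maximum")
    x0 = -b / (2 * c)      # join point, e.g. an optimum fertilizer rate
    xx = min(x, x0)        # beyond x0 the response stays at the plateau
    return a + b * xx + c * xx ** 2


def join_point(b, c):
    """Input level at which the quadratic reaches its plateau."""
    return -b / (2 * c)


# Hypothetical parameters: a = 100, b = 4, c = -0.1 -> x0 = 20, plateau = 140
for rate in (0, 10, 20, 30):
    print(rate, quadratic_plateau(rate, 100, 4, -0.1))
```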

What is the Gamma Distribution? Shape and Scale Parameters, and the Probability Density Function (PDF)

The Gamma distribution is a flexible family of continuous probability distributions defined only for positive values (x > 0). It’s commonly used to model quantities that represent time, size, or waiting periods: anything that can’t go below zero and often shows right-skewed behavior (a long tail toward larger values). In essence, the Gamma distribution describes how likely different …
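With shape k and scale θ, the Gamma PDF is f(x) = x^(k−1) e^(−x/θ) / (Γ(k) θ^k) for x > 0, with mean kθ and variance kθ². The Python sketch below evaluates it with `math.gamma` from the standard library; the function name and example values are illustrative.

```python
import math


def gamma_pdf(x, shape, scale):
    """Gamma probability density with shape k and scale theta.

    f(x) = x**(k - 1) * exp(-x / theta) / (Gamma(k) * theta**k) for x > 0;
    the density is treated as 0 outside that support.
    """
    if shape <= 0 or scale <= 0:
        raise ValueError("shape and scale must be positive")
    if x <= 0:
        return 0.0
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)


# shape = 1 reduces to an exponential distribution with mean theta;
# larger shapes move the peak to the right and reduce the skew.
for k in (1, 2, 5):
    print(k, [round(gamma_pdf(x, k, 2.0), 4) for x in (1, 2, 5, 10)])
```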

[R package] Streamlined Mixed-Effects Analysis for Agrivoltaics Experiments (Feat. agrivoltaics)

In my previous post, I suggested different statistical models for agrivoltaics studies to explain why we should consider Linear Mixed Models in this field. In many agrivoltaics studies, researchers overlook the actual experimental layout and analyze data using split-plot or RCBD models, focusing only on treatment variables (e.g., inside vs. outside the solar panel array). …

[STAT Article] How to calculate the reaction norm in crop physiology?

Previous post: “Quantifying Phenotypic Plasticity of Crops.” In my previous post, I explained how to quantify phenotypic plasticity in crops and described three approaches: 1) responsiveness, 2) reaction norm, and 3) the Finlay-Wilkinson regression model. Responsiveness is calculated as (Treatment − Control) / Control. It indicates how the dependent variable (e.g., yield) responds to a …
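The responsiveness formula, (Treatment − Control) / Control, can be computed directly. The Python sketch below applies it per variety; the variety names and yields are hypothetical.

```python
def responsiveness(treatment, control):
    """Relative response of a trait: (Treatment - Control) / Control."""
    if control == 0:
        raise ZeroDivisionError("control value must be non-zero")
    return (treatment - control) / control


# Hypothetical yields (t/ha) as (control, treatment) pairs
yields = {"variety_1": (5.0, 4.5), "variety_2": (6.0, 6.6)}
for name, (control, treatment) in yields.items():
    r = responsiveness(treatment, control)
    print(f"{name}: {r:+.1%}")  # negative means the treatment reduced the trait
```

A value of −0.1 reads as a 10% decrease relative to the control, which makes genotypes with different yield levels comparable.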

[STAT Article] RMSE Calculation with Excel and R: A Comprehensive Guide

When running statistical programs, you might encounter RMSE (root mean square error). For example, the table below shows the RMSE obtained from SAS, which is approximately 2.72. Let’s look at how RMSE is calculated. Below is the equation for RMSE. First, calculate the difference between the estimated and observed values, (ŷᵢ − yᵢ), …
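The usual RMSE calculation (take each difference ŷᵢ − yᵢ, square it, average, then take the square root) can be sketched in Python as below, with hypothetical values. One hedge: `rmse` divides by n, whereas the “Root MSE” printed by SAS procedures such as PROC GLM is the square root of the error mean square (SSE divided by the error degrees of freedom), so the two can differ; `root_mse_df` shows that variant.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: sqrt(sum((y_hat_i - y_i)**2) / n)."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def root_mse_df(observed, predicted, n_params):
    """Variant that divides SSE by the error degrees of freedom (n - n_params)."""
    sse = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / (len(observed) - n_params))


# Hypothetical observed and predicted values
obs, pred = [3, 5, 7], [2, 5, 9]
print(rmse(obs, pred), root_mse_df(obs, pred, n_params=1))
```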

What is a split-plot design in agronomy research?

Split-plot designs have been widely used, particularly in agronomy research. In a split-plot design, the experimental units (main plots) are divided into smaller units (subplots). Split-plot designs are useful when some factors are difficult or expensive to change, or when the levels of the factors cannot be fully randomized (I’ll explain this in detail later). A split-plot design consists of one …

[STAT Article] Step-by-Step Guide to Calculating and Analyzing Principal Component Analysis (PCA) by Hand

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction while preserving as much variability in the data as possible. It transforms the original variables in a dataset into a new set of uncorrelated variables called principal components, ordered by the amount of variance they capture from the original dataset. Here’s the step …
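For two variables, the by-hand steps (center the data, build the covariance matrix, find its eigenvalues and eigenvectors) fit in a short standard-library Python sketch. It uses the covariance matrix of unstandardized variables; many PCA workflows standardize first, which amounts to using the correlation matrix. The data are illustrative, and the sign of an eigenvector is arbitrary.

```python
import math
from statistics import mean


def pca_2d(x, y):
    """PCA of two variables by hand.

    Returns the eigenvalues of the sample covariance matrix (variance captured
    by PC1 and PC2, largest first) and the unit loading vector of PC1.
    """
    n = len(x)
    # 1) Center each variable on its mean
    mx, my = mean(x), mean(y)
    xc = [v - mx for v in x]
    yc = [v - my for v in y]
    # 2) Sample covariance matrix [[sxx, sxy], [sxy, syy]] (n - 1 denominator)
    sxx = sum(v * v for v in xc) / (n - 1)
    syy = sum(v * v for v in yc) / (n - 1)
    sxy = sum(a * b for a, b in zip(xc, yc)) / (n - 1)
    # 3) Eigenvalues of a symmetric 2x2 matrix: half_trace +/- sqrt(half_trace^2 - det)
    half_trace = (sxx + syy) / 2
    det = sxx * syy - sxy ** 2
    root = math.sqrt(max(half_trace ** 2 - det, 0.0))
    lam1, lam2 = half_trace + root, half_trace - root
    # 4) Eigenvector for lam1; fall back to an axis when the variables are uncorrelated
    if sxy != 0:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    length = math.hypot(vx, vy)
    return (lam1, lam2), (vx / length, vy / length)


# Illustrative measurements of two correlated traits
x = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
y = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
(l1, l2), pc1 = pca_2d(x, y)
print("variance explained by PC1:", round(l1 / (l1 + l2), 3))
print("PC1 loadings:", [round(v, 3) for v in pc1])
```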

Understanding Mean Absolute Error (MAE) in ANOVA: A Step-by-Step Guide to Calculation in Excel

Mean Absolute Error (MAE) is a metric used to measure the accuracy of a model’s predictions. It calculates the average magnitude of the errors in a set of predictions, without considering their direction. In other words, MAE measures the average absolute difference between the actual values and the predicted values. MAE is typically used in …
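In formula form, MAE = (1/n) Σ |yᵢ − ŷᵢ|. A direct Python translation with hypothetical values is below; in Excel, the same number comes from a column of `ABS` differences followed by `AVERAGE`.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum(|y_i - y_hat_i|): average error size, ignoring direction."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed vs. predicted values
print(mean_absolute_error([3, 5, 7], [2, 5, 9]))  # (1 + 0 + 2) / 3 = 1.0
```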

Stepwise Regression: A Practical Approach for Model Selection using R

Stepwise selection, forward selection, and backward elimination are all methods used to build statistical models, specifically regression models, where the goal is to select the most relevant predictors. In this post, I’ll introduce them one by one. Let’s generate a dataset. This dataset includes grain yield data, along with measurements of stem biomass, …

A Practical Approach to Linear Mixed-Effects Modeling in R

A Linear Mixed-Effects Model (LMM) is a statistical model that combines fixed and random effects to analyze data with repeated measurements or a hierarchical structure. Let’s break down the key components of a Linear Mixed-Effects Model: 1) fixed effects, 2) random effects, and 3) the model equation. The general equation of a …
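In the standard matrix notation used in most mixed-model texts (the post’s own notation may differ), the model is written as below. Here y is the response vector, X and Z are the design matrices for the fixed and random effects, β holds the fixed-effect coefficients, u the random effects, and ε the residuals; u and ε are usually assumed independent, which gives the variance of y on the second line.

```latex
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \boldsymbol{\varepsilon},
\qquad \mathbf{u} \sim N(\mathbf{0}, \mathbf{G}),
\qquad \boldsymbol{\varepsilon} \sim N(\mathbf{0}, \mathbf{R})

\operatorname{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}^{\top} + \mathbf{R}
```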

Understanding Multiple Linear Regression Easily (Part 2: Calculating the Coefficient of Determination Manually)

Previous post: “Understanding Multiple Linear Regression Easily (Part 1: Calculating the Regression Equation Manually).” In the previous post, I explained how to manually calculate the regression equation in multiple linear regression analysis. In this post, I will explain how to calculate the coefficient of determination (R²) in multiple linear regression analysis. The data table has the columns No., Yield (yᵢ), Time …
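For a model with an intercept, R² = 1 − SSE / SST, where SSE is the residual sum of squares and SST is the total sum of squares about the mean of y. Given the fitted values from the regression equation, a Python sketch with hypothetical numbers is below; the adjusted R² helper uses the standard correction for p predictors.

```python
from statistics import mean


def r_squared(observed, fitted):
    """Coefficient of determination for a model with an intercept: 1 - SSE / SST."""
    y_bar = mean(observed)
    sse = sum((y - f) ** 2 for y, f in zip(observed, fitted))  # residual sum of squares
    sst = sum((y - y_bar) ** 2 for y in observed)               # total sum of squares about the mean
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical observed yields and fitted values from a regression equation
obs = [1, 2, 3, 4]
fit = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(obs, fit)
print(round(r2, 4), round(adjusted_r_squared(r2, n=4, p=1), 4))
```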

Understanding Multiple Linear Regression Easily (Part 1: Calculating the Regression Equation Manually)

In my previous posts, I explained the simple linear regression model in a five-part series. I recommend reading the following posts first: Simple linear regression (1/5): correlation and covariance; Simple linear regression (2/5): slope and intercept of linear regression model; Simple linear regression (3/5): standard error of slope and intercept; Simple linear regression (4/5): t …
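Calculating the multiple regression equation by hand comes down to solving the normal equations (XᵀX) b = Xᵀy. A standard-library Python sketch using Gauss-Jordan elimination is below; the data are hypothetical and constructed so that y = 1 + 2·x₁ + 3·x₂ exactly. For real analyses, `lm()` in R or a numerical library is the safer choice.

```python
def ols_coefficients(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination.

    X is a list of rows; include a leading 1 in each row for the intercept.
    """
    p = len(X[0])
    # Build X^T X (p x p) and X^T y (length p)
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    aug = [xtx[i] + [xty[i]] for i in range(p)]  # augmented matrix [X^T X | X^T y]
    for col in range(p):
        # Partial pivoting: use the row with the largest entry in this column
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        # Eliminate this column from every other row
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] for i in range(p)]


# Hypothetical data: columns are [intercept, x1, x2]
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
print([round(b, 6) for b in ols_coefficients(X, y)])  # intercept, b1, b2
```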

Easy-to-Understand Guide to Factorial Experiments and Two-Way ANOVA

Today, I’ll try to explain factorial experiments in the simplest way. When you apply multiple factors simultaneously to obtain experimental results, the experiment is called a factorial experiment. The different treatments within the experiment are referred to as ‘factorials.’ In other words, a factorial is a combination of factor levels. [Note 1] A factorial experiment is a research …
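Because each treatment in a full factorial picks one level from every factor, the full set of treatments is the Cartesian product of the factor levels. The Python sketch below lists them with `itertools.product`; the factor names and levels are hypothetical.

```python
from itertools import product


def factorial_treatments(**factors):
    """Return every treatment combination of a full factorial design.

    Each keyword is a factor name and its value is the list of levels;
    each combination picks one level from every factor.
    """
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


# Hypothetical 2 x 3 factorial: 2 varieties x 3 nitrogen rates = 6 treatments
treatments = factorial_treatments(variety=["A", "B"], nitrogen=[0, 50, 100])
for t in treatments:
    print(t)
```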

Two-Way ANOVA Tutorial Using SAS Studio

In this post, I will show how to perform a two-way ANOVA using SAS Studio. Here is the data that you have available: Upload this Excel file to SAS Studio. After uploading it, create a data table named “EXP1” in My Libraries. Then, click on the EXP1 data table and select the …

Quantifying Phenotypic Plasticity of Crops

Phenotypic plasticity refers to the ability of an individual organism, in this case, a plant, to display varying phenotypic traits or characteristics in response to different environmental conditions. These traits can include physical features, physiological processes, and behaviors. Phenotypic plasticity is a crucial adaptive mechanism that allows organisms to optimize their survival and reproduction in …

Statistical Inference on Binomially Distributed Data

The primary purpose of our experiments is to test hypotheses about the population under study. Therefore, the experimenter must decide whether to accept or reject these hypotheses based on the experiment’s results. In this context, the method of statistical analysis varies depending on whether the sample data follows a …
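When the data are counts of successes out of n trials, an exact binomial test computes the probability, under H₀, of a result at least as extreme as the one observed. The one-sided Python sketch below uses `math.comb` (Python 3.8+); the seed-germination numbers are hypothetical, not from the post.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_test_greater(k, n, p0):
    """One-sided exact p-value: P(X >= k) when H0: p = p0 is true."""
    return sum(binomial_pmf(i, n, p0) for i in range(k, n + 1))


# Hypothetical: 9 of 10 seeds germinate; H0: germination probability = 0.5
p_value = binomial_test_greater(9, 10, 0.5)
print(round(p_value, 4))  # 11/1024, about 0.0107
```

With a p-value of about 0.011, H₀ would be rejected at the 5% level in this made-up example.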

R-Squared Calculation in Linear Regression with Zero Intercept

Previously, I scanned wheat grains to obtain the area of each grain and then measured the weight of each grain in order to develop a model equation relating weight to area. The following regression shows the relationship between grain area and weight. You can download this data from Kaggle (https://www.kaggle.com/datasets/agronomy4future/wheat-grain-area-vs-weight). Alternatively, you …
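For a line forced through the origin (y = b·x), the least-squares slope is b = Σxᵢyᵢ / Σxᵢ², and R’s `summary.lm` computes R² for such a model against the uncentered total sum of squares Σyᵢ² instead of Σ(yᵢ − ȳ)². The Python sketch below, with hypothetical values rather than the Kaggle grain data, returns both versions so the difference is visible; the centered version can even turn negative for a poor no-intercept fit.

```python
def slope_through_origin(x, y):
    """Least-squares slope for y = b * x with no intercept: b = sum(x*y) / sum(x**2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def r_squared_no_intercept(x, y):
    """Return (uncentered R^2, centered R^2) for the no-intercept fit."""
    b = slope_through_origin(x, y)
    sse = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
    sst_about_zero = sum(yi ** 2 for yi in y)            # used when there is no intercept
    y_bar = sum(y) / len(y)
    sst_about_mean = sum((yi - y_bar) ** 2 for yi in y)  # used when there is an intercept
    return 1 - sse / sst_about_zero, 1 - sse / sst_about_mean


# Hypothetical grain areas (x) and weights (y)
area = [1, 2, 3]
weight = [2, 4, 7]
print(r_squared_no_intercept(area, weight))
```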

[STAT article] Two-Way ANOVA: An Essential Tool for Understanding Factorial Experiments

A factorial experiment involves the simultaneous manipulation of multiple factors or independent variables (x) to study their effects on a dependent variable (y). The experiment is called factorial because it involves testing multiple factors simultaneously. In factorial experiments, the combination of the different levels of each factor being tested is called a factorial, and each …

[STAT Article] What is the statistical method for comparing whether the slopes and y-intercepts in a regression model are the same or not (Feat. ANCOVA using R and SAS)?

To gain a basic understanding of the topic, I recommend reading the following post first: Analysis of Covariance (ANCOVA). I have a dataset, shown below, and I would like to analyze crop yield and height under different fertilizer types (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 …

What is the F-ratio in statistics?

Today, I will explain what the F-value means when testing for significance. Let me give you an example. Suppose we want to determine whether yield differs among varieties (A, B, C). There are 12 experimental units in total (3 varieties × 4 replicates). What would happen …
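For a one-way layout with 3 varieties × 4 replicates, the F-ratio is the between-variety mean square divided by the within-variety (error) mean square, with 3 − 1 = 2 and 12 − 3 = 9 degrees of freedom. The yields in this Python sketch are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """F-ratio for a one-way ANOVA: MS_between / MS_within.

    groups: one list of observations per treatment (e.g. per variety).
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]
    # Variation of the variety means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of replicates around their own variety mean (experimental error)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical yields for varieties A, B, C with 4 replicates each
yields = [[10, 11, 12, 11], [13, 14, 13, 14], [9, 8, 10, 9]]
print(one_way_f(yields))
```

Here the variety means (11, 13.5, 9) differ far more than the replicates within each variety, so F is large (36.6); if the varieties had no effect, F would be expected to be near 1.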