Statistics Archives - Agronomy4future

Confidence interval (CI) formula for a two-sample t-test

December 12, 2025 JK

When performing a t-test, a confidence interval can be obtained. Below, I describe how to calculate the confidence interval manually, step by step: 1) Difference in means 2) Pooled variance 3) Standard error 4) Degrees of freedom 5) Confidence interval. Let’s calculate the confidence interval.

We aim to develop open-source code for agronomy. © 2022 – 2025 https://agronomy4future.com – All Rights Reserved. Last Updated: 12/12/2025
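The five steps listed for the two-sample confidence interval can be sketched in Python as follows. This is a minimal illustration, not the code from the post: the function name, the sample values, and the choice to pass the critical t value in by hand (here 2.306, the two-sided 95% value for 8 degrees of freedom) are all assumptions made for the example.

```python
import math
from statistics import mean, variance

def pooled_two_sample_ci(sample1, sample2, t_crit):
    """CI for mean(sample1) - mean(sample2), assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    # 1) Difference in means
    diff = mean(sample1) - mean(sample2)
    # 2) Pooled variance: sample variances weighted by their degrees of freedom
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    # 3) Standard error of the difference
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    # 4) Degrees of freedom
    df = n1 + n2 - 2
    # 5) Confidence interval: difference plus or minus t_crit * SE
    return diff - t_crit * se, diff + t_crit * se, df

# Hypothetical data, for illustration only
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.5, 4.8, 4.1, 4.4]

# t_crit must match df and the confidence level (taken from a t-table)
lower, upper, df = pooled_two_sample_ci(group_a, group_b, t_crit=2.306)
print(f"df = {df}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

If the interval excludes zero, the two means differ at the chosen confidence level, which matches the conclusion of the corresponding equal-variance t-test.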

Understanding Bayes’ Theorem Step by Step

December 8, 2025 JK

Recently, I’ve been focusing on Bayesian statistics. To organize the concepts for myself as well, I’m going to explain Bayes’ theorem as simply as possible. Let’s import a dataset from Kaggle. https://www.kaggle.com/datasets/cameronseamons/electronic-sales-sep2023-sep2024 This dataset contains information used to analyze customer purchasing behavior at an electronics store. You can download it from Kaggle after creating an account. I’ll load the data directly using Python code. I’m using Google Colab. This is how I imported the dataset directly from Kaggle. For more…
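As a small illustration of the theorem itself, P(A|B) = P(B|A) · P(A) / P(B), here is a Python sketch. The probabilities are made-up numbers, not values from the Kaggle dataset, and the function and argument names are hypothetical.

```python
def posterior(prior, likelihood, likelihood_given_not):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # P(B) by the law of total probability
    evidence = likelihood * prior + likelihood_given_not * (1 - prior)
    # Bayes' theorem
    return likelihood * prior / evidence

# Hypothetical numbers, for illustration only:
# P(A) = 0.3, P(B|A) = 0.8, P(B|not A) = 0.2
p = posterior(prior=0.3, likelihood=0.8, likelihood_given_not=0.2)
print(round(p, 4))
```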

How to analyze quadratic plateau model in R Studio?

November 7, 2025 JK

Previous post: □ How to analyze linear plateau model in R Studio?

In my previous post, I explained how to analyze the linear plateau model. I simulated yield data for five different crop varieties with varying sulphur applications and suggested that the optimum sulphur application would be 23.3 kg/ha based on the linear plateau model. In this post, I’ll explain how to analyze the quadratic plateau model using the same data in R Studio. 1) Data upload If you run the…
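For orientation, the shape of a quadratic plateau curve can be sketched in Python. This assumes the common parameterization in which the response follows a + b·x + c·x² up to the join point x0 = −b / (2c) and stays constant beyond it. The coefficients below are hypothetical, and the sketch only evaluates the curve; it does not fit it to data as the R analysis in the post does.

```python
def quadratic_plateau(x, a, b, c):
    """Evaluate a quadratic plateau curve (expects b > 0 and c < 0)."""
    # Join point: the vertex of the parabola, where the slope reaches zero
    x_join = -b / (2 * c)
    if x < x_join:
        return a + b * x + c * x ** 2
    # Beyond the join point the response stays at the plateau value
    return a + b * x_join + c * x_join ** 2

# Hypothetical coefficients, for illustration only
a, b, c = 1000.0, 40.0, -1.0
for rate in (0, 10, 20, 50):
    print(rate, quadratic_plateau(rate, a, b, c))
```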

What is the Gamma Distribution? Shape and Scale Parameters, and the Probability Density Function (PDF)

November 3, 2025 JK

The Gamma distribution is a flexible family of continuous probability distributions defined only for positive values (x > 0). It’s commonly used to model quantities that represent time, size, or waiting periods: anything that can’t go below zero and often shows right-skewed behavior (a long tail toward larger values). In essence, the Gamma distribution describes how likely different positive values are to occur, determined by two key parameters: the shape (α) and the scale (θ). Together, these parameters control the curve’s form and…
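The probability density function named in the title can be written out directly. Below is a minimal Python sketch of the standard shape–scale form, f(x) = x^(α−1) · e^(−x/θ) / (Γ(α) · θ^α); the function name and the example parameter values are chosen only for illustration.

```python
import math

def gamma_pdf(x, shape, scale):
    """Gamma density in the shape (alpha) / scale (theta) parameterization."""
    if x <= 0:
        return 0.0  # the density is zero outside x > 0
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))

# Hypothetical parameters: shape = 2, scale = 1.5
for x in (0.5, 1.0, 3.0, 6.0):
    print(x, round(gamma_pdf(x, shape=2.0, scale=1.5), 4))
```

With shape = 1 the expression reduces to the exponential density, which is a quick way to sanity-check the function.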

[R package] Streamlined Mixed-Effects Analysis for Agrivoltaics Experiments (Feat. agrivoltaics)

August 3, 2025 JK

In my previous post, I suggested different statistical models for agrivoltaics studies to explain why we should consider Linear Mixed Models in this field. In many agrivoltaics studies, researchers overlook the actual experimental layout and analyze data using split-plot or RCBD models, focusing only on treatment variables (e.g., inside vs. outside the solar panel array). However, this approach does not accurately reflect real field conditions.

■ [STAT Article] Statistical Models in Agrivoltaics: Linear Mixed Models Across Different Field Layouts

Randomization…

[STAT Article] Statistical Models in Agrivoltaics: Linear Mixed Models Across Different Field Layouts

June 28, 2025 JK

Agrivoltaics is the study and practice of combining agriculture and solar energy production on the same land. The core idea is to install solar panels above or among crops, allowing for simultaneous food and energy production. Generally, an agrivoltaics study investigates how this dual-use approach affects crop growth and yield (due to changes in light, temperature, and moisture), microclimate conditions under the panels, solar panel efficiency influenced by vegetation, as well as land-use efficiency, economic outcomes, and sustainability metrics. Today,…

[STAT Article] How to calculate reaction norm in crop physiology?

June 18, 2025 JK

□ Quantifying Phenotypic Plasticity of Crops

In my previous post, I explained how to quantify phenotypic plasticity in crops and described three different approaches: 1) Responsiveness, 2) Reaction Norm, and 3) the Finlay-Wilkinson Regression Model. Responsiveness is calculated as (Treatment − Control) / Control. It indicates how the dependent variable (e.g., yield) responds to a given treatment relative to the control. The responsiveness value can range from −1 (complete reduction) to values greater than 0, depending on the magnitude of…
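The responsiveness formula, (Treatment − Control) / Control, is simple enough to check with a few lines of Python. The yield values below are hypothetical and the function name is chosen for this example.

```python
def responsiveness(treatment, control):
    """(Treatment - Control) / Control."""
    if control == 0:
        raise ValueError("control must be non-zero")
    return (treatment - control) / control

# Hypothetical yields, for illustration only
print(responsiveness(treatment=0.0, control=5.0))   # complete reduction: -1.0
print(responsiveness(treatment=7.5, control=5.0))   # 50% increase: 0.5
```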

[STAT Article] RMSE Calculation with Excel and R: A Comprehensive Guide

March 21, 2025 JK

When running statistical programs, you might encounter RMSE (Root Mean Square Error). For example, the table below shows RMSE values obtained from SAS, indicating that it is ca. 2.72. I’m curious about how RMSE is calculated. Below is the equation for RMSE. First, calculate the difference between the estimated and observed values: (ŷi – yi), and then square the difference: (ŷi – yi)². Second, calculate the sum of squares: Σ(ŷi – yi)². Third, divide the sum of squares by the…
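The steps described so far (difference, square, sum, divide, square root) can be sketched in Python. The excerpt is cut off before it names the divisor, so the sketch leaves it as an explicit argument. Defaulting to the number of observations is an assumption of this example; statistical packages that report a "Root MSE" typically divide by the error degrees of freedom instead. The data values are hypothetical.

```python
import math

def rmse(predicted, observed, divisor=None):
    """Root of (sum of squared errors / divisor)."""
    # Steps 1-2: squared differences, then their sum
    sse = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    # Step 3: divide (assumption: number of observations unless told otherwise)
    if divisor is None:
        divisor = len(predicted)
    # Final step: square root
    return math.sqrt(sse / divisor)

# Hypothetical values, for illustration only
predicted = [2.0, 4.0, 6.0]
observed = [1.0, 4.0, 8.0]
print(rmse(predicted, observed))             # divides by n = 3
print(rmse(predicted, observed, divisor=2))  # e.g. an error df of 2
```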

What is split-plot design in agronomy research?

March 21, 2025 JK

Split-plot design has been widely used, particularly in agronomy research. In a split-plot design, the experimental units are divided into smaller units. Split-plot designs are useful when some factors are difficult or expensive to change or when the levels of the factors cannot be randomized (I’ll explain in detail later). Split-plot design consists of one whole plot and one subplot. The whole plot factor is randomly assigned to the experimental units, while the subplot factor is applied to a smaller…

[STAT Article] Steps to Calculate Log-Likelihood Prior to AIC and BIC: [Part 2] ANOVA model

November 15, 2024 JK

In my previous post, I explained how to calculate the Log-Likelihood, AIC, and BIC in a regression model. In this post, I will demonstrate the same concepts, but in the context of an ANOVA model. Here I have one dataset. Let’s say this data represents yield in response to different fertilizer types (Control, Slow, and Fast), and I want to determine the effect of fertilizer type on yield. Therefore, I will perform a one-way ANOVA. Now, I observe that the…

[STAT Article] Steps to Calculate Log-Likelihood Prior to AIC and BIC: [Part 1] regression model

November 8, 2024 JK

Here I have one dataset. I want to predict grain weight using grain dimension data such as length, width, and area, and identify the best prediction model for estimating grain weight. As a result, I developed the following models, and I’ll calculate the log-likelihood for each model. To do that, I need to know each model equation. Now that I have each model equation, I’ll calculate the log-likelihood. For a linear regression model, the Log-Likelihood (LL) is defined as: where: n is the…
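The formula itself did not survive in this excerpt. As a hedged sketch, the usual Gaussian log-likelihood for a linear regression is LL = −(n/2)·ln(2π) − (n/2)·ln(σ²) − SSE / (2σ²), with σ² estimated as SSE / n. The Python below computes it from residuals and then derives AIC and BIC. The residual values are hypothetical, and whether the variance counts as one of the k parameters is a convention that differs between software packages, so k is passed in explicitly.

```python
import math

def log_likelihood(residuals):
    """Gaussian log-likelihood using the ML variance estimate SSE / n."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)
    sigma2 = sse / n  # assumption: maximum-likelihood estimate of the variance
    return (-n / 2 * math.log(2 * math.pi)
            - n / 2 * math.log(sigma2)
            - sse / (2 * sigma2))

def aic(ll, k):
    """AIC = 2k - 2LL, with k the number of estimated parameters."""
    return 2 * k - 2 * ll

def bic(ll, k, n):
    """BIC = k * ln(n) - 2LL."""
    return k * math.log(n) - 2 * ll

# Hypothetical residuals from a fitted model, for illustration only
residuals = [1.0, -1.0, 2.0, -2.0]
ll = log_likelihood(residuals)
print(ll, aic(ll, k=3), bic(ll, k=3, n=len(residuals)))
```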

[STAT Article] Step-by-Step Guide to Calculating and Analyzing Principal Component Analysis (PCA) by Hand

November 1, 2024 JK

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction while preserving as much variability in the data as possible. It transforms the original variables in a dataset into a new set of uncorrelated variables called principal components, ordered by the amount of variance they capture from the original dataset. Here are the steps of Principal Component Analysis (PCA). 1. Standardize the Data: Since PCA is affected by the scale of the variables, it often begins with standardizing the…
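To make the idea concrete, here is a small Python sketch of PCA on just two variables, where the eigen-decomposition can be written in closed form: after standardizing, the correlation matrix [[1, r], [r, 1]] has eigenvalues 1 + r and 1 − r. Only the standardization step is visible in the excerpt; the remaining steps follow the standard textbook procedure, and the data and function names are hypothetical.

```python
import math
from statistics import mean, stdev, variance

def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pca_two_variables(x1, x2):
    z1, z2 = standardize(x1), standardize(x2)
    n = len(z1)
    # Correlation between the two standardized variables
    r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)
    # Eigenvalues of [[1, r], [r, 1]], largest first
    eigenvalues = [1 + abs(r), 1 - abs(r)]
    # First eigenvector is (1, 1)/sqrt(2) if r >= 0, else (1, -1)/sqrt(2)
    sign = 1.0 if r >= 0 else -1.0
    w = 1 / math.sqrt(2)
    pc1_scores = [w * a + sign * w * b for a, b in zip(z1, z2)]
    return eigenvalues, pc1_scores

# Hypothetical grain length and width (mm), for illustration only
length = [6.1, 6.5, 5.9, 6.8, 6.3]
width = [3.0, 3.3, 2.9, 3.4, 3.1]
eigenvalues, pc1_scores = pca_two_variables(length, width)
print("share of variance on PC1:", eigenvalues[0] / sum(eigenvalues))
```

With more than two variables the eigenvalues no longer have this closed form, and a numerical linear-algebra routine would be needed instead.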

Understanding Mean Absolute Error (MAE) in ANOVA: A Step-by-Step Guide to Calculation in Excel

October 27, 2024 JK

Mean Absolute Error (MAE) is a metric used to measure the accuracy of a model’s predictions. It calculates the average magnitude of the errors in a set of predictions, without considering their direction. In other words, MAE measures the average absolute difference between the actual values and the predicted values. MAE is typically used in the context of regression analysis and prediction error evaluation, rather than in ANOVA (Analysis of Variance), which focuses on comparing the means of different groups….
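The definition above, the average absolute difference between actual and predicted values, translates directly into Python. The numbers are hypothetical and the function name is chosen for this example.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all observations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values, for illustration only
actual = [3.0, 5.0, 8.0]
predicted = [2.0, 7.0, 8.0]
print(mean_absolute_error(actual, predicted))  # (1 + 2 + 0) / 3 = 1.0
```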

Stepwise Regression: A Practical Approach for Model Selection using R

May 7, 2024 JK

Stepwise selection, forward selection, and backward elimination are all methods used in the context of building statistical models, specifically regression models, where the goal is to select the most relevant predictors. In this section, I’ll introduce them one by one. Let’s generate one dataset. This dataset includes grain yield data, along with measurements of stem biomass, grain weight (agw), and grain number (gn). I would now like to determine which variables are the most critical factors in influencing the final grain…

A Practical Approach to Linear Mixed-Effects Modeling in R

December 28, 2023 JK

A Linear Mixed-Effects Model (LMM) is a statistical model that combines both fixed effects and random effects to analyze data with repeated measurements or hierarchical structure. Let’s break down the key components and concepts of a Linear Mixed-Effects Model: 1) Fixed Effects: 2) Random Effects: 3) Linear Mixed-Effects Model Equation: The general equation of a Linear Mixed-Effects Model can be written as: Y = Xβ + Zb + ε 4) Estimation: In summary, Linear Mixed-Effects Models are a powerful statistical tool…

Understanding Multiple Linear Regression Easily (Part 2: Calculating the Coefficient of Determination Manually)

September 14, 2023 JK

□ Understanding Multiple Linear Regression Easily (Part 1: Calculating the Regression Equation Manually)

In the previous post, we explained how to manually calculate the regression equation in multiple linear regression analysis. Now, in this post, I will explain how to calculate the coefficient of determination (R²) in multiple linear regression analysis.

No.  Yield (yi)  Time (xi1)  Moisture (xi2)
1    4.3         4           0.2
2    5.5         5           0.2
3    6.8         6           0.2
4    8.0         7           0.2
5    4.0         4           0.3
6    5.2…

Understanding Multiple Linear Regression Easily (Part 1: Calculating the Regression Equation Manually)

September 14, 2023 JK

In my previous posts, I explained the simple linear regression model in five parts. I recommend reading the following posts first.

□ Simple linear regression (1/5)- correlation and covariance
□ Simple linear regression (2/5)- slope and intercept of linear regression model
□ Simple linear regression (3/5)- standard error of slope and intercept
□ Simple linear regression (4/5)- t value on the slope and intercept
□ Simple linear regression (5/5)- R_squared

In this session, I will explain multiple regression analysis. Multiple regression analysis refers to…

Step-by-Step Guide: Uploading Data and Conducting Statistical Analysis in SAS Studio

August 28, 2023 JK

SAS Studio is a web version of the SAS program, and it can be used for free. As my current license for the statistical program I’ve been using is about to expire, I was searching for alternatives. Upon discovering SAS Studio, I decided to give it a try. Although I have never used SAS before, I’ve decided to take this opportunity to learn. I will now summarize the very basic learning materials I have covered up to this point. First,…

Easy-to-Understand Guide to Factorial Experiments and Two-Way ANOVA

August 27, 2023 JK

Today, I’ll try to explain factorial experiments in the simplest way. When you apply multiple different factors simultaneously to derive experimental results, it’s called factorial experiments. The different treatments within the experiment are referred to as ‘factorials.’ In other words, a factorial is a combination of factors. [Note 1] A factorial experiment is a research design in which multiple independent variables, also known as factors, are manipulated simultaneously to analyze their combined effects on a dependent variable. The goal of…

Two-Way ANOVA Tutorial Using SAS Studio

August 27, 2023 JK

I will introduce how to perform a Two-Way ANOVA analysis using SAS Studio. Here is the data that you have available: Upload this Excel file to SAS Studio. After uploading the Excel file to SAS Studio, create a data table named “EXP1” in My Libraries. Then, click on the EXP1 data table. Then, select the icon for generating code located at the top. By doing so, a new tab named “Program 1” will be created, allowing you to generate the…

Quantifying Phenotypic Plasticity of Crops

August 27, 2023 JK

Phenotypic plasticity refers to the ability of an individual organism, in this case, a plant, to display varying phenotypic traits or characteristics in response to different environmental conditions. These traits can include physical features, physiological processes, and behaviors. Phenotypic plasticity is a crucial adaptive mechanism that allows organisms to optimize their survival and reproduction in varying environments. Crops are particularly reliant on phenotypic plasticity to cope with changes in factors such as light, temperature, moisture, nutrient availability, and other environmental…

Statistical Inference on Binomially Distributed Data

August 27, 2023 JK

The primary purpose of our experiment is to validate hypotheses regarding the population of the subjects under study. As a result, the experimenter must determine whether to accept or reject these hypotheses based on the experiment’s results. In this context, the method of statistical analysis will vary depending on whether the sample data follows a normal distribution or a binomial distribution. Today, we will introduce statistical testing methods for data that conform to a binomial distribution. Let’s delve into an…

[STAT Article] Easy Guide to Cook’s Distance Calculation Using Excel and R

May 14, 2023 JK

I have 1,000 data points of measurements of the length (mm) and weight (mg) of wheat grains. With this data, I want to analyze the relationship between the length and weight of the wheat grain to propose an equation model that can predict grain weight. I will draw a graph to visualize the data. If you are new to R, you can copy and paste the following code into your R script window to obtain the same graph as shown…

R-Squared Calculation in Linear Regression with Zero Intercept

May 8, 2023 JK

Previously, I scanned wheat grains to obtain the area of each grain, and then measured the weight of each grain corresponding to its area in order to develop a model equation. The following regression demonstrates the relationship between grain area and weight. You can download this data from Kaggle.

Data download: https://www.kaggle.com/datasets/agronomy4future/wheat-grain-area-vs-weight

Alternatively, you can upload the data directly to R using the code below. You can also download this data to your PC using the code below. Let’s analyze…

[STAT article] Two-Way ANOVA: An Essential Tool for Understanding Factorial Experiments

May 4, 2023 JK

A factorial experiment involves the simultaneous manipulation of multiple factors or independent variables (x) to study their effects on a dependent variable (y). The experiment is called factorial because it involves testing multiple factors simultaneously. In factorial experiments, the combination of the different levels of each factor being tested is called a factorial, and each factorial represents a unique combination of these levels. For instance, N0_Genotyp1, N0_Genotyp2, N1_Genotyp1, N1_Genotyp2, etc. are different factorials used to conduct the experiment and analyze…
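The treatment combinations named above can be enumerated mechanically as the Cartesian product of the factor levels. The Python sketch below assumes two nitrogen levels and two genotypes, which is only a guess based on the four labels shown; the excerpt’s "etc." suggests the real experiment may have more.

```python
from itertools import product

# Assumed factor levels (only these four combinations appear in the text)
nitrogen_levels = ["N0", "N1"]
genotypes = ["Genotyp1", "Genotyp2"]

# Every combination of one nitrogen level with one genotype
treatments = [f"{n}_{g}" for n, g in product(nitrogen_levels, genotypes)]
print(treatments)
# ['N0_Genotyp1', 'N0_Genotyp2', 'N1_Genotyp1', 'N1_Genotyp2']
```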

Augment Models: How to Calculate Contrasts and Analyze Your Data with Excel and R?

April 28, 2023 JK

I have the following data.

Nitrogen  Sulphur  Rep  Yield
0         0        1    1.0
0         0        2    0.9
0         0        3    0.8
N1        S1       1    1.0
N1        S1       2    1.2
N1        S1       3    1.3
N1        S2       1    2.1
N1        S2       2    2.2
N1        S2       3    2.3
N2        S1       1    1.4
N2        S1       2    1.6
N2        S1       3    1.7
N2        S2       1    2.5
N2        S2       2    2.6
N2        S2       3    2.8

Let’s assume that this data is the result of investigating how…

The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package)

April 24, 2023 JK

In this session, I will introduce the method of calculating the Best Linear Unbiased Estimator (BLUE). Instead of simply listing formulas as many websites do to explain BLUE, this post aims to help readers understand the process of calculating BLUE with an actual dataset using R. I have the following data.

location  sulphur (kg/ha)  block  yield
Cordoba   0                1      750
Cordoba   24               1      1250
Cordoba   36               1      1550
Cordoba   48               1      1120
Cordoba   0                2      780
Cordoba   24               2      1280…

[STAT Article] What is the statistical method for comparing whether the slopes and y-intercepts in a regression model are the same or not (Feat. ANCOVA using R and SAS)?

April 7, 2023 JK

To gain a basic understanding of the topic, I recommend reading the following post.

Analysis of Covariance (ANCOVA)

I have a dataset as shown below, and I would like to analyze crop yield and height based on different fertilizer types (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 replicates.

Rep  Fertilizer  Yield  Height  Fertilizer  Yield  Height  Fertilizer  Yield  Height
1    Control     12.2   45.0    Slow        16.6   63.0    Fast        9.5    52.0
2    Control     12.4   52.0…

What is the F-ratio in statistics?

March 31, 2023 JK

Today, I will explain the meaning of the F-value in testing for significance through statistical processing. Let me give you an example. Suppose we want to determine whether there are differences in the yield according to the varieties (A, B, C). The total number of experimental units is 12 (3 varieties x 4 replicates). What would happen if there is a significant difference in yield between varieties A and C? If there is a large difference in yield between these varieties, the…

Simple linear regression (5/5)- Coefficient of determination

March 28, 2023 JK

Here is data for x and y. I would like to perform regression analysis to understand how y changes with x.

n  x   y
1  10  30
2  20  40
3  30  50
4  40  80
5  50  90
6  60  100
7  70  120

I have data for x and y as described above, and want to determine the regression model for this data, where the dependent variable y changes according to the independent variable x, in the form…
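Using the seven (x, y) pairs listed above, the least-squares line and its coefficient of determination can be computed by hand in a few lines of Python. This is a minimal sketch of the standard formulas (slope = Sxy / Sxx, R² = 1 − SS_residual / SS_total), not necessarily the exact sequence of steps the full post follows; the variable names are chosen for this example.

```python
x = [10, 20, 30, 40, 50, 60, 70]
y = [30, 40, 50, 80, 90, 100, 120]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products around the means
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# R-squared: share of the total variation explained by the fitted line
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
r_squared = 1 - ss_residual / ss_total

print(f"y = {intercept:.3f} + {slope:.3f}x, R^2 = {r_squared:.4f}")
```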
