R programming Archives - Agronomy4future

[R package] Quantifying Reaction Norm Plasticity from Slopes to Individual Responses (Feat. nrmodel)

June 23, 2025 JK

This post is part of a series introducing R packages that present methods for calculating phenotypic plasticity.
□ [R package] Finlay-Wilkinson Regression model (feat. fwrmodel)
□ [R package] Calculate the responsiveness of each treatment relative to a control (Feat. deltactrl)
□ [R package] Quantifying Reaction Norm Plasticity from Slopes to Individual Responses (Feat. nrmodel)
In my previous post, I explained how to quantify phenotypic plasticity in crops and described three different approaches: Responsiveness, Reaction Norm, and the Finlay-Wilkinson Regression Model.
□ Quantifying…

[Data article] Visualizing Responsiveness: Integrating Raw Data for a Holistic Dataset View

June 9, 2025 JK

■ [R package] Embedding Key Descriptive Statistics within Original Data (Feat. descriptivestat)
■ [R package] Calculate the responsiveness of each treatment relative to a control (Feat. deltactrl)
In my previous posts, I introduced two R packages. The first package, descriptivestat(), displays raw data along with mean values and additional descriptive statistics. The second package, deltactrl(), calculates the responsiveness of dependent variables relative to the control. In this article, I will demonstrate how combining these two R packages allows…

[R package] Calculate the responsiveness of each treatment relative to a control (Feat. deltactrl)

June 9, 2025 JK

■ Quantifying Phenotypic Plasticity of Crops
In my previous posts, I explained how to quantify phenotypic plasticity of crops in response to environmental factors, and introduced the concept of responsiveness, calculated using the formula: (Treatment – Control) / Control, as shown in the table below.

Genotype  Control  Treatment  Responsiveness
A         100      90         -10.0% = (90-100)/100
B         120      70         -41.7%
C         115      90         -21.7%
D         95       85         -10.5%
E         110      105        -4.5%

While calculating responsiveness for a single variable is relatively…
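The responsiveness formula in this excerpt can be sketched in base R as follows. This is only a minimal illustration using the five genotypes from the table; the data frame name `df` is my own choice, and the code does not use or represent the deltactrl package itself.

```r
# Values copied from the table in the excerpt; the object name `df` is hypothetical
df <- data.frame(
  Genotype  = c("A", "B", "C", "D", "E"),
  Control   = c(100, 120, 115, 95, 110),
  Treatment = c(90, 70, 90, 85, 105)
)

# Responsiveness = (Treatment - Control) / Control, expressed in percent
df$Responsiveness <- round((df$Treatment - df$Control) / df$Control * 100, 1)

print(df)
```

The rounded percentages should match the Responsiveness column of the table.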

[R package] Embedding Key Descriptive Statistics within Original Data (Feat. descriptivestat)

May 18, 2025 JK

When analyzing data, we often need to examine descriptive statistics such as standard deviation, standard error, or the coefficient of variation (CV). These are typically summarized in a short table, but in some cases, it may be necessary to include such statistics directly in the original dataset for further analysis. Let’s upload a dataset to begin. I would like to calculate the mean, variance, standard deviation, standard error, 95% confidence interval, coefficient of variation, and the 25th percentile of yield…
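As a rough sketch of the statistics listed in this excerpt, the base R code below computes each one and attaches it to every row of the original data. The yield values are made up for illustration, the column names are my own, and the 95% confidence interval assumes the usual t-based interval; none of this is the descriptivestat package's interface.

```r
# Hypothetical yield values (illustrative only)
yield <- c(12.1, 13.4, 11.8, 14.2, 12.9, 13.7)

n      <- length(yield)
mean_y <- mean(yield)
var_y  <- var(yield)                      # sample variance (n - 1 denominator)
sd_y   <- sd(yield)                       # sample standard deviation
se_y   <- sd_y / sqrt(n)                  # standard error of the mean
ci_hw  <- qt(0.975, df = n - 1) * se_y    # half-width of the t-based 95% CI (assumption)
cv_y   <- sd_y / mean_y * 100             # coefficient of variation, in percent
q25_y  <- unname(quantile(yield, 0.25))   # 25th percentile (R's default method)

# Embed the statistics next to the raw data; scalars are recycled to every row
stats <- data.frame(
  yield    = yield,
  mean     = mean_y,
  variance = var_y,
  sd       = sd_y,
  se       = se_y,
  ci_lower = mean_y - ci_hw,
  ci_upper = mean_y + ci_hw,
  cv       = cv_y,
  q25      = q25_y
)

print(stats)
```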

How to upload data from Google Drive to Google Colab in an R environment?

May 13, 2025 JK

How to use Google Colab for Python (power tool to analyze data)?
In my previous post, I introduced how to upload data from Google Drive to Google Colab in a Python environment. Google Colab is primarily Python-based, but now we can change the runtime to R and use R code in Google Colab. Today, I will introduce how to upload data from Google Drive to Google Colab in an R environment. First, let’s upload a data file to Google Drive…

[R package] R-Squared Calculation in Simple Linear Regression with Zero Intercept (Feat. Intercept0)

May 11, 2025 JK

In my previous article, I suggested that when the intercept is forced to zero in a simple linear regression model, the usual calculation R² = SSR / SST is incorrect. Instead, when forcing the intercept to zero, R² should be calculated as shown below.
R² = 1 – SSE (when intercept is 0) / SST (when intercept exists)
■ R-Squared Calculation in Linear Regression with Zero Intercept
This is because only the SSE (Sum of Squared Errors) is calculated as Σ(yi – ŷi)², regardless of the presence of an…
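The calculation described in this excerpt could be sketched in base R like this. The x and y values are hypothetical, the object names are my own, and the code is a plain lm() illustration rather than the Intercept0 package. For comparison, it also extracts the R² that summary() reports for a no-intercept model, which uses the uncentered total sum of squares Σy².

```r
# Hypothetical data (illustrative only)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.3, 4.1, 6.8, 8.2, 10.9, 12.1)

fit0 <- lm(y ~ 0 + x)              # regression forced through the origin
sse0 <- sum(residuals(fit0)^2)     # SSE from the zero-intercept fit
sst  <- sum((y - mean(y))^2)       # SST about the mean, as in a model with an intercept

r2_zero <- 1 - sse0 / sst          # R-squared as defined in the excerpt

# For comparison: summary() on a no-intercept model uses sum(y^2) as the total
r2_default <- summary(fit0)$r.squared

print(c(r2_zero = r2_zero, r2_default = r2_default))
```

Because Σy² is never smaller than the SST about the mean, the default value from summary() is at least as large as the one from the formula above.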

How to download data from R environment in Google Colab?

May 5, 2025 JK

How to Upload Data from GitHub Using R and Python?

March 21, 2025 JK

I have soybean yield data that I want to upload to GitHub and access from R. First, let’s upload the data to GitHub. The data should be in .csv format. Click Add file, choose Upload files, and, after uploading, select the Raw button to view the data in .csv format as text. You can then find the address for this data, starting with https://raw.githubusercontent.com/… Let’s copy this address. Next, I’ll bring this data into R from GitHub. Before that, let’s…

[STAT Article] RMSE Calculation with Excel and R: A Comprehensive Guide

March 21, 2025 JK

When running statistical programs, you might encounter RMSE (Root Mean Square Error). For example, the table below shows RMSE values obtained from SAS, indicating that it is ca. 2.72. I’m curious about how RMSE is calculated. Below is the equation for RMSE. First, calculate the difference between the estimated and observed values: (ŷi – yi), and then square the difference: (ŷi – yi)². Second, calculate the sum of squares: Σ(ŷi – yi)². Third, divide the sum of squares by the…
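The steps listed in this excerpt can be followed in base R roughly as below. The observed and predicted values are made up, and because the excerpt is cut off before naming the divisor, dividing by the number of observations n is my assumption. Regression output that reports a root MSE typically divides by the error degrees of freedom instead, which gives a slightly larger value.

```r
# Hypothetical observed and predicted values (illustrative only)
observed  <- c(10.2, 11.5, 9.8, 12.1, 10.9)
predicted <- c(10.0, 11.9, 10.1, 11.6, 11.2)

# Step 1: difference between estimated and observed values, then squared
sq_diff <- (predicted - observed)^2

# Step 2: sum of squares
ss <- sum(sq_diff)

# Step 3: divide by n (assumption; the excerpt is truncated here) and take the square root
n    <- length(observed)
rmse <- sqrt(ss / n)

print(rmse)
```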

What is split-plot design in agronomy research?

March 21, 2025 JK

Split-plot designs are widely used, particularly in agronomy research. In a split-plot design, the experimental units are divided into smaller units. Split-plot designs are useful when some factors are difficult or expensive to change or when the levels of the factors cannot be randomized (I’ll explain in detail later). A split-plot design consists of a whole plot and a subplot. The whole-plot factor is randomly assigned to the experimental units, while the subplot factor is applied to a smaller…

[R Package] Convert Data into Code Instantly – Save as a Script with One Line

March 6, 2025 JK

When uploading data to R, we sometimes worry about losing track of the data over time. This is because we save data in different folders according to various projects, and we might forget where we stored it. Additionally, if the file path changes, it can be difficult to find the file again and load the data directly. Therefore, a better approach is to save the data as code, allowing us to access it directly when opening the R file where…

[R package] An easy way to use interpolation code to predict in-between data points

March 2, 2025 JK

In my previous post, I explained how to calculate interpolation to predict in-between data points. ■ [Data article] Predicting Intermediate Data Points with Linear Interpolation in Excel and R To make interpolation calculations easier, particularly for groups, I recently developed a new R package, interpolate(). First, let’s upload a dataset. This dataset contains chlorophyll content measurements for sorghum and soybean. I measured chlorophyll content every 10 days between 65 and 125 days after sowing, with four replicates at each time…
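For readers who want to see the underlying idea, linear interpolation between measured time points can be sketched with base R's approx(). The chlorophyll values below are invented, and only the sampling days (every 10 days from 65 to 125 days after sowing) follow the excerpt. This is not the interpolate() package's interface.

```r
# Measurement days as described in the excerpt: 65, 75, ..., 125 days after sowing
das <- seq(65, 125, by = 10)

# Hypothetical chlorophyll readings, one per measurement day (illustrative only)
chlorophyll <- c(38.2, 41.5, 44.1, 45.0, 43.2, 39.8, 35.1)

# Days that were not measured and need an in-between estimate
target_das <- c(70, 100, 120)

# Linear interpolation between the two neighbouring measurements
est <- approx(x = das, y = chlorophyll, xout = target_das)$y

print(data.frame(das = target_das, chlorophyll_est = est))
```

Each target day here lies halfway between two measurement days, so each estimate is simply the average of its two neighbours.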

How to Rename Variables within Columns in R (feat. case_when() code)?

February 28, 2025 JK

In my previous post, I introduced how to change variable names within columns. In the post, I provided simple code to rename variables and also used the stringr package for renaming variables. ■ How to Rename Variables within Columns in R? Today I’ll introduce another way to rename variables, using the dplyr package. In my previous post, using the simple data above, I introduced how to rename variables. For example, we can rename variables using the code below. Or,…

How to automatically insert linear regression equation in graph in RSTUDIO?

February 11, 2025 JK

Sometimes, we need to insert a linear regression equation inside a graph, but it’s tedious to type the equation every time we generate a linear regression graph. Using stat_poly_eq(), we can automatically insert a linear regression equation. Let’s generate one data frame. Then, I’ll generate a regression graph. Now let’s analyze a linear regression. The linear model equation is y = 9.1429 + 1.5357x and R² is 0.9245. Now I’ll insert this equation automatically using stat_poly_eq(). I’ll add the…

Practices in Data Normalization using normtools() in R

October 13, 2024 JK

■ [R package] Normalization Methods for Data Scaling (Feat. normtools) In my previous post, I introduced the R package normtools(), which I developed to normalize data using various methods. This time, I’ll demonstrate how to use the R package normtools() for data normalization. 1. Data upload This data includes kernel number (KN), average kernel weight (AGW), and grain yield (GY) for different corn varieties across various years, populations, and locations. 2. Data normalization This is the normtools() package. First, I’ll…

[R package] Normalization Methods for Data Scaling (Feat. normtools)

October 7, 2024 JK

■ [Data article] Data Normalization Techniques: Excel and R as the Initial Steps in Machine Learning In my previous post, I explained how to normalize data using various methods and demonstrated how to perform the calculations for each method. To simplify these calculations, I recently developed an R package that easily generates normalized data. 1. Install the normtools() package 2. Basic code format 3. Practice with actual dataset (data upload) 4. Normalize data 4.1. Z-test normalization 4.2. Robust Scaling 4.3….
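To give a feel for two of the methods in this outline, here is a small base R sketch. It assumes that item 4.1 refers to the usual z-score standardization, (x − mean) / sd, and that robust scaling means (x − median) / IQR. The input values are hypothetical, and the code does not call normtools.

```r
# Hypothetical values with one large outlier (illustrative only)
x <- c(8.2, 9.5, 10.1, 11.4, 12.0, 25.3)

# Z-score standardization: centre on the mean, scale by the standard deviation
z_score <- (x - mean(x)) / sd(x)

# Robust scaling: centre on the median, scale by the interquartile range,
# so the outlier has less influence on the centre and spread
robust <- (x - median(x)) / IQR(x)

print(data.frame(x, z_score, robust))
```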

R GIS: Interpolating and Plotting Corn Grain Yield Data

September 22, 2024 JK

■ Python GIS: Interpolating and Plotting Corn Grain Yield Data
In my previous post, I explained how to create a GIS map using Python. Today, I’ll introduce how to create the same GIS map using R. First, let’s install all the required packages. Then I’ll upload a dataset for practice. Next, I’ll extract the columns for latitude, longitude, and y (output), and interpolate the data. Finally, I’ll create a GIS map using ggplot(). Full code If you copy and paste the…

Graphing Normal Distributions with Varied Variances

September 22, 2024 JK

I want to create a normal distribution graph with a specific variance. First, it’s necessary to create the data. I’ll generate data with a mean of 100 and a variance of 100 (which means the standard deviation is 10). However, it’s important to establish a range. To do this, I’ll set up a range of 6σ, and the dataset will contain 1,000 rows. Then I’ll create a normal distribution graph. These are graphs with different variances, ranging from 1σ to…
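The data set described in this excerpt might be built along these lines in base R. I am assuming that "a range of 6σ" means a total width of 6σ centred on the mean (mean ± 3σ); if the post intended mean ± 6σ, only the two multipliers change. The object names are my own.

```r
mu    <- 100          # mean, from the excerpt
sigma <- sqrt(100)    # variance of 100, so the standard deviation is 10

# 1,000 evenly spaced x values covering a total width of 6 sigma (assumption: mean +/- 3 sigma)
x <- seq(mu - 3 * sigma, mu + 3 * sigma, length.out = 1000)

# Normal density at each x value
dist <- data.frame(x = x, density = dnorm(x, mean = mu, sd = sigma))

# Draw the normal distribution curve
plot(dist$x, dist$density, type = "l", xlab = "x", ylab = "Density")
```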

[R package] Calculation for Growing Degree Days (GDDs, ºCd)

September 14, 2024 JK

Growing Degree Days (GDDs) are a measure of heat accumulation used to predict crop development rates. GDDs provide a simple model to estimate the growth and development of plants, especially crops, based on the daily temperature. To calculate GDDs, the base temperature for each crop should first be identified. The base temperature is the temperature below which crop growth is minimal or stops. This temperature varies by crop. For example,…
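As a hedged illustration of the idea, the sketch below uses the common averaging method, daily GDD = max((Tmax + Tmin) / 2 − Tbase, 0), and accumulates it over days. The excerpt does not show which formula the package uses, so this method, the base temperature of 10 °C, and the temperature values are all assumptions for illustration.

```r
# Hypothetical daily maximum and minimum air temperatures in degrees C
tmax <- c(24, 27, 30, 18, 9)
tmin <- c(12, 15, 17, 8, 3)

t_base <- 10   # assumed base temperature; the real value depends on the crop

# Daily GDD by the simple averaging method; days below the base temperature count as 0
daily_gdd <- pmax((tmax + tmin) / 2 - t_base, 0)

# Accumulated GDDs over the period
cum_gdd <- cumsum(daily_gdd)

print(data.frame(tmax, tmin, daily_gdd, cum_gdd))
```

On the last day the mean temperature (6 °C) is below the assumed base temperature, so that day adds nothing to the accumulated total.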

[R package] Prediction of Grain Weight and Area in Bread Wheat (feat. kimindex)

September 9, 2024 JK

These days, image analysis equipment can easily provide grain area measurements (mm²), and the large datasets acquired instantly from this equipment offer more insights into wheat grains. While grain weight can be a good indicator of wheat yield, obtaining data on grain weight is challenging with the available equipment. Currently, average grain weight is calculated using thousand kernel weight (TKW), a process that is time-consuming and labor-intensive. Therefore, predicting wheat grain weight from the grain area would allow us to…

[R package] Probability Distribution and Z-Score Calculation Function (feat. probdistz)

August 31, 2024 JK

■ Introduction ■ What is Probability Density Function (PDF) and Cumulative Distribution Function (CDF): How to calculate using Excel and R? In my previous post, I explained what the Probability Density Function (PDF) and the Cumulative Distribution Function (CDF) are. I also explained the formula for the PDF and demonstrated how to manually calculate it in Excel. Additionally, I mentioned the Excel function that performs the same calculation for the PDF, as follows: I then introduced how to create a probability distribution…
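For orientation, the PDF, CDF, and z-score of a normal distribution can be obtained in base R with dnorm() and pnorm(). The mean, standard deviation, and x value below are hypothetical, and the code is not the probdistz package's interface.

```r
# Hypothetical normal distribution and value of interest (illustrative only)
mu    <- 100
sigma <- 10
x     <- 115

z     <- (x - mu) / sigma                 # z-score: distance from the mean in SD units
pdf_x <- dnorm(x, mean = mu, sd = sigma)  # probability density at x
cdf_x <- pnorm(x, mean = mu, sd = sigma)  # cumulative probability P(X <= x)

print(c(z = z, pdf = pdf_x, cdf = cdf_x))
```

The same cumulative probability can be read from the standard normal distribution as pnorm(z).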

[R package] Finlay-Wilkinson Regression model (feat. fwrmodel)

August 24, 2024 JK

■ What is Finlay-Wilkinson Regression Model?
In my previous post, I introduced what the Finlay-Wilkinson Regression Model is and how to calculate adaptability (or stability). Actually, adaptability and stability are opposite concepts derived from the same data. Have you ever heard of heritability (h²)? Heritability is a key concept in genetics and breeding that measures how much of the variation in a trait within a population is due to genetic differences among individuals. In other words, it quantifies the proportion of phenotypic variation…

In R Studio, how to exclude missing value (NA)?

May 17, 2024 JK

I’ll create a dataset. For genotype D, the yield data is missing, so it is indicated as NA. Now I’ll calculate the mean yield across all genotypes. As you see above, we can’t calculate the mean due to the NA. To obtain the mean yield, we should exclude the NA. Using subset(), we can simply exclude genotype D. But a much simpler way is to use the argument na.rm=TRUE, which lets you avoid using subset(). When the data…
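The two approaches mentioned in this excerpt look roughly like this in base R. The yield values and column names are hypothetical; only the missing value for genotype D follows the excerpt.

```r
# Hypothetical data: the yield for genotype D is missing
df <- data.frame(
  Genotype = c("A", "B", "C", "D"),
  Yield    = c(100, 120, 110, NA)
)

# With an NA present, mean() returns NA
mean_with_na <- mean(df$Yield)

# Option 1: drop genotype D with subset(), then take the mean
mean_subset <- mean(subset(df, Genotype != "D")$Yield)

# Option 2: keep the data as it is and let mean() skip missing values
mean_na_rm <- mean(df$Yield, na.rm = TRUE)

print(c(mean_with_na, mean_subset, mean_na_rm))
```

Both options give the same mean, but na.rm = TRUE does not require knowing in advance which rows contain missing values.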

How to Sample a Portion of Data using R?

May 9, 2024 JK

I have one big dataset. Let’s upload it to R. This dataset has 96,319 rows. I want to use only part of it. How can I randomly extract some rows from the whole dataset? First, I’ll add a number from 1 to the last row to give each row an ID.
Caret package
The caret package (short for Classification And REgression Training) is a set of functions that attempt to streamline the process for creating predictive models. You can find…
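Before turning to caret, the basic idea of adding a row ID and drawing a random subset can be sketched with base R's sample(). This is a swapped-in base R approach, not the caret workflow the post goes on to describe. The small simulated data frame stands in for the real dataset, and the sample size of 100 is arbitrary.

```r
set.seed(100)   # makes the random draw reproducible

# Stand-in for the large dataset (simulated, illustrative only)
df <- data.frame(value = rnorm(1000))

# Add an ID running from 1 to the last row
df$ID <- seq_len(nrow(df))

# Draw 100 row IDs at random, without replacement
sample_ids <- sample(df$ID, size = 100)

# Keep only the sampled rows
df_sample <- df[df$ID %in% sample_ids, ]

print(head(df_sample))
```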

Stepwise Regression: A Practical Approach for Model Selection using R

May 7, 2024 JK

Stepwise selection, forward selection, and backward elimination are all methods used in the context of building statistical models, specifically regression models, where the goal is to select the most relevant predictors. In this section, I’ll introduce them one by one. Let’s generate one dataset. This dataset includes grain yield data, along with measurements of stem biomass, grain weight (agw), and grain number (gn). I would now like to determine which variables are the most critical factors in influencing the final grain…
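One way to try the three approaches is base R's step(), sketched below on simulated data. The variable names follow the excerpt, but the values and the data-generating equation are invented. Note that step() selects by AIC rather than by p-values, so it may differ from the procedure used in the full post.

```r
set.seed(1)
n <- 60

# Simulated data (illustrative only): yield depends on agw and gn, not on stem biomass
dat <- data.frame(
  stem_biomass = rnorm(n, mean = 50, sd = 5),
  agw          = rnorm(n, mean = 40, sd = 3),
  gn           = rnorm(n, mean = 300, sd = 30)
)
dat$yield <- 2 + 0.5 * dat$agw + 0.03 * dat$gn + rnorm(n, sd = 1)

full <- lm(yield ~ stem_biomass + agw + gn, data = dat)  # all candidate predictors
null <- lm(yield ~ 1, data = dat)                        # intercept only

# Backward elimination: start from the full model and drop terms
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the null model and add terms up to the full model
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

# Stepwise selection: terms can be added or dropped at each step
both <- step(null,
             scope = list(lower = formula(null), upper = formula(full)),
             direction = "both", trace = 0)

print(formula(backward))
print(formula(forward))
print(formula(both))
```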

In R, how to check the data structure?

May 6, 2024 JK

When uploading data to R, we first need to check the data structure before analyzing it. Here are some tips for checking the data structure in R. First, I’ll upload a dataset from my GitHub. In this dataset, let’s check the structure of the data. ■ Code to display the first or last certain rows When we examine the data, we can simply run the variable df or use print(df) to display it. However, if we want to quickly understand…

Coding Light Spectrum Curves for Plant Growth in R

May 1, 2024 JK

Let’s say we collected relative light intensity data across a wide range of the light spectrum in an LED experiment. I’d like to create light spectrum curves of relative light intensity. First, I’ll define wavelength colors. The color at different ranges of wavelengths is always the same, so if we run this code, we can obtain the same color range at each wavelength (which would be the x-axis of the graph). Next, let’s create the curve graph. I’ll highlight the ranges…

[Data article] Data Normalization Techniques: Excel and R as the Initial Steps in Machine Learning

April 27, 2024 JK

In my previous post, I introduced the necessity of data normalization in visualizing data. By following that post, you may gain an understanding of how we can organize data according to our preferences. □ Why is data normalization necessary when visualizing data? Today, I’ll introduce various methods for data normalization, utilizing the biomass with N and P uptake data available on my GitHub. R coding Python coding I also aim to create regression graphs illustrating the relationship between biomass and…

[Data article] Why is data normalization necessary when visualizing data?

April 23, 2024 JK

Data normalization is necessary when visualizing data for several key reasons, and I believe the most important reason is scale uniformity. Different data variables can have vastly different scales and units. For example, grain yield might be in Mg/ha, while nutrient content might be expressed in %. Normalizing these data to a common scale (like 0 to 1) allows them to be compared and visualized on the same axis without one overshadowing the other due to its scale. Additionally,…
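Scaling to a common 0 to 1 range, as mentioned in this excerpt, is usually done with min-max normalization, (x − min) / (max − min). The sketch below applies it to two hypothetical variables on different scales. The helper name `min_max` and all values are my own.

```r
# Hypothetical variables on very different scales (illustrative only)
grain_yield <- c(8.5, 10.2, 12.8, 9.4, 11.6)   # Mg/ha
n_content   <- c(1.1, 1.4, 1.9, 1.2, 1.6)      # %

# Min-max normalization: rescales a vector so its minimum is 0 and its maximum is 1
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- data.frame(
  grain_yield = min_max(grain_yield),
  n_content   = min_max(n_content)
)

print(scaled)
```

After rescaling, both variables can share one axis without the larger-valued one dominating the plot.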

How to draw a y-axis border when using facet_wrap() in R? (feat. scales="free")

April 22, 2024 JK

Here is one dataset, and I’ll use facet_wrap() to create bar graphs. First, let’s summarize the data. Then, I’ll create a bar graph using facet_wrap() to divide panels by irrigation. Now, I want to draw a y-axis border for the ‘Irrigation_Yes’ panel. We can achieve this simply by adding scales="free".

© 2022 – 2023 https://agronomy4future.com