How to Upload Data from GitHub Using R and Python?

I have soybean yield data that I want to upload to GitHub and access from R. First, let’s upload the data to GitHub; the file should be in .csv format. Click Add file, choose Upload files, and, after uploading, click the Raw button to view the data as plain text. The address of this raw view starts with https://raw.githubusercontent.com/…, so let’s copy it. Next, I’ll bring this data into R from GitHub. Before that, let’s…
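
A minimal Python sketch of the same idea, since the title also mentions Python. It assumes the usual GitHub URL layout (github.com/<user>/<repo>/blob/<branch>/<path>) and that pandas is installed; to_raw_url and read_github_csv are hypothetical helper names. In R, read.csv() accepts the same raw.githubusercontent.com address directly.

```python
import pandas as pd


def to_raw_url(blob_url: str) -> str:
    """Convert a GitHub file-page URL into its raw.githubusercontent.com form.

    Assumes the layout https://github.com/<user>/<repo>/blob/<branch>/<path>.
    """
    prefix = "https://github.com/"
    if not blob_url.startswith(prefix):
        raise ValueError("expected a https://github.com/ URL")
    user, repo, kind, rest = blob_url[len(prefix):].split("/", 3)
    if kind != "blob":
        raise ValueError("expected a /blob/ file URL")
    # rest is "<branch>/<path>", which the raw host uses unchanged
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"


def read_github_csv(url: str) -> pd.DataFrame:
    """Read a CSV from GitHub; accepts a raw URL or a /blob/ page URL (needs network access)."""
    if "raw.githubusercontent.com" not in url:
        url = to_raw_url(url)
    return pd.read_csv(url)
```

Usage: paste the address copied from the Raw view into read_github_csv(…) to get a DataFrame.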

[Data article] Simulating Crop Growth Over Time Using a Sigmoid Growth Model

I’m planning to collect biomass samples frequently to observe how biomass accumulation differs among treatments or varieties over time. I assume that growth will follow a sigmoid (S-shaped) curve: slow accumulation during the early growing stage, followed by rapid growth, and eventually a plateau. I want to visualize this curve through simulation, and here is the Python code to demonstrate it. First, let’s import the required packages and set a random seed for reproducibility. Next,…
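
A minimal sketch of such a simulation, assuming a three-parameter logistic curve (k = plateau, r = growth rate, t_mid = day of fastest growth). The parameter values, the 10-day sampling interval, and the noise level are hypothetical and only illustrate the shape; they are not the post’s actual settings.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed for reproducibility


def logistic_growth(t, k, r, t_mid):
    """Logistic (sigmoid) curve: k = plateau, r = growth rate, t_mid = inflection day."""
    return k / (1.0 + np.exp(-r * (t - t_mid)))


days = np.arange(0, 121, 10)        # hypothetical sampling every 10 days (0..120)
k, r, t_mid = 1200.0, 0.08, 60.0    # hypothetical parameters (g/m^2, 1/day, day)
true_biomass = logistic_growth(days, k, r, t_mid)

# Add random sampling noise to mimic field measurements
observed = true_biomass + rng.normal(0.0, 30.0, size=days.size)

for d, b in zip(days, observed):
    print(f"day {d:3d}: {b:7.1f}")
```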

[STAT Article] RMSE Calculation with Excel and R: A Comprehensive Guide

When running statistical programs, you might encounter RMSE (Root Mean Square Error). For example, the SAS output table below reports an RMSE of ca. 2.72. I’m curious about how RMSE is calculated. Below is the equation for RMSE. First, calculate the difference between each estimated and observed value, (ŷi – yi), and square it: (ŷi – yi)². Second, calculate the sum of squares: Σ(ŷi – yi)². Third, divide the sum of squares by the…
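
A short sketch of the steps above in Python. Because the third step is cut off in this excerpt, the function lets you choose the divisor: n gives the common RMSE definition, while n − p (residual degrees of freedom) is, to my understanding, how regression procedures such as SAS report Root MSE. The example numbers are hypothetical.

```python
import numpy as np


def rmse(y_obs, y_pred, n_params=0):
    """Root mean square error.

    n_params=0 divides the sum of squares by n (the usual RMSE);
    n_params=p divides by the residual degrees of freedom, n - p.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sq_err = (y_pred - y_obs) ** 2       # step 1: squared differences
    ss = sq_err.sum()                    # step 2: sum of squares
    denom = y_obs.size - n_params        # step 3: divide by n (or n - p)
    return float(np.sqrt(ss / denom))    # step 4: square root


observed = [3.0, 5.0, 7.0]    # hypothetical observed values
predicted = [2.0, 5.0, 9.0]   # hypothetical model estimates
print(rmse(observed, predicted))
```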

What is split-plot design in agronomy research?

Split-plot designs are widely used, particularly in agronomy research. In a split-plot design, the experimental units are divided into smaller units. Split-plot designs are useful when some factors are difficult or expensive to change, or when the levels of a factor cannot be randomized freely across the smallest units (I’ll explain this in detail later). A split-plot design has two sizes of experimental unit: the whole plot and the subplot. The whole-plot factor is randomly assigned to the experimental units, while the subplot factor is applied to a smaller…
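
To illustrate the two levels of randomization, here is a hedged Python sketch assuming a split-plot arranged in randomized complete blocks, with irrigation (hard to change over small areas) as a hypothetical whole-plot factor and variety as a hypothetical subplot factor. Whole-plot levels are shuffled within each block, and subplot levels are shuffled independently within each whole plot.

```python
import random


def split_plot_layout(whole_levels, sub_levels, n_blocks, seed=1):
    """Randomize a split-plot design in randomized complete blocks.

    Returns (block, main_plot, sub_plot, whole_level, sub_level) tuples.
    """
    rng = random.Random(seed)
    layout = []
    for block in range(1, n_blocks + 1):
        # Randomize whole-plot levels to main plots within this block
        main_order = rng.sample(whole_levels, len(whole_levels))
        for main_plot, w in enumerate(main_order, start=1):
            # Randomize subplot levels independently within each main plot
            sub_order = rng.sample(sub_levels, len(sub_levels))
            for sub_plot, s in enumerate(sub_order, start=1):
                layout.append((block, main_plot, sub_plot, w, s))
    return layout


# Hypothetical example: irrigation as whole plot, variety as subplot
plan = split_plot_layout(["Irrigated", "Rainfed"], ["V1", "V2", "V3"], n_blocks=3)
for row in plan[:6]:
    print(row)
```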

[Data Column] Predicting Intermediate Data Using Linear Interpolation

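Today, I’ll explain linear interpolation, which is used to predict values between data points. For example, when collecting data in the field, we can’t collect data every day, so we collect it at regular intervals (weekly, bi-weekly, etc.). However, when presenting the data, we sometimes need to show it on a daily basis. As another example, suppose we are investigating differences in crop yield in response to nitrogen fertilizer rates of 0 kg/ha, 30 kg/ha, 60 kg/ha, and 120 kg/ha. If we need to show the yield differences at every nitrogen rate from 0 to 120, how can we estimate the data? In such situations…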
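
For reference, the standard linear interpolation formula between two measured points (x₁, y₁) and (x₂, y₂) is shown below. The worked line assumes yields y₃₀ and y₆₀ were measured at 30 and 60 kg/ha and estimates the yield at a hypothetical intermediate rate of 45 kg/ha.

```latex
y = y_1 + \frac{(x - x_1)\,(y_2 - y_1)}{x_2 - x_1}, \qquad x_1 \le x \le x_2

y_{45} = y_{30} + \frac{(45 - 30)\,(y_{60} - y_{30})}{60 - 30} = \frac{y_{30} + y_{60}}{2}
```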

[R Package] Convert Data into Code Instantly – Save as a Script with One Line

When importing data into R, we sometimes worry about losing track of the data over time. This is because we save data in different folders for various projects, and we might forget where we stored it. Additionally, if the file path changes, the import code no longer works and we have to locate the file again. Therefore, a better approach is to save the data as code, allowing us to access it directly when opening the R file where…
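
The post does this in R (base R’s dput() prints R code that recreates an object). As a hedged Python analogue, the sketch below turns a small pandas DataFrame into a line of source code and saves it as a script; dataframe_to_code and the example data are hypothetical, and this is not the R package the post describes.

```python
import pandas as pd


def dataframe_to_code(df: pd.DataFrame, name: str = "df") -> str:
    """Return Python source that rebuilds `df`, similar in spirit to R's dput()."""
    # Series.tolist() yields plain Python scalars, so repr() gives valid literals
    columns = {col: df[col].tolist() for col in df.columns}
    return f"import pandas as pd\n{name} = pd.DataFrame({columns!r})\n"


# Hypothetical small dataset
data = pd.DataFrame({"variety": ["A", "B"], "yield": [3.2, 4.1]})
code = dataframe_to_code(data, name="soy")
print(code)

# Save the script next to the analysis so the data travels with the code
with open("soy_data.py", "w") as f:
    f.write(code)
```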

[R package] An easy way to use interpolation code to predict in-between data points

In my previous post, I explained how to use linear interpolation to predict in-between data points.
■ [Data article] Predicting Intermediate Data Points with Linear Interpolation in Excel and R
To make interpolation calculations easier, particularly for grouped data, I recently developed a new R package, interpolate(). First, let’s upload a dataset. This dataset contains chlorophyll content measurements for sorghum and soybean. I measured chlorophyll content every 10 days between 65 and 125 days after sowing, with four replicates at each time…
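
The interpolate() package itself is not shown in this excerpt, so here is a hedged pandas/NumPy sketch of the same idea: average the replicates at each sampling day, then linearly interpolate daily values separately for each crop. The function name interpolate_by_group and the chlorophyll readings are hypothetical.

```python
import numpy as np
import pandas as pd


def interpolate_by_group(df, group_col, x_col, y_col, step=1):
    """Linearly interpolate y at every `step` of x, separately within each group."""
    pieces = []
    for name, g in df.groupby(group_col):
        # Average replicates first; np.interp needs one y per increasing x
        means = g.groupby(x_col)[y_col].mean()
        xp, fp = means.index.to_numpy(), means.to_numpy()
        x_new = np.arange(xp.min(), xp.max() + step, step)
        y_new = np.interp(x_new, xp, fp)
        pieces.append(pd.DataFrame({group_col: name, x_col: x_new, y_col: y_new}))
    return pd.concat(pieces, ignore_index=True)


# Hypothetical readings: two replicates at 65 and 75 days after sowing
data = pd.DataFrame({
    "crop": ["sorghum"] * 4 + ["soybean"] * 4,
    "das": [65, 65, 75, 75, 65, 65, 75, 75],
    "chl": [40, 42, 46, 48, 35, 37, 39, 41],
})
daily = interpolate_by_group(data, "crop", "das", "chl")
print(daily.head(12))
```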

[Data article] Predicting Intermediate Data Points with Linear Interpolation in Excel and R

Today, I’ll explain the interpolation technique used to predict in-between data points. For example, when collecting field data, we might not be able to gather information every day, so we set our own interval (e.g., weekly or bi-weekly). However, when presenting the data, it might be necessary to show it on a daily basis. As another example, consider investigating yield differences in response to a continuous variable, such as nitrogen rates of 0, 30, 60, and 120 kg/ha. What if we…
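
A minimal sketch of interpolating weekly samples to daily values, written out by hand so each step of the formula is visible. The dates and measurements are hypothetical, and the function refuses to extrapolate beyond the first and last sampling dates.

```python
from datetime import date, timedelta


def lerp(x, x1, y1, x2, y2):
    """Value at x on the straight line through (x1, y1) and (x2, y2)."""
    return y1 + (x - x1) * (y2 - y1) / (x2 - x1)


def linear_interpolate(xs, ys, x):
    """Interpolate y at x from known points; xs must be sorted and unique."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x is outside the measured range; this would be extrapolation")
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            return lerp(x, xs[i], ys[i], xs[i + 1], ys[i + 1])
    raise ValueError("xs must be sorted in increasing order")


sample_days = [0, 7, 14]          # days since the first (weekly) sampling
values = [10.0, 24.0, 31.0]       # hypothetical measurements
start = date(2024, 6, 1)          # hypothetical first sampling date

daily = [(start + timedelta(days=d), linear_interpolate(sample_days, values, d))
         for d in range(0, 15)]
for day, value in daily:
    print(day, round(value, 2))
```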

How to Combine Files and Create a New Data Table in MySQL

In my previous post, I introduced how to combine multiple files into one using Access, and now I’ll explain how to do the same using MySQL. The SQL syntax is nearly the same in both programs, so the code is almost identical. First, I uploaded three different datasets to MySQL, and I want to combine them into one. I’ll use UNION to combine all the data. Next, I want to save the result as a new data table, so I’ll use this code. Now, new…
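
A runnable sketch of the same SQL pattern, using Python’s built-in sqlite3 module as a stand-in for MySQL (the UNION ALL and CREATE TABLE … AS SELECT statements are, as far as I know, accepted by both). The three tables and their rows are hypothetical. Note that UNION ALL keeps every row, while plain UNION also removes duplicate rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database standing in for a MySQL schema
cur = conn.cursor()

# Three hypothetical source tables with identical columns
sources = {
    "site_a": [("A", 1, 3.1), ("A", 2, 3.4)],
    "site_b": [("B", 1, 2.8)],
    "site_c": [("C", 1, 3.9), ("C", 2, 4.0)],
}
for table, rows in sources.items():
    # Table names come from the fixed dict above, so formatting them into SQL is safe here
    cur.execute(f"CREATE TABLE {table} (site TEXT, rep INTEGER, grain_yield REAL)")
    cur.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)

# Stack all rows and save the result as a new table
cur.execute(
    """
    CREATE TABLE combined AS
    SELECT * FROM site_a
    UNION ALL SELECT * FROM site_b
    UNION ALL SELECT * FROM site_c
    """
)
n_rows = cur.execute("SELECT COUNT(*) FROM combined").fetchone()[0]
print(n_rows)
```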

How to Rename Variables within Columns in R (feat. case_when() code)?

In my previous post, I introduced how to change variable names within columns. In that post, I provided simple code to rename variables and also used the stringr package to rename them.
■ How to Rename Variables within Columns in R?
Today I’ll introduce another simple way to rename variables, using case_when() from the dplyr package. In the previous post, using the simple data above, I showed how to rename variables. For example, we can rename variables using the code below. Or,…
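
For readers working in Python, numpy.select() behaves much like dplyr’s case_when(): conditions are checked in order and the first match wins, with a default for everything else. This is a hedged analogue, not the post’s R code; the treatment codes and labels are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"treatment": ["N0", "N1", "N2", "N3"]})  # hypothetical codes

# Conditions are evaluated in order, like the branches of case_when()
conditions = [
    df["treatment"] == "N0",
    df["treatment"].isin(["N1", "N2"]),
]
choices = ["Control", "Low N"]

# Rows matching no condition get the default (like case_when's TRUE ~ ... branch)
df["label"] = np.select(conditions, choices, default="High N")
print(df)
```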

How to convert to a .json file using Python?

Sometimes we need to convert our data to .json format, so I’ll introduce an easy way to do it using Python. I’ll use Google Colab. First, let’s mount Google Drive in Google Colab. Second, let’s load a dataset from GitHub. I’ll convert this data to a .json file and download it to my PC, or I can save it directly in Google Colab. Now the .json file is created. Let’s upload this .json file to Google Colab. When uploading…
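
A minimal sketch of the conversion step with pandas, runnable outside Colab. The DataFrame is hypothetical and stands in for the dataset loaded from GitHub; orient="records" (one JSON object per row) is an assumption, since the post’s chosen layout isn’t shown. In Colab, files.download() from google.colab can then send the file to your PC.

```python
import json

import pandas as pd

# Hypothetical data standing in for the CSV loaded from GitHub
df = pd.DataFrame({"variety": ["A", "B"], "yield": [3.2, 4.1]})

# orient="records" writes a list with one JSON object per row
df.to_json("data.json", orient="records", indent=2)

# Read the file back to confirm the round trip
with open("data.json") as f:
    records = json.load(f)
print(records)

restored = pd.read_json("data.json", orient="records")
print(restored)
```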

How to Use Temporary Tables for Quick Calculations in MySQL?

In MySQL, we sometimes need to calculate averages or other statistics for filtered data. This is much easier if we create temporary tables for the filtered data. Here is an example. First, let’s create a database. Second, I’ll create a data table. Let’s check that the data table was created correctly. Now, I want to calculate the average root and total biomass per treatment. Next, I want to calculate the averages again, but excluding treatment N3. So, I’ll run this…
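
A runnable sketch of the temporary-table workflow using Python’s sqlite3 module as a stand-in for MySQL; CREATE TEMPORARY TABLE … AS SELECT works in both, and the temporary table disappears when the connection closes. The table name biomass and the values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical biomass data per treatment
cur.execute("CREATE TABLE biomass (treatment TEXT, root REAL, total REAL)")
cur.executemany(
    "INSERT INTO biomass VALUES (?, ?, ?)",
    [("N0", 1.0, 5.0), ("N0", 1.2, 5.4), ("N3", 2.0, 9.0), ("N3", 2.2, 9.4)],
)

# Averages per treatment on the full table
avg_all = cur.execute(
    "SELECT treatment, AVG(root), AVG(total) FROM biomass "
    "GROUP BY treatment ORDER BY treatment"
).fetchall()

# A temporary table holding only the filtered rows (lives for this connection only)
cur.execute(
    "CREATE TEMPORARY TABLE no_n3 AS SELECT * FROM biomass WHERE treatment <> 'N3'"
)
avg_no_n3 = cur.execute("SELECT AVG(root), AVG(total) FROM no_n3").fetchone()
print(avg_all, avg_no_n3)
```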

Visualizing Geospatial Data with Folium in Python

Recently, I came across the QS World University Rankings by Subject 2024: Agriculture & Forestry, which ranks universities worldwide in agriculture and forestry science. It made me curious to mark U.S. agricultural universities on a map and see where these colleges are located. I found that the Folium package in Python makes it easy to create an excellent interactive map, so I’m sharing the code here. First, using Python, I’ll…

How to automatically insert a linear regression equation into a graph in RStudio?

Sometimes we need to insert a linear regression equation inside a graph, but it’s annoying to type the equation every time we generate a linear regression graph. Using stat_poly_eq() from the ggpmisc package, we can insert the equation automatically. Let’s generate a data frame. Then, I’ll generate a regression graph. Now let’s run a linear regression. The linear model equation is y = 9.1429 + 1.5357x, and R² is 0.9245. Now I’ll insert this equation automatically using stat_poly_eq(). I’ll add the…
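
The post uses ggpmisc’s stat_poly_eq() in R. As a hedged Python analogue, the sketch below fits the line with numpy.polyfit(), builds the equation and R² label itself, and places it on a matplotlib plot. The data are hypothetical (a perfect line, so the label is easy to check), not the data behind y = 9.1429 + 1.5357x.

```python
import matplotlib

matplotlib.use("Agg")  # draw without a display
import matplotlib.pyplot as plt
import numpy as np


def fit_line_label(x, y):
    """Fit y = a + b*x by least squares and return (a, b, r2, label)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)  # polyfit returns [slope, intercept]
    y_hat = a + b * x
    r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    sign = "+" if b >= 0 else "-"
    label = f"y = {a:.4f} {sign} {abs(b):.4f}x, R² = {r2:.4f}"
    return a, b, r2, label


x = [1, 2, 3, 4, 5]                 # hypothetical data
y = [3.0, 5.0, 7.0, 9.0, 11.0]
a, b, r2, label = fit_line_label(x, y)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.plot(x, a + b * np.asarray(x, dtype=float))
# Place the equation in the top-left corner, in axes coordinates
ax.text(0.05, 0.95, label, transform=ax.transAxes, va="top")
fig.savefig("regression.png")
plt.close(fig)
```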

[Article] Tiny Plants Reveal Big Potential for Boosting Crop Efficiency – Boyce Thompson Institute

Scientists have long sought ways to help plants turn more carbon dioxide (CO₂) into biomass, which could boost crop yields and even combat climate change. Recent research suggests that a group of unique, often overlooked plants called hornworts may hold the key. “Hornworts possess a remarkable ability that is unique among land plants: they have a natural turbocharger for photosynthesis,” said Tanner Robison, a graduate student at the Boyce Thompson Institute (BTI) and first author of the paper recently published in Nature Plants….

[Wise Cornell Life 101] How to Get from Korea to Ithaca

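“Wise Cornell Life 101” is a project that organizes, in chronological order, what I experienced while first settling in, to share useful information with newcomers. For reference, my earlier “Wise Urbana-Champaign Life 101” project was successful and provided a variety of local information to newcomers to Urbana-Champaign (e.g., How to Get from Korea to Urbana-Champaign). Table of Contents 1. [Wise Cornell Life 101] How to Get from Korea to Ithaca. I wrapped up my work with the research group at the University of Illinois at Urbana-Champaign and moved to Ithaca without trouble. From the Midwest, where field crop research is very active, somewhat field crop…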

[STAT Article] Steps to Calculate Log-Likelihood Prior to AIC and BIC: [Part 2] ANOVA model

In my previous post, I explained how to calculate the Log-Likelihood, AIC, and BIC for a regression model. In this post, I’ll demonstrate the same concepts in the context of an ANOVA model. Here I have a dataset. Let’s say this data represents yield in response to different fertilizer types (Control, Slow, and Fast), and I want to determine the effect of fertilizer type on yield. Therefore, I’ll perform a one-way ANOVA. Now, I observe that the…
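
A hedged Python sketch for the one-way ANOVA case, assuming normally distributed errors with a common variance. The fitted value of each observation is its group mean, σ² is estimated by SSE/n (the maximum-likelihood estimate), and k counts the group means plus the variance, which, to my understanding, matches how R’s logLik() counts parameters for lm/aov fits. The yields are hypothetical.

```python
import math


def one_way_anova_loglik(groups):
    """Maximum log-likelihood, AIC, and BIC of a one-way ANOVA model with normal errors.

    groups: dict mapping group name -> list of observations.
    """
    n = sum(len(v) for v in groups.values())
    sse = 0.0
    for values in groups.values():
        mean = sum(values) / len(values)           # fitted value = group mean
        sse += sum((y - mean) ** 2 for y in values)
    sigma2 = sse / n                               # MLE of the error variance
    ll = -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1)
    k = len(groups) + 1                            # group means + error variance
    aic = -2 * ll + 2 * k
    bic = -2 * ll + k * math.log(n)
    return ll, aic, bic


# Hypothetical yields for the three fertilizer types
data = {
    "Control": [4.0, 4.4, 4.2],
    "Slow": [5.1, 5.5, 5.3],
    "Fast": [4.8, 5.0, 5.2],
}
ll, aic, bic = one_way_anova_loglik(data)
print(round(ll, 4), round(aic, 4), round(bic, 4))
```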

2024 ASA, CSSA, SSSA International Annual Meeting in San Antonio, TX

I went to San Antonio to attend the ASA, CSSA, SSSA International Annual Meeting. It was my first visit to Texas, and the weather was great!! I gave an oral presentation on the agrivoltaics study I have conducted over the last two seasons. The title was ‘Shading Impacts on Sorghum and Soybean Grain Yields Under Agrivoltaic Systems: Source-Sink Strength in Response to Shading.’ Agrivoltaic (AV) systems induce shading throughout the entire crop growth period, and understanding how these shading patterns…

[STAT Article] Steps to Calculate Log-Likelihood Prior to AIC and BIC: [Part 1] regression model

Here I have a dataset. I want to predict grain weight using grain dimension data such as length, width, and area, and identify the best model for estimating grain weight. As a result, I developed the following models, and I’ll calculate the log-likelihood for each one. To do that, I need each model’s equation. Now that I’ve obtained each model equation, I’ll calculate the log-likelihood. For a linear regression model, the Log-Likelihood (LL) is defined as: where: n is the…
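
The equation itself is missing from this excerpt. For reference, the standard maximum log-likelihood of a linear regression with normal errors, with σ² estimated by SSE/n, is shown below together with AIC and BIC (k = number of estimated parameters, including the error variance). The post’s own notation may differ.

```latex
LL = -\frac{n}{2}\left[\ln(2\pi) + \ln\!\left(\frac{SSE}{n}\right) + 1\right],
\qquad SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

AIC = -2\,LL + 2k, \qquad BIC = -2\,LL + k \ln(n)
```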

[STAT Article] Step-by-Step Guide to Calculating and Analyzing Principal Component Analysis (PCA) by Hand

Principal Component Analysis (PCA) is a statistical technique used for dimensionality reduction while preserving as much variability in the data as possible. It transforms the original variables in a dataset into a new set of uncorrelated variables called principal components, ordered by the amount of variance they capture from the original dataset. Here are the steps of Principal Component Analysis (PCA). 1. Standardize the Data: Since PCA is affected by the scale of the variables, it often begins with standardizing the…
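
A compact NumPy sketch of these steps, assuming the variables are standardized with the sample standard deviation, so that the covariance matrix of the standardized data is the correlation matrix. The 5 × 3 data matrix is hypothetical.

```python
import numpy as np


def pca_by_hand(X):
    """PCA via eigen-decomposition of the covariance matrix of standardized data."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # 1. standardize each variable
    C = np.cov(Z, rowvar=False)                        # 2. covariance (= correlation) matrix
    eigvals, eigvecs = np.linalg.eigh(C)               # 3. eigenpairs of a symmetric matrix
    order = np.argsort(eigvals)[::-1]                  # 4. sort by variance, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs                               # 5. project onto principal components
    explained = eigvals / eigvals.sum()                # share of total variance per PC
    return eigvals, eigvecs, scores, explained


# Hypothetical data: 5 samples x 3 variables
X = [[2.5, 2.4, 1.2], [0.5, 0.7, 0.3], [2.2, 2.9, 1.1], [1.9, 2.2, 0.9], [3.1, 3.0, 1.5]]
eigvals, eigvecs, scores, explained = pca_by_hand(X)
print(np.round(explained, 3))
```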

Understanding Mean Absolute Error (MAE) in ANOVA: A Step-by-Step Guide to Calculation in Excel

Mean Absolute Error (MAE) is a metric used to measure the accuracy of a model’s predictions. It calculates the average magnitude of the errors in a set of predictions, without considering their direction. In other words, MAE measures the average absolute difference between the actual values and the predicted values. MAE is typically used in the context of regression analysis and prediction error evaluation, rather than in ANOVA (Analysis of Variance), which focuses on comparing the means of different groups….
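
A hedged sketch of how MAE can be computed for an ANOVA-style model: in a one-way ANOVA, each observation’s fitted (predicted) value is its group mean, so MAE is the average absolute residual. The two groups are hypothetical.

```python
def mae(observed, predicted):
    """Mean absolute error: the average of |observed - predicted|."""
    if len(observed) != len(predicted):
        raise ValueError("inputs must have the same length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical data; in a one-way ANOVA the fitted value is the group mean
groups = {"A": [10.0, 12.0, 14.0], "B": [20.0, 21.0, 25.0]}
observed, predicted = [], []
for values in groups.values():
    mean = sum(values) / len(values)
    observed.extend(values)
    predicted.extend([mean] * len(values))

print(mae(observed, predicted))
```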

Practices in Data Normalization using normtools() in R

■ [R package] Normalization Methods for Data Scaling (Feat. normtools)
In my previous post, I introduced the R package normtools(), which I developed to normalize data using various methods. This time, I’ll demonstrate how to use normtools() for data normalization. 1. Data upload: This data includes kernel number (KN), average kernel weight (AGW), and grain yield (GY) for different corn varieties across various years, populations, and locations. 2. Data normalization: This is the normtools() package. First, I’ll…
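
If values should be compared within each location (or year), one option is to standardize within groups. The pandas sketch below computes a z-score per location; it is a hedged alternative for illustration, not normtools() code, and the GY values are hypothetical.

```python
import pandas as pd

# Hypothetical corn grain yield (GY) at two locations
df = pd.DataFrame({
    "location": ["L1", "L1", "L1", "L2", "L2", "L2"],
    "GY": [10.0, 12.0, 14.0, 5.0, 6.0, 7.0],
})

# z-score within each location: (x - group mean) / group SD (sample SD, ddof=1)
grp = df.groupby("location")["GY"]
df["GY_z"] = (df["GY"] - grp.transform("mean")) / grp.transform("std")
print(df)
```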

Sorghum panicle damage

The damage to sorghum grain can result from a variety of causes, including environmental, biological, and mechanical factors. Here are some common causes:
1. Excessive Rainfall and Humidity
2. Pest Infestation
3. Temperature Stress
4. Mechanical Damage During Harvest
5. Soil Conditions
6. Delayed Harvest
7. Post-Harvest Factors
To minimize sorghum grain damage, it is crucial to manage environmental conditions, ensure proper timing of harvest, and implement effective pest control and storage techniques.
■ References
□ Rain devastates Downs sorghum…

How to install Llama 3 on your PC?

Llama 3, or Large Language Model Meta AI 3, is an advanced iteration of Meta’s language models, designed to facilitate a wide array of natural language processing tasks with enhanced capabilities. This model leverages state-of-the-art techniques in deep learning and transformer architectures, providing improved performance in text generation, comprehension, and contextual awareness. You can install Llama 3 on your PC as follows. 1. Visit ollama.com and click the Download button. Select your OS and download the installer. https://ollama.com After downloading, run the OllamaSetup file….

[R package] Normalization Methods for Data Scaling (Feat. normtools)

■ [Data article] Data Normalization Techniques: Excel and R as the Initial Steps in Machine Learning
In my previous post, I explained how to normalize data using various methods and demonstrated how to perform the calculations for each method. To simplify these calculations, I recently developed an R package that easily generates normalized data.
1. Install the normtools() package
2. Basic code format
3. Practice with actual dataset (data upload)
4. Normalize data
4.1. Z-score normalization
4.2. Robust Scaling
4.3….
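
For readers who want to see the arithmetic behind these options, here is a hedged NumPy sketch of three common scalings: z-score, min-max, and robust scaling (median and interquartile range). Using the sample SD (ddof=1, like R’s sd()) is an assumption; normtools() may define some details differently.

```python
import numpy as np


def z_score(x):
    """Center on the mean and scale by the sample SD (result: mean 0, SD 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def min_max(x):
    """Rescale linearly so the minimum is 0 and the maximum is 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())


def robust_scale(x):
    """Center on the median and scale by the interquartile range (IQR)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    return (x - np.median(x)) / (q3 - q1)


values = [1.0, 2.0, 3.0, 4.0, 100.0]  # hypothetical data with one outlier
print(z_score(values))
print(min_max(values))
print(robust_scale(values))
```

The robust version keeps the four typical values on a sensible scale even though the outlier dominates the mean and SD.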

[Coding Education Platform Recommendation] Code Tree

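As a non-CS major, I always run into limits when studying coding. I’ve taught myself to code and use the programming code I need for my work, but I’ve always wondered, “Why does this code work this way?” So I’ve taken online courses on various education platforms, but most of them are geared toward CS majors, and non-majors like me often find them hard to follow. Recently, I found a coding education platform that is very helpful for non-majors like me. It’s called Code Tree. Today, I’d like to introduce this coding education platform. For reference…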

Let’s Explain Bayes’ Theorem as Simply as Possible

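Recently, I’ve been focusing on Bayesian statistics, so, partly to organize the concepts for myself, I’ll explain Bayes’ theorem as simply as possible. I’ll take a dataset from Kaggle. https://www.kaggle.com/datasets/cameronseamons/electronic-sales-sep2023-sep2024 This dataset analyzes customers’ purchasing behavior at an electronics store. You can download the data after signing up for Kaggle. I’ll load the data directly with Python code. For reference, I use Google Colab. It looks complicated, but there’s nothing difficult about it. Just copy and paste the code below into your own Google Colab and change only the file path to match your Google Colab. Now, like this…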
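
A minimal sketch of Bayes’ theorem with hypothetical counts; the events A and B and the numbers are invented for illustration and are not taken from the Kaggle dataset. The posterior P(A|B) from the formula matches the direct count ratio, which is a handy sanity check.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical customer counts
total_customers = 1000
n_a = 300     # customers with event A (e.g., bought a hypothetical product X)
n_b = 200     # customers with event B (e.g., hypothetical returning customers)
n_ab = 120    # customers with both A and B

p_a = n_a / total_customers        # prior P(A) = 0.3
p_b = n_b / total_customers        # evidence P(B) = 0.2
p_b_given_a = n_ab / n_a           # likelihood P(B|A) = 0.4

p_a_given_b = bayes_posterior(p_a, p_b_given_a, p_b)
print(round(p_a_given_b, 3))       # equals n_ab / n_b
```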

R GIS: Interpolating and Plotting Corn Grain Yield Data

■ Python GIS: Interpolating and Plotting Corn Grain Yield Data
In my previous post, I explained how to create a GIS map using Python. Today, I’ll introduce how to create the same GIS map using R. First, let’s install all the required packages, and then I’ll upload a dataset for practice. Next, I’ll extract the latitude, longitude, and y (output) columns and interpolate the data. Finally, I’ll create a GIS map using ggplot(). Full code: If you copy and paste the…
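
The interpolation method used in the post isn’t shown in this excerpt. As a swapped-in, hedged illustration, here is inverse distance weighting (IDW) in NumPy onto a regular longitude/latitude grid; distances are measured in degrees without projection, and the yield points are hypothetical. The resulting grid can then be drawn with matplotlib’s pcolormesh() or contourf().

```python
import numpy as np


def idw_grid(lon, lat, values, nx=50, ny=50, power=2.0):
    """Inverse distance weighting onto a regular lon/lat grid (distances in degrees)."""
    lon, lat, values = (np.asarray(a, dtype=float) for a in (lon, lat, values))
    gx = np.linspace(lon.min(), lon.max(), nx)
    gy = np.linspace(lat.min(), lat.max(), ny)
    GX, GY = np.meshgrid(gx, gy)                                # shape (ny, nx)
    # Distance from every grid node to every sample point: (ny, nx, n_points)
    d = np.hypot(GX[..., None] - lon, GY[..., None] - lat)
    w = 1.0 / np.maximum(d, 1e-12) ** power                     # avoid division by zero
    Z = (w * values).sum(axis=-1) / w.sum(axis=-1)              # weighted average
    return gx, gy, Z


# Hypothetical yield points (longitude, latitude, yield)
lon = [-88.0, -87.0, -88.0, -87.0]
lat = [40.0, 40.0, 41.0, 41.0]
yld = [10.0, 12.0, 11.0, 13.0]
gx, gy, Z = idw_grid(lon, lat, yld)
print(Z.shape, round(float(Z.min()), 2), round(float(Z.max()), 2))
```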

Graphing Normal Distributions with Varied Variances

I want to create a normal distribution graph with a specific variance. First, I need to create the data. I’ll generate data with a mean of 100 and a variance of 100 (which means the standard deviation is 10). However, it’s important to set a range for the x-values. To do this, I’ll set up a range of 6σ, and the dataset will contain 1,000 rows. Then I’ll create a normal distribution graph. These are graphs with different variances, ranging from 1σ to…
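
A minimal NumPy sketch of generating the curve data, assuming the 6σ range means μ − 3σ to μ + 3σ and using 1,000 evenly spaced x-values. The extra variances in the loop are hypothetical, for comparison only.

```python
import numpy as np


def normal_curve(mu, variance, n_points=1000, half_width_sd=3.0):
    """x-values spanning mu +/- half_width_sd * sigma and the normal density at each."""
    sigma = np.sqrt(variance)
    x = np.linspace(mu - half_width_sd * sigma, mu + half_width_sd * sigma, n_points)
    pdf = np.exp(-((x - mu) ** 2) / (2 * variance)) / np.sqrt(2 * np.pi * variance)
    return x, pdf


x, pdf = normal_curve(100.0, 100.0)   # mean 100, variance 100 (SD 10): x from 70 to 130

for var in [25.0, 100.0, 400.0]:      # hypothetical variances to compare
    _, curve = normal_curve(100.0, var)
    print(var, round(float(curve.max()), 4))
```

Each (x, pdf) pair can then be drawn as a line plot; larger variances give flatter, wider curves.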

[R package] Calculation for Growing Degree Days (GDDs, ºCd)

Growing Degree Days (GDDs) are a measure of heat accumulation used to predict plant development rates, such as crop growth. GDDs are calculated to provide a simple model for estimating the growth and development of plants, especially crops, based on daily temperature. To calculate GDDs, the base temperature for each crop should first be identified. The base temperature is the temperature below which crop growth is minimal or stops. This temperature varies by crop. For example,…
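
A minimal sketch of the simple average method, GDD = max((Tmax + Tmin)/2 − Tbase, 0), accumulated day by day. The temperatures are hypothetical, 10 ºC is used only as an example base temperature (a value commonly used for corn), and the post’s R package may apply a different method (for example, capping temperatures).

```python
def daily_gdd(t_max, t_min, t_base):
    """Daily growing degree days (simple average method), in ºCd."""
    return max((t_max + t_min) / 2.0 - t_base, 0.0)  # no negative heat units


def cumulative_gdd(t_max_list, t_min_list, t_base):
    """Running total of daily GDDs."""
    total, out = 0.0, []
    for hi, lo in zip(t_max_list, t_min_list):
        total += daily_gdd(hi, lo, t_base)
        out.append(total)
    return out


# Hypothetical daily temperatures (ºC)
t_max = [28.0, 30.0, 15.0]
t_min = [16.0, 18.0, 3.0]
print(cumulative_gdd(t_max, t_min, t_base=10.0))
```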
