[Data article] Visualizing Responsiveness: Integrating Raw Data for a Holistic Dataset View


[R package] Embedding Key Descriptive Statistics within Original Data (Feat. descriptivestat)

[R package] Calculate the responsiveness of each treatment relative to a control (Feat. deltactrl)


In my previous posts, I introduced two R packages. The first, descriptivestat(), embeds mean values and other descriptive statistics alongside the raw data. The second, deltactrl(), calculates the responsiveness of dependent variables relative to a control. In this article, I will demonstrate how combining these two packages makes it easy to create different types of figures to better understand a dataset.

First, I will load a dataset.

if (!require("rio")) install.packages("rio")
library(rio)
url= "https://github.com/agronomy4future/raw_data_practice/raw/main/wheat_grain_size_big_data.RData"
df= import(url)
df1=subset(df, fungicide!="N/A" & Genotype=="Peele")
df1= subset(df1, select = c(-fertilizer, -Shoot, -Length.mm., -Width.mm.))

print(tail(df1,5))
     Field Genotype Block fungicide planting_date Area.mm2.
.
.
.
96315 South    Peele   III        No         early    13.687
96316 South    Peele   III        No         early    11.058
96317 South    Peele   III        No         early     9.154
96318 South    Peele    II        No          late    18.092
96319 South    Peele    II        No          late    18.092

This dataset contains 96,319 rows, making it difficult to perform certain calculations in Excel. In this case, R programming is an efficient tool for conducting data analysis. I will create a bar graph based on this dataset to analyze how the grain area differs at different planting dates, depending on whether fungicide is applied.

First, I will relabel the planting dates (capitalizing each level) and set their order.

if (!require("dplyr")) install.packages("dplyr")
library(dplyr)

dataA= df1 %>%
              mutate (planting_date= case_when(
                      planting_date== "early"  ~ "Early",
                      planting_date== "normal"  ~ "Normal",
                      planting_date== "late"  ~ "Late",
                      TRUE ~ as.character(planting_date)
  ))

dataA$planting_date=factor(dataA$planting_date, levels=c("Early","Normal","Late")) 

print(tail(dataA,5))
      Field Genotype Block fungicide planting_date Area.mm2.
.
.
.
96315 South    Peele   III        No         Early    13.687
96316 South    Peele   III        No         Early    11.058
96317 South    Peele   III        No         Early     9.154
96318 South    Peele    II        No          Late    18.092
96319 South    Peele    II        No          Late    18.092
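Setting the levels explicitly matters because R otherwise sorts factor levels alphabetically, which would place Late before Normal on the x-axis. A minimal sketch with a toy vector (the vector pd is made up for illustration and is not taken from the dataset):

```r
# Toy vector of planting-date labels (hypothetical values for illustration)
pd <- c("Late", "Early", "Normal", "Early")

# Default behavior: levels are sorted alphabetically
default_levels <- levels(factor(pd))
print(default_levels)

# Explicit levels: the plotting order follows the agronomic sequence
ordered_levels <- levels(factor(pd, levels = c("Early", "Normal", "Late")))
print(ordered_levels)
```

ggplot2 draws discrete axes in level order, so the second form is what keeps Early, Normal, and Late in sequence in the figures.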

Next, I will summarize the dataset to create a bar graph.

if (!require("dplyr")) install.packages("dplyr")
library(dplyr)

dataB= data.frame(dataA %>%
                    group_by(fungicide, planting_date) %>%
                    dplyr::summarize(across(c(Area.mm2.), 
                                            .fns= list(Mean=~mean(., na.rm= TRUE), 
                                                       SD= ~sd(., na.rm= TRUE), 
                                                       n=~length(.),
                                                       se=~sd(.,na.rm= TRUE) / sqrt(length(.))))))

print(dataB)        
fungicide  planting_date  Area.mm2._Mean  Area.mm2._SD  Area.mm2._n  Area.mm2._se
No         Early          13.97312        2.479379      2054      0.05470698
No         Normal         15.78524        3.087877      2217      0.06558086
No         Late           17.65699        2.787904      1628      0.06909563
Yes        Early          14.23392        2.569342      2243      0.05425093
Yes        Normal         16.87129        2.610428      2434      0.05291166
Yes        Late           17.59145        2.746128      1636      0.06789364
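As a quick check on this summary, the standard error is the standard deviation divided by the square root of the sample size. The sketch below recomputes it for the first row (no fungicide, early planting) from the SD and n printed in the table; the variable names are only for illustration.

```r
# Values copied from the first row of the summary table (No, Early)
sd_no_early <- 2.479379
n_no_early  <- 2054

# Standard error = SD / sqrt(n)
se_no_early <- sd_no_early / sqrt(n_no_early)
print(round(se_no_early, 5))  # should agree with Area.mm2._se for that row, up to rounding
```

One caveat: length(.) counts missing values as well, so if a column contained NAs, sum(!is.na(.)) would be the safer way to count observations.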

Okay! Let’s create a bar graph.

if(!require("ggplot2")) install.packages("ggplot2")
library(ggplot2)

Fig1=ggplot(data=dataB, aes(x=planting_date, y=Area.mm2._Mean, fill=fungicide)) +
  geom_bar(stat="identity", position="dodge", width=0.91, size=1) +
  geom_errorbar(aes(ymin=Area.mm2._Mean-Area.mm2._SD, ymax=Area.mm2._Mean+Area.mm2._SD),
                position=position_dodge(0.9), width=0.5) +
  scale_fill_manual(name="Fungicide", values= c("grey35", "grey75")) +

  # annotate() draws each label once; geom_text() with constant aes() would repeat it for every row of dataB
  annotate("text", family="serif", x=0.8, y=18, label="e", size=5, color="black") +
  annotate("text", family="serif", x=1.2, y=19, label="d", size=5, color="black") +
  annotate("text", family="serif", x=1.8, y=20, label="c", size=5, color="black") +
  annotate("text", family="serif", x=2.2, y=21, label="b", size=5, color="black") +
  annotate("text", family="serif", x=2.8, y=23, label="a", size=5, color="black") +
  annotate("text", family="serif", x=3.2, y=23, label="a", size=5, color="black") +
  annotate("text", family="serif", x=2, y=25, label="***", size=6, color="red") +
  scale_y_continuous(breaks = seq(0, 30, 5), limits = c(0, 30)) +
  labs(x="", y=expression("Grain area ("*mm^2*")")) +
  theme_classic(base_size= 15, base_family = "serif") +
  theme(legend.position=c(0.15,0.87),
        legend.title= element_text(family="serif", size=14, color="black"),
        legend.key=element_rect(color="white", fill=alpha("white", 0.5)),
        legend.text=element_text(family="serif", face="plain",
                                 size=13, color="black"),
        legend.background= element_rect(fill=alpha("white", 0.5)),
        panel.border= element_rect(color="black", fill=NA, linewidth=0.5),
        axis.line= element_line(linewidth= 0.5, colour= "black"),
        strip.background=element_rect(color="white", 
                                      linewidth=0.5, linetype="solid"))

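# windows() opens a graphics window on Windows only; on other systems, print(Fig1) displays the plot
Fig1+windows(width=5.5, height= 5)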

ggsave("C:/Users/agron/Fig1.jpg", 
       Fig1, width=9*2.54, height=5*2.54, units="cm", dpi=1000)

This figure shows how grain area differs at various planting dates, with or without fungicide application. At later planting dates, grain area tends to be greater (the error bar represents the standard deviation).

While this figure is useful, if our primary interest is not the grain area differences but rather the effect of fungicide, it would be better to focus on the responsiveness of grain area to fungicide, rather than displaying the grain area itself.

Additionally, to provide a more comprehensive view of the dataset, using raw data points instead of mean values would better capture the variation in the data.

The two R packages, descriptivestat() and deltactrl(), make it easy to conduct such data analysis.

First, let’s load the necessary libraries.

if(!require(remotes)) install.packages("remotes")
library(remotes)

# deltactrl
if (!requireNamespace("deltactrl", quietly = TRUE)) {
  remotes::install_github("agronomy4future/deltactrl", force= TRUE)
}
library(deltactrl)

# descriptivestat
if (!requireNamespace("descriptivestat", quietly = TRUE)) {
  remotes::install_github("agronomy4future/descriptivestat", force= TRUE)
}
library(descriptivestat)

I will calculate the responsiveness of grain area to fungicide application. The ‘fungicide’ column contains two levels: ‘Yes’ (fungicide applied) and ‘No’ (no fungicide). I will use ‘No’ as the control (baseline), so responsiveness is calculated as (Treatment - Control) / Control, that is, (Yes - No) / No.

dataC= deltactrl(
  data= dataA,
  group_vars= c("planting_date"),
  treatment_var= fungicide,
  control_label= No,
  response_vars= c("Area.mm2.")
)

print(tail(dataC,5))
Field Genotype Block fungicide planting_date Area.mm2. responsive_Area.mm2.
.
.
.
South Peele    III   No        Early          13.7      NA
South Peele    III   No        Early          11.1      NA
South Peele    III   No        Early          9.15      NA
South Peele    II    No        Late           18.1      NA
South Peele    II    No        Late           18.1      NA

In this code, control_label= No designates the control, and group_vars= c("planting_date") groups the calculation by planting date. With these settings, I calculate the responsiveness of grain area (Area.mm2.).
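To get a feel for the scale of these values, the formula (Yes - No) / No can be applied by hand to the group means from the summary table (dataB). This is only a mean-level sketch with illustrative variable names. deltactrl() returns one value per treated row, and its internal handling of the control is not shown here, so row-level values will differ from these.

```r
# Group means copied from the dataB summary table (grain area, mm2)
ctrl_mean <- c(Early = 13.97312, Normal = 15.78524, Late = 17.65699)  # fungicide = No
trt_mean  <- c(Early = 14.23392, Normal = 16.87129, Late = 17.59145)  # fungicide = Yes

# Responsiveness = (Treatment - Control) / Control, per planting date
resp_mean <- (trt_mean - ctrl_mean) / ctrl_mean
print(round(resp_mean, 4))
```

On these means, the normal planting date gives the largest relative change (roughly 7%), the early date a small positive one, and the late date a value just below zero.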

The responsiveness is stored in a new column, responsive_Area.mm2., which indicates how grain area changes when fungicide is applied. Control rows (‘No’) show NA in this column, as in the output above. Next, I’ll add descriptive statistics to dataC.

dataD= descriptivestat(data= dataC, group_vars= c("planting_date","fungicide"), 
                     value_vars= c("responsive_Area.mm2."), 
                     output_stats= c("sd"))

I requested the standard deviation with output_stats= c("sd"), calculated within each combination of planting date and fungicide application. Since ‘No’ (no fungicide) is the baseline, it does not need to appear in the final output, so I will delete it.

dataE= subset(dataD, fungicide!="No")
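The figure code that follows filters on a category column with the values "observed" and "mean", and reads the standard deviation from sd.responsive_Area.mm2.. As a rough illustration of that layout (not the package's actual implementation), the base R sketch below stacks raw rows and per-group mean rows in one data frame. The toy data and the names toy, resp, and sd.resp are hypothetical.

```r
# Hypothetical toy data: two observations per planting date
toy <- data.frame(
  planting_date = c("Early", "Early", "Late", "Late"),
  resp = c(0.10, 0.30, -0.20, 0.00)
)

# Per-group mean and standard deviation (both sorted by planting_date)
means <- aggregate(resp ~ planting_date, data = toy, FUN = mean)
sds   <- aggregate(resp ~ planting_date, data = toy, FUN = sd)

# Summary rows: the mean sits in the value column, the SD in its own column
means$sd.resp  <- sds$resp
means$category <- "mean"

# Raw rows: no SD, tagged as observed
toy$sd.resp  <- NA_real_
toy$category <- "observed"

# One data frame holding both raw data and summary rows
combined <- rbind(toy, means)
print(combined)
```

With this layout, a single data frame can feed both the jittered raw points (category == "observed") and the mean points with error bars (category == "mean").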

Let’s create the figure.

if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)

Fig2= ggplot() +
  geom_jitter(data= subset(dataE, category=="observed"),
              aes(x= planting_date, y= responsive_Area.mm2., fill= planting_date, 
              shape=planting_date), 
              width=0.2, alpha=0.5,
              size=2, color="grey75") +
  
  geom_errorbar(data= subset(dataE, category=="mean"),
                aes(x= planting_date, ymin= responsive_Area.mm2.-sd.responsive_Area.mm2., 
                    ymax=responsive_Area.mm2.+sd.responsive_Area.mm2.),
                width= 0.1, color= "black") +
  
  geom_point(data= subset(dataE, category=="mean"),
             aes(x= planting_date, y= responsive_Area.mm2., fill= planting_date, 
             shape=planting_date),
             size=4, color="black", stroke=1.5) +
  
  geom_text(aes(family="serif", x=1, y=0.65, label="b"), size=5, color="black") +
  geom_text(aes(family="serif", x=2, y=0.7, label="a"), size=5, color="black") +
  geom_text(aes(family="serif", x=3, y=0.55, label="c"), size=5, color="black") +
  geom_text(aes(family="serif", x=2, y=0.85, label="***"), size=6, color="red") +
  scale_fill_manual(values= c("darkred", "grey35", "darkblue")) +
  scale_shape_manual(values = c(21,22,23)) +            
  geom_hline(yintercept=0, linetype="dashed", color="black", linewidth=0.5) +
  scale_y_continuous(breaks=seq(-1,1,0.5), limits = c(-1,1)) +
  #facet_wrap(~ genotype, scales = "free") +
  labs(x= NULL, y="Responsiveness to\nfungicide application") +
  theme_classic(base_size=18, base_family="serif") +
  theme(
    legend.position="none",
    legend.key=element_rect(color="white", fill="white"),
    legend.text=element_text(family="serif", face="plain",
                             size=15, color= "black"),
    legend.background=element_rect(fill=alpha("white", 0.5)),
    strip.background= element_rect(color="white", linewidth=0.5, linetype="solid"),
    panel.border= element_rect(color="black", fill=NA, linewidth=0.5),
    panel.grid.major= element_line(color="grey90", linetype="dashed"),
    axis.line= element_blank()   
  )

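# windows() opens a graphics window on Windows only; on other systems, print(Fig2) displays the plot
Fig2+windows(width=5.5, height= 5)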

ggsave("C:/Users/agron/Fig2.jpg", 
       Fig2, width=9*2.54, height=5*2.54, units="cm", dpi=1000)

This figure shows how grain area responds to fungicide application at different planting dates. The dashed line at 0 represents the no-fungicide control: a responsiveness of 0 means there is no difference in grain area between the fungicide and no-fungicide treatments. The figure indicates that grain area is most responsive to fungicide application at the normal planting date.

If you copy and paste the entire code below into your R console, you will generate the same figure shown above.

if (!require("rio")) install.packages("rio")
if(!require(remotes)) install.packages("remotes")
if (!requireNamespace("deltactrl", quietly = TRUE)) {
  remotes::install_github("agronomy4future/deltactrl", force= TRUE)
}
if (!requireNamespace("descriptivestat", quietly = TRUE)) {
  remotes::install_github("agronomy4future/descriptivestat", force= TRUE)
}
if (!require("dplyr")) install.packages("dplyr")
if(!require("ggplot2")) install.packages("ggplot2")

library(rio)
library(remotes)
library(deltactrl)
library(descriptivestat)
library(dplyr)
library(ggplot2)

url= "https://github.com/agronomy4future/raw_data_practice/raw/main/wheat_grain_size_big_data.RData"
df= import(url)
df1=subset(df, fungicide!="N/A" & Genotype=="Peele")
df1= subset(df1, select = c(-fertilizer, -Shoot, -Length.mm., -Width.mm.))

dataA= df1 %>%
  mutate (planting_date= case_when(
    planting_date== "early"  ~ "Early",
    planting_date== "normal"  ~ "Normal",
    planting_date== "late"  ~ "Late",
    TRUE ~ as.character(planting_date)
  ))

dataA$planting_date=factor(dataA$planting_date, levels=c("Early","Normal","Late")) 

dataC= deltactrl(
  data= dataA,
  group_vars= c("planting_date"),
  treatment_var= fungicide,
  control_label= No,
  response_vars= c("Area.mm2.")
)

dataD= descriptivestat(data= dataC, group_vars= c("planting_date","fungicide"), 
                       value_vars= c("responsive_Area.mm2."), 
                       output_stats= c("sd"))

dataE= subset(dataD, fungicide!="No")

Fig2= ggplot() +
   geom_jitter(data= subset(dataE, category=="observed"),
              aes(x= planting_date, y= responsive_Area.mm2., fill= planting_date, 
                  shape=planting_date), 
              width=0.2, alpha=0.5,
              size=2, color="grey75") +
  
  geom_errorbar(data= subset(dataE, category=="mean"),
                aes(x= planting_date, ymin= responsive_Area.mm2.-sd.responsive_Area.mm2., 
                    ymax=responsive_Area.mm2.+sd.responsive_Area.mm2.),
                width= 0.1, color= "black") +
  
  geom_point(data= subset(dataE, category=="mean"),
             aes(x= planting_date, y= responsive_Area.mm2., fill= planting_date, 
                 shape=planting_date),
             size=4, color="black", stroke=1.5) +
  
  geom_text(aes(family="serif", x=1, y=0.65, label="b"), size=5, color="black") +
  geom_text(aes(family="serif", x=2, y=0.7, label="a"), size=5, color="black") +
  geom_text(aes(family="serif", x=3, y=0.55, label="c"), size=5, color="black") +
  geom_text(aes(family="serif", x=2, y=0.85, label="***"), size=6, color="red") +
  scale_fill_manual(values= c("darkred", "grey35", "darkblue")) +
  scale_shape_manual(values = c(21,22,23)) +            
  geom_hline(yintercept=0, linetype="dashed", color="black", linewidth=0.5) +
  scale_y_continuous(breaks=seq(-1,1,0.5), limits = c(-1,1)) +
  #facet_wrap(~ genotype, scales = "free") +
  labs(x= NULL, y="Responsiveness to\nfungicide application") +
  theme_classic(base_size=18, base_family="serif") +
  theme(
    legend.position="none",
    legend.key=element_rect(color="white", fill="white"),
    legend.text=element_text(family="serif", face="plain",
                             size=15, color= "black"),
    legend.background=element_rect(fill=alpha("white", 0.5)),
    strip.background= element_rect(color="white", linewidth=0.5, linetype="solid"),
    panel.border= element_rect(color="black", fill=NA, linewidth=0.5),
    panel.grid.major= element_line(color="grey90", linetype="dashed"),
    axis.line= element_blank()   
  )

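# windows() opens a graphics window on Windows only; on other systems, print(Fig2) displays the plot
Fig2+windows(width=5.5, height= 5)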

I created two different figures from the same dataset. The bar graph (left figure) is the typical format, while the graph of raw data points with their means (right figure) offers a different perspective on the data.

What is more, calculating responsiveness reduces the number of groups to display (in this figure, ‘No fungicide’ has been omitted because it serves as the baseline). This approach is also useful when there are many treatment levels that would take up too much space in the figure.

■ code summary: https://github.com/agronomy4future/r_code/blob/main/%5BData_article%5D_Visualizing_Responsiveness_Integrating_Raw_Data_for_a_Holistic_Dataset_View.ipynb

We aim to develop open-source code for agronomy

© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.

Last Updated: 09/06/2025
