When data is arranged, it can be structured either vertically (row-based) or horizontally (column-based). The choice depends on your preference for organizing data. However, when running statistics, data should be arranged row-based, as variables need to be in the same column. On the other hand, when calculating per variable, it is much easier to organize data column-based, allowing for simpler calculations. Regardless of the approach, well-organized data is essential, and the ability to restructure data is a valuable skill. Today, I will introduce how to automatically reshape data both vertically and horizontally using R.
Let’s upload a dataset in R.
if(!require(readr)) install.packages("readr")
library(readr)
github= paste0("https://raw.githubusercontent.com/agronomy4future/",
"raw_data_practice/refs/heads/main/",
"wheat_grain_Fe_uptake.csv")
dataA= data.frame(read_csv(url(github),show_col_types = FALSE))
head(dataA, 10) Location Season Genotype Reps Iron_ton_ha Stage Fe 1 East 2021 CV9 1 7.5833 Vegetative 0.36 2 East 2021 CV9 2 4.1246 Vegetative 0.32 3 East 2021 CV9 3 3.0873 Vegetative 0.39 4 East 2021 CV9 4 7.4287 Vegetative 0.37 5 East 2021 CV9 5 11.4825 Vegetative 0.36 6 East 2021 CV9 6 6.2228 Vegetative 0.36 7 East 2021 CV9 7 6.0621 Vegetative 0.32 8 East 2021 CV9 8 3.5002 Vegetative 0.33 9 East 2021 CV9 9 2.7580 Vegetative 0.37 10 East 2021 CV9 10 7.6637 Vegetative 0.33 . . .
Now, I’d like to move the variables in the ‘Stage‘ column to individual columns. I’ll use pivot_wider()
if(!require(dplyr)) install.packages("dplyr")
if(!require(tidyr)) install.packages("tidyr")
library(dplyr)
library(tidyr)
dataB= data.frame(dataA %>%
group_by(Location, Season, Genotype, Iron_ton_ha) %>%
pivot_wider(names_from= Stage, values_from=Fe))
head(dataB, 10) Location Season Genotype Reps Iron_ton_ha Vegetative Reproductive Maturity 1 East 2021 CV9 1 7.5833 0.36 0.26 0.07 2 East 2021 CV9 2 4.1246 0.32 0.22 0.07 3 East 2021 CV9 3 3.0873 0.39 0.22 0.06 4 East 2021 CV9 4 7.4287 0.37 0.24 0.06 5 East 2021 CV9 5 11.4825 0.36 0.28 0.16 6 East 2021 CV9 6 6.2228 0.36 0.30 0.07 7 East 2021 CV9 7 6.0621 0.32 0.29 0.16 8 East 2021 CV9 8 3.5002 0.33 0.30 0.10 9 East 2021 CV9 9 2.7580 0.37 0.27 0.09 10 East 2021 CV9 10 7.6637 0.33 0.30 0.15 . . .
This is perfectly transposed from a vertical to a horizontal structure, but it only works correctly when all values are fully grouped. When values are repeated without grouping, pivot_wider() does not work as expected.
if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/treatment_without_grouping.csv"
dataC= data.frame(read_csv(url(github), show_col_types= FALSE))
dataC=subset(dataC, select=c(-Biomass))
head(dataC, 10) Treatment Block Yield 1 Control I 75.3 2 Control I 45.4 3 Control I 92.2 4 Control I 32.5 5 Control I 85.1 6 Control I 69.6 7 Control I 84.1 8 Control I 41.5 9 Control I 38.4 10 Control II 76.9 . . .
This dataset contains Treatment and Block for Yield; however, as shown, there are 9 replicates per block. When using pivot_wider(), the structure breaks due to the lack of grouping for Block.
dataD= data.frame(dataC %>%
group_by(Treatment) %>%
pivot_wider(names_from= Block, values_from=Yield))
head(dataD, 10) Treatment 1 Control 2 Fertilizer I 1 75.3, 45.4, 92.2, 32.5, 85.1, 69.6, 84.1, 41.5, 38.4, 56.8, 44.8, 29.2, 66.1, 65.2, 48.2, 38.4, 61.8, 49.5 2 47.9, 27.7, 32.6, 29.9, 43.1, 45.2, 29.3, 33.6, 58.7, 49.7, 37.3, 53.3, 19.8, 31.9, 27.9, 40.6, 41.8, 18.1 II
In this case, we can modify the code as follows. I added a grouping variable named rep. When grouping by Treatment and rep, the data structure works correctly.
dataE= dataC %>%
group_by(Treatment, Block) %>%
mutate(rep= row_number()) %>%
ungroup()
head(dataE, 3) Treatment Block Yield rep 1 Control I 75.3 1 2 Control I 45.4 2 3 Control I 92.2 3 . . .
dataF= data.frame(dataE %>%
group_by(Treatment, rep) %>%
pivot_wider(names_from= Block, values_from=Yield))
head(dataF, 15)
Treatment rep I II III IV
1 Control 1 75.3 76.9 101.5 14.3
2 Control 2 45.4 70.4 63.7 58.1
3 Control 3 92.2 83.8 80.7 69.8
4 Control 4 32.5 36.3 92.6 56.7
5 Control 5 85.1 62.1 78.4 99.7
6 Control 6 69.6 92.0 69.5 39.5
7 Control 7 84.1 71.0 93.2 72.0
8 Control 8 41.5 67.6 40.4 62.3
9 Control 9 38.4 131.4 73.3 142.6
10 Fertilizer 1 47.9 44.2 34.1 30.8
11 Fertilizer 2 27.7 26.0 35.6 25.7
12 Fertilizer 3 32.6 23.4 28.7 28.3
13 Fertilizer 4 29.9 33.7 37.6 32.5
14 Fertilizer 5 43.1 44.2 36.8 31.4
15 Fertilizer 6 45.2 49.6 46.6 19.5
.
.
.

Let’s change it as vertically again.
dataG= data.frame(dataF %>%
pivot_longer(
cols= c(I,II,III,IV),
names_to= "Block",
values_to= "Yield"))
head(dataG, 10) Treatment rep Block Yield 1 Control 1 I 75.3 2 Control 1 II 76.9 3 Control 1 III 101.5 4 Control 1 IV 14.3 5 Control 2 I 45.4 6 Control 2 II 70.4 7 Control 2 III 63.7 8 Control 2 IV 58.1 9 Control 3 I 92.2 10 Control 3 II 83.8 . . .

We aim to develop open-source code for agronomy ([email protected])
© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.
Last Updated: 12/21/2025