Simplify Your Data Cleaning: Replace Text in R

if(!require(readr)) install.packages("readr")
if(!require(dplyr)) install.packages("dplyr")
if(!require(tidyr)) install.packages("tidyr")
library(readr)
library(dplyr)
library(tidyr)

github= paste0("https://raw.githubusercontent.com/",
               "agronomy4future/raw_data_practice/",
               "main/yield_per_location.csv")
df=data.frame(read_csv(url(github),show_col_types= FALSE))

# Reshape from wide to long: one row per Genotype, Nitrogen, Block and Location
df.transpose= data.frame(df %>%
       pivot_longer(
       cols= starts_with("Location"),   # Location1 to Location12
       names_to= "Location",
       values_to= "Yield"))

print(head(df.transpose, 12))
   Genotype Nitrogen Block   Location Yield
1       CV1       N0     I  Location1  98.0
2       CV1       N0     I  Location2  96.5
3       CV1       N0     I  Location3 115.8
4       CV1       N0     I  Location4  94.1
5       CV1       N0     I  Location5  82.8
6       CV1       N0     I  Location6 115.8
7       CV1       N0     I  Location7 110.0
8       CV1       N0     I  Location8  97.9
9       CV1       N0     I  Location9 107.6
10      CV1       N0     I Location10 128.6
11      CV1       N0     I Location11  74.3
12      CV1       N0     I Location12 121.3
.
.
.

This is the dataset in long format. I want to replace specific text values in it. First, let’s change “CV1” to “Genotype1”.

df.transpose= df.transpose %>%
     mutate(Genotype = gsub("CV1", "Genotype1", Genotype))

print(head(df.transpose, 5))       
   Genotype Nitrogen Block  Location Yield
1 Genotype1       N0     I Location1  98.0
2 Genotype1       N0     I Location2  96.5
3 Genotype1       N0     I Location3 115.8
4 Genotype1       N0     I Location4  94.1
5 Genotype1       N0     I Location5  82.8

In the same way, let’s change “Location1” to “Site1”.

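# Anchor the pattern with ^ and $ so that only "Location1" is matched;
# an unanchored "Location1" would also turn "Location10" into "Site10"
df.transpose= df.transpose %>%
      mutate(Location= gsub("^Location1$", "Site1", Location))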

print(head(df.transpose, 5)) 
   Genotype Nitrogen Block  Location Yield
1 Genotype1       N0     I     Site1  98.0
2 Genotype1       N0     I Location2  96.5
3 Genotype1       N0     I Location3 115.8
4 Genotype1       N0     I Location4  94.1
5 Genotype1       N0     I Location5  82.8
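
A point worth checking with gsub(): its pattern is a regular expression that can match anywhere in a string, so "Location1" also matches the beginning of "Location10", "Location11" and "Location12", which the head() output above does not show. The sketch below uses a small made-up vector to compare an unanchored pattern with one anchored by ^ and $. The names loc, unanchored and anchored are illustrative and not part of the dataset.

```r
# Hypothetical labels, chosen only to illustrate the matching behaviour
loc <- c("Location1", "Location2", "Location10", "Location12")

# Unanchored: "Location1" is also found at the start of "Location10" and "Location12"
unanchored <- gsub("Location1", "Site1", loc)

# Anchored: ^ and $ require the whole string to be exactly "Location1"
anchored <- gsub("^Location1$", "Site1", loc)

print(unanchored)
print(anchored)
```

With the anchored pattern only the first element changes, which is usually what is intended when renaming a single level.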

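However, this approach does not scale. With 12 locations, or if the genotypes ran up to CV10, we would have to repeat this code once per value. To avoid this kind of tedious, repetitive work, I suggest replacing only the common prefix, as in the following code. Note that the code below starts again from the original df.transpose (re-run the pivot_longer() step above first), because the Genotype and Location columns were overwritten in the previous examples.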

dataA= df.transpose %>%
        mutate(
         Site = as.numeric(gsub("Location", "", Location)),
         SiteInfo = gsub("Location", "Site", Location)
         )

print(head(dataA, 5))
  Genotype Nitrogen Block  Location Yield Site SiteInfo
1      CV1       N0     I Location1  98.0    1    Site1
2      CV1       N0     I Location2  96.5    2    Site2
3      CV1       N0     I Location3 115.8    3    Site3
4      CV1       N0     I Location4  94.1    4    Site4
5      CV1       N0     I Location5  82.8    5    Site5

This code creates a new data frame, dataA, by applying mutate() to df.transpose and adding two columns. The first column, Site, removes the prefix "Location" from each entry of the Location column with gsub() and converts the remaining digits to a number with as.numeric(), so "Location3" becomes 3. The second column, SiteInfo, replaces the prefix "Location" with "Site", so "Location3" becomes "Site3". The original Location column is kept intact, and the two new columns are added alongside it.
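
The same two steps can be tried on a plain character vector, which makes it easier to see what happens to a value that does not follow the "Location<number>" pattern. This is only a sketch: the vector loc and the label "Control" are made up for illustration, and sub() with a ^ anchor is used here as a slightly stricter variant of the gsub() call above, since it removes the prefix only at the start of the string.

```r
# Hypothetical labels; the last one does not follow the "Location<number>" pattern
loc <- c("Location1", "Location3", "Location12", "Control")

# Remove the prefix only when it appears at the start of the string
digits <- sub("^Location", "", loc)

# as.numeric() returns NA (normally with a warning) when non-digits remain
site <- suppressWarnings(as.numeric(digits))

# Swap the prefix while keeping the number
site_info <- sub("^Location", "Site", loc)

print(site)
print(site_info)
```

An NA in the numeric column is a useful signal that some label did not match the expected pattern and may need to be handled separately.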

In the same way, let’s change “CV” to “Genotype”.

dataA= df.transpose %>%
        mutate(
         CV = as.numeric(gsub("CV", "", Genotype)),
         CVInfo = gsub("CV", "Genotype", Genotype)
         )

print(tail(dataA, 5))
    Genotype Nitrogen Block   Location Yield CV    CVInfo
320      CV3       N2   III  Location8 117.9  3 Genotype3
321      CV3       N2   III  Location9 129.6  3 Genotype3
322      CV3       N2   III Location10 154.8  3 Genotype3
323      CV3       N2   III Location11  89.5  3 Genotype3
324      CV3       N2   III Location12 146.0  3 Genotype3
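
Because both calls above build dataA from df.transpose, the second result no longer contains Site and SiteInfo, as the tail() output shows. If all four columns are needed together, one option is to create them in a single mutate() call. The sketch below assumes a small made-up data frame (toy, with illustrative values) in place of df.transpose, and dataB is a hypothetical name.

```r
library(dplyr)

# Hypothetical toy data frame standing in for df.transpose
toy <- data.frame(
  Genotype = c("CV1", "CV2", "CV3"),
  Location = c("Location1", "Location10", "Location12"),
  Yield    = c(98.0, 128.6, 146.0)
)

# Build all four helper columns in one mutate() so none of them is lost
dataB <- toy %>%
  mutate(
    Site     = as.numeric(gsub("Location", "", Location)),
    SiteInfo = gsub("Location", "Site", Location),
    CV       = as.numeric(gsub("CV", "", Genotype)),
    CVInfo   = gsub("CV", "Genotype", Genotype)
  )

print(dataB)
```

The original Genotype and Location columns stay unchanged, so the new columns can be checked against them before the old ones are dropped.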

We aim to develop open-source code for agronomy.

© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.

Last Updated: 05/24/2026