[R Package] Convert Data into Code Instantly – Save as a Script with One Line




When importing data into R, we sometimes worry about losing track of the data over time. Because we store data in different folders for different projects, we might forget where a file was saved. In addition, if the file path changes, the import code no longer works, and we first have to find where the file is now located.

Therefore, a better approach is to save the data as code, so that it is available as soon as we open the R script that contains it. The most common way to convert data to code is dput(), but there are other ways to do this in R. For more details, please check the post below.


How to convert an uploaded data to code in R?
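As a minimal base-R sketch of what "data as code" means (an illustration only, not how datazip works internally), deparse() turns an object into R source text, and parsing and evaluating that text rebuilds the same object. dput() prints the same kind of code to the console. The two-row data frame below is a toy example.

```r
# Toy data frame for illustration (not the post's dataset)
df <- data.frame(
  Genotype = c("Genotype_A", "Genotype_B"),
  value    = c(42.9, 53.3)
)

# deparse() returns R source code as a character vector;
# dput(df) prints equivalent code to the console
code <- deparse(df)
cat(code, sep = "\n")

# Parsing and evaluating that code rebuilds an identical data frame
df_copy <- eval(parse(text = code))
print(identical(df, df_copy))  # TRUE
```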


Let’s go through this in detail with an actual dataset.

if(!require(readr)) install.packages("readr")
library(readr)

github= "https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/fertilizer_treatment.csv"
dataA= data.frame(read_csv(url(github),show_col_types = FALSE))

print(head(dataA,5))
    Genotype Block variable value
1 Genotype_A     I  Control  42.9
2 Genotype_A    II  Control  41.6
3 Genotype_A   III  Control  28.9
4 Genotype_A    IV  Control  30.8
5 Genotype_B     I  Control  53.3
.
.
.

Here is a dataset called dataA. If this data came from an Excel file on your PC, you would need to keep the file and specify its path every time you import it into R. To avoid this, I want to save the dataset as code so that it can be stored directly in an R script and accessed without reloading the original file.

To achieve this, I used dput().

dput(dataA)

structure(list(Genotype = c("Genotype_A", "Genotype_A", "Genotype_A", 
"Genotype_A", "Genotype_B", "Genotype_B", "Genotype_B", "Genotype_B", 
"Genotype_C", "Genotype_C", "Genotype_C", "Genotype_C", "Genotype_D", 
"Genotype_D", "Genotype_D", "Genotype_D", "Genotype_A", "Genotype_A", 
"Genotype_A", "Genotype_A", "Genotype_B", "Genotype_B", "Genotype_B", 
"Genotype_B", "Genotype_C", "Genotype_C", "Genotype_C", "Genotype_C", 
"Genotype_D", "Genotype_D", "Genotype_D", "Genotype_D", "Genotype_A", 
"Genotype_A", "Genotype_A", "Genotype_A", "Genotype_B", "Genotype_B", 
"Genotype_B", "Genotype_B", "Genotype_C", "Genotype_C", "Genotype_C", 
"Genotype_C", "Genotype_D", "Genotype_D", "Genotype_D", "Genotype_D", 
"Genotype_A", "Genotype_A", "Genotype_A", "Genotype_A", "Genotype_B", 
"Genotype_B", "Genotype_B", "Genotype_B", "Genotype_C", "Genotype_C", 
"Genotype_C", "Genotype_C", "Genotype_D", "Genotype_D", "Genotype_D", 
"Genotype_D"), Block = c("I", "II", "III", "IV", "I", "II", "III", 
"IV", "I", "II", "III", "IV", "I", "II", "III", "IV", "I", "II", 
"III", "IV", "I", "II", "III", "IV", "I", "II", "III", "IV", 
"I", "II", "III", "IV", "I", "II", "III", "IV", "I", "II", "III", 
"IV", "I", "II", "III", "IV", "I", "II", "III", "IV", "I", "II", 
"III", "IV", "I", "II", "III", "IV", "I", "II", "III", "IV", 
"I", "II", "III", "IV"), variable = c("Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Control", "Control", "Control", "Control", "Control", 
"Control", "Fertilizer1", "Fertilizer1", "Fertilizer1", "Fertilizer1", 
"Fertilizer1", "Fertilizer1", "Fertilizer1", "Fertilizer1", "Fertilizer1", 
"Fertilizer1", "Fertilizer1", "Fertilizer1", "Fertilizer1", "Fertilizer1", 
"Fertilizer1", "Fertilizer1", "Fertilizer2", "Fertilizer2", "Fertilizer2", 
"Fertilizer2", "Fertilizer2", "Fertilizer2", "Fertilizer2", "Fertilizer2", 
"Fertilizer2", "Fertilizer2", "Fertilizer2", "Fertilizer2", "Fertilizer2", 
"Fertilizer2", "Fertilizer2", "Fertilizer2", "Fertilizer3", "Fertilizer3", 
"Fertilizer3", "Fertilizer3", "Fertilizer3", "Fertilizer3", "Fertilizer3", 
"Fertilizer3", "Fertilizer3", "Fertilizer3", "Fertilizer3", "Fertilizer3", 
"Fertilizer3", "Fertilizer3", "Fertilizer3", "Fertilizer3"), 
    value = c(42.9, 41.6, 28.9, 30.8, 53.3, 69.6, 45.4, 35.1, 
    62.3, 58.5, 44.6, 50.3, 75.4, 65.6, 54, 52.7, 53.8, 58.5, 
    43.9, 46.3, 57.6, 69.6, 42.4, 51.9, 63.4, 50.4, 45, 46.7, 
    70.3, 67.3, 57.6, 58.5, 49.5, 53.8, 40.7, 39.4, 59.8, 65.8, 
    41.4, 45.4, 64.5, 46.1, 62.6, 50.3, 68.8, 65.3, 45.6, 51, 
    44.4, 41.8, 28.3, 34.7, 64.1, 57.4, 44.1, 51.6, 63.6, 56.1, 
    52.7, 51.8, 71.6, 69.4, 56.6, 47.4)), class = "data.frame", row.names = c(NA, 
-64L))

However, the output is spread across many lines and takes up a lot of space. When I copy and paste it into an R script or Google Colab, it looks messy and disorganized.
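To see how quickly dput() output grows, you can count the console lines it prints. This sketch uses R's built-in iris dataset (150 rows) purely as a stand-in for your own data.

```r
# iris (built into R, 150 rows) stands in for your own data here
# capture.output() collects everything dput() prints to the console
dput_lines <- capture.output(dput(iris))

n_lines <- length(dput_lines)
cat("dput(iris) printed", n_lines, "lines of code\n")
```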



datazip package

Therefore, I wanted a simple one-line format. The datazip package I developed provides the datazip() function, which converts data into a single line of code.
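If you are curious how a one-line representation can be produced without extra packages, here is a rough base-R sketch. It is an assumed alternative, not how datazip() is implemented, and the helper name one_line_code() is hypothetical. It joins the lines from deparse() with single spaces, which is safe because deparse() only breaks lines between tokens.

```r
# Hypothetical helper (not part of datazip): one line of R code for any object
one_line_code <- function(x) {
  code <- deparse(x)           # R source text, broken across several lines
  code <- trimws(code)         # drop the indentation on continuation lines
  paste(code, collapse = " ")  # join with spaces so tokens are never fused
}

cars3 <- head(mtcars, 3)
one <- one_line_code(cars3)
cat(one, "\n")

# The single line still evaluates back to an equivalent data frame
restored <- eval(parse(text = one))
print(isTRUE(all.equal(restored, cars3)))
```

On R 4.0.0 or later, base R's deparse1() performs much the same joining in a single call, although it keeps the indentation spaces.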

First, let’s install and load the package.

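# install remotes (used to install packages from GitHub) if it is missing
if(!require(remotes)) install.packages("remotes")
# install datazip from GitHub only if it is not already installed
if (!requireNamespace("datazip", quietly = TRUE)) {
   remotes::install_github("agronomy4future/datazip", force= TRUE)
}
library(remotes)
library(datazip)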

If you run ?datazip, you can view its description.

Now, let’s convert dataA into a single line of code.

datazip(dataA)
structure(list(Genotype=c("Genotype_A","Genotype_A","Genotype_A","Genotype_A","Genotype_B","Genotype_B","Genotype_B","Genotype_B","Genotype_C","Genotype_C","Genotype_C","Genotype_C","Genotype_D","Genotype_D","Genotype_D","Genotype_D","Genotype_A","Genotype_A","Genotype_A","Genotype_A","Genotype_B","Genotype_B","Genotype_B","Genotype_B","Genotype_C","Genotype_C","Genotype_C","Genotype_C","Genotype_D","Genotype_D","Genotype_D","Genotype_D","Genotype_A","Genotype_A","Genotype_A","Genotype_A","Genotype_B","Genotype_B","Genotype_B","Genotype_B","Genotype_C","Genotype_C","Genotype_C","Genotype_C","Genotype_D","Genotype_D","Genotype_D","Genotype_D","Genotype_A","Genotype_A","Genotype_A","Genotype_A","Genotype_B","Genotype_B","Genotype_B","Genotype_B","Genotype_C","Genotype_C","Genotype_C","Genotype_C","Genotype_D","Genotype_D","Genotype_D","Genotype_D"),Block=c("I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV","I","II","III","IV"),variable=c("Control","Control","Control","Control","Control","Control","Control","Control","Control","Control","Control","Control","Control","Control","Control","Control","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer1","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer2","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3","Fertilizer3"),value=c(42.9,41.6,28.9,30.8,53.3,69.6,45.4,35.1,62.3,58.5,44.6,50.3,75.4,65.6,54,52.7,53.8,58.5,43.9,46.3,57.6,69.6,42.4,51.9,63.4,50.4,45,46.7,70.3,67.3,57.6,58.5,49.5,53.8,40.7,39.4,59.8,65.8,41.4,45.4,64.5,46.1,62.6,50.3,68.8,65.3,45.6,51,44.4,41.8,28.3,34.7,64.1,57.4,44.1,51.6,63.6,56.1,52.7,51.8,71.6,69.4,56.6,47.4)),class="data.frame",row.names=c(NA,-64L))

The code is generated as a single line. When copying and pasting it into an R script or Google Colab, it takes up much less space.

In short, datazip() converts data into a single line of code, making it easy to save in a script.
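To keep the data inside a script, the single line can be written as an assignment and run again later. The base-R sketch below is an illustration under assumptions: the object name dataA_code and the file name are placeholders, iris stands in for your own data, and a temporary folder is used instead of a project folder.

```r
# Base-R sketch: save a one-line assignment to an .R script and run it later
# (the object name dataA_code and the file name are placeholders)
one_line  <- paste(trimws(deparse(head(iris, 5))), collapse = " ")
code_line <- paste("dataA_code <-", one_line)

# Write the script to a temporary file; in practice use your own project folder
script_path <- file.path(tempdir(), "dataA_code.R")
writeLines(code_line, script_path)
cat("Lines in script:", length(readLines(script_path)), "\n")

# Running the script in a fresh environment recreates the data frame
env <- new.env()
sys.source(script_path, envir = env)
print(env$dataA_code)
```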



However, there is one limitation: saving a large dataset as code is impractical. Let’s import a large dataset to explore this issue.

if(!require(readr)) install.packages("readr")
library(readr)

github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/wheat_grains_data_training.csv"
dataB=data.frame(read_csv(url(github),show_col_types= FALSE))

print(tail(dataB,5))
      Field Genotype Block fungicide planting_date fertilizer   Shoot Length.mm. Width.mm.
96315 South    Peele   III        No         early        N/A Tillers      5.951     2.987
96316 South    Peele   III        No         early        N/A Tillers      5.614     2.687
96317 South    Peele   III        No         early        N/A Tillers      5.674     2.210
96318 South    Peele    II        No          late        N/A Tillers      6.041     2.138
96319 South    Peele    II        No          late        N/A Tillers      6.041     2.138
      Area.mm2.
96315    13.687
96316    11.058
96317     9.154
96318    18.092
96319    18.092

This dataset contains 96,319 rows, so when it is converted into code, the result is too large to display comfortably in an R script.

Let’s try it!

datazip(dataB)

The generated code is too long to copy and paste, and it is difficult to scroll through in the R console. There is no practical benefit to storing such a large block of code in your R script. In this case, a better approach is to export the code to an .r file instead.
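To get a feel for how long such code becomes, you can measure the number of characters in a one-line version of a larger table. The sketch below builds a 30,000-row stand-in by stacking copies of the built-in iris dataset (an assumption for illustration; it is not the wheat grain data) and counts the characters produced with base R's deparse().

```r
# Stand-in for a large dataset: 200 stacked copies of iris (not the wheat data)
big <- iris[rep(seq_len(nrow(iris)), times = 200), ]
rownames(big) <- NULL  # reset to automatic row names so they deparse compactly

# Length of the one-line code for the whole data frame
code_chars <- nchar(paste(trimws(deparse(big)), collapse = " "))
cat("rows:", nrow(big), "| characters of code:", code_chars, "\n")
```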

To export the code to a file, set the output argument to a file name ending in .r, for example output="data_name.r". Below, I exported the code for dataB to my PC ("C:/Users/Desktop") as an R file named dataB_output.r.

setwd("C:/Users/Desktop") # set the working directory where the file will be saved
datazip(dataB, output="dataB_output.r")

Once the file has been saved to your PC, you might see the following message when you open it:

This code is too large to open in the source editor.

However, dataB_output.r is still saved as code.

Later, if you want to restore this code as a data table, you can use the dataunzip() function, which I developed as a counterpart to datazip().

setwd("C:/Users/Desktop") # set the working directory that contains the file

dataB_recovered= dataunzip("dataB_output.r")
print(tail(dataB_recovered, 5))

      Field Genotype Block fungicide planting_date fertilizer
96315 South    Peele   III        No         early        N/A
96316 South    Peele   III        No         early        N/A
96317 South    Peele   III        No         early        N/A
96318 South    Peele    II        No          late        N/A
96319 South    Peele    II        No          late        N/A
        Shoot Length.mm. Width.mm. Area.mm2.
96315 Tillers      5.951     2.987    13.687
96316 Tillers      5.614     2.687    11.058
96317 Tillers      5.674     2.210     9.154
96318 Tillers      6.041     2.138    18.092
96319 Tillers      6.041     2.138    18.092
.
.
.

Now the code has been restored as a data table! From now on, you can save your data as code instead of as an Excel file, which might be modified when it is opened or moved.
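For comparison, base R can also write a data frame to a file as code with dput(file = ...) and read it back with dget(). This is a base-R alternative, not a description of what datazip() and dataunzip() do internally; the sketch uses iris and a placeholder file in tempdir() instead of calling setwd().

```r
# Placeholder file name in the session's temporary folder (no setwd() needed)
code_file <- file.path(tempdir(), "iris_output.r")

# dput() writes the object's R code to the file instead of the console
dput(iris, file = code_file)

# dget() reads the file back and evaluates the code
iris_back <- dget(code_file)
print(tail(iris_back, 3))
```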

If you want to save it in .rds format (not as code, but as a single R object), you can use the code below.

# to save data as .rds
datazip(dataB, output="dataB_output.rds")

# to import it into R as a data frame
dataB_recovered= dataunzip("dataB_output.rds")

print(tail(dataB_recovered, 5))
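In base R, the .rds format corresponds to saveRDS() and readRDS(), which store a serialized (and, by default, compressed) object rather than source code. The sketch below, using iris and placeholder file names in tempdir(), saves the same data both ways and compares the file sizes; the exact numbers depend on the data.

```r
# Placeholder file names in the session's temporary folder
rds_file  <- file.path(tempdir(), "iris_output.rds")
code_file <- file.path(tempdir(), "iris_output_code.r")

saveRDS(iris, rds_file)       # serialized binary, compressed by default
dput(iris, file = code_file)  # plain-text R code

# On-disk size in bytes of each format (exact numbers depend on the data)
sizes <- c(rds = file.size(rds_file), code = file.size(code_file))
print(sizes)

# readRDS() returns exactly the object that was saved
iris_rds <- readRDS(rds_file)
print(identical(iris_rds, iris))  # TRUE
```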

Together, datazip() and dataunzip() make it simple to convert data into a single line of code, save it as a script, and restore it as a data frame later.

GitHub: https://github.com/agronomy4future/datazip

We aim to develop open-source code for agronomy.

© 2022 – 2025 https://agronomy4future.com – All Rights Reserved.

Last Updated: 06/03/2025

Your donation will help us create high-quality content.
PayPal @agronomy4furure / Venmo @agronomy4furure / Zelle @agronomy4furure
