I am working with large data sets and often switch between my workstation and laptop. Saving a workspace image to .RData is the most natural and convenient approach for me, so this is the file I want to synchronize between the two computers.
Unfortunately, it tends to be rather big (a few GB), so efficient synchronization requires either connecting my laptop with a cable or moving the file on a USB stick. If I forget to synchronize my laptop while I am next to my workstation, it takes me hours to get everything back in sync.
The largest objects, however, change relatively rarely (although I constantly work with them). I could save them to another file, and then delete them before saving the session and load them after restoring the session. This would work, but would be extremely annoying. Also, I would have to remember to save them whenever they are modified. It would soon end up being a total mess.
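For concreteness, the workflow I have in mind would look roughly like this (the object names are just placeholders for my large data sets):

# save the rarely-changing large objects to their own file
save(big.matrix1, big.table2, file = "large_objects.RData")

# drop them before saving the rest of the session
rm(big.matrix1, big.table2)
save.image("session.RData")

# ...and on the other machine, restore both pieces
load("session.RData")
load("large_objects.RData")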
Is there a more efficient way of dealing with such large data chunks?
For example, my problem would be solved if there were an alternative format to .RData -- one in which .RData is a directory, and the files in that directory are the objects to be loaded.
You can use saveRDS:
# capture the names of all objects in the current session
objs.names <- ls()
objs <- mget(objs.names)

# save each object to its own .rds file in "mydatafolder"
invisible(
  lapply(
    seq_along(objs),
    function(x) saveRDS(objs[[x]], paste0("mydatafolder/", objs.names[[x]], ".rds"))
  )
)
This will save every object in your session to the "mydatafolder" folder as a separate file (make sure to create the folder beforehand).
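To restore the session on the other machine, you can read the files back and assign the objects into the global environment, something along these lines (assuming the same "mydatafolder" layout):

# read every .rds file in the folder back into the global environment
files <- list.files("mydatafolder", pattern = "\\.rds$", full.names = TRUE)
invisible(
  lapply(files, function(f) {
    obj.name <- tools::file_path_sans_ext(basename(f))
    assign(obj.name, readRDS(f), envir = .GlobalEnv)
  })
)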
Unfortunately, this rewrites every file and so updates all their timestamps, so you can't rely on rsync alone to skip the unchanged ones. You could first read the objects back in with readRDS, check which ones have changed with identical, and run the lapply above only on the changed objects; then a tool like rsync will transfer only what actually changed.
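A rough sketch of that idea, assuming "mydatafolder" already contains an earlier snapshot, could look like this:

# re-save only the objects whose current value differs from the copy on disk
objs.names <- ls()
invisible(
  lapply(objs.names, function(nm) {
    path <- paste0("mydatafolder/", nm, ".rds")
    if (!file.exists(path) || !identical(get(nm), readRDS(path))) {
      saveRDS(get(nm), path)
    }
  })
)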