I downloaded the Gwern Branwen dataset here: https://www.gwern.net/DNM-archives
I'm trying to read the dataset in R and I'm having a lot of trouble. I tried to open one of the files in the dataset called "1776.tar.xz" and I think I "unzipped" it with untar() but I'm not getting anything past that.
untar("C:/User/user/Downloads/dnmarchives/1776.tar.xz",
files = NULL,
list = FALSE, exdir = ".",
compressed = "xz", extras = NULL, verbose = FALSE, restore_times = TRUE,
tar = Sys.getenv("TAR"))
Edit: Thanks for all of the comments so far! The code is in base R. I have multiple datasets that I downloaded from Gwern's website. I'm just trying to open one to explore.
Base R includes function untar. On my Ubuntu 19.10 running R 3.6.2, default installation, the following was enough.
fls <- list.files(pattern = "\\.xz")
untar(fls[1], verbose = TRUE)
Note.
In the question, "dataset" is singular but there were several datasets (plural) on that website. To download the files I used
args <- "--verbose rsync://78.46.86.149:873/dnmarchives/grams.tar.xz rsync://78.46.86.149:873/dnmarchives/grams-20150714-20160417.tar.xz ./"
cmd <- "rsync"
od <- getwd()
setwd('~/tmp')
system2(cmd, args)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With