 

How do I read a .tar.xz file?

Tags: r, tar, xz

I downloaded the Gwern Branwen dataset from here: https://www.gwern.net/DNM-archives

I'm trying to read the dataset in R and I'm having a lot of trouble. I tried to open one of the files in the dataset, "1776.tar.xz". I think I "unzipped" it with untar(), but I can't get any further than that:

untar("C:/User/user/Downloads/dnmarchives/1776.tar.xz",
  files = NULL,
  list = FALSE, exdir = ".",
  compressed = "xz", extras = NULL, verbose = FALSE, restore_times = TRUE,
  tar = Sys.getenv("TAR"))

Edit: Thanks for all the comments so far! The code uses base R. I downloaded several datasets from Gwern's website; I'm just trying to open one of them to explore it.

asked Nov 20 '25 at 15:11 by bob


1 Answer

Base R includes the function untar(). On Ubuntu 19.10 with a default installation of R 3.6.2, the following was enough:

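# List the .tar.xz archives in the working directory
fls <- list.files(pattern = "\\.tar\\.xz$")
# Extract the first one into the working directory (verbose = TRUE prints diagnostic output)
untar(fls[1], verbose = TRUE)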
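Note that untar() only writes files to disk: by default it extracts into exdir = ".", the current working directory (see getwd()), and nothing is loaded into R. Reading the data is a separate step. The sketch below illustrates the full round trip on a tiny throwaway archive so it runs anywhere; the file name "items.csv" and its contents are hypothetical stand-ins, not the actual contents of the DNM archives. It passes tar = "internal" so it does not depend on an external tar program (useful on Windows). For the real data, replace archive with the path to your .tar.xz file and skip the demo-building lines.

```r
# Self-contained demo: build a tiny throwaway .tar.xz (hypothetical file
# "items.csv") so the steps can run without the real DNM archives.
src <- file.path(tempdir(), "demo_src")
dir.create(src, recursive = TRUE, showWarnings = FALSE)
writeLines(c("item,price", "a,1", "b,2"), file.path(src, "items.csv"))

archive <- file.path(tempdir(), "demo.tar.xz")
old <- setwd(src)                      # store paths relative to src
tar(archive, files = ".", compression = "xz", tar = "internal")
setwd(old)                             # restore the previous working directory

# 1. Look inside the archive without extracting anything
contents <- untar(archive, list = TRUE, tar = "internal")
print(contents)

# 2. Extract into a dedicated folder rather than the working directory
out <- file.path(tempdir(), "demo_out")
dir.create(out, showWarnings = FALSE)
untar(archive, exdir = out, tar = "internal")

# 3. Find what was written to disk, then read one of the files into R
extracted <- list.files(out, recursive = TRUE, full.names = TRUE)
print(extracted)
items <- read.csv(file.path(out, "items.csv"))
print(items)
```

Listing first with list = TRUE shows which files an archive holds before you commit disk space to extracting it; which reader to use afterwards (read.csv(), readLines(), etc.) depends on the file types you find.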

Note: the question says "dataset" (singular), but the website hosts several datasets. To download two of the files, I ran rsync from R:

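# Download two of the archives with rsync into ~/tmp
args <- c("--verbose",
          "rsync://78.46.86.149:873/dnmarchives/grams.tar.xz",
          "rsync://78.46.86.149:873/dnmarchives/grams-20150714-20160417.tar.xz",
          "./")
cmd <- "rsync"

od <- getwd()       # remember the current working directory
setwd('~/tmp')      # download into ~/tmp

system2(cmd, args)
setwd(od)           # restore the original working directory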
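A variation on the rsync call, sketched under the assumption that rsync is installed and on the PATH (it usually is on Linux; on Windows it typically is not). The helper name rsync_download() and its dest argument are hypothetical, not part of the answer. Instead of changing the working directory, it passes the destination to rsync directly, checks the exit status returned by system2(), and returns the .tar.xz files found in the destination folder.

```r
# Hypothetical helper (not from the answer): download archives with rsync,
# failing loudly if rsync is missing or exits with an error.
rsync_download <- function(urls, dest = "~/tmp") {
  dest <- path.expand(dest)
  rsync <- Sys.which("rsync")          # "" if rsync is not on the PATH
  if (!nzchar(rsync)) stop("rsync was not found on the PATH")
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  # system2() pastes the arguments into one command line, so quote each one
  status <- system2(rsync, c("--verbose", shQuote(urls), shQuote(dest)))
  if (status != 0) stop("rsync exited with status ", status)
  # Return the .tar.xz files now present in the destination folder
  invisible(list.files(dest, pattern = "\\.tar\\.xz$", full.names = TRUE))
}
```

For example, rsync_download(c("rsync://78.46.86.149:873/dnmarchives/grams.tar.xz", "rsync://78.46.86.149:873/dnmarchives/grams-20150714-20160417.tar.xz")) would attempt to fetch the same two archives into ~/tmp, and the returned paths can then be passed to untar().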
answered Nov 23 '25 at 05:11 by Rui Barradas