Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

combine two columns while using read_table

Tags:

r

readr

I have many files with the format

datetime variable value
Date Time name xxx.y

e.g.

datetime variable value
2011-01-01 01:00:00 precipitation 10.0
2011-01-01 02:00:00 precipitation 5.0
2011-01-01 03:00:00 precipitation 0.0

I am using readr::read_table to read them.

However the problem is that the header has three columns but the data has four.

For instance, reading the data

ex_txt="datetime variable value
2011-01-01 01:00:00 precipitation 10.0
2011-01-01 02:00:00 precipitation 5.0
2011-01-01 03:00:00 precipitation 0.0"

with

read_table(ex_txt)

returns get an error since read_table finds three labels in the first row and then data rows have four columns.

There is an obvious work-around to skip the first row, and then use a sequence of pipes to get the data in the correct form.

For instance the following solves my problem

read_table(ex_txt,skip=1,col_names = c("date","time","variable","value")) %>%
mutate(datetime=lubridate::ydm_hms(paste(date,time))) %>% 
select(datetime,variable,value)

However, as I don't like the skip=1 part, I am curious to know if anyone has a way to make read_table read the three headers and then combine the first two columns (Date & Time) into one DateTime column. I have been experimenting with col_types but to no avail.

Thanks

Halldór

like image 747
stuttungr Avatar asked Oct 20 '25 19:10

stuttungr


1 Answers

An option using readr::read_fwf, since the datetime at the beginning should always be the same length, ending at position 19. Position 21 should then be the beginning of the rest of the line.

library(readr)
library(tidyr)

read_fwf(ex_txt, fwf_cols(datetime = 19, X = 21), skip = 1) %>% 
  separate_wider_delim(X, " ", names = c("variable", "value"))
# A tibble: 3 × 3
  datetime            variable      value
  <dttm>              <chr>         <chr>
1 2011-01-01 01:00:00 precipitation 10.0 
2 2011-01-01 02:00:00 precipitation 5.0  
3 2011-01-01 03:00:00 precipitation 0.0

With base R try surrounding the space delimited text with '

ex_txt <- "datetime variable value
'2011-01-01 01:00:00' precipitation 10.0
'2011-01-01 02:00:00' precipitation 5.0
'2011-01-01 03:00:00' precipitation 0.0"

and reading with argument text

read.table(text = ex_txt, header = T)
             datetime      variable value
1 2011-01-01 01:00:00 precipitation    10
2 2011-01-01 02:00:00 precipitation     5
3 2011-01-01 03:00:00 precipitation     0

Note that you still have to do things like as.POSIXct for column datetime

In case the text can't be manipulated easily by hand try gsub, e.g.

ex_txt <- "datetime variable value
2011-01-01 01:00:00 precipitation 10.0
2011-01-01 02:00:00 precipitation 5.0
2011-01-01 03:00:00 precipitation 0.0"
read.table(text = gsub(":(\\d{2}) ", ":\\1' ", 
                    gsub("\n20", "\n'20", ex_txt)), header = T)
             datetime      variable value
1 2011-01-01 01:00:00 precipitation    10
2 2011-01-01 02:00:00 precipitation     5
3 2011-01-01 03:00:00 precipitation     0
like image 148
Andre Wildberg Avatar answered Oct 22 '25 08:10

Andre Wildberg



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!