Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to convert list (with multi elements) into string without turn to "c("xxx","xxx","xxx")" R

library(data.table)

# Target string to convert

DATE_DATA <- c("2015-01-02;2015-01-07;2021-05-02;2019-02-05",
"2017-08-02;2000-01-22;2003-03-07;2017-10-09",
"2013-08-02;2022-06-02;2012-03-15")

# Dataset
DT <- data.table(NAME = c("JOE","MARY","PAUL"),DATE = c(DATE_DATA))

Expected result -- convert DATE column in a new column call "period" like below: split + sorted decreasing = F + unique year

#  period
1: 2015,2019,2021
2: 2000,2003,2017
3: 2012,2013,2022

The approaches like below i have don't meet excepted result

# 1st approach -- RESULT : created column with class -- "list"

DT[,period:= lapply(strsplit(DT$DATE,";"),
                                 function(x) sort(unique(str_sub(x,1,4)),
                                                  decreasing = FALSE))]

# 2nd approach -- RESULT : created column with class -- "character" but value
#                          turn to "c("xxx", "xxx", "xxx")" , not expected 
#                          "xxx,xxx,xxx"

DT[,period:= as.character(paste(lapply(strsplit(DT$DATE,";"),
                             function(x) sort(unique(str_sub(x,1,4)),
                                              decreasing = FALSE)),collapse = ","))]

What step did i missed ? Thanks in advance

like image 525
rane Avatar asked Jan 19 '26 03:01

rane


2 Answers

For each DATE we can split the DATE column on ";", convert them into Date, extract the year using format, take the unique years and paste them together using toString.

DT$Period <- sapply(DT$DATE, function(x) 
         toString(sort(unique(format(as.Date(strsplit(x, ";")[[1]]), "%Y")))))
DT

#   NAME                                        DATE           Period
#1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
#2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
#3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012, 2013, 2022

We can reduce the as.Date and format step using the year function from the lubridate package which gives the same output.

library(lubridate)
DT$Period <- sapply(DT$DATE, function(x) 
                   toString(sort(unique(year(strsplit(x, ";")[[1]])))))

I am not a data.table expert but I think the thing which you are missing from your attempt is the grouping (by) parameter because currently it is giving you unique year from the entire DATE column, you need to specify that you need unique year for each row separately which is mentioned in the by argument.

DT[,period:= paste(sapply(strsplit(DATE,";"),
  function(x) sort(unique(substr(x,1,4)),)),collapse = ","), by = 1:nrow(DT)]

DT

#   NAME                                        DATE         period
#1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015,2019,2021
#2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000,2003,2017
#3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012,2013,2022
like image 141
Ronak Shah Avatar answered Jan 20 '26 15:01

Ronak Shah


We can do this using gsub and scan

DT[,  Period := toString(sort(unique(scan(text=gsub("-\\d+", 
               "", DATE), what = numeric(), sep=";")))), NAME]
DT
#   NAME                                        DATE           Period
#1:  JOE 2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
#2: MARY 2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
#3: PAUL            2013-08-02;2022-06-02;2012-03-15 2012, 2013, 2022

Or another option is tidyverse, where we reshape into 'long' format by splitting the 'DATE' at ;, grouped by 'NAME', summarise the 'Period' as the sorted year of the converted Date class (ymd), do a join with the original dataset and select the columns to the appropriate order (if needed)

library(tidyverse)
DT %>% 
   separate_rows(DATE, sep = ";") %>% 
   group_by(NAME) %>% 
   summarise(Period = toString(sort(unique(year(ymd(DATE)))))) %>% 
   right_join(DT) %>%
   select(names(DT), everything())
# A tibble: 3 x 3
#  NAME  DATE                                        Period                
#  <chr> <chr>                                       <chr>                 
#1 JOE   2015-01-02;2015-01-07;2021-05-02;2019-02-05 2015, 2019, 2021
#2 MARY  2017-08-02;2000-01-22;2003-03-07;2017-10-09 2000, 2003, 2017
#3 PAUL  2013-08-02;2022-06-02;2012-03-15            2012, 2013, 2022    
like image 40
akrun Avatar answered Jan 20 '26 15:01

akrun