I have a csv file that contains a numpy-style array in one column. When reading the csv file, the resulting column is of type character, since the array is wrapped in a string. I'd like to parse it into a separate dataframe to analyse the data.
As csv:
first_column,second_column
a,"[[1,2],[3,4]]"
b,"[[5,6],[7,8]]"
c,"[[9,10],[11,12]]"
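For reference, the csv above can be read in as-is (assuming it is saved as `data.csv` — a hypothetical filename), which shows why the column arrives as character:

```r
# Read the csv; the quoted bracketed arrays come in as plain character strings
df <- read.csv("data.csv", stringsAsFactors = FALSE)
str(df)  # second_column has class "character"
```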
As dataframe:
df <- data.frame(first_column = c("a","b","c"),
                 second_column = c("[[1,2],[3,4]]","[[5,6],[7,8]]","[[9,10],[11,12]]"))
Since I am unaware of any direct parsing function that could extract arrays from strings, I have started with string manipulation.
Remove the outer [] characters:
> df %>% mutate(second_column = str_replace_all(second_column, c("^\\[" = "","]$" = "")))
first_column second_column
1 a [1,2],[3,4]
2 b [5,6],[7,8]
3 c [9,10],[11,12]
However, from now on I don't know how to proceed.
The resulting dataframe should look like this in the end:
col_1 col_2
1 1 2
2 3 4
3 5 6
4 7 8
5 9 10
6 11 12
Note that there are more columns and more rows in the real dataframe.
Replace occurrences of ],[ with a newline, replace the remaining square brackets with spaces, and use read.table to read the result. (The _ placeholder in the native pipe requires R ≥ 4.2.)
df$second_column |>
  gsub("\\],\\[", "\n", x = _) |>   # "],[" marks a row boundary: turn it into a newline
  chartr("[]", " ", x = _) |>       # blank out the remaining brackets
  read.table(text = _, sep = ",")   # parse the comma-separated values
giving:
V1 V2
1 1 2
2 3 4
3 5 6
4 7 8
5 9 10
6 11 12
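To get the col_1/col_2 headers from the desired output rather than read.table's default V1/V2, the names can be passed through read.table's col.names argument — a small variation on the code above (the paste0 call is one way to scale this to the wider real dataframe):

```r
df <- data.frame(first_column = c("a","b","c"),
                 second_column = c("[[1,2],[3,4]]","[[5,6],[7,8]]","[[9,10],[11,12]]"))

parsed <- df$second_column |>
  gsub("\\],\\[", "\n", x = _) |>
  chartr("[]", " ", x = _) |>
  read.table(text = _, sep = ",",
             col.names = paste0("col_", 1:2))  # col_1, col_2, ...
parsed
```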