How to parse array of arrays from string in column of csv/dataframe

I have a CSV file in which one column holds a nested NumPy array serialized as a string. When I read the file in R, that column comes in as type character, since each array is wrapped in quotes. I'd like to parse it into a separate data frame to analyse the data.

Input data

As csv:

first_column,second_column
a,"[[1,2],[3,4]]"
b,"[[5,6],[7,8]]"
c,"[[9,10],[11,12]]"

As dataframe:

df <- data.frame(first_column  = c("a","b","c"),
                 second_column = c("[[1,2],[3,4]]","[[5,6],[7,8]]","[[9,10],[11,12]]"))

What I have tried

Since I am unaware of any direct parsing function that could extract arrays from strings, I have started out doing string manipulation.

Remove the outer [] characters:

> library(dplyr)
> library(stringr)
> df %>% mutate(second_column = str_replace_all(second_column, c("^\\[" = "","]$" = "")))
  first_column  second_column
1            a    [1,2],[3,4]
2            b    [5,6],[7,8]
3            c [9,10],[11,12]

However, I don't know how to proceed from here.

Expected output

The resulting dataframe should look like this in the end:

  col_1 col_2
1     1     2
2     3     4
3     5     6
4     7     8
5     9    10
6    11    12

Note that the real dataframe has more columns and more rows.

asked Dec 08 '25 06:12 by 94621



1 Answer

Replace each occurrence of ],[ with a newline, replace the remaining square brackets with spaces, and use read.table to read the result.

df$second_column |>
  gsub("\\],\\[", "\n", x = _) |>
  chartr("[]", "  ", x = _) |>
  read.table(text = _, sep = ",")

giving:

  V1 V2
1  1  2
2  3  4
3  5  6
4  7  8
5  9 10
6 11 12
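
The `_` placeholder used with the native pipe requires R 4.2 or later. As a rough sketch of the same idea without the pipe, the steps could be wrapped in a small helper. The function name parse_nested_arrays is hypothetical and not part of any package. The sketch assumes every inner array has the same number of elements. It also renames the columns to col_1, col_2, ... as in the expected output, and labels each parsed row with the first_column value it came from.

```r
# Hypothetical helper (not from any package): parse strings such as
# "[[1,2],[3,4]]" into a data frame with one row per inner array.
parse_nested_arrays <- function(x) {
  # Put each inner array on its own line.
  txt <- gsub("],[", "\n", x, fixed = TRUE)
  # Remove the remaining square brackets.
  txt <- gsub("[", "", txt, fixed = TRUE)
  txt <- gsub("]", "", txt, fixed = TRUE)
  # Read all lines as comma-separated values.
  out <- read.table(text = paste(txt, collapse = "\n"), sep = ",")
  # Rename V1, V2, ... to col_1, col_2, ...
  names(out) <- paste0("col_", seq_along(out))
  out
}

df <- data.frame(
  first_column  = c("a", "b", "c"),
  second_column = c("[[1,2],[3,4]]", "[[5,6],[7,8]]", "[[9,10],[11,12]]")
)

result <- parse_nested_arrays(df$second_column)

# Count the inner arrays in each input string (separators + 1) so that
# every parsed row can be labelled with its originating first_column.
n_inner <- lengths(regmatches(df$second_column,
                              gregexpr("],[", df$second_column, fixed = TRUE))) + 1
result$first_column <- rep(df$first_column, times = n_inner)

print(result)
```

Because the column names are generated from the number of parsed columns, the same helper should also cope with wider inner arrays, as long as they all have the same length.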
answered Dec 09 '25 19:12 by G. Grothendieck



