I have a csv file that contains a numpy-style array in one column. When reading the csv file, the resulting column is of type character, since the array is wrapped in a string. I'd like to parse it into a separate dataframe to analyse the data.
As csv:
first_column,second_column
a,"[[1,2],[3,4]]"
b,"[[5,6],[7,8]]"
c,"[[9,10],[11,12]]"
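For reference, the csv above can be read in as-is (assuming it is saved as `data.csv` — a hypothetical filename), which shows why the column arrives as character:

```r
# Read the csv; the quoted bracketed arrays come in as plain character strings
df <- read.csv("data.csv", stringsAsFactors = FALSE)
str(df)  # second_column has class "character"
```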
As dataframe:
df <- data.frame(first_column = c("a","b","c"),
                 second_column = c("[[1,2],[3,4]]","[[5,6],[7,8]]","[[9,10],[11,12]]"))
Since I am unaware of any direct parsing function that could extract arrays from strings, I have started with string manipulation.
Remove the outer [] characters:
> df %>% mutate(second_column = str_replace_all(second_column, c("^\\[" = "","]$" = "")))
first_column second_column
1 a [1,2],[3,4]
2 b [5,6],[7,8]
3 c [9,10],[11,12]
However, from now on I don't know how to proceed.
The resulting dataframe should look like this in the end:
col_1 col_2
1 1 2
2 3 4
3 5 6
4 7 8
5 9 10
6 11 12
Note that there are more columns and more rows in the real dataframe.
Replace occurrences of ],[ with a newline, replace the remaining square brackets with spaces, and use read.table to read the result. (The _ placeholder in the native pipe requires R ≥ 4.2.)
df$second_column |>
  gsub("\\],\\[", "\n", x = _) |>   # "],[" marks a row boundary: turn it into a newline
  chartr("[]", " ", x = _) |>       # blank out the remaining brackets
  read.table(text = _, sep = ",")   # parse the comma-separated values
giving:
V1 V2
1 1 2
2 3 4
3 5 6
4 7 8
5 9 10
6 11 12
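To get the col_1/col_2 headers from the desired output rather than read.table's default V1/V2, the names can be passed through read.table's col.names argument — a small variation on the code above (the paste0 call is one way to scale this to the wider real dataframe):

```r
df <- data.frame(first_column = c("a","b","c"),
                 second_column = c("[[1,2],[3,4]]","[[5,6],[7,8]]","[[9,10],[11,12]]"))

parsed <- df$second_column |>
  gsub("\\],\\[", "\n", x = _) |>
  chartr("[]", " ", x = _) |>
  read.table(text = _, sep = ",",
             col.names = paste0("col_", 1:2))  # col_1, col_2, ...
parsed
```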