I am fairly new to R and I have some data that I need to tidy before I can use it. I have a data frame with a number of rows and columns, and every cell holds a string of 20 space-separated ones and zeros (for example "0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0").
I am trying to split each cell so that every number ends up in its own column (one cell becomes 20 columns), and then convert the resulting strings to numbers. Here is a small sample of the data; for this sample I would need the numbers separated into 40 columns and 3 rows:
df<-data.frame(
"V1" = c("0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ","0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ","1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "),
"V2" = c("0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ","0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ","0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "))
As you can see, the space is the natural delimiter for splitting each string, but I have not managed to make that work. I first tried df<-lapply(strsplit(df, " "), as.numeric), but strsplit does not accept a data frame. I then tried df<-lapply(strsplit(as.character(df), " "), as.numeric)
That does split the text, but converting the whole data frame to character first messes up the data.
I suppose this is easier than I think, but I am still inexperienced with R.
An easier option is read.table, which needs no extra packages:
read.table(text = as.character(df$V1), header = FALSE)
For multiple columns, use lapply, which returns a list with one data frame per original column:
lapply(df, function(x) read.table(text = as.character(x), header = FALSE))
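Since lapply returns a list, one more step is needed to get the single 3-row, 40-column data frame the question asks for. A minimal sketch, assuming the sample df from the question (the names pieces and out are only illustrative):

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the cells as
# plain character strings on older R versions too.
df <- data.frame(
  V1 = c("0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 ",
         "0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ",
         "1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "),
  V2 = c("0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 ",
         "0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 ",
         "0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 "),
  stringsAsFactors = FALSE
)

# One data frame of 20 integer columns per original column
pieces <- lapply(df, function(x) read.table(text = as.character(x), header = FALSE))

# Bind the pieces side by side; the new column names are prefixed with the
# name of the column they came from
out <- do.call(cbind, pieces)

dim(out)
```

read.table already parses the values as integers, so no separate as.numeric step should be needed here.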
You can use cSplit from splitstackshape to convert multiple columns into separate columns.
splitstackshape::cSplit(df, names(df), " ")
# V1_01 V1_02 V1_03 V1_04 V1_05 V1_06 V1_07 V1_08 V1_09 V1_10 V1_11
#1: 0 0 0 0 0 0 0 0 0 0 0
#2: 0 0 0 1 0 0 0 0 0 0 0
#3: 1 0 0 0 0 0 0 0 0 0 0
# V1_12 V1_13 V1_14 V1_15 V1_16 V1_17 V1_18 V1_19 V1_20 V2_01 V2_02
#1: 0 0 0 1 0 0 0 0 0 0 0
#2: 0 0 0 0 0 0 0 0 0 0 0
#3: 0 0 0 0 0 0 0 0 0 0 0
# V2_03 V2_04 V2_05 V2_06 V2_07 V2_08 V2_09 V2_10 V2_11 V2_12 V2_13
#1: 0 0 0 0 1 0 0 0 0 0 0
#2: 0 0 0 0 0 0 0 0 0 0 0
#3: 0 0 0 0 0 0 0 1 0 0 0
# V2_14 V2_15 V2_16 V2_17 V2_18 V2_19 V2_20
#1: 0 0 0 0 0 0 0
#2: 0 0 0 0 0 1 0
#3: 0 0 0 0 0 0 0
Note that I have used names(df) here since you want to split all of the columns. If you have additional columns and want to split only a few of them, you can pass their names explicitly:
splitstackshape::cSplit(df, c("V1", "V2"), " ")
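If installing splitstackshape is not an option, the same selective split can probably be done in base R with strsplit, applied one column at a time. This is only a sketch: the id column, the shortened three-number strings, and the names split_cols, parts and result are assumptions made for illustration, not part of the original data.

```r
# Shortened stand-in for the real data, plus a hypothetical id column that
# should be left untouched
df <- data.frame(
  id = 1:3,
  V1 = c("0 0 1 ", "0 1 0 ", "1 0 0 "),
  V2 = c("1 0 0 ", "0 0 1 ", "0 1 0 "),
  stringsAsFactors = FALSE
)

split_cols <- c("V1", "V2")

# strsplit() needs a character vector, so it is called on each column
# separately rather than on the data frame as a whole
parts <- lapply(df[split_cols], function(x) {
  pieces <- strsplit(trimws(x), " +")   # drop the trailing space, split on spaces
  # one row per original string, one numeric column per number
  as.data.frame(do.call(rbind, lapply(pieces, as.numeric)))
})

# Keep the untouched columns and append the split ones
result <- cbind(df[setdiff(names(df), split_cols)], do.call(cbind, parts))
```

This assumes every string in a column contains the same number of values; strings of uneven length would need padding before the rbind step.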