I am writing a function that translates a DNA sequence into binary code, encoding each base as a four-dimensional vector, e.g. "A" -> (1,0,0,0), "G" -> (0,1,0,0), ...
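A minimal sketch of the mapping described above, assuming the base order A, G, C, U (A and G are given in the question; C and U are assumed from the order the code below tests them in). Each base picks one row of a 4 x 4 identity matrix, and the rows are read one after another into a single vector. The helper name one_hot is hypothetical.

```r
# Assumed base order: A, G, C, U
bases <- c("A", "G", "C", "U")

# Hypothetical helper: one-hot encode one sequence into a flat numeric
# vector of length 4 * nchar(seq), one block of four per base
one_hot <- function(seq) {
  chars <- strsplit(seq, "", fixed = TRUE)[[1]]
  idx <- match(chars, bases)            # position of each base in `bases`
  rows <- diag(4)[idx, , drop = FALSE]  # row k of diag(4) is the k-th unit vector
  as.vector(t(rows))                    # read the rows one after another
}

print(one_hot("AG"))  # 1 0 0 0 0 1 0 0
```

A base outside the four letters gives NA from match(), so its block comes out as NA rather than silently being coded as another base.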
We also noticed that parentheses in the index expression inside a for loop change the result, and we would like to understand why. For example, 4-1:7-1 and (4-1):7-1 give completely different results.
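The difference comes from R's operator precedence: the sequence operator : binds more tightly than binary minus, so 4-1:7-1 is read as 4 - (1:7) - 1, while (4-1):7-1 is read as (3:7) - 1. A small sketch showing both, plus the common 1:n-1 pitfall (the variable n is just for illustration):

```r
# `:` binds more tightly than binary `-`
a <- 4-1:7-1      # parsed as 4 - (1:7) - 1
b <- (4-1):7-1    # parsed as (3:7) - 1
print(a)          # 2 1 0 -1 -2 -3 -4
print(b)          # 2 3 4 5 6

# Same rule, common pitfall: 1:n-1 is (1:n) - 1, not 1:(n-1)
n <- 5
print(1:n-1)      # 0 1 2 3 4
print(1:(n-1))    # 1 2 3 4
```

When in doubt, parenthesize both ends of the : explicitly, e.g. (4*i-3):(4*i).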
NC1 <- function(data){
  # Split each sequence into single characters, one row per sequence
  for(i in seq_along(data)){
    if(i==1){
      DCfirst <- strsplit(data[1], "", fixed = TRUE)[[1]]
      # Allocate the result once the sequence length is known
      DCsecond <- matrix("", nrow = length(data), ncol = length(DCfirst))
      DCsecond[1,] <- DCfirst
    }else{
      DCsecond[i,] <- strsplit(data[i], "", fixed = TRUE)[[1]]
    }
  }
  return(DCsecond)
}
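NC1 builds the character matrix row by row. A shorter sketch of the same step, assuming every sequence has the same length (rbind would otherwise recycle the shorter rows with a warning); the name split_to_matrix is hypothetical:

```r
# Hypothetical helper: one row per sequence, one column per base.
# Assumes all strings in `data` have equal length.
split_to_matrix <- function(data) {
  do.call(rbind, strsplit(data, "", fixed = TRUE))
}

TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
m <- split_to_matrix(TEST)
print(dim(m))   # 5 5
print(m[1, ])   # "A" "C" "G" "U" "C"
```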
binary <- function(data){
  sequence_X <- NC1(data)   # one row per sequence, one column per base
  N <- ncol(sequence_X)
  X2 <- matrix(NA, nrow = length(data), ncol = 4*N)
  for (i in 1:N){
    # Columns for position i. Write 4*i, not 4i (R reads 4i as the
    # complex number 0+4i), and parenthesize both ends of `:`
    cols <- (4*i-3):(4*i)
    L1 <- which(sequence_X[,i]=="A")
    L2 <- which(sequence_X[,i]=="G")
    L3 <- which(sequence_X[,i]=="C")
    L4 <- which(sequence_X[,i]=="U")
    # Each base needs its own vector
    for (j in L1){
      X2[j, cols] <- c(1,0,0,0)
    }
    for (j in L2){
      X2[j, cols] <- c(0,1,0,0)
    }
    for (j in L3){
      X2[j, cols] <- c(0,0,1,0)
    }
    for (j in L4){
      X2[j, cols] <- c(0,0,0,1)
    }
  }
  return(X2)
}
TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
binary(TEST)
The result I get is:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,] NA NA NA NA 1 0 0 0 1 0 0 0 1 0 0 0 1
[2,] NA NA NA NA 1 0 0 0 1 0 0 0 1 0 0 0 1
[3,] NA NA NA NA 1 0 0 0 1 0 0 0 1 0 0 0 1
[4,] NA NA NA NA 1 0 0 0 1 0 0 0 1 0 0 0 1
[5,] NA NA NA NA 1 0 0 0 1 0 0 0 1 0 0 0 1
[,18] [,19] [,20]
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 0 0 0
[5,] 0 0 0
I want every base of every sequence to be translated into its vector. As the output shows, columns 1 to 4 (the first base) stay NA, and all remaining columns repeat the pattern 1 0 0 0 regardless of the actual base.
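The output can be traced to the column index. In R, 4i is a complex literal (0+4i), not 4 times i, so the index never depends on the loop variable. The : operator coerces complex arguments to numeric, discarding the imaginary parts with a warning, and because : binds more tightly than binary minus, (4i-3):4i-1 evaluates to (-3:0) - 1, i.e. c(-4, -3, -2, -1). Negative indices exclude columns, so each assignment recycles c(1,0,0,0) across columns 5 to 20 and never touches columns 1 to 4. Separately, all four loops assign c(1,0,0,0), so even correct indices would give every base the A vector. This sketch reproduces the indexing effect on a single 1 x 20 row:

```r
# 4i is a complex literal in R, not 4 * i
print(is.complex(4i))   # TRUE

# Parsed as ((4i-3):4i) - 1; `:` drops the imaginary parts (with a
# warning, suppressed here), giving (-3:0) - 1
idx <- suppressWarnings((4i-3):4i-1)
print(idx)              # -4 -3 -2 -1

# Negative indices exclude columns 1..4, so c(1,0,0,0) is recycled
# over columns 5..20 and columns 1..4 remain NA
X2 <- matrix(NA, nrow = 1, ncol = 20)
X2[1, idx] <- c(1, 0, 0, 0)
print(X2)

# The intended block for position i uses explicit multiplication
i <- 1
fixed <- (4*i-3):(4*i)
print(fixed)            # 1 2 3 4
```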
This is the correct result I hope to achieve:

This is my first time asking a question here, so I apologize if I have not explained the problem clearly.
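Here is an option in base R using outer and ==. We split each string in 'TEST' into single characters with strsplit(TEST, ""), then compare every character with each of the four bases, which gives a list of logical matrices (one row per position, one column per base).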
TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
# Compare every character in x with every base in y
f1 <- function(x, y) outer(x, y, FUN = `==`)
lapply(strsplit(TEST, ""), f1, c("A", "G", "C", "U"))
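To get the single 0/1 row per sequence that the question asks for, each logical matrix can be read row by row and the results stacked. A sketch, assuming all sequences have the same length so every flattened row has 20 entries:

```r
TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
f1 <- function(x, y) outer(x, y, FUN = `==`)
mats <- lapply(strsplit(TEST, ""), f1, c("A", "G", "C", "U"))

# Each element of `mats` is a 5 x 4 logical matrix (positions in rows).
# t() puts one position per column so as.vector() reads the blocks in
# order; multiplying by 1 turns TRUE/FALSE into 1/0.
flat <- t(sapply(mats, function(m) as.vector(t(m)) * 1))
print(dim(flat))       # 5 20
print(flat[1, 1:8])    # 1 0 0 0 0 0 1 0  (A, then C)
```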
Alternatively, I would do this with lapply:
TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
# Split one sequence into single characters
vecDNA <- function(x){unlist(strsplit(x = x, split = ""))}
# One 0/1 indicator column per base
binDNA <- function(x){
  data.frame(
    code=x,
    G=as.numeric(x=="G"),
    C=as.numeric(x=="C"),
    A=as.numeric(x=="A"),
    U=as.numeric(x=="U")
  )
}
T2 <- lapply(TEST, vecDNA)
T3 <- lapply(T2, binDNA)
T3
[[1]]
code G C A U
1 A 0 0 1 0
2 C 0 1 0 0
3 G 1 0 0 0
4 U 0 0 0 1
5 C 0 1 0 0
[[2]]
code G C A U
1 A 0 0 1 0
2 C 0 1 0 0
3 U 0 0 0 1
4 A 0 0 1 0
5 U 0 0 0 1
[[3]]
code G C A U
1 U 0 0 0 1
2 C 0 1 0 0
3 G 1 0 0 0
4 U 0 0 0 1
5 A 0 0 1 0
[[4]]
code G C A U
1 C 0 1 0 0
2 G 1 0 0 0
3 U 0 0 0 1
4 C 0 1 0 0
5 G 1 0 0 0
[[5]]
code G C A U
1 U 0 0 0 1
2 A 0 0 1 0
3 G 1 0 0 0
4 U 0 0 0 1
5 G 1 0 0 0
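If one row per sequence is needed, the data frames can be reduced to their indicator columns, reordered to A, G, C, U to match the encoding in the question, and flattened. A self-contained sketch; the helper name flatten is hypothetical:

```r
TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
vecDNA <- function(x){unlist(strsplit(x = x, split = ""))}
binDNA <- function(x){
  data.frame(code = x,
             G = as.numeric(x == "G"), C = as.numeric(x == "C"),
             A = as.numeric(x == "A"), U = as.numeric(x == "U"))
}
T3 <- lapply(lapply(TEST, vecDNA), binDNA)

# Hypothetical helper: keep the indicator columns in A, G, C, U order
# (dropping `code`) and read the rows one after another
flatten <- function(df) as.vector(t(as.matrix(df[, c("A", "G", "C", "U")])))
X <- do.call(rbind, lapply(T3, flatten))
print(dim(X))   # 5 20
```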