A function that can translate DNA sequence to binary code

Question

I am designing a function that translates a DNA sequence into binary code, representing each base as a four-dimensional vector, e.g. "A" = (1,0,0,0), "G" = (0,1,0,0), ...
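Here is a minimal sketch of the encoding described above. It assumes the base order A, G, C, U that the question's code uses. It stores one row of a 4x4 identity matrix per base and looks up each character. The names onehot and encode are illustrative and are not part of the question.

```r
# One row of a 4x4 identity matrix per base, in the order A, G, C, U
onehot <- diag(4)
rownames(onehot) <- c("A", "G", "C", "U")

# Look up every character of one sequence and concatenate the rows
encode <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  as.vector(t(onehot[chars, , drop = FALSE]))
}

encode("AG")  # 1 0 0 0 0 1 0 0
```

A character that is not a row name, such as "T" or "N", makes the lookup fail with "subscript out of bounds". The lookup table therefore has to list every symbol that can occur.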

We also found that parentheses in the for loop change the result. For example, 4-1:7-1 and (4-1):7-1 give completely different values, and we would like to understand why.
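This short R illustration shows why the parentheses matter. In R's operator precedence (documented in ?Syntax), the colon operator binds more tightly than binary minus. As a result, 4-1:7-1 builds the sequence 1:7 first and only then does the subtraction.

```r
# `:` has higher precedence than binary `-`, so these parse differently
a    <- 4-1:7-1       # parsed as (4 - (1:7)) - 1
b    <- (4-1):7-1     # parsed as (3:7) - 1
both <- (4-1):(7-1)   # parsed as 3:6

print(a)     # 2 1 0 -1 -2 -3 -4
print(b)     # 2 3 4 5 6
print(both)  # 3 4 5 6
```

When arithmetic appears at both ends of a range, wrapping each end in parentheses, as in (4*i-3):(4*i), is the safe habit.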

NC1 <- function(data){ 
  for(i in 1:length(data) ){
    if(i==1){ 
      DCfirst <- unlist(as.vector(strsplit(data[1],"",fixed = TRUE)))
      DCsecond <- matrix(0,nrow = length(data),ncol = length(DCfirst))
      DCsecond[1,] <-  DCfirst 
    }else{
      DCsecond[i,] <- unlist(as.vector(strsplit(data[i],"",fixed = TRUE)))
    }
  }
  return(DCsecond)
}

binary<- function(data){
  sequence_X<-NC1(data)
  N=ncol(sequence_X)
  X2<-matrix(NA,nrow=length(data),ncol=4*N)
  for (i in 1 : N){
    L1<-which(sequence_X[,i]=="A")
    L2<-which(sequence_X[,i]=="G")
    L3<-which(sequence_X[,i]=="C")
    L4<-which(sequence_X[,i]=="U")
    for (j in L1){
      X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))
    }
    for (j in L2){
      X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))
    }
    for (j in L3){
      X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))
    }
    for (j in L4){
      X2[j, (4i-3):4i-1]<-unlist(c(1,0,0,0))
    }
  }
  return(X2)
}

TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
binary(TEST)

The result is shown below:

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
[1,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1
[2,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1
[3,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1
[4,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1
[5,]   NA   NA   NA   NA    1    0    0    0    1     0     0     0     1     0     0     0     1
     [,18] [,19] [,20]
[1,]     0     0     0
[2,]     0     0     0
[3,]     0     0     0
[4,]     0     0     0
[5,]     0     0     0

I want every sequence to be fully translated into vector format. However, as the result shows, the first four columns are NA and all remaining columns simply repeat (1,0,0,0), regardless of which base is at each position.
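The following sketch reproduces the output above outside the loop. It relies on two R facts. First, 4i is R's syntax for the complex constant 0+4i, not 4 times i. Second, according to ?Colon, the colon operator coerces complex arguments to numeric and discards the imaginary part with a warning. Together these mean that the column index in binary() is the same constant in every iteration.

```r
# `4i` is a complex literal (0+4i), so this index never depends on the loop
# variable. `:` drops the imaginary parts (R warns about this), and since `:`
# binds tighter than `-`, the expression evaluates as ((-3):0) - 1.
idx <- suppressWarnings((4i-3):4i-1)
print(idx)   # -4 -3 -2 -1

# Negative indices mean "everything except these", so the assignment fills
# columns 5..20, recycling c(1, 0, 0, 0) across them, and leaves 1..4 as NA.
x <- rep(NA_real_, 20)
x[idx] <- c(1, 0, 0, 0)
print(x)
```

There is a second problem as well. All four branches in binary() assign the same c(1,0,0,0). Even with a correct index such as (4*i-3):(4*i), every base would therefore still be encoded as "A".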

This is the correct answer I hope to achieve:

[Image of the expected output, not reproduced here]
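Below is a corrected sketch of binary() that keeps the question's A, G, C, U order. It makes three changes. It uses 4 * i instead of 4i. It parenthesizes both ends of the column range. It switches on a different column for each base. The name binary_fixed is hypothetical. The single do.call(rbind, strsplit(...)) line stands in for NC1() and assumes that all sequences have the same length.

```r
binary_fixed <- function(data) {
  # One row per sequence, one column per position (assumes equal lengths)
  seq_mat <- do.call(rbind, strsplit(data, "", fixed = TRUE))
  n <- ncol(seq_mat)
  bases <- c("A", "G", "C", "U")   # column order within each block of 4
  out <- matrix(0, nrow = length(data), ncol = 4 * n)
  for (i in seq_len(n)) {
    cols <- (4 * i - 3):(4 * i)    # the four columns for position i
    for (k in seq_along(bases)) {
      rows <- which(seq_mat[, i] == bases[k])
      out[rows, cols[k]] <- 1      # switch on only this base's column
    }
  }
  out
}

TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
res <- binary_fixed(TEST)
res[1, 1:8]   # A then C: 1 0 0 0 0 0 1 0
```

Because the matrix starts as zeros rather than NA, any base outside A, G, C, U leaves an all-zero block instead of raising an error.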

This is my first time asking a question here, so I apologize if I have not explained the problem clearly.

akrun · Accepted Answer

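Here is an option in base R using outer with ==. We split each string in 'TEST' into single characters with strsplit(TEST, ""), then compare every character with each of "A", "G", "C" and "U". This gives a list of logical matrices, one per sequence, with one row per position and one column per base.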

f1 <- function(x, y) outer(x, y, FUN = `==`)
lapply(strsplit(TEST, ""), f1, c("A", "G", "C", "U"))

data

TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
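Building on this answer, the sketch below turns the list of logical matrices into the single 0/1 matrix the question asks for: one row per sequence and four columns per position. The names res and flat are illustrative. Because sapply() only simplifies to a matrix when every flattened vector has the same length, the sketch assumes sequences of equal length.

```r
TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
f1 <- function(x, y) outer(x, y, FUN = `==`)
res <- lapply(strsplit(TEST, ""), f1, c("A", "G", "C", "U"))

# Each element is a (positions x 4) logical matrix. Transposing and flattening
# places the four indicators of each position side by side; `* 1` turns
# TRUE/FALSE into 1/0. sapply() returns one column per sequence, hence t().
flat <- t(sapply(res, function(m) as.vector(t(m)) * 1))
dim(flat)      # 5 20
flat[1, 1:8]   # 1 0 0 0 0 0 1 0
```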

Marc in the box · Answer

I would do this with an lapply-like operation.

Example:

TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")

# split a string into single characters (split = "" is the documented way)
vecDNA <- function(x){unlist(strsplit(x = x, split = ""))}
binDNA <- function(x){
  data.frame(
    code=x, 
    G=as.numeric(x=="G"), 
    C=as.numeric(x=="C"), 
    A=as.numeric(x=="A"), 
    U=as.numeric(x=="U")
  )
}

T2 <- lapply(as.list(TEST),vecDNA)
T3 <- lapply(T2, binDNA)
T3

Result:

> T3
[[1]]
  code G C A U
1    A 0 0 1 0
2    C 0 1 0 0
3    G 1 0 0 0
4    U 0 0 0 1
5    C 0 1 0 0

[[2]]
  code G C A U
1    A 0 0 1 0
2    C 0 1 0 0
3    U 0 0 0 1
4    A 0 0 1 0
5    U 0 0 0 1

[[3]]
  code G C A U
1    U 0 0 0 1
2    C 0 1 0 0
3    G 1 0 0 0
4    U 0 0 0 1
5    A 0 0 1 0

[[4]]
  code G C A U
1    C 0 1 0 0
2    G 1 0 0 0
3    U 0 0 0 1
4    C 0 1 0 0
5    G 1 0 0 0

[[5]]
  code G C A U
1    U 0 0 0 1
2    A 0 0 1 0
3    G 1 0 0 0
4    U 0 0 0 1
5    G 1 0 0 0
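If one table is more convenient than a list, the per-sequence data frames can be stacked. The following sketch tags each data frame with its sequence number and position, stacks them into one long data frame, and reorders the indicator columns to the question's A, G, C, U order. The names tagged and long, and the seq_id and pos columns, are illustrative additions.

```r
TEST <- c("ACGUC","ACUAU","UCGUA","CGUCG","UAGUG")
binDNA <- function(x){
  data.frame(
    code = x,
    G = as.numeric(x == "G"),
    C = as.numeric(x == "C"),
    A = as.numeric(x == "A"),
    U = as.numeric(x == "U")
  )
}
T3 <- lapply(strsplit(TEST, ""), binDNA)

# Add sequence number and position to each data frame, then stack them
tagged <- Map(function(df, k) cbind(seq_id = k, pos = seq_len(nrow(df)), df),
              T3, seq_along(T3))
long <- do.call(rbind, tagged)

# Put the indicator columns in the order A, G, C, U used in the question
long <- long[, c("seq_id", "pos", "code", "A", "G", "C", "U")]
head(long)
```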

Tags: r, bioinformatics

Davy
