 

R-thonic replacement for simple for loops containing a condition

Tags:

for-loop

r

I'm using R, and I'm a beginner. I have two large lists (30K elements each). One, called descriptions, holds elements that are (maybe) tokenized strings. The other, called probes, holds numbers. I need to make a dictionary that maps probes to something in descriptions, if that something is there. Here's how I'm going about this:

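probe2gene <- list()
for (i in seq_along(probes)) {
  # split the i-th description on the "//" separator
  strings <- strsplit(descriptions[i], '//')
  # keep the second field only if the split produced more than one piece
  if (length(strings[[1]]) > 1) {
    # use the probe as a character key; a numeric index would be positional
    probe2gene[[as.character(probes[i])]] <- strings[[1]][2]
  }
}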

This works, but it seems slow, much slower than the roughly equivalent Python:

probe2gene = {}
for p, d in zip(probes, descriptions):
    try:
        probe2gene[p] = d.split('//')[1]
    except IndexError:
        # no "//" separator in this description: skip the probe
        pass

My question: is there an "R-thonic" way of doing what I'm trying to do? The R manual entry on for loops suggests that such loops are rare. Is there a better solution?

Edit: a typical good "description" looks like this:

"NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421"

A bad "description" looks like this:

"-----"

though it can just as easily be some other unhelpful string. Each probe is simply a number. The probes and descriptions vectors are the same length and correspond element by element, i.e. probes[i] maps to descriptions[i].
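To see what the loop actually stores, here is a quick check on the sample "good" and "bad" descriptions above. This is a sketch only: the variable names good, bad, loose and tight are made up for illustration, and splitting on " // " (with spaces) is an assumption about the delimiter that holds for the sample shown but not necessarily for every row.

```r
# Sample descriptions copied from the question (variable names are hypothetical)
good <- "NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421"
bad <- "-----"

# Splitting on "//" alone keeps the spaces around each field
loose <- strsplit(good, "//", fixed = TRUE)[[1]][2]
print(loose)          # [1] " Rb1cc1 "

# Splitting on " // " (assumed delimiter) gives the gene symbol cleanly
tight <- strsplit(good, " // ", fixed = TRUE)[[1]][2]
print(tight)          # [1] "Rb1cc1"

# trimws() is an alternative if the spacing is not consistent
print(trimws(loose))  # [1] "Rb1cc1"

# A "bad" description has no separator, so there is no second field
print(strsplit(bad, "//", fixed = TRUE)[[1]][2])  # [1] NA
```

So the stored values in the loop above will carry leading and trailing spaces unless they are trimmed or the delimiter includes them.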

asked Dec 13 '25 12:12 by Mike Dewar


2 Answers

In R it's usually better to use the various apply-like functions rather than a loop. I think this solves your problem; the only drawback is that you have to use string keys.

> descriptions <- c("foo//bar", "")
> probes <- c(10, 20)
> probe2gene <- lapply(strsplit(descriptions, "//"), function (x) x[2])
> names(probe2gene) <- probes
> probe2gene <- probe2gene[!is.na(probe2gene)]
> probe2gene[["10"]]
[1] "bar"
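If a plain named character vector is enough, the same one-pass idea can be written with vapply(), which guarantees a character result instead of a list. A minimal sketch, assuming every probe ID is unique and that the second "//"-separated field is the one wanted; the toy descriptions and probes below are made up:

```r
descriptions <- c("NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1", "-----", "foo//bar")
probes <- c(10, 20, 30)

# Split every description in one call; strsplit() is vectorized over its input
fields <- strsplit(descriptions, "//", fixed = TRUE)

# Second field of each split (NA when there is none), with spaces trimmed
second <- vapply(fields, function(x) trimws(x[2]), character(1))

# Name by probe ID (as character) and drop probes without a second field
names(second) <- as.character(probes)
probe2gene <- second[!is.na(second)]

probe2gene[["10"]]           # "Rb1cc1"
probe2gene[["30"]]           # "bar"
"20" %in% names(probe2gene)  # FALSE
```

Several keys can be looked up at once with ordinary indexing, e.g. probe2gene[c("10", "30")].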

Unfortunately, R doesn't have a good dictionary/map type. The closest I've found is using a list as a map from strings to values. That seems to be idiomatic, but it's ugly.
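One alternative worth knowing: environments created with new.env(hash = TRUE) are hashed and can serve as a mutable string-keyed map. Keys still have to be strings. A minimal sketch with made-up data:

```r
# Create an empty hashed environment to act as a string -> value map
probe2gene <- new.env(hash = TRUE)

# Insert values; keys are probe IDs written as character strings
probe2gene[["10"]] <- "bar"
assign("30", "Rb1cc1", envir = probe2gene)

# Look up values
probe2gene[["10"]]             # "bar"
get("30", envir = probe2gene)  # "Rb1cc1"

# A missing key returns NULL with [[, or can be tested with exists()
probe2gene[["20"]]                                   # NULL
exists("20", envir = probe2gene, inherits = FALSE)   # FALSE

# List all keys (sorted by default)
ls(probe2gene)                 # "10" "30"
```

Unlike a list, an environment is modified in place, so adding keys inside a loop does not copy the whole map.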

answered Dec 16 '25 05:12 by Johann Hibschman


If I understand correctly, you want to save each probe-description combination where the description splits into more than one value?

Are probes and descriptions the same length?

This is kind of messy, but here's a quick first pass:

a <- c("a", "b", "c")                            # keys (probe names)
b <- list(c("a", "b"), c("DEF", "ABC"), c("Z"))  # already-split descriptions

names(b) <- a
matches <- which(lengths(b) > 1)           # several ways to do this, e.g. sapply(b, length) > 1
b <- lapply(b[matches], function(x) x[2])  # keeps the second element only
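To connect this to the question's data, the already-split list b can come straight from strsplit(). A sketch under the same assumptions as above (made-up probes and descriptions, second "//" field wanted, surrounding spaces trimmed):

```r
probes <- c(10, 20, 30)
descriptions <- c("NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1", "-----", "foo//bar")

# One split per description, named by probe ID (names are coerced to character)
b <- strsplit(descriptions, "//", fixed = TRUE)
names(b) <- probes

# Keep only descriptions that split into more than one piece
matches <- which(lengths(b) > 1)

# Keep the second piece of each, with surrounding spaces removed
b <- lapply(b[matches], function(x) trimws(x[2]))

str(b)  # a list with elements "10" and "30"
```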

That's my first attempt. If you have a sample dataset, that would be very useful.

Best regards,

Jay

answered Dec 16 '25 04:12 by Jay


