I am trying to order a variable in R that holds a list of file names, each containing three substrings that I want to sort on. The file names are formatted as follows:
MAF001.incMHC.zPGS.S1
MAF002.incMHC.zPGS.S1
MAF003.incMHC.zPGS.S1
MAF001.incMHC.zPGS.S2
MAF002.incMHC.zPGS.S2
MAF003.incMHC.zPGS.S2
MAF001.noMHC_incRS148.zPGS.S1
MAF002.noMHC_incRS148.zPGS.S1
MAF003.noMHC_incRS148.zPGS.S1
MAF001.noMHC_incRS148.zPGS.S2
MAF002.noMHC_incRS148.zPGS.S2
MAF003.noMHC_incRS148.zPGS.S2
MAF001.noMHC.zPGS.S1
MAF002.noMHC.zPGS.S1
MAF003.noMHC.zPGS.S1
MAF001.noMHC.zPGS.S2
MAF002.noMHC.zPGS.S2
MAF003.noMHC.zPGS.S2
I want to order this list first on the MAF substring, then on the S substring, and then on the MHC substring, such that the order is:
MAF001.incMHC.zPGS.S1
MAF001.noMHC_incRS148.zPGS.S1
MAF001.noMHC.zPGS.S1
MAF001.incMHC.zPGS.S2
MAF001.noMHC_incRS148.zPGS.S2
MAF001.noMHC.zPGS.S2
MAF002.incMHC.zPGS.S1
MAF002.noMHC_incRS148.zPGS.S1
MAF002.noMHC.zPGS.S1
MAF002.incMHC.zPGS.S2
MAF002.noMHC_incRS148.zPGS.S2
MAF002.noMHC.zPGS.S2
MAF003.incMHC.zPGS.S1
MAF003.noMHC_incRS148.zPGS.S1
MAF003.noMHC.zPGS.S1
MAF003.incMHC.zPGS.S2
MAF003.noMHC_incRS148.zPGS.S2
MAF003.noMHC.zPGS.S2
I have played around with gsub after seeing the answer to a question about sorting on a single substring ("R Sort strings according to substring"), but I am not sure how to extend that idea to multiple substrings (of mixed character and numeric classes) within one string.
Here's a one-liner in base R:
bar <- foo[order(sapply(strsplit(foo, "\\."), function(x) paste(x[1], x[4])))]
head(data.frame(result = bar), 10)
result
1 MAF001.incMHC.zPGS.S1
2 MAF001.noMHC_incRS148.zPGS.S1
3 MAF001.noMHC.zPGS.S1
4 MAF001.incMHC.zPGS.S2
5 MAF001.noMHC_incRS148.zPGS.S2
6 MAF001.noMHC.zPGS.S2
7 MAF002.incMHC.zPGS.S1
8 MAF002.noMHC_incRS148.zPGS.S1
9 MAF002.noMHC.zPGS.S1
10 MAF002.incMHC.zPGS.S2
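Explanation:
Split each file name at the . using strsplit: strsplit(foo, "\\.")
Paste the first (MAF) and fourth (S) pieces together into a single sort key: paste(x[1], x[4])
Compute the sorting permutation of those keys with order
Apply that permutation to the original vector with foo[]
Note that the MHC piece is not part of the key. order leaves ties in their original order, so names that share the same MAF and S values keep the MHC order they already have in foo (incMHC, noMHC_incRS148, noMHC).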
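The steps of the one-liner can also be run one at a time, which makes the intermediate sort keys visible. This is only a sketch on a four-element subset of the file names; the names parts, keys and idx are illustrative and not part of the original answer.

```r
# Small subset of the file names, in an arbitrary input order
foo <- c("MAF002.incMHC.zPGS.S1", "MAF001.incMHC.zPGS.S2",
         "MAF001.incMHC.zPGS.S1", "MAF001.noMHC.zPGS.S1")

# 1. Split every name at the literal dots (one character vector per name)
parts <- strsplit(foo, "\\.")

# 2. Build one sort key per name from the 1st (MAF) and 4th (S) pieces,
#    e.g. "MAF001 S1"
keys <- sapply(parts, function(x) paste(x[1], x[4]))

# 3. Get the permutation that sorts the keys; tied keys keep their
#    original relative order
idx <- order(keys)

# 4. Apply the permutation to the original vector
bar <- foo[idx]
print(bar)
```

Elements 3 and 4 share the key "MAF001 S1", so they come out in the order they went in.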
Data (foo):
foo <- c("MAF001.incMHC.zPGS.S1", "MAF002.incMHC.zPGS.S1", "MAF003.incMHC.zPGS.S1",
"MAF001.incMHC.zPGS.S2", "MAF002.incMHC.zPGS.S2", "MAF003.incMHC.zPGS.S2",
"MAF001.noMHC_incRS148.zPGS.S1", "MAF002.noMHC_incRS148.zPGS.S1",
"MAF003.noMHC_incRS148.zPGS.S1", "MAF001.noMHC_incRS148.zPGS.S2",
"MAF002.noMHC_incRS148.zPGS.S2", "MAF003.noMHC_incRS148.zPGS.S2",
"MAF001.noMHC.zPGS.S1", "MAF002.noMHC.zPGS.S1", "MAF003.noMHC.zPGS.S1",
"MAF001.noMHC.zPGS.S2", "MAF002.noMHC.zPGS.S2", "MAF003.noMHC.zPGS.S2"
)
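The one-liner sorts only on the MAF and S pieces and relies on foo already listing the MHC variants in the wanted order. If the input order cannot be relied on, one possible variant passes all three keys to order explicitly. The sketch below is not from the original answer: the names parts, maf, s, mhc and sorted are illustrative, and the factor levels are an assumption taken from the desired output in the question.

```r
# Shuffled subset of the file names
foo <- c("MAF001.noMHC.zPGS.S2", "MAF002.incMHC.zPGS.S1",
         "MAF001.noMHC_incRS148.zPGS.S1", "MAF001.incMHC.zPGS.S2",
         "MAF001.noMHC.zPGS.S1", "MAF001.incMHC.zPGS.S1")

# Split at the literal dots and bind into a character matrix,
# one row per file name and one column per piece
parts <- do.call(rbind, strsplit(foo, ".", fixed = TRUE))

# Numeric keys: strip the prefix and convert, so "001" becomes 1
maf <- as.integer(sub("^MAF", "", parts[, 1]))
s   <- as.integer(sub("^S", "", parts[, 4]))

# MHC key: a factor whose level order encodes the wanted (non-alphabetical) order
mhc <- factor(parts[, 2], levels = c("incMHC", "noMHC_incRS148", "noMHC"))

# Sort on MAF first, then S, then MHC
sorted <- foo[order(maf, s, mhc)]
print(sorted)
```

Converting the MAF and S pieces to integers also keeps the sort numeric if a value such as S10 ever appears, where a plain string comparison would place it before S2.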