
How to create a Python-style dictionary with data.table in R?

Tags:

r

data.table

I'm looking for a Python-like dictionary structure in R to replace values in a large dataset (>100 MB), and I think the data.table package can help me do this. However, I cannot find an easy way to solve the problem.

For example, I have two data.tables:

Table A:

   V1 V2
1:  A  B
2:  C  D
3:  C  D
4:  B  C
5:  D  A

Table B:

   V3 V4
1:  A  1
2:  B  2
3:  C  3
4:  D  4
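
For reproducibility, the two tables above can be built roughly as follows. This is a minimal sketch: the names tA and tB are taken from the code further down in the question, and the column types (character keys, integer values) are an assumption based on the printed output.

```r
library(data.table)

# Table A: the values that should be translated
tA <- data.table(V1 = c("A", "C", "C", "B", "D"),
                 V2 = c("B", "D", "D", "C", "A"))

# Table B: the "dictionary" (V3 holds the keys, V4 the values)
tB <- data.table(V3 = c("A", "B", "C", "D"),
                 V4 = 1:4)
```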

I want to use B as a dictionary to replace the values in A. So the result I want to get is:

Table R:

V5 V6
 1  2
 3  4
 3  4
 2  3
 4  1

What I did is:

c2=tB[tA[,list(V2)],list(V4)]
c1=tB[tA[,list(V1)],list(V4)]

Although I specified j=list(V4), the result still included the V3 column. I don't know why.

c2:

   V3 V4
1:  B  2
2:  D  4
3:  D  4
4:  C  3
5:  A  1

c1:

   V3 V4
1:  A  1
2:  C  3
3:  C  3
4:  B  2
5:  D  4

Finally, I combined the two V4 columns and got the result I want.

But I think there should be a much easier way to do this. Any ideas?

asked Sep 19 '25 15:09 by lovetl2002


1 Answer

Here's an alternative way:

setkey(B, V3)                          # key B on the lookup column
for (i in seq_along(A)) {              # loop over A's columns by position
    thisA = A[[i]]
    set(A, j=i, value=B[thisA]$V4)     # overwrite column i by reference
}
#    V1 V2
# 1:  1  2
# 2:  3  4
# 3:  3  4
# 4:  2  3
# 5:  4  1

Since thisA is a character column, we don't need J() (a convenience shortcut). Here, A's columns are replaced by reference, which is also memory efficient. If you don't want to modify A, you can use cA <- copy(A) and replace cA's columns instead.
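
The non-destructive variant mentioned above might look like the sketch below. The table contents are copied from the question; cA is simply the name used in the text for the copy.

```r
library(data.table)

A <- data.table(V1 = c("A", "C", "C", "B", "D"),
                V2 = c("B", "D", "D", "C", "A"))
B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)
setkey(B, V3)

cA <- copy(A)                      # deep copy, so A itself is left untouched
for (i in seq_along(cA)) {
    # look up column i of cA in B and overwrite that column by reference
    set(cA, j = i, value = B[cA[[i]]]$V4)
}
```

After the loop, cA holds the translated values while A still contains the original letters.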


Alternatively, using :=:

A[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]
# or
ans = copy(A)[, names(A) := lapply(.SD, function(x) B[J(x)]$V4)]

(Following user2923419's comment): You can drop the J() if the lookup is a single column of type character (just for convenience).
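
As a small illustration of that equivalence, the sketch below looks up the same keys with and without J() on a keyed table. The vector name keys is made up for the example.

```r
library(data.table)

B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)
setkey(B, V3)

keys <- c("D", "A", "C")           # hypothetical lookup values

with_J    <- B[J(keys)]$V4         # explicit join on the key column
without_J <- B[keys]$V4            # a character vector in i is treated as a join too

identical(with_J, without_J)
```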


In data.table 1.9.3, when j is a single column, the join returns a vector (a change made on user request), so the lookup can be written in slightly more natural data.table syntax:

setkey(B, V3)
for (i in seq_along(A)) {
    thisA = A[[i]]
    set(A, j=i, value=B[thisA, V4])    # j=V4 gives the looked-up values as a vector
}
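
Later data.table releases (1.9.6 onward, as far as I know) also accept an on= argument, which allows the same lookup without calling setkey() first. The sketch below is a variation on the loop above rather than part of the original answer, and it assumes a data.table version that supports on=.

```r
library(data.table)

A <- data.table(V1 = c("A", "C", "C", "B", "D"),
                V2 = c("B", "D", "D", "C", "A"))
B <- data.table(V3 = c("A", "B", "C", "D"), V4 = 1:4)   # note: no key is set

for (i in seq_along(A)) {
    this_col <- A[[i]]
    # ad hoc join on V3; j = V4 returns the matching values as a plain vector
    set(A, j = i, value = B[.(V3 = this_col), V4, on = "V3"])
}
```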
answered Sep 21 '25 05:09 by Arun