Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Filter a Table based on another Table

One includes the TMC # for all the roadways of interest. The second includes travel times occurring on every single roadway in a particular state. I want to use the first table to filter so that only the records corresponding to those roadways of interest are remaining.

df

  id     link       tmc
1  1 23402444 122P06466
2  2 23402487 122P06476
3  3 23402488 122N06476
4  4 23402493 122N06477
5  5 23402555 122P06454
6  6 23402557 122N06453

df2

  id       tmc   epoch  tt
1  1 108N04625 1182014 163
2  2 108N04625 1182014 103
3  3 108N04625 1182014  73
4  4 108N04625 1172014 254
5  5 108N04625 1172014 224

I was trying to use a filter

Data2Filter<-(Data2, TMC==Data2$TMC)

but I was getting either object not found for everything or that my dimensions are mismatched (there are about 8000 records in Data1 and 14000000 in Data 2 because there can be multiple travel times (TT) on a TMC but I'm only interested in those that occurred on TMCs corresponding to the list in Data1). I'm very familiar with MatLab but unfortunately at the moment I only have R Studio available and know nothing about this software. Also these table are loaded in as cvs files if that makes any difference.

like image 720
mgc77 Avatar asked Mar 05 '26 15:03

mgc77


1 Answers

df2$tmc[1] <- df$tmc[1]
df2

  id       tmc   epoch  tt
1  1 122P06466 1182014 163
2  2 108N04625 1182014 103
3  3 108N04625 1182014  73
4  4 108N04625 1172014 254
5  5 108N04625 1172014 224

Many many options

subset(df2, tmc %in% df$tmc)

df2[df2$tmc %in% df$tmc, ]

library(dplyr)
fi <- filter(df2, tmc %in% df$tmc)

fi
  id       tmc   epoch  tt
1  1 122P06466 1182014 163

superfast with large datasets

library(data.table)
dt <- data.table(df)
dt2 <- data.table(df2)
subset(dt2, tmc %in% dt$tmc)

This may be a useful topic on subset performance

like image 173
Paulo E. Cardoso Avatar answered Mar 08 '26 08:03

Paulo E. Cardoso