Working on a Spark platform, using R and RStudio Server, I want to filter my tbl where a given column (string) meets the condition of being numeric. Hence, the column contains both numeric/integer values ('1234') and string values ('x1234') and I want to filter out the latter.
Is this possible using dyplr, for example with the filter operator/function?
I tried stuff like:
tbl2 <- tbl %>% filter(!is.numeric(col1))
By the way: the data is stored on other servers, I don't want to load the data locally. Hence I'm not able to use objects like dataframes. I'm bounded to libraries like dyplr.
Help is highly appreciated!
I donot think this works: tbl2 <- tbl %>% filter(!is.numeric(col1)). In a tbl_df or df, each column has only one class. So in your case, the col1 containing "123" and "x123" should be of the class "character".
One possible solution is to convert the col1 as numeric and to test if the conversion succeeds.
tbl2 <- tbl %>% filter(!is.na(as.numeric(col1)))
Test:
library(data.frame)
df <- data.frame(a=1:3,b=2:4)
df[2,2] <- "x123"
class(df$b)
# "character"
df %>% filter(!is.na(as.numeric(b)))
# a b
#1 1 2
#2 3 4
NOTE: there is a Warning message.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With