It seems like dplyr::pull() and dplyr::select() do the same thing. Is there a difference besides the fact that dplyr::pull() only selects one variable?
Description: pull() selects a single column of a data frame and returns it as a vector. This is useful in combination with magrittr's pipe operator and dplyr's verbs.
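Here is a minimal sketch of that pipe-friendly use. It assumes dplyr is installed and uses the built-in mtcars data. pull() typically sits at the end of a pipeline when the next step needs a plain vector rather than a data frame:

```r
library(dplyr)

# Keep the 4-cylinder cars, then extract mpg as a plain numeric vector
four_cyl_mpg <- mtcars %>%
  filter(cyl == 4) %>%
  pull(mpg)

# Because the result is a vector, base functions such as mean() apply directly
avg_mpg <- mean(four_cyl_mpg)
```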
First, it makes sense to look at the class of what each function returns.
library(dplyr)
mtcars %>% pull(cyl) %>% class()
#> [1] "numeric"
mtcars %>% select(cyl) %>% class()
#> [1] "data.frame"
So pull() returns a vector (numeric, in this case), whereas select() returns a data frame.
Basically, pull() is equivalent to writing mtcars$cyl or mtcars[, "cyl"], whereas select() drops every column except cyl but keeps the data frame structure.
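One caveat is worth checking, assuming the tibble package is installed (dplyr depends on it). The mtcars[, "cyl"] equivalence holds for a plain data.frame, where `[` drops to a vector by default. A tibble's `[` never drops, so on tibbles only pull(), `[[` and `$` reliably return a vector. A small sketch:

```r
library(dplyr)
library(tibble)

tbl <- as_tibble(mtcars)

# On a plain data.frame, [, "cyl"] drops to a numeric vector
df_bracket <- mtcars[, "cyl"]

# On a tibble, [, "cyl"] keeps a one-column tibble
tbl_bracket <- tbl[, "cyl"]

# pull() returns a vector in both cases
df_pull <- pull(mtcars, cyl)
tbl_pull <- pull(tbl, cyl)
```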
You could see select() as an analogue of [ (or magrittr::extract) and pull() as an analogue of [[ (or $, or magrittr::extract2) for data frames; the analogue of [[ for lists would be purrr::pluck.
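The same vector-versus-container split exists for plain lists. The following sketch assumes purrr is installed, and the list `lst` is made up for illustration. `[` returns a sub-list, while `[[` and purrr::pluck() return the element itself:

```r
library(purrr)

# A hypothetical list used only for illustration
lst <- list(a = 1:3, b = c("x", "y"))

sub_list <- lst["a"]        # like select(): still a list, named "a"
element  <- lst[["a"]]      # like pull(): the integer vector itself
plucked  <- pluck(lst, "a") # purrr's pipe-friendly equivalent of [[
```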
library(magrittr) # extract() and extract2() come from magrittr, not dplyr
df <- iris %>% head()
All of these give the same output:
df %>% pull(Sepal.Length)
df %>% pull("Sepal.Length")
a <- "Sepal.Length"; df %>% pull(!!quo(a))
df %>% extract2("Sepal.Length")
df %>% `[[`("Sepal.Length")
df[["Sepal.Length"]]
# all of them:
# [1] 5.1 4.9 4.7 4.6 5.0 5.4
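You can also confirm the claim rather than eyeball the printed output. This quick sketch assumes dplyr and magrittr are installed (extract2() comes from magrittr) and compares every form against the first with identical():

```r
library(dplyr)
library(magrittr) # provides extract2()

df <- iris %>% head()
a <- "Sepal.Length"

pulled <- list(
  df %>% pull(Sepal.Length),
  df %>% pull("Sepal.Length"),
  df %>% pull(!!a),            # inject the string stored in `a`
  df %>% extract2("Sepal.Length"),
  df[["Sepal.Length"]],
  df$Sepal.Length
)

# TRUE for each element that is identical to the first one
all_same <- vapply(pulled, identical, logical(1), pulled[[1]])
```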
And all of these give the same output:
df %>% select(Sepal.Length)
a <- "Sepal.Length"; df %>% select(!!quo(a))
df %>% select("Sepal.Length")
df %>% extract("Sepal.Length")
df %>% `[`("Sepal.Length")
df["Sepal.Length"]
# all of them:
#   Sepal.Length
# 1          5.1
# 2          4.9
# 3          4.7
# 4          4.6
# 5          5.0
# 6          5.4
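The select() side can be checked the same way. This is a sketch assuming dplyr is installed; magrittr::extract is called with its namespace prefix to avoid a clash with tidyr::extract. Each result should be a one-column data frame named Sepal.Length holding the same values:

```r
library(dplyr)

df <- iris %>% head()

selected <- list(
  df %>% select(Sepal.Length),
  df %>% select("Sepal.Length"),
  df %>% select(1),
  df %>% magrittr::extract("Sepal.Length"),
  df["Sepal.Length"]
)

# One-column data frame, correct name, same values as the original column
checks <- vapply(selected, function(x) {
  is.data.frame(x) &&
    identical(names(x), "Sepal.Length") &&
    identical(x[[1]], df$Sepal.Length)
}, logical(1))
```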
pull() and select() accept bare column names, strings, or numeric positions, while the others accept only strings or numeric positions.
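The following sketch makes the difference concrete. It assumes dplyr is installed and that no global variable called Sepal.Length exists. All three index styles work with pull(). `[[` evaluates its argument as ordinary R code, so a bare name is looked up as a variable and fails:

```r
library(dplyr)

df <- iris %>% head()

by_name   <- df %>% pull(Sepal.Length)   # bare column name
by_string <- df %>% pull("Sepal.Length") # character
by_number <- df %>% pull(1)              # position

# [[ looks up Sepal.Length as a normal variable, so this errors
# ("object 'Sepal.Length' not found") when no such variable is defined
bare_base <- try(df[[Sepal.Length]], silent = TRUE)
```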
One important difference is how they handle negative indices.
For select(), negative indices mean columns to drop.
For pull(), they mean counting from the last column.
df %>% pull(-Sepal.Length)
df %>% pull(-1)
# [1] setosa setosa setosa setosa setosa setosa
# Levels: setosa versicolor virginica
The first result looks strange, but inside pull() Sepal.Length is converted to its position, 1, so -Sepal.Length becomes -1, which refers to the last column, Species.
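A compact way to see the contrast side by side is shown below. This sketch assumes dplyr is installed; the positions refer to the five iris columns.

```r
library(dplyr)

df <- iris %>% head()

# pull(): a negative number counts from the right,
# so -1 is Species and -2 is Petal.Width
last_col_vec <- df %>% pull(-1)
second_last  <- df %>% pull(-2)

# select(): a negative number drops that column, here Sepal.Length
without_first <- df %>% select(-1)
```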
This count-from-the-end behavior is not supported by [[ or extract2():
df %>% `[[`(-1)
df %>% extract2(-1)
df[[-1]]
# Error in .subset2(x, i, exact = exact) :
# attempt to select more than one element in get1index <real>
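To get pull()'s count-from-the-end behavior with base `[[`, one option is to compute the position yourself from ncol(). Here is a minimal sketch, assuming dplyr is installed (it is only used for the comparison):

```r
library(dplyr)

df <- iris %>% head()

# ncol(df) is the last position, ncol(df) - 1 the one before it
last_base        <- df[[ncol(df)]]
second_last_base <- df[[ncol(df) - 1]]
```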
Negative indices to drop columns are supported by [ and extract though.
df %>% select(-Sepal.Length)
df %>% select(-1)
df %>% `[`(-1)
df[-1]
#   Sepal.Width Petal.Length Petal.Width Species
# 1         3.5          1.4         0.2  setosa
# 2         3.0          1.4         0.2  setosa
# 3         3.2          1.3         0.2  setosa
# 4         3.1          1.5         0.2  setosa
# 5         3.6          1.4         0.2  setosa
# 6         3.9          1.7         0.4  setosa
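Base `[` does not accept a negated column name. df[-"Sepal.Length"] fails because unary minus is not defined for character strings. The sketch below (assuming dplyr is installed) shows one common base workaround, dropping by name with setdiff(), next to select():

```r
library(dplyr)

df <- iris %>% head()

# dplyr: negate the bare column name
dropped_dplyr <- df %>% select(-Sepal.Length)

# base R: keep every column whose name is not "Sepal.Length"
dropped_base <- df[setdiff(names(df), "Sepal.Length")]

# Negating a string is an error in base R ("invalid argument to unary operator")
neg_string <- try(df[-"Sepal.Length"], silent = TRUE)
```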