I want to use the function skim from the R package skimr to produce summary statistics for multiple datasets. To save space, I need to prioritize which information gets displayed. I would like to remove these rows from the Data Summary section of the skim output: "Name", "Column type frequency", and "Group variables". Is there an easy way to do this?
I tried skim(iris) and got the following:
-- Data Summary ------------------------
Values
Name iris
Number of rows 150
Number of columns 5
_______________________
Column type frequency:
factor 1
numeric 4
________________________
Group variables None
-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
skim_variable n_missing complete_rate ordered n_unique top_counts
* <chr> <int> <dbl> <lgl> <int> <chr>
1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
Instead, I want to display the following:
-- Data Summary ------------------------
Values
Number of rows 150
Number of columns 5
-- Variable type: factor -----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 1 x 6
skim_variable n_missing complete_rate ordered n_unique top_counts
* <chr> <int> <dbl> <lgl> <int> <chr>
1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
-- Variable type: numeric ----------------------------------------------------------------------------------------------------------------------------------------------------
# A tibble: 4 x 11
skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
The function skim returns an object of S3 class "skim_df", which inherits from "tbl_df", "tbl", and "data.frame", and skimr provides a print method for that class. This print method has an argument, include_summary, that can be set to FALSE to skip printing the Data Summary section.
s <- skimr::skim(iris)
class(s)
#> [1] "skim_df" "tbl_df" "tbl" "data.frame"
Created on 2022-03-23 by the reprex package (v2.0.1)
To suppress the Data Summary section (the whole block, including the row and column counts), run
print(s, include_summary = FALSE)
#-- Variable type: factor ----------------------------------------------------------------------------------------------------------------
## A tibble: 1 x 6
# skim_variable n_missing complete_rate ordered n_unique top_counts
#* <chr> <int> <dbl> <lgl> <int> <chr>
#1 Species 0 1 FALSE 3 set: 50, ver: 50, vir: 50
#
#-- Variable type: numeric ---------------------------------------------------------------------------------------------------------------
## A tibble: 4 x 11
# skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
#* <chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#1 Sepal.Length 0 1 5.84 0.828 4.3 5.1 5.8 6.4 7.9 ▆▇▇▅▂
#2 Sepal.Width 0 1 3.06 0.436 2 2.8 3 3.3 4.4 ▁▆▇▂▁
#3 Petal.Length 0 1 3.76 1.77 1 1.6 4.35 5.1 6.9 ▇▁▆▇▂
#4 Petal.Width 0 1 1.20 0.762 0.1 0.3 1.3 1.8 2.5 ▇▁▇▅▃
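Since include_summary = FALSE drops the row and column counts as well, one way to get closer to the output requested in the question is to print those two lines by hand first. The following is only a sketch: skim_compact is a hypothetical helper name, not part of skimr, and the counts are taken from the data frame itself with nrow() and ncol() rather than from the skim object.

```r
library(skimr)

# Hypothetical helper (not part of skimr): prints only the row and column
# counts, then the per-type tables without the built-in Data Summary block.
skim_compact <- function(data) {
  s <- skim(data)
  # Hand-written replacement for the two summary rows we want to keep
  cat("Number of rows    ", nrow(data), "\n")
  cat("Number of columns ", ncol(data), "\n\n")
  # include_summary = FALSE suppresses the full Data Summary section
  print(s, include_summary = FALSE)
  # Return the skim object invisibly so it can still be assigned and reused
  invisible(s)
}

skim_compact(iris)
```

Because the helper returns the skim object invisibly, it can be applied to each dataset in turn, for example with lapply over a list of data frames, without losing the underlying results.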