get_all_corr() makes it easy to calculate correlations across
every variable in a data frame or across a selected set of variables. It also
works with grouped data frames, so you can compare correlations within
each level of one or more grouping variables.
Arguments
- data
A data frame or tibble object
- cols
<tidy-select> The variables you want to get the correlations for.
- wt
A variable to use as the weights for weighted correlations.
- remove_redundant
Should rows where the two variables are the same be kept or removed? If
TRUE, the default, they are removed.
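To make the shape of the result and the effect of remove_redundant concrete, here is a minimal base R sketch of pairwise correlations in long format. It is an illustration under stated assumptions, not the adlgraphs implementation: `pairwise_corr` is a hypothetical helper, `mtcars` stands in for real data, and the confidence interval comes from `stats::cor.test()`, which may differ from the interval get_all_corr() reports.

```r
# Hypothetical helper (not part of adlgraphs): pairwise Pearson correlations
# returned in long format, one row per ordered (x, y) pair.
pairwise_corr <- function(data, remove_redundant = TRUE) {
  vars <- names(data)
  # Every ordered combination of column names, including self-pairs
  pairs <- expand.grid(x = vars, y = vars, stringsAsFactors = FALSE)
  if (remove_redundant) {
    # Drop rows where a variable is paired with itself
    pairs <- pairs[pairs$x != pairs$y, ]
  }
  rows <- lapply(seq_len(nrow(pairs)), function(i) {
    xv <- data[[pairs$x[i]]]
    yv <- data[[pairs$y[i]]]
    test <- stats::cor.test(xv, yv)
    data.frame(
      x = pairs$x[i],
      y = pairs$y[i],
      correlation = unname(test$estimate),
      n = sum(stats::complete.cases(xv, yv)),
      conf.low = test$conf.int[1],
      conf.high = test$conf.int[2],
      p.value = test$p.value
    )
  })
  do.call(rbind, rows)
}

# Three variables give 3 * 3 - 3 = 6 rows once self-pairs are removed
res <- pairwise_corr(mtcars[, c("mpg", "hp", "wt")])
```

With three variables the sketch returns six rows, matching the row count in the example output for this function; each pair appears twice, once in each order.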
Examples
# load dplyr and adlgraphs
library(dplyr)
library(adlgraphs)
# There are three ways to get correlations among a subset of variables (here, three)
# 1. Create a new data frame with only the columns you want
new_data <- test_data %>% dplyr::select(top:dominate)
get_all_corr(new_data)
#> # A tibble: 6 × 8
#> x y correlation n conf.low conf.high p.value stars
#> <chr+lbl> <chr+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 inferior [Some… top [An i… 0.498 250 0.389 0.606 4.82e-17 ***
#> 2 dominate [No o… top [An i… -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 3 top [An ideal … inf… [Som… 0.498 250 0.389 0.606 4.82e-17 ***
#> 4 dominate [No o… inf… [Som… -0.138 250 -0.262 -0.0146 2.86e- 2 *
#> 5 top [An ideal … dom… [No … -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 6 inferior [Some… dom… [No … -0.138 250 -0.262 -0.0146 2.86e- 2 *
# 2. Using dplyr::select() and pipes
test_data %>%
  dplyr::select(c(top:dominate)) %>%
  get_all_corr()
#> # A tibble: 6 × 8
#> x y correlation n conf.low conf.high p.value stars
#> <chr+lbl> <chr+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 inferior [Some… top [An i… 0.498 250 0.389 0.606 4.82e-17 ***
#> 2 dominate [No o… top [An i… -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 3 top [An ideal … inf… [Som… 0.498 250 0.389 0.606 4.82e-17 ***
#> 4 dominate [No o… inf… [Som… -0.138 250 -0.262 -0.0146 2.86e- 2 *
#> 5 top [An ideal … dom… [No … -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 6 inferior [Some… dom… [No … -0.138 250 -0.262 -0.0146 2.86e- 2 *
# 3. Use the `cols` argument
get_all_corr(test_data, cols = c(top:dominate))
#> # A tibble: 6 × 8
#> x y correlation n conf.low conf.high p.value stars
#> <chr+lbl> <chr+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 inferior [Some… top [An i… 0.498 250 0.389 0.606 4.82e-17 ***
#> 2 dominate [No o… top [An i… -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 3 top [An ideal … inf… [Som… 0.498 250 0.389 0.606 4.82e-17 ***
#> 4 dominate [No o… inf… [Som… -0.138 250 -0.262 -0.0146 2.86e- 2 *
#> 5 top [An ideal … dom… [No … -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 6 inferior [Some… dom… [No … -0.138 250 -0.262 -0.0146 2.86e- 2 *
# or
test_data %>% get_all_corr(c(top:dominate))
#> # A tibble: 6 × 8
#> x y correlation n conf.low conf.high p.value stars
#> <chr+lbl> <chr+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 inferior [Some… top [An i… 0.498 250 0.389 0.606 4.82e-17 ***
#> 2 dominate [No o… top [An i… -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 3 top [An ideal … inf… [Som… 0.498 250 0.389 0.606 4.82e-17 ***
#> 4 dominate [No o… inf… [Som… -0.138 250 -0.262 -0.0146 2.86e- 2 *
#> 5 top [An ideal … dom… [No … -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 6 inferior [Some… dom… [No … -0.138 250 -0.262 -0.0146 2.86e- 2 *
# To get weighted correlations just specify the `wt` argument
test_data %>% get_all_corr(c(top:dominate), wt = wts)
#> # A tibble: 6 × 8
#> x y correlation n conf.low conf.high p.value stars
#> <chr+lbl> <chr+lbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
#> 1 inferior [Some… top [An i… 0.498 250 0.389 0.606 4.82e-17 ***
#> 2 dominate [No o… top [An i… -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 3 top [An ideal … inf… [Som… 0.498 250 0.389 0.606 4.82e-17 ***
#> 4 dominate [No o… inf… [Som… -0.138 250 -0.262 -0.0146 2.86e- 2 *
#> 5 top [An ideal … dom… [No … -0.147 250 -0.271 -0.0234 2.00e- 2 *
#> 6 inferior [Some… dom… [No … -0.138 250 -0.262 -0.0146 2.86e- 2 *
# You can also calculate grouped correlations. For example, if
# you were interested in comparing the weighted correlations
# among people with a college degree vs those without one, you
# would do it like this:
test_data %>%
  dplyr::group_by(edu_f2) %>%
  get_all_corr(c(top:dominate), wt = wts)
#> # A tibble: 12 × 9
#> edu_f2 x y correlation n conf.low conf.high p.value
#> <chr> <chr+lbl> <chr+lbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 No Colle… inf… [Som… top [An i… 0.509 142 0.366 0.653 9.61e-11
#> 2 No Colle… dom… [No … top [An i… -0.104 142 -0.270 0.0621 2.18e- 1
#> 3 No Colle… top [An i… inf… [Som… 0.509 142 0.366 0.653 9.61e-11
#> 4 No Colle… dom… [No … inf… [Som… -0.120 142 -0.286 0.0456 1.54e- 1
#> 5 No Colle… top [An i… dom… [No … -0.104 142 -0.270 0.0621 2.18e- 1
#> 6 No Colle… inf… [Som… dom… [No … -0.120 142 -0.286 0.0456 1.54e- 1
#> 7 At Least… inf… [Som… top [An i… 0.483 108 0.315 0.652 1.19e- 7
#> 8 At Least… dom… [No … top [An i… -0.218 108 -0.406 -0.0297 2.37e- 2
#> 9 At Least… top [An i… inf… [Som… 0.483 108 0.315 0.652 1.19e- 7
#> 10 At Least… dom… [No … inf… [Som… -0.164 108 -0.354 0.0259 8.99e- 2
#> 11 At Least… top [An i… dom… [No … -0.218 108 -0.406 -0.0297 2.37e- 2
#> 12 At Least… inf… [Som… dom… [No … -0.164 108 -0.354 0.0259 8.99e- 2
#> # ℹ 1 more variable: stars <chr>
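The weighted and grouped examples above can be approximated in base R, which may help when checking what a weighted correlation means. The sketch below is an assumption-laden illustration rather than the package's method: `weighted_corr` is a hypothetical helper built on `stats::cov.wt()`, and the weights added to `mtcars` are made up for demonstration.

```r
# Hypothetical helper (not part of adlgraphs): weighted Pearson correlation
# between two numeric vectors, computed with stats::cov.wt().
weighted_corr <- function(x, y, w) {
  stats::cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]
}

# Illustrative data: mtcars with made-up weights (not real survey weights)
df <- mtcars
df$w <- rep(c(1, 2), length.out = nrow(df))

# One weighted correlation per level of the grouping variable `am`
by_group <- sapply(
  split(df, df$am),
  function(g) weighted_corr(g$mpg, g$hp, g$w)
)
```

When every weight is equal, `weighted_corr()` reduces to the ordinary Pearson correlation, so differences from the unweighted result come only from the weights.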