
This function calculates the weighted Pearson correlation between two variables. It also lets you group the data and calculate the correlation within each level of the grouping variable. If the data is not grouped and no group is specified, it returns the same output as wtd_corr().
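For orientation, the quantity being computed can be sketched in base R. This is a minimal illustration of a weighted Pearson correlation, not the package's own implementation: the helper name `weighted_cor` is hypothetical, the weights are simulated, and the built-in `mtcars` data stands in for real survey data.

```r
# Hypothetical helper: weighted Pearson correlation from first principles.
weighted_cor <- function(x, y, w) {
  w <- w / sum(w)                          # normalise weights to sum to 1
  mx <- sum(w * x)                         # weighted mean of x
  my <- sum(w * y)                         # weighted mean of y
  cov_xy <- sum(w * (x - mx) * (y - my))   # weighted covariance
  var_x <- sum(w * (x - mx)^2)             # weighted variance of x
  var_y <- sum(w * (y - my)^2)             # weighted variance of y
  cov_xy / sqrt(var_x * var_y)
}

# With equal weights the result should match the unweighted cor()
equal_w <- rep(1, nrow(mtcars))
r_equal <- weighted_cor(mtcars$mpg, mtcars$hp, equal_w)
all.equal(r_equal, cor(mtcars$mpg, mtcars$hp))

# Cross-check against stats::cov.wt() using made-up weights
set.seed(1)
w <- runif(nrow(mtcars))
r_manual <- weighted_cor(mtcars$mpg, mtcars$hp, w)
r_covwt <- cov.wt(cbind(mtcars$mpg, mtcars$hp), wt = w, cor = TRUE)$cor[1, 2]
all.equal(r_manual, r_covwt)
```

The sketch covers only the point estimate. It does not attempt to reproduce the confidence interval, p-value, or stars that get_corr() reports.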

Usage

get_corr(data, x, y, group = NULL, wt)

Arguments

data

An object of type data.frame or tibble. Not required if the data is piped into the function.

x, y

Can be either character strings or symbols. Names of the two variables in the data that you want to calculate the correlation between.

group

Can be either a character string or a symbol. The grouping variable.

wt

Can be either a character string or a symbol. The weighting variable. Add it if you have a weighting variable and want weighted correlations.

Value

A tibble showing the two variables (x, y), the correlation (correlation), the number of observations (n), the lower and upper bounds of the confidence interval (conf.low, conf.high), the p-value (p.value), and stars indicating its statistical significance (stars). If the data is grouped, the tibble also includes a column for each grouping variable and a row for each unique combination of grouping-variable levels.

Examples

# load dplyr for piping and grouping
library(dplyr)

# Let's first do a simple correlation where we pipe in the data
test_data %>% get_corr(x = top, y = sdo_sum)
#> # A tibble: 1 × 8
#>   x               y          correlation     n conf.low conf.high  p.value stars
#>   <chr+lbl>       <chr+lbl>        <dbl> <dbl>    <dbl>     <dbl>    <dbl> <chr>
#> 1 top [An ideal … sdo… [Soc…      -0.736   250   -0.821    -0.651 6.41e-44 ***  

# Repeat but with weights
test_data %>% get_corr(x = top, y = sdo_sum, wt = wts)
#> # A tibble: 1 × 8
#>   x               y          correlation     n conf.low conf.high  p.value stars
#>   <chr+lbl>       <chr+lbl>        <dbl> <dbl>    <dbl>     <dbl>    <dbl> <chr>
#> 1 top [An ideal … sdo… [Soc…      -0.721   250   -0.808    -0.634 2.25e-41 ***  

# Now let's get the correlation among only people with at least a bachelor's degree
test_data %>% 
  filter(edu_f2 == "At Least a Bachelor's Degree") %>% 
  get_corr(x = top, y = sdo_sum, wt = wts)
#> # A tibble: 1 × 8
#>   x               y          correlation     n conf.low conf.high  p.value stars
#>   <chr+lbl>       <chr+lbl>        <dbl> <dbl>    <dbl>     <dbl>    <dbl> <chr>
#> 1 top [An ideal … sdo… [Soc…      -0.712   108   -0.847    -0.577 5.41e-18 ***  

# Now let's get it for each education level. Two ways of doing this:
# The first is to group the data ahead of time
test_data %>% 
  group_by(edu_f) %>% 
  get_corr(x = top, y = sdo_sum, wt = wts)
#> # A tibble: 4 × 9
#> # Groups:   edu_f [4]
#>   edu_f x         y          correlation     n conf.low conf.high  p.value stars
#>   <fct> <chr+lbl> <chr+lbl>        <dbl> <dbl>    <dbl>     <dbl>    <dbl> <chr>
#> 1 High… top [An … sdo… [Soc…      -0.728    64   -0.902    -0.555 8.94e-12 ***  
#> 2 Some… top [An … sdo… [Soc…      -0.729    78   -0.885    -0.572 3.93e-14 ***  
#> 3 Bach… top [An … sdo… [Soc…      -0.603    68   -0.799    -0.407 5.24e- 8 ***  
#> 4 Grad… top [An … sdo… [Soc…      -0.814    40   -1.00     -0.623 1.68e-10 ***  

# The second is to use the group argument
test_data %>% get_corr(x = top, y = sdo_sum, group = edu_f, wt = wts)
#> # A tibble: 4 × 9
#> # Groups:   edu_f [4]
#>   edu_f x         y          correlation     n conf.low conf.high  p.value stars
#>   <fct> <chr+lbl> <chr+lbl>        <dbl> <dbl>    <dbl>     <dbl>    <dbl> <chr>
#> 1 High… top [An … sdo… [Soc…      -0.728    64   -0.902    -0.555 8.94e-12 ***  
#> 2 Some… top [An … sdo… [Soc…      -0.729    78   -0.885    -0.572 3.93e-14 ***  
#> 3 Bach… top [An … sdo… [Soc…      -0.603    68   -0.799    -0.407 5.24e- 8 ***  
#> 4 Grad… top [An … sdo… [Soc…      -0.814    40   -1.00     -0.623 1.68e-10 ***