Compute pairwise Pearson correlations between two or more numeric variables in a survey design, with design-based standard errors and confidence intervals. Returns results in long or wide format.
Usage
get_corr(
  design,
  x,
  group = NULL,
  format = c("long", "wide"),
  redundant = FALSE,
  diagonal = FALSE,
  variance = "ci",
  conf_level = 0.95,
  n_weighted = FALSE,
  decimals = NULL,
  min_cell_n = 30L,
  na.rm = TRUE,
  label_values = TRUE,
  label_vars = TRUE,
  name_style = "surveycore"
)
Arguments
- design
A survey design object: survey_taylor, survey_replicate, survey_twophase, survey_srs, or survey_calibrated.
- x
<tidy-select> Two or more unquoted numeric variable names. Non-numeric variables are dropped with a warning. At least two numeric variables must remain.
- group
<tidy-select> Optional grouping variable(s). Combined with any grouping set by group_by(). Default NULL.
- format
"long" (default) or "wide". Long format returns one row per variable pair with inference statistics. Wide format returns the correlation matrix (r values only, with no variance or inference columns). When group is active, group columns are prepended in both formats.
- redundant
Logical. If FALSE (default), each pair appears once (lower triangle: pairs where var1 precedes var2 in input order). If TRUE, both (A, B) and (B, A) are included (full directed pairs). Only affects long format; wide format always shows the full symmetric matrix.
- diagonal
Logical. If FALSE (default), self-correlations are excluded (diagonal is NA in wide format). If TRUE, self-correlations (r equals 1) are included.
- variance
NULL or a character vector of one or more of "se", "ci", "var", "cv", "moe", "deff". Default "ci". CI bounds use the Fisher Z transform (guaranteeing bounds in (−1, 1)). Only applies to long format.
- conf_level
Numeric scalar in (0, 1). Default 0.95.
- n_weighted
Logical. If TRUE, add an n_weighted column with the pairwise sum of weights (both variables non-NA). Default FALSE.
- decimals
Integer or NULL. If an integer, rounds all numeric output columns (e.g., r, se, ci_low, ci_high) to this many decimal places. Default NULL (no rounding).
- min_cell_n
Integer. Minimum pairwise unweighted count; pairs below this threshold trigger surveycore_warning_small_cell. Default 30L (AAPOR guidance).
- na.rm
Logical. If TRUE (default), pairs use complete cases for each variable pair separately (pairwise deletion), and observations where any group variable is NA are excluded from the output. If FALSE, pairwise complete cases are still used for each variable pair, and observations where a group variable is NA are collected into their own group row in the output (appearing after all non-NA group rows).
- label_values
Logical. If TRUE (default) and the grouping variable has value labels, the group column is converted to a labelled factor. Has no visible effect when no groups are active.
- label_vars
Logical. If TRUE (default) and variable labels are set in metadata, the var1/var2 columns (long) and the variable column (wide) show labels instead of raw names. Falls back to raw names if labels are unset.
- name_style
"surveycore" (default) or "broom". When "broom", renames r → estimate, se → std.error, etc. Only affects long format.
Value
A survey_corr tibble (also inheriting survey_result).
When group is active, group variable columns are prepended before all
other columns in both long and wide formats.
Long format columns:
- [group_cols...]: group variable columns (when active), first.
- var1, var2: variable names (or labels when label_vars = TRUE).
- r: Pearson correlation coefficient.
- Variance columns (se, var, cv, ci_low, ci_high, moe, deff): only those requested via variance.
- p_value: two-tailed p-value.
- statistic: t-statistic.
- df: degrees of freedom for the t-test (n minus 2).
- n: pairwise unweighted count.
- n_weighted: pairwise sum of weights (only when requested).
Wide format columns:
- [group_cols...]: group variable columns (when active), first.
- variable: row variable names (or labels).
- One column per focal variable, containing r values.
Use meta(result) to access design type, variable labels, and
method ("pearson").
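The variance argument notes that CI bounds use the Fisher Z transform, which keeps them inside (−1, 1). The base R sketch below illustrates the general idea only and is not the package's implementation. The helper fisher_ci(), the delta-method step used to move the standard error onto the z scale, and all input values are assumptions made for illustration.

```r
# Hypothetical helper: Fisher Z confidence interval for a correlation.
# Assumes the z-scale standard error comes from the delta method,
# se_z = se_r / (1 - r^2); the package may derive it differently.
fisher_ci <- function(r, se_r, conf_level = 0.95) {
  z    <- atanh(r)                            # Fisher Z transform of r
  se_z <- se_r / (1 - r^2)                    # delta-method SE on the z scale
  crit <- qnorm(1 - (1 - conf_level) / 2)     # two-sided normal critical value
  tanh(z + c(-1, 1) * crit * se_z)            # back-transform to the r scale
}

# Illustrative inputs (not taken from any survey output)
ci <- fisher_ci(r = 0.544, se_r = 0.01)

# Near the boundary, a plain Wald interval can exceed 1,
# while the back-transformed interval cannot.
wald_extreme <- 0.99 + c(-1, 1) * qnorm(0.975) * 0.05
ci_extreme   <- fisher_ci(r = 0.99, se_r = 0.05)
```

Because tanh() maps every real number into (−1, 1), the back-transformed bounds stay in range even when r is close to ±1. The interval is generally asymmetric around r.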
See also
Other analysis: get_freqs(), get_means(), get_quantiles(), get_ratios(), get_totals(), meta()
Examples
d <- as_survey(nhanes_2017, ids = sdmvpsu, weights = wtint2yr,
               strata = sdmvstra, nest = TRUE)
# Long format (default) with confidence intervals
get_corr(d, x = c(ridageyr, bpxsy1))
#> # A tibble: 1 × 9
#>   var1                  var2      r ci_low ci_high p_value statistic    df     n
#>   <fct>                 <fct> <dbl>  <dbl>   <dbl>   <dbl>     <dbl> <int> <int>
#> 1 Age in years at scre… Syst… 0.544  0.529   0.559       0      51.5  6300  6302
# Wide correlation matrix
get_corr(d, x = c(ridageyr, bpxsy1), format = "wide")
#> # A tibble: 2 × 3
#>   variable                         Age in years at scre…¹ Systolic: Blood pres…²
#>   <chr>                                             <dbl>                  <dbl>
#> 1 Age in years at screening                            NA                  0.544
#> 2 Systolic: Blood pres (1st rdg) …                  0.544                 NA
#> # ℹ abbreviated names: ¹`Age in years at screening`,
#> #   ²`Systolic: Blood pres (1st rdg) mm Hg`
# AAPOR-compliant reporting: CI, margin of error, and weighted n
get_corr(d, x = c(ridageyr, bpxsy1),
         variance = c("ci", "moe"), n_weighted = TRUE)
#> # A tibble: 1 × 11
#>   var1           var2      r ci_low ci_high    moe p_value statistic    df     n
#>   <fct>          <fct> <dbl>  <dbl>   <dbl>  <dbl>   <dbl>     <dbl> <int> <int>
#> 1 Age in years … Syst… 0.544  0.529   0.559 0.0151       0      51.5  6300  6302
#> # ℹ 1 more variable: n_weighted <dbl>
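
To show how the statistic, df, and n columns relate, the base R sketch below recomputes a t-statistic from the rounded values printed above. It assumes the classical formula t = r * sqrt(df / (1 - r^2)) with df = n - 2. Whether the package computes its statistic this way is an assumption here, not something this page states.

```r
# Rounded values copied from the example output above
r <- 0.544
n <- 6302

df      <- n - 2                        # degrees of freedom, as documented (n minus 2)
t_stat  <- r * sqrt(df / (1 - r^2))     # classical t-statistic for a correlation (assumed)
p_value <- 2 * pt(-abs(t_stat), df)     # two-tailed p-value from the t distribution
```

With these rounded inputs, t_stat lands close to the printed statistic, and the p-value is small enough to display as 0.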