
Compute the design-based estimate of the finite-population variance for one or more numeric variables in a survey design, with optional grouping, uncertainty quantification, and metadata-driven labelling. Matches survey::svyvar() numerically (Kish n/(n-1) correction) on Taylor, replicate, twophase, and nonprob designs.

Usage

get_variance(
  design,
  x,
  group = NULL,
  variance = "ci",
  conf_level = 0.95,
  n_weighted = FALSE,
  decimals = NULL,
  min_cell_n = 30L,
  na.rm = TRUE,
  na_handling = c("pairwise", "listwise"),
  label_values = TRUE,
  label_vars = TRUE,
  name_style = "surveycore",
  ...,
  .id = NULL,
  .if_missing_var = NULL
)

Arguments

design

A survey design object: survey_taylor, survey_replicate, survey_twophase, or survey_nonprob. Also accepts a survey_collection.

x

<tidy-select> One or more unquoted numeric variable names. Must resolve to at least one numeric column; non-numeric columns are rejected (no silent drop).

group

<tidy-select> Optional grouping variable(s). Combined with any grouping set by group_by(). Default NULL.

variance

NULL or a character vector of one or more of "se", "ci", "var", "cv", "moe", "deff". Controls which uncertainty columns appear in the output. Default "ci".

conf_level

Numeric scalar in (0, 1). Confidence level for intervals. Default 0.95.

n_weighted

Logical. If TRUE, add an n_weighted column with the sum of weights for non-NA, positive-weight observations in each row's estimate. Default FALSE.

decimals

Integer or NULL. If an integer, rounds all numeric output columns to this many decimal places. Default NULL (no rounding).

min_cell_n

Integer. Minimum unweighted cell count before surveycore_warning_small_cell fires. Default 30L (AAPOR guidance).

na.rm

Logical. If TRUE (default), NA values in the focal variable are excluded from the estimate and rows with NA in any grouping variable are excluded from the output. If FALSE, NA propagates to produce NaN estimates.

na_handling

"pairwise" (default) or "listwise". In multi-variable mode controls whether each focal variable uses its own complete-case set ("pairwise") or the intersection across all focal variables ("listwise"). Ignored when na.rm = FALSE.

label_values

Logical. Accepted for API uniformity; controls whether grouping-variable codes are converted to value labels. Default TRUE.

label_vars

Logical. If TRUE (default), the name column shows variable labels when available (falling back to raw names).

name_style

"surveycore" (default) or "broom". Under "broom", renames varianceestimate, sestd.error, ci_lowconf.low, ci_highconf.high.

...

Unused. Reserved so that .id and .if_missing_var remain named-only when a survey_collection is passed as design.

.id

Character(1) or NULL. Column name used to identify each survey when design is a survey_collection. For collection inputs, NULL (the default) resolves to the collection's stored @id property. Pass a non-NULL value to override. Ignored when design is a single survey.

.if_missing_var

"error", "skip", or NULL. How to handle surveys in a collection that lack one of the requested NSE variables. For collection inputs, NULL (the default) resolves to the collection's stored @if_missing_var property. Pass a non-NULL value to override. Ignored when design is a single survey.

Value

A survey_variance tibble (also inheriting survey_result). Columns, in order:

  • [group_cols...] — group variable columns (when active), first.

  • name — focal variable name (or its label when label_vars = TRUE).

  • variance — design-based point estimate of the finite-population variance. NaN for degenerate cells; exact 0 for constant-in-domain variables.

  • Uncertainty columns (se, var, cv, ci_low, ci_high, moe, deff) — only those requested via variance.

  • n — unweighted count of non-NA observations used.

  • n_weighted — sum of weights (only when n_weighted = TRUE).
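As a rough illustration of the point estimate reported in the variance column, the sketch below computes a weighted variance with an n/(n-1) correction in base R. The helper weighted_variance() and the toy data are hypothetical. The formula (weighted mean squared deviation scaled by n/(n-1)) and the choice to return NaN for fewer than two usable observations are assumptions made for illustration, not code taken from the package.

```r
# Hypothetical helper: weighted variance with an n/(n-1) correction.
# Assumption: n is the unweighted count of non-NA, positive-weight observations.
weighted_variance <- function(x, w) {
  keep <- !is.na(x) & !is.na(w) & w > 0
  x <- x[keep]
  w <- w[keep]
  n <- length(x)
  if (n < 2) return(NaN)                   # assumed "degenerate cell" rule
  xbar <- sum(w * x) / sum(w)              # weighted mean
  sum(w * (x - xbar)^2) / sum(w) * n / (n - 1)
}

x <- c(1, 2, 3, 4)
v_equal   <- weighted_variance(x, rep(1, 4))      # equal weights
v_unequal <- weighted_variance(x, c(1, 1, 2, 4))  # unequal weights
v_const   <- weighted_variance(c(5, 5, 5), c(1, 2, 3))
```

With equal weights the sketch reduces to the ordinary sample variance var(x), and a variable that is constant within the cell yields exactly 0.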

Details

Confidence intervals use the normal-Wald approximation on the SE of the variance estimate: ci_low = variance - z * se, ci_high = variance + z * se, where z = qnorm((1 + conf_level) / 2). The bounds are not clamped, so ci_low may be negative when the true variance is near zero and the SE is large. Users who want non-negative lower bounds can clamp at 0 post-hoc. This behaviour matches survey::svyvar().
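The interval formula above can be reproduced by hand, which also shows how a negative lower bound arises and how to clamp it afterwards. This is a minimal base-R sketch; the values of variance and se are made up for illustration and do not come from any dataset.

```r
# Illustrative (made-up) estimate and standard error
variance   <- 0.8
se         <- 0.5
conf_level <- 0.95

# Normal-Wald bounds, as described in Details
z       <- qnorm((1 + conf_level) / 2)
ci_low  <- variance - z * se
ci_high <- variance + z * se

# Optional post-hoc clamp for a non-negative lower bound
ci_low_clamped <- max(ci_low, 0)
```

Here ci_low falls below zero because z * se exceeds the estimate, so the clamped bound is 0. The same clamp can be applied to a whole result column with pmax(ci_low, 0).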

Under na_handling = "pairwise" (the default), each focal variable contributes its own per-variable complete-case count to n. Under na_handling = "listwise", every output row shares the intersection complete-case count — rows with NA in any selected variable are excluded from every variable's calculation.
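The difference between the two complete-case rules can be seen on a small toy data frame. The sketch below uses base R only; the column names age and sbp and their values are hypothetical, and the code mimics only the counting logic, not the package's internals.

```r
# Toy data with missing values in both focal variables
df <- data.frame(
  age = c(34, 51, NA, 28, 62),
  sbp = c(120, NA, 118, NA, 135)
)

# Pairwise: each variable keeps its own non-NA observations
n_pairwise <- colSums(!is.na(df))

# Listwise: only rows complete on every selected variable are kept,
# so every variable shares the same count
n_listwise <- sum(complete.cases(df))
```

Under the pairwise rule age and sbp are counted separately, whereas under the listwise rule both rows of output would report the same, smaller n.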

Examples

d <- as_survey(
  nhanes_2017,
  ids = sdmvpsu,
  weights = wtint2yr,
  strata = sdmvstra,
  nest = TRUE
)
get_variance(d, ridageyr)
#> # A tibble: 1 × 5
#>   name                      variance ci_low ci_high     n
#>   <chr>                        <dbl>  <dbl>   <dbl> <int>
#> 1 Age in years at screening     515.   497.    534.  9254

# Multiple variables
get_variance(d, c(ridageyr, bpxsy1))
#> # A tibble: 2 × 5
#>   name                                 variance ci_low ci_high     n
#>   <chr>                                   <dbl>  <dbl>   <dbl> <int>
#> 1 Age in years at screening                515.   497.    534.  9254
#> 2 Systolic: Blood pres (1st rdg) mm Hg     316.   296.    336.  6302

# With grouping
get_variance(d, ridageyr, group = riagendr)
#> # A tibble: 2 × 6
#>   riagendr name                      variance ci_low ci_high     n
#>      <dbl> <chr>                        <dbl>  <dbl>   <dbl> <int>
#> 1        1 Age in years at screening     505.   481.    530.  4557
#> 2        2 Age in years at screening     523.   502.    544.  4697