
Creates a survey design object using Taylor series (linearization) for variance estimation. Supports simple random samples, stratified designs, single- and multi-stage cluster designs, and designs with finite population correction. Uses a tidy-select interface for all design variable arguments.

Usage

as_survey(
  data,
  ids = NULL,
  probs = NULL,
  weights = NULL,
  strata = NULL,
  fpc = NULL,
  nest = FALSE
)

Arguments

data

A data.frame containing the survey responses. Must have at least one row and unique column names.

ids

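<tidy-select> Cluster (PSU) ID column(s). For a single-stage design: ids = psu. For a multi-stage design, list one column per stage, outermost stage first: ids = c(psu, ssu). Omit entirely when there is no clustering (simple random or stratified element sampling).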

probs

<tidy-select> Sampling probability column (a single column, values in (0, 1]). Converted to weights = 1/probs and stored internally. Cannot be used together with weights unless the values are consistent (weights == 1/probs).
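The conversion is plain reciprocal arithmetic. The base-R sketch below uses hypothetical `prob` and `wt_supplied` columns to show what the stored weights look like and one way to express the consistency check. The tolerance as_survey() actually uses when comparing weights with 1/probs is not documented here, so the use of `all.equal()` is an assumption.

```r
# Hypothetical inclusion probabilities for four sampled units
df <- data.frame(prob = c(0.10, 0.25, 0.50, 1.00))

# probs = p is converted to weights = 1 / p:
# a unit selected with probability 0.25 represents 4 population units
df$wt <- 1 / df$prob

# Hypothetical weights column supplied alongside probs
df$wt_supplied <- c(10, 4, 2, 1)

# The two are only accepted together if weights == 1 / probs
# (all.equal() tolerance is an assumption, not as_survey()'s documented rule)
consistent <- isTRUE(all.equal(df$wt_supplied, 1 / df$prob))
```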

weights

<tidy-select> Sampling weight column (a single column, values strictly > 0).

strata

<tidy-select> Stratification variable column (a single column).

fpc

<tidy-select> Finite population correction column(s). For single-stage designs, supply one column. For multi-stage designs, supply one column per stage, in the same order as ids: fpc = c(fpc_stage1, fpc_stage2). Each column holds either population sizes (integer, all > 1) or sampling fractions (numeric, all in (0, 1]). Cannot contain NA. Cannot have more columns than ids has stages; fewer columns are allowed, and stages without an fpc column are treated as sampled from an infinite population.
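The two accepted encodings carry the same information once the number of sampled units is known. The sketch below uses a hypothetical helper, `fpc_factor()`, which is not part of as_survey(); it shows how a population size N and a sampling fraction f map to the same correction factor (1 - f) under the standard definition f = n/N, where n is the number of units (PSUs at stage one) sampled in the stratum.

```r
# Hypothetical helper (not part of as_survey()): turn fpc values into the
# correction factor (1 - f), given n = number of units sampled at that stage
fpc_factor <- function(fpc, n) {
  if (all(fpc > 1)) {
    f <- n / fpc           # fpc given as population size N, so f = n / N
  } else if (all(fpc > 0 & fpc <= 1)) {
    f <- fpc               # fpc already given as the sampling fraction f
  } else {
    stop("fpc must be all population sizes (> 1) or all fractions in (0, 1]")
  }
  1 - f
}

# 20 PSUs sampled from a stratum of 200: both encodings give 1 - 0.1 = 0.9
fpc_factor(200, n = 20)
fpc_factor(0.1, n = 20)
```

Mixing the two encodings in one column is rejected, mirroring the rule that each column must be entirely sizes or entirely fractions.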

nest

Logical. If TRUE, PSU IDs are treated as nested within strata: the same ID value in two different strata refers to two distinct PSUs. Set nest = TRUE when PSU IDs are not globally unique (e.g., NHANES, where PSU IDs restart from 1 in each stratum). Requires strata to be specified. Default FALSE.
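To see why nesting matters, the base-R sketch below builds a small hypothetical NHANES-style layout in which PSU IDs restart at 1 in each stratum. Treating PSU IDs as nested is conceptually the same as relabelling each (stratum, PSU) pair with a globally unique ID, which is what `interaction()` does here; whether as_survey() relabels internally in exactly this way is an assumption.

```r
# Hypothetical layout: 2 strata, PSU IDs restart at 1 within each stratum
toy <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 1, 1, 2, 2)
)

# Read as globally unique, psu == 1 in stratum 1 and in stratum 2
# collapse into a single PSU: only 2 distinct PSUs
n_psu_global <- length(unique(toy$psu))

# Nested reading: each (stratum, psu) pair is its own PSU, giving 4 PSUs
toy$psu_nested <- interaction(toy$stratum, toy$psu, drop = TRUE)
n_psu_nested <- nlevels(toy$psu_nested)
```

Under this reading, passing a globally unique ID column such as `psu_nested` with nest = FALSE should describe the same design as ids = psu with nest = TRUE.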

Value

A survey_taylor object.

Tidy-select

All design variable arguments (ids, probs, weights, strata, fpc) support tidy-select syntax:

# Bare name
as_survey(df, weights = wt)
# c() for multi-stage ids
as_survey(df, ids = c(psu, ssu), weights = wt)
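# tidy-select helpers also work, e.g. starts_with(); columns are selected in
# data order, so per-stage fpc columns (hypothetical names fpc_stage1,
# fpc_stage2) must appear in stage order
as_survey(df, ids = c(psu, ssu), weights = wt, fpc = starts_with("fpc_"))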

Simple random sample

When neither ids nor strata is specified, the result is a survey_taylor object with NULL ids and strata, i.e., a simple random sample (SRS). For a mean, the Taylor variance reduces to the classical SRS formula (1 - f) * s^2 / n, where s^2 is the sample variance, n the sample size, and f = n/N the sampling fraction (f = 0 when no fpc is supplied). If weights and probs are also both omitted, uniform weights are assigned and a warning is issued.
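The worked numbers below, in base R with a hypothetical sample of n = 5 from N = 50, compute the classical formula directly and then by the linearization route: form influence values for the mean, z_i = w_i * (y_i - ybar) / sum(w), and apply the single-stage with-replacement variance n/(n - 1) * sum((z_i - zbar)^2), scaled by (1 - f). This is the textbook computation, not a transcription of as_survey()'s internals; with equal weights the two routes agree.

```r
# Hypothetical SRS: n = 5 units drawn from a population of N = 50
y <- c(4, 7, 5, 9, 5)
n <- length(y)
N <- 50
f <- n / N                             # sampling fraction, 0.1

# Classical SRS formula; var() uses the n - 1 denominator, so it is s^2
var_srs <- (1 - f) * var(y) / n        # 0.9 * 4 / 5 = 0.72

# Linearization route with equal weights w_i = N / n
w <- rep(N / n, n)
z <- w * (y - mean(y)) / sum(w)        # influence values for the mean
var_lin <- (1 - f) * n / (n - 1) * sum((z - mean(z))^2)

se_mean <- sqrt(var_srs)               # standard error of the mean
```

Dropping the (1 - f) factor in both lines gives the no-fpc case, s^2 / n.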

Known limitations

as_survey() does not support probability-proportional-to-size (PPS) variance estimation. Taylor series linearization treats all designs as with-replacement, which overestimates (is conservative for) variance in PPS-without-replacement designs. The Yates-Grundy and Brewer/Overton estimators available in survey::svydesign() via its pps and variance arguments are not supported.

If your design requires PPS-specific variance estimation, create the design with survey::svydesign() and convert it with from_svydesign():

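# For PPS without replacement, survey::svydesign() expects selection
# probabilities via fpc rather than an overall weights argument;
# prob_psu is a hypothetical column of PSU inclusion probabilities
d_survey <- survey::svydesign(
  ids = ~psu, strata = ~stratum, fpc = ~prob_psu,
  pps = "brewer", data = mydata
)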
d <- from_svydesign(d_survey)

Examples

# Full NHANES design: stratified cluster with PSU IDs nested within strata
d <- as_survey(
  nhanes_2017,
  ids     = sdmvpsu,
  weights = wtint2yr,
  strata  = sdmvstra,
  nest    = TRUE
)

# Stratified design without PSU cluster IDs
d_strat <- as_survey(nhanes_2017, weights = wtint2yr, strata = sdmvstra)

# Blood pressure analysis: filter to exam participants, use MEC weight
exam <- nhanes_2017[nhanes_2017$ridstatr == 2, ]
d_bp <- as_survey(exam, ids = sdmvpsu, weights = wtmec2yr,
                  strata = sdmvstra, nest = TRUE)