Creates a survey design object using Taylor series (linearization) for variance estimation. Supports simple random samples, stratified designs, single- and multi-stage cluster designs, and designs with finite population correction. Uses a tidy-select interface for all design variable arguments.
Usage
as_survey(
  data,
  ids = NULL,
  probs = NULL,
  weights = NULL,
  strata = NULL,
  fpc = NULL,
  nest = FALSE
)
Arguments
- data
A data.frame containing the survey responses. Must have at least one row and unique column names.
- ids
<tidy-select> Cluster (PSU) ID column(s). For single-stage: ids = psu. For multi-stage: ids = c(psu, ssu). Omit entirely for simple random sampling.
- probs
<tidy-select> Sampling probability column (a single column, values in (0, 1]). Converted to weights = 1/probs and stored internally. Cannot be used together with weights unless the values are consistent (weights == 1/probs).
- weights
<tidy-select> Sampling weight column (a single column, values strictly > 0).
- strata
<tidy-select> Stratification variable column (a single column).
- fpc
<tidy-select> Finite population correction column(s). For single-stage designs, supply one column. For multi-stage designs, supply one column per stage: fpc = c(fpc_stage1, fpc_stage2). Each column accepts either total population size (integer, all > 1) or sampling fraction (numeric, all in (0, 1]). Cannot contain NA. Cannot have more columns than ids stages; fewer is allowed (later stages assume infinite population).
- nest
Logical. If TRUE, PSU IDs are treated as nested within strata; that is, the same ID value in two different strata refers to two distinct PSUs. Set nest = TRUE when PSU IDs are not globally unique (e.g., NHANES, where PSU IDs restart from 1 in each stratum). Requires strata to be specified. Default FALSE.
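The effect of nest = TRUE can be pictured with a small base R sketch. It does not call as_survey() and is not the package's internal code; the data frame toy and the column psu_nested are hypothetical names used only for illustration. It shows why PSU IDs that restart in each stratum undercount clusters unless they are paired with the stratum.

```r
# Toy design: two strata, PSU IDs restart from 1 in each stratum
# (hypothetical data, for illustration only).
toy <- data.frame(
  stratum = c(1, 1, 1, 1, 2, 2, 2, 2),
  psu     = c(1, 1, 2, 2, 1, 1, 2, 2)
)

# Treating PSU IDs as globally unique sees only 2 clusters.
n_psu_global <- length(unique(toy$psu))

# Pairing each PSU ID with its stratum recovers the 4 distinct clusters,
# which is the interpretation that nest = TRUE asks for.
toy$psu_nested <- interaction(toy$stratum, toy$psu, drop = TRUE)
n_psu_nested <- nlevels(toy$psu_nested)

n_psu_global
n_psu_nested
```

With globally unique PSU IDs the two counts coincide, which is the case where nest = FALSE is appropriate.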
Tidy-select
All design variable arguments (ids, probs, weights, strata, fpc)
support tidy-select syntax: columns are given as bare names (ids = psu)
or combined with c() (ids = c(psu, ssu)).
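As a rough illustration of how such selections resolve, the sketch below calls tidyselect::eval_select() directly on a small data frame. It assumes the design arguments follow standard tidyselect semantics and that the tidyselect package is installed; the data frame df and its column names are hypothetical, and this is not a description of how as_survey() is implemented.

```r
library(tidyselect)

# Hypothetical design columns, for illustration only.
df <- data.frame(
  psu = 1:4, ssu = 1:4,
  fpc_stage1 = 100, fpc_stage2 = 20,
  wt = 2
)

# A bare name selects a single column.
sel_psu <- eval_select(quote(psu), df)

# c() selects several columns, in the order given.
sel_ids <- eval_select(quote(c(psu, ssu)), df)

# Selection helpers match columns by pattern.
sel_fpc <- eval_select(quote(tidyselect::starts_with("fpc")), df)

names(sel_psu)
names(sel_ids)
names(sel_fpc)
```

Each call returns a named vector of column positions, so the order of names in sel_ids reflects the stage order that would be passed to ids.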
Simple random sample
When neither ids nor strata is specified, the result is a survey_taylor
object with NULL ids and strata, i.e., a simple random sample (SRS).
The Taylor variance machinery produces the same estimates as the classical
SRS formula for the variance of a mean, (1 - f) * s^2 / n, where f = n / N
is the sampling fraction, s^2 the sample variance, and n the sample size.
If weights and probs are also both omitted, uniform weights are assigned
and a warning is issued.
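The equivalence with the classical formula can be checked by hand in base R. The sketch below uses the textbook linearization of a weighted mean under equal weights; it is an illustration under that assumption, not the package's internal code, and the values of N, n, and y are simulated for the example.

```r
set.seed(42)

N <- 1000                        # population size (assumed for the example)
n <- 50                          # sample size
y <- rnorm(n, mean = 10, sd = 2) # simulated responses
w <- rep(N / n, n)               # equal SRS weights
f <- n / N                       # sampling fraction

# Classical SRS variance of the sample mean.
v_classical <- (1 - f) * var(y) / n

# Textbook Taylor linearization of the weighted mean:
# linearized values z_i = w_i * (y_i - ybar) / sum(w),
# then the single-stage variance of their total.
ybar <- sum(w * y) / sum(w)
z <- w * (y - ybar) / sum(w)
v_linearized <- (1 - f) * n / (n - 1) * sum((z - mean(z))^2)

all.equal(v_classical, v_linearized)
```

With equal weights each z_i reduces to (y_i - ybar) / n, so the linearized sum collapses algebraically to s^2 / n and the two quantities agree up to floating-point error.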
Known limitations
as_survey() does not support probability-proportional-to-size (PPS)
variance estimation. Taylor series linearization here treats PPS designs as
sampled with replacement, which tends to overestimate variance (a conservative
error) in PPS-without-replacement designs. The Yates-Grundy and Brewer/Overton
estimators available in survey::svydesign() via its pps and variance
arguments are not supported.
If your design requires PPS-specific variance estimation, create the design
with survey::svydesign() and convert it with from_svydesign():
d_survey <- survey::svydesign(
  ids = ~psu, weights = ~wt, strata = ~stratum,
  pps = "brewer", data = mydata
)
d <- from_svydesign(d_survey)
See also
as_survey_replicate() for replicate-weight designs,
as_survey_twophase() for two-phase designs,
set_var_label() to add variable labels
Other constructors:
as_survey_nonprob(),
as_survey_replicate(),
as_survey_twophase(),
survey_data(),
survey_glm(),
survey_glm_fit(),
survey_nonprob(),
survey_replicate(),
survey_taylor(),
survey_twophase()
Examples
# Full NHANES design: stratified cluster with PSU IDs nested within strata
d <- as_survey(
  nhanes_2017,
  ids = sdmvpsu,
  weights = wtint2yr,
  strata = sdmvstra,
  nest = TRUE
)
# Stratified design without PSU cluster IDs
d_strat <- as_survey(nhanes_2017, weights = wtint2yr, strata = sdmvstra)
# Blood pressure analysis: filter to exam participants, use MEC weight
exam <- nhanes_2017[nhanes_2017$ridstatr == 2, ]
d_bp <- as_survey(exam, ids = sdmvpsu, weights = wtmec2yr,
                  strata = sdmvstra, nest = TRUE)
