Creates a two-phase (double) sampling design from an existing
survey_taylor Phase 1 object. Phase 1 covers all rows; Phase 2 is a
strict subset indicated by a logical column. Uses a tidy-select interface
for all Phase 2 design variable arguments.
Usage
as_survey_twophase(
  phase1,
  ids2 = NULL,
  strata2 = NULL,
  probs2 = NULL,
  fpc2 = NULL,
  subset,
  method = c("full", "approx", "simple")
)
Arguments
- phase1
  A survey_taylor object representing the Phase 1 design. Its @data must contain ALL rows from both phases, plus a logical indicator column for Phase 2 membership. Create with as_survey().
- ids2
  <tidy-select> Phase 2 cluster ID column(s). For single-stage Phase 2: ids2 = psu2. For multi-stage: ids2 = c(psu2, ssu2). Omit if Phase 2 has no within-stratum clustering.
- strata2
  <tidy-select> Phase 2 stratification column (a single column). Optional.
- probs2
  <tidy-select> Phase 2 inclusion probability column (a single column, values in (0, 1]). Optional.
- fpc2
  <tidy-select> Phase 2 finite population correction column (a single column). Optional.
- subset
  <tidy-select> Single logical column in phase1@data. TRUE = row selected into Phase 2; FALSE = Phase 1 only. Required. Must contain both TRUE and FALSE values (non-degenerate).
- method
  Character. Variance estimation method for combining Phase 1 and Phase 2 variability. One of "full" (default), "approx", or "simple". See Details.
Details
Variance methods
- "full": Full two-phase variance formula. Accounts for variability in both phases. Requires Phase 2 design information (probs2, ids2, strata2) when Phase 2 is not a simple random subsample. If none of these are provided, a warning is issued and Phase 2 selection is treated as SRS within Phase 1 strata.
- "approx": Approximation that ignores Phase 1 sampling variability. Faster but less accurate than "full" when the Phase 1 sampling fraction is non-negligible.
- "simple": Treats Phase 2 as a single-phase design, ignoring Phase 1. Only valid when Phase 1 is a census (no sampling). Issues a warning when Phase 1 has PSU cluster variables, because this understates variance for clustered designs.
See also
as_survey() for Taylor series designs,
as_survey_rep() for replicate-weight designs
Other constructors:
as_survey(),
as_survey_calibrated(),
as_survey_rep(),
as_survey_srs(),
survey_calibrated(),
survey_data(),
survey_replicate(),
survey_srs(),
survey_taylor(),
survey_twophase()
Examples
# Minimal two-phase design: Phase 1 = full cohort, Phase 2 = random subset
df <- data.frame(
  id = 1:20,
  wt = rep(2, 20),
  in_phase2 = c(rep(TRUE, 10), rep(FALSE, 10)),
  y = rnorm(20)
)
phase1 <- as_survey(df, ids = id, weights = wt)
d2 <- as_survey_twophase(phase1, subset = in_phase2)

# With Phase 2 stratification and inclusion probabilities
df2 <- data.frame(
  id = 1:30,
  wt = rep(3, 30),
  in_phase2 = c(rep(TRUE, 15), rep(FALSE, 15)),
  arm = rep(c("A", "B", "C"), 10),
  subsamprate = rep(c(0.5, 0.7, 0.3), 10),
  y = rnorm(30)
)
phase1b <- as_survey(df2, ids = id, weights = wt)
d2b <- as_survey_twophase(
  phase1b,
  strata2 = arm,
  probs2 = subsamprate,
  subset = in_phase2,
  method = "full"
)
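To show what the columns in the second example imply, the sketch below computes the usual two-phase weight for each Phase 2 row by hand: the Phase 1 weight divided by the conditional Phase 2 inclusion probability. It uses base R only. The column name overall_wt is hypothetical, and whether as_survey_twophase() combines weights in exactly this way is an assumption, not something this page confirms.

```r
# Rebuild the design columns from the second example (outcome y omitted)
df2 <- data.frame(
  id = 1:30,
  wt = rep(3, 30),
  in_phase2 = c(rep(TRUE, 15), rep(FALSE, 15)),
  arm = rep(c("A", "B", "C"), 10),
  subsamprate = rep(c(0.5, 0.7, 0.3), 10)
)

# Keep only the rows selected into Phase 2
phase2_rows <- df2[df2$in_phase2, ]

# Textbook two-phase weight: Phase 1 weight / Phase 2 inclusion probability
# (assumption: illustrative only, not taken from the package source)
phase2_rows$overall_wt <- phase2_rows$wt / phase2_rows$subsamprate

# One distinct weight per arm, because wt and subsamprate are constant within arm
weights_by_arm <- tapply(phase2_rows$overall_wt, phase2_rows$arm, unique)
print(weights_by_arm)
```

Arms with a lower subsampling rate get a larger combined weight, because each Phase 2 row there stands in for more Phase 1 rows.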