Creates a survey design object using replicate weights for variance estimation. Supports all common replicate methods: jackknife (JK1, JK2, JKn), balanced repeated replication (BRR, Fay), bootstrap, ACS, successive-difference, and user-defined types. Uses a tidy-select interface for weight and replicate-weight columns.
Arguments
- data
A data.frame containing the survey responses. Must have at least one row and unique column names.
- weights
<tidy-select> Sampling weight column (a single column, values strictly > 0). Required.
- repweights
<tidy-select> Replicate weight columns. Must select at least one column. Supports tidy-select helpers (e.g., starts_with("repwt")). Required.
- type
Character. Replicate weight method. One of "JK1" (delete-1 jackknife), "JK2" (delete-1 jackknife, stratified), "JKn" (delete-1 jackknife with varying replication counts), "BRR" (balanced repeated replication), "Fay" (Fay's method, a modified BRR), "bootstrap", "ACS" (used in American Community Survey), "successive-difference", or "other" (user-specified scale). Case-sensitive.
- scale
Numeric. Scaling factor applied to the replicate variance formula. If NULL (default), computed automatically from type and the number of replicates R: (R-1)/R for jackknife methods, 1/4 for BRR/Fay, 1/R for bootstrap/ACS, 2/R for successive-difference, 1 for other.
- rscales
Numeric vector of replicate-specific scaling factors, or NULL. If provided, must have the same length as the number of replicate weight columns selected by repweights.
- fpc
<tidy-select> Finite population correction column (a single column). Used by some replicate methods to adjust the variance estimator. NULL means no FPC correction.
- fpctype
Character. How fpc is interpreted: "fraction" (sampling fraction, 0–1) or "correction" (multiplier for the replicate variance). Default "fraction". Case-sensitive.
- mse
Logical. If TRUE (default), use mean-squared-error estimates (subtract the full-sample estimate rather than the mean replicate estimate when computing variance). Recommended for most designs.
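The default scale rules listed under scale can be restated as a short base R sketch. default_scale() is a hypothetical helper written only for illustration; it is not a function of this package, and it simply mirrors the documented defaults, so the package's internal logic may differ.

```r
# Hypothetical helper (not part of the package): restates the documented
# default for `scale` given the replicate method and the number of replicates.
default_scale <- function(type, n_rep) {
  switch(
    type,
    JK1 = , JK2 = , JKn = (n_rep - 1) / n_rep,  # jackknife methods
    BRR = , Fay = 1 / 4,                        # BRR and Fay
    bootstrap = , ACS = 1 / n_rep,              # bootstrap and ACS
    "successive-difference" = 2 / n_rep,
    other = 1,
    # type is documented as case-sensitive, so "jk1" lands here
    stop("unknown type: ", type)
  )
}

default_scale("JK1", 80)
default_scale("successive-difference", 80)
```

Because the match is exact, a misspelled or lower-cased type raises an error instead of silently falling back to a default.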
Tidy-select
Both weights and repweights support tidy-select syntax:
# Bare name for weights
as_survey_replicate(
  df, weights = wt, repweights = starts_with("repwt"), type = "BRR"
)
# c() for explicit replicate columns
as_survey_replicate(
  df, weights = wt, repweights = c(rep1, rep2, rep3), type = "JK1"
)
Replicate weight matrix
The replicate weight matrix is not stored in the object. Only the
column names are stored in @variables$repweights. Variance estimation
computes the matrix on demand:
as.matrix(design@data[, design@variables$repweights]).
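To make the on-demand computation concrete, the following base R sketch builds a replicate weight matrix from a toy data frame and computes a replicate variance for a weighted mean, using both centerings described under mse. The data, the column names, and the delete-1 jackknife weights are assumptions invented for illustration; the package's own estimators may be organised differently.

```r
# Toy data (assumed): 4 respondents, equal full-sample weights, and
# 4 delete-1 jackknife replicate weights (one unit dropped per replicate).
df <- data.frame(
  y    = c(10, 12, 9, 15),
  wt   = rep(1, 4),
  rep1 = c(0, 4, 4, 4) / 3,
  rep2 = c(4, 0, 4, 4) / 3,
  rep3 = c(4, 4, 0, 4) / 3,
  rep4 = c(4, 4, 4, 0) / 3
)
rep_cols <- c("rep1", "rep2", "rep3", "rep4")

# Materialise the replicate weight matrix from the stored column names
rep_mat <- as.matrix(df[, rep_cols])

# Full-sample estimate and one estimate per replicate
theta_full <- weighted.mean(df$y, df$wt)
theta_rep  <- apply(rep_mat, 2, function(w) weighted.mean(df$y, w))

# (R - 1) / R scaling, as documented for jackknife methods
n_rep    <- ncol(rep_mat)
jk_scale <- (n_rep - 1) / n_rep

# mse = TRUE: centre on the full-sample estimate
var_mse <- jk_scale * sum((theta_rep - theta_full)^2)
# mse = FALSE: centre on the mean of the replicate estimates
var_centered <- jk_scale * sum((theta_rep - mean(theta_rep))^2)
```

In this toy jackknife the two centerings agree, because the mean of the replicate means equals the full-sample mean; for nonlinear statistics or other replicate methods they can differ.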
Memory usage
Each call to an estimation function (e.g., get_means(), get_totals())
materialises the full replicate weight matrix from the data frame. For large
designs (e.g., ACS PUMS with 500k+ rows × 80 replicates), this is roughly
nrow * n_replicates * 8 bytes per call (~363 MB for ACS Wyoming × 80).
If you are estimating many variables, this is repeated for each call.
This behaviour matches the survey package reference implementation.
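The size formula above is easy to evaluate before running a long batch of estimates. The helper below is a hypothetical illustration, not a package function, and it assumes the replicate weights are stored as 8-byte doubles.

```r
# Hypothetical helper: approximate size of the materialised replicate
# weight matrix, assuming 8-byte doubles.
repweight_matrix_bytes <- function(n_rows, n_replicates, bytes_per_value = 8) {
  n_rows * n_replicates * bytes_per_value
}

bytes <- repweight_matrix_bytes(500000, 80)  # 500k rows x 80 replicates
mb    <- bytes / 1e6                         # decimal megabytes
mb
```

Multiplying this figure by the number of estimation calls gives a rough upper bound on the total allocation across a session, though not all of it is held at once.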
References
Judkins, D.R. (1990) Fay's method for variance estimation. Journal of Official Statistics 6(3), 223–239.
Canty, A.J. and Davison, A.C. (1999) Resampling-based variance estimation for labour force surveys. The Statistician 48(3), 379–391.
Shao, J. and Tu, D. (1995) The Jackknife and Bootstrap. Springer.
See also
as_survey() for Taylor series designs,
as_survey_twophase() for two-phase designs,
and set_var_label() to add variable labels.
Other constructors:
as_survey(),
as_survey_nonprob(),
as_survey_twophase(),
survey_data(),
survey_glm(),
survey_glm_fit(),
survey_nonprob(),
survey_replicate(),
survey_taylor(),
survey_twophase()
Examples
# ACS PUMS Wyoming: 80 successive-difference replicate weights
d_acs <- as_survey_replicate(
  acs_pums_wy,
  weights = pwgtp,
  repweights = pwgtp1:pwgtp80,
  type = "successive-difference"
)
# Explicit replicate columns using c()
d_sub <- as_survey_replicate(
  acs_pums_wy,
  weights = pwgtp,
  repweights = c(pwgtp1, pwgtp2, pwgtp3, pwgtp4),
  type = "JK1"
)
