Fits a generalized linear model (GLM) to survey data, producing design-based coefficient estimates and a variance-covariance matrix via the Binder (1983) sandwich estimator. All five surveycore design classes are supported.
Arguments
- design
A survey design object created by as_survey(), as_survey_replicate(), as_survey_twophase(), or as_survey_nonprob().
- formula
A model formula in standard R notation (e.g. y ~ x1 + x2). Mutually exclusive with response/predictors. If NULL and response is also NULL, errors with surveycore_error_formula_missing.
- response
Character string naming the outcome variable. Programmatic alternative to formula. Mutually exclusive with formula. Use with predictors to build a model formula via reformulate(predictors, response). Suitable for lapply() iteration.
- predictors
Character vector of predictor variable names. Used with response to build the model formula. If response is supplied and predictors is NULL, an intercept-only model is fitted.
- family
A GLM family object specifying the error distribution and link function. Default gaussian(). Any family accepted by stats::glm() is supported. For binomial() and quasibinomial() families, the "non-integer #successes" warning is suppressed because survey weights are non-integer by design.
- na.action
How to handle NA values in the model frame. Default na.omit (silently drops rows with any NA in model variables). na.fail errors with surveycore_error_na_in_data listing the offending columns and NA counts. Note: na.action applies only to model frame variables; survey weights are validated separately.
- start
Starting values for the coefficient vector.
- etastart
Starting values for the linear predictor.
- mustart
Starting values for the mean.
- control
A list of GLM control parameters passed to stats::glm.control().
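The response/predictors interface described above can be sketched in base R. The helper name build_formula is hypothetical, and the handling of a NULL predictors argument is an assumption about how an intercept-only model could be produced; only stats::reformulate() comes from the argument descriptions.

```r
# Hypothetical helper mirroring the documented response/predictors behaviour.
build_formula <- function(response, predictors = NULL) {
  # Assumption: a NULL predictors vector maps to an intercept-only model,
  # because reformulate() itself needs at least one term label.
  terms <- if (is.null(predictors)) "1" else predictors
  stats::reformulate(terms, response = response)
}

f_full <- build_formula("age", c("educ", "sex"))  # age ~ educ + sex
f_null <- build_formula("age")                    # age ~ 1
```

Because both inputs are plain character vectors, the same call can be repeated over a vector of outcome names without constructing formulas by hand.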
Value
A survey_glm_fit S7 object.
Details
Variance estimation: Uses the Binder (1983) sandwich estimator, which
decomposes into per-observation score vectors passed to the Phase 0
variance machinery. The bread (X'W̃X)⁻¹ accounts for IRLS working
weights and is correct for all GLM families including binomial and
Poisson.
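The decomposition into bread and per-observation scores can be illustrated with base R alone. The following is a minimal sketch on simulated data, not surveycore's implementation: the variable names, the stratified cluster layout, and the with-replacement PSU variance formula (no finite population correction) are all assumptions made for illustration.

```r
# Simulated stratified cluster sample (all names and data are illustrative).
set.seed(42)
n       <- 400
stratum <- rep(1:4, each = 100)
psu     <- rep(1:40, each = 10)          # 10 PSUs nested in each stratum
w       <- runif(n, 0.5, 3)              # stand-in for sampling weights
x       <- rnorm(n)
y       <- rpois(n, exp(0.2 + 0.3 * x))

# Weighted point estimates from ordinary IRLS.
fit <- glm(y ~ x, family = poisson(), weights = w)
X   <- model.matrix(fit)

# Bread: inverse of X' W~ X, where W~ holds the IRLS working weights
# (fit$weights already includes the sampling weights).
bread <- solve(crossprod(X, fit$weights * X))

# Per-observation score vectors, then linearized (influence) values.
scores <- X * (fit$weights * residuals(fit, type = "working"))
infl   <- scores %*% bread

# Meat: between-PSU variability of influence totals within each stratum.
psu_tot <- rowsum(infl, psu)
psu_str <- stratum[match(rownames(psu_tot), psu)]
V <- matrix(0, ncol(X), ncol(X), dimnames = list(colnames(X), colnames(X)))
for (h in unique(psu_str)) {
  z   <- psu_tot[psu_str == h, , drop = FALSE]
  n_h <- nrow(z)
  zc  <- sweep(z, 2, colMeans(z))        # centre PSU totals within stratum
  V   <- V + n_h / (n_h - 1) * crossprod(zc)
}
V                                        # design-based variance-covariance
```

Since the working weights come from the fitted family and link, the same steps apply unchanged to other families; only the glm() call differs.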
binomial() family: Wraps the stats::glm() call in
suppressWarnings() to suppress the "non-integer #successes" warning
that fires for every survey-weighted binomial model.
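The warning in question can be reproduced with stats::glm() directly. This is a small sketch on simulated data (the variables are illustrative and the weights only stand in for survey weights); it shows the warning firing on an unwrapped call and the same fit completing silently inside suppressWarnings().

```r
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 * x))
w <- runif(n, 0.5, 3)   # non-integer weights, standing in for survey weights

# Unwrapped call: capture the first warning that stats::glm() emits.
msg <- tryCatch(
  {
    glm(y ~ x, family = binomial(), weights = w)
    NA_character_         # reached only if no warning was raised
  },
  warning = function(cnd) conditionMessage(cnd)
)

# Wrapped call: the same model, fitted with warnings silenced.
fit <- suppressWarnings(glm(y ~ x, family = binomial(), weights = w))
coef(fit)
```

Note that suppressWarnings() silences every warning raised inside the call, not only this one.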
Domain estimation: Use surveytidy::filter() before calling
survey_glm(). The GLM is fit on in-domain rows only; variance
estimation uses the full design for correct design-based SEs.
Multinomial response: cbind() on the LHS of formula is not
supported. Multinomial logistic regression is deferred to a later phase.
Examples
d <- as_survey(gss_2024, ids = vpsu, weights = wtssps, strata = vstrat,
               nest = TRUE)
# Linear model: respondent age predicted by education and sex
fit <- survey_glm(d, age ~ educ + sex)
fit@coefficients
#> (Intercept)        educ         sex
#>  41.7912598   0.4033772   0.3367356
fit@vcov
#>             (Intercept)         educ          sex
#> (Intercept)   7.5695740 -0.363231367 -1.401719626
#> educ         -0.3632314  0.025187314  0.002071918
#> sex          -1.4017196  0.002071918  0.895403410
# Programmatic interface — suitable for lapply()
results <- lapply(c("age", "educ"), function(v) {
  survey_glm(d, response = v, predictors = "sex")
})
