Iterative proportional fitting (raking) that adjusts survey weights to
match multiple marginal population totals simultaneously. Supports two
algorithms: the "anesrake" method (chi-square variable selection,
improvement-based convergence) and the "survey" method (fixed-order
IPF, epsilon-based convergence).
Arguments
- data
A data.frame, weighted_df, survey_taylor, or survey_nonprob. survey_replicate → error. Any other class → error.
- margins
Named list or data frame specifying population margin targets.
Format A (named list):
    list(
      age_group = c("18-34" = 0.28, "35-54" = 0.37, "55+" = 0.35),
      sex = c("M" = 0.49, "F" = 0.51)
    )
Each element can be a named numeric vector or a data frame with columns level and target (formats can be mixed within the list).
Format B (long data frame with columns variable, level, target):
    data.frame(
      variable = c("age_group", "age_group", "sex", "sex"),
      level = c("18-34", "35-54", "M", "F"),
      target = c(0.40, 0.60, 0.49, 0.51)
    )
Format B is auto-detected and converted to Format A before use. The converted Format A is stored in the weighting history.
- weights
<tidy-select> Weight column name (bare name). NULL → auto-detected from weighted_df attribute or survey object @variables$weights. For plain data.frame with weights = NULL, uniform starting weights are used and the output column is named by wt_name (default "wts").
- wt_name
Character scalar. Name of the output weight column in the returned weighted_df. Default "wts". Ignored when data is a survey object (survey_taylor or survey_nonprob).
- type
Character scalar.
  - "prop" (default): margins values are proportions.
  - "count": margins values are counts.
- method
Character scalar.
  - "anesrake" (default): chi-square discrepancy variable selection with improvement-based convergence, as in the anesrake package.
  - "survey": fixed-order IPF cycling through all margins, with epsilon-based convergence, as in survey::rake().
- cap
Numeric or NULL. Cap on the weight ratio w / mean(w). Any weight exceeding cap × mean(w) is set to cap × mean(w). Applied after each per-margin adjustment step (not post-hoc). NULL (default) means no cap. Applies to both methods.
- control
Named list of algorithm parameters. Merged with method-specific defaults — omitted keys retain their defaults.
method = "anesrake" defaults:
  - maxit = 1000: maximum full sweeps
  - improvement = 0.01: percentage improvement convergence threshold
  - pval = 0.05: chi-square p-value threshold for variable selection
  - min_cell_n = 0L: minimum unweighted observations per cell (0 = no min)
  - variable_select = "total": chi-square aggregation for ranking ("total", "max", or "average")
method = "survey" defaults:
  - maxit = 100: maximum full sweeps
  - epsilon = 1e-7: maximum relative margin error convergence threshold
Passing anesrake-specific keys when method = "survey" (or vice versa) triggers a surveywts_warning_control_param_ignored warning per ignored parameter.
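The margins conversion and the control merging described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the helper names margins_long_to_list() and merge_control() are hypothetical, and the sketch assumes Format B rows are grouped by variable in order of first appearance.

```r
# Hypothetical helper (not part of the package): convert a long Format B
# data frame into a Format A named list of named numeric vectors.
margins_long_to_list <- function(margins_df) {
  stopifnot(all(c("variable", "level", "target") %in% names(margins_df)))
  # Keep variables in the order in which they first appear.
  vars <- unique(as.character(margins_df$variable))
  out <- lapply(vars, function(v) {
    rows <- margins_df[margins_df$variable == v, , drop = FALSE]
    stats::setNames(rows$target, as.character(rows$level))
  })
  stats::setNames(out, vars)
}

# Hypothetical helper: merge user-supplied control values into the
# method-specific defaults, so omitted keys retain their defaults.
merge_control <- function(control, defaults) {
  utils::modifyList(defaults, control)
}

long <- data.frame(
  variable = c("age_group", "age_group", "sex", "sex"),
  level    = c("18-34", "35-54", "M", "F"),
  target   = c(0.40, 0.60, 0.49, 0.51)
)
margins_list <- margins_long_to_list(long)

# Defaults copied from the method = "survey" entry above.
survey_defaults <- list(maxit = 100, epsilon = 1e-7)
ctrl <- merge_control(list(maxit = 50), survey_defaults)
```

Here margins_list has one element per variable, and ctrl keeps epsilon at its default while maxit is overridden.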
Value
- data.frame or weighted_df input → weighted_df
- survey_taylor or survey_nonprob input → same class as input (survey_taylor or survey_nonprob; class is preserved)
The weight column in the output contains raked weights. A history entry
with operation = "raking" is appended to weighting_history.
Details
method = "anesrake": At each sweep, variables are sorted by their
chi-square discrepancy (controlled by control$variable_select). Variables
with any cell below control$min_cell_n unweighted observations are
excluded entirely. Variables whose chi-square p-value exceeds
control$pval are skipped in that sweep. Convergence is reached when the
percentage improvement in total chi-square between consecutive sweeps
falls below control$improvement. If every variable is either skipped
(p-value above control$pval) or excluded in sweep 1, a
surveywts_message_already_calibrated message is emitted.
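The selection step can be pictured with a small base R sketch. The exact statistic the package uses is not given here, so the following rests on assumptions: a Pearson chi-square on weighted proportions scaled to the unweighted sample size, degrees of freedom equal to the number of levels minus one, and "total", "max", and "average" read as the sum, maximum, and mean of the per-cell contributions. The function names chisq_discrepancy() and select_variables() are hypothetical.

```r
# Sketch only: Pearson chi-square discrepancy of one variable against its
# target proportions. Scaling by the unweighted n is an assumption.
chisq_discrepancy <- function(x, w, target) {
  n <- length(x)
  # Weighted share of each target level.
  obs_prop <- vapply(names(target), function(lv) sum(w[x == lv]) / sum(w),
                     numeric(1))
  observed <- n * obs_prop
  expected <- n * target / sum(target)
  cells <- (observed - expected)^2 / expected
  stat <- sum(cells)
  list(
    cells = cells,
    statistic = stat,
    p_value = stats::pchisq(stat, df = length(target) - 1, lower.tail = FALSE)
  )
}

# Hypothetical ranking step: skip variables whose p-value exceeds `pval`,
# then order the rest by aggregated discrepancy, largest first.
select_variables <- function(data, w, margins, pval = 0.05,
                             variable_select = "total") {
  agg <- switch(variable_select, total = sum, max = max, average = mean)
  res <- lapply(names(margins), function(v) {
    chisq_discrepancy(data[[v]], w, margins[[v]])
  })
  names(res) <- names(margins)
  score <- vapply(res, function(r) agg(r$cells), numeric(1))
  p <- vapply(res, function(r) r$p_value, numeric(1))
  keep <- names(score)[p <= pval]
  keep[order(score[keep], decreasing = TRUE)]
}

toy <- data.frame(
  sex = rep(c("M", "F"), times = c(70, 30)),
  region = rep(c("N", "S"), times = 50),
  stringsAsFactors = FALSE
)
targets <- list(sex = c(M = 0.49, F = 0.51), region = c(N = 0.5, S = 0.5))
selected <- select_variables(toy, rep(1, nrow(toy)), targets)
```

In this toy data, region already matches its target and is skipped, while sex is far from its target and is selected for raking.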
method = "survey": Variables are raked in the fixed order given by
margins. All variables participate in every sweep. Convergence is
reached when the maximum relative error across all margin cells falls
below control$epsilon.
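A fixed-order sweep of this kind can be sketched in base R. This is an illustration under assumptions, not the package's implementation or survey::rake(): ipf_sketch() is a hypothetical name, margins are taken as Format A proportions, relative error is read as |weighted share - target| / target, and the optional cap is applied after each per-margin step as the cap argument describes.

```r
# Minimal fixed-order IPF sketch (hypothetical helper, base R only).
# `margins` is a named list of named proportion vectors (Format A).
ipf_sketch <- function(data, margins, w = rep(1, nrow(data)),
                       cap = NULL, maxit = 100, epsilon = 1e-7) {
  # Largest relative error over all margin cells (assumed definition).
  max_rel_err <- function(w) {
    max(vapply(names(margins), function(v) {
      target <- margins[[v]]
      cur <- vapply(names(target), function(lv) sum(w[data[[v]] == lv]),
                    numeric(1)) / sum(w)
      max(abs(cur - target) / target)
    }, numeric(1)))
  }
  converged <- FALSE
  sweeps <- 0
  for (i in seq_len(maxit)) {
    sweeps <- i
    for (v in names(margins)) {  # fixed order given by `margins`
      target <- margins[[v]]
      total <- sum(w)
      for (lv in names(target)) {
        idx <- data[[v]] == lv
        # Scale the cell so its weighted share equals the target share.
        w[idx] <- w[idx] * (target[[lv]] * total) / sum(w[idx])
      }
      if (!is.null(cap)) {
        # Trim weights above cap * mean(w) after this margin's adjustment.
        w <- pmin(w, cap * mean(w))
      }
    }
    if (max_rel_err(w) < epsilon) {
      converged <- TRUE
      break
    }
  }
  list(weights = w, sweeps = sweeps, converged = converged)
}

toy <- data.frame(
  sex = c("M", "M", "M", "F", "F", "F", "F", "F"),
  age = c("young", "old", "young", "old", "young", "old", "young", "young"),
  stringsAsFactors = FALSE
)
fit <- ipf_sketch(
  toy,
  list(sex = c(M = 0.5, F = 0.5), age = c(young = 0.4, old = 0.6))
)
```

With no cap, each per-margin step preserves the weight total, so the raked weights in fit$weights still sum to the number of rows. With a cap, trimming can keep the margins from matching exactly, which is why the sketch also reports whether convergence was reached.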
See also
Other calibration: calibrate(), poststratify()
