surveycore 0.8.2
CRAN preparation
- Resubmission addressing CRAN feedback on the 0.8.1 submission. Tagged numerical oracle and integration test files (comparisons against survey, marginaleffects integration, polychoric/polyserial MLE, vendored saddlepoint parity, and two-phase variance parity) with skip_on_cran(). Test runtime under R CMD check --as-cran drops from ~11 minutes to under 1 minute. The skipped tests continue to run on every push in CI and locally with devtools::test().
- Single-quoted 'surveyverse' in Description to match the convention used for other proper nouns ('S7', 'tidyselect', 'haven') and silence the spell-checker NOTE.
surveycore 0.8.1
CRAN preparation
- Added Thomas Lumley to Authors@R as [ctb, cph] for the variance estimation code vendored from the survey package (R/variance-taylor.R, R/variance-replicate.R, R/variance-twophase.R, R/variance-vendored-saddlepoint.R). Vendoring is documented in VENDORED.md.
- Reworded the closing sentence of the package Description for grammatical completeness (“Automatically preserves…” instead of “Automatic preservation of…”).
- Bumped inst/CITATION to track the upcoming release version.
- Removed the \url{} wrapper around electionstudies.org in the anes_2024 data documentation. The URL is preserved as plain prose; the ANES homepage returns a 403 to automated requests, which previously triggered a urlchecker::url_check() failure under R CMD check --as-cran.
surveycore 0.8.0
Breaking changes
- Constructing a survey_collection from member surveys with divergent @groups now errors with surveycore_error_collection_group_divergent. Previously, a mixed-grouping collection would dispatch analysis functions per-survey and stitch a patchwork of grouped and ungrouped rows together with bind_rows(), violating the pseudo-data.frame mental model. Either all members must share @groups, or the caller must supply group = explicitly.
- as_survey_collection()’s .on_missing argument has been replaced by .if_missing_var, and the previously silent no-op behaviour is fixed. .if_missing_var is now stored on the returned collection’s @if_missing_var property and is honoured (rather than ignored) by every dispatched get_*(). Callers using the old name will see R’s positional-argument-mismatch error.
- The .on_missing named-only argument on every collection-dispatching get_*() (get_means(), get_totals(), get_freqs(), get_ratios(), get_diffs(), get_corr(), get_variance(), get_quantiles(), get_covariance(), get_t_test(), get_pairwise()) has been renamed to .if_missing_var. The default flips from "error" to NULL; NULL resolves to the collection’s stored @if_missing_var property, while a non-NULL value overrides it for that call. The .id argument similarly defaults to NULL and resolves to the collection’s stored @id. Callers passing .on_missing = ... will silently have the value flow into ... (no behaviour change at the analysis layer); update to .if_missing_var = ... to restore intent.
New features
survey_collection per-call dispatch defaults
- survey_collection gains two new properties:
  - @id (character(1), default ".survey"): the column name .dispatch_over_collection() uses when an analysis function is dispatched across the collection without an explicit per-call .id. Validated via the new shared helper; the existing surveycore_error_collection_invalid_id class fires on bad input.
  - @if_missing_var (character(1), default "error", must be one of c("error", "skip")): controls how dispatched get_*() calls behave when a member survey is missing a requested variable. Validated via the new helper; raises the new surveycore_error_collection_invalid_if_missing_var error class on bad input.
- New exported setters set_collection_id(x, id) and set_collection_if_missing_var(x, if_missing_var) mutate the corresponding property and return the collection invisibly. Both validate via the same shared helpers; both raise surveycore_error_not_survey_collection on non-collection input.
- add_survey() and remove_survey() now propagate the source collection’s @id and @if_missing_var onto the returned collection.
- print(survey_collection) renders id: and if_missing_var: lines on every print, regardless of whether they hold the default values.
- .dispatch_over_collection() resolves both .id and .if_missing_var via two-tier precedence: a non-NULL value at the analysis-function call site beats the value stored on the collection’s property. The surveycore_error_collection_id_collision hint additionally surfaces set_collection_id() as a fix path when the collision was triggered by the stored @id.
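The two-tier precedence described above can be sketched in base R. This is a minimal illustration only: resolve_arg() and the stored list are hypothetical stand-ins, not surveycore's internal code, and no surveycore function is called.

```r
# Hypothetical helper (not part of surveycore): a non-NULL value supplied
# at the call site wins; NULL falls back to the value stored on the object.
resolve_arg <- function(call_value, stored_value) {
  if (!is.null(call_value)) call_value else stored_value
}

# Stand-in for the two properties stored on a collection.
stored <- list(id = ".survey", if_missing_var = "error")

# Per-call argument left at its NULL default: the stored value is used.
id_default <- resolve_arg(NULL, stored$id)

# Per-call argument supplied: it overrides the stored value for this call.
id_override <- resolve_arg("wave", stored$id)
missing_override <- resolve_arg("skip", stored$if_missing_var)
```

Defaulting the per-call argument to NULL is what lets a caller distinguish "not supplied" from "explicitly set to the same value as the default".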
Uniform grouping on survey_collection
- survey_collection gains a @groups property (character(0) by default). Every member survey’s @groups is asserted identical() to the collection’s value by the class validator: a uniform-grouping invariant that guarantees dispatched get_*() results share a single grouping structure.
- as_survey_collection() gains a group = argument that accepts tidy-select column names (bare, c(), all_of()). Missing or empty-resolved group = (including NULL, character(0), c(), all_of(character(0))) adopts the members’ uniform @groups or errors on divergence; a supplied non-empty group = overrides any pre-existing member @groups and emits a typed surveycore_warning_collection_group_overridden per divergent member.
- add_survey() and remove_survey() now preserve coll@groups across mutation: a grouped collection propagates its @groups onto any empty-grouped new member and errors on divergent-grouped members (surveycore_error_collection_group_conflict); removal keeps the collection-level grouping.
Polychoric and polyserial correlation via get_corr(method = ...)
- get_corr() gains a method = "pearson" argument. Setting method = "polychoric" fits a weighted two-step MLE for the correlation between two ordinal variables under a bivariate-normal latent model (Olsson 1979; Mannan 2025); method = "polyserial" fits the analogous MLE for one ordinal + one continuous variable (Cox 1974). Auto-detection of the ordinal / continuous side is handled internally; no new user-facing argument is required. Confidence intervals are constructed on the Fisher-z scale and back-transformed to [-1, 1]. Variance is design-based: Taylor linearization via a perturbation-based influence function on survey_taylor, and a full per-replicate re-fit of both thresholds and rho on survey_replicate. For method != "pearson", df = NA_integer_ and statistic is the z-scale Wald statistic referred to a standard normal distribution. meta(result)$bivariate_normal_cdf is "pbivnorm", and meta(result)$n_failed_replicates_total carries the total count of non-converged replicates when the replicate path observed any. Agreement with polycor::polychor() / polycor::polyserial() on equal-weight fixtures is within 1e-4.
- New package Import: pbivnorm (>= 0.6.0), used as the bivariate-normal CDF for the polychoric / polyserial likelihood.
- Fourteen new typed error / warning classes (PC-1 through PC-14) surface ordinal-type, optimizer, sparse-cell, boundary, and replicate-convergence conditions; see plans/error-messages.md for the full list.
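As a rough illustration of a Fisher-z interval and a z-scale Wald statistic, the following base-R sketch uses made-up numbers rather than get_corr() output. The delta-method conversion of the standard error to the z scale is an assumption made for this example; the package may obtain its z-scale standard error differently.

```r
# Illustrative values only; not output from get_corr().
rho    <- 0.45   # correlation estimate
se_rho <- 0.06   # standard error on the correlation scale

z    <- atanh(rho)             # Fisher z-transform of the estimate
se_z <- se_rho / (1 - rho^2)   # delta-method SE on the z scale (assumption)
crit <- qnorm(0.975)           # standard-normal critical value for a 95% CI

# Build the interval on the z scale, then back-transform; tanh() keeps
# both limits strictly inside (-1, 1).
ci <- tanh(z + c(-1, 1) * crit * se_z)

# z-scale Wald statistic referred to a standard normal distribution.
wald_z  <- z / se_z
p_value <- 2 * pnorm(-abs(wald_z))
```

Working on the z scale is what guarantees the back-transformed limits cannot fall outside the valid range of a correlation, which a symmetric interval on the raw scale does not.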
New functions
- get_variance() computes design-based finite-population variance estimates for one or more numeric variables in a survey design, matching survey::svyvar() at tolerance 1e-10 on point estimates and 1e-8 on SEs. Returns a survey_variance tibble with point estimate, SE, CI, CV, MOE, design effect (deff), and cell sizes. Supports grouping (via group = and group_by()), per-variable na_handling = "pairwise" (default) or "listwise", name_style = "broom" renaming, and column-level label attributes for downstream gt integration. Dispatches over survey_taylor, survey_replicate, survey_twophase, survey_nonprob, and survey_collection designs.
- get_covariance() computes design-based finite-population covariance estimates for all unordered pairs drawn from one or more numeric variables in a survey design, matching the off-diagonal entries of survey::svyvar() at tolerance 1e-10 on point estimates and 1e-8 on SEs. Returns a survey_covariance tibble with covariance, SE, CI, CV, MOE, design effect (deff), and pairwise cell sizes. Pearson-only, pairwise-complete NA handling. Supports grouping (via group = and group_by()), redundant = TRUE to include both (x, y) and (y, x) orderings, diagonal = TRUE to include (x, x) self-pairs (which equal get_variance(x) exactly at 1e-10), name_style = "broom" renaming, and column-level label attributes for downstream gt integration. Dispatches over survey_taylor, survey_replicate, survey_twophase, survey_nonprob, and survey_collection designs.
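To make the pair-enumeration options concrete, here is a small base-R sketch of which variable pairs each setting would cover. The variable names are hypothetical, the row order is arbitrary, and this does not reproduce how get_covariance() builds or orders its output.

```r
vars <- c("x", "y", "z")   # hypothetical numeric variables

# Default: every unordered pair once, choose(3, 2) = 3 rows.
unordered_pairs <- t(combn(vars, 2))

# Analogue of redundant = TRUE: add the reversed ordering of each pair.
both_orderings <- rbind(unordered_pairs, unordered_pairs[, 2:1])

# Analogue of diagonal = TRUE: the (x, x) self-pairs, one per variable.
self_pairs <- cbind(vars, vars)
```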
New warning classes
- surveycore_warning_variance_all_na: fired when every row of the active domain is NA on the focal variable.
- surveycore_warning_variance_insufficient_n: fired when the focal variable has fewer than two non-NA observations in the active domain (variance is undefined).
- surveycore_warning_covariance_all_na: fired when every row of the active domain is NA on at least one variable in the pair.
- surveycore_warning_covariance_insufficient_n: fired when a pair has fewer than two pairwise-complete observations in the active domain (covariance is undefined).
- surveycore_warning_covariance_non_numeric: fired when one or more variables passed via x are non-numeric and silently dropped from the pair list.
surveycore 0.7.1
Documentation
- Trimmed the Getting Started vignette to remove dependencies on the sibling surveytidy package, which is not yet on CRAN. The correlation and ratio examples now clean data via dplyr::filter() on the underlying data frame before constructing the survey object. The standalone “Using surveytidy” section has been removed; those workflows are documented in the surveytidy package itself.
surveycore 0.7.0
Breaking changes
- get_anova()’s first argument is now object and dispatches on class. The former model2 positional argument has been removed: get_anova(fit1, fit2) must now be written get_anova(list(fit1, fit2)). The S3 anova(fit1, fit2) interface is unchanged.
New functions
Design-based group comparisons
- get_t_test() performs a design-based two-sample t-test comparing group means for a numeric outcome across two levels of a by variable. Returns a survey_t_test tibble with estimate, per-group means and cell sizes, CI, t-statistic, df, p-value, and significance stars. Supports optional stratification via group (one row per stratum) and matches survey::svyttest() at tolerance 1e-10 for point estimates and test statistics.
- get_pairwise() computes all k(k−1)/2 pairwise t-tests across the levels of a factor, with multiple-comparison p-value adjustment via any stats::p.adjust() method ("holm" by default, or "none"). Adjustment is applied separately within each group stratum when stratified. Returns a survey_pairwise tibble with one row per pair.
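The pair count and the adjustment step can be illustrated with base R alone. The factor levels and raw p-values below are made up for the example and are not get_pairwise() output; only stats::p.adjust() and combn() are used.

```r
levels_k <- c("a", "b", "c", "d")   # hypothetical factor levels, k = 4

# One row per unordered pair of levels: k(k-1)/2 = 6 comparisons.
pairs   <- t(combn(levels_k, 2))
n_pairs <- nrow(pairs)

# Made-up raw p-values, one per pair, in the same row order as `pairs`.
p_raw <- c(0.001, 0.020, 0.300, 0.040, 0.015, 0.600)

# Holm step-down adjustment versus no adjustment.
p_holm <- stats::p.adjust(p_raw, method = "holm")
p_none <- stats::p.adjust(p_raw, method = "none")
```

When the comparisons are stratified, applying the adjustment within each stratum would correspond to calling stats::p.adjust() once per stratum on that stratum's p-values rather than once on the pooled vector.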
Design-based ANOVA
- get_anova() computes Rao-Scott design-based ANOVA for survey_glm_fit objects, supporting both Wald and LRT tests with F or Chi-squared reference distributions. Three dispatch branches:
  - get_anova(<survey_glm_fit>): sequential term-by-term anova (matches anova.svyglm() semantics).
  - get_anova(<list<survey_glm_fit>>): chained pairwise comparison across k nested fits, returning k − 1 rows.
  - get_anova(<survey_base>, formula = ...): fits the model internally via survey_glm() and runs sequential anova on the fit; extra ... are forwarded to survey_glm().
  Matches survey::regTermTest() at tolerance 1e-8 on statistics and 1e-6 on p-values.
- anova(fit) on a survey_glm_fit now dispatches to get_anova() via a registered S3 method.
- plot() on a survey_glm_fit produces a dot-and-whisker coefficient plot with design-based Wald confidence intervals.
Select-all-that-apply (SATA) metadata
- set_sata() marks one or more variables on a survey design (or data frame) as select-all-that-apply. Accepts either tidy-select ... or a variable character vector; setting sata = FALSE removes the flag. Idempotent on already-flagged variables.
- extract_sata() returns SATA status as a named logical vector (default), a list, or a data frame. fill = FALSE yields a dense view (unmarked variables reported as FALSE); fill = NULL returns only flagged variables.
- classify_question_type() classifies a set of requested variables into "single", "sata", or "battery" by grouping them on shared question_preface metadata and honoring per-variable SATA flags. Group numbers are assigned in order of first appearance. Warns when a lone SATA-flagged variable has no preface mate, or when a preface group has mixed SATA flags.
Survey collections
- survey_collection is a new S7 container holding an ordered, uniquely-named list of survey_base objects, useful for wave-to-wave analyses, panel studies, or any workflow that compares estimates across multiple designs.
- as_survey_collection() constructs a collection from named (wave1 = d1, wave2 = d2) or bare (d1, d2) arguments; duplicate names are repaired by appending _1, _2, … with a warning showing the rename mapping.
- add_survey() and remove_survey() return new collections with surveys appended or removed; the original is unchanged.
- All nine get_*() analysis functions (get_means(), get_totals(), get_freqs(), get_quantiles(), get_ratios(), get_corr(), get_diffs(), get_t_test(), get_pairwise()) now dispatch over a survey_collection, iterating across surveys and returning a single combined tibble. Two new named-only control args on each function: .id = ".survey" names the identifier column, and .on_missing = c("error", "skip") controls behavior when a requested variable is absent from a survey. Regression functions (survey_glm(), get_anova()) do not support collection dispatch and raise an explicit error pointing users to lapply().
Other improvements
- survey_glm() gains a quiet = argument to suppress convergence warnings.
- extract_*() metadata functions now accept tidyselect helpers (starts_with(), all_of(), any_of(), matches()) in place of bare name lists.
Bug fixes
- get_diffs() now correctly computes pct_change when show_means = FALSE is combined with grouped marginal effects and show_pct_change = TRUE (previously returned NA).
surveycore 0.6.2
Bug fixes
- Moved dplyr from Suggests to Imports (used unguarded in metadata functions).
- Fixed broken vignette("estimation") cross-reference in creating-survey-objects vignette.
- Fixed non-canonical CRAN URLs in surveycore-vs-survey vignette.
Documentation
- Updated README to reflect current API: as_survey_replicate() (not as_survey_rep()), added get_diffs(), survey_glm(), and survey_nonprob.
- Added @examples to 12 exported functions and @return to survey_base for CRAN compliance.
surveycore 0.6.0
Breaking changes
- survey_srs class and as_survey_srs() constructor have been removed. SRS designs are now created via as_survey() with no ids or strata; this produces a survey_taylor with no cluster/strata structure. All estimates are numerically identical.
New features
- get_diffs() estimates treatment effects (differences from a reference group) via survey-weighted regression. Supports bivariate and multivariate models, Gaussian and non-Gaussian families, and optional subgroup analysis. Two estimation paths: direct coefficients for simple models, and marginaleffects::avg_slopes() / avg_predictions() for models with covariates or non-Gaussian AMEs. Returns a survey_diffs tibble with optional mean, pct_change, n_weighted columns, significance stars, and p-value adjustment.
- marginaleffects moved from Suggests to Imports.
- as_survey() now supports multi-column FPC for multi-stage designs (e.g., fpc = c(fpc_stage1, fpc_stage2)). Each FPC column corresponds to one ID stage. Per-stage FPC is validated for NAs, non-positive values, and within-cluster constancy.
- print() for survey_taylor now displays per-stage FPC bullets for multi-stage designs (e.g., FPC (stage 1): fpc, FPC (stage 2): fpc2).
Bug fixes
- SRS variance estimation now uses Taylor (HT) linearization via .build_cluster_matrices(), correct for any weight structure. Previously used unweighted sample variance, which was incorrect for non-proportional weights.
- survey_glm() now correctly indexes weights when na.action = na.omit drops non-contiguous rows.
- get_freqs() now routes survey_nonprob designs through the Horvitz-Thompson variance path, consistent with the other five analysis functions.
- as_survey_twophase() now accepts survey_replicate and SRS survey_taylor objects as the phase-1 design (previously restricted to stratified/clustered survey_taylor only).
- as_survey() SRS fallback downgraded from warning to message.
surveycore 0.5.0
Breaking changes
- as_survey_replicate() replaces as_survey_repweights(). The constructor name now matches the underlying survey_replicate class.
- survey_nonprob and as_survey_nonprob() replace survey_calibrated and as_survey_calibrated(). “Calibrated” implies a post-processing step on a probability sample; nonprob accurately reflects the design type.
- survey_srs and as_survey_srs() have been removed. SRS designs are now created via as_survey() with no ids or strata; this produces a survey_taylor with no cluster/strata structure. All estimates are numerically identical. Print output now says “Taylor series linearization” instead of “simple random sample”.
- Single-row data frames are now rejected at construction time (previously a warning). This matches survey::svydesign() behavior.
- The positional setter form set_var_label(svy, age, "label") has been removed. Use the named form set_var_label(svy, age = "label") instead.
- extract_var_label(), extract_question_preface(), and extract_var_note() now return a named character vector. extract_var_label(svy, age) now returns c(age = "Age in years") rather than "Age in years".
- extract_val_labels() now returns a named list. extract_val_labels(svy, sex) now returns list(sex = c(Male = 1L, Female = 2L)) rather than c(Male = 1L, Female = 2L).
- set_variable_labels(), set_value_labels(), set_question_prefaces(), and set_variable_notes() have been removed. Use set_var_label(), set_val_labels(), set_question_preface(), and set_var_note() respectively; all four now accept multiple variables via named ....
New features
- set_universe() and extract_universe() set and retrieve universe (eligibility) annotations for survey variables.
- set_missing_codes() and extract_missing_codes() set and retrieve missing value code vectors for survey variables.
- extract_metadata() returns all metadata fields (variable_label, value_labels, question_preface, note, universe, missing_codes, transformations) for one or more variables as a named list.
Enhancements
- All setter functions now support three call conventions: named ... (e.g., set_var_label(svy, age = "Age in years")), a single named vector/list in ..., or explicit variable = / content-argument pairs. All setters also now work on plain data.frames.
- All extractor functions accept multiple variables via ..., support three output formats ("named_vector", "list", "data_frame"), and accept a fill argument to include variables with no metadata in the output.
surveycore 0.4.0
New features
- survey_glm() fits survey-weighted generalized linear models for all four design classes (survey_taylor, survey_replicate, survey_twophase, survey_nonprob); returns a survey_glm_fit object with design-based (Binder 1983 sandwich) standard errors and degrees of freedom.
- clean() converts a survey_glm_fit to a tidy survey_glm_tidy tibble with one row per coefficient, design-based confidence intervals, structured metadata, and optional reference rows for factor predictors.
- survey_glm_fit objects support 20 S3 methods: print(), summary(), coef(), vcov(), predict(), fitted(), residuals(), confint(), formula(), terms(), model.matrix(), model.frame(), deviance(), df.residual(), nobs(), hatvalues(), logLik(), AIC(), BIC(), and update().
- survey_glm_fit integrates with the marginaleffects package; when marginaleffects is installed, avg_slopes(), avg_predictions(), and the full marginaleffects API work directly on survey_glm_fit objects.
- broom::tidy() is supported for survey_glm_fit objects via a shim that delegates to clean().
- as_survey_rep() has been renamed to as_survey_replicate() to avoid a namespace clash with the srvyr package.
Bug fixes
- as_survey_twophase() variance estimation (method = "approx" and "full") now uses the correct PSU-level Phase 2 stratum sampling fraction instead of a row-level fraction, resolving an approximately 2× variance underestimation.
surveycore 0.3.3
New features
- print() methods for all four survey design classes (survey_taylor, survey_replicate, survey_twophase, survey_nonprob) now display a Domain: <n> of <N> rows line when surveytidy::filter() has been applied. The line appears after the sample size line and before the Groups: line. For two-phase designs, domain counts reflect Phase 2 rows only.
surveycore 0.3.0
New features
- names() now works on survey design objects, returning the column names of the underlying data frame. This enables IDE column-name autocomplete in RStudio and Positron when piping into analysis functions (e.g., design |> get_means()).
surveycore 0.2.0
New features
- get_freqs() computes weighted frequency tables for categorical survey variables across all five design types, with domain estimation, value-label support, and AAPOR small-cell warnings.
- get_means() returns survey-weighted means with design-correct standard errors for all five design types, including grouped and domain estimation.
- get_totals() returns survey-weighted population totals (and population size when called without x) for all five design types.
- get_corr() computes survey-weighted Pearson correlation using the delta-method variance approach, with optional group parameter for per-group correlations and Fisher Z confidence intervals.
- get_quantiles() estimates survey-weighted quantiles using the Woodruff linearization method; supports multiple probs in a single call and five CI methods.
- get_ratios() estimates survey-weighted ratios (numerator total / denominator total) with design-correct SEs via the delta method (Taylor, SRS, calibrated, two-phase) or direct per-replicate computation (replicate designs).
- All six analysis functions gain a decimals argument to round numeric output columns to a fixed number of decimal places.
- na.rm = FALSE now includes rows where a grouping variable is NA as a separate group row in the output of all six analysis functions.
- infer_question_prefaces() auto-detects shared battery prefaces from variable labels using separator-based and longest-common-prefix detection.
- survey_weighting_history() returns the weighting history stored in a survey design object’s metadata; as_survey(), as_survey_replicate(), and as_survey_nonprob() now promote "weighting_history" attributes from the input data frame automatically.
- Two-phase variance estimation (as_survey_twophase()) is now fully supported in get_means() and get_totals(), using the "full", "approx", and "simple" methods vendored from the survey package.
Bug fixes
- get_freqs() no longer crashes when the group variable contains NA values.
- get_freqs() now outputs pct as a proportion (0–1) rather than a percentage (0–100); se and se_srs are on the same scale.
surveycore 0.1.0
New features
- as_survey() creates survey_taylor objects with a tidy-select interface (ids, weights, strata, fpc, probs); supports Taylor linearization for stratified, clustered, and SRS designs.
- as_survey_replicate() creates survey_replicate objects; supports BRR, Fay BRR, JK1, JK2, JKn, bootstrap, ACS, and successive-difference replicate schemes.
- as_survey_twophase() creates survey_twophase objects; supports “full”, “approx”, and “simple” two-phase variance estimation methods.
- update_design() modifies design variables on an existing survey object without reconstructing from scratch; respects validate = TRUE/FALSE.
- get_means() returns a weighted mean and standard error via Taylor linearization or replicate weights; respects getOption("survey.lonely.psu") for single-PSU strata.
- get_totals() returns a weighted total and standard error using the same dispatch as get_means().
- Metadata setters: set_var_label(), set_variable_labels(), set_val_labels(), set_value_labels(), set_question_preface(), set_question_prefaces(), set_var_note(), set_variable_notes(). Single-variable setters automatically import haven "label" / "labels" attributes from the data frame column.
- Metadata extractors: extract_var_label(), extract_val_labels(), extract_question_preface(), extract_var_note().
- Conversion utilities: as_svydesign(), from_svydesign(), as_tbl_svy(), from_tbl_svy(), for round-trip conversion between surveycore objects, survey::svydesign / survey::svrepdesign, and srvyr::tbl_svy.
- print() and summary() S7 methods for all survey design classes display design type, sample size, and a tibble-style data preview.
Internal infrastructure
- S7 class hierarchy: abstract survey_base → survey_taylor, survey_replicate, survey_twophase; survey_metadata for label storage.
- Three-layer validation: S7 structural validators, Layer 2 input validators, Layer 3 constructor validators; all errors use typed class= for programmatic handling.
- Variance estimation vendored from the survey package (Thomas Lumley, GPL-2/GPL-3); see VENDORED.md for full attribution.
