Changelog
Source: NEWS.md
surveytidy 0.2.0
New verbs
- filter_out(): the complement of filter(). Marks rows matching the condition as out-of-domain while leaving all other rows in-domain. Like filter(), no rows are removed. Chains with filter() via AND-accumulation on the domain column. filter_out(d, group == "control") is often clearer than filter(d, group != "control") for exclusion use-cases.
- distinct(): removes duplicate rows while always retaining all columns (design variables are never dropped). With no column arguments, deduplicates on non-design columns only (survey-safe default). Always issues surveycore_warning_physical_subset.
- rename_with(): function-based column renaming. Applies .fn to columns selected by .cols and propagates renames to @variables, @metadata, @groups, and visible_vars. Validates .fn output and errors with surveytidy_error_rename_fn_bad_output for non-character, wrong-length, or duplicate output.
- rowwise(): enables row-by-row computation in mutate() (e.g., max(c_across(...))). Rowwise state is stored in @variables$rowwise, never in @groups, keeping those clean for estimation functions. group_by() and ungroup() exit rowwise mode, mirroring dplyr behaviour.
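The AND-accumulation between filter() and filter_out() can be pictured on a plain data frame. This is a minimal base R sketch, not surveytidy's implementation: the helpers mark_in() and mark_out() and the column name in_domain are hypothetical stand-ins for filter(), filter_out(), and the package's domain column, and NA conditions are not handled.

```r
# Hypothetical stand-ins for filter() / filter_out() on a plain data frame.
# `in_domain` plays the role of the survey object's domain column.
mark_in <- function(df, keep) {
  # Rows that have not been marked yet are treated as in-domain
  prev <- if (is.null(df[["in_domain"]])) rep(TRUE, nrow(df)) else df[["in_domain"]]
  # AND the new condition onto the existing domain; no rows are removed
  df$in_domain <- prev & keep
  df
}

mark_out <- function(df, drop) {
  # Complement: rows matching the condition leave the domain
  mark_in(df, !drop)
}

d <- data.frame(
  group = c("control", "treat", "treat", "control"),
  age   = c(25, 40, 17, 60)
)

# Adults only, then exclude the control group
out <- mark_out(mark_in(d, d$age >= 18), d$group == "control")

nrow(out)      # still 4 rows
out$in_domain  # FALSE TRUE FALSE FALSE
```

Only the second row satisfies both conditions, yet all four rows remain in the data, which is the property the domain column is meant to preserve.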
New predicates
- is_rowwise(): returns TRUE when the survey object is in rowwise mode.
- is_grouped(): returns TRUE when @groups is non-empty.
- group_vars(): returns the current grouping column names from @groups.
Verb support for survey_result objects
- filter(), arrange(), mutate(), slice(), slice_head(), slice_tail(), slice_min(), slice_max(), slice_sample(), and drop_na() are now registered for survey_result objects (the S3 base class for surveycore analysis outputs: survey_means, survey_freqs, survey_totals, survey_quantiles, survey_corr, survey_ratios). Previously, applying dplyr verbs to these objects could silently strip the class and .meta attribute. Now both are preserved, and mutate() keeps meta$group coherent when .keep drops grouping columns.
- select(), rename(), and rename_with() are now registered for survey_result objects with active .meta updates. select() prunes stale meta$group entries when grouping columns are dropped and handles inline renames (select(r, grp = group)). rename() and rename_with() propagate column renames to all .meta key references ($group, $x, $numerator$name, $denominator$name). rename_with() errors with surveytidy_error_rename_fn_bad_output if .fn returns non-character, wrong-length, NA, or duplicate names.
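The checks on .fn output and the propagation of renames into .meta can be sketched in base R. The helpers check_rename_output() and remap() and the example meta list are hypothetical; only the error class name comes from this changelog, and the package's actual messages and internals may differ.

```r
# Hypothetical helper: validate the names a renaming function returned.
check_rename_output <- function(old, new) {
  problem <- if (!is.character(new)) {
    "must return a character vector"
  } else if (length(new) != length(old)) {
    "must return one name per selected column"
  } else if (anyNA(new)) {
    "must not return NA names"
  } else if (anyDuplicated(new) > 0L) {
    "must not return duplicate names"
  } else {
    NULL
  }
  if (!is.null(problem)) {
    # Signal a classed condition so callers can catch it by class
    stop(structure(
      list(message = paste(".fn", problem), call = NULL),
      class = c("surveytidy_error_rename_fn_bad_output", "error", "condition")
    ))
  }
  stats::setNames(new, old)  # lookup table: old name -> new name
}

# Hypothetical helper: apply the old -> new lookup to a vector of references
remap <- function(refs, lookup) {
  hit <- refs %in% names(lookup)
  refs[hit] <- unname(lookup[refs[hit]])
  refs
}

# Made-up metadata in the spirit of .meta$group and .meta$x
meta <- list(group = c("region", "sex"), x = "income")
lookup <- check_rename_output(c("region", "income"), toupper(c("region", "income")))
meta$group <- remap(meta$group, lookup)  # "REGION" "sex"
meta$x     <- remap(meta$x, lookup)      # "INCOME"

# Duplicate output is rejected with the classed error
bad <- tryCatch(
  check_rename_output(c("a", "b"), c("z", "z")),
  surveytidy_error_rename_fn_bad_output = function(e) conditionMessage(e)
)
```

Validating before building the lookup means no metadata is touched when the renaming function misbehaves.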
Bug fixes
- drop_na() now performs domain-aware filtering instead of physically removing rows. Previously, drop_na() removed rows with NA values, changing which units contributed to variance estimation and producing incorrect standard errors. It now marks incomplete rows as out-of-domain, equivalent to the corresponding filter(!is.na(col1), ...) chain, giving correct variance estimates for downstream analyses.
- filter(): the .by unsupported-argument error was mis-classified as a surveycore_error_*; corrected to surveytidy_error_filter_by_unsupported.
Improvements
- rename() and rename_with() now update @groups when a grouped column is renamed, and correctly update twophase design variable references (@variables$phase1, @variables$phase2, @variables$subset). The domain column (..surveycore_domain..) is silently protected from renaming.
- filter() and filter_out() support if_any() and if_all() in conditions.
Documentation
- Roxygen documentation standardised across all verb files to mirror the dplyr/tidyr reference style, with @details subsections for surveytidy-specific behaviour and examples using nhanes_2017.
- Rd files consolidated from per-method (e.g., arrange.survey_base.Rd) to per-verb (e.g., arrange.Rd), fixing the “S3 methods shown with full name” R CMD check NOTE.
surveytidy 0.1.0
First release. Implements a complete set of dplyr and tidyr verbs for survey design objects created with the surveycore package.
New verbs
- filter(): domain-aware filtering. Marks rows in-domain rather than removing them, preserving correct variance estimation for subpopulation analyses. Chained filter() calls AND their conditions together.
- select(): column selection. Physically removes non-selected columns while always retaining design variables (weights, strata, PSU, FPC, replicate weights). Sets @variables$visible_vars so print() hides design columns the user did not explicitly request.
- relocate(): column reordering. Reorders visible_vars when a prior select() has been called; reorders @data directly otherwise.
- pull(): extract a column as a plain vector (terminal operation).
- glimpse(): concise column summary, respecting visible_vars.
- mutate(): add or modify columns. Re-attaches design variables dropped by .keep = "none" or .keep = "used". Issues surveytidy_warning_mutate_design_var when a mutation’s left-hand side names a design variable. Respects @groups set by group_by().
- rename(): rename columns. Automatically keeps @variables (design specification) and @metadata (variable labels, value labels, etc.) in sync with the new column names. Issues surveytidy_warning_rename_design_var when a design variable is renamed.
- arrange(): row sorting. The domain column moves correctly with the rows after sorting. Supports .by_group = TRUE using @groups.
- slice(), slice_head(), slice_tail(), slice_min(), slice_max(), slice_sample(): physical row selection with a surveycore_warning_physical_subset warning. slice_sample(weight_by = ) additionally issues surveytidy_warning_slice_sample_weight_by to flag that the weight_by column is independent of the survey design weights.
- group_by(): store grouping columns in @groups. Does not attach a grouped_df attribute to @data; grouping is kept on the survey object. Supports .add = TRUE for incremental grouping and computed expressions (e.g., group_by(d, above_median = y1 > median(y1))).
- ungroup(): remove all groups (no arguments) or remove specific columns from @groups (partial ungroup).
- drop_na(): domain-aware NA handling. Marks rows with NA in specified columns (or any column) as out-of-domain without removing them. Equivalent to filter(!is.na(col1), !is.na(col2), ...) and gives correct variance estimates for downstream analyses. Successive drop_na() calls AND their conditions together.
- subset(): physical row removal with surveycore_warning_physical_subset. Prefer filter() for subpopulation analyses.
Statistical design
The key design decision in surveytidy is that filter() never removes rows. Removing rows from a survey design changes which units contribute to variance estimation and produces incorrect standard errors for subpopulation statistics. filter() instead writes a logical domain column (..surveycore_domain..) to @data. Phase 1 estimation functions will read this column to restrict calculations to the domain while retaining all rows for variance estimation.
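One way to see why physical removal is a problem: a subpopulation may not appear in every PSU, so subsetting can leave a stratum with fewer PSUs than the design actually sampled. The base R sketch below uses made-up data and a hypothetical in_domain column in place of ..surveycore_domain..; it only counts PSUs per stratum and does not compute a variance.

```r
# Made-up design: 2 strata with 2 PSUs each
d <- data.frame(
  stratum = c("A", "A", "A", "A", "B", "B", "B", "B"),
  psu     = c(1, 1, 2, 2, 3, 3, 4, 4),
  female  = c(TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Number of distinct PSUs that each stratum still contains
psus_per_stratum <- function(df) {
  tapply(df$psu, df$stratum, function(p) length(unique(p)))
}

# Physical subsetting: PSU 4 has no women, so it vanishes from stratum B
subsetted <- d[d$female, ]

# Domain marking: every row stays, and a logical column records membership
marked <- transform(d, in_domain = female)

psus_per_stratum(subsetted)  # A: 2, B: 1
psus_per_stratum(marked)     # A: 2, B: 2
```

With the marked version, an estimator can restrict the statistic to rows where in_domain is TRUE while still seeing both PSUs in stratum B.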
Infrastructure
- dplyr_reconstruct.survey_base() ensures complex dplyr pipelines (joins, across(), internal slice operations) return survey objects rather than plain tibbles. Errors with surveycore_error_design_var_removed if a pipeline drops a design variable.
- Invariant 6 added to test_invariants(): every column name listed in @variables$visible_vars must exist in @data.