surveytidy 0.4.0
New features
Survey-aware transformation functions
Five vector-level transformation functions are now available for converting, collapsing, and reversing variables inside mutate(). All five propagate value labels automatically and accept .label and .description arguments to attach metadata in a single step.
- make_factor(): converts labelled, numeric, character, or factor vectors to an R factor. Levels are ordered by the numeric value of each value label. Accepts ordered, drop_levels, force, and na.rm to control level creation.
- make_dicho(): collapses a multi-level factor to two levels by stripping the first word of each label and merging labels that reduce to the same stem. Accepts .exclude to set specific levels to NA, and flip_levels to reverse the resulting order.
- make_binary(): converts a dichotomous variable to a 0/1 integer. A thin wrapper around make_dicho(); accepts flip_values to control which level maps to 1.
- make_rev(): reverses a numeric scale using min + max - x and remaps value labels to match. Issues a warning when all values are NA.
- make_flip(): reverses the semantic valence of a variable by reversing the label strings while keeping the underlying values unchanged. Requires a label argument to document the new meaning.
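The ideas behind make_rev(), make_flip(), and make_dicho() can be sketched in base R. This is a minimal illustration under stated assumptions, not surveytidy's implementation: value labels are assumed to be a named numeric vector (names are label strings, values are codes), the scale range is taken from the data, and rev_scale(), flip_labels(), and dicho_stem() are hypothetical helpers.

```r
# Hypothetical base-R sketches, not surveytidy's implementation.
# Assumption: value labels are a named numeric vector
# (names = label strings, values = codes).

# make_rev() idea: reverse the scale with min + max - x and remap labels.
rev_scale <- function(x, labels) {
  if (all(is.na(x))) {
    warning("all values are NA; nothing to reverse")
    return(list(x = x, labels = labels))
  }
  lo <- min(x, na.rm = TRUE)  # assumption: range taken from the data
  hi <- max(x, na.rm = TRUE)
  # Arithmetic on a named vector keeps its names, so every label string
  # moves to the new code for the same answer.
  list(x = lo + hi - x, labels = sort(lo + hi - labels))
}

# make_flip() idea: codes stay put; the label strings are reversed.
flip_labels <- function(labels) {
  labels <- sort(labels)
  stats::setNames(unname(labels), rev(names(labels)))
}

# make_dicho() idea: drop the first word of each label, then merge labels
# that share the remaining stem into one level.
dicho_stem <- function(lbl) {
  stem <- sub("^\\S+\\s+", "", lbl, perl = TRUE)
  factor(stem, levels = unique(stem))
}

lab <- c(Low = 1, Medium = 2, High = 3)
rev_scale(c(1, 3, 2, NA), lab)
flip_labels(lab)
dicho_stem(c("Strongly agree", "Somewhat agree", "Somewhat disagree"))
```

Both rev_scale() and flip_labels() end with the same label map (High = 1, Medium = 2, Low = 3), but only rev_scale() changes the data values. That is the distinction between reversing a scale and flipping its meaning.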
surveytidy 0.3.0
New features
Survey-aware recoding functions
Six vector-level recoding functions are now available. Each shadows its dplyr equivalent and adds optional arguments for attaching variable labels, value labels, and transformation notes directly inside mutate(). Without any of those arguments, output is identical to dplyr.
- case_when(): a survey-aware dplyr::case_when(). Evaluates a sequence of condition ~ value formulas and uses the first match for each element. Use it to create an entirely new vector from conditions. Accepts .label to set a variable label, .value_labels to attach a named vector of value labels, .factor = TRUE to return an ordered factor (levels follow formula order), and .description to record a plain-language note about the transformation.
- if_else(): a survey-aware dplyr::if_else(). Applies a single binary condition element-wise (true / false / missing). Stricter than base ifelse(): true, false, and missing are cast to a common type. Accepts .label, .value_labels, and .description.
- na_if(): a survey-aware dplyr::na_if(). Converts specific values to NA. Where dplyr::na_if() compares x and y element-wise (so y is usually a single value), this version accepts a vector y and replaces every value of x that matches any element of y in a single call. When the input carries value labels, they are inherited automatically; .update_labels = TRUE (the default) removes the label entries for the values set to NA, while .update_labels = FALSE retains them (useful for documenting what was set to missing). Also accepts .description.
- recode_values(): a survey-aware dplyr::recode_values(). Replaces values found in from with the corresponding value from to; values not in from are kept unchanged or trigger an error (.unmatched = "error"). Intended for full remapping of every value in a vector. Set .use_labels = TRUE to build the from/to map automatically from the input's existing value labels (codes become from; label strings become to). Also accepts .label, .value_labels, .factor, and .description.
- replace_values(): a survey-aware dplyr::replace_values(). Replaces values found in from with the corresponding value from to; all other values are left unchanged. Use it for partial in-place replacement of specific values in an existing vector. Automatically inherits both the variable label and value labels from the input; supply .label or .value_labels to override. Also accepts .description.
- replace_when(): a survey-aware dplyr::replace_when(). Like case_when() but for partial in-place updates: evaluates condition ~ value formulas and replaces only matching elements, leaving all others at their original value. Automatically inherits labels from the input; supply .label or .value_labels to override. Also accepts .description.
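The difference between partial replacement, label-driven full remapping, and set-based na_if() can be sketched with match() and %in% in base R. The helpers replace_values_sketch(), recode_from_labels(), and na_if_set() are hypothetical and only mirror the behaviour described above; recode_from_labels() models .use_labels = TRUE combined with .unmatched = "error".

```r
# Hypothetical base-R sketches of the recode semantics; not surveytidy code.

# replace_values() idea: partial, in-place replacement. Values not found in
# `from` are left unchanged.
replace_values_sketch <- function(x, from, to) {
  idx <- match(x, from)
  hit <- !is.na(idx)
  x[hit] <- to[idx[hit]]
  x
}

# recode_values(.use_labels = TRUE, .unmatched = "error") idea: codes become
# `from`, label strings become `to`, and every non-missing value must match.
recode_from_labels <- function(x, labels) {
  from <- unname(labels)
  to <- names(labels)
  idx <- match(x, from)
  if (any(is.na(idx) & !is.na(x))) stop("x contains values with no label")
  to[idx]
}

# na_if() idea: `y` is a set of values to blank out, not an element-wise
# partner of `x`.
na_if_set <- function(x, y) {
  x[x %in% y] <- NA
  x
}

replace_values_sketch(c(1, 2, 3, 2), from = 2, to = 20)
recode_from_labels(c(1, 2, NA, 1), c(Yes = 1, No = 2))
na_if_set(c(1, 8, 9, 2), y = c(8, 9))
```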
Shared label arguments
All six functions support a common set of label arguments that propagate into @metadata when used inside mutate():
- .label: a character string stored in @metadata@variable_labels as the human-readable variable label for the new column.
- .value_labels: a named vector stored in @metadata@value_labels, where names are label strings and values are the corresponding data values.
- .description: a plain-language string stored in @metadata@transformations describing how the variable was derived.
case_when() and recode_values() also accept .factor = TRUE, which returns an ordered factor instead of a character vector (levels follow the formula order for case_when() and the order of to for recode_values()). .factor and .label cannot be combined.
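As a base-R illustration of the two conventions above (value_labels, outcomes, and age_group are illustrative objects, not anything surveytidy creates): a .value_labels-style named vector maps label strings to codes, and an ordered factor whose levels follow formula order is the shape .factor = TRUE is described as returning.

```r
# Hypothetical illustration in base R; these objects are not created by
# surveytidy.

# .value_labels shape: names are label strings, values are data codes.
value_labels <- c(Young = 1, Middle = 2, Old = 3)

# .factor = TRUE idea: an ordered factor whose levels follow the order in
# which the outcomes appear in the formulas.
age <- c(25, 70, 45, 19)
outcomes <- c("Young", "Middle", "Old")  # formula order
age_group <- ifelse(age < 30, "Young", ifelse(age < 60, "Middle", "Old"))
age_fct <- factor(age_group, levels = outcomes, ordered = TRUE)

# Look up the code that the named vector assigns to each label.
codes <- unname(value_labels[as.character(age_fct)])

age_fct
codes
```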
mutate() enhancements
mutate() now coordinates label propagation automatically: it pre-attaches label attributes from @metadata before the inner dplyr call so recode functions can see existing labels, reads the label output back from recoded columns, and writes it into @metadata — all without extra user steps. The weight-column warning has also been split into two distinct classes: surveytidy_warning_mutate_weight_col for the weight column and surveytidy_warning_mutate_structural_var for strata, PSU, FPC, and replicate weights.
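The attach, recode, and read-back cycle can be sketched with plain R attributes. This assumes labels travel as haven-style "label" and "labels" attributes and uses a plain list in place of @metadata; attach_labels(), reverse_binary(), and read_back() are hypothetical helpers, not surveytidy internals.

```r
# Hypothetical sketch; `metadata` is a plain list standing in for @metadata,
# and the "label"/"labels" attribute names follow the haven convention.
metadata <- list(
  variable_labels = list(q1 = "Satisfaction"),
  value_labels = list(q1 = c(Low = 1, High = 2))
)
df <- data.frame(q1 = c(1, 2, 2))

# Step 1: pre-attach labels so a recode function can see them.
attach_labels <- function(df, metadata) {
  for (col in intersect(names(df), names(metadata$value_labels))) {
    attr(df[[col]], "labels") <- metadata$value_labels[[col]]
    attr(df[[col]], "label") <- metadata$variable_labels[[col]]
  }
  df
}

# Step 2: a recode function reads the incoming labels and emits new ones.
reverse_binary <- function(x) {
  out <- 3 - as.vector(x)  # as.vector() drops the incoming attributes
  attr(out, "labels") <- sort(3 - attr(x, "labels"))
  attr(out, "label") <- paste(attr(x, "label"), "(reversed)")
  out
}

# Step 3: read the labels back into metadata and strip them from the data.
read_back <- function(df, metadata, cols) {
  for (col in cols) {
    metadata$value_labels[[col]] <- attr(df[[col]], "labels")
    metadata$variable_labels[[col]] <- attr(df[[col]], "label")
    attributes(df[[col]]) <- NULL
  }
  list(data = df, metadata = metadata)
}

df <- attach_labels(df, metadata)
df$q1_rev <- reverse_binary(df$q1)
res <- read_back(df, metadata, "q1_rev")
res$metadata$value_labels$q1_rev
```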
surveytidy 0.2.1
Website & branding
- Added package hex logo.
- Updated pkgdown site colours to a teal theme.
- README now displays the hex logo.
- LICENSE.md updated to credit the third-party hex sticker icon (Freepik / Flaticon, CC BY 3.0).
- DESCRIPTION author entry updated with current email, ORCID, and copyright-holder (cph) role.
surveytidy 0.2.0
New verbs
- filter_out(): the complement of filter(). Marks rows matching the condition as out-of-domain while leaving the domain status of all other rows unchanged. Like filter(), no rows are removed. Chains with filter() via AND-accumulation on the domain column. filter_out(d, group == "control") is often clearer than filter(d, group != "control") for exclusion use cases.
- distinct(): removes duplicate rows while always retaining all columns (design variables are never dropped). With no column arguments, deduplicates on non-design columns only (survey-safe default). Always issues surveycore_warning_physical_subset.
- rename_with(): function-based column renaming. Applies .fn to the columns selected by .cols and propagates renames to @variables, @metadata, @groups, and visible_vars. Validates the .fn output and errors with surveytidy_error_rename_fn_bad_output for non-character, wrong-length, or duplicate output.
- rowwise(): enables row-by-row computation in mutate() (e.g., max(c_across(...))). Rowwise state is stored in @variables$rowwise, never in @groups, keeping @groups clean for estimation functions. group_by() and ungroup() exit rowwise mode, mirroring dplyr behaviour.
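AND-accumulation on the domain column can be sketched with a plain logical vector. This is a hypothetical illustration, not surveytidy code: domain_filter() and domain_filter_out() are made-up helpers, and treating NA conditions as non-matching in filter_out() is an assumption based on dplyr's filter_out().

```r
# Hypothetical sketch: the domain is one logical value per row, starting
# all TRUE; filter() and filter_out() only ever AND into it, so rows are
# never removed.
domain_filter <- function(domain, cond) {
  # Rows where cond is NA do not match, as in dplyr::filter().
  domain & (cond %in% TRUE)
}

domain_filter_out <- function(domain, cond) {
  # Assumption: rows where cond is NA count as non-matching and keep their
  # domain status, following dplyr::filter_out().
  domain & !(cond %in% TRUE)
}

group <- c("control", "treat", NA, "treat")
age <- c(30, 40, 50, 20)

domain <- rep(TRUE, length(group))
domain <- domain_filter_out(domain, group == "control")
domain <- domain_filter(domain, age >= 30)
domain

# The negated filter() also excludes the row whose group is NA.
alt <- domain_filter(rep(TRUE, length(group)), group != "control")
alt
```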
New predicates
- is_rowwise(): returns TRUE when the survey object is in rowwise mode.
- is_grouped(): returns TRUE when @groups is non-empty.
- group_vars(): returns the current grouping column names from @groups.
Verb support for survey_result objects
- filter(), arrange(), mutate(), slice(), slice_head(), slice_tail(), slice_min(), slice_max(), slice_sample(), and drop_na() are now registered for survey_result objects (the S3 base class for surveycore analysis outputs: survey_means, survey_freqs, survey_totals, survey_quantiles, survey_corr, survey_ratios). Previously, applying dplyr verbs to these objects could silently strip the class and the .meta attribute. Now both are preserved, and mutate() keeps meta$group coherent when .keep drops grouping columns.
- select(), rename(), and rename_with() are now registered for survey_result objects with active .meta updates. select() prunes stale meta$group entries when grouping columns are dropped and handles inline renames (select(r, grp = group)). rename() and rename_with() propagate column renames to all .meta key references ($group, $x, $numerator$name, $denominator$name). rename_with() errors with surveytidy_error_rename_fn_bad_output if .fn returns non-character, wrong-length, NA, or duplicate names.
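A minimal sketch of the rename_with() checks and the .meta propagation described above. The .meta layout follows the keys listed in this entry; check_fn_output() and rename_meta() are hypothetical helpers, and the plain error they raise stands in for surveytidy_error_rename_fn_bad_output.

```r
# Hypothetical sketch; the .meta layout mirrors the keys named above and the
# helper names are illustrative.
check_fn_output <- function(old, new) {
  ok <- is.character(new) && length(new) == length(old) &&
    !anyNA(new) && anyDuplicated(new) == 0
  if (!ok) stop("rename_with() .fn returned invalid names")
  invisible(new)
}

rename_meta <- function(meta, map) {
  # `map`: named character vector, names = old names, values = new names.
  swap <- function(v) {
    hit <- v %in% names(map)
    v[hit] <- unname(map[v[hit]])
    v
  }
  if (!is.null(meta$group)) meta$group <- swap(meta$group)
  if (!is.null(meta$x)) meta$x <- swap(meta$x)
  if (!is.null(meta$numerator$name)) {
    meta$numerator$name <- swap(meta$numerator$name)
  }
  if (!is.null(meta$denominator$name)) {
    meta$denominator$name <- swap(meta$denominator$name)
  }
  meta
}

meta <- list(
  group = c("region", "sex"), x = "income",
  numerator = list(name = "income"), denominator = list(name = "hh_size")
)
old <- c("region", "income")
new <- check_fn_output(old, toupper(old))
meta2 <- rename_meta(meta, stats::setNames(new, old))
str(meta2)
```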
Bug fixes
- drop_na() now performs domain-aware filtering instead of physically removing rows. Previously, drop_na() removed rows with NA values, changing which units contributed to variance estimation and producing incorrect standard errors. It now marks incomplete rows as out-of-domain, equivalent to the corresponding filter(!is.na(col1), ...) chain, giving correct variance estimates for downstream analyses.
- filter(): the .by unsupported-argument error was misclassified as a surveycore_error_*; corrected to surveytidy_error_filter_by_unsupported.
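The domain-aware drop_na() can be sketched as a logical update using complete.cases(), which is TRUE only where every listed column is non-missing. drop_na_domain() is a hypothetical helper, not surveytidy's implementation.

```r
# Hypothetical sketch: drop_na() as a domain update rather than a row
# deletion.
drop_na_domain <- function(data, domain, cols = names(data)) {
  # complete.cases() is TRUE only where every listed column is non-missing,
  # matching filter(!is.na(col1), !is.na(col2), ...).
  domain & stats::complete.cases(data[cols])
}

df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))
drop_na_domain(df, rep(TRUE, nrow(df)), cols = "a")
drop_na_domain(df, rep(TRUE, nrow(df)))
nrow(df)  # unchanged: no rows are removed
```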
Improvements
- rename() and rename_with() now update @groups when a grouped column is renamed, and correctly update twophase design variable references (@variables$phase1, @variables$phase2, @variables$subset). The domain column (..surveycore_domain..) is silently protected from renaming.
- filter() and filter_out() support if_any() and if_all() in conditions.
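Keeping the grouping columns in sync and protecting the domain column can be sketched with a rename map. rename_cols() is a hypothetical helper that works on plain character vectors standing in for the column names and @groups.

```r
# Hypothetical sketch of a rename that keeps the grouping columns in sync
# and silently ignores attempts to rename the domain column.
domain_col <- "..surveycore_domain.."

rename_cols <- function(col_names, groups, map) {
  # `map`: named character vector, names = old names, values = new names.
  map <- map[names(map) != domain_col]  # the domain column is protected
  swap <- function(v) {
    hit <- v %in% names(map)
    v[hit] <- unname(map[v[hit]])
    v
  }
  list(col_names = swap(col_names), groups = swap(groups))
}

res <- rename_cols(
  col_names = c("region", "y1", domain_col),
  groups = "region",
  map = c(region = "area", "..surveycore_domain.." = "dom")
)
res
```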
Documentation
- Roxygen documentation standardised across all verb files to mirror the dplyr/tidyr reference style, with @details subsections for surveytidy-specific behaviour and examples using nhanes_2017.
- Rd files consolidated from per-method (e.g., arrange.survey_base.Rd) to per-verb (e.g., arrange.Rd), fixing the "S3 methods shown with full name" R CMD check NOTE.
surveytidy 0.1.0
First release. Implements a complete set of dplyr and tidyr verbs for survey design objects created with the surveycore package.
New verbs
- filter(): domain-aware filtering. Marks rows that fail the condition as out-of-domain rather than removing them, preserving correct variance estimation for subpopulation analyses. Chained filter() calls AND their conditions together.
- select(): column selection. Physically removes non-selected columns while always retaining design variables (weights, strata, PSU, FPC, replicate weights). Sets @variables$visible_vars so print() hides design columns the user did not explicitly request.
- relocate(): column reordering. Reorders visible_vars when a prior select() has been called; otherwise reorders @data directly.
- pull(): extracts a column as a plain vector (terminal operation).
- glimpse(): concise column summary, respecting visible_vars.
- mutate(): adds or modifies columns. Re-attaches design variables dropped by .keep = "none" or .keep = "used". Issues surveytidy_warning_mutate_design_var when a mutation's left-hand side names a design variable. Respects @groups set by group_by().
- rename(): renames columns. Automatically keeps @variables (design specification) and @metadata (variable labels, value labels, etc.) in sync with the new column names. Issues surveytidy_warning_rename_design_var when a design variable is renamed.
- arrange(): row sorting. The domain column moves with the rows after sorting. Supports .by_group = TRUE using @groups.
- slice(), slice_head(), slice_tail(), slice_min(), slice_max(), slice_sample(): physical row selection with a surveycore_warning_physical_subset warning. slice_sample(weight_by = ) additionally issues surveytidy_warning_slice_sample_weight_by to flag that the weight_by column is independent of the survey design weights.
- group_by(): stores grouping columns in @groups. Does not turn @data into a grouped_df; grouping is kept on the survey object. Supports .add = TRUE for incremental grouping and computed expressions (e.g., group_by(d, above_median = y1 > median(y1))).
- ungroup(): removes all groups (no arguments) or removes specific columns from @groups (partial ungroup).
- drop_na(): domain-aware NA handling. Marks rows with NA in the specified columns (or in any column) as out-of-domain without removing them. Equivalent to filter(!is.na(col1), !is.na(col2), ...) and gives correct variance estimates for downstream analyses. Successive drop_na() calls AND their conditions together.
- subset(): physical row removal with surveycore_warning_physical_subset. Prefer filter() for subpopulation analyses.
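How select() can drop columns yet keep the design intact is sketched below: the requested columns are kept together with every design variable, and visible_vars records only what was requested. select_sketch() is a hypothetical helper, and the column order it produces is illustrative rather than a statement about surveytidy's ordering.

```r
# Hypothetical sketch of select(): requested columns are kept along with
# every design variable, and visible_vars records only what was requested.
# Column order in the result is illustrative.
select_sketch <- function(data, design_vars, keep) {
  kept <- union(keep, design_vars)
  list(data = data[kept], visible_vars = keep)
}

d <- data.frame(
  wt = c(1.5, 2.0), strata = c("a", "b"),
  age = c(30, 40), income = c(10, 20)
)
s <- select_sketch(d, design_vars = c("wt", "strata"), keep = "age")
names(s$data)           # age plus the design variables wt and strata
s$data[s$visible_vars]  # roughly what print() would show
```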
Statistical design
The key design decision in surveytidy is that filter() never removes rows. Removing rows from a survey design changes which units contribute to variance estimation and produces incorrect standard errors for subpopulation statistics. filter() instead writes a logical domain column (..surveycore_domain..) to @data. Phase 1 estimation functions will read this column to restrict calculations to the domain while retaining all rows for variance estimation.
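A small worked example shows why removing rows distorts the standard errors. It assumes an unstratified design with PSUs sampled with replacement and uses the standard estimator for a domain total, v = n/(n-1) * sum_i (z_i - mean(z))^2, where z_i is the weighted domain total in PSU i. The data and the helper domain_total_var() are made up for illustration and are not surveytidy or surveycore code.

```r
# Worked example, not surveytidy/surveycore code. Assumes an unstratified
# design whose PSUs are sampled with replacement. For a domain total the
# variance estimator is
#   v = n / (n - 1) * sum_i (z_i - mean(z))^2,
# where z_i is the weighted domain total inside PSU i and n is the number
# of PSUs in the design.
domain_total_var <- function(psu, w, y, in_domain) {
  z <- tapply(w * y * in_domain, psu, sum)  # out-of-domain rows add 0
  n <- length(z)
  c(total = sum(z), var = n / (n - 1) * sum((z - mean(z))^2))
}

psu <- c(1, 1, 2, 2, 3, 3, 4, 4)
w   <- rep(10, 8)
y   <- c(2, 4, 3, 5, 1, 1, 2, 2)
dom <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE)

# Domain approach: all four PSUs remain in the variance calculation.
keep_all <- domain_total_var(psu, w, y, dom)

# Physical subset: PSUs 3 and 4 have no domain rows and disappear.
phys <- domain_total_var(psu[dom], w[dom], y[dom], dom[dom])

rbind(keep_all, phys)
```

Both approaches give a domain total of 90, but once PSUs 3 and 4 are dropped the variance estimate falls from 3300 to 900, because the PSUs that contribute zero to the domain no longer count toward n or the spread of the z_i.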
Infrastructure
- dplyr_reconstruct.survey_base() ensures complex dplyr pipelines (joins, across(), internal slice operations) return survey objects rather than plain tibbles. Errors with surveycore_error_design_var_removed if a pipeline drops a design variable.
- Invariant 6 added to test_invariants(): every column name listed in @variables$visible_vars must exist in @data.
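Invariant 6 amounts to a set-difference check. The sketch below is a hypothetical stand-in, not the code in test_invariants(); plain arguments take the place of @data and @variables$visible_vars.

```r
# Hypothetical sketch of invariant 6; `data` and `visible_vars` stand in
# for @data and @variables$visible_vars.
check_visible_vars <- function(data, visible_vars) {
  absent <- setdiff(visible_vars, names(data))
  if (length(absent) > 0) {
    stop("visible_vars not found in data: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

d <- data.frame(age = c(30, 40), wt = c(1, 2))
check_visible_vars(d, "age")                     # passes silently
try(check_visible_vars(d, c("age", "income")))  # error: income is absent
```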
