
Compute weighted frequencies, optionally grouped and/or across multiple variables
Source: R/get_freqs.R
get_freqs() computes weighted frequency tables for survey-style data. It supports:
- Plain data frames with an optional weight column
- survey.design objects (from the survey/srvyr ecosystem)
- Single-variable or multi-variable inputs
- Optional grouping variables
- Optional inclusion/exclusion of zero-count levels (see Limitations for survey multi-variable)
For single-variable inputs, the response column in the output retains the original variable
name. For multi-variable inputs, responses are pivoted to long format using names_to and
values_to.
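As a rough illustration of what a weighted frequency table contains, the base R sketch below sums weights per response level and divides by the total weight. It is not the package's implementation; the data frame and its column names (q1, wts) are made up for the illustration.

```r
# Hypothetical toy data; column names are illustrative only.
df <- data.frame(
  q1  = c("yes", "no", "yes", NA, "no", "no", "yes", "no"),
  wts = c(1, 2, 1, 1, 1, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Drop rows with a missing response, mirroring na.rm = TRUE.
obs <- df[!is.na(df$q1), ]

# Weighted count per response: sum the weights within each level.
n <- tapply(obs$wts, obs$q1, sum)   # no = 8, yes = 3

# Weighted share: each level's weight over the total weight.
pct <- n / sum(n)

freqs <- data.frame(
  q1  = names(n),
  n   = as.numeric(n),
  pct = round(as.numeric(pct), 3),
  stringsAsFactors = FALSE
)
print(freqs)
```

With unit weights (all wts equal to 1) the same computation reduces to plain row counts.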
Usage
get_freqs(
  data,
  x,
  group,
  wt,
  names_to = "names",
  values_to = "values",
  name_label,
  keep,
  drop_zero = FALSE,
  decimals = 3,
  na.rm = TRUE
)
Arguments
- data
  A data frame/tibble or a survey.design object.
- x
  Columns selecting one or more variables to tabulate. You can pass:
  - A bare column name (e.g., x = q1)
  - A tidyselect expression (e.g., x = dplyr::starts_with("q"))
  - For programmatic selection, use tidyselect::all_of(c("q1", "q2")).
- group
  Optional columns to group by. Accepts a tidyselect expression. If data is already a grouped_df, those groups are honored in addition to group.
- wt
  Optional weight column (numeric). Ignored for survey.design inputs, where weights come from the design. If omitted for data frames, unit weights are added internally.
- names_to
  Character scalar used only when x selects multiple variables; names the "item" column in the long output. Default is "names".
- values_to
  Character scalar used only when x selects multiple variables; names the response column in the long output. Default is "values".
- name_label
  Optional label to attach to the names_to column in multi-variable outputs (e.g., a question preface).
- keep
  Optional post-aggregation filter applied only to multi-variable outputs:
  - character vector: keep only rows where values_to is in this set
  - function: predicate on the values_to vector; returns logical mask (length nrow or scalar TRUE)
  - tidy expression: a dplyr-style filter expression evaluated in the result context
- drop_zero
  Logical; whether to drop zero-count rows from the output.
  - Default path (data.frame): combined with dplyr::count(.drop = drop_zero) to control inclusion of zero levels.
  - Survey path (survey.design):
    - Single-variable: zero-count response levels can be included when drop_zero = FALSE.
    - Multi-variable: zero-count levels are not materialized at this time; see "Limitations".
- decimals
  Integer number of decimal places for rounding counts (n). Percent (pct) is rounded to decimals + 2.
- na.rm
  Logical; whether to remove rows with missing values in x and group before computing frequencies. If removing NAs would leave zero rows, an informative error is raised.
Value
A tibble with columns:
- For single-variable inputs: [x variable], n, pct, and any grouping columns.
- For multi-variable inputs: [grouping columns if any], [names_to], [values_to], n, pct.
The result has class "adlgraphs_freqs" and common attributes:
- attr(., "dataset"): the original dataset
- attr(., "variable_label"), attr(., "variable_name")
- For grouped outputs: attr(., "group_names") and attr(., "group_labels")
- For multi-variable: attr(., "item_names"), attr(., "item_labels"), attr(., "x_expr")
Details
The keep argument is applied after aggregation in multi-variable outputs to filter rows
of the result based on the response column (values_to). It is ignored for single-variable
calls. Accepted forms:
- Character vector: keep = c("yes", "no")
- Function: keep = \(v) v %in% c("yes", "no") or any predicate that returns a logical vector of length nrow(result) or a single TRUE (no filtering). NA entries are dropped.
- Tidy expression: keep = .data[[values_to]] %in% c("yes","no") or simply resp != "skip" when values_to = "resp".
Note: For tidy expressions, the expression is evaluated in the context of the result tibble.
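The three accepted forms of keep can be pictured as three ways of building a row mask over the aggregated result. The base R sketch below is only an approximation of that idea under stated assumptions: the res data frame is a hypothetical stand-in for a multi-variable output, and the way get_freqs() dispatches on keep internally may differ.

```r
# Hypothetical aggregated result shaped like a multi-variable output
# (names_to = "item", values_to = "resp").
res <- data.frame(
  item = c("q1", "q1", "q2", "q2"),
  resp = c("no", "yes", "blue", "red"),
  n    = c(8, 3, 6, 4),
  stringsAsFactors = FALSE
)

# 1) Character vector: keep rows whose response is in the set.
keep_chr <- c("yes", "red")
by_set <- res[res$resp %in% keep_chr, ]

# 2) Function: a predicate over the response vector.
#    NA entries in the mask are treated as "do not keep" (assumption).
keep_fun <- function(v) grepl("e$", v)
mask <- keep_fun(res$resp)
by_fun <- res[!is.na(mask) & mask, ]

# 3) Expression: evaluated with the result's columns in scope.
keep_expr <- quote(resp != "no")
by_expr <- res[eval(keep_expr, envir = res), ]

print(by_set)   # rows for "yes" and "red"
print(by_fun)   # row for "blue"
print(by_expr)  # every row except "no"
```

Because the filter runs after aggregation, n and pct in the surviving rows are unchanged; keep only removes rows, it does not re-normalize percentages.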
Methods
- get_freqs.default(): Operates on data frames/tibbles. If wt is omitted, unit weights are used. Uses dplyr::count() with .drop = drop_zero; zero-count levels can be included when drop_zero = FALSE and the variables are factors with unused levels.
- get_freqs.survey.design(): Operates on survey.design objects. Weights are taken from the design. Grouping is honored inside low-level survey computations.
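Zero-count rows can only appear when a variable is a factor that declares levels not present in the data. The sketch below uses base R's xtabs() as an analogue of that behavior rather than the package's own dplyr::count() call; the data and the unused "skip" level are invented for the illustration.

```r
# Hypothetical data: "skip" is a declared level that never occurs.
d <- data.frame(
  q1  = factor(c("yes", "no", "yes", "no"), levels = c("no", "yes", "skip")),
  wts = c(1, 2, 1, 3)
)

# Weighted counts per level; xtabs() keeps unused factor levels by default.
counts <- xtabs(wts ~ q1, data = d)

# Analogue of drop_zero = FALSE: the unused level appears with n = 0.
with_zero <- as.data.frame(counts, responseName = "n")

# Analogue of drop_zero = TRUE: rows with a zero count are removed.
without_zero <- with_zero[with_zero$n > 0, ]

print(with_zero)     # no = 5, yes = 2, skip = 0
print(without_zero)  # no = 5, yes = 2
```

If q1 were a character vector instead of a factor, there would be no declared levels to expand, so no zero-count row could be produced either way.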
Limitations (survey.design with multiple variables)
For multi-variable survey.design inputs (x selects multiple variables), zero-count response
levels are not currently expanded. Results include only observed levels per item, regardless of
drop_zero. This differs from the default (non-survey) path. For single-variable
survey.design inputs, zero-count levels can be included when drop_zero = FALSE.
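If zero-count rows are needed for a multi-variable survey result, one possible workaround is to expand the result afterwards against a list of expected levels. The sketch below is an assumption-laden illustration, not a feature of get_freqs(): res is a hypothetical result, and levels_by_item is a helper list the caller would have to supply.

```r
# Hypothetical multi-variable result in which "red" was never observed for q2.
res <- data.frame(
  item = c("q1", "q1", "q2"),
  resp = c("no", "yes", "blue"),
  n    = c(8, 3, 6),
  pct  = c(0.727, 0.273, 1),
  stringsAsFactors = FALSE
)

# Caller-supplied expected levels per item (an assumption of this sketch).
levels_by_item <- list(q1 = c("no", "yes"), q2 = c("blue", "red"))

# Build every item/response combination that should appear.
grid <- do.call(rbind, lapply(names(levels_by_item), function(it) {
  data.frame(item = it, resp = levels_by_item[[it]], stringsAsFactors = FALSE)
}))

# Left-join the observed rows onto the full grid, then fill the gaps with 0.
full <- merge(grid, res, by = c("item", "resp"), all.x = TRUE)
full$n[is.na(full$n)] <- 0
full$pct[is.na(full$pct)] <- 0

print(full)
```

Note that merge() returns a plain data frame sorted by the join keys, so any class or attributes on the original result would need to be restored separately.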
Errors and edge cases
- If x selects no columns, an error is raised.
- If na.rm = TRUE and removing NAs would leave zero rows for the selected x/group variables, an error is raised.
- If wt is provided (default path) but the column does not exist or is non-numeric, an error is raised.
- The survey path includes checks for required variables and emptiness after NA removal.
Examples
# Basic example (data frame)
df <- tibble::tibble(
  grp = rep(c("A", "B"), each = 4),
  q1 = c("yes", "no", "yes", NA, "no", "no", "yes", "no"),
  wts = c(1, 2, 1, 1, 1, 3, 1, 2)
)
get_freqs(df, x = q1, wt = wts, na.rm = TRUE)
#> # A tibble: 2 × 3
#> q1 n pct
#> <chr> <dbl> <dbl>
#> 1 no 8 0.727
#> 2 yes 3 0.273
# Grouped
get_freqs(df, x = q1, group = grp, wt = wts, na.rm = TRUE)
#> # A tibble: 4 × 4
#> grp q1 n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 A no 2 0.5
#> 2 A yes 2 0.5
#> 3 B no 6 0.857
#> 4 B yes 1 0.143
# Multi-variable (data frame)
df$q2 <- c("red", "red", "blue", "blue", "red", "blue", "blue", NA)
res_multi <- get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  na.rm = TRUE
)
res_multi
#> # A tibble: 4 × 4
#> item resp n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 q1 no 8 0.727
#> 2 q1 yes 3 0.273
#> 3 q2 blue 6 0.6
#> 4 q2 red 4 0.4
# ---- keep examples (multi-variable, data frame) ----
# 1) keep as a character vector: retain only "yes" responses across items
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = c("yes"),
  na.rm = TRUE
)
#> # A tibble: 1 × 4
#> item resp n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 q1 yes 3 0.273
# 2) keep as a function: retain values ending with 'e' (e.g., "blue")
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = function(v) grepl("e$", v),
  na.rm = TRUE
)
#> # A tibble: 1 × 4
#> item resp n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 q2 blue 6 0.6
# 3) keep as a tidy expression: drop a specific response level
# Here we keep everything except "no"
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = resp != "no",
  na.rm = TRUE
)
#> # A tibble: 3 × 4
#> item resp n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 q1 yes 3 0.273
#> 2 q2 blue 6 0.6
#> 3 q2 red 4 0.4
# 4) keep with grouping: filter within groups after aggregation
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  group = grp,
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = .data$resp %in% c("yes", "red"),
  na.rm = TRUE
)
#> # A tibble: 4 × 5
#> grp item resp n pct
#> <fct> <fct> <chr> <dbl> <dbl>
#> 1 A q1 yes 2 0.5
#> 2 A q2 red 3 0.6
#> 3 B q1 yes 1 0.143
#> 4 B q2 red 1 0.2
# 5) keep function returning TRUE (no-op): leaves result unchanged
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = function(v) TRUE,
  na.rm = TRUE
)
#> # A tibble: 4 × 4
#> item resp n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 q1 no 8 0.727
#> 2 q1 yes 3 0.273
#> 3 q2 blue 6 0.6
#> 4 q2 red 4 0.4
# Survey design (single variable)
dff <- tibble::tibble(
  grp = rep(c("A", "B"), each = 4),
  q1 = c("yes", "no", "yes", NA, "no", "no", "yes", "no"),
  q2 = c("red", "red", "blue", "blue", "red", "blue", "blue", NA),
  wts = c(1, 2, 1, 1, 1, 3, 1, 2)
)
dsn <- survey::svydesign(ids = ~1, weights = ~wts, data = dff)
get_freqs(dsn, x = q1, na.rm = TRUE)
#> # A tibble: 2 × 3
#> q1 n pct
#> <chr> <dbl> <dbl>
#> 1 no 8 0.727
#> 2 yes 3 0.273
# Survey design (multi-variable) — limitation:
# Zero-count levels are not expanded for multi-variable survey inputs.
get_freqs(
  dsn,
  x = tidyselect::all_of(c("q1", "q2")),
  names_to = "item",
  values_to = "resp",
  na.rm = TRUE
)
#> # A tibble: 4 × 4
#> item resp n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 q1 no 4 0.571
#> 2 q1 yes 3 0.429
#> 3 q2 blue 4 0.571
#> 4 q2 red 3 0.429
# Note: keep is also supported for survey multi-variable outputs
get_freqs(
  dsn,
  x = tidyselect::all_of(c("q1", "q2")),
  names_to = "item",
  values_to = "resp",
  keep = resp %in% c("yes", "red"),
  na.rm = TRUE
)
#> # A tibble: 2 × 4
#> item resp n pct
#> <fct> <chr> <dbl> <dbl>
#> 1 q1 yes 3 0.429
#> 2 q2 red 3 0.429