
Compute weighted frequencies, optionally grouped and/or across multiple variables
Source: R/get_freqs.R
get_freqs() computes weighted frequency tables for survey-style data. It supports:
- Plain data frames with an optional weight column
- survey.design objects (from the survey/srvyr ecosystem)
- Single-variable or multi-variable inputs
- Optional grouping variables
- Optional inclusion/exclusion of zero-count levels (see Limitations for multi-variable survey inputs)
For single-variable inputs, the response column in the output retains the original variable
name. For multi-variable inputs, responses are pivoted to long format using names_to and values_to.
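As a rough illustration of what a weighted frequency table contains, the sketch below computes one by hand in base R for a single variable. It is not the package's implementation; the object names (manual_freqs, ok) are made up for this example, and the data mirror the first example in the Examples section.

```r
# Toy data mirroring the first example on this page.
q1  <- c("yes", "no", "yes", NA, "no", "no", "yes", "no")
wts <- c(1, 2, 1, 1, 1, 3, 1, 2)

# Drop rows with a missing response (what na.rm = TRUE is described as doing).
ok <- !is.na(q1)

# Weighted count per response: the sum of weights within each level.
n <- tapply(wts[ok], q1[ok], sum)

# Weighted share: each count divided by the total weight.
pct <- n / sum(n)

# Assemble a small table (hypothetical name, for illustration only).
manual_freqs <- data.frame(
  q1  = names(n),
  n   = as.numeric(n),
  pct = round(as.numeric(pct), 3),
  stringsAsFactors = FALSE
)
manual_freqs
```

The counts (8 for "no", 3 for "yes") and shares (0.727, 0.273) agree with the output shown for the basic data frame example.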
Usage
get_freqs(
  data,
  x,
  group,
  wt,
  names_to = "names",
  values_to = "values",
  name_label,
  keep,
  drop_zero = FALSE,
  decimals = 3,
  na.rm = TRUE
)
Arguments
- data
A data frame/tibble or a survey.design object.
- x
One or more columns to tabulate. You can pass:
  - A bare column name (e.g., x = q1)
  - A tidyselect expression (e.g., x = dplyr::starts_with("q"))
  - For programmatic selection, use tidyselect::all_of(c("q1", "q2")).
- group
Optional columns to group by. Accepts a tidyselect expression. If data is already a grouped_df, those groups are honored in addition to group.
- wt
Optional weight column (numeric). Ignored for survey.design inputs, where weights come from the design. If omitted for data frames, unit weights are added internally.
- names_to
Character scalar used only when x selects multiple variables; names the “item” column in the long output. Default is "names".
- values_to
Character scalar used only when x selects multiple variables; names the response column in the long output. Default is "values".
- name_label
Optional label to attach to the names_to column in multi-variable outputs (e.g., a question preface).
- keep
Optional post-aggregation filter applied only to multi-variable outputs:
  - character vector: keep only rows where values_to is in this set
  - function: predicate on the values_to vector; returns a logical mask (of length nrow of the result, or a scalar TRUE)
  - tidy expression: a dplyr-style filter expression evaluated in the result context
- drop_zero
Logical; whether to drop zero-count rows from the output.
  - Default path (data.frame): passed to dplyr::count() as .drop = drop_zero to control inclusion of zero levels.
  - Survey path (survey.design):
    - Single-variable: zero-count response levels can be included when drop_zero = FALSE.
    - Multi-variable: zero-count levels are not materialized at this time; see “Limitations”.
- decimals
Integer number of decimal places for rounding counts (n). Percent (pct) is rounded to decimals + 2.
- na.rm
Logical; whether to remove rows with missing values in x and group before computing frequencies. If removing NAs would leave zero rows, an informative error is raised.
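The rounding rule for decimals can be pictured with a small base R sketch. It assumes that pct is computed from the unrounded counts and then rounded separately; the source does not state the order of operations, so treat that as an assumption. The weighted counts are invented for illustration.

```r
# Illustrative (made-up) weighted counts with fractional weights.
n <- c(no = 8.04567, yes = 2.95433)

# A non-default setting, to make the two precisions easy to tell apart.
decimals <- 1

# Assumption: shares come from the unrounded counts.
pct <- n / sum(n)

# Counts are rounded to `decimals` places ...
n_rounded <- round(n, decimals)

# ... and shares to `decimals + 2` places, as the argument description states.
pct_rounded <- round(pct, decimals + 2)

n_rounded
pct_rounded
```

With decimals = 1 the counts keep one decimal place while the shares keep three, which is why pct remains informative even when counts are rounded coarsely.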
Value
A tibble with columns:
- For single-variable inputs: [x variable], n, pct, and any grouping columns.
- For multi-variable inputs: [grouping columns if any], [names_to], [values_to], n, pct.
The result has class "adlgraphs_freqs" and common attributes:
- attr(., "dataset"): the original dataset
- attr(., "variable_label"), attr(., "variable_name")
- For grouped outputs: attr(., "group_names") and attr(., "group_labels")
- For multi-variable: attr(., "item_names"), attr(., "item_labels"), attr(., "x_expr")
Details
The keep argument is applied after aggregation in multi-variable outputs to filter rows
of the result based on the response column (values_to). It is ignored for single-variable
calls. Accepted forms:
- Character vector: keep = c("yes", "no")
- Function: keep = \(v) v %in% c("yes", "no") or any predicate that returns a logical vector of length nrow(result) or a single TRUE (no filtering). Rows with NA entries in the mask are dropped.
- Tidy expression: keep = .data[[values_to]] %in% c("yes","no") or simply resp != "skip" when values_to = "resp".
Note: For tidy expressions, the expression is evaluated in the context of the result tibble.
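The character-vector and function forms described above can be sketched in base R as follows. apply_keep() is a hypothetical helper written for this illustration, not a function exported by the package, and it deliberately leaves out the tidy-expression form. The res table is typed in by hand to resemble a multi-variable result.

```r
# Hypothetical helper (not part of the package): filter an aggregated
# result by its response column, following the rules described above.
apply_keep <- function(result, keep, values_to = "values") {
  v <- result[[values_to]]
  if (is.character(keep)) {
    # Character vector: keep rows whose response is in the set.
    mask <- v %in% keep
  } else if (is.function(keep)) {
    mask <- keep(v)
    # A single TRUE means "no filtering".
    if (length(mask) == 1L && isTRUE(mask)) mask <- rep(TRUE, nrow(result))
    stopifnot(is.logical(mask), length(mask) == nrow(result))
    # NA entries in the mask are dropped.
    mask[is.na(mask)] <- FALSE
  } else {
    stop("this sketch handles only character vectors and functions")
  }
  result[mask, , drop = FALSE]
}

# Hand-typed stand-in for a multi-variable result.
res <- data.frame(
  item = c("q1", "q1", "q2", "q2"),
  resp = c("no", "yes", "blue", "red"),
  n    = c(8, 3, 6, 4),
  stringsAsFactors = FALSE
)

kept_chr  <- apply_keep(res, c("yes"), values_to = "resp")
kept_fun  <- apply_keep(res, function(v) grepl("e$", v), values_to = "resp")
kept_noop <- apply_keep(res, function(v) TRUE, values_to = "resp")
```

Because the filter runs after aggregation, n (and pct in the real output) are left as computed on the full data; keep only decides which rows are shown.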
Methods
- get_freqs.default(): Operates on data frames/tibbles. If wt is omitted, unit weights are used. Uses dplyr::count() with .drop = drop_zero; zero-count levels can be included when drop_zero = FALSE and the variables are factors with unused levels.
- get_freqs.survey.design(): Operates on survey.design objects. Weights are taken from the design. Grouping is honored inside low-level survey computations.
Limitations (survey.design with multiple variables)
For multi-variable survey.design inputs (x selects multiple variables), zero-count response
levels are not currently expanded. Results include only observed levels per item, regardless of
drop_zero. This differs from the default (non-survey) path. For single-variable survey.design
inputs, zero-count levels can be included when drop_zero = FALSE.
Errors and edge cases
- If x selects no columns, an error is raised.
- If na.rm = TRUE and removing NAs would leave zero rows for the selected x/group variables, an error is raised.
- If wt is provided (default path) but the column does not exist or is non-numeric, an error is raised.
- The survey path includes checks for required variables and for emptiness after NA removal.
Examples
# Basic example (data frame)
df <- tibble::tibble(
  grp = rep(c("A", "B"), each = 4),
  q1 = c("yes", "no", "yes", NA, "no", "no", "yes", "no"),
  wts = c(1, 2, 1, 1, 1, 3, 1, 2)
)
get_freqs(df, x = q1, wt = wts, na.rm = TRUE)
#> # A tibble: 2 × 3
#>   q1        n   pct
#>   <chr> <dbl> <dbl>
#> 1 no        8 0.727
#> 2 yes       3 0.273
# Grouped
get_freqs(df, x = q1, group = grp, wt = wts, na.rm = TRUE)
#> # A tibble: 4 × 4
#>   grp   q1        n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 A     no        2 0.5
#> 2 A     yes       2 0.5
#> 3 B     no        6 0.857
#> 4 B     yes       1 0.143
# Multi-variable (data frame)
df$q2 <- c("red", "red", "blue", "blue", "red", "blue", "blue", NA)
res_multi <- get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  na.rm = TRUE
)
res_multi
#> # A tibble: 4 × 4
#>   item  resp      n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 q1    no        8 0.727
#> 2 q1    yes       3 0.273
#> 3 q2    blue      6 0.6
#> 4 q2    red       4 0.4
# ---- keep examples (multi-variable, data frame) ----
# 1) keep as a character vector: retain only "yes" responses across items
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = c("yes"),
  na.rm = TRUE
)
#> # A tibble: 1 × 4
#>   item  resp      n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 q1    yes       3 0.273
# 2) keep as a function: retain values ending with 'e' (e.g., "blue")
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = function(v) grepl("e$", v),
  na.rm = TRUE
)
#> # A tibble: 1 × 4
#>   item  resp      n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 q2    blue      6   0.6
# 3) keep as a tidy expression: drop a specific response level
# Here we keep everything except "no"
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = resp != "no",
  na.rm = TRUE
)
#> # A tibble: 3 × 4
#>   item  resp      n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 q1    yes       3 0.273
#> 2 q2    blue      6 0.6
#> 3 q2    red       4 0.4
# 4) keep with grouping: filter within groups after aggregation
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  group = grp,
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = .data$resp %in% c("yes", "red"),
  na.rm = TRUE
)
#> # A tibble: 4 × 5
#>   grp   item  resp      n   pct
#>   <fct> <fct> <chr> <dbl> <dbl>
#> 1 A     q1    yes       2 0.5
#> 2 A     q2    red       3 0.6
#> 3 B     q1    yes       1 0.143
#> 4 B     q2    red       1 0.2
# 5) keep function returning TRUE (no-op): leaves result unchanged
get_freqs(
  df,
  x = tidyselect::all_of(c("q1", "q2")),
  wt = wts,
  names_to = "item",
  values_to = "resp",
  keep = function(v) TRUE,
  na.rm = TRUE
)
#> # A tibble: 4 × 4
#>   item  resp      n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 q1    no        8 0.727
#> 2 q1    yes       3 0.273
#> 3 q2    blue      6 0.6
#> 4 q2    red       4 0.4
# Survey design (single variable)
dff <- tibble::tibble(
  grp = rep(c("A", "B"), each = 4),
  q1 = c("yes", "no", "yes", NA, "no", "no", "yes", "no"),
  q2 = c("red", "red", "blue", "blue", "red", "blue", "blue", NA),
  wts = c(1, 2, 1, 1, 1, 3, 1, 2)
)
dsn <- survey::svydesign(ids = ~1, weights = ~wts, data = dff)
get_freqs(dsn, x = q1, na.rm = TRUE)
#> # A tibble: 2 × 3
#>   q1        n   pct
#>   <chr> <dbl> <dbl>
#> 1 no        8 0.727
#> 2 yes       3 0.273
# Survey design (multi-variable) — limitation:
# Zero-count levels are not expanded for multi-variable survey inputs.
get_freqs(
  dsn,
  x = tidyselect::all_of(c("q1", "q2")),
  names_to = "item",
  values_to = "resp",
  na.rm = TRUE
)
#> # A tibble: 4 × 4
#>   item  resp      n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 q1    no        4 0.571
#> 2 q1    yes       3 0.429
#> 3 q2    blue      4 0.571
#> 4 q2    red       3 0.429
# Note: keep is also supported for survey multi-variable outputs
get_freqs(
  dsn,
  x = tidyselect::all_of(c("q1", "q2")),
  names_to = "item",
  values_to = "resp",
  keep = resp %in% c("yes", "red"),
  na.rm = TRUE
)
#> # A tibble: 2 × 4
#>   item  resp      n   pct
#>   <fct> <chr> <dbl> <dbl>
#> 1 q1    yes       3 0.429
#> 2 q2    red       3 0.429