
Compute survey means with CIs, SD, and N for grouped complex designs
Source: R/get_means.R
get_means() computes design-based means and confidence intervals for a single
numeric variable from complex survey objects, optionally grouped by one or more
variables. It supports survey.design, svyrep.design, srvyr::tbl_svy, and
plain data.frame inputs (with a simpler weighted/unweighted implementation).
Usage
get_means(
data,
x,
group = NULL,
wt = NULL,
decimals = 3,
na.rm = TRUE,
conf_level = 0.95,
df = Inf
)
Arguments
- data
A survey object (survey.design, svyrep.design, or srvyr::tbl_svy), or a plain data.frame.
- x
The variable for which you want the mean. It must be numeric and can be supplied as a string or a symbol (e.g., "var" or var).
- group
<tidy-select> A selection of columns to group the data by. This works much like .by from dplyr (for more information, see ?dplyr_by). It can also be a character vector. An external character vector must be wrapped in curly brackets ({{}}).
In addition, grouped data can be piped in via dplyr::group_by() or srvyr::group_by(). If data is a grouped_df and group is provided, get_means() combines the variable(s) used in group_by() with the variable(s) supplied in group to calculate means.
- wt
Optional weights column name (for data.frame method only). Ignored for survey objects, since weights/replicates are stored in the design.
- decimals
Number of decimals to round the results to. Default is 3.
- na.rm
Logical; whether to remove rows with missing values in x and group before computing means. Default is TRUE.
- conf_level
Confidence level for the Wald CIs. Default is 0.95.
- df
Degrees of freedom for t critical values. Defaults to Inf (z-based). For designs, you may prefer survey::degf(design); for replicate designs, Inf is conventional.
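The df argument can be illustrated with the survey package directly. The following is a minimal sketch, assuming the api example data shipped with survey as a stand-in design; it is not taken from this package's own examples. It compares the t-based critical value obtained from survey::degf() with the z-based default.

```r
library(survey)

# Example data shipped with the survey package (a stand-in, not test_data)
data(api)

# One-stage cluster sample of school districts
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Design degrees of freedom, as suggested for the df argument
design_df <- degf(dclus1)

# Critical values for a 95% interval: t-based versus z-based (df = Inf)
crit_t <- qt(0.975, df = design_df)
crit_z <- qt(0.975, df = Inf)

c(design_df = design_df, crit_t = crit_t, crit_z = crit_z)

# Following the documented signature, the finite df could then be passed on:
# get_means(dclus1, api00, df = design_df)
```

With few primary sampling units the t-based critical value is noticeably larger than the z-based one, so the resulting intervals are wider.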
Value
A tibble with one row per group (or a single row if ungrouped) and the following columns:
- grouping columns (preserving factor/label semantics for display)
- mean: design-based mean of x
- sd: weighted population standard deviation within each group
- n: sum of weights within each group (weighted N)
- conf_low, conf_high: Wald confidence interval bounds
The result has class "adlgraphs_means" and common attributes:
- attr(., "dataset"): the original dataset
- attr(., "variable_label"), attr(., "variable_name")
- For grouped outputs: attr(., "group_names") and attr(., "group_labels")
- For multi-variable: attr(., "item_names"), attr(., "item_labels"), attr(., "x_expr")
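To make the mean, sd, and n columns concrete, here is a small base R sketch. It assumes the population form of the weighted variance (dividing by the sum of weights); the exact formula used inside get_means() is not shown on this page, so treat this as an illustration only. The vectors x and w are made-up values.

```r
# Hypothetical values and weights, for illustration only
x <- c(2, 4, 6, 8)
w <- c(1, 2, 1, 2)

# Weighted N: the sum of weights
n_w <- sum(w)

# Weighted mean
mean_w <- sum(w * x) / n_w

# Weighted population SD (assumed form: divide by the sum of weights)
sd_w <- sqrt(sum(w * (x - mean_w)^2) / n_w)

c(mean = mean_w, sd = sd_w, n = n_w)
```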
Details
Survey objects are subset to remove NAs in x and grouping variables before grouping, preserving design alignment (ids/strata/weights/replicates).
Means and standard errors are computed via survey::svymean(). CIs are Wald intervals using either z (df = Inf) or t (finite df) critical values.
sd is the weighted population SD and n is the sum of weights. Both are used in the SE and CI calculations.
Group output order follows factor levels for factors, alphabetical for characters, and ascending for numerics.
Value labels for grouping variables are preserved in the output by factorizing keys against original labels; variable-level labels are copied to output columns.
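The Wald interval described above can be sketched in a few lines of base R. The helper wald_ci() is hypothetical and is not part of this package; the estimate and standard error passed to it are made-up numbers. It relies on the fact that qt() with df = Inf returns the normal quantile.

```r
# Hypothetical helper: Wald CI from an estimate and its standard error
wald_ci <- function(est, se, conf_level = 0.95, df = Inf) {
  alpha <- 1 - conf_level
  # qt() with df = Inf gives the z critical value; finite df gives t
  crit <- qt(1 - alpha / 2, df = df)
  c(conf_low = est - crit * se, conf_high = est + crit * se)
}

# z-based interval (df = Inf) for a made-up estimate and SE
ci_z <- wald_ci(est = 4, se = 0.25)

# t-based interval with a finite, made-up df
ci_t <- wald_ci(est = 4, se = 0.25, df = 14)

ci_z
ci_t
```

The t-based interval is always at least as wide as the z-based one, which is why a finite df matters most for designs with few sampling units.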
Methods
- get_means.survey.design(): Complex survey designs. For more information, see survey::svydesign() and srvyr::as_survey_design().
- get_means.svyrep.design(): Replicate-weight survey designs. For more information, see survey::as.svrepdesign() and srvyr::as_survey_rep().
- get_means.tbl_svy(): Unwraps srvyr::tbl_svy and dispatches to the appropriate survey method. For more information, see srvyr::as_survey_design() and srvyr::as_survey_rep().
- get_means.data.frame(): Non-design summary with optional weights.
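As a rough guide to which method applies, the sketch below builds one object of each survey class with the constructors named above. It assumes the api example data from the survey package as a stand-in; the commented get_means() calls follow the documented signature but are not run here.

```r
library(survey)
library(srvyr)

# Example data shipped with the survey package (a stand-in, not test_data)
data(api)

# survey.design: built with survey::svydesign()
des <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# svyrep.design: replicate weights derived from the design above
rep_des <- as.svrepdesign(des)

# tbl_svy: the srvyr wrapper around the same design specification
tbl_des <- as_survey_design(apiclus1, ids = dnum, weights = pw, fpc = fpc)

# Check which class each object carries
inherits(des, "survey.design")
inherits(rep_des, "svyrep.design")
inherits(tbl_des, "tbl_svy")

# Each object could then be passed as `data`, for example:
# get_means(des, api00, stype)
# get_means(rep_des, api00, stype)
# get_means(tbl_des, api00, stype)
```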
Examples
# Load packages (test_data and make_dicho() are assumed to come from adlgraphs)
library(adlgraphs)
library(dplyr)
# Let's calculate the overall average score for trad_n
get_means(test_data, trad_n)
#> # A tibble: 1 × 5
#>    mean    sd     n conf_low conf_high
#> * <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1  4.22  3.87   250     3.74      4.70
# it also works if x is a string
get_means(test_data, "trad_n")
#> # A tibble: 1 × 5
#>    mean    sd     n conf_low conf_high
#> * <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1  4.22  3.87   250     3.74      4.70
# Let's do that again but add weights
get_means(test_data, trad_n, wt = wts)
#> # A tibble: 1 × 5
#>    mean    sd     n conf_low conf_high
#> * <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1  3.97  3.76  245.     3.50      4.45
# the wt argument can also be in quotes like this
get_means(test_data, "trad_n", wt = "wts")
#> # A tibble: 1 × 5
#>    mean    sd     n conf_low conf_high
#> * <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1  3.97  3.76  245.     3.50      4.45
# Now let's do the average score for different education levels
get_means(test_data, trad_n, edu_f, wts)
#> # A tibble: 4 × 6
#>   edu_f                mean    sd     n conf_low conf_high
#> * <fct>               <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1 High School or Less  4.14  3.59  69.8     3.29      5.00
#> 2 Some College         3.93  3.96  86.3     3.08      4.78
#> 3 Bachelor's Degree    4.23  3.83  49.7     3.14      5.32
#> 4 Graduate Degree      3.45  3.41  39.4     2.35      4.55
# it also works with quotes
get_means(test_data, "trad_n", "edu_f", "wts")
#> # A tibble: 4 × 6
#>   edu_f                mean    sd     n conf_low conf_high
#> * <fct>               <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1 High School or Less  4.14  3.59  69.8     3.29      5.00
#> 2 Some College         3.93  3.96  86.3     3.08      4.78
#> 3 Bachelor's Degree    4.23  3.83  49.7     3.14      5.32
#> 4 Graduate Degree      3.45  3.41  39.4     2.35      4.55
# you can also pipe in the `data` argument if you want to do some data
# transformations before you calculate the means. For example, say you want
# to compare the means of `trad_n` among people who agreed vs disagreed with
# the variable `top`:
test_data %>%
  mutate(top_f2 = make_dicho(top)) %>%
  get_means(trad_n, top_f2, wts)
#> # A tibble: 2 × 6
#>   top_f2    mean    sd     n conf_low conf_high
#> * <fct>    <dbl> <dbl> <dbl>    <dbl>     <dbl>
#> 1 Agree     5.08  3.83  111.     4.36      5.80
#> 2 Disagree  3.05  3.43  134.     2.46      3.64