semi_join() marks rows as in-domain when they have a match in y.
anti_join() marks rows as in-domain when they do NOT have a match in y.
Neither function removes rows or adds new columns — they are implemented as
domain operations, exactly like filter().
Usage
# S3 method for class 'survey_collection'
semi_join(x, y, ..., .if_missing_var = NULL)
# S3 method for class 'survey_collection'
anti_join(x, y, ..., .if_missing_var = NULL)
semi_join(x, y, by = NULL, copy = FALSE, ...)
anti_join(x, y, by = NULL, copy = FALSE, ...)
Arguments
- x
A survey_base object.
- y
A plain data frame. Must not be a survey object.
- ...
Additional arguments forwarded to the underlying dplyr function.
- .if_missing_var
Per-call override of collection@if_missing_var. One of "error" or "skip", or NULL (the default) to inherit the collection's stored value. See surveycore::set_collection_if_missing_var().
- by
A character vector of column names or a dplyr::join_by() specification. NULL uses all common column names.
- copy
Forwarded to the underlying dplyr function.
Value
A survey design object of the same type as x with the domain column
(..surveycore_domain..) updated. The row count is unchanged, and no columns
from y are added.
Details
Domain awareness
Unlike standard dplyr::semi_join() and dplyr::anti_join(), these
implementations never physically remove rows. Instead, unmatched rows (for
semi_join()) or matched rows (for anti_join()) are marked FALSE in the
..surveycore_domain.. column of @data, exactly as filter() does. Keeping
every row preserves the full design information, so variance estimates for
the domain remain valid.
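The difference between dropping rows and flagging them can be sketched in base R on a plain data frame. This only illustrates the semantics and is not the package's implementation; the column name in_domain is a hypothetical stand-in for ..surveycore_domain.., and df stands in for the @data slot.

```r
# Plain data frame standing in for the @data slot of a survey object
df <- data.frame(psu = paste0("psu_", 1:5), y1 = 1:5)
keepers <- data.frame(y1 = c(1, 3, 5))

# Subsetting (what a standard semi join does): unmatched rows are removed
subsetted <- df[df$y1 %in% keepers$y1, , drop = FALSE]

# Domain-style marking: every row stays and a logical flag records the match
# (`in_domain` is a hypothetical name used only for this sketch)
flagged <- df
flagged$in_domain <- flagged$y1 %in% keepers$y1

nrow(subsetted)    # 3 rows survive the subset
nrow(flagged)      # all 5 rows are kept
flagged$in_domain  # TRUE FALSE TRUE FALSE TRUE
```

The flagged version keeps all five PSUs, which is the property the domain approach relies on.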
Chaining
Multiple calls accumulate via AND: a row must satisfy every condition from
every filter(), semi_join(), and anti_join() call to remain in-domain.
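The AND accumulation can be sketched with a plain logical vector in base R. This is an assumption-laden illustration of the rule described above, not the package's code; in_domain and the dropouts data frame are hypothetical names introduced for the example.

```r
df <- data.frame(y1 = 1:5)
keepers <- data.frame(y1 = c(1, 3, 5))
dropouts <- data.frame(y1 = 5)  # hypothetical second lookup table

# Start with every row in-domain
in_domain <- rep(TRUE, nrow(df))

# A filter(y1 > 1)-style condition
in_domain <- in_domain & (df$y1 > 1)

# A semi_join()-style condition: the row must have a match in keepers
in_domain <- in_domain & (df$y1 %in% keepers$y1)

# An anti_join()-style condition: the row must NOT have a match in dropouts
in_domain <- in_domain & !(df$y1 %in% dropouts$y1)

in_domain  # FALSE FALSE TRUE FALSE FALSE
```

Only the row with y1 = 3 passes all three conditions, so it is the only one left in-domain.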
Survey collections
When called on a surveycore::survey_collection, both semi_join() and
anti_join() error unconditionally with class
surveytidy_error_collection_verb_unsupported. The semantics for joining
a plain data frame onto a multi-survey container are still being designed.
Apply the join inside a per-survey pipeline before constructing the
collection.
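Because the error carries a specific class, calling code can catch it selectively with tryCatch(). The sketch below uses a hypothetical stand-in function, unsupported_verb(), that signals a condition with the documented class; the message text is a placeholder, and in real use the call inside tryCatch() would be semi_join() or anti_join() on a collection.

```r
# Hypothetical stand-in that signals an error with the documented class.
# The message text is a placeholder, not the package's actual message.
unsupported_verb <- function() {
  cond <- structure(
    class = c(
      "surveytidy_error_collection_verb_unsupported",
      "error",
      "condition"
    ),
    list(message = "verb not supported on survey collections", call = NULL)
  )
  stop(cond)
}

result <- tryCatch(
  unsupported_verb(),
  # Handler is matched by condition class, so other errors still propagate
  surveytidy_error_collection_verb_unsupported = function(e) {
    # A caller might fall back to joining each survey individually here
    "handled"
  }
)

result
```

Matching on the class rather than the message keeps the handler robust if the wording of the error changes.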
See also
Other joins:
bind_cols(),
bind_rows(),
inner_join,
left_join,
right_join
Examples
# create a small survey object
df <- data.frame(
psu = paste0("psu_", 1:5),
strata = "s1",
fpc = 100,
wt = 1,
y1 = 1:5
)
d <- surveycore::as_survey(
df,
ids = psu,
weights = wt,
strata = strata,
fpc = fpc,
nest = TRUE
)
#> Warning: ! `strata` (strata) has only 1 unique value — stratification has no effect
keepers <- data.frame(y1 = c(1, 3, 5))
# semi_join: rows matching keepers stay in-domain
semi_join(d, keepers, by = "y1")
#>
#> ── Survey Design ───────────────────────────────────────────────────────────────
#> <survey_taylor> (Taylor series linearization)
#> Sample size: 5
#> Domain: 3 of 5 rows
#>
#> # A tibble: 5 × 6
#> psu strata fpc wt y1 ..surveycore_domain..
#> <chr> <chr> <dbl> <dbl> <int> <lgl>
#> 1 psu_1 s1 100 1 1 TRUE
#> 2 psu_2 s1 100 1 2 FALSE
#> 3 psu_3 s1 100 1 3 TRUE
#> 4 psu_4 s1 100 1 4 FALSE
#> 5 psu_5 s1 100 1 5 TRUE
# anti_join: rows matching keepers are marked out-of-domain
anti_join(d, keepers, by = "y1")
#>
#> ── Survey Design ───────────────────────────────────────────────────────────────
#> <survey_taylor> (Taylor series linearization)
#> Sample size: 5
#> Domain: 2 of 5 rows
#>
#> # A tibble: 5 × 6
#> psu strata fpc wt y1 ..surveycore_domain..
#> <chr> <chr> <dbl> <dbl> <int> <lgl>
#> 1 psu_1 s1 100 1 1 FALSE
#> 2 psu_2 s1 100 1 2 TRUE
#> 3 psu_3 s1 100 1 3 FALSE
#> 4 psu_4 s1 100 1 4 TRUE
#> 5 psu_5 s1 100 1 5 FALSE
