A 27-variable extract from the 2024 General Social Survey (GSS), one of the longest-running sociological surveys in the United States (fielded annually or biennially since 1972). All 3,309 respondents from the 2024 cross-section are included.
Format
A data frame with 3,309 rows and 27 variables:
- vpsu
Variance primary sampling unit. Use as the cluster ID for variance estimation.
- vstrat
Variance stratum. Use as the stratification variable.
- wtssps
Person post-stratification weight. Standard analysis weight.
- wtssnrps
Person post-stratification weight adjusted for differential non-response. Preferred when non-response bias is a concern.
- id
Respondent ID. Unique case identifier.
- year
Survey year (all 2024 in this extract).
- ballot
Ballot form (A, B, C, or D), stored as a numeric code. Only codes 1 to 3 occur in this extract. The GSS uses a split-ballot design; not all questions appear on every ballot. Inapplicable items are coded -100.
- age
Age in years (89 = 89 or older).
- sex
Sex: 1 = male, 2 = female.
- race
Race: 1 = white, 2 = black, 3 = other.
- hispanic
Hispanic origin: 1 = not Hispanic; 2–50 = specific Hispanic origin.
- educ
Highest year of school completed (0–20 years).
- degree
Highest degree: 0 = less than HS, 1 = high school, 2 = associate, 3 = bachelor's, 4 = graduate.
- income16
Total family income (26 categories from < $1,000 to $170,000+).
- marital
Marital status: 1 = married, 2 = widowed, 3 = divorced, 4 = separated, 5 = never married.
- wrkstat
Labor force status: 1 = full time, 2 = part time, 3 = temporarily not working, 4 = unemployed, 5 = retired, 6 = in school, 7 = keeping house, 8 = other.
- hrs1
Hours worked last week (for employed respondents only).
- adults
Number of adults in household (8 = 8 or more).
- partyid
Party identification: 0 = strong Democrat, 3 = Independent, 6 = strong Republican, 7 = other party.
- polviews
Political views: 1 = extremely liberal, 7 = extremely conservative.
- happy
General happiness: 1 = very happy, 2 = pretty happy, 3 = not too happy.
- health
Self-rated health: 1 = excellent, 2 = good, 3 = fair, 4 = poor.
- trust
Social trust: 1 = most people can be trusted, 2 = can't be too careful, 3 = depends.
- natfare
Government spending on welfare: 1 = too little, 2 = about right, 3 = too much.
- abany
Abortion for any reason: 1 = yes, 2 = no.
- attend
Religious service attendance: 0 = never, 8 = several times a week.
- relig
Religious preference: 1 = Protestant, 2 = Catholic, 3 = Jewish, 4 = none, and others.
Source
NORC at the University of Chicago. General Social Survey 2024.
https://gss.norc.org (free account required to download raw data;
the processed .rda is included in the package).
Prepared by data-raw/prepare-gss-2024.R.
Details
Survey design: Stratified multi-stage cluster sample. Use Taylor series linearization for variance estimation:
svy <- as_survey(gss_2024,
  ids = vpsu,
  strata = vstrat,
  weights = wtssps # or wtssnrps for the non-response-adjusted weight
)
Missing value codes: The GSS uses a consistent system of negative integer codes for missing data across all variables:
| Code | Meaning |
|------|---------|
| -100 | Inapplicable (question not asked of this respondent) |
| -99 | No answer |
| -98 | Don't know |
| -97 | Skipped on web |
| -90 | Refused |
These codes are stored as value labels on every column (check
attr(gss_2024$happy, "labels")). Recode them to NA before analysis.
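The recoding step can be sketched in base R as follows. This is a minimal illustration on a small made-up data frame (`toy`) standing in for gss_2024, so it runs without the package. It assumes that every negative value in a substantive column is a missing-value code, as in the table above; the helper name `recode_missing` is hypothetical.

```r
# Toy stand-in for gss_2024 (hypothetical values, not real GSS data)
toy <- data.frame(
  id    = 1:6,
  happy = c(1, 2, -100, 3, -99, 2),
  age   = c(34, -98, 89, 51, 27, -90)
)

# Hypothetical helper: turn every negative code in a numeric vector into NA
recode_missing <- function(x) {
  x[!is.na(x) & x < 0] <- NA
  x
}

# Apply to the substantive columns only; id is left untouched
vars <- c("happy", "age")
toy[vars] <- lapply(toy[vars], recode_missing)

# Number of missing values per recoded column
colSums(is.na(toy[vars]))
```

With the real data, `vars` would name the substantive columns of gss_2024; the design variables and weights do not need recoding.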
Split-ballot design: The ballot variable indicates which question
module a respondent received. Variables asked only on some ballots will
have -100 (Inapplicable) for respondents on other ballots.
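One way to see which ballots carried a given item is to cross-tabulate ballot against the -100 code. The sketch below uses a made-up data frame (`toy`); the pattern in which abany is missing from ballot 2 is invented for illustration and says nothing about the real questionnaire.

```r
# Toy stand-in for gss_2024 (hypothetical values, not real GSS data)
toy <- data.frame(
  ballot = c(1, 1, 2, 2, 3, 3),
  abany  = c(1, 2, -100, -100, 2, -99)
)

# Cross-tabulate ballot form against "was the item inapplicable?"
asked <- table(ballot = toy$ballot, inapplicable = toy$abany == -100)
asked

# Ballots on which at least one respondent was actually asked the item
ballots_asked <- sort(unique(toy$ballot[toy$abany != -100]))
ballots_asked
```

A ballot whose rows all fall in the TRUE column did not include the item, so estimates for that item should be based on the remaining ballots only.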
Metadata:
All columns carry variable labels and value labels as R attributes, taken from the
original SPSS file. Calling as_survey() automatically extracts them into
surveycore's metadata system.
- Variable labels ("label" attribute): A human-readable description of each column. Example: attr(gss_2024$happy, "label") returns "general happiness".
- Value labels ("labels" attribute): A named numeric vector mapping each code to its meaning, including all missing-value codes. Example: attr(gss_2024$happy, "labels") returns entries for very happy, pretty happy, not too happy, and the negative missing codes.
Examples
# Variables in the dataset
names(gss_2024)
#> [1] "vpsu" "vstrat" "wtssps" "wtssnrps" "ballot" "year"
#> [7] "id" "age" "sex" "race" "hispanic" "educ"
#> [13] "degree" "income16" "marital" "wrkstat" "hrs1" "adults"
#> [19] "partyid" "polviews" "happy" "health" "trust" "natfare"
#> [25] "abany" "attend" "relig"
# Create survey design
# svy <- as_survey(gss_2024, ids = vpsu, strata = vstrat, weights = wtssps)
# Inspect variable label
attr(gss_2024$happy, "label")
#> [1] "general happiness"
# Inspect value labels (includes GSS missing-value codes)
attr(gss_2024$happy, "labels")
#> iap no answer
#> -100 -99
#> don't know skipped on web
#> -98 -97
#> see codebook uncodeable
#> -96 -95
#> not imputable refused
#> -94 -90
#> not available in this release not available in this year
#> -80 -70
#> I don't have a job dk, na, iap
#> -60 -40
#> very happy pretty happy
#> 1 2
#> not too happy
#> 3
# Split-ballot: how many respondents per ballot form?
table(gss_2024$ballot)
#>
#> 1 2 3
#> 1116 1067 1126