A 27-variable extract from the 2024 General Social Survey (GSS), one of the longest-running sociological surveys in the United States (fielded annually or biennially since 1972). All 3,309 respondents from the 2024 cross-section are included.

Usage

gss_2024

Format

A data frame with 3,309 rows and 27 variables:

vpsu

Variance primary sampling unit. Use as the cluster ID for variance estimation.

vstrat

Variance stratum. Use as the stratification variable.

wtssps

Person post-stratification weight. Standard analysis weight.

wtssnrps

Person post-stratification weight adjusted for differential non-response. Preferred when non-response bias is a concern.

id

Respondent ID. Unique case identifier.

year

Survey year (all 2024 in this extract).

ballot

Ballot form (coded 1, 2, or 3 in this extract). The GSS uses a split-ballot design; not all questions appear on every ballot. Inapplicable items are coded -100.

age

Age in years (89 = 89 or older).

sex

Sex: 1 = male, 2 = female.

race

Race: 1 = white, 2 = black, 3 = other.

hispanic

Hispanic origin: 1 = not Hispanic; 2–50 = specific Hispanic origin.

educ

Highest year of school completed (0–20 years).

degree

Highest degree: 0 = less than HS, 1 = high school, 2 = associate, 3 = bachelor's, 4 = graduate.

income16

Total family income (26 categories from < $1,000 to $170,000+).

marital

Marital status: 1 = married, 2 = widowed, 3 = divorced, 4 = separated, 5 = never married.

wrkstat

Labor force status: 1 = full time, 2 = part time, 3 = temporarily not working, 4 = unemployed, 5 = retired, 6 = in school, 7 = keeping house, 8 = other.

hrs1

Hours worked last week (for employed respondents only).

adults

Number of adults in household (8 = 8 or more).

partyid

Party identification: 0 = strong Democrat, 3 = Independent, 6 = strong Republican, 7 = other party.

polviews

Political views: 1 = extremely liberal, 7 = extremely conservative.

happy

General happiness: 1 = very happy, 2 = pretty happy, 3 = not too happy.

health

Self-rated health: 1 = excellent, 2 = good, 3 = fair, 4 = poor.

trust

Social trust: 1 = most people can be trusted, 2 = can't be too careful, 3 = depends.

natfare

Government spending on welfare: 1 = too little, 2 = about right, 3 = too much.

abany

Abortion for any reason: 1 = yes, 2 = no.

attend

Religious service attendance: 0 = never, 8 = several times a week.

relig

Religious preference: 1 = Protestant, 2 = Catholic, 3 = Jewish, 4 = none, and others.

Source

NORC at the University of Chicago. General Social Survey 2024. https://gss.norc.org (free account required to download raw data; the processed .rda is included in the package). Prepared by data-raw/prepare-gss-2024.R.

Details

Survey design: Stratified multi-stage cluster sample. Use Taylor series linearization for variance estimation:

svy <- as_survey(gss_2024,
  ids     = vpsu,
  strata  = vstrat,
  weights = wtssps       # or wtssnrps for non-response-adjusted weight
)
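For intuition about what the cluster and stratum identifiers contribute, the following base R sketch computes the standard with-replacement (ultimate cluster) variance of a weighted total, which is what linearization reduces to for a linear statistic. The data frame toy, its values, and the outcome column y are made up for illustration; this is not necessarily how as_survey() performs the calculation internally.

```r
# Toy data: two strata, two PSUs per stratum (all values are made up)
toy <- data.frame(
  vstrat = c(1, 1, 1, 1, 2, 2, 2, 2),
  vpsu   = c(1, 1, 2, 2, 1, 1, 2, 2),
  wtssps = c(1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.3, 0.7),
  y      = c(1, 0, 1, 1, 0, 0, 1, 0)   # hypothetical 0/1 outcome
)

# Weighted contribution of each respondent, then summed within each PSU
toy$wy <- toy$wtssps * toy$y
psu_totals <- aggregate(wy ~ vstrat + vpsu, data = toy, FUN = sum)

# Per stratum: n_h / (n_h - 1) * sum of squared deviations of PSU totals
v_h <- tapply(psu_totals$wy, psu_totals$vstrat, function(t) {
  n_h <- length(t)
  n_h / (n_h - 1) * sum((t - mean(t))^2)
})

total_hat <- sum(toy$wy)     # estimated weighted total
var_total <- sum(v_h)        # variance summed across strata
se_total  <- sqrt(var_total) # standard error of the total
```

Ignoring vstrat and vpsu and treating respondents as independent would typically understate the standard error, which is why the design variables are passed to as_survey().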

Missing value codes: The GSS uses a consistent system of negative integer codes for missing data across all variables. The most common codes are:

Code    Meaning
-100    Inapplicable (question not asked of this respondent)
-99     No answer
-98     Don't know
-97     Skipped on web
-90     Refused

These codes are stored as value labels on every column (check attr(gss_2024$happy, "labels")). Recode them to NA before analysis.
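One way to do the recoding in base R is sketched below. The helper recode_gss_missing() is hypothetical (it is not part of any package), and the toy data frame stands in for a few gss_2024 columns with made-up values. The sketch assumes that no valid value in the columns being recoded is negative.

```r
# Toy stand-in for two gss_2024 columns (values are made up)
toy <- data.frame(
  happy = c(1, 2, -100, 3, -99),
  trust = c(1, -98, 2, -97, -90)
)

# Hypothetical helper: treat every negative code as missing.
# Subassignment keeps attributes such as "label" and "labels" on plain vectors.
recode_gss_missing <- function(x) {
  x[!is.na(x) & x < 0] <- NA
  x
}

# Apply to every column while keeping the data frame structure
toy[] <- lapply(toy, recode_gss_missing)

colSums(is.na(toy))
```

On the real data the same lapply() call could be restricted to the substantive columns, leaving the design variables (vpsu, vstrat, wtssps, wtssnrps) untouched.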

Split-ballot design: The ballot variable indicates which question module a respondent received. Variables asked only on some ballots will have -100 (Inapplicable) for respondents on other ballots.
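A quick way to see which ballots carried a given question is to cross-tabulate ballot against the inapplicable code. The sketch below uses a toy data frame with a hypothetical column named item and made-up values; on the real data, substitute gss_2024 and a column of interest.

```r
# Toy data: the item was not fielded on ballot 2 (values are made up)
toy <- data.frame(
  ballot = c(1, 1, 2, 2, 3, 3),
  item   = c(1, 2, -100, -100, 2, -99)
)

# TRUE where the question was asked (anything other than -100)
asked <- toy$item != -100

# Rows are ballot forms, columns show whether the item was asked
table(ballot = toy$ballot, asked = asked)

# Ballot forms on which the item was fielded at all
ballots_with_item <- sort(unique(toy$ballot[asked]))
```

Note that -99 (No answer) still counts as asked: the respondent received the question but did not answer it.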

Metadata: All columns carry variable labels and value labels as R attributes from the original SPSS file. These are automatically extracted into surveycore's metadata system when you call as_survey().

  • Variable labels ("label" attribute): A human-readable description of each column. Example: attr(gss_2024$happy, "label") returns "general happiness".

  • Value labels ("labels" attribute): A named numeric vector whose names are the labels and whose values are the codes, including all missing-value codes. Example: attr(gss_2024$happy, "labels") returns entries for very happy, pretty happy, not too happy, and the negative missing codes.
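The two attributes described above can be used to turn a coded column into a labelled factor. The following base R sketch builds a toy vector that mimics the structure of gss_2024$happy (the data values are made up, and only a subset of the missing-value labels is included) and keeps only the non-negative codes as factor levels, so missing codes become NA.

```r
# Toy vector mimicking gss_2024$happy (data values are made up)
happy <- c(1, 2, 3, 2, -100)
attr(happy, "label")  <- "general happiness"
attr(happy, "labels") <- c(
  "iap"           = -100,
  "very happy"    = 1,
  "pretty happy"  = 2,
  "not too happy" = 3
)

# Keep only substantive (non-negative) codes as factor levels
labs  <- attr(happy, "labels")
valid <- labs[labs >= 0]

# Codes not listed in `levels` (here -100) become NA
happy_f <- factor(happy, levels = unname(valid), labels = names(valid))

table(happy_f, useNA = "ifany")
```

The same pattern should carry over to any column whose "labels" attribute follows this layout, assuming the attribute is a plain named numeric vector.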

Examples

# Variables in the dataset
names(gss_2024)
#>  [1] "vpsu"     "vstrat"   "wtssps"   "wtssnrps" "ballot"   "year"    
#>  [7] "id"       "age"      "sex"      "race"     "hispanic" "educ"    
#> [13] "degree"   "income16" "marital"  "wrkstat"  "hrs1"     "adults"  
#> [19] "partyid"  "polviews" "happy"    "health"   "trust"    "natfare" 
#> [25] "abany"    "attend"   "relig"   

# Create survey design
# svy <- as_survey(gss_2024, ids = vpsu, strata = vstrat, weights = wtssps)

# Inspect variable label
attr(gss_2024$happy, "label")
#> [1] "general happiness"

# Inspect value labels (includes GSS missing-value codes)
attr(gss_2024$happy, "labels")
#>                           iap                     no answer 
#>                          -100                           -99 
#>                    don't know                skipped on web 
#>                           -98                           -97 
#>                  see codebook                    uncodeable 
#>                           -96                           -95 
#>                 not imputable                       refused 
#>                           -94                           -90 
#> not available in this release    not available in this year 
#>                           -80                           -70 
#>            I don't have a job                   dk, na, iap 
#>                           -60                           -40 
#>                    very happy                  pretty happy 
#>                             1                             2 
#>                 not too happy 
#>                             3 

# Split-ballot: how many respondents per ballot form?
table(gss_2024$ballot)
#> 
#>    1    2    3 
#> 1116 1067 1126