A simple random sample of 200 schools from the 2000 California Academic
Performance Index (API) study. This is the same underlying data as
apisrs in the survey package, reformatted to surveycore conventions.
Format
A data frame with 200 rows and 38 variables:
- pw
Sampling weight (inverse probability of selection).
- fpc
Finite population correction, given as the population size (number of schools in the California API system).
- cds
County/district/school code (character, 14-digit).
- snum
School number (numeric).
- dnum
District number (integer).
- name
Short school name (character).
- sname
Full school name (character).
- dname
District name (character).
- cname
County name (character).
- cnum
County number (integer).
- api00
API score 2000 (integer).
- api99
API score 1999 (integer).
- target
API growth target (integer).
- growth
API score change, api00 - api99 (integer).
- pcttest
Percent of students tested (integer).
- sch_wide
Met school-wide growth target (integer, 0 = No, 1 = Yes).
- comp_imp
Met comparable improvement target (integer, 0 = No, 1 = Yes).
- both
Met both targets (integer, 0 = No, 1 = Yes).
- awards
Eligible for awards program (integer, 0 = No, 1 = Yes).
- stype
School type (integer): 1 = Elementary, 2 = High, 3 = Middle.
- yr_rnd
Year-round school (integer, 0 = No, 1 = Yes).
- meals
Percent of students receiving free meals (integer).
- ell
Percent of students who are English language learners (integer).
- mobility
Percent of students in first year at school (integer).
- enroll
Total number of students (integer).
- api_stu
Number of students included in API 2000 (integer).
- acs_k3
Average class size, grades K–3 (integer; NA for high and middle schools).
- acs_46
Average class size, grades 4–6 (integer; NA for high schools and some others).
- acs_core
Average class size, core academic courses (integer; NA for most elementary schools).
- not_hsg
Percent of parents who did not complete high school (integer).
- hsg
Percent of parents who are high school graduates (integer).
- some_col
Percent of parents with some college (integer).
- col_grad
Percent of parents who are college graduates (integer).
- grad_sch
Percent of parents with graduate school education (integer).
- avg_ed
Average parent education level (numeric).
- pct_resp
Percent of parents who responded to the survey (integer).
- full
Percent of teachers fully credentialed (integer).
- emer
Percent of teachers on emergency credentials (integer).
Source
Lumley T (2004). Analysis of complex survey samples. Journal of Statistical
Software, 9(1):1–19. Data distributed with the survey R package.
California Department of Education, Academic Performance Index 2000.
Details
Survey design: Simple random sample. Use as_survey() with
weights = pw and fpc = fpc:
svy <- as_survey(
  ca_api_2000,
  weights = pw,
  fpc = fpc
)
Missing values: Several columns have NA for schools where the value is
inapplicable: acs_k3 (grades K–3) is NA for high schools and middle
schools, where those grade spans do not exist; acs_46 (grades 4–6) is
NA for all high schools and some elementary and middle schools; acs_core is NA for
most elementary schools.
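One way to check this missingness pattern is to count NA values per class-size column within each school type. The sketch below uses a small hand-made stand-in data frame (invented values, not rows from ca_api_2000) so that it runs without the package installed; with the real data, substituting ca_api_2000 for the hypothetical toy object should give the corresponding table.

```r
# Toy stand-in for ca_api_2000; all values are invented for illustration.
# stype codes follow the documented mapping: 1 = Elementary, 2 = High, 3 = Middle.
toy <- data.frame(
  stype    = c(1L, 1L, 2L, 2L, 3L, 3L),
  acs_k3   = c(19L, 20L, NA, NA, NA, NA),
  acs_46   = c(28L, NA, NA, NA, 30L, NA),
  acs_core = c(NA, NA, 27L, 25L, 26L, 28L)
)

class_size_cols <- c("acs_k3", "acs_46", "acs_core")

# is.na() on a data frame gives a logical matrix; unary + turns it into 0/1,
# and rowsum() adds those indicators up within each stype group.
na_counts <- rowsum(+is.na(toy[class_size_cols]), group = toy$stype)
na_counts
```

Rows of na_counts are school-type codes and columns are the class-size variables, so a zero means the variable is fully observed for that school type in the data being inspected.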
Metadata: All 38 columns carry "label" attributes (human-readable
variable descriptions). The six categorical columns (stype, sch_wide,
comp_imp, both, awards, yr_rnd) additionally carry "labels"
attributes mapping integer codes to category names, compatible with
surveycore's metadata system.
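Outside surveycore, the "labels" attribute can also be used directly, for example to turn a coded column into a factor for plotting or tabulation. The following is a minimal base R sketch: labelled_to_factor() is a hypothetical helper, not a surveycore function, and the toy vector only imitates the attributes described above (the "School type" label text is an assumption).

```r
# Toy stand-in for ca_api_2000$stype, carrying the two attributes described above.
stype <- c(2L, 1L, 2L, 1L, 3L)
attr(stype, "label")  <- "School type"  # assumed label text
attr(stype, "labels") <- c(Elementary = 1L, High = 2L, Middle = 3L)

# Hypothetical helper: convert a labelled integer vector to a factor,
# using the codes as levels and the code names as level labels.
labelled_to_factor <- function(x) {
  value_labels <- attr(x, "labels")
  if (is.null(value_labels)) {
    stop("x has no \"labels\" attribute")
  }
  factor(
    as.vector(x),                      # drop attributes before matching codes
    levels = unname(value_labels),
    labels = names(value_labels)
  )
}

stype_factor <- labelled_to_factor(stype)
table(stype_factor)
```

With the real data, the same helper could be applied to any of the six categorical columns, since each is described as carrying a "labels" attribute.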
Relationship to apisrs: This dataset contains the same observations
as survey::apisrs, with three differences: (1) the all-NA flag
column is dropped; (2) factor columns are stored as plain integers with
labels attributes; (3) column names are in snake_case.
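The three differences can be illustrated on a toy data frame shaped like a few apisrs columns. This is an illustrative reconstruction, not the script used to build ca_api_2000: the values are invented, and the factor levels (E/H/M for stype, No/Yes for sch.wide) and the dotted column names are assumptions about how survey::apisrs stores these columns.

```r
# Toy stand-in with apisrs-style columns (invented values, not real rows).
apisrs_like <- data.frame(
  stype    = factor(c("E", "H", "M"), levels = c("E", "H", "M")),
  sch.wide = factor(c("Yes", "No", "Yes"), levels = c("No", "Yes")),
  flag     = NA_integer_,
  api00    = c(700L, 650L, 620L)
)

# (1) Drop the all-NA flag column.
out <- apisrs_like[, names(apisrs_like) != "flag", drop = FALSE]

# (2) Store factors as plain integers with a "labels" attribute.
#     stype keeps 1-based codes; the No/Yes column is shifted to 0/1.
stype_codes <- as.integer(out$stype)
attr(stype_codes, "labels") <- c(Elementary = 1L, High = 2L, Middle = 3L)
out$stype <- stype_codes

sch_wide_codes <- as.integer(out$sch.wide) - 1L
attr(sch_wide_codes, "labels") <- c(No = 0L, Yes = 1L)
out$sch.wide <- sch_wide_codes

# (3) Rename dotted column names to snake_case.
names(out) <- gsub(".", "_", names(out), fixed = TRUE)

str(out)
```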
Examples
head(ca_api_2000[, c("pw", "fpc", "api00", "enroll")])
#> pw fpc api00 enroll
#> 1039 30.97 6194 462 477
#> 1124 30.97 6194 878 478
#> 2868 30.97 6194 734 1410
#> 1273 30.97 6194 772 342
#> 4926 30.97 6194 739 217
#> 2463 30.97 6194 835 258
# Create an SRS design
svy <- as_survey(ca_api_2000, weights = pw, fpc = fpc)
svy
#>
#> ── Survey Design ───────────────────────────────────────────────────────────────
#> <survey_taylor> (Taylor series linearization)
#> Sample size: 200
#>
#> # A tibble: 200 × 38
#> cds stype name sname snum dname dnum cname cnum pcttest api00 api99
#> <chr> <int> <chr> <chr> <dbl> <chr> <int> <chr> <int> <int> <int> <int>
#> 1 15739081… 2 "McF… McFa… 1039 McFa… 432 Kern 14 98 462 448
#> 2 19642126… 1 "Sto… Stow… 1124 ABC … 1 Los … 18 100 878 831
#> 3 30664493… 2 "Bre… Brea… 2868 Brea… 79 Oran… 29 98 734 742
#> 4 19644516… 1 "Ala… Alam… 1273 Down… 187 Los … 18 99 772 657
#> 5 40688096… 1 "Sun… Sunn… 4926 San … 640 San … 39 99 739 719
#> 6 19734456… 1 "Los… Los … 2463 Haci… 284 Los … 18 93 835 822
#> 7 19647336… 3 "Nor… Nort… 2031 Los … 401 Los … 18 98 456 472
#> 8 19647336… 1 "Gla… Glas… 1736 Los … 401 Los … 18 99 506 474
#> 9 19648166… 1 "Max… Maxs… 2142 Moun… 470 Los … 18 100 543 458
#> 10 38684786… 1 "Tre… Trea… 4754 San … 632 San … 37 90 649 604
#> # ℹ 190 more rows
#> # ℹ 26 more variables: target <int>, growth <int>, sch_wide <int>,
#> # comp_imp <int>, both <int>, awards <int>, meals <int>, ell <int>,
#> # yr_rnd <int>, mobility <int>, acs_k3 <int>, acs_46 <int>, acs_core <int>,
#> # pct_resp <int>, not_hsg <int>, hsg <int>, some_col <int>, col_grad <int>,
#> # grad_sch <int>, avg_ed <dbl>, full <int>, emer <int>, enroll <int>,
#> # api_stu <int>, pw <dbl>, fpc <dbl>
# Inspect variable label
attr(ca_api_2000$api00, "label")
#> [1] "API score 2000"
# Inspect value labels for school type
attr(ca_api_2000$stype, "labels")
#> Elementary High Middle
#> 1 2 3
