Overview
CDC AtlasPlus reports many outcomes at the county level. For some analyses, especially geographic access analyses, we may want estimates at a smaller geographic unit such as the Census tract.
CDCAtlas includes helper functions for extrapolating
county-level AtlasPlus counts to Census tracts using tract-level Census
denominators.
The basic idea is:
- Retrieve county-level CDC AtlasPlus counts.
- Retrieve tract-level Census population denominators.
- Match each tract to its parent county.
- Allocate each county’s reported cases across tracts in proportion to the tract’s share of the relevant county population.
For example, if a tract contains 2% of a county’s population in the relevant denominator, it receives 2% of the county’s reported cases.
This is a population-weighted extrapolation, not a direct CDC tract-level surveillance estimate.
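The allocation rule above can be illustrated with a small base R sketch. All values and object names here (`county`, `tracts`, `alloc`, `tract_share`) are invented for illustration and are not part of CDCAtlas; the sketch only shows the arithmetic, not how the package implements it.

```r
# Toy data (invented): one county with 50 reported cases and three tracts
county <- data.frame(county_fips = "01001", county_cases = 50)
tracts <- data.frame(
  tract_fips = c("01001020100", "01001020200", "01001020300"),
  county_fips = "01001",
  tract_population = c(200, 300, 500)
)

# County denominator = sum of tract populations within the county
tracts$county_population <- ave(
  tracts$tract_population, tracts$county_fips, FUN = sum
)

# Attach county cases to each tract, then allocate in proportion to tract share
alloc <- merge(tracts, county, by = "county_fips")
alloc$tract_share <- alloc$tract_population / alloc$county_population
alloc$tract_cases <- alloc$county_cases * alloc$tract_share

alloc[, c("tract_fips", "tract_share", "tract_cases")]
```

With these toy numbers the tracts hold 20%, 30%, and 50% of the county population, so they receive 10, 15, and 25 of the 50 cases, and the tract values sum back to the county count.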
Census data requirements
Tract-level extrapolation requires tract-level population
denominators from the U.S. Census Bureau. CDCAtlas uses the
tidycensus package to retrieve these denominators.
If you only use CDCAtlas to retrieve AtlasPlus county,
state, or national data, you do not need to set up
tidycensus. However, if you use
extrapolate_to_tract = TRUE, you will need:
- the tidycensus package installed
- a Census API key
- the API key saved in your R environment
You can install tidycensus from CRAN:
install.packages("tidycensus")
Then request a free Census API key from the U.S. Census Bureau and
install it using tidycensus::census_api_key().
See https://walker-data.com/tidycensus/ for further details.
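As a rough sketch of the setup step, the snippet below shows the tidycensus call for storing a key (commented out so that running the sketch has no side effects) and a small check that a key is visible to the current session. `has_census_key()` is a hypothetical helper written for this example, not a CDCAtlas or tidycensus function; it relies on tidycensus keeping the key in the `CENSUS_API_KEY` environment variable.

```r
# Store the key once. "YOUR_KEY_HERE" is a placeholder, not a real key.
# install = TRUE writes the key to ~/.Renviron so later sessions can find it.
# tidycensus::census_api_key("YOUR_KEY_HERE", install = TRUE)
# readRenviron("~/.Renviron")  # reload so the key is visible in this session

# Hypothetical helper: TRUE if a Census API key is visible to this R session
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (!has_census_key()) {
  message("No Census API key found; tract-level extrapolation will likely fail.")
}
```

Running a check like this before calling get_atlas() with extrapolate_to_tract = TRUE can make a missing key easier to diagnose than an error raised partway through a download.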
Important assumptions
This method assumes that, within each county and stratum, cases are distributed across tracts in proportion to the relevant Census denominator.
For example:
- unstratified HIV prevalence may be allocated using total tract population
- sex-stratified counts may be allocated using tract population by sex
- age-stratified counts may be allocated using tract population by age
- race/ethnicity-stratified counts may be allocated using tract population by race/ethnicity
This can be useful for geographic access models, but it should not be interpreted as observed tract-level surveillance data.
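To make the "within each county and stratum" assumption concrete, here is a minimal base R sketch of a sex-stratified allocation on invented data. The object and column names (`county_cases`, `tract_pop`, `strat`, `tract_population`) are illustrative assumptions and do not describe the package internals.

```r
# Toy data (invented): one county, two tracts, two sex strata
county_cases <- data.frame(
  county_fips = "01001",
  sex = c("Male", "Female"),
  county_cases = c(40, 10)
)
tract_pop <- data.frame(
  tract_fips = rep(c("01001020100", "01001020200"), each = 2),
  county_fips = "01001",
  sex = rep(c("Male", "Female"), times = 2),
  tract_population = c(100, 150, 300, 50)
)

# County denominator within each county-by-sex stratum
tract_pop$county_population <- ave(
  tract_pop$tract_population,
  tract_pop$county_fips, tract_pop$sex,
  FUN = sum
)

# Join on county AND stratum so each stratum uses its own denominator
strat <- merge(tract_pop, county_cases, by = c("county_fips", "sex"))
strat$tract_cases <-
  strat$county_cases * strat$tract_population / strat$county_population

# Each stratum should sum back to its own county count
aggregate(tract_cases ~ county_fips + sex, data = strat, FUN = sum)
```

The important design point is the join key: matching on both county and stratum means male cases are never spread using female population, and vice versa. The same pattern would apply to age or race/ethnicity strata.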
Basic unstratified extrapolation
The simplest use case is extrapolating county-level data to tracts using total tract population.
library(CDCAtlas)

hiv_tract <- get_atlas(
  disease = "hiv",
  year = 2022,
  geography = "county",
  extrapolate_to_tract = TRUE
)
The returned data contain one row per tract and indicator, with the county-level AtlasPlus count allocated to tracts.
head(hiv_tract)
#> # A tibble: 6 × 26
#> year tract_fips tract_name county_fips race_ethnicity sex age
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2022 01001020100 Census Tract 201; Au… 01001 All races/eth… Both… Ages…
#> 2 2022 01001020100 Census Tract 201; Au… 01001 All races/eth… Both… Ages…
#> 3 2022 01001020200 Census Tract 202; Au… 01001 All races/eth… Both… Ages…
#> 4 2022 01001020200 Census Tract 202; Au… 01001 All races/eth… Both… Ages…
#> 5 2022 01001020300 Census Tract 203; Au… 01001 All races/eth… Both… Ages…
#> 6 2022 01001020300 Census Tract 203; Au… 01001 All races/eth… Both… Ages…
#> # ℹ 19 more variables: tract_population_acs <dbl>, county_population_acs <dbl>,
#> # indicator <fct>, county_name <fct>, data_status <fct>, transmission <fct>,
#> # rate100000 <dbl>, county_population_atlas <dbl>, lowerci_rate <dbl>,
#> # upperci_rate <dbl>, rse <dbl>, lowerci_cases <dbl>, upperci_cases <dbl>,
#> # state_fips <chr>, state_name <fct>, state_cases <dbl>, county_cases <dbl>,
#> # tract_cases <dbl>, tract_noncases <dbl>
A typical output includes columns such as:
names(hiv_tract)
#> [1] "year" "tract_fips"
#> [3] "tract_name" "county_fips"
#> [5] "race_ethnicity" "sex"
#> [7] "age" "tract_population_acs"
#> [9] "county_population_acs" "indicator"
#> [11] "county_name" "data_status"
#> [13] "transmission" "rate100000"
#> [15] "county_population_atlas" "lowerci_rate"
#> [17] "upperci_rate" "rse"
#> [19] "lowerci_cases" "upperci_cases"
#> [21] "state_fips" "state_name"
#> [23] "state_cases" "county_cases"
#> [25] "tract_cases" "tract_noncases"
You should expect columns identifying:
- the tract
- the parent county
- the AtlasPlus disease and year
- the county-level cases
- the tract denominator
- the county denominator
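- the tract share of the county denominator (not a separate column in the output shown above; it is the ratio of the tract denominator to the county denominator)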
- the extrapolated tract cases
The key calculation is:
tract_cases = county_cases * tract_population / county_population
or, more generally:
tract_cases = county_cases * tract_denominator / county_denominator
Checking county totals
After extrapolation, tract-level estimates should sum back to the original county-level counts, allowing for small floating-point differences.
library(dplyr)

# Compare the sum of tract estimates with the original county count
hiv_tract %>%
  group_by(county_fips, indicator) %>%
  summarize(
    county_cases = first(county_cases),
    tract_cases_sum = sum(tract_cases, na.rm = TRUE),
    difference = tract_cases_sum - county_cases,
    .groups = "drop"
  ) %>%
  arrange(desc(abs(difference))) %>%
  head()
#> # A tibble: 6 × 5
#> county_fips indicator county_cases tract_cases_sum difference
#> <chr> <fct> <dbl> <dbl> <dbl>
#> 1 51678 HIV prevalence 559. 559. 1.14e-13
#> 2 51530 HIV prevalence 462. 462. -5.68e-14
#> 3 17187 HIV prevalence 126. 126. -1.42e-14
#> 4 22107 HIV prevalence 113. 113. 1.42e-14
#> 5 16021 HIV prevalence 44.4 44.4 7.11e-15
#> 6 17047 HIV prevalence 46.7 46.7 7.11e-15
In a clean allocation, the difference column should be very close to zero.
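Rather than reading the largest differences by eye, the same check can be turned into a filter with an explicit tolerance. The sketch below uses base R on invented data; `flag_mismatches()` and its `tol` argument are hypothetical names for this example, and it assumes the input has already been restricted to a single indicator and stratum so that each county has one `county_cases` value.

```r
# Hypothetical helper: return counties whose tract sum differs from the
# county count by more than `tol` (absolute difference).
# Note: aggregate() with a formula drops rows where tract_cases is NA.
flag_mismatches <- function(data, tol = 1e-8) {
  sums <- aggregate(tract_cases ~ county_fips, data = data, FUN = sum)
  names(sums)[names(sums) == "tract_cases"] <- "tract_cases_sum"
  counts <- unique(data[, c("county_fips", "county_cases")])
  out <- merge(counts, sums, by = "county_fips")
  out$difference <- out$tract_cases_sum - out$county_cases
  out[abs(out$difference) > tol, , drop = FALSE]
}

# Toy input (invented): county "A" allocates cleanly, county "B" does not
toy <- data.frame(
  county_fips = c("A", "A", "B", "B"),
  county_cases = c(10, 10, 5, 5),
  tract_cases = c(4, 6, 0, 0)
)
flag_mismatches(toy)
```

An empty result suggests every county reproduces its original count within the tolerance. Any rows returned are counties to inspect, for example for missing or zero tract denominators.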
Sex-stratified extrapolation
For sex-stratified AtlasPlus data, the allocation denominator should also be sex-specific.
hiv_tract_sex <- get_atlas(
  disease = "hiv",
  year = 2022,
  geography = "county",
  stratify_by = "sex",
  extrapolate_to_tract = TRUE
)
This means that male county cases are allocated across tracts using the male population in each tract, and female county cases are allocated using the female population in each tract.
Conceptually:
\[
\widehat{\text{tract cases}}_{\text{male}} = \text{county cases}_{\text{male}} \times \frac{\text{tract population}_{\text{male}}}{\text{county population}_{\text{male}}}
\]
\[
\widehat{\text{tract cases}}_{\text{female}} = \text{county cases}_{\text{female}} \times \frac{\text{tract population}_{\text{female}}}{\text{county population}_{\text{female}}}
\]
The resulting data should include one row per tract per sex stratum for each indicator.
hiv_tract_sex %>%
  count(sex, indicator)
#> # A tibble: 4 × 3
#> sex indicator n
#> <chr> <fct> <int>
#> 1 Female HIV diagnoses 85396
#> 2 Female HIV prevalence 85396
#> 3 Male HIV diagnoses 85396
#> 4 Male HIV prevalence 85396
You can verify the allocation within each county and sex stratum.
hiv_tract_sex %>%
  group_by(county_fips, sex, indicator) %>%
  summarize(
    county_cases = first(county_cases),
    tract_cases_sum = sum(tract_cases, na.rm = TRUE),
    difference = tract_cases_sum - county_cases,
    .groups = "drop"
  ) %>%
  arrange(desc(abs(difference))) %>%
  head()
#> # A tibble: 6 × 6
#> county_fips sex indicator county_cases tract_cases_sum difference
#> <chr> <chr> <fct> <dbl> <dbl> <dbl>
#> 1 37199 Male HIV prevalence 95.2 95.2 1.42e-14
#> 2 51007 Male HIV prevalence 50.7 50.7 7.11e-15
#> 3 51735 Male HIV prevalence 48.0 48.0 7.11e-15
#> 4 72049 Male HIV prevalence 63.0 63.0 7.11e-15
#> 5 13043 Female HIV prevalence 27.1 27.1 -3.55e-15
#> 6 32021 Male HIV prevalence 22.8 22.8 3.55e-15
Age-stratified extrapolation
Age-stratified extrapolation works the same way, except that the Census denominator is based on tract population within the corresponding AtlasPlus age group.
hiv_tract_age <- get_atlas(
  disease = "hiv",
  year = 2022,
  geography = "county",
  stratify_by = "age",
  extrapolate_to_tract = TRUE
)
The output contains one row per tract per age stratum for each indicator.
hiv_tract_age %>%
  count(age, indicator)
#> # A tibble: 12 × 3
#> age indicator n
#> <chr> <fct> <int>
#> 1 13-24 HIV diagnoses 85396
#> 2 13-24 HIV prevalence 85396
#> 3 25-34 HIV diagnoses 85396
#> 4 25-34 HIV prevalence 85396
#> 5 35-44 HIV diagnoses 85396
#> 6 35-44 HIV prevalence 85396
#> 7 45-54 HIV diagnoses 85396
#> 8 45-54 HIV prevalence 85396
#> 9 55-64 HIV diagnoses 85396
#> 10 55-64 HIV prevalence 85396
#> 11 65+ HIV diagnoses 85396
#> 12 65+ HIV prevalence 85396
Again, the diagnostic check is whether tract estimates sum back to the original county count within each county-age stratum.
hiv_tract_age %>%
  group_by(county_fips, age, indicator) %>%
  summarize(
    county_cases = first(county_cases),
    tract_cases_sum = sum(tract_cases, na.rm = TRUE),
    difference = tract_cases_sum - county_cases,
    .groups = "drop"
  ) %>%
  arrange(desc(abs(difference))) %>%
  head()
#> # A tibble: 6 × 6
#> county_fips age indicator county_cases tract_cases_sum difference
#> <chr> <chr> <fct> <dbl> <dbl> <dbl>
#> 1 48301 35-44 HIV prevalence 0.0352 0 -0.0352
#> 2 48301 45-54 HIV prevalence 0.0277 0 -0.0277
#> 3 48301 35-44 HIV diagnoses 0.00149 0 -0.00149
#> 4 48301 45-54 HIV diagnoses 0.000725 0 -0.000725
#> 5 15005 13-24 HIV prevalence 0.000705 0 -0.000705
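#> 6 15005 13-24 HIV diagnoses 0.000162 0 -0.000162
In this output the largest discrepancies are strata where tract_cases_sum is 0 while county_cases is small but nonzero, so the allocation did not reproduce the county count there. One possible cause is a zero or missing tract denominator for that age group; such rows are worth inspecting before use.
Race/ethnicity-stratified extrapolation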
CDC AtlasPlus commonly uses a combined race/ethnicity variable. For example, Hispanic/Latino is represented as a single ethnicity category, while the remaining groups are usually non-Hispanic race categories.
hiv_tract_race <- get_atlas(
  disease = "hiv",
  year = 2022,
  geography = "county",
  stratify_by = "race",
  extrapolate_to_tract = TRUE
)
Race/ethnicity-stratified extrapolation requires care because Census and AtlasPlus race/ethnicity categories may not always align perfectly.
The package attempts to use Census variables that correspond to the AtlasPlus combined race/ethnicity categories, but users should inspect the denominator mapping before interpreting results.
hiv_tract_race %>%
  count(race_ethnicity, indicator)
#> # A tibble: 14 × 3
#> race_ethnicity indicator n
#> <chr> <fct> <int>
#> 1 American Indian/Alaska Native HIV diagnoses 85396
#> 2 American Indian/Alaska Native HIV prevalence 85396
#> 3 Asian HIV diagnoses 85396
#> 4 Asian HIV prevalence 85396
#> 5 Black/African American HIV diagnoses 85396
#> 6 Black/African American HIV prevalence 85396
#> 7 Hispanic/Latino HIV diagnoses 85396
#> 8 Hispanic/Latino HIV prevalence 85396
#> 9 Multiracial HIV diagnoses 85396
#> 10 Multiracial HIV prevalence 85396
#> 11 Native Hawaiian/Other Pacific Islander HIV diagnoses 85396
#> 12 Native Hawaiian/Other Pacific Islander HIV prevalence 85396
#> 13 White HIV diagnoses 85396
#> 14 White HIV prevalence 85396
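One way to inspect the denominator mapping is to compare, tract by tract, the sum of the race/ethnicity-specific denominators with a total population figure. The base R sketch below does this on invented data. It assumes that `tract_population_acs` holds the stratum-specific denominator in stratified output, which should be confirmed against the package documentation; `toy_total`, `total_population`, and `coverage` are hypothetical names for this example.

```r
# Toy data (invented): race/ethnicity-specific denominators for one tract
toy_race <- data.frame(
  tract_fips = "01001020100",
  race_ethnicity = c("Black/African American", "Hispanic/Latino", "White"),
  tract_population_acs = c(300, 200, 450)
)

# Hypothetical total population for the same tract,
# e.g. taken from an unstratified call
toy_total <- data.frame(
  tract_fips = "01001020100",
  total_population = 1000
)

# Sum the stratum denominators per tract and compare with the total
strata_sum <- aggregate(
  tract_population_acs ~ tract_fips, data = toy_race, FUN = sum
)
cmp <- merge(strata_sum, toy_total, by = "tract_fips")
cmp$coverage <- cmp$tract_population_acs / cmp$total_population
cmp
```

A coverage well above 1 would suggest overlapping categories (for example, Hispanic/Latino residents also counted in a race group), while a value well below 1 would suggest categories with no matching Census variable. Some departure from 1 is expected if the strata and the total use different age restrictions.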