We import U.S. state boundary shapefiles using the tigris package.
The tigris::states() function retrieves state-level shapefiles from the U.S. Census Bureau’s TIGER/Line products.
Column Selection
We retain only:
NAME → full state name
STUSPS → two-letter state abbreviation
geometry → polygon boundaries
This reduces memory overhead and keeps the object lightweight for downstream joins and plotting.
Exclusions
We remove:
Alaska and Hawaii (to avoid projection distortion in contiguous U.S. maps)
U.S. territories (not included in SRTR transplant center analyses)
This ensures that visualizations align with the contiguous U.S. study population.
Output
The resulting object, states_sf, is an sf object containing polygon geometries for the 48 contiguous U.S. states,
Click to show/hide R Code
states_sf <- tigris::states(cb =TRUE, year =2022)%>%select(NAME, STUSPS, geometry) %>%filter(!(NAME %in%c("Alaska", "Hawaii", "Puerto Rico", "United States Virgin Islands","Commonwealth of the Northern Mariana Islands","Guam","American Samoa"))) %>%rename(State=NAME)
Import transplant data from UNOS SRTR
To identify active transplant centers and characterize transplant activity by organ type, we import transplant recipient-level and donor-level data from the Scientific Registry of Transplant Recipients (SRTR).
These datasets are accessed using a custom helper function, load_srtr_file(), which standardizes file loading, variable labeling, and preprocessing across SRTR tables.
To map transplant center locations and define geographic catchment areas, we retrieve center-level data from the Health Resources and Services Administration (HRSA) using the sRtr package.
The get_hrsa_transplant_centers() function accesses HRSA transplant center metadata, including center identifiers and geographic coordinates.
Subsequent steps:
Remove Hawaii and Puerto Rico
Keep the following variables:
OTC_NM: transplant center name
OTC_CD: transplant center 4-letter UNOS code
Service_Lst: list of organs transplanted at that center
X, Y: longitude and latitude of transplant center
Keep transplant centers whose organ corresponds with the organ defined in the organ_loop variable for each loop
Removes duplicates with the distinct() command
Renames variables for convenience
Pauses for five seconds between iterations to avoid overloading the HRSA website
To estimate catchment populations for transplant centers, we retrieve tract- and county-level population data from the American Community Survey (ACS) 5-year files. Because these data are large and slow to download, we implement a caching strategy to avoid repeated API calls during iterative renders.
Overview
For each year in year_list, this code:
Retrieves tract-level total population
Retrieves county-level total population
Caches results locally as .rds files
This structure ensures reproducibility while dramatically improving performance during Quarto re-rendering.
Tract-Level Population Data
We obtain tract-level total population using ACS variable:
B01003_001 — Total population
If a cached file already exists (cache/us_tracts_population_YEAR.rds), it is loaded directly. Otherwise:
Data are downloaded using get_acs()
Geometry is included (geometry = TRUE) for spatial joins
County FIPS codes are derived from the first five digits of the tract GEOID
Identifier fields are converted to numeric for efficient joins
Only necessary variables are retained
The cleaned dataset is saved locally for future use
Why include geometry?
Including tract geometry allows direct spatial intersection with:
50-mile buffers
60-minute isochrones
United catchment areas
This avoids needing a second shapefile join later.
County-Level Population Data
County-level total population is retrieved using the same ACS variable (B01003_001), with geometry retained.
County data are used for:
County-level HIV prevalence overlays
Aggregated catchment summaries
Validation checks against tract-level totals
As with tract data, results are cached locally to prevent redundant API calls.
Age and Sex Variable Labels (B01001)
The ACS table B01001 contains detailed age- and sex-specific population counts. Rather than downloading these counts directly in this step, we load the variable metadata using:
Click to show/hide R Code
us_tracts_population<-list()us_counties_population<-list()vars<-list()for (year_loop in year_list){ tracts_path<-paste0("cache/us_tracts_population_",year_loop,".rds") counties_path<-paste0("cache/us_counties_population_",year_loop,".rds")#Pull US Census tract populations if cache file doesn't already existif (file.exists(tracts_path)) { us_tracts_population[[year_loop]] <-readRDS(tracts_path) } else { us_tracts_population[[year_loop]]<-get_acs(geography ="tract",variables ="B01003_001",year =as.numeric(year_loop),survey ="acs5",geometry =TRUE,state = continental_fips )%>%mutate(county_fips =substr(GEOID, 1, 5))%>%mutate(GEOID=as.numeric(GEOID))%>%mutate(county_fips=as.numeric(county_fips))%>%select(-variable, -moe)%>%rename(tract_population=estimate)saveRDS(us_tracts_population[[year_loop]], tracts_path) }#Pull US Census counties populations if cache file doesn't already existif (file.exists(counties_path)) { us_counties_population[[year_loop]] <-readRDS(counties_path) } else { us_counties_population[[year_loop]]<-get_acs(geography ="county",variables ="B01003_001",year =as.numeric(year_loop),survey ="acs5",geometry =TRUE,state = continental_fips )%>%mutate(GEOID=as.numeric(GEOID))%>%select(-variable, -moe)%>%rename(county_population=estimate)saveRDS(us_counties_population[[year_loop]], counties_path) }# Load tract variable labels for age/sex values B01001 from the ACS 5-year 2017 dataset vars[[year_loop]] <-load_variables(year_loop, "acs5", cache =TRUE) %>%filter(str_detect(name, "B01001_"))%>%mutate(label =str_replace_all(label, "Estimate!!Total!!", ""))%>%mutate(label =str_replace_all(label, "!!", "_"))%>%mutate(label =str_replace_all(label, " ", "_"))}
AtlasPlus county-level HIV data
This code uses the CDCAtlas package to download county-level HIV prevalence data for the years defined in year_list.
Click to show/hide R Code
AtlasPlusTableData_county_totals<-list()for (year_loop in year_list){ AtlasPlusTableData_county_totals[[year_loop]] <-CDCAtlas::get_atlas(disease="hiv",geography="county",year=as.numeric(year_loop))%>%clean_names()%>%rename(county_cases=cases)%>%rename(geo_id=fips)%>%rename(county_name=geography)%>%mutate(county_cases=if_else(is.na(county_cases),0, county_cases))%>%mutate(geo_id=as.numeric(geo_id))%>%filter(year==as.numeric(year_loop))%>%select(-year)%>%filter(indicator=="HIV prevalence")}
Other portions of the analysis
Setup: Defines global paths, data sources, cohort inclusion criteria, and analysis-wide constants.
Functions: Reusable helper functions for cohort construction, matching, costing, and modeling.
Tables: Summary tables and regression outputs generated from the final models.
Figures:Visualizations of costs, risks, and model-based estimates.
---title: "Load data"format: html---The code in this script loads the data needed for other scripts from multiple sources, as explained in detail below.::: {.Rcode title="Source code"}The full R script is available at:- [`R/load_data.R`](https://github.com/VagishHemmige/HOPE-CDC-analysis-2026/blob/master/R/load_data.R)This R script file is itself reliant on the following helper files:- [`R/setup.R`](https://github.com/VagishHemmige/HOPE-CDC-analysis-2026/blob/master/R/setup.R)- [`R/functions.R`](https://github.com/VagishHemmige/HOPE-CDC-analysis-2026/blob/master/R/functions.R):::## Load US state maps from censusWe import U.S. state boundary shapefiles using the `tigris` package.The `tigris::states()` function retrieves state-level shapefiles from the U.S. Census Bureau’s TIGER/Line products.### Column SelectionWe retain only:- `NAME` → full state name- `STUSPS` → two-letter state abbreviation- `geometry` → polygon boundariesThis reduces memory overhead and keeps the object lightweight for downstream joins and plotting.### ExclusionsWe remove:- Alaska and Hawaii (to avoid projection distortion in contiguous U.S. maps)- U.S. territories (not included in SRTR transplant center analyses)This ensures that visualizations align with the contiguous U.S. study population.### OutputThe resulting object, `states_sf`, is an `sf` object containing polygon geometries for the 48 contiguous U.S. states,```{r, eval=FALSE}states_sf <- tigris::states(cb = TRUE, year = 2022)%>% select(NAME, STUSPS, geometry) %>% filter(!(NAME %in% c("Alaska", "Hawaii", "Puerto Rico", "United States Virgin Islands", "Commonwealth of the Northern Mariana Islands", "Guam", "American Samoa"))) %>% rename(State=NAME)```## Import transplant data from UNOS SRTRTo identify active transplant centers and characterize transplant activity by organ type, we import transplant recipient-level and donor-level data from the Scientific Registry of Transplant Recipients (SRTR).These datasets are accessed using a custom helper function, `load_srtr_file()`, which standardizes file loading, variable labeling, and preprocessing across SRTR tables.```{r, eval=FALSE}tx<-list()tx[["kidney"]] <- load_srtr_file("tx_ki")tx[["liver"]] <- load_srtr_file("tx_li")tx[["heart"]] <- load_srtr_file("tx_hr")tx[["lung"]] <- load_srtr_file("tx_lu")tx[["pancreas"]] <- load_srtr_file("tx_kp")donor_deceased <- load_srtr_file("donor_deceased")%>% select(DONOR_ID, DON_ANTI_HIV, DON_HIV_NAT)```## Load transplant center data from SRTR packageTo map transplant center locations and define geographic catchment areas, we retrieve center-level data from the Health Resources and Services Administration (HRSA) using the `sRtr` package.The `get_hrsa_transplant_centers()` function accesses HRSA transplant center metadata, including center identifiers and geographic coordinates.Subsequent steps:- Remove Hawaii and Puerto Rico- Keep the following variables: - `OTC_NM`: transplant center name - `OTC_CD`: transplant center 4-letter UNOS code - `Service_Lst`: list of organs transplanted at that center - `X`, `Y`: longitude and latitude of transplant center- Keep transplant centers whose organ corresponds with the organ defined in the `organ_loop` variable for each loop- Removes duplicates with the `distinct()` command- Renames variables for convenience- Pauses for five seconds between iterations to avoid overloading the HRSA website```{r, eval=FALSE}Transplant_centers_all<-list()for (organ_loop in organ_list){ Transplant_centers_all[[organ_loop]]<-sRtr::get_hrsa_transplant_centers()%>% filter(OPO_STATE_ABBR != "PR" & OPO_STATE_ABBR!= "HI")%>% select(OTC_NM, OTC_CD, Service_Lst, X, Y) %>% filter(str_detect(Service_Lst, str_to_sentence(organ_loop)))%>% select(-Service_Lst)%>% distinct()%>% rename(OTCName=OTC_NM, OTCCode=OTC_CD, Latitude=Y, Longitude=X) Sys.sleep(5)}```## Load census dataTo estimate catchment populations for transplant centers, we retrieve tract- and county-level population data from the American Community Survey (ACS) 5-year files. Because these data are large and slow to download, we implement a caching strategy to avoid repeated API calls during iterative renders.### OverviewFor each year in `year_list`, this code:- Retrieves tract-level total population- Retrieves county-level total population- Caches results locally as `.rds` filesThis structure ensures reproducibility while dramatically improving performance during Quarto re-rendering.### Tract-Level Population DataWe obtain tract-level total population using ACS variable:- `B01003_001` — Total populationIf a cached file already exists (`cache/us_tracts_population_YEAR.rds`), it is loaded directly. Otherwise:- Data are downloaded using `get_acs()`- Geometry is included (`geometry = TRUE`) for spatial joins- County FIPS codes are derived from the first five digits of the tract `GEOID`- Identifier fields are converted to numeric for efficient joins- Only necessary variables are retained- The cleaned dataset is saved locally for future use#### Why include geometry?Including tract geometry allows direct spatial intersection with:- 50-mile buffers- 60-minute isochrones- United catchment areasThis avoids needing a second shapefile join later.### County-Level Population DataCounty-level total population is retrieved using the same ACS variable (`B01003_001`), with geometry retained.County data are used for:- County-level HIV prevalence overlays- Aggregated catchment summaries- Validation checks against tract-level totalsAs with tract data, results are cached locally to prevent redundant API calls.### Age and Sex Variable Labels (`B01001`)The ACS table `B01001` contains detailed age- and sex-specific population counts. Rather than downloading these counts directly in this step, we load the variable metadata using:```{r, eval=FALSE}us_tracts_population<-list()us_counties_population<-list()vars<-list()for (year_loop in year_list){ tracts_path<-paste0("cache/us_tracts_population_",year_loop,".rds") counties_path<-paste0("cache/us_counties_population_",year_loop,".rds") #Pull US Census tract populations if cache file doesn't already exist if (file.exists(tracts_path)) { us_tracts_population[[year_loop]] <- readRDS(tracts_path) } else { us_tracts_population[[year_loop]]<-get_acs( geography = "tract", variables = "B01003_001", year = as.numeric(year_loop), survey = "acs5", geometry = TRUE, state = continental_fips )%>% mutate(county_fips = substr(GEOID, 1, 5))%>% mutate(GEOID=as.numeric(GEOID))%>% mutate(county_fips=as.numeric(county_fips))%>% select(-variable, -moe)%>% rename(tract_population=estimate) saveRDS(us_tracts_population[[year_loop]], tracts_path) } #Pull US Census counties populations if cache file doesn't already exist if (file.exists(counties_path)) { us_counties_population[[year_loop]] <- readRDS(counties_path) } else { us_counties_population[[year_loop]]<-get_acs( geography = "county", variables = "B01003_001", year = as.numeric(year_loop), survey = "acs5", geometry = TRUE, state = continental_fips )%>% mutate(GEOID=as.numeric(GEOID))%>% select(-variable, -moe)%>% rename(county_population=estimate) saveRDS(us_counties_population[[year_loop]], counties_path) } # Load tract variable labels for age/sex values B01001 from the ACS 5-year 2017 dataset vars[[year_loop]] <- load_variables(year_loop, "acs5", cache = TRUE) %>% filter(str_detect(name, "B01001_"))%>% mutate(label = str_replace_all(label, "Estimate!!Total!!", ""))%>% mutate(label = str_replace_all(label, "!!", "_"))%>% mutate(label = str_replace_all(label, " ", "_"))}```## AtlasPlus county-level HIV dataThis code uses the `CDCAtlas` package to download county-level HIV prevalence data for the years defined in `year_list`.```{r eval=FALSE}AtlasPlusTableData_county_totals<-list()for (year_loop in year_list){ AtlasPlusTableData_county_totals[[year_loop]] <-CDCAtlas::get_atlas(disease="hiv", geography="county", year=as.numeric(year_loop))%>% clean_names()%>% rename(county_cases=cases)%>% rename(geo_id=fips)%>% rename(county_name=geography)%>% mutate(county_cases=if_else(is.na(county_cases),0, county_cases))%>% mutate(geo_id=as.numeric(geo_id))%>% filter(year==as.numeric(year_loop))%>% select(-year)%>% filter(indicator=="HIV prevalence")}```## Other portions of the analysis- [**Setup**](setup.qmd): Defines global paths, data sources, cohort inclusion criteria, and analysis-wide constants.- [**Functions**](functions.qmd): Reusable helper functions for cohort construction, matching, costing, and modeling.- [**Tables**](tables.qmd): Summary tables and regression outputs generated from the final models.- [**Figures**](figures.qmd):Visualizations of costs, risks, and model-based estimates.- [**About**](about.qmd): methods, assumptions, and disclosures