Author
Affiliation

Vagish Hemmige

Montefiore Medical Center/ Albert Einstein College of Medicine

The code in this script loads the data needed for other scripts from multiple sources, as explained in detail below.

Source code

The full R script is available at:

This R script file is itself reliant on the following helper files:

Load US state maps from census

We import U.S. state boundary shapefiles using the tigris package.

The tigris::states() function retrieves state-level shapefiles from the U.S. Census Bureau’s TIGER/Line products.

Column Selection

We retain only:

  • NAME → full state name
  • STUSPS → two-letter state abbreviation
  • geometry → polygon boundaries

This reduces memory overhead and keeps the object lightweight for downstream joins and plotting.

Exclusions

We remove:

  • Alaska and Hawaii (to avoid projection distortion in contiguous U.S. maps)
  • U.S. territories (not included in SRTR transplant center analyses)

This ensures that visualizations align with the contiguous U.S. study population.

Output

The resulting object, states_sf, is an sf object containing polygon geometries for the 48 contiguous U.S. states,

Click to show/hide R Code
states_sf <- tigris::states(cb = TRUE, year = 2022)%>%
  select(NAME, STUSPS, geometry) %>%
  filter(!(NAME %in% c("Alaska", 
                       "Hawaii", 
                       "Puerto Rico", 
                       "United States Virgin Islands",
                       "Commonwealth of the Northern Mariana Islands",
                       "Guam",
                       "American Samoa"))) %>%
  rename(State=NAME)

Import transplant data from UNOS SRTR

To identify active transplant centers and characterize transplant activity by organ type, we import transplant recipient-level and donor-level data from the Scientific Registry of Transplant Recipients (SRTR).

These datasets are accessed using a custom helper function, load_srtr_file(), which standardizes file loading, variable labeling, and preprocessing across SRTR tables.

Click to show/hide R Code
tx<-list()

tx[["kidney"]] <- load_srtr_file("tx_ki")
tx[["liver"]] <- load_srtr_file("tx_li")
tx[["heart"]] <- load_srtr_file("tx_hr")
tx[["lung"]] <- load_srtr_file("tx_lu")
tx[["pancreas"]] <- load_srtr_file("tx_kp")



donor_deceased <- load_srtr_file("donor_deceased")%>%
  select(DONOR_ID, DON_ANTI_HIV, DON_HIV_NAT)

Load transplant center data from SRTR package

To map transplant center locations and define geographic catchment areas, we retrieve center-level data from the Health Resources and Services Administration (HRSA) using the sRtr package.

The get_hrsa_transplant_centers() function accesses HRSA transplant center metadata, including center identifiers and geographic coordinates.

Subsequent steps:

  • Remove Hawaii and Puerto Rico

  • Keep the following variables:

    • OTC_NM: transplant center name
    • OTC_CD: transplant center 4-letter UNOS code
    • Service_Lst: list of organs transplanted at that center
    • X, Y: longitude and latitude of transplant center
  • Keep transplant centers whose organ corresponds with the organ defined in the organ_loop variable for each loop

  • Removes duplicates with the distinct() command

  • Renames variables for convenience

  • Pauses for five seconds between iterations to avoid overloading the HRSA website

Click to show/hide R Code
Transplant_centers_all<-list()

for (organ_loop in organ_list)
{
  
  
  Transplant_centers_all[[organ_loop]]<-sRtr::get_hrsa_transplant_centers()%>%
    filter(OPO_STATE_ABBR != "PR" & OPO_STATE_ABBR!= "HI")%>%
    select(OTC_NM, OTC_CD, Service_Lst, X, Y) %>%
    filter(str_detect(Service_Lst, str_to_sentence(organ_loop)))%>%
    select(-Service_Lst)%>%
    distinct()%>%
    rename(OTCName=OTC_NM, OTCCode=OTC_CD, Latitude=Y, Longitude=X)
  Sys.sleep(5)
  
}

Load census data

To estimate catchment populations for transplant centers, we retrieve tract- and county-level population data from the American Community Survey (ACS) 5-year files. Because these data are large and slow to download, we implement a caching strategy to avoid repeated API calls during iterative renders.

Overview

For each year in year_list, this code:

  • Retrieves tract-level total population
  • Retrieves county-level total population
  • Caches results locally as .rds files

This structure ensures reproducibility while dramatically improving performance during Quarto re-rendering.

Tract-Level Population Data

We obtain tract-level total population using ACS variable:

  • B01003_001 — Total population

If a cached file already exists (cache/us_tracts_population_YEAR.rds), it is loaded directly. Otherwise:

  • Data are downloaded using get_acs()
  • Geometry is included (geometry = TRUE) for spatial joins
  • County FIPS codes are derived from the first five digits of the tract GEOID
  • Identifier fields are converted to numeric for efficient joins
  • Only necessary variables are retained
  • The cleaned dataset is saved locally for future use

Why include geometry?

Including tract geometry allows direct spatial intersection with:

  • 50-mile buffers
  • 60-minute isochrones
  • United catchment areas

This avoids needing a second shapefile join later.

County-Level Population Data

County-level total population is retrieved using the same ACS variable (B01003_001), with geometry retained.

County data are used for:

  • County-level HIV prevalence overlays
  • Aggregated catchment summaries
  • Validation checks against tract-level totals

As with tract data, results are cached locally to prevent redundant API calls.

Age and Sex Variable Labels (B01001)

The ACS table B01001 contains detailed age- and sex-specific population counts. Rather than downloading these counts directly in this step, we load the variable metadata using:

Click to show/hide R Code
us_tracts_population<-list()
us_counties_population<-list()
vars<-list()

for (year_loop in year_list){
  
  
  tracts_path<-paste0("cache/us_tracts_population_",year_loop,".rds")
  counties_path<-paste0("cache/us_counties_population_",year_loop,".rds")
  
  #Pull US Census tract populations if cache file doesn't already exist
  
  if (file.exists(tracts_path)) {
    us_tracts_population[[year_loop]] <- readRDS(tracts_path)
  } else {
    
    us_tracts_population[[year_loop]]<-get_acs(
      geography = "tract",
      variables = "B01003_001",
      year = as.numeric(year_loop),
      survey = "acs5",
      geometry = TRUE,
      state = continental_fips
    )%>%
      mutate(county_fips = substr(GEOID, 1, 5))%>%
      mutate(GEOID=as.numeric(GEOID))%>%
      mutate(county_fips=as.numeric(county_fips))%>%
      select(-variable, -moe)%>%
      rename(tract_population=estimate)
    
    saveRDS(us_tracts_population[[year_loop]], 
            tracts_path)
    
  }
  
  #Pull US Census counties populations if cache file doesn't already exist
  
  if (file.exists(counties_path)) {
    us_counties_population[[year_loop]] <- readRDS(counties_path)
  } else {
    
    us_counties_population[[year_loop]]<-get_acs(
      geography = "county",
      variables = "B01003_001",
      year = as.numeric(year_loop),
      survey = "acs5",
      geometry = TRUE,
      state = continental_fips
    )%>%
      mutate(GEOID=as.numeric(GEOID))%>%
      select(-variable, -moe)%>%
      rename(county_population=estimate)
    
    saveRDS(us_counties_population[[year_loop]], 
            counties_path)
    
    
  }
  
  # Load tract variable labels for age/sex values B01001 from the ACS 5-year 2017 dataset
  vars[[year_loop]] <- load_variables(year_loop, "acs5", cache = TRUE) %>%
    filter(str_detect(name, "B01001_"))%>%
    mutate(label = str_replace_all(label, "Estimate!!Total!!", ""))%>%
    mutate(label = str_replace_all(label, "!!", "_"))%>%
    mutate(label = str_replace_all(label, " ", "_"))
  
}

AtlasPlus county-level HIV data

This code uses the CDCAtlas package to download county-level HIV prevalence data for the years defined in year_list.

Click to show/hide R Code
AtlasPlusTableData_county_totals<-list()

for (year_loop in year_list){
  
  AtlasPlusTableData_county_totals[[year_loop]] <-CDCAtlas::get_atlas(disease="hiv",
                                                                      geography="county",
                                                                      year=as.numeric(year_loop))%>%
    clean_names()%>%
    rename(county_cases=cases)%>%
    rename(geo_id=fips)%>%
    rename(county_name=geography)%>%
    mutate(county_cases=if_else(is.na(county_cases),0, county_cases))%>%
    mutate(geo_id=as.numeric(geo_id))%>%
    filter(year==as.numeric(year_loop))%>%
    select(-year)%>%
    filter(indicator=="HIV prevalence")
  
}

Other portions of the analysis

  • Setup: Defines global paths, data sources, cohort inclusion criteria, and analysis-wide constants.
  • Functions: Reusable helper functions for cohort construction, matching, costing, and modeling.
  • Tables: Summary tables and regression outputs generated from the final models.
  • Figures:Visualizations of costs, risks, and model-based estimates.
  • About: methods, assumptions, and disclosures