County-level geographic calculations

Author: Vagish Hemmige
Affiliation: Montefiore Medical Center / Albert Einstein College of Medicine

The code in this script creates the cohort of kidney transplant patients eligible for the study for use in other scripts.

Source code

The full R script is available at:

This R script file is itself reliant on the following helper files:

Preparing transplant-center sf objects

This code converts multiple transplant-center datasets into geographic (sf) objects and organizes them in nested lists by organ and year. The end result is four parallel list objects that can be used downstream for buffering, distance calculations, and map overlays.

What gets created

  • Transplant_centers_all_sf
    Spatial (sf) version of all transplant centers, by organ.

  • Transplant_centers_active_SF
    Spatial (sf) version of active transplant centers, by organ and year.

  • Transplant_centers_HIV_sf
    Spatial (sf) subset of centers that performed an HIV R+ transplant at the given time point, by organ and year.

  • Transplant_centers_HOPE_sf
    Spatial (sf) subset of centers that performed a HOPE (HIV D+/R+) transplant at the given time point, by organ and year.

Step 1: Initialize Empty Containers

We start by creating empty lists. These will be filled inside the loops:

  • the outer list is keyed by organ_loop (from organ_list)
  • for year-specific objects, there is an inner list keyed by year_loop (from year_list)
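
A minimal sketch of this container pattern, using hypothetical organ and year keys and placeholder strings in place of sf objects. It assumes the year keys are character strings, and it creates the inner list explicitly, which is a choice made for this sketch rather than something the script is known to do.

```r
# Hypothetical keys for illustration; the real organ_list and year_list
# are defined elsewhere in the analysis.
organ_list <- c("kidney", "liver")
year_list <- c("2017", "2022")  # character keys, so entries are stored by name

centers_by_organ_year <- list()

for (organ_loop in organ_list) {
  # Create the inner, year-keyed list explicitly (a choice made in this sketch)
  centers_by_organ_year[[organ_loop]] <- list()

  for (year_loop in year_list) {
    # Placeholder value standing in for an sf object
    centers_by_organ_year[[organ_loop]][[year_loop]] <- paste(organ_loop, year_loop)
  }
}

names(centers_by_organ_year)
names(centers_by_organ_year[["kidney"]])
centers_by_organ_year[["liver"]][["2022"]]
```

Character keys matter at the year level: assigning into a list with a number such as 2017 indexes by position rather than by name, and pads the list up to that position.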

Step 2: Loop Over Organ Types

For each organ in organ_list:

  1. The dataset of all transplant centers for that organ is taken as an sf object (st_transform() requires spatial input).
  2. The coordinate reference system is transformed to EPSG:5070 (NAD83 / Conus Albers).

This projection is used because it is an equal-area projection designed for the contiguous United States with coordinates in meters, so buffers and distance-based calculations use consistent linear units.
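
A minimal sketch of the reprojection step, assuming the sf package is installed. The center codes and coordinates are made up, and the st_as_sf() call is only needed when the input is a plain data frame (the columns lon and lat are hypothetical); an object that is already sf needs only st_transform().

```r
library(sf)

# Made-up centers with longitude/latitude in WGS84 (EPSG:4326)
centers <- data.frame(
  OTCCode = c("AAAA", "BBBB"),
  lon = c(-73.88, -87.62),
  lat = c(40.88, 41.88)
)

# Build point geometries from the coordinate columns
centers_sf <- st_as_sf(centers, coords = c("lon", "lat"), crs = 4326)

# Reproject to EPSG:5070 (NAD83 / Conus Albers)
centers_5070 <- st_transform(centers_sf, 5070)

st_crs(centers_5070)$epsg   # EPSG code of the projected object
st_distance(centers_5070)   # pairwise distances in the CRS's linear unit
```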


Step 3: Loop Over Years Within Each Organ

For each year in year_list, three organ-year–specific spatial datasets are created:

A) Active Centers

The year-specific sf dataset of active transplant centers is projected to EPSG:5070.

B) HIV R+ Centers

From the full set of centers for that organ, centers are filtered to retain only those whose OTCCode appears in the HIV center volume file for that organ and year.

This identifies centers that performed at least one HIV R+ transplant at that time point.

C) HOPE (HIV D+/R+) Centers

Similarly, the full center list is filtered using the HOPE center volume file to identify centers that performed HIV D+/R+ transplants during that year.
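
The membership filter used for both the HIV R+ and HOPE subsets can be sketched with base R subsetting on toy data frames. The center codes and counts are invented, and n_transplants is a hypothetical column. The script itself uses dplyr's filter() on sf objects; the %in% condition is the same.

```r
# Toy stand-in for the full center list for one organ (geometry omitted)
centers_all <- data.frame(
  OTCCode = c("AAAA", "BBBB", "CCCC"),
  stringsAsFactors = FALSE
)

# Toy stand-in for a center volume file for one organ and year
center_volumes <- data.frame(
  REC_CTR_CD = c("BBBB", "CCCC", "DDDD"),
  n_transplants = c(3, 1, 2),  # hypothetical column
  stringsAsFactors = FALSE
)

# Keep only centers whose code appears in the volume file
keep <- centers_all$OTCCode %in% center_volumes$REC_CTR_CD
centers_subset <- centers_all[keep, , drop = FALSE]

centers_subset$OTCCode

# Codes in the volume file with no match in the center list
setdiff(center_volumes$REC_CTR_CD, centers_all$OTCCode)
```

In this toy data, "DDDD" appears in the volume file but not in the center list, so it is silently absent from the subset. A setdiff() check like the last line is one way to surface such codes.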


Final Structure

After the loops complete, the resulting structure is:

  • Transplant_centers_all_sf[[organ]]
  • Transplant_centers_active_SF[[organ]][[year]]
  • Transplant_centers_HIV_sf[[organ]][[year]]
  • Transplant_centers_HOPE_sf[[organ]][[year]]

This consistent nested design allows downstream functions to iterate cleanly across organ types and years when generating maps, buffers, and catchment-area analyses.
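
As an illustration of iterating over this structure, the sketch below counts rows for each organ and year. It uses small data frames in place of sf objects, and the organ and year keys are hypothetical.

```r
# Toy nested list shaped like Transplant_centers_HIV_sf[[organ]][[year]]
centers_nested <- list(
  kidney = list(
    "2017" = data.frame(OTCCode = c("AAAA", "BBBB")),
    "2022" = data.frame(OTCCode = c("AAAA", "BBBB", "CCCC"))
  ),
  liver = list(
    "2017" = data.frame(OTCCode = "AAAA"),
    "2022" = data.frame(OTCCode = c("AAAA", "CCCC"))
  )
)

# Outer sapply walks organs, inner sapply walks years; the result is a
# matrix with one row per year and one column per organ
center_counts <- sapply(centers_nested, function(by_year) sapply(by_year, nrow))

center_counts
```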

# Requires sf and dplyr (loaded here in case the setup script has not done so)
library(sf)
library(dplyr)

# Containers for the projected center data, keyed by organ and, where
# applicable, by year
Transplant_centers_all_sf <- list()
Transplant_centers_active_SF <- list()
Transplant_centers_HIV_sf <- list()
Transplant_centers_HOPE_sf <- list()

for (organ_loop in organ_list) {

  # Project all transplant centers for this organ to EPSG:5070
  Transplant_centers_all_sf[[organ_loop]] <- Transplant_centers_all[[organ_loop]] %>%
    st_transform(5070)

  for (year_loop in year_list) {

    # Active centers for this organ and year, projected to EPSG:5070
    Transplant_centers_active_SF[[organ_loop]][[year_loop]] <- Transplant_centers_active[[organ_loop]][[year_loop]] %>%
      st_transform(5070)

    # Centers that performed an HIV R+ transplant at the designated time point
    Transplant_centers_HIV_sf[[organ_loop]][[year_loop]] <- Transplant_centers_all_sf[[organ_loop]] %>%
      filter(OTCCode %in% HIV_center_volumes[[organ_loop]][[year_loop]]$REC_CTR_CD)

    # Centers that performed an HIV D+/R+ (HOPE) transplant at the designated time point
    Transplant_centers_HOPE_sf[[organ_loop]][[year_loop]] <- Transplant_centers_all_sf[[organ_loop]] %>%
      filter(OTCCode %in% HOPE_center_volumes[[organ_loop]][[year_loop]]$REC_CTR_CD)
  }
}

Merging CDC HIV Data with County and Tract Geography

This section joins CDC Atlas HIV surveillance data to U.S. county and census tract population files, then derives tract-level HIV estimates for downstream spatial analysis.

Three list objects are created:

  • Merged_Counties
    County-level spatial data merged with CDC HIV totals.

  • Merged_Counties_nonSF
    County-level data with geometry removed (used for attribute joins).

  • Merged_tracts
    Census tract–level spatial data with estimated HIV-positive and HIV-negative populations.

Each object is indexed by year.


Step 1: Initialize Storage Lists

Empty lists are created to store year-specific merged datasets.

Each list will ultimately contain one entry per year in year_list.


Step 2: Loop Over Years

For each year:


A) Merge County Population Data with CDC Atlas Data

County geographic population files are joined with CDC Atlas HIV totals.

  • The join matches GEOID (county FIPS code) in the population file to geo_id in the CDC Atlas dataset.
  • A left_join() keeps every county in the output, including counties with no matching CDC Atlas record.
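
The left-join behavior can be sketched on toy, non-spatial tables. The FIPS codes and counts below are invented, and base R's merge() with all.x = TRUE stands in for dplyr's left_join(); this illustrates the join semantics and is not the script's own call.

```r
# Toy county population table (geometry omitted); GEOID is a character FIPS code
county_population <- data.frame(
  GEOID = c("01001", "01003", "01005"),
  county_population = c(50000, 200000, 25000),
  stringsAsFactors = FALSE
)

# Toy CDC Atlas totals; county "01005" has no record
atlas_totals <- data.frame(
  geo_id = c("01001", "01003"),
  county_cases = c(100, 600),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of the left table, as left_join() does
merged <- merge(county_population, atlas_totals,
                by.x = "GEOID", by.y = "geo_id", all.x = TRUE)

merged
```

The unmatched county stays in the result with NA for county_cases. Both keys are stored as character here so that leading zeros in FIPS codes are preserved on each side of the join.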

The result is a spatial (sf) object containing:

  • County geometry
  • County population
  • County HIV case counts and rates

This object is stored in Merged_Counties[[year_loop]].

# Containers for the merged data, keyed by year
Merged_Counties <- list()
Merged_Counties_nonSF <- list()
Merged_tracts <- list()

for (year_loop in year_list) {

  # Join county geography and population with CDC Atlas HIV totals
  Merged_Counties[[year_loop]] <- left_join(us_counties_population[[year_loop]],
                                            AtlasPlusTableData_county_totals[[year_loop]],
                                            by = join_by(GEOID == geo_id))

  # Attribute-only copy (geometry dropped) for joining onto tracts
  Merged_Counties_nonSF[[year_loop]] <- Merged_Counties[[year_loop]] %>%
    st_drop_geometry()

  # Attach county totals to each census tract
  Merged_tracts[[year_loop]] <- left_join(us_tracts_population[[year_loop]],
                                          Merged_Counties_nonSF[[year_loop]],
                                          by = c("county_fips" = "GEOID")) %>%
    select(-NAME.x, -NAME.y) %>%
    # Estimate tract HIV cases by allocating county cases in proportion to population
    mutate(tract_cases = county_cases * tract_population / county_population) %>%
    # Estimate HIV-negative population in each tract
    mutate(tract_noncases = tract_population - tract_cases) %>%
    st_transform(5070)
}
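
The tract-level estimate in the code above allocates each county's HIV cases to its tracts in proportion to tract population, which amounts to assuming uniform prevalence within a county. A small worked example with invented numbers:

```r
# One hypothetical county with 100 HIV cases and 10,000 residents
county_cases <- 100
county_population <- 10000

# Populations of the three tracts that make up the county
tract_population <- c(2000, 3000, 5000)

# Allocate county cases to tracts in proportion to tract population
tract_cases <- county_cases * tract_population / county_population

# Remaining tract residents are treated as HIV-negative
tract_noncases <- tract_population - tract_cases

tract_cases       # 20 30 50
tract_noncases    # 1980 2970 4950
sum(tract_cases)  # 100
```

The tract estimates add back up to the county total only when the tract populations sum to the county population. Where county_cases is NA, for example for a county with no match in the join, the tract estimates are NA as well.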

Other portions of the analysis

  • Setup: Defines global paths, data sources, cohort inclusion criteria, and analysis-wide constants.
  • Functions: Reusable helper functions for cohort construction, matching, costing, and modeling.
  • Tables: Summary tables and regression outputs generated from the final models.
  • Figures: Visualizations of costs, risks, and model-based estimates.
  • About: Methods, assumptions, and disclosures.