
Using usRds to access data within the USRDS file system
Source:vignettes/accessing-files.Rmd
accessing-files.RmdThis vignette demonstrates how to load and filter USRDS data files
using the usRds package. It covers both general file
loading and specific extractors for common tasks.
Prerequisite: Set the Working Directory
All examples assume that you have already set your working directory with:
set_USRDS_wd("/path/to/usrds_data")Option 1: General File Loader
You can load any recognized USRDS file using:
df <- load_usrds_file(file_key = "PATIENTS", year = 2006)This function will:
- Locate the file using internal variables created by the package when
loaded or
set_USRDS_wd()is run - Use
.parquetif available, falling back to.sas7bdator.csv - Read the file using the best available method for that format
- You can also use
col_selectto choose specific variables, andusrds_idsto filter by patient. Setfactor_labels = TRUEandvar_labels = TRUEto apply value and variable labels:
Labeling Options
The load_usrds_file() function supports two options for
enriching the data with documentation from the USRDS Researcher’s
Guide:
-
factor_labels = TRUEapplies value labels (e.g., turning codes like"1"into"Male") using Appendix C. -
var_labels = TRUEattaches human-readable variable descriptions (as a"label"attribute) using Appendix B.
These options help make the raw USRDS data easier to interpret.
# Apply both value and variable labels
df <- load_usrds_file("RXHIST", factor_labels = TRUE, var_labels = TRUE)To see what files are available:
list_available_usrds_files("IN")Option 2: Filtered Extractors (get_*() Functions) for
Institutional and Physician/Supplier Files
The usRds package includes purpose-built extractor
functions designed to read and filter commonly used USRDS claims files.
These functions are optimized for performance and usability,
automatically handling file formats and filtering during load.
Available Functions
-
get_IN_ICD()– diagnoses from institutional claims -
get_IN_HCPCS()– HCPCS procedure codes from institutional claims -
get_PS_ICD()– diagnoses from physician/supplier claims -
get_PS_HCPCS()– HCPCS procedure codes from physician/supplier claims
Key Arguments
All of the get_*() functions share the following
arguments:
-
icd_codesorhcpcs_codes: A character vector of diagnosis or procedure codes to filter. IfNULL, all codes are returned. -
years: A numeric vector of years to include (e.g.,2006,2006:2008) -
usrds_ids(optional): A character or numeric vector ofUSRDS_IDs to filter to a specific cohort. If omitted, all IDs are included.
Examples
# Example: Extract ICD-9 codes from institutional claims
dx <- get_IN_ICD(icd_codes = c("042", "V08"), years = 2006:2007)
# Example: Extract HCPCS code "81003" from physician/supplier claims
ps <- get_PS_HCPCS(hcpcs_codes = "81003", years = 2006)
# Example: Filter to a specific cohort of patients
subset <- get_PS_ICD(
icd_codes = "25000",
years = 2012,
usrds_ids = c("100001234", "100005678")
)Each get_*() function:
- Automatically uses
.parquetfiles if available for best performance - Falls back to
.sas7bdator.csvwhen needed - Applies filtering inline for efficiency
- Accepts optional filtering by
usrds_idsor codes
These functions are optimized for the format of each file. Parquet files provide the fastest performance, especially for large-scale extraction.
Summary
- Use
load_usrds_file()for unfiltered access to raw data - Use
get_*()functions for fast, focused queries by diagnosis or procedure code - All functions are optimized for the available file type, and Parquet is preferred for maximum speed