Building Cohorts with STROBE Diagrams

Introduction

The STROBE (Strengthening the Reporting of Observational Studies in Epidemiology) statement provides guidelines for reporting observational studies. A key component is the flow diagram that shows how the study population was selected. The strobe package provides tools to create STROBE-compliant flow diagrams while building your cohort.

This vignette demonstrates how to use the functions in the strobe package to:

Initialize a cohort with inclusion criteria
Apply sequential filters with exclusion tracking
Review the complete filtering log

Getting Started

library(strobe)
library(dplyr)

Basic STROBE Workflow

The STROBE workflow follows these steps:

Initialize the cohort with strobe_initialize()
Filter sequentially with strobe_filter()
Review the log with get_strobe_log()
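
To make the bookkeeping behind these steps concrete, here is a minimal base R sketch of the idea: apply a condition string, keep the matching rows, and record how many rows remain and how many were dropped. The helper track_filter(), the state list, and the toy data are all hypothetical and written only for illustration; this is not the strobe package's implementation, and its handling of missing values is an assumption.

```r
# Hypothetical helper, NOT part of the strobe package: applies one condition
# string to a data frame and appends a row to a plain log data frame.
track_filter <- function(state, condition, inclusion_label, exclusion_reason) {
  # Evaluate the condition string against the columns of the current data
  keep <- eval(parse(text = condition), envir = state$data)
  keep[is.na(keep)] <- FALSE  # assumption: rows with a missing result are excluded
  kept <- state$data[keep, , drop = FALSE]
  step <- data.frame(
    inclusion_label = inclusion_label,
    exclusion_reason = exclusion_reason,
    filter = condition,
    remaining = nrow(kept),
    dropped = nrow(state$data) - nrow(kept),
    stringsAsFactors = FALSE
  )
  list(data = kept, log = rbind(state$log, step))
}

# Made-up toy data for illustration only
toy <- data.frame(age = c(25, 34, 51, 62, 70), prior = c(0, 0, 1, 0, 0))

# Starting state: the full data plus a log with a single "start" row
state <- list(
  data = toy,
  log = data.frame(
    inclusion_label = "All records", exclusion_reason = NA_character_,
    filter = NA_character_, remaining = nrow(toy), dropped = NA_integer_,
    stringsAsFactors = FALSE
  )
)

state <- track_filter(state, "age >= 30", "Age >= 30", "Excluded: Age < 30")
state <- track_filter(state, "prior == 0", "No prior transplant",
                      "Excluded: Prior transplant")
state$log
```

Passing the state explicitly keeps the sketch free of hidden global variables, which makes each step easy to inspect on its own.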

Example: Cytomegalovirus

This example uses the cytomegalovirus dataset from the medicaldata package, which is available on CRAN.


# Obtain data
library(medicaldata)
data(cytomegalovirus)

# Create cohort
df <- cytomegalovirus %>%
  strobe_initialize(inclusion_label = "Initial transplant cohort") %>%
  strobe_filter(
    condition = "age >= 30",
    inclusion_label = "Age ≥ 30",
    exclusion_reason = "Excluded: Age < 30"
  ) %>%
  strobe_filter(
    condition = "recipient.cmv == 1",
    inclusion_label = "CMV positive recipients",
    exclusion_reason = "Excluded: CMV negative"
  ) %>%
  strobe_filter(
    condition = "prior.transplant == 0",
    inclusion_label = "No prior transplant",
    exclusion_reason = "Excluded: Prior transplant"
  ) %>%
  dplyr::select(age, recipient.cmv, donor.cmv, prior.transplant, sex, race)
  
# Show the first rows and selected columns of the final data frame
head(df)
#>   age recipient.cmv donor.cmv prior.transplant sex race
#> 1  61             1         0                0   1    0
#> 3  63             1         1                0   0    1
#> 4  33             1         0                0   0    1
#> 5  54             1         1                0   0    1
#> 6  55             1         1                0   1    1
#> 7  67             1         1                0   1    1

Reviewing the Filtering Log

get_strobe_log()
#> # A tibble: 4 × 7
#>   id    parent inclusion_label         exclusion_reason filter remaining dropped
#>   <chr> <chr>  <chr>                   <chr>            <chr>      <int>   <int>
#> 1 start NA     Initial transplant coh… NA               NA            64      NA
#> 2 step1 start  Age ≥ 30                Excluded: Age <… age >…        63       1
#> 3 step2 step1  CMV positive recipients Excluded: CMV n… recip…        40      23
#> 4 step3 step2  No prior transplant     Excluded: Prior… prior…        34       6

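The log contains one row per step and shows:

The step identifier and its parent step (id, parent)
The inclusion label and the exclusion reason
The filter condition that was applied
The number of records remaining after the step and the number dropped by it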
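
As a quick consistency check, the counts in the log above can be re-derived by hand: each step's dropped value should equal the previous step's remaining value minus its own. The sketch below copies the counts from the printed log into a plain data frame (the object names are illustrative) and verifies that relationship in base R.

```r
# Counts copied from the filtering log printed above
log_counts <- data.frame(
  id = c("start", "step1", "step2", "step3"),
  remaining = c(64L, 63L, 40L, 34L),
  dropped = c(NA, 1L, 23L, 6L)
)

# Each dropped count should equal previous remaining minus current remaining
expected_dropped <- c(NA, -diff(log_counts$remaining))
counts_consistent <- all(expected_dropped[-1] == log_counts$dropped[-1])

# Total excluded across all steps, and the share of the starting cohort retained
total_excluded <- sum(log_counts$dropped, na.rm = TRUE)
retained_share <- log_counts$remaining[nrow(log_counts)] / log_counts$remaining[1]
```

The same check can be run directly on the tibble returned by get_strobe_log(), since it carries the remaining and dropped columns.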

Advanced Examples

Complex Multi-Step Filtering

cytomegalovirus %>%
  strobe_initialize(inclusion_label = "All transplant recipients") %>%
  strobe_filter(
    condition = "age >= 18 & age <= 65",
    inclusion_label = "Age 18–65",
    exclusion_reason = "Excluded: Outside 18–65"
  ) %>%
  strobe_filter(
    condition = "recipient.cmv == 1",
    inclusion_label = "CMV positive",
    exclusion_reason = "Excluded: CMV negative"
  ) %>%
  strobe_filter(
    condition = "donor.cmv == 1",
    inclusion_label = "Donor CMV positive",
    exclusion_reason = "Excluded: Donor CMV negative"
  ) %>%
  strobe_filter(
    condition = "prior.transplant == 0",
    inclusion_label = "No prior transplant",
    exclusion_reason = "Excluded: Prior transplant"
  ) %>%
  # Show the first rows and selected columns of the final data frame
  dplyr::select(age, recipient.cmv, donor.cmv, prior.transplant, sex, race) %>%
  head()
#>    age recipient.cmv donor.cmv prior.transplant sex race
#> 3   63             1         1                0   0    1
#> 5   54             1         1                0   0    1
#> 6   55             1         1                0   1    1
#> 16  61             1         1                0   0    1
#> 17  62             1         1                0   0    1
#> 19  62             1         1                0   1    0

The filtering log for this pipeline is shown below:

get_strobe_log()
#> # A tibble: 5 × 7
#>   id    parent inclusion_label         exclusion_reason filter remaining dropped
#>   <chr> <chr>  <chr>                   <chr>            <chr>      <int>   <int>
#> 1 start NA     All transplant recipie… NA               NA            64      NA
#> 2 step1 start  Age 18–65               Excluded: Outsi… age >…        63       1
#> 3 step2 step1  CMV positive            Excluded: CMV n… recip…        39      24
#> 4 step3 step2  Donor CMV positive      Excluded: Donor… donor…        28      11
#> 5 step4 step3  No prior transplant     Excluded: Prior… prior…        23       5

Terminal Branching

The final step of a STROBE diagram often shows not inclusions or exclusions but how the final cohort is stratified by the levels of a factor variable. The create_terminal_branch() function is designed for this purpose and can handle factor variables with up to six levels (including missing values).

The optional label_prefix argument lets you prepend text (e.g., the variable name) to each terminal group label. For example, setting label_prefix = "cgvhd:" produces labels like "cgvhd: 0" and "cgvhd: 1".
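
The labelling idea can be sketched in base R without the package: count the records at each level of the stratifying variable, keep missing values as their own group, and paste the prefix onto each level. The values below are made up, and both the "Missing" label and the exact spacing of the labels are assumptions for illustration; create_terminal_branch() may format its labels differently.

```r
# Made-up values for illustration; NA represents a missing stratification value
cgvhd <- c(0, 1, 1, NA, 0, 1)
label_prefix <- "cgvhd:"

# Count each level, keeping missing values as their own group
counts <- table(cgvhd, useNA = "ifany")

# Mirror the limit stated above: at most six levels, including missing
stopifnot(length(counts) <= 6)

# Assumption: missing values get the label "Missing"
level_names <- names(counts)
level_names[is.na(level_names)] <- "Missing"

# One row per terminal group: prefixed label and group size
branches <- data.frame(
  label = paste(label_prefix, level_names),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)
branches
```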

Below, we use the variable cgvhd (chronic graft-versus-host disease) to stratify the final inclusion cohort into two groups:


df <- create_terminal_branch(df, variable = "cgvhd", label_prefix = "CGVHD value:")

get_strobe_log()
#> # A tibble: 5 × 7
#>   id    parent inclusion_label         exclusion_reason filter remaining dropped
#>   <chr> <chr>  <chr>                   <chr>            <chr>      <int>   <int>
#> 1 start NA     All transplant recipie… NA               NA            64      NA
#> 2 step1 start  Age 18–65               Excluded: Outsi… age >…        63       1
#> 3 step2 step1  CMV positive            Excluded: CMV n… recip…        39      24
#> 4 step3 step2  Donor CMV positive      Excluded: Donor… donor…        28      11
#> 5 step4 step3  No prior transplant     Excluded: Prior… prior…        23       5