There is growing recognition that health is determined largely by factors outside of the healthcare system. These social determinants of health (SDoH) include factors such as a person’s neighborhood and built environment, socioeconomic status, educational attainment, and access to healthcare (1).

Given the impact of SDoH, as well as current trends in healthcare payment models toward a value-based system, healthcare systems and payors are working to identify and help tackle the SDoH-related factors that lead to higher costs and poor patient outcomes (2).

One resource in this effort is geospatial analysis, a powerful tool at the intersection of geography and data science. Healthcare systems can use geospatial analysis to collect, interpret, and visualize geographic data to identify geospatial trends and relationships. Geospatial analysis can help enhance resource allocation, disease surveillance, and public health planning (3). Geospatial data can also be used to link patient health metrics to population-level socioeconomic and demographic data in order to analyze the effect of SDoH on health outcomes (4).

The clinical laboratory generates a trove of high-quality, often quantifiable, patient health data across a large spectrum of medical conditions. This makes laboratory data an invaluable resource for assessing population health, SDoH, and health equity. Moreover, given their subject-matter expertise, quality improvement mindset, data analytics capabilities, and position to impact healthcare delivery, clinical laboratorians are especially well-positioned to leverage geospatial data to identify opportunities for closing care gaps (5).

This article will introduce and describe considerations in the experimental design of a geospatial analysis with a focus on laboratory data.

Key Components of Experimental Design

Formulate a Research Question

In any scientific endeavor, it is important to define research goals and hypotheses to guide the experiment and derive meaningful conclusions. Within laboratory medicine, potential use cases for geospatial analysis may include disease surveillance, identifying disease hotspots or laboratory testing deserts, or analyzing the impact of SDoH on laboratory testing and results (6, 7).

Determine Scale and Geographic Units

A geospatial analysis can be local, regional, or global in scope. This decision will be driven by the specific research question and will also inform the geographic units used in the analysis.

For example, if the question involves an entire country, the analysis might be performed at the regional, provincial, or state level. Analysis of a state may involve smaller geographic units, such as census tracts or block groups.

Note that while ZIP codes may be the first geographic unit that comes to mind, aggregating data at the ZIP code level is discouraged: ZIP code areas do not encompass socioeconomically and demographically similar populations (4), so they may obscure relationships between health and SDoH-related factors.

In contrast, census tracts and block groups are designed to be homogeneous in their demographic and socioeconomic makeup and are the preferred geographic units for small-scale analyses.
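
Census geographic units nest inside one another, and U.S. Census Bureau identifiers (GEOIDs) encode that nesting: a 12-digit block group GEOID is a 2-digit state code, a 3-digit county code, a 6-digit tract code, and a 1-digit block group code. The Python sketch below shows how a block group identifier can be rolled up to its census tract. The function name and the example identifier are illustrative assumptions, not part of any published workflow.

```python
def split_block_group_geoid(geoid: str) -> dict:
    """Split a 12-digit census block group GEOID into its nested parts."""
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"expected a 12-digit block group GEOID, got {geoid!r}")
    return {
        "state": geoid[:2],      # 2-digit state code
        "county": geoid[:5],     # state + 3-digit county code
        "tract": geoid[:11],     # state + county + 6-digit tract code
        "block_group": geoid,    # full 12-digit identifier
    }

# Made-up identifier for illustration only.
parts = split_block_group_geoid("295101011001")
print(parts["tract"])  # 29510101100
```

Storing these identifiers as text rather than numbers preserves leading zeros, which matters for states whose codes begin with 0.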

Identify Data Sources and Analysis Tools

Many laboratorians may be familiar with accessing patient laboratory or health-related data from their institution’s laboratory information system (LIS) or electronic medical record (EMR). They may be less familiar with connecting a patient’s lab results to a geographic location. The key to making this connection is to retrieve data on patients’ residential addresses or the addresses of testing locations (depending on the research question), information that should be available in modern LISs and EMRs. Of course, a person’s address can change, so it is important to use the address at the time of testing rather than a patient’s most recent address.
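
Selecting the address at the time of testing can be sketched as a lookup against an address history. The example below assumes the EMR exposes a per-patient list of addresses with effective dates, which not every system does; the function name, field layout, and addresses are hypothetical.

```python
from bisect import bisect_right
from datetime import date

def address_at(history, test_date):
    """Return the address in effect on test_date.

    history: list of (effective_from, address) tuples sorted by date.
    Returns None if the test predates the first recorded address.
    """
    start_dates = [start for start, _ in history]
    idx = bisect_right(start_dates, test_date) - 1
    if idx < 0:
        return None
    return history[idx][1]

# Hypothetical address history for one patient.
history = [
    (date(2019, 3, 1), "12 Example St"),
    (date(2022, 7, 15), "48 Sample Ave"),
]
print(address_at(history, date(2021, 5, 2)))   # 12 Example St
print(address_at(history, date(2023, 1, 10)))  # 48 Sample Ave
```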

Once address data is retrieved, the next step is to convert each address to its latitude and longitude coordinates via a process called “geocoding.” While many geocoding tools are available, many are not Health Insurance Portability and Accountability Act (HIPAA) compliant, as they may require sending patient addresses over the Internet (8). Thus, one must use a tool that performs geocoding locally.

One option for local geocoding is ArcGIS Pro, a graphical user interface (GUI)-based software package that is robust and widely used within the geospatial analysis community (9). A disadvantage of this software is that it requires a paid license. For users who want to try geospatial analysis without paying a fee, a validated geocoding tool called DeGAUSS is freely available for download, although it does require basic knowledge of command line tools (10).

Once a dataset has been geocoded, it can be linked to population-level socioeconomic and demographic data for analyzing SDoH-related factors. A wide range of population-level data on SDoH-related factors are freely available from the U.S. Census Bureau (11). Validated composite metrics of social vulnerability at the census tract or block group level, such as the Social Vulnerability Index (SVI) (12) or Area Deprivation Index (ADI) (13), are also publicly available.
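
Linking amounts to a join on the geographic identifier. The sketch below assumes each geocoded record already carries an 11-digit census tract identifier and that tract-level SVI values have been loaded into a lookup table; the field names and values are hypothetical.

```python
def attach_svi(records, svi_by_tract):
    """Add the tract-level SVI to each geocoded record (None if no match)."""
    return [{**rec, "svi": svi_by_tract.get(rec["tract"])} for rec in records]

# Hypothetical SVI values on a 0-to-1 scale; higher means more vulnerable.
svi_by_tract = {"29510101100": 0.82, "29510102100": 0.35}

# Hypothetical geocoded records.
records = [
    {"patient_id": "A", "tract": "29510101100"},
    {"patient_id": "B", "tract": "29510102100"},
    {"patient_id": "C", "tract": "29510999999"},  # no SVI row for this tract
]

linked = attach_svi(records, svi_by_tract)
unmatched = [r["patient_id"] for r in linked if r["svi"] is None]
print(unmatched)  # ['C']
```

Keeping unmatched records visible, rather than silently dropping them, makes it easier to spot geocoding failures or mismatched census vintages.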

Once the data has been retrieved, cleaned, and geocoded, the work of geospatial analysis can begin. This may include geospatial visualization and modeling, which also require specialized software. Software tools for visualization and modeling can be categorized as GUI- and non-GUI-based.

GUI-based tools include ArcGIS Pro, which, as mentioned above, is robust, widely used, and supported by excellent documentation, but requires a paid license. Alternatively, QGIS is a free GUI-based program (14) that provides much of the core functionality of ArcGIS Pro. Users with programming experience may prefer the customizability of non-GUI-based tools such as R and Python, each of which offers robust and well-documented packages for conducting geospatial analyses (15, 16).

Choose a Geospatial Analysis or Modeling Approach

Geospatial analysis and modeling techniques vary widely in complexity, and the choice among them depends on the specific question or use case. One of the most commonly used analytical approaches is choropleth mapping, which involves classifying geographic units by the value of a metric and visualizing them with a color-coded palette. This approach is useful for observing overall trends in the geospatial data, as well as for performing exploratory data analysis.
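
The classification step of a choropleth map can be sketched without any mapping software. The example below uses quantile classification, one of several common schemes, to assign each geographic unit to a class that would then be drawn in the matching color. The function name, unit labels, values, and palette are illustrative assumptions.

```python
from bisect import bisect_left
from statistics import quantiles

def quantile_classes(values_by_unit, n_classes=4):
    """Assign each geographic unit to a quantile class (0 = lowest)."""
    cut_points = quantiles(values_by_unit.values(), n=n_classes)
    return {
        unit: bisect_left(cut_points, value)
        for unit, value in values_by_unit.items()
    }

# Hypothetical metric per tract, e.g. percent of patients with an abnormal result.
rates = {"t1": 5, "t2": 10, "t3": 15, "t4": 20,
         "t5": 25, "t6": 30, "t7": 35, "t8": 40}

# Arbitrary light-to-dark red palette, one color per class.
palette = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]

classes = quantile_classes(rates)
colors = {unit: palette[k] for unit, k in classes.items()}
print(classes["t1"], classes["t8"])  # 0 3
```

Quantile classes place roughly equal numbers of units in each color, which suits exploratory maps; equal-interval or natural-breaks schemes may suit other questions better.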

Another useful technique is spatial autocorrelation, which takes advantage of statistical methods to identify geographic “hot spots” and “cold spots” for a given metric of interest.
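
As a rough illustration, the sketch below computes global Moran's I, a widely used summary statistic of spatial autocorrelation. It measures overall clustering across the whole map; identifying individual hot spots and cold spots relies on local statistics, which are not shown here and are best left to the established packages cited above (15, 16). The four-tract layout and values are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weights matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    denom = sum(d * d for d in dev)
    return (n / s0) * (cross / denom)

# Four hypothetical tracts in a row; weight 1 means the tracts are neighbors.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

clustered = [10, 12, 30, 32]    # similar values sit next to each other
alternating = [10, 30, 10, 30]  # neighbors are dissimilar

print(morans_i(clustered, weights) > 0)    # True
print(morans_i(alternating, weights) < 0)  # True
```

Positive values indicate that similar values tend to be neighbors, and negative values indicate the opposite. A formal analysis would also test the statistic for significance, typically by permutation.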

Finally, spatial data can also be used in conjunction with traditional supervised or unsupervised machine learning algorithms, such as clustering or regression. For example, unsupervised clustering can group socioeconomically and demographically similar regions, enabling analysis of SDoH and potentially informing resource allocation. A more detailed explanation of analytical methods is outside the scope of this article, but further reading is available (17).

A Critical Tool for Labs to Bridge Healthcare Gaps

Geospatial analysis is a powerful and well-established discipline that researchers have used with great success in a variety of fields, but its use in healthcare, and especially in laboratory medicine, is still maturing. With the growing recognition of the impact of SDoH, using geospatial analysis to identify care gaps and plan appropriate interventions can add tremendous value to a healthcare organization. Given their access to high-quality health data and their subject-matter expertise, laboratory medicine professionals are uniquely suited to fill this role.

 

Figure 1: Poor glycemic control is associated with social vulnerability

Left: St. Louis census tracts colored by social vulnerability index (SVI) (12). Darker red colors indicate a higher level of social vulnerability. Right: HbA1c results were retrieved from the LIS along with the patient’s address of residence. Addresses were geocoded and assigned to census tracts, and the percentage of patients whose most recent HbA1c was in the uncontrolled range (≥9%) was calculated per census tract.
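
The per-tract calculation described in the caption can be sketched as follows. This is not the author's code: the field names and sample values are hypothetical, and only the 9% threshold and the use of each patient's most recent result come from the caption.

```python
from collections import defaultdict
from datetime import date

UNCONTROLLED_THRESHOLD = 9.0  # HbA1c percent, per the figure caption

def percent_uncontrolled_by_tract(results):
    """Percent of patients per tract whose most recent HbA1c is >= 9%."""
    # Keep only each patient's most recent result.
    latest = {}
    for r in results:
        prev = latest.get(r["patient_id"])
        if prev is None or r["collected"] > prev["collected"]:
            latest[r["patient_id"]] = r

    counts = defaultdict(lambda: [0, 0])  # tract -> [uncontrolled, total]
    for r in latest.values():
        counts[r["tract"]][1] += 1
        if r["hba1c"] >= UNCONTROLLED_THRESHOLD:
            counts[r["tract"]][0] += 1

    return {tract: 100.0 * unc / total for tract, (unc, total) in counts.items()}

# Hypothetical geocoded results.
results = [
    {"patient_id": "A", "tract": "X", "collected": date(2022, 1, 5), "hba1c": 9.8},
    {"patient_id": "A", "tract": "X", "collected": date(2023, 2, 1), "hba1c": 7.2},
    {"patient_id": "B", "tract": "X", "collected": date(2023, 3, 9), "hba1c": 10.1},
    {"patient_id": "C", "tract": "Y", "collected": date(2023, 4, 2), "hba1c": 6.5},
]
print(percent_uncontrolled_by_tract(results))  # {'X': 50.0, 'Y': 0.0}
```

In practice, tracts with very few patients produce unstable percentages, so a minimum count per tract may be worth applying before mapping.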

Using laboratory data in combination with geospatial and socioeconomic data allows us to observe a disparity: census tracts in the northern and southeastern regions of the city, which tend to be more socioeconomically vulnerable, have higher rates of poorly controlled diabetes.

These data should be used to mobilize interventions to reduce these disparities, and prospective analyses can be used to monitor the effectiveness of those interventions.

Vahid Azimi, MD, is an instructor of pathology and immunology and assistant medical director of laboratory information systems at Washington University School of Medicine in St. Louis, Missouri.

References

  1. Braveman P, Egerter S, Williams DR. The social determinants of health: coming of age. Annu Rev Public Health 2011; doi: 10.1146/annurev-publhealth-031210-101218.
  2. Liao JM, Navathe AS. What comes next in prioritizing equity in payment? The ACO REACH Model. Health Aff Forefront 2022; doi: 10.1377/forefront.20220404.728371.
  3. Fradelos EC, Papathanasiou IV, Mitsi D, et al. Health based geographic information systems (GIS) and their applications. Acta Inform Med 2014; doi: 10.5455/aim.2014.22.402-405.
  4. Pearson J, Jacobson C, Ugochukwu N, et al. Geospatial analysis of patients’ social determinants of health for health systems science and disparity research. Int Anesthesiol Clin 2023; doi: 10.1097/aia.0000000000000389.
  5. Ducatman BS, Ducatman AM, Crawford JM, et al. The value proposition for pathologists: a population health approach. Acad Pathol 2020; doi: 10.1177/2374289519898857.
  6. Warrington JS, Brett A, Foster H, et al. Driving access to care: use of mobile units for urine specimen collection during the coronavirus disease-19 (COVID-19) pandemic. Acad Pathol 2020; doi: 10.1177/2374289520953557.
  7. Azimi V, Jackups Jr R, Farnsworth CW, et al. Use of laboratory data for illicit drug use surveillance and identification of socioeconomic risk factors. Drug Alcohol Depend 2022; doi: 10.1016/j.drugalcdep.2022.109499.
  8. California Department of Health Care Services. List of HIPAA Identifiers. https://www.dhcs.ca.gov/dataandstats/data/Pages/ListofHIPAAIdentifiers.aspx (Accessed October 2023).
  9. ESRI. ArcGIS Pro. https://www.esri.com/en-us/arcgis/products/arcgis-pro/overview (Accessed October 2023).
  10. DeGAUSS. degauss.org (Accessed October 2023).
  11. United States Census Bureau. Explore Census Data. https://data.census.gov/ (Accessed October 2023).
  12. Agency for Toxic Substances and Disease Registry. CDC/ATSDR Social Vulnerability Index. https://www.atsdr.cdc.gov/placeandhealth/svi/index.html (Accessed October 2023).
  13. University of Wisconsin Center for Health Disparities Research. Neighborhood Atlas. https://www.neighborhoodatlas.medicine.wisc.edu/ (Accessed October 2023).
  14. QGIS. https://www.qgis.org/en/site/ (Accessed October 2023).
  15. Columbia University Libraries. GIS, Cartographic, and Spatial Analysis Tools: R/Rstudio. https://guides.library.columbia.edu/geotools/R (Accessed October 2023).
  16. Columbia University Libraries. GIS, Cartographic, and Spatial Analysis Tools: Python. https://guides.library.columbia.edu/geotools/Python (Accessed October 2023).
  17. Geographic Data Science with Python. Introduction. https://geographicdata.science/book/intro.html# (Accessed October 2023).