Data and Research Core (DRC)

Address research priorities and needs to form an inclusive basis for conducting equity-focused AI/ML research targeting the use of electronic health records

About

The mission of the AIM-AHEAD Data and Research Core (DRC) is to broaden the diversity and representation of healthcare data in artificial intelligence and machine learning (AI/ML) and expand its availability to diverse teams of researchers to address health disparities.

The DRC is not a single database. Instead, AIM-AHEAD seeks to catalyze an ecosystem of datasets to help address the lack of population diversity in data used in AI/ML models.

Data Set Options for Research Funded by AIM-AHEAD

These data sources are options that project teams may propose for AIM-AHEAD-funded research projects. Applicants may also propose other data sources. As listed under Access Notes, AIM-AHEAD data partners provide extra services to facilitate access and mentorship for AIM-AHEAD-approved project teams.

Source: A customized subset from OCHIN Community Health Equity Database
Description: EHR data from underserved communities
Data Allowed: HIPAA Limited Data Set; individual patient-level data, with dates and geographic indicators if needed for research
Access Notes: AIM-AHEAD Data Partner* with facilitated access and concierge services for funded projects. Available through AIM-AHEAD Service Workbench; data use agreement and IRB required (see below).

Source: Data Bridge from MedStar Health (curated data from the MedStar Health EHR)
Description: EHR data from hospital system network with 31% African American patient representation
Data Allowed: Multiple curated dataset options (further detail on website): pre-curated or custom-curated de-identified EHR, limited data set, full PHI EHR dataset, imaging, select clinical notes, select genomics data, and synthetic data
Access Notes: AIM-AHEAD Data Partner* with facilitated access and concierge services for funded projects. Available through MedStar Health; data use agreement and IRB required (see below).

Source: 60+ studies from NHLBI BioData Catalyst
Description: Selected large-scale cohorts related to heart, lung, blood, and sleep disorders. Includes both prospective clinical studies and associated genomic TOPMed data.
Data Allowed: De-identified dataset, including individual-level genomic (TOPMed full genomes) and clinical datasets
Access Notes: Available on NHLBI BioData Catalyst Infrastructure. Requires approval of Data Access Request; most datasets require IRB.

Source: 15 selected open datasets on AWS
Description: A variety of datasets, including clinical and genomic data
Data Allowed: Public data and controlled-access data (depends on dataset)
Access Notes: Available on AIM-AHEAD Service Workbench; access requirements depend on the dataset.

Source: NIH All of Us
Description: The All of Us Research Program is building one of the largest biomedical data resources of its kind.
Data Allowed: The All of Us Research Hub stores health data from a diverse group of participants from across the United States.
Access Notes: Available on All of Us Research Workbench; requires registration and institutional use agreement.

Source: The ScHARe Data Ecosystem
Description: ScHARe is a cloud-based research collaboration platform developed by the National Institute on Minority Health and Health Disparities and the National Institute of Nursing Research.
Data Allowed: Google-hosted Public Datasets; ScHARe-hosted Public Datasets; ScHARe-hosted Project Datasets
Access Notes: See reference document

 

The DRC and Infrastructure Core also collaborate to assist AIM-AHEAD awardees in locating other data sources to support their projects. As part of its mission to diversify datasets used in AI/ML, AIM-AHEAD has conducted a landscape survey to raise awareness about datasets that may be of interest to the research community. Each dataset has its own governance process and rules for access.

Apply to include a dataset in the data landscape list

View the landscape survey datasets

AIM-AHEAD Data Partners 

AIM-AHEAD-funded projects may apply to receive facilitated access and data concierge services from AIM-AHEAD data partners that emphasize historically under-resourced and under-represented populations.

Data Bridge from MedStar Health: MedStar Health and MedStar Health Research Institute (MHRI) include an extensive network of clinical facilities in the mid-Atlantic region.
OCHIN Community Health Equity Database: OCHIN is a nonprofit health care innovation center with a core mission to advance health equity.

How AIM-AHEAD Data Partners Expand Representation

     

Race

People who select a single race other than White, or who select more than one race

3,087,377

2,220,068

Ethnicity 

People who select an ethnicity other than those listed under the race of White 

2,618,129 [1]

1,962,904

Age

<18 years old and 65 years and above

2,623,517

2,374,283

Sexual and Gender Minority

Individuals with sexual orientation other than ‘straight,’ gender identity other than ‘man’ or ‘woman,’ and/or sex other than ‘male’ or ‘female’

326,557

Not well-captured

Income

Annual household income < $25,000

4,911,886

Not well-captured

Education

People without a high school diploma or GED

Not well-captured but FQHCs generally higher than general population

2,560

Access to Care

Needed a medical visit in the past 12 months but cannot readily use the health care system or pay for needed care

Not well-captured but FQHCs generally higher than general population

Not well-captured

Geography

Residents of established rural and non-metropolitan zip codes, based on the HRSA Federal Office of Rural Health Policy data files

1,276,525

83,309 [2]

Disability 

People with a physical, functional, cognitive, or other condition that substantially limits one or more life activities

788,749 [3]

266,226 [3]

Last Updated: June 2024

Source: All of Us reference UBR categories

1. People with Hispanic ethnicity at any race
2. Based on rural and suburban hospital discharges
3. Based on ICD codes for disability in the study by Clark et al., including physical, visual, hearing, and intellectual/developmental disabilities

Leadership

Keith Norris
Lead MPI (Interim)

University of California Los Angeles

Nawar Shara
MPI

MedStar Health Research Institute

Josh Lemieux
Co-I

OCHIN

Stephen Fernandez
Co-I

MedStar Health Research Institute

Erin Hernandez
Project Director

OCHIN

Robert Schuff
Data Science Lead

OCHIN

Wyatt Bensken
Investigator

OCHIN

Sara Stienecker
Co-Director

MedStar Health — Georgetown University CoLab

Taona Haderlein
Investigator

OCHIN
