AIM-AHEAD and NCATS Training Program

Traineeship in Advanced Data Analysis using NCATS Data and the N3C Data Enclave

The application of artificial intelligence and machine learning (AI/ML) to large datasets is dramatically expanding the capacity for hypothesis testing impacting the biomedical and socio-economic domains.  However, underrepresented communities, particularly those at heightened risk of socioeconomic and health disparities, are not receiving AI/ML’s benefits. Training a diverse workforce of researchers proficient in the application of AI/ML represents an opportunity to address a critical unmet need by extending the benefits of AI/ML to underrepresented, at-risk communities.

The central goal of this training program is to increase researcher diversity in AI/ML by training individuals from diverse backgrounds who are committed to gaining proficiency in AI/ML data analysis and applying their expertise to benefit communities underrepresented in biomedical research.

To accomplish this objective, two cohorts of diverse professionals committed to applying AI/ML to benefit underrepresented communities will complete an intensive 8-month program in advanced data analysis developed by the National Center for Advancing Translational Sciences (NCATS), using the resources of the NCATS N3C Data Enclave and AIM-AHEAD’s data science training core.  Completing the training will equip motivated professionals to conduct the in-depth analysis of large datasets that is essential for cutting-edge biomedical and socioeconomic research.

 

The AIM-AHEAD consortium (Data Science Training Core and Communications Hub) is partnering with NCATS to offer AIM-AHEAD stakeholders, trainees, mentees, and consortium partners a training opportunity designed to increase researcher diversity in AI/ML by leveraging the NCATS N3C Data Enclave. The N3C Data Enclave can be used to study COVID-19 and identify potential treatments, clinical best practices, novel patient cohorts, and interventions as the pandemic continues to evolve.

Figure. AIM-AHEAD/NCATS Training Program Structure and Timeline

Program Description


The N3C enables the rapid collection and analysis of clinical, laboratory, and diagnostic data from hospitals and health care plans that have transferred this data to the NCATS N3C Data Enclave under a Data Transfer Agreement. The N3C Data Enclave is built upon a state-of-the-art analytics platform that protects data security and patient privacy and recognizes investigator contributions. The N3C now represents over 19.6M people across all 50 states (with DC and Puerto Rico), 7.8M COVID-19 positive patients, 78 medical centers, and over 12.5 billion lab results.

 

This training opportunity is designed to reduce barriers for AIM-AHEAD researchers to access and analyze real-world clinical data, allowing them to conduct novel research at the intersection of AI/ML and health disparities with data collected from communities historically underrepresented in biomedical research. Trainees will have access to weekly training, help desk and concierge support via the N3C and AIM-AHEAD Connect platforms. This learning opportunity utilizes a hybrid training modality. Technical/Applied concepts are taught synchronously with hands-on projects and direct assistance from an instructor. Theoretical concepts are taught asynchronously using training videos and simple exercises. 

 

Trainees will work on developing a sound research question and hypothesis, and will be able to choose from a variety of datasets available within the N3C Data Enclave to study that question.

 

The research question, which will build on the experience gained in these classes, will focus on AIM-AHEAD's North Star (III): Use AI/ML to address disparities and minority health in behavioral health, cardiometabolic health, and cancer.

 

This training opportunity will engage two cohorts of graduate students and early-career researchers over a 36-week (i.e., 9-month) period. We particularly encourage individuals from underrepresented communities to apply.

 

The first cohort will have 15 participants, focusing on professionals with a clinical background who may be less familiar with Python/R programming and have limited experience in clinical data processing and ethics. The second cohort, comprising 40 participants, is tailored for those skilled in programming (Python/R) but who may lack a clinical background, experience in statistical/data processing, or ethics training in the realm of clinical data.

 

Training will be structured into three separate 12-week phases, spanning a total of 36 weeks. During this period, both cohorts will progress concurrently. The training program will conclude with a 12-week collaborative phase, during which participants from both groups will join forces to address clinical research management challenges through problem-based learning.

Eligibility Criteria

 

1. Applicants must be:

A. U.S. Citizens, Permanent Residents, or Non-Citizen U.S. Nationals

B. Able to submit Form W-9 (Request for Taxpayer Identification)

C. Affiliated with one of the following entities:

    • Higher Education Institutions
      • Public/State Controlled Institutions of Higher Education
      • Private Institutions of Higher Education

Concordant with the goals of the AIM-AHEAD Coordinating Center, individuals affiliated with the following types of Higher Education Institutions are highly encouraged to apply:

      • Hispanic-Serving Institutions (HSIs)
      • Historically Black Colleges and Universities (HBCUs)
      • Tribally Controlled Colleges and Universities (TCCUs)
      • Alaska Native and Native Hawaiian Serving Institutions
      • Asian American and Native American Pacific Islander-Serving Institutions (AANAPISIs)
      • Other Minority Serving Institutions
    • Nonprofits Other Than Institutions of Higher Education
      • Nonprofits with 501(c)(3) IRS Status (Other than Institutions of Higher Education)
      • Nonprofits without 501(c)(3) IRS Status (Other than Institutions of Higher Education)
    • For-Profit Organizations
      • Small Businesses
      • For-Profit Organizations (Other than Small Businesses)

D. Able to obtain organization compliance documents:

    • Institutional signoff on Data Use Agreements / Data Sharing Agreements.

 

2. Education:
Applicants must have completed at least an undergraduate degree, but can be post-baccalaureate or graduate students, postdoctoral fellows, medical students or residents, allied health trainees, early-career investigators or early-career employees of non-academic institutions as defined in item 1C above.  Applicants must hold at a minimum a bachelor’s degree from an accredited U.S. institution in one of the following or related fields:

  1. Physical sciences (e.g., chemistry, physics)
  2. Biological or life sciences (e.g., biology, zoology, biochemistry, microbiology)
  3. Mathematics or statistics
  4. Data science
  5. Engineering
  6. Health sciences (e.g., pharmacy, psychology, health information technology, nursing, therapy, social work)
  7. Public health (e.g., epidemiology, biostatistics, health administration, clinical implementation)
  • Cohort 1 Activities include:
    • Hands-on Python or R bootcamp
    • Regular concierge services on the workbench, R/Python coding, and other support to AIM-AHEAD researchers. This includes 1-1 guidance, virtual office hours, and helpdesk support using AIM-AHEAD Connect.
    • Access to asynchronous supplemental Python / R content in N3C
    • Upon completion of the prerequisite training, participants are anticipated to be awarded a $2,500 stipend. Subsequently, they will advance to infrastructure training.
  • Cohort 2 Activities include:
    • Infrastructure Training. This training will be conducted through webinars, workshops, and hands-on bootcamps and cover topics in R or Python via a cloud-based Jupyter Notebook environment.
    • Training on how to onboard to N3C and gain access to the Training Workspace (which includes creating an Enclave account, completing Information Security & Management and Human Subjects Research Protection trainings, and filing a Data Use Request to access de-identified patient data).
    • Working with N3C investigators and domain teams, and an AIM-AHEAD mentor to design testable hypotheses using N3C data. 
    • Training related to N3C Data Enclave platform Fundamentals, R, Python, and Jupyter Notebook via AIM-AHEAD Connect Courses (https://courses.aim-ahead.net/course/catalog).
    • Regular concierge services on the workbench, R/Python coding, and other support to AIM-AHEAD researchers. These services include 1-1 guidance, virtual office hours, and helpdesk support using AIM-AHEAD Connect.
    • Upon completion of the infrastructure training, participants are anticipated to be awarded a $2,500 stipend. Subsequently, they will advance to specialized training in either Clinical Data Management or Good Algorithmic Practice.
  • Cohort 1 Activities include: 
    • Infrastructure Training. This training will be conducted through webinars, workshops, and hands-on bootcamps and cover topics in R or Python via a cloud-based Jupyter Notebook environment.
    • Training on how to onboard to N3C and gain access to the Training Workspace (which includes creating an Enclave account, completing Information Security & Management and Human Subjects Research Protection trainings, and filing a Data Use Request to access de-identified patient data).
    • Working with N3C investigators and domain teams, and an AIM-AHEAD mentor to design testable hypotheses using N3C data. 
    • Training related to N3C Data Enclave platform Fundamentals, R, Python, and Jupyter Notebook via AIM-AHEAD Connect Courses (https://courses.aim-ahead.net/course/catalog).
    • Regular concierge services on the workbench, R/Python coding, and other support to AIM-AHEAD researchers. This includes 1-1 guidance, virtual office hours, and helpdesk support using AIM-AHEAD Connect.
    • Following the Infrastructure training, participants will advance to specialized training in either Clinical Data Management or Good Algorithmic Practice.
  • Cohort 2 Activities include: Participants have the choice of two focused training pathways:
    • Training in Clinical Data Management. Trainees will be introduced to the nuances of clinical EHRs and their intersection with technology. Trainees will first gain a solid understanding of EHR data models, learning how patient information is electronically structured and stored. This knowledge is foundational to the subsequent exploration of clinical ontologies, which provide standardized vocabularies to facilitate consistent healthcare communication. Trainees will delve into techniques to align various data sources to ensure their interoperability. Ontology harmonization will also be discussed, highlighting the importance of maintaining a consistent language across healthcare systems. Trainees will focus on the ethical aspects of EHR data. Students will learn about the moral, legal, and professional considerations that accompany the use of patient data. Topics such as informed consent, regulatory guidelines, and the mandates of HIPAA will be covered to provide a comprehensive understanding of the ethical landscape. Lastly, trainees will gain an understanding of patient cohort identification, a crucial process in clinical research and trials. 
    • Training in Good Algorithmic Practice. Trainees will be introduced to the concepts of explainability and human-in-the-loop feedback, and to the foundational statistical theories underlying many modern techniques used in clinical data processing. Trainees will also learn to understand and identify common biases in clinical data that can contribute to misleading results or conclusions. The training also covers the importance of discussing these implications in an explainable way with healthcare professionals and patients who rely on algorithmic outputs for care decisions.
  • Both Cohorts will receive hands-on training in a hybrid modality, along with regular concierge services on the workbench, R/Python coding, and other support to AIM-AHEAD researchers. This includes 1-1 guidance, virtual office hours, and helpdesk support using the N3C and AIM-AHEAD Connect platforms.
  • Upon successful completion of this phase of the NCATS training, it is anticipated that participants in both Cohorts will each be granted a stipend of $2,500, and proceed to specialized training in either Clinical Data Management or Good Algorithmic Practice within the context of real 
  • Cohort 1 and 2 Activities include: Participants have the choice of two focused training pathways:
    • Training in Clinical Data Management Problem Based Learning. Trainees will be introduced to the nuances of clinical EHRs and their intersection with technology. Trainees will first gain a solid understanding of EHR data models, learning how patient information is electronically structured and stored. 
    • Training in Good Algorithmic Practice Problem Based Learning. Trainees will be introduced to the concepts of explainability and human-in-the-loop feedback, and to the foundational statistical theories underlying many modern techniques used in clinical data processing.
    • Both Cohorts will engage in clinical problem-based learning that will involve working with a designated AIM-AHEAD mentor to design testable hypotheses using N3C data.
    • Both Cohorts will receive hands-on training in a hybrid modality, along with regular concierge services on the workbench, R/Python coding, and other support to AIM-AHEAD researchers. These services include 1-1 guidance, virtual office hours, and helpdesk support using the N3C and AIM-AHEAD Connect platforms.
    • Upon successful completion of the NCATS training, it is anticipated that participants in both Cohorts will each be granted a stipend of $3,000.

Having received advanced practical training in coding, model development, data cleaning and analysis, and hypothesis testing, trainees completing this program will be well prepared to harness AI/ML approaches to conduct hypothesis-driven analysis of complex datasets.  These trainees will join the community of AI/ML professionals committed to extending the benefits of AI/ML to communities underrepresented in biomedical research.

This multi-cohort training opportunity will be most beneficial for individuals who have a basic understanding of the foundations of biomedical research and of basic concepts in clinical practice or biology.  Although the experiences below are not mandatory for applicants, evidence of these experiences will be considered by the trainee selection committee for the given cohort:

  • Cohort 1: Applicants with a clinical background who have limited exposure to Python/R programming and limited experience in clinical data processing and ethics.
  • Cohort 2: Applicants who are proficient in programming (Python/R) but lacking a clinical background, statistical/data processing knowledge, or ethics training related to clinical data.

Introductory or refresher courses on these topics will be available to successful applicants at the start of the traineeship, via the AIM-AHEAD Connect platform.

The program sets the following objectives for trainees upon completing the program:

 

Objective 1: To achieve proficiency in managing, analyzing, and optimizing clinical data within the N3C Data Enclave, utilizing specific data analysis tools and platforms, designing efficient workflows and ETL processes, curating concept sets, creating patient cohorts, and leveraging advanced computational resources to enhance data-intensive operations.

 

Objective 2: To formulate hypotheses testable by applying AI/ML and advanced data analyses.

 

Objective 3: To become familiar with EHR data models, clinical ontologies, and ethical guidelines, and to gain the tools and knowledge to standardize, align, and analyze clinical data and to identify patient cohorts while navigating challenges effectively.

 

Objective 4: To gain a comprehensive understanding of the significance of data quality, integrity, and transparency in clinical settings; to identify challenges, biases, and ethical considerations; and to learn to validate datasets, integrate human feedback, assess models, and navigate legal and regulatory complexities.

 

Trainees are expected to:

  • Attend all training sessions
  • Devote an average of 8 hours of effort per week to the program
  • Engage with NCATS and AIM-AHEAD Mentors
  • Engage in learning communities and peer networking
  • Utilize NCATS concierge and AIM-AHEAD Help Desk support
  • Present a research poster at the AIM-AHEAD annual meeting in summer 2024
  • Generate an abstract suitable for submission to a conference, and/or a manuscript suitable for peer-reviewed publication
  • Attend AIM-AHEAD meetings, such as the Annual Meeting and other webinars and seminars, and be part of the AIM-AHEAD community

 

Each trainee will receive:

  • An $8,000 stipend (Cohort 1 and 2)
  • Travel expenses to attend the AIM-AHEAD 2024 conference
  • Support and guidance from an experienced mentor
  • Support from the AIM-AHEAD data science training core
  • Access to de-identified data within the N3C enclave
  • Direct 1:1 guidance, virtual office hours, helpdesk support and concierge services supporting users of N3C Data Enclave, R and Python coding
  • Training on:
    • Data analysis using N3C Data Enclave platform fundamentals, R, Python, Jupyter Notebook
    • Use and applications of R, Python, and Jupyter Notebook
    • Infrastructure training that covers Data Analysis in the National Clinical Data Collaborative
    • Clinical Data Management
    • Good Algorithmic Practices
    • Hypothesis development for testing by analysis of N3C data
    • Human Subjects Research protection
    • Preparation of a valid Data Use Agreement
    • Using the N3C Data Enclave to access a database of de-identified medical data

Participants will receive stipend support totaling $8,000 in three installments of $2,500, $2,500 and $3,000, and $2,000 in travel support to attend the AIM-AHEAD Annual Meeting. 

 

NCATS Data Enclave cloud costs (i.e. credits) will be covered by UNTHSC.

 

Each awarded Trainee will receive mentorship from experienced, skilled investigators among the AIM-AHEAD core members, who will guide the Trainee in developing a testable hypothesis using N3C data.  We will use the online mentoring platform AIM-AHEAD Connect (https://connect.aim-ahead.net) to match mentors with awarded Trainees and for mentor/fellow engagement and progress tracking.

Date                               Activity
November 15 - December 11, 2023    Application Open
December 11, 2023                  Application Due
(no date given)                    Application Review and Ranking
(no date given)                    NIH Approval of Trainee Roster
December 22, 2023                  Announce Awardees
January 8, 2024                    Training Begins

Applicant Diversity

The goal of the AIM-AHEAD Coordinating Center is to diversify the research workforce in AI/ML and Health Equity.  Consistent with the NIH Interest in Diversity (NOT-OD-20-031: Notice of NIH's Interest in Diversity), the following individuals are highly encouraged to apply for the traineeship:

  1. Individuals from health disparity populations that have been shown by the National Science Foundation to be underrepresented in health-related sciences on a national basis (see http://www.nsf.gov/statistics/showpub.cfm?TopID=2&SubID=27 and the report Women, Minorities, and Persons with Disabilities in Science and Engineering). The following racial and ethnic groups have been shown to be underrepresented in biomedical research:
    1. Blacks or African Americans,
    2. Hispanics or Latinos,
    3. American Indians or Alaska Natives,
    4. Native Hawaiians and other Pacific Islanders.
    5. In addition, it is recognized that underrepresentation can vary from setting to setting; individuals from racial or ethnic groups that can be demonstrated convincingly to be underrepresented by the grantee institution should be encouraged to participate in NIH programs to enhance diversity. For more information on racial and ethnic categories and definitions, see the OMB Revisions to the Standards for Classification of Federal Data on Race and Ethnicity (https://www.govinfo.gov/content/pkg/FR-1997-10-30/html/97-28653.htm).
  2. Individuals with disabilities, who are defined as those with a physical or mental impairment that substantially limits one or more major life activities, as described in the Americans with Disabilities Act of 1990, as amended. See National Science Foundation data at: https://www.nsf.gov/statistics/2017/nsf17310/static/data/tab7-5.pdf
  3. Individuals from disadvantaged backgrounds, defined as those who meet two or more of the following criteria:
    1. Were or currently are homeless, as defined by the McKinney-Vento Homeless Assistance Act (Definition: https://nche.ed.gov/mckinney-vento/)
    2. Were or currently are in the foster care system, as defined by the Administration for Children and Families (Definition: https://www.acf.hhs.gov/cb/focus-areas/foster-care)
    3. Were eligible for the Federal Free and Reduced Lunch Program for two or more years (Definition: https://www.fns.usda.gov/school-meals/income-eligibility-guidelines)
    4. Have/had no parents or legal guardians who completed a bachelor’s degree (see https://nces.ed.gov/pubs2018/2018009.pdf)
    5. Were or currently are eligible for Federal Pell grants (Definition: https://www2.ed.gov/programs/fpg/eligibility.html)
    6. Received support from the Special Supplemental Nutrition Program for Women, Infants and Children (WIC) as a parent or child (Definition: https://www.fns.usda.gov/wic/wic-eligibility-requirements)
    7. Grew up in one of the following areas:
      1. A U.S. rural area, as designated by the Health Resources and Services Administration (HRSA) Rural Health Grants Eligibility Analyzer (https://data.hrsa.gov/tools/rural-health), or
      2. A Centers for Medicare and Medicaid Services-designated Low-Income and Health Professional Shortage Area (qualifying zip codes are included in the file)

Note: Only one of the two areas under item 7 can be used as a criterion for the disadvantaged background definition.

We are particularly interested in applicants from historically underrepresented groups in AI/ML, such as women, racial/ethnic minorities, people with disabilities, and individuals from rural or socially disadvantaged backgrounds.

  • Students from low socioeconomic (SES) status backgrounds have been shown to obtain bachelor’s and advanced degrees at significantly lower rates than students from middle and high SES groups (see https://nces.ed.gov/programs/coe/) and are consequently less likely to be represented in biomedical research. For background see Department of Education data at:
  • https://nces.ed.gov/
  • https://nces.ed.gov/programs/coe/
  • https://www2.ed.gov/rschstat/research/pubs/advancing-diversity-inclusion.pdf
  • Literature shows that women from the above backgrounds (categories A and B) face particular challenges at the graduate level and beyond in scientific fields. (See, e.g., From the NIH: A Systems Approach to Increasing the Diversity of Biomedical Research Workforce https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5008902/).
  • Women are known to be underrepresented in doctorate-granting research institutions at senior faculty levels in most biomedical-relevant disciplines, and may also be underrepresented at other faculty levels in some scientific disciplines (See data from the National Science Foundation National Center for Science and Engineering Statistics: Women, Minorities, and Persons with Disabilities in Science and Engineering, special report available at https://www.nsf.gov/statistics/2017/nsf17310/, especially Table 9-23, describing science, engineering, and health doctorate holders employed in universities and 4-year colleges, by broad occupation, sex, years since doctorate, and faculty rank)

Application Process

Applications must be submitted during the open application period (11/20/23-12/11/23). Applications should address the requirements below and any additional questions via the Traineeship Application Form.  The application should be understandable to readers from outside the applicant’s field of study and must clearly present the project aims, applicable studies already completed, methods, materials, and AIM-AHEAD engagement plan.

 

Application Requirements

Profile Information

  • Provide your name, organization, department, position title, research area, email address, and your profile web page
  • Gender, race/ethnicity
  • Please answer the profile and prior experience questions on InfoReady

Letters of Support

  • One signed letter of support from the applicant’s supervisor is required.  Letters of support should include the referee’s contact information (full name, position title, organization, email/phone number, and signature)
  • Letters of recommendation from faculty, mentor, or supervisor who can attest to the applicant’s aptitude for advanced data analysis training
  • Academic transcripts from applicant’s undergraduate and, if applicable, graduate programs

Biographical Sketch of the applicant fellow

Statement of Rationale for Pursuing Training (200 words maximum)

  • Express your interest in one of the two concentration tracks (i.e., Data Harmonization or Good Algorithmic Practices). Describe what you hope to accomplish through the Trainee Program, and provide your rationale and need for acquiring these skills
  • Describe your familiarity with (and/or interest in) AI/ML analysis, programming, EHR, clinical or genomic data analysis, biomedical science, public health background and cloud-based computation (if any)
  • Explain how you plan to apply the training to achieve your long-term research interests and objectives

 

Trainee Selection

A Study Review Committee comprised of AIM-AHEAD Consortium members and NCATS members will use the following criteria to evaluate proposals and select award recipients:

Rationale for AI/ML Training: 

  • The applicant clearly articulates their expectations and reasons for participating in the program.  The applicant also demonstrates the need for and importance of acquiring the training
  • The applicant has the background and motivation to participate in and benefit from the training
  • The applicant demonstrates a willingness to engage and collaborate with the AIM-AHEAD community, contribute to documentation and training resources, welcome and empower new users, and help foster a diverse and inclusive community
  • The applicant describes specific plans for long-term application of the training to their research program and/or professional development

 

Notification of Award

Applicants should expect to be notified of their acceptance status on Friday, December 22, 2023. Applicants who are accepted should expect to immediately begin providing banking information to the University of North Texas Health Science Center to receive payment.

Submission using AIM-AHEAD Connect and InfoReady platforms

 

Step 1: Click here to register as a “mentee/learner” on AIM-AHEAD Connect (our Community Building Platform)

Step 2: Click here to submit an application for review using InfoReady platform*.

 * To submit your application in InfoReady, please use Chrome, Firefox, or Edge. If you're using Safari, make sure to clear your cache before logging in.

Please note both steps must be completed for consideration.

 

Applications are due on December 11, 2023 


The traineeship will begin on January 22, 2024
