Toward building a general-purpose omics machine learning framework for public health

Funding period: 2023–2025
Lead: Julie (Chih-yu) Chen
Total GRDI funding: $439,400

With exponential growth in computational power and in the availability of large-scale data sets, artificial intelligence (AI) and its subdiscipline, machine learning (ML), are increasingly being applied in health domains.

The recent Best Brains Exchange initiative on building a strategy for AI in public health, led by the Public Health Agency of Canada (PHAC) and the Canadian Institutes of Health Research (CIHR), put the spotlight on public health AI applications and highlighted the importance of developing good practices. Good practices by both ML developers and end users are crucial to creating successful applications. Hidden biases and context dependencies can arise at every stage, from the input data through to the ML model and the interpretation of results, underscoring the need for a standardized, bias-aware, and interpretable ML framework. The learning curve for data science (statistics and ML) and the complexity of ML models further hinder adoption by experts in the public health domain.

A growing number of ML applications for microbial omics data have been developed for public health. One particularly notable example is the lineage classification of SARS-CoV-2 genomes by the PangoLEARN tool, which assigns a lineage (for example, BA.5) to each sequence and has been an integral part of genomic surveillance worldwide. Ready-made ML models save time for end users, but evaluating them for implementation, interpretation, and operationalization still requires data science literacy. In addition to evaluating PangoLEARN, we have developed earlier and preliminary ML models that predict antimicrobial resistance in tuberculosis (TB), disease status, source location, and microbial species or subspecies from genomics, metagenomics, and proteomics data. We will also develop and evaluate ML models for TB genotyping and outbreak classification for genomic surveillance. Although these topics are diverse and use different ML algorithms, we propose to build a generalized and standardized omics ML framework, genOmicsML, to unify the approach across them.

This project aims to increase data science literacy related to omics data and to facilitate the adoption of machine learning by end users for omics projects in public health and other domains. The project will also develop and evaluate machine learning models for tuberculosis genotyping and outbreak classification for genomic surveillance. Although these topics are diverse and use different machine learning algorithms, the team will build a generalized and standardized omics machine learning framework, genOmicsML, to provide a unified approach across them.

Contact us

For additional information, please contact:
Genomics R&D Initiative
Email: info@grdi-irdg.collaboration.gc.ca
