Designing an ML system for predicting mutation pathogenicity

А.Н. Сұлтанғазиева; Д.Д. Орынбай

doi:10.62687/STJ.8.1.2025.6

Vol. 1 No. 8 (2025), Articles

Machine learning system for mutation pathogenicity prediction

Published 2025-12-30

A.N. Sultangaziyeva
Astana International University, Astana, Kazakhstan

D.D. Orynbai
Astana International University, Astana, Kazakhstan

PDF (Russian)

Keywords

machine learning, bioinformatics, mutation pathogenicity, Apache Spark, Random Forest, genetic variant classification, ClinVar, personalized medicine.

How to Cite

Sultangaziyeva, A. N., & Orynbai, D. D. (2025). Machine learning system for mutation pathogenicity prediction. SMART TECHNOLOGIES JOURNAL, 1(8). https://doi.org/10.62687/STJ.8.1.2025.6

Abstract

The article presents the design of a distributed machine learning system that automatically classifies the pathogenicity of genetic variants using clinical data from ClinVar. The work is motivated by the need to speed up the interpretation of next-generation sequencing (NGS) results in clinical practice, where manual review of hundreds of thousands of variants can take geneticists weeks.

The study investigates architectural solutions for processing large volumes of genetic data with Apache Spark MLlib and ensemble learning methods. The methods applied include systems analysis of biomedical databases, feature engineering for categorical genetic features, cross-validation, and comparative evaluation of classification algorithms.
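Feature engineering for categorical genetic features can be illustrated without a Spark cluster. The pure-Python sketch below mimics the documented defaults of Spark MLlib's `StringIndexer` (labels ordered by descending frequency, ties broken alphabetically, so the most frequent label gets index 0) and `OneHotEncoder` (`dropLast=True`, so the highest index encodes as an all-zero vector). The function names and the example variant-type values are hypothetical and are not taken from the article; Spark itself produces sparse vectors rather than Python lists.

```python
from collections import Counter
from typing import Dict, List, Sequence


def fit_string_indexer(values: Sequence[str]) -> Dict[str, int]:
    """Map each category to an index, most frequent first (ties: alphabetical)."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def one_hot(index: int, num_categories: int, drop_last: bool = True) -> List[float]:
    """Dense one-hot vector; with drop_last the highest index encodes as all zeros."""
    size = num_categories - 1 if drop_last else num_categories
    vector = [0.0] * size
    if index < size:
        vector[index] = 1.0
    return vector


# Hypothetical values of a categorical variant-type feature.
variant_types = [
    "single nucleotide variant", "Deletion", "single nucleotide variant",
    "Duplication", "Deletion", "single nucleotide variant",
]
indexer = fit_string_indexer(variant_types)
encoded = [one_hot(indexer[v], len(indexer)) for v in variant_types]
print(indexer)      # {'single nucleotide variant': 0, 'Deletion': 1, 'Duplication': 2}
print(encoded[:4])  # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]
```

Dropping the last category avoids a perfectly collinear set of indicator columns, which mainly matters for linear models such as Logistic Regression; tree ensembles are largely insensitive to it. Unlike Spark's default `handleInvalid="error"` behaviour, which raises an error for unseen labels, this sketch simply raises a `KeyError`.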

A three-stage methodology was developed: (1) data preparation, including normalization and categorization of clinical-significance values; (2) feature engineering with StringIndexer and OneHotEncoder; and (3) training of three models (Logistic Regression, Random Forest, Gradient Boosted Trees) with hyperparameter optimization by grid search. In addition, a recommendation system was designed that assigns each variant one of five priority levels (CRITICAL/HIGH/MEDIUM/LOW/MINIMAL) based on its predicted pathogenicity probability.
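The article does not publish its exact mapping from ClinVar clinical-significance values to training labels. A common, conservative choice, sketched below purely as an assumption, is to normalize the raw value (case, underscores as used in the ClinVar VCF `CLNSIG` field, compound values such as `Pathogenic/Likely_pathogenic`) and keep only variants whose terms are all pathogenic or all benign, excluding uncertain, conflicting, and mixed entries. The term sets and function names are hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical label sets; the article does not specify its exact mapping.
PATHOGENIC_TERMS = {"pathogenic", "likely pathogenic"}
BENIGN_TERMS = {"benign", "likely benign"}


def normalize_clinsig(raw: str) -> List[str]:
    """Lower-case a clinical-significance string and split compound values."""
    text = raw.lower().replace("_", " ")
    parts = re.split(r"[/,;|]", text)
    return [part.strip() for part in parts if part.strip()]


def binary_label(raw: str) -> Optional[int]:
    """Return 1 (pathogenic), 0 (benign), or None (excluded from training)."""
    terms = set(normalize_clinsig(raw))
    if not terms:
        return None
    if terms <= PATHOGENIC_TERMS:
        return 1
    if terms <= BENIGN_TERMS:
        return 0
    # Uncertain, conflicting, or mixed annotations are dropped.
    return None


for raw in ["Pathogenic/Likely_pathogenic", "Likely benign", "Uncertain_significance"]:
    print(raw, "->", binary_label(raw))
```

In a Spark pipeline the same rule could be wrapped as a UDF, or rewritten with built-in column functions to avoid Python serialization overhead.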

The results are a scalable architecture designed to process more than one million records and a module that automatically generates clinical recommendations.
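The five-level prioritization can be sketched as a threshold rule over the predicted probability of the pathogenic class (Spark MLlib classifiers expose class probabilities in their `probability` output column). The article names the levels but not their cut-offs, so the thresholds 0.9/0.7/0.5/0.3 below are hypothetical placeholders, as are the function names and variant identifiers.

```python
from typing import Dict, List, Tuple

# Hypothetical cut-offs, checked from highest to lowest; anything below is MINIMAL.
PRIORITY_THRESHOLDS: Tuple[Tuple[float, str], ...] = (
    (0.90, "CRITICAL"),
    (0.70, "HIGH"),
    (0.50, "MEDIUM"),
    (0.30, "LOW"),
)


def priority_level(p_pathogenic: float) -> str:
    """Map a pathogenicity probability in [0, 1] to one of five priority levels."""
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError(f"probability out of range: {p_pathogenic}")
    for cutoff, level in PRIORITY_THRESHOLDS:
        if p_pathogenic >= cutoff:
            return level
    return "MINIMAL"


def prioritize(predictions: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Sort variants by descending probability and attach their priority level."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return [(variant_id, p, priority_level(p)) for variant_id, p in ranked]


# Hypothetical model outputs for three variants.
for row in prioritize({"var_1": 0.42, "var_2": 0.97, "var_3": 0.08}):
    print(row)
# ('var_2', 0.97, 'CRITICAL')
# ('var_1', 0.42, 'LOW')
# ('var_3', 0.08, 'MINIMAL')
```

Sorting before labeling puts CRITICAL variants at the top of the list a geneticist reviews. In practice the cut-offs would need to be tuned on held-out data, since scores from Random Forest or gradient-boosted models are not necessarily well-calibrated probabilities.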
