Unsupervised learning methods to extract clinical knowledge of patients with chronic diseases
- Chushig Muzo, Cristian David
- Inmaculada Mora Jiménez (Supervisor)
- Cristina Soguero Ruiz (Co-supervisor)
Defense university: Universidad Rey Juan Carlos
Defense date: April 1, 2022
- José Luis Sancho Gómez (Committee chair)
- Jesús San Román Montero (Committee secretary)
- Maurizio Filippone (Committee member)
Type: Thesis
Abstract
Over the last decades, life expectancy has increased significantly worldwide. Recent demographic trends indicate that the number of elderly people will continue to rise, yielding populations at higher risk of developing chronic diseases. The number of chronic patients grows every year, entailing a significant health burden and a growing demand for medical care services and resources. Diabetes and hypertension are two of the most prevalent chronic conditions, and they mainly show patterns of associative multimorbidity among elderly people. The widespread adoption of Electronic Health Records (EHRs) in national health systems has generated an unprecedented amount of clinical data. EHRs record data on different aspects of care and collect a large amount of patient information, making them a valuable source for data-driven approaches, especially those based on Machine Learning (ML). These methods have revolutionized both academia and industry, substantially outperforming previous approaches in different domains. ML models have been used in conjunction with EHRs for different clinical applications, including patient mortality prediction, hospital readmission prediction, and identification of adverse events, among others. The insights obtained from these models have the potential to drive an important transformation in traditional health care, shifting from expert-guided approaches to data-driven ones. Despite the noteworthy benefits of using ML methods in the clinical setting, data extracted from EHRs raise important challenges. EHR data exhibit high heterogeneity and high dimensionality, which substantially affect the learning process of statistical and conventional ML methods. Furthermore, in many applications, data labels may be unavailable or unreliable.
Unsupervised learning methods provide a way to reveal the underlying structure of complex datasets, allowing us to discover unknown patterns and characterize clusters associated with chronic conditions. The main goal of this Dissertation is to apply and adapt unsupervised learning methods to automatically extract clinical knowledge of patients with chronic diseases. The following specific objectives are proposed: (i) to develop a data-driven approach enabling the clinical characterization of the health status associated with different chronic populations; (ii) to build new representations of chronic patients through dimensionality-reduction techniques, enabling the visualization and identification of clusters of patients with specific chronic conditions; and (iii) to design a methodology based on probabilistic methods to support the interpretability of black-box models used in the clinical setting. From a clinical point of view, we seek to determine factors associated with the onset and progression of chronic conditions, which are crucial for resource planning, early diagnosis, and prevention. Note that early interventions and appropriate treatments can help to reduce the economic burden associated with chronic diseases. This Thesis contributes to the bioengineering field by providing effective unsupervised learning methodologies for extracting clinical knowledge from real-world patient data. These methodologies address the main challenges raised by EHR data and improve pattern recognition, visualization, and clinical interpretation.