2021
Chapman, Martin; Mumtaz, Shahzad; Rasmussen, Luke V; Karwath, Andreas; Gkoutos, Georgios V; Gao, Chuang; Thayer, Dan; Pacheco, Jennifer A; Parkinson, Helen; Richesson, Rachel L; Jefferson, Emily; Denaxas, Spiros; Curcin, Vasa
Desiderata for the development of next-generation electronic health record phenotype libraries Journal Article
In: GigaScience, vol. 10, no. 9, 2021.
@article{Chapman_2021,
title = {Desiderata for the development of next-generation electronic health record phenotype libraries},
author = {Martin Chapman and Shahzad Mumtaz and Luke V Rasmussen and Andreas Karwath and Georgios V Gkoutos and Chuang Gao and Dan Thayer and Jennifer A Pacheco and Helen Parkinson and Rachel L Richesson and Emily Jefferson and Spiros Denaxas and Vasa Curcin},
url = {https://doi.org/10.1093/gigascience/giab059},
doi = {10.1093/gigascience/giab059},
year = {2021},
date = {2021-09-01},
urldate = {2021-09-01},
journal = {GigaScience},
volume = {10},
number = {9},
publisher = {Oxford University Press (OUP)},
keywords = {EHR, health data science, phenotypes, validation},
pubstate = {published},
tppubtype = {article}
}
Karwath, Andreas; Bunting, Karina V; Gill, Simrat K; Tica, Otilia; Pendleton, Samantha; Aziz, Furqan; Barsky, Andrey D; Chernbumroong, Saisakul; Duan, Jinming; Mobley, Alastair R; Cardoso, Victor Roth; Slater, Luke; Williams, John A; Bruce, Emma-Jane; Wang, Xiaoxia; Flather, Marcus D; Coats, Andrew J S; Gkoutos, Georgios V; Kotecha, Dipak
Redefining beta-blocker response in heart failure patients with sinus rhythm and atrial fibrillation: a machine learning cluster analysis Journal Article
In: The Lancet, 2021.
@article{Karwath_2021,
title = {Redefining beta-blocker response in heart failure patients with sinus rhythm and atrial fibrillation: a machine learning cluster analysis},
author = {Andreas Karwath and Karina V Bunting and Simrat K Gill and Otilia Tica and Samantha Pendleton and Furqan Aziz and Andrey D Barsky and Saisakul Chernbumroong and Jinming Duan and Alastair R Mobley and Victor Roth Cardoso and Luke Slater and John A Williams and Emma-Jane Bruce and Xiaoxia Wang and Marcus D Flather and Andrew J S Coats and Georgios V Gkoutos and Dipak Kotecha},
url = {https://doi.org/10.1016/s0140-6736(21)01638-x},
doi = {10.1016/s0140-6736(21)01638-x},
year = {2021},
date = {2021-08-01},
urldate = {2021-08-01},
journal = {The Lancet},
publisher = {Elsevier BV},
abstract = {Background
Mortality remains unacceptably high in patients with heart failure and reduced left ventricular ejection fraction (LVEF) despite advances in therapeutics. We hypothesised that a novel artificial intelligence approach could better assess multiple and higher-dimension interactions of comorbidities, and define clusters of β-blocker efficacy in patients with sinus rhythm and atrial fibrillation.
Methods
Neural network-based variational autoencoders and hierarchical clustering were applied to pooled individual patient data from nine double-blind, randomised, placebo-controlled trials of β blockers. All-cause mortality during median 1·3 years of follow-up was assessed by intention to treat, stratified by electrocardiographic heart rhythm. The number of clusters and dimensions was determined objectively, with results validated using a leave-one-trial-out approach. This study was prospectively registered with ClinicalTrials.gov (NCT00832442) and the PROSPERO database of systematic reviews (CRD42014010012).
Findings
15 659 patients with heart failure and LVEF of less than 50% were included, with median age 65 years (IQR 56–72) and LVEF 27% (IQR 21–33). 3708 (24%) patients were women. In sinus rhythm (n=12 822), most clusters demonstrated a consistent overall mortality benefit from β blockers, with odds ratios (ORs) ranging from 0·54 to 0·74. One cluster in sinus rhythm of older patients with less severe symptoms showed no significant efficacy (OR 0·86, 95% CI 0·67–1·10; p=0·22). In atrial fibrillation (n=2837), four of five clusters were consistent with the overall neutral effect of β blockers versus placebo (OR 0·92, 0·77–1·10; p=0·37). One cluster of younger atrial fibrillation patients at lower mortality risk but similar LVEF to average had a statistically significant reduction in mortality with β blockers (OR 0·57, 0·35–0·93; p=0·023). The robustness and consistency of clustering was confirmed for all models (p<0·0001 vs random), and cluster membership was externally validated across the nine independent trials.
Interpretation
An artificial intelligence-based clustering approach was able to distinguish prognostic response from β blockers in patients with heart failure and reduced LVEF. This included patients in sinus rhythm with suboptimal efficacy, as well as a cluster of patients with atrial fibrillation where β blockers did reduce mortality.
Funding
Medical Research Council, UK, and EU/EFPIA Innovative Medicines Initiative BigData@Heart.},
keywords = {artificial intelligence, clustering, crossvalidation, deep learning, EHR, health data science, phenotypes, validation},
pubstate = {published},
tppubtype = {article}
}
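The pipeline described in the abstract above (a low-dimensional patient embedding followed by hierarchical clustering) can be sketched on synthetic data. This is an illustrative stand-in, not the authors' code: PCA substitutes for their neural variational autoencoder, the patient matrix is random, and only the agglomerative clustering step mirrors the described method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
# Synthetic stand-in for pooled patient covariates (age, LVEF, comorbidities, ...)
patients = rng.normal(size=(500, 12))

# Step 1: embed patients in a low-dimensional latent space.
# The paper used a variational autoencoder; PCA is a simple linear analogue.
latent = PCA(n_components=3, random_state=0).fit_transform(patients)

# Step 2: hierarchical (agglomerative) clustering on the latent representation.
clusters = AgglomerativeClustering(n_clusters=5).fit_predict(latent)

# Each patient is assigned to one of five clusters, within which a treatment
# effect (e.g. beta-blocker vs placebo mortality) could then be compared.
print(np.bincount(clusters))
```

In the paper the number of clusters and latent dimensions was chosen objectively and validated leave-one-trial-out; both are fixed here purely for illustration.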
2014
Gütlein, Martin; Karwath, Andreas; Kramer, Stefan
CheS-Mapper 2.0 for visual validation of (Q)SAR models Journal Article
In: J. Cheminformatics, vol. 6, no. 1, pp. 41, 2014.
@article{gutlein2014,
title = {CheS-Mapper 2.0 for visual validation of (Q)SAR models},
author = {Martin Gütlein and Andreas Karwath and Stefan Kramer},
url = {http://dx.doi.org/10.1186/s13321-014-0041-7},
doi = {10.1186/s13321-014-0041-7},
year = {2014},
date = {2014-09-23},
journal = {J. Cheminformatics},
volume = {6},
number = {1},
pages = {41},
abstract = {Background
Sound statistical validation is important to evaluate and compare the overall performance of (Q)SAR models. However, classical validation does not support the user in better understanding the properties of the model or the underlying data. Although a number of visualization tools for analyzing (Q)SAR information in small-molecule datasets exist, integrated visualization methods that allow the investigation of model validation results are still lacking.
Results
We propose visual validation as an approach for the graphical inspection of (Q)SAR model validation results. The approach applies the 3D viewer CheS-Mapper, an open-source application for the exploration of small molecules in virtual 3D space. The present work describes the new functionalities in CheS-Mapper 2.0 that facilitate the analysis of (Q)SAR information and allow the visual validation of (Q)SAR models. The tool enables the comparison of model predictions to the actual activity in feature space. The approach is generic: it is model-independent and can handle physico-chemical and structural input features as well as quantitative and qualitative endpoints.
Conclusions
Visual validation with CheS-Mapper enables the analysis of (Q)SAR information in the data and indicates how this information is employed by the (Q)SAR model. It reveals whether the endpoint is modeled too specifically or too generically and highlights common properties of misclassified compounds. Moreover, the researcher can use CheS-Mapper to inspect how the (Q)SAR model predicts activity cliffs. The CheS-Mapper software is freely available at http://ches-mapper.org.
Graphical abstract
Comparing actual and predicted activity values with CheS-Mapper.},
keywords = {cheminformatics, data mining, graph mining, validation, visualization},
pubstate = {published},
tppubtype = {article}
}
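The core of the visual validation idea above, comparing predicted to actual activity per compound in feature space, can be sketched numerically. This is not CheS-Mapper code (which renders compounds in 3D); it is a toy example on synthetic data that tabulates the disagreements the tool would highlight and the common feature profile of the misclassified compounds.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic compounds: three physico-chemical features plus a binary activity.
features = rng.normal(size=(100, 3))        # e.g. logP, weight, polar surface area
actual = (features[:, 0] > 0).astype(int)   # "true" activity under a toy rule
predicted = actual.copy()
flip = rng.choice(100, size=10, replace=False)
predicted[flip] = 1 - predicted[flip]       # simulate 10 misclassifications

# Visual validation compares predicted with actual activity per compound;
# here we tabulate the mismatches instead of rendering them in 3D space.
misclassified = np.flatnonzero(predicted != actual)
print(len(misclassified))                   # → 10

# Common properties of misclassified compounds (their mean feature vector):
print(features[misclassified].mean(axis=0).round(2))
```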
2013
Gütlein, Martin; Helma, Christoph; Karwath, Andreas; Kramer, Stefan
A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR Journal Article
In: Molecular Informatics, vol. 32, no. 5-6, pp. 516-528, 2013.
@article{guetlein2013,
title = {A Large-Scale Empirical Evaluation of Cross-Validation and External Test Set Validation in (Q)SAR},
author = {Martin Gütlein and Christoph Helma and Andreas Karwath and Stefan Kramer},
url = {http://onlinelibrary.wiley.com/doi/10.1002/minf.201200134/abstract},
doi = {10.1002/minf.201200134},
year = {2013},
date = {2013-10-14},
urldate = {2013-10-14},
journal = {Molecular Informatics},
volume = {32},
number = {5-6},
pages = {516-528},
abstract = {(Q)SAR model validation is essential to ensure the quality of inferred models and to indicate future model predictivity on unseen compounds. Proper validation is also one of the requirements of regulatory authorities for accepting a (Q)SAR model and approving its use in real-world scenarios as an alternative testing method. However, at the same time, the question of how to validate a (Q)SAR model, in particular whether to employ variants of cross-validation or external test set validation, is still under discussion. In this paper, we empirically compare k-fold cross-validation with external test set validation. To this end we introduce a workflow that realistically simulates the common problem setting of building predictive models for relatively small datasets. The workflow allows the built and validated models to be applied to large amounts of unseen data, and the performance of the different validation approaches to be compared. The experimental results indicate that cross-validation produces better-performing (Q)SAR models than external test set validation and reduces the variance of the results, while at the same time underestimating the performance on unseen compounds. The experimental results reported in this paper suggest that, contrary to the current conception in the community, cross-validation may play a significant role in evaluating the predictivity of (Q)SAR models.},
keywords = {cheminformatics, crossvalidation, external validation, QSAR, validation},
pubstate = {published},
tppubtype = {article}
}
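The two validation variants compared in this paper can be sketched side by side. This is an illustrative scikit-learn example on synthetic data, not the paper's workflow; the dataset size, model, and fold count are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic dataset standing in for a relatively small (Q)SAR training set.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)

# Variant 1: k-fold cross-validation, using all available data for both
# training and evaluation across folds.
cv_acc = cross_val_score(model, X, y, cv=10).mean()

# Variant 2: external test set validation, holding out 20% of the data
# that the model never sees during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
ext_acc = model.fit(X_tr, y_tr).score(X_te, y_te)

print(round(cv_acc, 3), round(ext_acc, 3))
```

With small datasets, the cross-validated model benefits from training on (nearly) all the data, which is one reason the paper finds cross-validation yields better-performing models while its score can underestimate performance on genuinely unseen compounds.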
2010
Hardy, Barry J.; Douglas, Nicki; Helma, Christoph; Rautenberg, Micha; Jeliazkova, Nina; Jeliazkov, Vedrin; Nikolova, Ivelina; Benigni, Romualdo; Tcheremenskaia, Olga; Kramer, Stefan; Girschick, Tobias; Buchwald, Fabian; Wicker, Jörg; Karwath, Andreas; Gütlein, Martin; Maunz, Andreas; Sarimveis, Haralambos; Melagraki, Georgia; Afantitis, Antreas; Sopasakis, Pantelis; Gallagher, David; Poroikov, Vladimir; Filimonov, Dmitry; Zakharov, Alexey V.; Lagunin, Alexey; Gloriozova, Tatyana; Novikov, Sergey; Skvortsova, Natalia; Druzhilovsky, Dmitry; Chawla, Sunil; Ghosh, Indira; Ray, Surajit; Patel, Hitesh; Escher, Sylvia
Collaborative development of predictive toxicology applications Journal Article
In: J. Cheminformatics, vol. 2, pp. 7, 2010.
@article{hardy2010,
title = {Collaborative development of predictive toxicology applications},
author = {Barry J. Hardy and Nicki Douglas and Christoph Helma and Micha Rautenberg and Nina Jeliazkova and Vedrin Jeliazkov and Ivelina Nikolova and Romualdo Benigni and Olga Tcheremenskaia and Stefan Kramer and Tobias Girschick and Fabian Buchwald and Jörg Wicker and Andreas Karwath and Martin Gütlein and Andreas Maunz and Haralambos Sarimveis and Georgia Melagraki and Antreas Afantitis and Pantelis Sopasakis and David Gallagher and Vladimir Poroikov and Dmitry Filimonov and Alexey V. Zakharov and Alexey Lagunin and Tatyana Gloriozova and Sergey Novikov and Natalia Skvortsova and Dmitry Druzhilovsky and Sunil Chawla and Indira Ghosh and Surajit Ray and Hitesh Patel and Sylvia Escher},
url = {http://dx.doi.org/10.1186/1758-2946-2-7},
doi = {10.1186/1758-2946-2-7},
year = {2010},
date = {2010-08-31},
urldate = {2010-08-31},
journal = {J. Cheminformatics},
volume = {2},
pages = {7},
abstract = {OpenTox provides an interoperable, standards-based Framework for the support of predictive toxicology data management, algorithms, modelling, validation and reporting. It is relevant to satisfying the chemical safety assessment requirements of the REACH legislation as it supports access to experimental data, (Quantitative) Structure-Activity Relationship models, and toxicological information through an integrating platform that adheres to regulatory requirements and OECD validation principles. Initial research defined the essential components of the Framework including the approach to data access, schema and management, use of controlled vocabularies and ontologies, architecture, web service and communications protocols, and selection and integration of algorithms for predictive modelling. OpenTox provides end-user oriented tools to non-computational specialists, risk assessors, and toxicological experts in addition to Application Programming Interfaces (APIs) for developers of new applications. OpenTox actively supports public standards for data representation, interfaces, vocabularies and ontologies, Open Source approaches to core platform components, and community-based collaboration approaches, so as to progress system interoperability goals.
The OpenTox Framework includes APIs and services for compounds, datasets, features, algorithms, models, ontologies, tasks, validation, and reporting which may be combined into multiple applications satisfying a variety of different user needs. OpenTox applications are based on a set of distributed, interoperable OpenTox API-compliant REST web services. The OpenTox approach to ontology allows for efficient mapping of complementary data coming from different datasets into a unifying structure having a shared terminology and representation.
Two initial OpenTox applications are presented as an illustration of the potential impact of OpenTox for high-quality and consistent structure-activity relationship modelling of REACH-relevant endpoints: ToxPredict which predicts and reports on toxicities for endpoints for an input chemical structure, and ToxCreate which builds and validates a predictive toxicity model based on an input toxicology dataset. Because of the extensible nature of the standardised Framework design, barriers of interoperability between applications and content are removed, as the user may combine data, models and validation from multiple sources in a dependable and time-effective way.},
keywords = {crossvalidation, data mining, QSAR, scientific knowledge, validation},
pubstate = {published},
tppubtype = {article}
}
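The abstract above describes OpenTox applications as compositions of REST web services over resources such as compounds, datasets, models, and validations. A minimal sketch of composing such resource URIs is shown below; the host name is hypothetical and the helper is illustrative, not part of the OpenTox API itself.

```python
from urllib.parse import urljoin

# Hypothetical service host; OpenTox services expose REST resources such as
# compound, dataset, algorithm, model, task, and validation (per the abstract).
BASE = "http://example.opentox.org/"

def resource_uri(resource, identifier):
    """Compose a REST URI for an OpenTox-style resource (illustrative only)."""
    allowed = {"compound", "dataset", "feature", "algorithm",
               "model", "task", "validation"}
    if resource not in allowed:
        raise ValueError(f"unknown resource type: {resource}")
    return urljoin(BASE, f"{resource}/{identifier}")

print(resource_uri("dataset", 42))  # → http://example.opentox.org/dataset/42
```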