Topological Data Analysis is a novel approach, useful whenever data can be described by topological structures such as graphs. The aim of this paper is to investigate whether such tool can be used in order to define a set of descriptors useful for pattern recognition and machine learning tasks. Specifically, we consider a supervised learning problem with the final goal of predicting proteins' physiological function starting from their respective residue contact network. Indeed, folded proteins can effectively be described by graphs, making them a useful case-study for assessing Topological Data Analysis effectiveness concerning pattern recognition tasks. Experiments conducted on a subset of the Escherichia coli proteome using two different classification systems show that descriptors derived from Topological Data Analysis - namely, the Betti numbers sequence - lead to classification performances comparable with descriptors derived from widely-known centrality measures, as concerns the protein function prediction problem. Further benchmarking tests suggest the presence of some information despite the heavy compression intrinsic to the protein-to-Betti numbers casting.
Supervised approaches for protein function prediction by topological data analysis / Martino, Alessio; Rizzi, Antonello; Mascioli, Fabio Massimo Frattale. - 2018 International Joint Conference on Neural Networks (IJCNN), (2018), pp. 1-8. (IJCNN 2018 - 2018 International Joint Conference on Neural Networks, Rio De Janeiro, Brazil, 8-13 July, 2018). [10.1109/IJCNN.2018.8489307].
Supervised approaches for protein function prediction by topological data analysis
Martino, Alessio
;
2018
Abstract
Topological Data Analysis is a novel approach, useful whenever data can be described by topological structures such as graphs. The aim of this paper is to investigate whether such tool can be used in order to define a set of descriptors useful for pattern recognition and machine learning tasks. Specifically, we consider a supervised learning problem with the final goal of predicting proteins' physiological function starting from their respective residue contact network. Indeed, folded proteins can effectively be described by graphs, making them a useful case-study for assessing Topological Data Analysis effectiveness concerning pattern recognition tasks. Experiments conducted on a subset of the Escherichia coli proteome using two different classification systems show that descriptors derived from Topological Data Analysis - namely, the Betti numbers sequence - lead to classification performances comparable with descriptors derived from widely-known centrality measures, as concerns the protein function prediction problem. Further benchmarking tests suggest the presence of some information despite the heavy compression intrinsic to the protein-to-Betti numbers casting.File | Dimensione | Formato | |
---|---|---|---|
Martino_Supervised_2018.pdf
Solo gestori archivio
Tipologia:
Versione dell'editore
Licenza:
DRM (Digital rights management) non definiti
Dimensione
1.46 MB
Formato
Adobe PDF
|
1.46 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.