Dissimilarity spaces, along with feature reduction/ selection techniques, are among the mainstream approaches when dealing with pattern recognition problems in structured (and possibly non-metric) domains. In this work, we aim at investigating dissimilarity space representations in a biology-related application, namely protein function classification, as proteins are a seminal example of structured data given their primary and tertiary structures. Specifically, we propose two different analyses relying on both the complete dissimilarity matrix and a dimensionally-reduced version of the complete dissimilarity matrix, thereby casting the pattern recognition problem from structured domains towards real-valued feature vectors, for which any standard classification algorithm can be used. A third, hybrid, analysis uses a clustering-based one-class classifier exploiting different representations. First results conducted on a subset of the Escherichia coli proteome are promising and some of the analyses presented in this work may also dually suit field-experts, further bridging the gap between natural sciences and computational intelligence techniques.
Dissimilarity space representations and automatic feature selection for protein function prediction / De Santis, Enrico; Martino, Alessio; Rizzi, Antonello; Mascioli, Fabio Massimo Frattale. - 2018 International Joint Conference on Neural Networks (IJCNN), (2018), pp. 1-8. (IJCNN 2018 - 2018 International Joint Conference on Neural Networks, Rio De Janeiro, Brazil, 8-13 July, 2018). [10.1109/IJCNN.2018.8489115].
Dissimilarity space representations and automatic feature selection for protein function prediction
Martino, Alessio;
2018
Abstract
Dissimilarity spaces, along with feature reduction/ selection techniques, are among the mainstream approaches when dealing with pattern recognition problems in structured (and possibly non-metric) domains. In this work, we aim at investigating dissimilarity space representations in a biology-related application, namely protein function classification, as proteins are a seminal example of structured data given their primary and tertiary structures. Specifically, we propose two different analyses relying on both the complete dissimilarity matrix and a dimensionally-reduced version of the complete dissimilarity matrix, thereby casting the pattern recognition problem from structured domains towards real-valued feature vectors, for which any standard classification algorithm can be used. A third, hybrid, analysis uses a clustering-based one-class classifier exploiting different representations. First results conducted on a subset of the Escherichia coli proteome are promising and some of the analyses presented in this work may also dually suit field-experts, further bridging the gap between natural sciences and computational intelligence techniques.File | Dimensione | Formato | |
---|---|---|---|
DeSantis_Dissimilarity_2018.pdf
Solo gestori archivio
Tipologia:
Versione dell'editore
Licenza:
DRM (Digital rights management) non definiti
Dimensione
1.42 MB
Formato
Adobe PDF
|
1.42 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.