Dissimilarity spaces, together with feature reduction/selection techniques, are among the mainstream approaches to pattern recognition problems in structured (and possibly non-metric) domains. In this work, we investigate dissimilarity space representations in a biology-related application, namely protein function classification, since proteins are a seminal example of structured data given their primary and tertiary structures. Specifically, we propose two analyses that rely on the complete dissimilarity matrix and on a dimensionally reduced version of it, thereby casting the pattern recognition problem from structured domains into real-valued feature vectors, for which any standard classification algorithm can be used. A third, hybrid analysis uses a clustering-based one-class classifier that exploits different representations. First results obtained on a subset of the Escherichia coli proteome are promising, and some of the analyses presented in this work may also suit field experts, further bridging the gap between natural sciences and computational intelligence techniques.
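The idea of casting structured objects into real-valued vectors can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state which dissimilarity measure is used, so a plain edit distance on toy amino-acid strings stands in for it; the prototypes of the reduced representation are picked by hand rather than by the automatic feature selection the paper refers to; and a nearest-neighbour rule stands in for "any standard classification algorithm".

```python
from typing import Callable, List, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; a stand-in dissimilarity, not the paper's measure."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(objects: Sequence[str],
          prototypes: Sequence[str],
          dissimilarity: Callable[[str, str], float]) -> List[List[float]]:
    """Map each object to its vector of dissimilarities from the prototypes."""
    return [[float(dissimilarity(x, p)) for p in prototypes] for x in objects]


def nearest_neighbour(train_vecs: Sequence[Sequence[float]],
                      train_labels: Sequence[str],
                      query_vec: Sequence[float]) -> str:
    """1-NN with squared Euclidean distance in the embedding space."""
    def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
        return sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    best = min(range(len(train_vecs)),
               key=lambda i: sq_dist(train_vecs[i], query_vec))
    return train_labels[best]


# Hypothetical toy sequences and class labels (not taken from the paper).
train = ["MKTAYIAK", "MKTAYLAK", "GGSPLLVV", "GGSPLIVV"]
labels = ["A", "A", "B", "B"]

# Complete dissimilarity matrix: every training object acts as a prototype.
full = embed(train, train, edit_distance)

# Reduced representation: only a hand-picked subset of prototypes is kept.
reduced_prototypes = [train[0], train[2]]
reduced = embed(train, reduced_prototypes, edit_distance)

# A new sequence is embedded against the same prototypes, then classified.
query_vec = embed(["MKTAYIAR"], reduced_prototypes, edit_distance)[0]
predicted = nearest_neighbour(reduced, labels, query_vec)
```

In this sketch each row of `full` has one coordinate per training object, while each row of `reduced` has one coordinate per retained prototype, so the classifier only ever sees ordinary numeric vectors regardless of how the original objects are structured.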
De Santis, Enrico; Martino, Alessio; Rizzi, Antonello; Mascioli, Fabio Massimo Frattale. (2018). Dissimilarity space representations and automatic feature selection for protein function prediction. In 2018 International Joint Conference on Neural Networks (IJCNN) (pp. 1-8). Institute of Electrical and Electronics Engineers (IEEE). ISBN: 978-1-5090-6014-6. DOI: 10.1109/IJCNN.2018.8489115. https://ieeexplore.ieee.org/document/8489115.
Dissimilarity space representations and automatic feature selection for protein function prediction
Martino, Alessio;
2018
| File | Access | Type | License | Size | Format |
|---|---|---|---|---|---|
| DeSantis_Dissimilarity_2018.pdf | Archive managers only | Publisher's version | DRM (digital rights management) not defined | 1.42 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



