Clustering objects represented by structured data, possibly with non-trivial geometry, is an interesting task in pattern recognition. Moreover, in the Big Data era, clustering huge amounts of (structured) data challenges computer science and pattern recognition researchers alike. The aim of this paper is to bridge the gap on large-scale structured data clustering. Specifically, following a previous work, a parallel and distributed k-medoids clustering implementation is proposed and tested on real-world biological structured data, namely pathway maps (graphs) and primary structures of proteins (sequences). Furthermore, two methods for medoids’ evaluation, based on exact and approximate procedures respectively, are proposed and compared in terms of scalability. Computational results show that the proposed implementation is flexible with respect to the dissimilarity measure and the input space adopted, with satisfactory results in terms of scalability.
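As a rough illustration of the ideas named in the abstract, the following single-machine Python sketch runs k-medoids over arbitrary objects using a pluggable dissimilarity measure, with an exact and a sampling-based medoid update. It is not the authors' parallel and distributed implementation. All names (`levenshtein`, `exact_medoid`, `approximate_medoid`, `k_medoids`) are hypothetical, the edit distance is only one possible dissimilarity for sequences, and random subsampling is an assumed stand-in for the approximate procedure, which the abstract does not specify.

```python
import random


def levenshtein(a, b):
    """Edit distance between two strings (illustrative dissimilarity only)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def exact_medoid(cluster, d):
    """Member minimising the sum of dissimilarities to all cluster members."""
    return min(cluster, key=lambda c: sum(d(c, x) for x in cluster))


def approximate_medoid(cluster, d, sample_size, rng):
    """Assumed approximation: evaluate the same criterion on a random sample."""
    if len(cluster) <= sample_size:
        return exact_medoid(cluster, d)
    return exact_medoid(rng.sample(list(cluster), sample_size), d)


def assign(data, medoids, d):
    """Put every object in the cluster of its nearest medoid."""
    clusters = [[] for _ in medoids]
    for x in data:
        nearest = min(range(len(medoids)), key=lambda i: d(x, medoids[i]))
        clusters[nearest].append(x)
    return clusters


def k_medoids(data, k, d, max_iter=100, seed=0):
    """Alternate assignment and exact medoid update until medoids stop changing."""
    rng = random.Random(seed)
    medoids = rng.sample(list(data), k)
    for _ in range(max_iter):
        clusters = assign(data, medoids, d)
        # An empty cluster keeps its previous medoid.
        updated = [exact_medoid(c, d) if c else medoids[i]
                   for i, c in enumerate(clusters)]
        if updated == medoids:
            break
        medoids = updated
    return medoids, assign(data, medoids, d)


sequences = ["aaaa", "aaab", "aaba", "zzzz", "zzzy", "zzyz"]
medoids, clusters = k_medoids(sequences, k=2, d=levenshtein)
```

Because the algorithm only ever calls `d(x, y)`, the same loop applies to graphs or any other input space once a suitable dissimilarity function is supplied. The exact update costs a quadratic number of dissimilarity evaluations per cluster, which is the step a sampling-based variant tries to cheapen.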
Efficient approaches for solving the large-scale k-medoids problem: towards structured data / Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo. - 829:(2019), pp. 199-219. [10.1007/978-3-030-16469-0_11]
Efficient approaches for solving the large-scale k-medoids problem: towards structured data
Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo
2019