Efficient approaches for solving the large-scale k-medoids problem: towards structured data / Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo. Vol. 829 (2019), pp. 199-219. doi:10.1007/978-3-030-16469-0_11

Efficient approaches for solving the large-scale k-medoids problem: towards structured data

Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo
2019

Abstract

Clustering objects represented by structured data, possibly with a non-trivial geometry, is an interesting task in pattern recognition. Moreover, in the Big Data era, clustering huge amounts of (structured) data challenges computer science and pattern recognition researchers alike. This paper aims to bridge the gap in large-scale structured data clustering. Specifically, building on previous work, it proposes a parallel and distributed k-medoids clustering implementation and tests it on real-world biological structured data, namely pathway maps (graphs) and primary structures of proteins (sequences). Furthermore, two methods for evaluating the medoids, based on exact and approximate procedures respectively, are proposed and compared in terms of scalability. Computational results show that the proposed implementation is flexible with respect to both the dissimilarity measure and the input space adopted, and that it scales satisfactorily.
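The abstract does not spell out the algorithm, so the following is only a rough, single-machine sketch of k-medoids with a pluggable dissimilarity. It is not the authors' parallel and distributed implementation. Levenshtein edit distance stands in for a sequence dissimilarity. `exact_medoid` scores every candidate against the whole cluster. `sampled_medoid` is a hypothetical approximate evaluation that scores candidates against a random subsample only. All function names and the sampling scheme are assumptions, and the paper's approximate procedure may work differently.

```python
import random
from typing import Callable, List, Sequence, Tuple

Dissimilarity = Callable[[str, str], float]

def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, e.g. two protein primary structures."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = curr
    return prev[-1]

def exact_medoid(cluster: Sequence[str], dist: Dissimilarity) -> str:
    """Member minimising the sum of dissimilarities to all members (O(n^2) calls)."""
    return min(cluster, key=lambda c: sum(dist(c, x) for x in cluster))

def sampled_medoid(cluster: Sequence[str], dist: Dissimilarity,
                   sample_size: int, rng: random.Random) -> str:
    """HYPOTHETICAL approximation: score every candidate against a random
    subsample of the cluster only (O(n * sample_size) calls)."""
    if len(cluster) <= sample_size:
        return exact_medoid(cluster, dist)
    sample = rng.sample(list(cluster), sample_size)
    return min(cluster, key=lambda c: sum(dist(c, x) for x in sample))

def assign(data: Sequence[str], medoids: List[str],
           dist: Dissimilarity) -> List[List[str]]:
    """Assignment step: each object joins the cluster of its closest medoid."""
    clusters: List[List[str]] = [[] for _ in medoids]
    for x in data:
        j = min(range(len(medoids)), key=lambda m: dist(x, medoids[m]))
        clusters[j].append(x)
    return clusters

def k_medoids(data: Sequence[str], k: int, dist: Dissimilarity,
              medoid_fn: Callable[[List[str]], str],
              max_iter: int = 100, seed: int = 0) -> Tuple[List[str], List[List[str]]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    rng = random.Random(seed)
    medoids = rng.sample(list(data), k)
    for _ in range(max_iter):
        clusters = assign(data, medoids, dist)
        # Update step: an empty cluster keeps its previous medoid.
        new_medoids = [medoid_fn(c) if c else m for c, m in zip(clusters, medoids)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    # Return the partition induced by the final medoids.
    return medoids, assign(data, medoids, dist)

# Usage on toy sequences; the sampled variant can be passed the same way, e.g.
# lambda c: sampled_medoid(c, levenshtein, 2, random.Random(0)).
seqs = ["AAAA", "AAAT", "AATA", "GGGG", "GGGC", "GCGG"]
medoids, clusters = k_medoids(seqs, 2, levenshtein,
                              lambda c: exact_medoid(c, levenshtein))
print(sorted(medoids))  # ['AAAA', 'GGGG']
```

Switching input space (for instance, to a graph dissimilarity for pathway maps) only changes `dist`, in the spirit of the flexibility claimed in the abstract. The assignment of each object and the medoid update of each cluster are independent of one another, which makes them the natural steps to parallelise. The exact medoid costs O(n²) dissimilarity evaluations per cluster, and that cost is what an approximate evaluation tries to reduce.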
ISBN: 978-3-030-16468-3
Keywords: cluster analysis; parallel and distributed computing; large-scale pattern recognition; unsupervised learning; big data mining; non-metric spaces analysis
Files in this item:
  • Martino_Efficient-approaches_2019.pdf (type: publisher's version; 414.34 kB; Adobe PDF; access restricted to archive managers; license: DRM not defined)
  • Martino_Efficient-approaches_ProductFlyer_2019.pdf (type: other attached material; 636.77 kB; Adobe PDF; access restricted to archive managers; license: DRM not defined)
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/214561
Citations
  • Scopus 28
  • Web of Science (ISI) 17