Efficient approaches for solving the large-scale k-medoids problem: towards structured data

Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo

doi:10.1007/978-3-030-16469-0_11

Clustering objects represented by structured data with possibly non-trivial geometry is an interesting task in pattern recognition. Moreover, in the Big Data era, clustering huge amounts of (structured) data challenges computer science and pattern recognition researchers alike. The aim of this paper is to bridge the gap in large-scale structured data clustering. Specifically, following a previous work, a parallel and distributed k-medoids clustering implementation is proposed and tested on real-world biological structured data, namely pathway maps (graphs) and primary structures of proteins (sequences). Furthermore, two methods for medoid evaluation, based on exact and approximate procedures respectively, are proposed and compared in terms of scalability. Computational results show that the proposed implementation is flexible with respect to the dissimilarity measure and the input space adopted, with satisfactory results in terms of scalability.
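As a rough illustration of what exact medoid evaluation means in general, the sketch below picks the cluster element with the smallest total dissimilarity to all other elements. It is not the authors' implementation, and it is neither parallel nor distributed. The names `exact_medoid` and `hamming`, and the toy sequences, are assumptions made only for this example. The dissimilarity is passed in as a plain function, which mirrors the abstract's point that the input space and the dissimilarity measure are interchangeable.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def exact_medoid(cluster: Sequence[T], dissimilarity: Callable[[T, T], float]) -> T:
    """Return the element with the smallest summed dissimilarity to the cluster.

    Illustrative helper (hypothetical name). It evaluates all pairs, so the
    cost grows quadratically with the cluster size.
    """
    if not cluster:
        raise ValueError("cluster must not be empty")
    return min(cluster, key=lambda c: sum(dissimilarity(c, x) for x in cluster))


def hamming(a: str, b: str) -> int:
    """Toy dissimilarity (an assumption): differing positions in equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))


# Toy sequences standing in for structured data; not taken from the paper.
sequences = ["ACGT", "ACGA", "TCGA", "ACCA"]
medoid = exact_medoid(sequences, hamming)
print(medoid)  # prints "ACGA"
```

Because only pairwise dissimilarities are used, the same function works for any objects (graphs, sequences) for which a dissimilarity can be computed. The quadratic cost of the all-pairs evaluation is what makes approximate alternatives attractive at large scale.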

Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo (2019). Efficient approaches for solving the large-scale k-medoids problem: towards structured data. In Christophe Sabourin, Juan Julian Merelo, Kurosh Madani, Kevin Warwick (Eds.), Computational Intelligence: 9th International Joint Conference, IJCCI 2017, Funchal-Madeira, Portugal, November 1-3, 2017, Revised Selected Papers (pp. 199-219). Springer. ISBN: 978-3-030-16468-3. DOI: 10.1007/978-3-030-16469-0_11.



Year of publication: 2019
ISBN: 978-3-030-16468-3
Keywords: cluster analysis; parallel and distributed computing; large-scale pattern recognition; unsupervised learning; big data mining; non-metric spaces analysis
Appears in types: 02.1 - Chapter or essay in a monograph

Files in this item:

File	Size	Format
Martino_Efficient-approaches_2019.pdf (archive managers only; type: publisher's version; license: DRM (digital rights management) not defined)	414.34 kB	Adobe PDF
Martino_Efficient-approaches_ProductFlyer_2019.pdf (archive managers only; type: other attached material; license: DRM (digital rights management) not defined)	636.77 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/214561

