Efficient approaches for solving the large-scale k-medoids problem

Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo

doi:10.5220/0006515003380347

In this paper, we propose a novel implementation for solving the large-scale k-medoids clustering problem. Unlike the better-known k-means, k-medoids suffers from a computationally intensive medoid evaluation phase, whose complexity is quadratic in both space and time; solving this task for large datasets and, specifically, for large clusters may therefore be infeasible. To overcome this problem, we propose two alternatives for the medoid update, one exact and one approximate: the former solves the quadratic medoid update problem in a distributed fashion; the latter relies on a scan-and-replacement procedure. We implemented and tested our approach using the Apache Spark framework for parallel and distributed processing on several datasets of increasing size, in terms of both number of patterns and dimensionality. Computational results show that both approaches are efficient and effective: they converge to the same solutions provided by state-of-the-art k-medoids implementations and, at the same time, scale very well as the dataset size and/or the number of working units increases.
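To make the quadratic cost mentioned in the abstract concrete, the following is a minimal single-machine sketch of an exact medoid update for one cluster. It is not the authors' Spark implementation: the function name `exact_medoid`, the use of Euclidean distance, and the toy data are assumptions made purely for illustration.

```python
import math


def exact_medoid(points):
    """Return the index of the point with the smallest total distance to the others.

    Hypothetical helper: builds the full n x n pairwise distance matrix,
    which is what makes the exact update quadratic in both time and space.
    Euclidean distance is an assumption; k-medoids accepts any dissimilarity.
    """
    n = len(points)
    # Full pairwise distance matrix: n * n entries.
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Cost of electing each point as medoid: the sum of its row.
    costs = [sum(row) for row in dist]
    # The medoid is the point with the minimum cost.
    return min(range(n), key=costs.__getitem__)


# Toy cluster (illustrative data only).
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
medoid_index = exact_medoid(cluster)
print(cluster[medoid_index])
```

For a cluster of n patterns this evaluates n² distances, which is why large clusters are the bottleneck the abstract refers to.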

Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo. (2017). Efficient approaches for solving the large-scale k-medoids problem. In Proceedings of the 9th International Joint Conference on Computational Intelligence - IJCCI (pp. 338-347). SciTePress. ISBN: 978-989-758-274-5. DOI: 10.5220/0006515003380347. https://www.scitepress.org/Link.aspx?doi=10.5220/0006515003380347.



Conference year: 2017
ISBN: 978-989-758-274-5
Keywords: Cluster analysis, parallel and distributed computing, large-scale pattern recognition, unsupervised learning, big data mining
Publication type: 04.1 - Conference proceedings contribution (Paper in Proceedings)

Files in this record:

File	Size	Format
Martino_Efficient_2018.pdf (Open Access; type: publisher's version; license: Creative Commons)	402.75 kB	Adobe PDF


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/214587


IRIS - Institutional Research Information System