IRIS - Institutional Research Information System

On data skewness, stragglers, and MapReduce progress indicators

Coppa, Emilio; Finocchi, Irene

2015

Abstract

We tackle the problem of predicting the performance of MapReduce applications by designing accurate progress indicators, which keep programmers informed of the percentage of computation time completed during the execution of a job. This is especially important in pay-as-you-go cloud environments, where slow jobs can be aborted to avoid excessive costs. Performance predictions can also serve as a building block for several profile-guided optimizations. Because they assume that running time depends linearly on input size, state-of-the-art techniques can be seriously harmed by data skewness, load imbalance, and straggling tasks. We thus design a novel profile-guided progress indicator, called NearestFit, that does not rely on the linearity assumption and operates in a fully online way (i.e., without resorting to profile data collected from previous executions). NearestFit exploits a careful combination of nearest neighbor regression and statistical curve fitting techniques. The fine-grained profiles required by our theoretical progress model are approximated through space- and time-efficient data streaming algorithms. We implemented NearestFit on top of Hadoop 2.6.0. An extensive empirical assessment on the Amazon EC2 platform over a variety of benchmarks shows that NearestFit remains accurate even when competitors incur non-negligible errors and wide prediction fluctuations.
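
This record does not describe the NearestFit algorithm itself, so the following Python sketch is only an illustration of the general idea named in the abstract: estimating a task's running time from its input size by nearest neighbor regression over tasks already completed in the same job, and falling back to a fitted curve when no similar task has been observed. All function names, the 10% similarity tolerance, the power-law curve, and the sequential (non-parallel) time model are assumptions made for this example, not details taken from the paper.

```python
import math


def nearest_neighbor_estimate(profile, size, tolerance=0.1):
    """Average the running times of completed tasks whose input size is
    within a relative `tolerance` of `size`; return None if there are none.
    `profile` is a list of (input_size, running_time) pairs."""
    close = [t for s, t in profile if abs(s - size) <= tolerance * size]
    return sum(close) / len(close) if close else None


def fit_power_law(profile):
    """Least-squares fit of time = a * size**b in log-log space.
    Assumes at least two distinct positive sizes and positive times."""
    xs = [math.log(s) for s, _ in profile]
    ys = [math.log(t) for _, t in profile]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_time(profile, size):
    """Prefer the nearest-neighbor estimate; extrapolate with the curve otherwise."""
    estimate = nearest_neighbor_estimate(profile, size)
    if estimate is not None:
        return estimate
    a, b = fit_power_law(profile)
    return a * size ** b


def estimate_progress(profile, pending_sizes):
    """Fraction of total estimated time already spent.
    Simplification: task times are summed as if tasks ran one after another."""
    elapsed = sum(t for _, t in profile)
    remaining = sum(predict_time(profile, s) for s in pending_sizes)
    return elapsed / (elapsed + remaining)


def linear_progress(profile, pending_sizes):
    """Baseline that assumes running time is proportional to input size."""
    done = sum(s for s, _ in profile)
    return done / (done + sum(pending_sizes))
```

As a usage example under these assumptions, take completed tasks `[(1, 1.0), (2, 4.0), (4, 16.0), (8, 64.0)]`, whose times grow quadratically with input size, and one pending task of size 16. `linear_progress` reports about 0.48, while `estimate_progress` reports about 0.25, because the fitted curve predicts that the single large pending task will dominate the total running time. This is the kind of gap that skewed inputs can open between a linear indicator and one that models the size-to-time relationship.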

Conference year: 2015
ISBN: 978-1-4503-3651-2
Keywords: MapReduce; data skewness; Hadoop; performance prediction; performance profiling; progress indicators
Citation: Coppa, Emilio; Finocchi, Irene. (2015). On data skewness, stragglers, and MapReduce progress indicators. In Proceedings of the Sixth ACM Symposium on Cloud Computing (pp. 139-152). ISBN: 978-1-4503-3651-2. DOI: 10.1145/2806777.2806843. http://dl.acm.org/citation.cfm?doid=2806777.2806843.
Appears in types: 04.1 - Contribution in conference proceedings (Paper in Proceedings)

Files in this record:

File	Access	Type	License	Size	Format
ACM-SoCC15.pdf	Archive managers only	Publisher's version	DRM (digital rights management) not defined	300.91 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/192551
