Random Histogram Forest for Unsupervised Anomaly Detection

Mauro Sozio; Dario Rossi
2020

Abstract

Roughly speaking, anomaly detection consists of identifying instances whose features deviate significantly from the rest of the input data. It is one of the most widely studied problems in unsupervised machine learning, with applications in network intrusion detection, healthcare, and many other areas. Several methods have been developed in recent years; however, to the best of our knowledge, a satisfactory solution is still missing. We present Random Histogram Forest, an effective approach for unsupervised anomaly detection. Our approach is probabilistic, a design that has proved effective in identifying anomalies. Moreover, it uses kurtosis (the fourth standardized moment) to identify potentially anomalous instances. We conduct an extensive experimental evaluation on 38 datasets, which to the best of our knowledge include all benchmarks for anomaly detection, and compare against the most successful algorithms for unsupervised anomaly detection. We evaluate all approaches in terms of average precision (AP), which summarizes the precision-recall curve. Our evaluation shows that our approach significantly outperforms all other approaches in terms of AP while running in linear time.
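As a rough illustration of the two quantities the abstract names, the following Python sketch computes the kurtosis of a single feature and the average precision of an anomaly-score ranking. It is not the authors' implementation: the abstract does not say how kurtosis enters the forest construction, and the function names, the population (biased) kurtosis formula, and the zero-variance convention are all assumptions made for this example.

```python
from typing import Sequence


def kurtosis(values: Sequence[float]) -> float:
    """Fourth standardized moment m4 / m2**2 (population form, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    if m2 == 0.0:
        return 0.0  # constant feature: convention assumed for this sketch
    return m4 / (m2 * m2)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k at which a true anomaly (label 1) appears."""
    # Rank instances from most to least anomalous; ties keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


if __name__ == "__main__":
    # A feature with one extreme value has a heavier tail than an evenly spread one.
    print(kurtosis([0, 0, 0, 0, 10]), kurtosis([1, 2, 3, 4, 5]))
    # Two true anomalies ranked first and third out of four instances.
    print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

A feature containing a single extreme value yields a higher kurtosis than an evenly spread one, which is the intuition behind using kurtosis to flag features likely to contain anomalies.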
Histograms, Network intrusion detection, Medical services, Benchmark testing, Probabilistic logic, Anomaly detection, Random forests
Putina, Andrian; Sozio, Mauro; Rossi, Dario; Manuel Navarro, José. (2020). Random Histogram Forest for Unsupervised Anomaly Detection. In 2020 IEEE International Conference on Data Mining (ICDM) (pp. 1226-1231). DOI: 10.1109/ICDM50108.2020.00154.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/251384