We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given ε>0 our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive ε of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of O((d·log³n)/ε²), which improves to O((d·log²n)/ε) for binary or categorical features, while it uses space O(n·d), where n is the maximum number of examples at any point in time and d is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees requires amortized running time Ω(d) and space Ω(n·d/polylog(nd)). We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.

Fully-Dynamic Decision Trees / Bressan, Marco; Damay, Gabriel; Sozio, Mauro. - Thirty-Seventh {AAAI} Conference on Artificial Intelligence, {AAAI}2023, Thirty-Fifth Conference on Innovative Applications of ArtificialIntelligence, {IAAI} 2023, Thirteenth Symposium on Educational Advancesin Artificial Intelligence, {EAAI} 2023, Washington, DC, USA, February7-14, 2023, (2023), pp. 6842-6849. (AAAI 23 - Thirty-Seventh {AAAI} Conference on Artificial Intelligence, Washington, DC, USA, February 7-14). [10.1609/AAAI.V37I6.25838].

Fully-Dynamic Decision Trees

Mauro Sozio
2023

Abstract

We develop the first fully dynamic algorithm that maintains a decision tree over an arbitrary sequence of insertions and deletions of labeled examples. Given ε>0 our algorithm guarantees that, at every point in time, every node of the decision tree uses a split with Gini gain within an additive ε of the optimum. For real-valued features the algorithm has an amortized running time per insertion/deletion of O((d·log³n)/ε²), which improves to O((d·log²n)/ε) for binary or categorical features, while it uses space O(n·d), where n is the maximum number of examples at any point in time and d is the number of features. Our algorithm is nearly optimal, as we show that any algorithm with similar guarantees requires amortized running time Ω(d) and space Ω(n·d/polylog(nd)). We complement our theoretical results with an extensive experimental evaluation on real-world data, showing the effectiveness of our algorithm.
File in questo prodotto:
File Dimensione Formato  
25838-Article Text-29901-1-2-20230626-4.pdf

Open Access

Tipologia: Versione dell'editore
Licenza: Tutti i diritti riservati
Dimensione 3.91 MB
Formato Adobe PDF
3.91 MB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11385/248143
Citazioni
  • Scopus 1
  • ???jsp.display-item.citation.isi??? 0
  • OpenAlex ND
social impact