We study the problem of maintaining a decision tree in the fully-dynamic setting, where the dataset is updated by an adversarial sequence of insertions and deletions. We present the first algorithm with strong guarantees on both the quality of the tree and the worst-case update time (the maximum number of operations the algorithm performs between two dataset updates). For instance, we can maintain a tree where each node has Gini gain within β of the optimum, while guaranteeing an update time O d β−3 log4 n , where d is the number of features and n the maximum size of the dataset. This is optimal up to polylogarithmic factors, as any dynamic algorithm must have update time in Ω(d). Similar guarantees hold for the variance and information gain, for classification and regression, as well as for boosted trees. This shows that many popular decision trees such as ID3 or C4.5 can be efficiently made dynamic, answering an open question of Bressan, Damay, and Sozio (2023). We also show that, under the 3SUM conjecture or the Orthogonal Vectors Hypothesis, the update time must be polynomial in β −1

Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees / Bressan, Marco; Sozio, Mauro. - Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees, (2024), pp. 4517-4541. (ICML 24 - Forty-first International Conference on Machine Learning, Vienna, Austria, July 21-27).

Fully-Dynamic Approximate Decision Trees With Worst-Case Update Time Guarantees

Mauro Sozio
2024

Abstract

We study the problem of maintaining a decision tree in the fully-dynamic setting, where the dataset is updated by an adversarial sequence of insertions and deletions. We present the first algorithm with strong guarantees on both the quality of the tree and the worst-case update time (the maximum number of operations the algorithm performs between two dataset updates). For instance, we can maintain a tree where each node has Gini gain within β of the optimum, while guaranteeing an update time O d β−3 log4 n , where d is the number of features and n the maximum size of the dataset. This is optimal up to polylogarithmic factors, as any dynamic algorithm must have update time in Ω(d). Similar guarantees hold for the variance and information gain, for classification and regression, as well as for boosted trees. This shows that many popular decision trees such as ID3 or C4.5 can be efficiently made dynamic, answering an open question of Bressan, Damay, and Sozio (2023). We also show that, under the 3SUM conjecture or the Orthogonal Vectors Hypothesis, the update time must be polynomial in β −1
File in questo prodotto:
File Dimensione Formato  
4082_Fully_Dynamic_Approximate.pdf

Open Access

Tipologia: Versione dell'editore
Licenza: Tutti i diritti riservati
Dimensione 477.94 kB
Formato Adobe PDF
477.94 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11385/248142
Citazioni
  • Scopus 0
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact