We propose a new compressed representation for weighted de Bruijn graphs, which is based on the idea of delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: to give an idea, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state of the art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.

Italiano, Giuseppe Francesco; Prezza, Nicola; Sinaimeri, Blerina; Venturini, R.. (2021). Compressed weighted de Bruijn graphs. In Gawrychowski and Starikovskaya (Eds.), Leibniz International Proceedings in Informatics, LIPIcs (pp. 1-16). Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. Isbn: 978-3-95977-186-3. Doi: 10.4230/LIPIcs.CPM.2021.16.

Compressed weighted de Bruijn graphs

Italiano G. F.;Prezza N.;Sinaimeri B.
;
2021

Abstract

We propose a new compressed representation for weighted de Bruijn graphs, which is based on the idea of delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: to give an idea, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state of the art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.
2021
978-3-95977-186-3
Compressed data structures, K-mer annotation, Partial sums, Weighted de Bruijn graphs
Italiano, Giuseppe Francesco; Prezza, Nicola; Sinaimeri, Blerina; Venturini, R.. (2021). Compressed weighted de Bruijn graphs. In Gawrychowski and Starikovskaya (Eds.), Leibniz International Proceedings in Informatics, LIPIcs (pp. 1-16). Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik. Isbn: 978-3-95977-186-3. Doi: 10.4230/LIPIcs.CPM.2021.16.
File in questo prodotto:
File Dimensione Formato  
LIPIcs-CPM-2021-16.pdf

Open Access

Tipologia: Versione dell'editore
Licenza: Creative commons
Dimensione 781.48 kB
Formato Adobe PDF
781.48 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11385/210659
Citazioni
  • Scopus 4
  • ???jsp.display-item.citation.isi??? 0
  • OpenAlex 5
social impact