We propose a new compressed representation for weighted de Bruijn graphs, which is based on the idea of delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: to give an idea, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state of the art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.

Compressed weighted de Bruijn graphs / Italiano, Giuseppe Francesco; Prezza, Nicola; Sinaimeri, Blerina; Venturini, R.. - 191:(2021), pp. 1-16. [10.4230/LIPIcs.CPM.2021.16]

Compressed weighted de Bruijn graphs

Italiano G. F.;Prezza N.;Sinaimeri B.
;
2021

Abstract

We propose a new compressed representation for weighted de Bruijn graphs, which is based on the idea of delta-encoding the variations of k-mer abundances on a spanning branching of the graph. Our new data structure is likely to be of practical value: to give an idea, when combined with the compressed BOSS de Bruijn graph representation, it encodes the weighted de Bruijn graph of a 16x-covered DNA read-set (60M distinct k-mers, k = 28) within 4.15 bits per distinct k-mer and can answer abundance queries in about 60 microseconds on a standard machine. In contrast, state of the art tools declare a space usage of at least 30 bits per distinct k-mer for the same task, which is confirmed by our experiments. As a by-product of our new data structure, we exhibit efficient compressed data structures for answering partial sums on edge-weighted trees, which might be of independent interest.
2021
978-3-95977-186-3
Compressed data structures, K-mer annotation, Partial sums, Weighted de Bruijn graphs
Compressed weighted de Bruijn graphs / Italiano, Giuseppe Francesco; Prezza, Nicola; Sinaimeri, Blerina; Venturini, R.. - 191:(2021), pp. 1-16. [10.4230/LIPIcs.CPM.2021.16]
File in questo prodotto:
File Dimensione Formato  
LIPIcs-CPM-2021-16.pdf

Open Access

Tipologia: Versione dell'editore
Licenza: Creative commons
Dimensione 781.48 kB
Formato Adobe PDF
781.48 kB Adobe PDF Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11385/210659
Citazioni
  • Scopus 3
  • ???jsp.display-item.citation.isi??? ND
social impact