Background: Metabolic networks are complex systems that describe the biochemical reactions within an organism through pairwise interactions between chemical compounds. While this representation is widely used to study biological function, it fails to capture the full structure of metabolic reactions, many of which involve more than two compounds. Hypergraphs offer a more natural representation, where nodes represent metabolites and hyperedges represent reactions involving multiple participants. Clustering such metabolic hypergraphs can reveal systematic differences among evolutionarily distinct organisms, providing insight into ecological constraints and evolutionary pressures. Methods: In this study, we investigate how different graphs and hypergraphs embedding methods influence their unsupervised clustering, with the goal of capturing taxonomy-based classes. We apply 14 distinct embedding strategies to a large-scale dataset of 8467 metabolic hypergraphs. Each embedding was followed by hierarchical clustering using a fixed linkage method. To assess performance, we compared the resulting clusters against known taxonomic groupings. Results: Our findings show that the choice of hypergraph embedding has a significant effect on clustering outcomes. Among the tested methods, Bag of Hyperedges with Jaccard distance, Histogram Cosine Kernel, and a Hypergraph Auto-Encoder consistently performed best. We also advocate that the embedding method should be chosen based on the goal of the downstream task.
Cervellini, Mattia; Sinaimeri, Blerina; Matias, Catherine; Martino, Alessio. (2026). Comparing the ability of embedding methods on metabolic hypergraphs for capturing taxonomy-based features. ALGORITHMS FOR MOLECULAR BIOLOGY, (ISSN: 1748-7188), 21:1, 1-26. Doi: 10.1186/s13015-026-00298-w.
Comparing the ability of embedding methods on metabolic hypergraphs for capturing taxonomy-based features
Sinaimeri, Blerina;Martino, Alessio
2026
Abstract
Background: Metabolic networks are complex systems that describe the biochemical reactions within an organism through pairwise interactions between chemical compounds. While this representation is widely used to study biological function, it fails to capture the full structure of metabolic reactions, many of which involve more than two compounds. Hypergraphs offer a more natural representation, where nodes represent metabolites and hyperedges represent reactions involving multiple participants. Clustering such metabolic hypergraphs can reveal systematic differences among evolutionarily distinct organisms, providing insight into ecological constraints and evolutionary pressures. Methods: In this study, we investigate how different graphs and hypergraphs embedding methods influence their unsupervised clustering, with the goal of capturing taxonomy-based classes. We apply 14 distinct embedding strategies to a large-scale dataset of 8467 metabolic hypergraphs. Each embedding was followed by hierarchical clustering using a fixed linkage method. To assess performance, we compared the resulting clusters against known taxonomic groupings. Results: Our findings show that the choice of hypergraph embedding has a significant effect on clustering outcomes. Among the tested methods, Bag of Hyperedges with Jaccard distance, Histogram Cosine Kernel, and a Hypergraph Auto-Encoder consistently performed best. We also advocate that the embedding method should be chosen based on the goal of the downstream task.| File | Dimensione | Formato | |
|---|---|---|---|
|
s13015-026-00298-w.pdf
Open Access
Tipologia:
Versione dell'editore
Licenza:
Creative commons
Dimensione
3.17 MB
Formato
Adobe PDF
|
3.17 MB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.



