Graph embedding is an established and popular approach to designing graph-based pattern recognition systems. Among the available strategies, Granular Computing has emerged over the last ten years as a promising framework for structural pattern recognition. In the late 2000s, symbolic histograms were proposed as the core of the graph embedding procedure: the graph to be embedded is represented by counting the number of times each information granule appears in it. Much like a bag-of-words representation of a text corpus, symbolic histograms were originally conceived as integer-valued vector representations of graphs. In this paper, we propose six ‘relaxed’ versions of symbolic histograms that take into account the actual dissimilarity values between the information granules and the constituent parts of the graph to be embedded; the original symbolic histogram formulation discards this information because of the hard-limited nature of the counting procedure. Experimental results on six open-access datasets of fully-labelled graphs show classification accuracy comparable to that of the original symbolic histograms (average accuracy shift ranging from -7% to +2%), counterbalanced by a marked reduction in the number of resulting information granules, and hence in the number of features in the embedding space (up to 75% fewer features, on average).

Relaxed Dissimilarity-based Symbolic Histogram Variants for Granular Graph Embedding / Baldini, Luca; Martino, Alessio; Rizzi, Antonello. - Proceedings of the 13th International Joint Conference on Computational Intelligence, (2021), pp. 221-235. (13th International Joint Conference on Computational Intelligence (NCTA), Online Streaming, 25–27 October 2021). [10.5220/0010652500003063].

Relaxed Dissimilarity-based Symbolic Histogram Variants for Granular Graph Embedding

Martino, Alessio;
2021

978-989-758-534-0
Structural Pattern Recognition, Supervised Learning, Embedding Spaces, Granular Computing, Graph Edit Distances, Graph Embedding
Files in this record:
106525.pdf: Open Access; type: publisher's version; licence: Creative Commons; format: Adobe PDF; size: 529.39 kB

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/214519