Data cleaning and enrichment through data integration: networking the Italian academia / Finocchi, Irene; Martino, Alessio; Ranjbar, Fariba; Sinaimeri, Blerina. - In: SCIENTIFIC DATA. - ISSN 2052-4463. - 12:1(2025), pp. 311--. [10.1038/s41597-025-04608-6]

Data cleaning and enrichment through data integration: networking the Italian academia

Finocchi, Irene; Martino, Alessio; Ranjbar, Fariba; Sinaimeri, Blerina
2025

Abstract

We describe a bibliometric network characterizing co-authorship collaborations in the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a large variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. Although linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to understand its graph-theoretic characteristics. Since it exhibits several features typical of social networks, our dataset can be profitably leveraged in experimental studies in the broader domain of social network analytics as well as in more specific bibliometric contexts.
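The abstract does not say exactly how co-authorship edges are derived from the linked data. The following is a minimal sketch, not the authors' pipeline, assuming that every faculty member is a node and that an undirected edge joins two faculty members whenever they share at least one publication, with the publication years kept as temporal edge data. The function names `build_coauthorship`, `average_degree`, and `density` and the `(year, author_ids)` input shape are hypothetical. The last two functions apply the standard formulas to the counts stated above: an average degree of 2|E|/|V| ≈ 26.5 and a density of 2|E|/(|V|(|V|−1)) ≈ 6.9 × 10⁻⁴.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship(publications, faculty_ids):
    """Hypothetical sketch of a co-authorship graph builder.

    publications: iterable of (year, author_ids) pairs (assumed input shape).
    faculty_ids: set of author ids already linked to the faculty list.
    Returns (nodes, edges), where edges maps a sorted pair (u, v) to the
    list of years in which u and v co-authored a publication.
    """
    nodes = set(faculty_ids)  # assumption: every faculty member is a node
    edges = defaultdict(list)
    for year, authors in publications:
        # Keep only authors that were linked to a faculty member.
        linked = sorted({a for a in authors if a in faculty_ids})
        # Every pair of linked co-authors gets (or extends) an edge.
        for u, v in combinations(linked, 2):
            edges[(u, v)].append(year)
    return nodes, dict(edges)


def average_degree(n_nodes, n_edges):
    # Each undirected edge contributes 1 to the degree of both endpoints.
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    # Fraction of all possible undirected node pairs that are connected.
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


if __name__ == "__main__":
    faculty = {"A", "B", "C"}
    pubs = [(2019, ["A", "B", "X"]), (2021, ["A", "B", "C"])]
    nodes, edges = build_coauthorship(pubs, faculty)
    print(edges)  # "X" is not linked to a faculty member, so it is ignored
    print(average_degree(38220, 507050), density(38220, 507050))
```

In the toy example, author "X" appears in a publication but is not linked to the faculty list, so it contributes no edges; the pair ("A", "B") accumulates both years, which is one simple way to attach temporal information to edges.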
Files in this item:
s41597-025-04608-6.pdf (Open Access; publisher's version; Creative Commons license; 1.73 MB; Adobe PDF)
Use this identifier to cite or link to this document: https://hdl.handle.net/11385/247898