Data cleaning and enrichment through data integration: networking the Italian academia / Finocchi, Irene; Martino, Alessio; Ranjbar, Fariba; Sinaimeri, Blerina. - In: Scientific Data. - ISSN 2052-4463. - 12:1(2025), p. 311. [10.1038/s41597-025-04608-6]
Data cleaning and enrichment through data integration: networking the Italian academia
Finocchi, Irene; Martino, Alessio; Ranjbar, Fariba; Sinaimeri, Blerina
2025
Abstract
We describe a bibliometric network characterizing co-authorship collaborations across the entire Italian academic community. The network, consisting of 38,220 nodes and 507,050 edges, is built upon two distinct data sources: faculty information provided by the Italian Ministry of University and Research and publications available in Semantic Scholar. Both nodes and edges are associated with a wide variety of semantic data, including gender, bibliometric indexes, authors' and publications' research fields, and temporal information. Although linking data between the two original sources posed many challenges, the network has been carefully validated to assess its reliability and to characterize its graph-theoretic properties. Since it exhibits several features typical of social networks, our dataset can be profitably leveraged in experimental studies in the broad domain of social network analytics as well as in more specific bibliometric contexts.
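As a rough illustration of the kind of graph-theoretic summary mentioned above, the stated node and edge counts already determine two basic statistics: average degree and density. The sketch below assumes the co-authorship network is an undirected simple graph (no self-loops or multi-edges), which the abstract does not state explicitly. The function name `summary_stats` is hypothetical, and the resulting values are derived arithmetic, not figures reported by the authors.

```python
def summary_stats(num_nodes: int, num_edges: int) -> dict:
    """Average degree and density, assuming an undirected simple graph."""
    # Each undirected edge contributes to the degree of both endpoints.
    avg_degree = 2 * num_edges / num_nodes
    # Maximum possible number of edges in an undirected simple graph.
    max_edges = num_nodes * (num_nodes - 1) / 2
    density = num_edges / max_edges
    return {"average_degree": avg_degree, "density": density}


# Counts taken from the abstract: 38,220 nodes and 507,050 edges.
stats = summary_stats(38_220, 507_050)
print(f"average degree: {stats['average_degree']:.2f}")
print(f"density: {stats['density']:.2e}")
```

Under this assumption, each researcher has on average roughly 26.5 co-authors in the network, while the density stays well below 0.001. This combination of sparsity and moderate degree is common in social networks.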
File: s41597-025-04608-6.pdf (Open Access; publisher's version; license: Creative Commons; Adobe PDF, 1.73 MB)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.