Twitter has the potential to be a timely and cost-effective source of data for syndromic surveillance. When speaking of an illness, Twitter users often report a combination of symptoms, rather than a suspected or final diagnosis, using naïve, everyday language. We developed a minimally trained algorithm that exploits the abundance of health-related web pages to identify all jargon expressions related to a specific technical term. We then translated an influenza case definition into a Boolean query, each symptom being described by a technical term and all related jargon expressions, as identified by the algorithm. Subsequently, we monitored all tweets that reported a combination of symptoms satisfying the case definition query. In order to geolocalize messages, we defined 3 localization strategies based on codes associated with each tweet. We found a high correlation coefficient between the trend of our influenza-positive tweets and ILI trends identified by US traditional surveillance systems. © 2013 Gesualdo et al.
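As a rough illustration of how a case definition can be turned into a Boolean query over symptom expressions, the minimal sketch below may help. It is not the authors' implementation. The lexicon `SYMPTOM_TERMS`, the helper names, and the rule "fever AND (cough OR sore throat)" are assumptions chosen for the example. In the study, the everyday expressions are learned automatically from health-related web pages rather than written by hand.

```python
import re

# Hypothetical lexicon: technical term -> everyday expressions.
# Hand-written here for illustration; the paper learns these lists from web pages.
SYMPTOM_TERMS = {
    "fever": ["fever", "high temperature", "burning up"],
    "cough": ["cough", "coughing", "hacking"],
    "sore throat": ["sore throat", "throat hurts", "scratchy throat"],
}


def has_symptom(text, symptom):
    """Return True if any expression listed for `symptom` occurs in `text` as whole words."""
    lowered = text.lower()
    return any(
        re.search(r"\b" + re.escape(expression) + r"\b", lowered) is not None
        for expression in SYMPTOM_TERMS[symptom]
    )


def matches_case_definition(text):
    """Illustrative Boolean query (an assumption): fever AND (cough OR sore throat)."""
    return has_symptom(text, "fever") and (
        has_symptom(text, "cough") or has_symptom(text, "sore throat")
    )
```

Under these assumptions, a tweet counts as influenza-positive only when it reports a combination of symptoms that satisfies the query. A tweet that mentions a single symptom, such as a cough alone, is not counted.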
Gesualdo, F.; Stilo, G.; Agricola, E.; Gonfiantini, M. V.; Pandolfi, E.; Velardi, P.; Tozzi, A. E. (2013). Influenza-like illness surveillance on Twitter through automated learning of naïve language. PLOS ONE (ISSN: 1932-6203), 8(12), 1-8. doi: 10.1371/journal.pone.0082489.
Influenza-like illness surveillance on Twitter through automated learning of naïve language
STILO, GIOVANNI (Member of the Collaboration Group)
2013
File: influenza like.pdf (Open Access; publisher's version; Creative Commons license; Adobe PDF, 664.77 kB)



