#Infodemic: assessing trustworthiness of COVID-19 news on Twitter

March 28, 2020

In February, the World Health Organization brought the term infodemic into common use in newspapers and journals. The expression describes the harmful effect of a pervasive spread of information; in this context, it refers to the excessive diffusion of content concerning the coronavirus.


It would be a cliché to talk about the high level of connectivity we are used to in the 21st century, and talk of the damage caused by a surplus of information is becoming mainstream too.

However, it is important to stress that the rapid circulation of news, combined with the common misconception that opinions and facts are the same, gives rise to fake news and hoaxes that propagate in traditional media formats.


The objective of this article isn’t to discuss this phenomenon per se; rather, it aims to describe with a few numbers the dramatic COVID-19 pandemic in Italy and the parallel infodemic that followed, in which deliberately spread fake news polluted an already overloaded information system.


The dataset 

The analysis was carried out on a dataset of almost 100,000 Italian tweets containing the hashtag #Coronavirus or #Covid-19, collected between the 6th and the 15th of March 2020. The main events in this time range were the announcement of the Lombardy region lockdown on the 9th of March and the enforcement of the national “I stay at home” decree on the following day. Retweets and tweets containing media (GIF, image, or video) were ignored to avoid redundancy and to ease the analysis. Approximately 10% of the tweets quote other tweets, and roughly 8% are replies to another tweet.
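To make the selection criteria concrete, here is a minimal Python sketch of a filter of this kind. It is an illustration, not the code used for the analysis: the tweet records are plain dictionaries, and the field names (`text`, `lang`, `created`, `is_retweet`, `has_media`, `is_quote`, `is_reply`) are hypothetical and do not correspond to any specific Twitter API response.

```python
from datetime import date

# Hashtags and date range taken from the description above.
HASHTAGS = ("#coronavirus", "#covid-19")
START, END = date(2020, 3, 6), date(2020, 3, 15)


def keep_tweet(tweet):
    """Return True if a tweet record passes the selection criteria.

    `tweet` is a dict with hypothetical keys (see the lead-in).
    """
    text = tweet["text"].lower()
    # Simple substring match; it would also accept longer hashtags
    # that merely start with one of the two (an assumption of this sketch).
    has_hashtag = any(tag in text for tag in HASHTAGS)
    in_range = START <= tweet["created"] <= END
    return (
        tweet["lang"] == "it"
        and has_hashtag
        and in_range
        and not tweet["is_retweet"]  # retweets are ignored
        and not tweet["has_media"]   # GIF, image, or video tweets are ignored
    )


def summarize(tweets):
    """Filter the tweets and report the share of quotes and replies."""
    kept = [t for t in tweets if keep_tweet(t)]
    n = len(kept)
    quote_share = sum(t["is_quote"] for t in kept) / n if n else 0.0
    reply_share = sum(t["is_reply"] for t in kept) / n if n else 0.0
    return {"count": n, "quote_share": quote_share, "reply_share": reply_share}
```

Calling `summarize` on the collected records would return the number of retained tweets together with the fractions that are quotes and replies, which is how figures such as the 10% and 8% above could be computed.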


An assessment of tweet reliability

The first objective of this project was to discern between reliable