The language used in April Fools hoaxes could offer clues to spotting ‘fake news’ articles, say researchers who discovered similarities in the language used in humorous spoofs and malicious stories. For the research, the team from Lancaster University in the UK compiled a novel dataset, or corpus, of more than 500 April Fools articles sourced from more than 370 websites and written over 14 years.
Using a machine learning ‘classifier’, they classified articles into three categories: April Fools hoaxes, fake news, and genuine news stories.
April Fools hoaxes and fake news articles tend to use less complex language, be easier to read, and have longer sentences than genuine news.
Important details for news stories, such as names, places, dates and times, were found to be used less frequently within April Fools hoaxes and fake news. However, proper nouns such as the names of prominent politicians are more abundant in fake news than in genuine news articles or April Fools hoaxes, which have significantly fewer.
First person pronouns, such as ‘we’, are also a prominent feature of both April Fools hoaxes and fake news. This goes against traditional thinking in deception detection, which suggests liars use fewer first person pronouns, the researchers said.
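To make the kinds of cues described above concrete, here is a minimal Python sketch that computes a few of them for a piece of text: average sentence length, average word length (a rough stand-in for language complexity), and the share of first person pronouns. This is an illustration only, not the researchers' method or feature set; the function name `stylistic_features`, the pronoun list, and the simple sentence-splitting rule are all assumptions made for this example.

```python
import re

# Assumed, illustrative list of first person pronouns (not from the study).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def stylistic_features(text):
    """Return a few simple style measures for a text (hypothetical helper)."""
    # Rough sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Rough word tokenisation: runs of letters and apostrophes.
    words = re.findall(r"[A-Za-z']+", text)

    if not words or not sentences:
        return {
            "avg_sentence_length": 0.0,
            "avg_word_length": 0.0,
            "first_person_rate": 0.0,
        }

    first_person = sum(1 for w in words if w.lower() in FIRST_PERSON)
    return {
        # Words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Characters per word, a crude proxy for lexical complexity.
        "avg_word_length": sum(len(w) for w in words) / len(words),
        # Fraction of words that are first person pronouns.
        "first_person_rate": first_person / len(words),
    }


if __name__ == "__main__":
    print(stylistic_features("We did it. I saw our dog run!"))
```

Measures like these could be computed for each article and passed to a classifier; which features and which classifier the team actually used is not stated here.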
“Our findings suggest that there are certain features in common between different forms of disinformation and exploring these similarities may provide important insights for future research into deceptive news stories,” said Alistair Baron, from Lancaster University.
The research will be presented at the forthcoming 20th International Conference on Computational Linguistics and Intelligent Text Processing in La Rochelle, France.