Leveraging text as data: Tracking economic recovery from COVID-19 in the US
As the economic shock from the COVID-19 crisis started to play out, businesses tried to understand the trajectory of economies and to identify signs of recovery in specific countries and sectors, and the need for novel datasets grew. Waiting a quarter for official data to be released took far too long, and lower-latency approaches were needed. The global event data in our Data Lake was a natural candidate for research into novel signals. This database tracks and structures reports from news and social media sources about politics and economics in eight languages, and enriches them with country classifications and machine translations.
Using this data, we wanted to identify notable events relevant to economic crisis and recovery, and to contextualize the developments that were driving the hard economic data. To derive signals from this large volume of text, we applied natural language processing (NLP) techniques such as sentiment analysis and topic modelling. These techniques allowed us to structure our analysis of news reports published over the summer and to track the US economic recovery from COVID-19.
Sentiment classification
We first filtered the data down to news reports that include keywords associated with economic crisis or economic recovery. To distinguish between negative and positive reports at scale, we assigned a sentiment score to each news article using a lexical method. Unlike data-intensive machine-learning methods, this approach does not require previously labelled examples (training data). Instead, it relies on pre-defined word lists (dictionaries) that assign a sentiment score to each word. By combining the scores for all words, we generate a sentiment score between -1 (negative) and +1 (positive) for each report. This lets us systematically separate news reports that provide encouraging signals for the economy from those that suggest a gloomier state. The following table provides examples of this method.
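The keyword filter and lexical scoring described above can be sketched as follows. This is a minimal illustration, not our production pipeline: the keyword list and the tiny sentiment dictionary are hypothetical, and combining word scores by taking their average is an assumption (averaging keeps the result inside [-1, +1] because every word score lies in that range).

```python
import re

# Hypothetical keyword list; the real filter keywords are not published here.
CRISIS_RECOVERY_KEYWORDS = {"recession", "recovery", "layoffs", "unemployment", "rebound", "stimulus"}

# Hypothetical toy dictionary with word scores in [-1, 1]; real lexicons are far larger.
LEXICON = {
    "rebound": 0.6, "growth": 0.5, "recovery": 0.5, "optimistic": 0.7,
    "layoffs": -0.7, "recession": -0.8, "slump": -0.6, "losses": -0.5,
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def is_relevant(text, keywords=CRISIS_RECOVERY_KEYWORDS):
    """Keep only reports that mention at least one crisis/recovery keyword."""
    return any(tok in keywords for tok in tokenize(text))

def sentiment_score(text, lexicon=LEXICON):
    """Average the scores of all dictionary words found in the text.

    Every word score lies in [-1, 1], so the average does too.
    Reports containing no dictionary words get a neutral score of 0.
    """
    scores = [lexicon[tok] for tok in tokenize(text) if tok in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

# Usage: filter a few headlines, then score the relevant ones.
reports = [
    "Retail sales rebound as recovery gathers pace",
    "Airline announces layoffs amid deepening recession",
    "Local team wins the championship",
]
for report in (r for r in reports if is_relevant(r)):
    print(round(sentiment_score(report), 2), report)
```

Real lexical methods often add refinements such as negation handling ("not good") or intensifiers ("very weak"), which this sketch omits.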
We then combined the signals from all news reports to create a sentiment index. This index quantifies the economic sentiment of all news reports that were published on a given day and assigned to the country of interest (in the example below, the United States). Each daily score represents the aggregate of negative and positive news stories published on that day.
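One simple way to turn article scores into a daily index is to average, per day, the scores of all articles assigned to the country of interest. The sketch below assumes a plain mean and a hypothetical `(date, country_code, score)` record layout; other aggregations, such as a net balance of positive minus negative articles, would fit the description equally well.

```python
from collections import defaultdict
from datetime import date

def daily_sentiment_index(articles, country="US"):
    """Aggregate per-article sentiment scores into one score per day.

    articles: iterable of (publication_date, country_code, score) tuples
    (a hypothetical layout). Returns {date: mean score} in date order.
    The mean is an assumed aggregation rule.
    """
    by_day = defaultdict(list)
    for day, country_code, score in articles:
        if country_code == country:
            by_day[day].append(score)
    return {day: sum(s) / len(s) for day, s in sorted(by_day.items())}

# Usage with made-up scores: the GB article is excluded from the US index.
articles = [
    (date(2020, 7, 1), "US", 0.4),
    (date(2020, 7, 1), "US", -0.2),
    (date(2020, 7, 1), "GB", -0.9),
    (date(2020, 7, 2), "US", -0.5),
]
index = daily_sentiment_index(articles)
print(index)
```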
Sentiment in US economic news
The index below shows the daily movements of sentiment in news reports about the US economy. It can serve as a quantitative input to a model that nowcasts economic activity and as a tool to identify the events driving news sentiment. Reports on the release of economic statistics, for example GDP growth (June 30th), orders for big-ticket goods (July 27th) or the unemployment rate (August 7th), affect the index as much as political statements, for example by the President (July 1st) or Fed officials (July 14th). Other relevant events include policy decisions and business news. This can help alert analysts, such as buy- and sell-side researchers, to important market developments. The index can complement lower-frequency consumer or business surveys (for example the IHS Markit PMI) and official data, providing more timely information and helping to explain the underlying drivers of change.
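To surface candidate event days for analysts, one option is to flag days whose score departs sharply from the recent trend, then read the reports published on those days. The rule below (a trailing-window z-score) is an assumed heuristic, not the method behind the index above; `window` and `threshold` are illustrative values, not tuned ones.

```python
from datetime import date, timedelta
from statistics import mean, stdev

def flag_event_days(index, window=7, threshold=2.0):
    """Flag days whose score deviates strongly from the trailing window.

    index: {day: score} in chronological order. A day is flagged when its
    score lies more than `threshold` sample standard deviations from the
    mean of the previous `window` days (window must be at least 2).
    """
    days = list(index)
    scores = list(index.values())
    flagged = []
    for i in range(window, len(scores)):
        past = scores[i - window:i]
        sd = stdev(past)
        if sd > 0 and abs(scores[i] - mean(past)) > threshold * sd:
            flagged.append(days[i])
    return flagged

# Usage with a made-up series: a calm week followed by one sharply negative day.
start = date(2020, 7, 1)
demo_scores = [0.1, 0.0, 0.1, 0.0, 0.1, 0.0, 0.1, -0.6, 0.1, 0.0]
demo_index = {start + timedelta(days=i): s for i, s in enumerate(demo_scores)}
print(flag_event_days(demo_index))
```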
To better understand the themes driving positive and negative news articles, we analyzed the topics the articles cover. We identified these topics using a statistical model that assumes the semantics of news reports are governed by unobserved "latent" variables, the topics. These topics shape the meaning of the news stories. Because a news report usually focuses on a particular topic, certain words can be expected to appear together, so topics can be identified from the co-occurrence of words across news reports.
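Latent Dirichlet Allocation (LDA) is one widely used model of this kind; the text does not name the model we used, so LDA is assumed here purely for illustration. The sketch fits LDA with collapsed Gibbs sampling in plain Python to show how co-occurring words end up in the same topic. In practice a library implementation such as scikit-learn's `LatentDirichletAllocation` or gensim's `LdaModel` would be the usual choice; `alpha`, `beta` and `n_iter` below are illustrative defaults.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, topic_word_counts, doc_topic_counts).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]       # topic counts per document
    n_kw = [[0] * V for _ in range(n_topics)]   # word counts per topic
    n_k = [0] * n_topics                        # total tokens per topic
    z = []                                      # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional: p(z=t | rest) ∝ (n_dt + alpha)(n_tw + beta)/(n_t + V*beta).
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    return vocab, n_kw, n_dk

def top_words(vocab, n_kw, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in n_kw]

# Usage with made-up snippets: energy words and labor words rarely co-occur.
docs = [
    "oil prices demand oil exxon".split(),
    "oil demand prices bp energy".split(),
    "unemployment layoffs jobs workers".split(),
    "jobs layoffs unemployment claims".split(),
]
vocab, n_kw, n_dk = lda_gibbs(docs, n_topics=2)
print(top_words(vocab, n_kw))
```

On a toy corpus like this, the sampler will typically separate energy and labor vocabulary into different topics, but the result depends on the random seed and the number of iterations.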
The two main negative topics identified for the US in June, July and August concern energy and labor markets. Many reports covered falling oil prices caused by a massive drop in demand, and the damage this did to oil companies such as Exxon Mobil and BP. This company-specific information can help traders systematically track market sentiment about specific businesses or industries. Many other reports discussed rising unemployment during the crisis; these are also generally associated with negative sentiment. While the official unemployment rate is published with a lag, reports about large layoffs or assessments by business leaders can signal changes in labor market conditions in a more timely manner.
On the positive side, news covering the emergency loan program that supported small businesses in the US during the crisis was more optimistic, and these articles are associated with positive sentiment scores.
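Tracking sentiment about specific firms can be sketched by matching company names in each article and averaging the scores of the articles that mention them. The alias table is hypothetical, and simple whole-word string matching is an assumption; a production system would more likely rely on a named-entity recognizer and entity linking.

```python
import re
from collections import defaultdict

# Hypothetical alias table: spellings found in text -> canonical company name.
COMPANY_ALIASES = {
    "exxon mobil": "Exxon Mobil", "exxonmobil": "Exxon Mobil", "exxon": "Exxon Mobil",
    "bp": "BP",
}

def company_sentiment(articles, aliases=COMPANY_ALIASES):
    """Average article sentiment per company mentioned.

    articles: iterable of (text, score) pairs with scores already computed.
    An article counts once per company, however many aliases it matches.
    """
    scores = defaultdict(list)
    for text, score in articles:
        lowered = text.lower()
        # Whole-word match so that "bp" does not fire inside longer words.
        mentioned = {name for alias, name in aliases.items()
                     if re.search(r"\b" + re.escape(alias) + r"\b", lowered)}
        for name in mentioned:
            scores[name].append(score)
    return {name: sum(s) / len(s) for name, s in scores.items()}

# Usage with made-up headlines and scores.
articles = [
    ("Exxon Mobil cuts spending as oil demand collapses", -0.6),
    ("BP writes down assets after oil price slump", -0.4),
    ("ExxonMobil and BP shares recover with crude", 0.3),
]
print(company_sentiment(articles))
```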
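Labelling topics as negative or positive requires linking the topic model with the sentiment scores. One simple way, assumed here rather than taken from our method, is to assign each article to its dominant topic and average the sentiment of the articles in each topic. The topic weights and scores below are made up.

```python
from collections import defaultdict

def topic_sentiment(doc_topics, doc_scores):
    """Mean sentiment of the documents whose dominant topic is each topic.

    doc_topics: per-document topic weights (e.g. rows of doc-topic counts).
    doc_scores: per-document sentiment scores in [-1, 1].
    Assigning each document to a single dominant topic is a simplification.
    """
    buckets = defaultdict(list)
    for weights, score in zip(doc_topics, doc_scores):
        dominant = max(range(len(weights)), key=weights.__getitem__)
        buckets[dominant].append(score)
    return {k: sum(s) / len(s) for k, s in sorted(buckets.items())}

# Usage: hypothetical topics 0 = energy, 1 = small-business loans, 2 = labor market.
doc_topics = [[4, 1, 0], [5, 0, 0], [0, 1, 4], [1, 0, 3], [0, 5, 0]]
scores = [-0.5, -0.7, -0.4, -0.6, 0.6]
print(topic_sentiment(doc_topics, scores))
```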
Operational deployment
Analyzing relevant topics and their associated sentiment in tandem gives us a structured way to digest high-frequency textual information. While the sentiment index can be used as a quantitative input to a model, in practice it should always be accompanied by an analysis of the topics and events driving it. During an economic crisis, these topics and events include important signals such as policy statements, parliamentary decisions or the publication of official statistics, which might otherwise be missed.
Over the summer of 2020, analysts were confronted with hundreds of news reports about the COVID-19 crisis and its economic implications every day. Sentiment analysis and topic modelling helped to distinguish between positive and negative reporting and to surface relevant events. By following news clusters about oil prices and energy markets over time, analysts could track patterns and trends in this market and identify how specific firms were affected. These tools thus help analysts understand what is happening in the economy, evaluate change and make more informed decisions.
Because the event data in our Data Lake cover a wide range of topics, these methods can also be applied to monitor security risks, operational disruptions or political developments such as election campaigns.