The economy has been big news during the pandemic. In recent months, we’ve had skyrocketing CPI prints, massive supply disruptions, shipping bottlenecks, warnings about consumer sentiment (consumer spending is 70% of US GDP), and many eyes glued to economic indicators. While some indicators (e.g. CPI) are published monthly, others are only available after the end of the quarter (e.g. e-commerce retail sales) and sometimes with a sizeable delay.
Can we aggregate multiple sources of data to estimate how something like retail sales might be trending, months before the official numbers are released?
Here, I experiment with the use of US Google Trends data and some economic readouts to extrapolate the current health of the retail sector. I will be using Google Trends data on e-commerce retailers, specifically, because there isn’t a good proxy for trips to brick-and-mortar stores. The biggest general-purpose e-commerce retailers are Amazon, Walmart, and eBay, so I think it’s sufficient to focus on those three.
Market share of leading retail e-commerce companies in the United States as of October 2021
As to the question of “why not just directly look at the economic indicator for retail sales?”, the RETAILSMNSA indicator for September 2021 was only just released. It’s currently November 21st. While the US Census Bureau is undoubtedly the final authority, we might actually be curious about the current state of affairs.
I will use “hits” to refer to the scaled numbers out of Google Trends, since that’s what they call them. And for brevity and entertainment’s sake, any reference to combined numbers across the three retailers Amazon, Walmart, and eBay will be referred to as “AWE”.
There is a laundry list of limitations and assumptions to this analysis, which you can see at the end. Also, note that the more manipulations we do on data, the harder it is to interpret. We will be entering that territory.
This data was downloaded and analyzed on November 21, 2021.
To get an idea of what we’re working with, here’s a basic figure showing the raw hits on our three retailers over the past 11 years (it’s month-by-month data). Note that Google Trends scales the hits such that the month with the highest search activity across all search terms queried is always 100.
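To make that scaling concrete, here is a minimal Python sketch of the rescaling step only. The raw volumes are made-up numbers and `scale_to_hits` is a hypothetical helper; Google's actual normalization is not public in this form and is not reproduced here.

```python
# Hypothetical raw search volumes per month (made-up numbers, not real Google data).
raw = {
    "amazon": [400, 520, 800],
    "walmart": [150, 160, 240],
    "ebay": [88, 80, 96],
}


def scale_to_hits(raw_volumes):
    """Rescale so the largest value across ALL terms and months becomes 100."""
    peak = max(v for series in raw_volumes.values() for v in series)
    return {
        term: [round(100 * v / peak) for v in series]
        for term, series in raw_volumes.items()
    }


hits = scale_to_hits(raw)
for term, series in hits.items():
    print(term, series)
```

The point to notice is that every series is divided by the same peak, so hits are only comparable within a single query that includes all three retailers.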
We see that Amazon and, to a lesser extent, Walmart, are seeing a growth in search activity, while eBay has been in decline. There’s always a spike in search activity around the end of the year, and a smaller spike around the end of summer (back-to-school shopping?). And of course, we see the COVID spike in the first half of 2020.
We can also create an aggregate measure, as seen below, by adding up the hits across the three retailers for each month we have data on. This will be useful in the next section, when we combine this aggregate measure with e-commerce sales.
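As a rough illustration of that aggregation, the sketch below sums hits per month across the three retailers. The hit values are invented for the example and the variable names are my own.

```python
# Hypothetical monthly hits per retailer (illustrative values only).
hits = {
    "2021-07": {"amazon": 60, "walmart": 25, "ebay": 10},
    "2021-08": {"amazon": 68, "walmart": 30, "ebay": 10},
    "2021-09": {"amazon": 62, "walmart": 26, "ebay": 9},
}

# Aggregate "AWE" hits: one number per month, summed across retailers.
awe_hits = {month: sum(by_retailer.values()) for month, by_retailer in hits.items()}

for month, total in awe_hits.items():
    print(month, total)
```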
Non-seasonally adjusted e-commerce retail sales (in millions of $) are reported quarterly by the US Census Bureau as ECOMNSA. Here is what the past 11 years of data look like. There’s quite a stable trend and seasonal effect up until 2020.
E-commerce retail sales as a percentage of total sales are also reported quarterly by the US Census Bureau as ECOMPCTNSA. There’s also a stable trend and seasonal effect (more e-commerce in Q4), up until 2020.
Now for the fun part – combining the three data sets.
We can do this by first allocating each quarter’s e-commerce sales across the three months in that quarter, in proportion to each month’s aggregate hits from the previous section. When we do that, we get the following estimated retail sales per month:
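The allocation step can be sketched in a few lines of Python. This is a minimal illustration under assumed inputs: the quarterly sales figure and the monthly hits are made up, and `allocate_quarter` is a hypothetical helper name.

```python
def allocate_quarter(quarter_sales, monthly_hits):
    """Split one quarter's sales across its months in proportion to hits."""
    total_hits = sum(monthly_hits.values())
    return {
        month: quarter_sales * month_hits / total_hits
        for month, month_hits in monthly_hits.items()
    }


# Hypothetical inputs: quarterly sales in millions of $, aggregate hits per month.
q3_sales = 200_000
q3_hits = {"2021-07": 90, "2021-08": 110, "2021-09": 100}

monthly_sales = allocate_quarter(q3_sales, q3_hits)
for month, sales in monthly_sales.items():
    print(month, round(sales))
```

By construction the three monthly estimates add back up to the reported quarterly total, so the search data only decides how the quarter is split, not how big it is.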
Note that the red points (October and November 2021) are even more of an estimate than the other points, because we don’t have retail sales for Q4 yet. How did we guesstimate where the red points should be?
The trick was to do an intermediate estimation of how many sales dollars each Google Trends hit corresponded to. We take quarterly sales and divide by aggregate hits across the retailers (“sales per hit”), producing the figure below. It suggests that over the past 11 years, the amount of e-commerce spending per Google search hit has increased by a lot. I’m not sure if that’s simply indicative of the growing reliance on e-commerce, or an artifact of the particulars of this data. Regardless, it looks like sales per hit isn’t super noisy and we can assume the Q3 value will be roughly valid for Q4, enabling us to estimate current Q4 sales by month. We will actually use a slightly higher estimate, $550 million per hit (indicated by the blue dot), since numbers have been steadily trending upwards.
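Here is a minimal Python sketch of the sales-per-hit calculation and how it turns hits into a sales estimate for months with no official data. Only the 550 figure comes from the text above; the Q3 sales, the hit counts, and the function name are made up for illustration.

```python
def sales_per_hit(quarter_sales, monthly_hits):
    """Quarterly sales (millions of $) divided by the quarter's aggregate hits."""
    return quarter_sales / sum(monthly_hits)


# Hypothetical Q3 inputs (made-up numbers).
q3_rate = sales_per_hit(200_000, [120, 140, 140])

# Assumed rate for Q4, in millions of $ per hit: the slightly higher
# value chosen in the text rather than the computed Q3 rate.
ASSUMED_Q4_RATE = 550

# Hypothetical aggregate hits for the months without official sales yet.
q4_hits = {"2021-10": 130, "2021-11": 125}
q4_estimates = {month: h * ASSUMED_Q4_RATE for month, h in q4_hits.items()}

print("Q3 sales per hit:", q3_rate)
for month, estimate in q4_estimates.items():
    print(month, estimate)
```

Everything downstream inherits whatever error is in the assumed rate, which is why the red points deserve extra skepticism.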
Secondly, we can combine the estimated monthly e-commerce sales with the percentage of total sales that is attributable to e-commerce retail, to estimate monthly total sales. Since we’re missing the percentage data for Q4, and there are strong seasonal effects, we can do a quick forecast using R’s forecast package and arrive at 14.5% of Q4 retail sales being e-commerce, which looks reasonable (“looks reasonable” being not at all the basis for any serious decision-making).
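The conversion from e-commerce sales to total sales is just a division by the e-commerce share. The Python sketch below shows that arithmetic only; it does not reproduce the forecasting step. The 14.5% share is the forecast quoted above, while the monthly e-commerce figure and the function name are hypothetical.

```python
def total_sales_from_ecommerce(ecommerce_sales, ecommerce_share_pct):
    """Back out total retail sales from e-commerce sales and its % share."""
    return ecommerce_sales / (ecommerce_share_pct / 100)


# Hypothetical monthly e-commerce estimate, in millions of $ (made-up number).
ecommerce_estimate = 72_500

# Forecast Q4 e-commerce share of total retail sales, from the text.
q4_share_pct = 14.5

total_estimate = total_sales_from_ecommerce(ecommerce_estimate, q4_share_pct)
print(round(total_estimate))
```

Because the share sits in the denominator, a small error in the forecast percentage moves the total estimate noticeably: the same e-commerce figure at 14.0% instead of 14.5% gives a total about 3.6% higher.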
Having calculated all these intermediate estimates, we can finally get back to estimating total monthly sales.
With the important caveat that these are estimates layered on estimates standing on a bedrock of estimates, there are some interesting observations to be made. There are typically strong, prolonged spikes in spending that correspond to the summer months, but COVID completely upset this. In 2020, there were three spikes that I think roughly correspond to “panic buying”, “pent-up demand spending”, and “I’m dying to celebrate the holidays properly”.
What is also interesting to note is that November (2nd red point) is looking nothing like previous Novembers. Normal Novembers are the highest point all year aside from December, but so far this year does not seem to be panning out that way. Perhaps it’s just because summer 2021 spending has been especially strong. November 2021 is also looking to be slightly below November 2020, though that isn’t super unusual (it happened in 2012/2013 and 2018/2019).
Estimated retail sales in the US for the month of November are looking potentially weak. In previous years, November has been as big or almost as big as December, in terms of consumer spending. With inflation rising, supply bottlenecks, and falling consumer confidence, we might have reasons to suspect that economic recovery will falter, especially as the Fed tapers.
There are many limitations, aside from the ones discussed already.
An obvious one is that Google searches for a retailer can only be loosely equated with shopping at said retailer. I have Amazon bookmarked on my desktop computer – no searching necessary. I might also use an e-commerce site for price checks, but then decide not to shop there. We don’t know if people are buying and what they are buying after they go to Amazon, Walmart, or eBay’s website. Is it more stuff or less stuff than before? Are they buying staples or luxuries? Are they hoping for sales/deals, or are they happy to pay today’s higher prices? Were they even able to find what they were looking for given today’s supply chain issues?
By only focusing on Amazon, Walmart, and eBay, I am making the very bad assumption that they are as fair a representation of today’s e-commerce sector as they were 11 years ago, and there are no other retailers we need to consider. While data on historical market share is probably available, I am not willing to pay for it.
The clear spike in August search hits for both Amazon and Walmart is not suggested by well-established, historical retail sales trends (e.g. General Merchandise, Clothing and Accessories). This highlights the perils of using monthly search volume and quarterly sales to say something about monthly sales, because there might be only a weak relationship. Perhaps August search hits are related to back-to-school type shopping, but the dollar amount is not especially high. Or Q3 numbers look less impressive than August might suggest, because July and September are lower-spend months.
I also expect that shopping activity increases noticeably between the beginning and end of November, as people start ramping up preparations for the holidays. Since all of this analysis depends on Google Trends, and Google Trends tells us the average relative daily volume for a given time period, that average should go up as November progresses. By the end of the month, we might well find the estimates looking more normal.
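To illustrate that last point, here is a small Python sketch of how a partial-month average understates the full-month average when daily activity is ramping up. The linear ramp and all numbers are invented, and `month_to_date_average` is a hypothetical helper.

```python
# Hypothetical daily relative search volume for a 30-day November that
# ramps up linearly toward the holidays (made-up shape and numbers).
daily_volume = [50 + day for day in range(1, 31)]


def month_to_date_average(daily_values, days_elapsed):
    """Average daily volume over the first `days_elapsed` days of the month."""
    observed = daily_values[:days_elapsed]
    return sum(observed) / len(observed)


partial = month_to_date_average(daily_volume, 21)  # data pulled on the 21st
full = month_to_date_average(daily_volume, 30)     # after the month ends

print("Average through day 21:", partial)
print("Average for full month:", full)
```

Under this assumed ramp, the figure seen on the 21st is lower than the final monthly figure, which is the direction of bias I would expect in the November estimate.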