Here is the Statistical Reason why Calculating the Coronavirus Mortality Rate is so Difficult


Since the coronavirus (COVID-19) registered on our public health radar as a communicable infectious disease amongst humans, countries have been trying to calculate their coronavirus mortality rate (otherwise known as the case fatality rate). As a result, many different mortality rates have been reported, causing confusion. This article from Business Insider reports country-wide mortality rates that range from a low of 0.5% in Switzerland to a high of 5.9% in the United States (US).

What about China, where it is believed the epidemic started? It is in second place, with a mortality rate of 3.8%.

So which is it? And why is the range so wide?

There’s actually a statistical reason for this wide range, and it probably has more influence than any biological reason. It has to do with an intersection between measurement and public policy. When you consider the equation behind the coronavirus mortality rate, the reason is actually pretty logical. I’ll link you to a page from the US Centers for Disease Control and Prevention (CDC) with information about these outbreak equations we epidemiologists use.

How is the Coronavirus Mortality Rate Calculated?

Like any rate, the coronavirus mortality rate has a numerator and a denominator. The numerator is “number of deaths due to coronavirus”, and the denominator is “everyone positive for coronavirus”. It seems simple enough, but let’s unpack what can go wrong with this picture.
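To make the fraction concrete, here is a minimal Python sketch of the calculation described above. The function name `case_fatality_rate` and the counts passed to it are made up for illustration; they are not taken from the CDC page or from any country's data.

```python
from typing import Optional


def case_fatality_rate(deaths: int, confirmed_cases: int) -> Optional[float]:
    """Return deaths / confirmed cases as a percentage, or None if there are no cases."""
    if deaths < 0 or confirmed_cases < 0:
        raise ValueError("counts cannot be negative")
    if confirmed_cases == 0:
        # With no confirmed cases there is no denominator, so the rate is undefined.
        return None
    return 100.0 * deaths / confirmed_cases


# Hypothetical counts, for illustration only.
print(case_fatality_rate(deaths=38, confirmed_cases=1000))
```

Everything that follows is about how hard it is to get trustworthy values for the two arguments, not about the division itself.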

In epidemiology, we have this concept of “misclassification”, which is when we classify something in the wrong category due to mismeasurement. So, let’s say a person who tested positive for coronavirus dies in a car accident. Counting them in the mortality rate numerator would be misclassification, because their death would not have been “due to coronavirus”.

Just from this example, you can see how misclassification could make both the numerator and denominator artificially large or artificially small. Artificially inflating or decreasing either of these numbers – or both – can result in an incorrect rate, which is either artificially inflated or artificially decreased.

How to Artificially Inflate or Decrease the Coronavirus Mortality Rate Through Misclassification

Now that I’ve explained how the mortality rate is calculated, I can explain how sensitive it is to misclassification. Here are ways misclassification can impact the resulting rate:

  • Let’s pretend a country decides not to test any of its citizens for coronavirus. The reported mortality rate would effectively be 0%. This is because both the numerator and denominator would be 0, as all deaths would be misclassified as non-coronavirus deaths (numerator), and no one would be diagnosed as a case (denominator). Strictly speaking, 0 divided by 0 is undefined, so there would be no rate to calculate at all, only zero reported deaths.
  • Let’s pretend a country tests all suspected deaths for coronavirus, but has low levels of testing in the general population. The mortality rate will be artificially high. That’s because the numerator will likely not be misclassified, since all deaths are being tested, but the denominator will be artificially small, because of low levels of general population testing. Because the death rate from coronavirus is relatively low (like influenza), by not doing a good job of testing cases that aren’t dying and counting them in the denominator, scientists can artificially inflate the mortality rate.
  • Let’s pretend that the country does a bad job testing deaths for coronavirus, but does a great job testing the general population. The mortality rate will be artificially low. This is because the denominator will be accurately measured, meaning it will be big. But many coronavirus deaths would be missing from the numerator, making the resulting rate artificially low.
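The three scenarios above can be sketched as a small simulation. This is a simplified illustration, not an epidemiological model: it assumes a made-up country with 10,000 true cases and 100 true deaths (a true rate of 1%), and it treats the share of cases detected and the share of deaths detected as two independent, hypothetical knobs.

```python
from typing import Optional


def observed_rate(true_cases: int, true_deaths: int,
                  case_detection: float, death_detection: float) -> Optional[float]:
    """Rate (in %) a country would report if it detects only a share of cases and deaths."""
    counted_cases = round(true_cases * case_detection)     # the measured denominator
    counted_deaths = round(true_deaths * death_detection)  # the measured numerator
    if counted_cases == 0:
        # No diagnosed cases: there is nothing to divide by.
        return None
    return 100.0 * counted_deaths / counted_cases


# Hypothetical country: 10,000 true cases, 100 true deaths (true rate 1%).
scenarios = {
    "no testing at all": (0.0, 0.0),
    "deaths tested, little community testing": (0.2, 1.0),
    "community tested, deaths poorly tested": (1.0, 0.5),
}
for label, (case_share, death_share) in scenarios.items():
    print(label, observed_rate(10_000, 100, case_share, death_share))
```

With these invented inputs, the same underlying epidemic produces no rate at all, a rate well above the true 1%, or a rate below it, depending only on who gets tested.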

Why it’s so Hard to Calculate an Accurate Mortality Rate

Calculating an accurate mortality rate is both simple and hard at the same time. It’s simple to calculate in terms of the actual equation, but it’s hard to measure both the numerator and the denominator. This is because a whole infrastructure is required for accurately measuring both numerators and denominators for mortality rates when an infectious disease comes through. Further, different parts of that infrastructure are deployed to accurately measure each number.

For both numbers in the equation, pathology labs are involved, because they do the testing. However, different labs are typically involved depending on which part of the fraction they are trying to measure. When it comes to the numerator, testing is likely handled by the lab at the hospital where the person who died was an inpatient, and is assisted or directed (in the US) by the state’s department of public health, and/or the federal CDC.

Community-based testing (i.e., measuring the denominator) is generally not in the purview of healthcare, but is entirely under the authority of the state’s public health department. In the US, these departments tend to be underfunded, but it depends on the state. Certainly, outpatient healthcare settings could do testing of the community in the US, but because of the way our super-expensive and questionable-quality healthcare system is set up, testing in healthcare in the US has concentrated on inpatient settings up to now. That is because inpatient is a high risk setting where people are already using healthcare, so it is easy to implement a test.

Also, since people in the US might have to pay for their own coronavirus test, they are unlikely to do it unless they think they are infected.

Misclassification May be Inflating the US’s Mortality Rate

The difficulty in measuring the numerator and denominator in an outbreak can be illustrated with an example from the US, where I am. Remember, the US had the highest rate in the article. Part of that could be explained by misclassification, and I’ll explain how that can happen.

In the US, there has been a shortage of test kits. Therefore, the few test kits available have been routed toward the healthcare infrastructure. This is for practical reasons – clinicians with coronavirus who continue to work could infect patients, so we need to know who they are. But this means that most of the test kits are being used on “high risk” populations, so they are likely to find positive cases. But these positive cases are from the healthcare setting – not the general population. Remember, all the positive cases of COVID-19 go in the denominator of the mortality rate.

I’m not a fan of the Chinese government, but because China is an autocratic country, its government can crack down and do public health things we could not do in the US, and we can learn from observing what ultimately happens as a result. China had the capacity to test a lot of its general population. For that reason, as a result of its testing crackdown, China has a larger (and more accurate) denominator than we do in the US.

In contrast, given the shortage of community-based testing, in the US, we likely have a lot of community-based COVID-19 and just don’t know it. This means we are artificially decreasing our denominator by not testing the general population at a high rate.

Now let’s look at the US numerator. We are doing a very good job of testing deaths for COVID-19. Therefore, I think we are measuring the US numerator very accurately – but if we are undercounting the denominator, then that could lead to an inflated mortality rate.
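As a back-of-the-envelope sketch of this argument: if every death is counted but only a share of true cases is detected, then the observed rate equals the true rate divided by that share, so multiplying the observed rate by the share recovers the rate with the missing cases added back. The 5.9% figure is the US rate quoted from the article; the detection shares below are purely hypothetical, since nobody knows the real one.

```python
def adjusted_rate(observed_rate_pct: float, share_of_cases_detected: float) -> float:
    """Rate (in %) once undetected cases are added back into the denominator.

    Assumes every death is counted, so only the denominator changes.
    """
    if not 0 < share_of_cases_detected <= 1:
        raise ValueError("share_of_cases_detected must be in (0, 1]")
    return observed_rate_pct * share_of_cases_detected


# 5.9 is the US figure quoted in the article; the shares are hypothetical.
for share in (1.0, 0.5, 0.25, 0.1):
    rate = adjusted_rate(5.9, share)
    print(f"{share:.0%} of cases detected -> adjusted rate {rate:.2f}%")
```

The smaller the share of community cases that testing actually finds, the further the reported rate sits above the adjusted one.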

Eventually, like all epidemics, this one will end. Hopefully, after it passes, scholars from the different countries involved can author open-access, peer-reviewed literature, so we can all learn from their experiences trying to calculate an accurate coronavirus mortality rate during the course of an epidemic.

What is the current coronavirus mortality rate in your country? Do you think it is accurate? Why or why not? Leave a comment and let me know what you think!

Updated March 7, 2020

4 thoughts on “Here is the Statistical Reason why Calculating the Coronavirus Mortality Rate is so Difficult”

    • Monika Wahi says:

      Thanks for your comment! You are right, but the question is – what are the right numbers? If just the denominator goes up, our rate will go down. But with testing, probably both the numerator and the denominator will go up, so it’s anyone’s guess where the rate will land!

    • Monika Wahi says:

      Well, I’d love it to be that low, but the mortality rate for our last flu season estimated by the CDC ranged from 0.4 per 100,000 among 5-17 year olds to a high of 48.7 per 100,000 for those aged 65 and older. Rates are here: When you have to say “per 100,000” you know it is super duper low, because % is essentially “per 100”, and right now, we are using % when talking about coronavirus. So let’s hope the COVID-19 mortality ends up being even lower than 1%, like influenza.
