
Chinmay Tumbe: ‘We are slow on all-cause mortality statistics, seroprevalence surveys’

‘In a pandemic official data is crucial. We need it daily’

Civil Society News, Gurugram

Published: Sep. 10, 2021
Updated: Sep. 10, 2021

THROUGHOUT the first and second waves of the coronavirus pandemic, the extent of the tragedy in India was mostly unknown. How many people had really died? Were they men or women? Information was anecdotal and speculative. This April, there were queues at crematoriums and burial grounds, but even as bodies piled up there were no reliable figures to go by.

We now have some figures based on data-hunting by Prof. Chinmay Tumbe and his colleagues. Tumbe teaches economics at IIM Ahmedabad and his co-workers in this effort are well-versed in accessing databases and arriving at plausible assumptions.

Tumbe is the author of The Age of Pandemics (1817-1920), which chronicles the cholera, plague and influenza pandemics that devastated India in those years. He is also an expert on migration.

In an interview, Tumbe spoke to Civil Society on the importance of data and the story that is emerging of the current pandemic from the numbers that have been garnered.


You have been collating and analysing data at the Centre and in the states on the coronavirus pandemic. What does this data reveal to you?

Last year, during the first wave, there were concerns that we were possibly not capturing the full extent of deaths for the simple reason that, to be classified as a COVID-19 death, you first needed to be classified as positive. And that depends on testing. Obviously, reported deaths are hugely a function of testing capacity. I was sceptical whether there really were many, many deaths. We didn’t see the actual evidence in terms of the visuals, the (crowded) crematoriums, etc., that we saw in the second wave. But now we know that even last year there was substantial underreporting because that data has kind of come out.

It is the second wave which is India’s biggest demographic shock since the last quarter of 1918. I think Divya Bhaskar was the first to really break the stories on deaths in Madhya Pradesh, Gujarat and so on. So, using the Gujarat data, I pointed out that a lot of excess deaths had happened and since they could not be accounted for they must be from COVID-19. That was in May.

In the last three months, we have five or six studies using different databases, all coming to the same conclusion. I would say the midpoint estimate now among all these different studies is that India has lost about three million people in the pandemic, starting from the beginning to about June 30 this year.

My co-worker, Prabhat Jha, has worked on Indian mortality for almost two decades. He worked with the census office on something called the Million Death Study about 10 years ago. He’s a thorough expert on that. Some of my other co-authors are economists and health economists. We are a team of 11.


What are the databases you have examined to arrive at this conclusion?

We have estimates using three different databases. One is the Civil Registration System (CRS), an important one because all deaths in India are registered there. Medical certification of death is very poor in our country. Nine out of 10 deaths in India are registered with this office, but only about 20 percent are medically certified. So, the cause of death in the CRS is very poor. That is why researchers look for all-cause mortality. If all-cause mortality spikes tremendously, there’s a notion of excess that is happening in that particular time period. The CRS points to a very large number of deaths in the second wave. The CRS also shows that the bulk of deaths in the pandemic have happened in the second wave.

There is another database, the Health Management Information System (HMIS). This is completely different. It is mainly for rural India and is run mainly by the Government of India. They release monthly spreadsheets on deaths which are registered at health facilities. So, this is a subset. The total number of deaths they count is about two million whereas India loses about 10 million people annually in normal times.

It is a smaller sample to work with, but the benefit is that they provide data for all the states. For the CRS, we are literally scraping. The problem there is that no state government has released these figures.

The HMIS is very useful for two reasons. One, you can see the rural-urban split because they give data in that form. But, most importantly, they also give the cause of death. So you can actually start to deconstruct what part of that spike is because of factors that we know of.

Unfortunately, HMIS in their wisdom did not add a line item on COVID-19. But they have ‘cause unknown’. What’s nice about HMIS is that you can rule out heart attacks, malaria and so on. We have 40 categories of death, and there is 'cause unknown'. We can now say there was some spike in deaths due to hypertension and a variety of known causes. But the biggest spike that happened between April and May this year was under the category 'cause unknown'. So it has to be COVID-19.

We are saying these excess deaths are at some 2.7 to 3.2 million, and the bulk of them are COVID-19 because it overlaps exactly when these waves happened, as was documented and reported.

The third database that we used was an opinion poll survey called the C-Voter. And this is the only survey in India which, since the beginning of the pandemic, pretty much since June last year, asked questions on a weekly basis on COVID-19 infection and death. There is no other survey which does this.

The main thing that all three databases are pointing to is 30 percent excess mortality, which translates into about three million deaths if you take 10 million deaths on an annual basis in normal times.
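The arithmetic behind that figure can be sketched in a few lines. This is only an illustration of the calculation described in the interview, not the method used in the team's paper; the function name `excess_deaths` and the use of whole-number percentages are assumptions made for this example.

```python
def excess_deaths(baseline_annual_deaths: int, excess_percent: int) -> int:
    """Deaths above the normal-times baseline, given an excess mortality percentage."""
    # Integer arithmetic keeps the result exact for whole-number percentages.
    return baseline_annual_deaths * excess_percent // 100


# Figures quoted in the interview: about 10 million deaths a year in
# normal times, and roughly 30 percent excess mortality.
BASELINE_ANNUAL_DEATHS = 10_000_000
EXCESS_PERCENT = 30

estimate = excess_deaths(BASELINE_ANNUAL_DEATHS, EXCESS_PERCENT)
print(f"Estimated excess deaths: {estimate:,}")
```

On the same baseline, the 2.7 to 3.2 million range mentioned earlier corresponds to excess mortality of 27 to 32 percent.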

Unfortunately, we still don’t know the picture for all the states. We will get to know the picture for two big states, UP and Bihar, only one year down the line. The reason is that both the CRS and HMIS have very poor coverage. Their statistical systems are poor. They don’t do a good job of reporting even in normal times, let alone in a pandemic. 

It is very difficult to infer, but there are surveys. There was a study on Bihar which extrapolates 300,000 deaths based on a small sample. In Gujarat, too, the numbers are very high and these are also based on a range of estimates.

I would say, in terms of transparency, Karnataka, Maharashtra and Kerala have done a really good job. Which means that their undercounting factors are fairly low. It’s still high by international counts but relatively low. In other states, undercounting factors tend to be very high.

In our paper’s estimate of three million, we are basically saying that the actual deaths are likely to be seven to eight times higher than the reported numbers.


Now, you’ve studied this historically as well. How important is data to us when it comes to understanding what happens in pandemics?

You know, it is so relevant to have good data and communication of that data. Unfortunately, the Union Government has not really communicated the scale of the disaster to any of us. The Union Government has actually put out a circular shouting down the studies that have been coming out. The government says it has robust systems and, yes, there’s some undercounting, but it’s not that much. The official Union Government response is shocking, to say the least.

I’ll give you a simple reason. Kerala is reporting a high number of cases and a lot of people are upset. There is, of course, a partisan, political agenda in the criticism. But you cannot understand what is happening in Kerala, without understanding what happened in the second wave.

The second wave makes it abundantly clear that deaths were very low in Kerala. The seroprevalence survey shows that Kerala was less exposed to the virus in April and May and it is being exposed now so there is some kind of caseload.

In the popular imagination, believe it or not, Kerala is doing a bad job of checking the spread of COVID-19. That is far from the truth. The other states have let the virus spread much more, had more deaths, but that happened in a very short period of time. So it’s like a contrast of a fast burn versus a slow burn. Kerala is, in all parameters, doing well. So data and communication of what happened in the second wave has, bizarrely, not been presented.

Many people think that the second wave started in Kerala or Punjab. We now know that the Delta variant broke in eastern Maharashtra. Even this has not been communicated to the public very well.

But there are two things that we need to do in terms of data. One, a daily release of the all-cause mortality statistics. It’s so simple. They have the numbers in most states on the government portal. They just have to release them. This is what the UK is doing. They can mark the last two weeks as provisional data. It will tell us if all-cause mortality is spiking.

The other is a periodic seroprevalence survey for the level of exposure that the population has undergone. Some are questioning that also. When the numbers came out, they said the state-level samples are not representative, you can’t do state-level interpretation. We are one and a half years into the pandemic. By now we should be having weekly seroprevalence surveys because then you can see where the virus is more likely to be.

For many of us tracking this pandemic, it is so clear that Kerala will have more cases for the simple reason that it has low exposure, low deaths, and so after they opened up, it was obvious that they would be at highest risk. They don’t have immunity, whether from having had the virus or from the vaccine. So the data is very important, and the two critical things we are very slow on are all-cause mortality statistics and seroprevalence surveys.


How would you rate the quality of data from across the country?

I think it varies tremendously. In between surveys, we had been measuring how good the data on all-cause mortality was up to 2018. We know, for example, that Maharashtra and Gujarat are states with nearly 100 percent coverage, which means that almost every death is registered. And we also know that in Jharkhand, Bihar and UP that number may be 70 percent. It’s much lower.

The statistical system is good in some states, but even in those states they are not disseminating the data quickly. They should be releasing data on a daily basis. Unfortunately, poorer states with weaker health infrastructure systems also have poorer statistical systems. In those states, we just have to do independent surveys.


Would you say there’s a strong case for building a more robust data system, not just at state or national level but at district and panchayat level?

Yes, absolutely, and we have the capacity. Look at Karnataka. They have a fantastic dashboard. They actually release the total number of people who died in Karnataka on a daily basis. But they don’t do more than that. They don’t give you the district-level break-up. It’s like just one number on the dashboard. The UK provides data on a fortnightly basis, the all-cause mortality statistics down to the county level. And the last two weeks’ data is marked as provisional.

We are on a par with the UK because all this is now online for most people. It’s just a matter of somebody pressing a button or signing off on things. Let’s disclose this on a daily basis across districts. Ideally, they should also be providing age, gender and so on. It would reveal which age groups have not been exposed, and which are at higher risk in different ways, or who is at high risk, who is at low risk and so on.


You’ve done this landmark book. What is our learning from history, from the experience of those pandemics? And if today we are planning a system, how should we be learning from the past?

The pandemics of the past, which were curbed by human agency, all relied on better data whether it was cholera or plague. The flu of 1918 was pretty much a story of either the virus mutating to less lethal forms or herd immunity. But its toll was so high, you know, it killed six percent of India’s population before herd immunity was acquired. That’s a flawed strategy.

But cholera and plague were eventually conquered by better prevention and cure. Vaccination is probably a small part of that story. In cholera they got the data. They understood transmission very well. Cholera is no longer a dangerous disease. So good data and trust in science are important. But we need to be much better at data dissemination on a daily basis. The puzzle is how, after one and a half years of the pandemic, we still haven’t moved to daily release of information.


How did you collect data on those deaths for your book? 

The CRS was started in 1886 in response to the cholera pandemic. Our death registration system owes its origins to a pandemic. We should be using this pandemic to make it better. There is an entire statistical database on deaths in colonial India, digitized and analysed, for my book.

Of course, there are issues with that data. You’d be surprised to know the undercount factor. The British said about five to six million people died in the influenza pandemic. Our estimate is that there were 20 million deaths. Other estimates are there were 18 million deaths. That's probably an undercount factor of three to four. Today our undercount factor is seven to eight. That is truly alarming. The first lesson is: we need to invest in data.
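The "three to four" figure follows from dividing each later estimate by each end of the official colonial count. The sketch below only reproduces that division using the numbers quoted above; the function name `undercount_factor` is an assumption introduced for illustration and is not taken from the book.

```python
from itertools import product


def undercount_factor(estimated_deaths: float, reported_deaths: float) -> float:
    """How many times larger the estimated toll is than the officially reported toll."""
    return estimated_deaths / reported_deaths


# Figures quoted in the interview for the 1918 influenza pandemic.
official_counts = [5_000_000, 6_000_000]         # British figures: five to six million
later_estimates = [18_000_000, 20_000_000]       # later estimates of the actual toll

# Every pairing of a later estimate with an official count.
factors = [
    undercount_factor(estimate, official)
    for estimate, official in product(later_estimates, official_counts)
]

print(f"Undercount factor range: {min(factors):.1f} to {max(factors):.1f}")
```

The lowest pairing (18 million against six million) gives a factor of three and the highest (20 million against five million) gives four, which is the range Tumbe cites before comparing it with today's seven to eight.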

The second one is migration. All pandemics have led to migration. We need to anticipate and prepare for it. Obviously shutting down the railways does not make sense. I think there's a lot of learning from last year to this year. But some good steps, like social security measures, have been taken over the past year.

Then, of course, mass vaccination, one of our success stories now. That starts in the age of pandemics because India was actually exporting plague vaccines around the world back then.

On the economic front, pandemics have devastating implications. All pandemics typically have worsened inequality. We’ve seen that in the past year as well. A lot of studies are now available on labour market implications. 

The last point is on how we assess regional variations. The politics and the blame-game have to be kept aside. We don’t know much about the science on the coronavirus. In parts of India it's not about policy but the ecological condition of cities. That's exactly what happened years ago. Plague hit some parts of India. So did cholera and it turned out to be a seasonal disease. People responded accordingly. It remains to be seen if COVID-19 is seasonal or whether it depends on environmental factors.

