Why the UK was dumb to use Microsoft Excel to count coronavirus cases
An avoidable mistake
We should not forget that the government and public health workers are doing an incredibly challenging and demanding job dealing with a pandemic. But this kind of mistake was avoidable. We live in a world of big data, with artificial intelligence and machine learning permeating all aspects of our lives. We have smart factories and smart cities; we have self-driving cars and machines trained to exhibit human intelligence. And yet Public Health England used Microsoft Excel as an intermediary to manage a large volume of sensitive data. And herein lies the problem.
Although Excel is popular and commonly used for analysis, it has several limitations that make it unsuitable for large amounts of data and more sophisticated analyses.
The companies that analysed the swab tests to identify who had the virus submitted their results as comma-separated text files to PHE. These were then ingested into Excel templates to be uploaded to a central system to be made available to the Test and Trace team and government. Although today’s Excel spreadsheets can handle 1,048,576 rows and 16,384 columns, developers at PHE used an older Excel file format (XLS instead of XLSX), resulting in each template being able to store only around 65,000 rows of data (or around 1,400 cases). When the limit was reached, any further cases were left off the template and therefore positive cases of coronavirus were missed in the daily reporting.
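To make the failure concrete, here is a minimal sketch in Python of what a hard row cap does to incoming data. It is an illustration only, not PHE's actual code: the function name `load_into_template` and the sample data are hypothetical. The 65,536 figure is the exact row limit of the XLS format, which the article rounds to "around 65,000".

```python
XLS_MAX_ROWS = 65_536       # row limit of the legacy .xls format (2**16)
XLSX_MAX_ROWS = 1_048_576   # row limit of the .xlsx format (2**20)


def load_into_template(rows, max_rows=XLS_MAX_ROWS):
    """Hypothetical loader: split rows into those that fit and those that overflow."""
    kept = rows[:max_rows]
    dropped = rows[max_rows:]
    return kept, dropped


# Illustrative input: 70,000 rows of made-up lab results.
incoming = [f"result-{i}" for i in range(70_000)]
kept, dropped = load_into_template(incoming)

# Anything beyond the cap is lost unless the loader reports it.
print(f"kept {len(kept)} rows, dropped {len(dropped)} rows")
```

The point of returning `dropped` rather than discarding it is that the overflow becomes visible. A loader that only keeps what fits gives no sign that anything is missing.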
The bigger issue is that, in the data-driven and technologically advanced age in which we live, a system based on shipping around Excel templates was even deemed suitable in the first place. Data engineers have for a long time been supporting businesses with managing, transforming and serving up data, and developing methods for building efficient, robust and accurate data pipelines. Data professionals have also developed approaches to information governance, including assessing data quality and developing appropriate security protocols.
For this kind of custom application there are plenty of data management technologies that could have been used, ranging from on-site to cloud-based solutions that can scale and provide managed data storage for subsequent reporting and analysis. The Public Health England developers no doubt had some reason to transform the text files into Excel templates, presumably to fit with legacy IT systems. But avoiding Excel altogether and shipping the data from source (with appropriate cleaning and checks) into the system would have been better and reduced the number of steps in the pipeline.
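As a rough sketch of what "shipping the data from source with appropriate cleaning and checks" could look like, the Python below reads comma-separated text directly, sets aside incomplete records, and refuses to continue if the number of records loaded does not match the number sent. The column names `test_id` and `result`, the sample data and both function names are assumptions made for illustration; the real lab file layout is not described in this article.

```python
import csv
import io

REQUIRED_FIELDS = ("test_id", "result")  # hypothetical column names


def ingest(csv_text):
    """Parse CSV text, separating complete records from incomplete ones."""
    reader = csv.DictReader(io.StringIO(csv_text))
    valid, rejected = [], []
    for row in reader:
        if all((row.get(field) or "").strip() for field in REQUIRED_FIELDS):
            valid.append(row)
        else:
            rejected.append(row)
    return valid, rejected


def reconcile(source_count, valid, rejected):
    """Fail loudly if any record went missing between source and load."""
    loaded = len(valid) + len(rejected)
    if loaded != source_count:
        raise ValueError(f"expected {source_count} records, got {loaded}")
    return loaded


# Made-up sample: three records, the last one missing its result.
sample = "test_id,result\nA1,positive\nA2,negative\nA3,\n"
valid, rejected = ingest(sample)

# Count data lines in the source, excluding the header. This simple count
# assumes no field contains an embedded newline.
source_count = sample.count("\n") - 1
reconcile(source_count, valid, rejected)
```

The reconciliation step is the part that matters here: a count check between what was sent and what was loaded would flag a silently truncated file, whatever tool sits in the middle.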
The blame game
Despite its benefits and widespread use, Excel is not always the right tool for the job, especially for a data-driven system with such an important function. You can’t accurately report, model or make decisions on inaccurate or poor quality data.
During this pandemic we are all on a journey of discovery. Rather than point the finger and play the blame game, we need to reflect and learn from our mistakes. From this incident, we need to work on getting the basics right – and that includes robust data management. Perhaps rather concerning are reports that Public Health England is now breaking the lab data into smaller batches to create a larger number of Excel templates. This seems a poor fix and doesn’t really get to the root of the problem – the need for a robust data management infrastructure.
It is also remarkable how quickly technology or the algorithm is blamed (especially by politicians), but herein lies another fundamental issue – accountability and taking responsibility. In the face of a pandemic we need to work together, take responsibility, and handle data appropriately.
This article is republished from The Conversation by Paul Clough, Professor in Search & Analytics, University of Sheffield, under a Creative Commons license. Read the original article.
Story by The Conversation
An independent news and commentary website produced by academics and journalists.