Digital Epidemiology

Author

Marcel Salathé

Published

May 12, 2023

Preface

Digital epidemiology is a relatively young field. It has no namesake journals, academic societies, prizes, or university departments. It doesn’t even have a broadly accepted definition. But it is an incredibly active field, practically and academically, and it will grow alongside the general growth of technology adoption worldwide - in other words, massively and rapidly in the coming years and decades.

Digital epidemiology has the same objectives as conventional epidemiology. Its key difference lies in the tools and data sources, which are consequences of the digital age we now live in. The internet, mobile phones, wearable devices, social media - these are technologies we now take for granted on a daily basis, almost anywhere in the world. They allow us to address epidemiological questions with novel approaches. As in other fields, digitization in epidemiology can have far-reaching implications. Data can be obtained much faster. Data becomes more easily linkable and searchable. Mobile and wearable devices can not only collect data, but also communicate with each other. Combined with the relentless progress in miniaturization - we all have supercomputers in our pockets - and parallel developments such as the astonishing advances in machine learning, digital epidemiology will impact public health in numerous ways.

Digitization is a disruptive force that creates both opportunities and new challenges. There is no such thing as a painless digital transformation - neither in epidemiology nor anywhere else. The reasons are manifold but often similar. The pioneers exploring new digital approaches often come from other areas; they may be unaware of established norms in the field or lack domain-specific competencies, which creates friction. Conversely, those practicing conventional epidemiology sometimes lack the necessary technical background to embrace digital approaches. New digital data sources are often noisy, which can be uncomfortable for those working with established data sources. The consequence of this cultural mismatch is frustration on both sides. One side sees opportunities and doesn’t understand why the new approaches aren’t integrated into public health practice faster. The other side sees the challenges and doesn’t quite see the value that digital approaches bring to the table. One side may be offering exciting scientific explorations, while the other side may be looking for practical solutions that are useful on a daily basis. One side feels that the new approaches are driven by hype, and the other side sees its approaches dismissed due to unreasonably high expectations. Failures of early experiments are misused to throw out an idea altogether (anti-hype). Given such dynamics, it’s no wonder that digital transformations are painful.

The book doesn’t attempt to solve all of these problems. Instead, it tries to provide a broad conceptual overview of epidemiology and its digital approaches. It should be useful for two audiences, apart from generally interested readers. The first group consists of those with expertise in epidemiology or public health who seek to understand how they can leverage digital methods for their work. The second group consists of those with technical expertise who seek to apply it to epidemiological or public health problems. The COVID-19 pandemic has led many highly skilled technical people to start applying their expertise to matters of public health, which is very encouraging. However, the danger is that if outside experts lack key concepts of epidemiology, their work may end up flawed, sometimes altogether useless, or even misleading. That would be a wasted opportunity, as the field of epidemiology will benefit from all of the technical expertise it can get, given the public health challenges we will continuously face. I hope this book will allow that group to get up to speed rapidly. At the same time, those already working in epidemiology may be interested in new technological approaches and look for a good overview of digital epidemiology. I hope they will find the book helpful as a starting point in their explorations.

First and foremost, I wrote the book for my students at EPFL, using it as a teaching resource for my course “Digital Epidemiology”. If you are looking to teach a class that covers all or part of the topics discussed in the book, I hope you find these pages to be a useful teaching resource. Moreover, I tried to write the book in a way that would be accessible to a general audience, not just to academic students and researchers. In doing so, I have nevertheless assumed a fairly intelligent readership with a high-school-level educational background and a very large dose of curiosity. There is a little bit of math here and there, but unless you are a student in my class, you have my permission to skip it if you find it too cumbersome. If you have expertise in either technology or epidemiology, you might already be familiar with some aspects that the book covers - such is the nature of multidisciplinary approaches. I hope the content is engaging enough that, at minimum, you find those parts to be an interesting reminder of what you already know.

Digital epidemiology is a rapidly evolving area. The speed of technology development is such that, alas, parts of the book - especially those dealing with technology - could be outdated in a few years if they’re not continuously updated. To counter that, I will attempt to update the book regularly, so please ensure that you have the most recent edition, or at least one that is no more than a few years old. The book does not require any particular expertise in technology or epidemiology. Some high school math will be helpful if you want to understand the few parts that contain math. Overall, you will notice that this book prioritizes breadth over depth. By that I mean that I tried to cover as many concepts as possible, provided they are linked to the overall goal of digital epidemiology, instead of choosing a few and going into the nitty-gritty details. I think this is the better approach for a book on a multidisciplinary field. Success in a multidisciplinary field requires a broad conceptual overview of all the relevant areas, and that is what I am hoping to provide you with. If you want to go deeper in any particular area, there are plenty of books that will allow you to go as deep as you desire.

I’ve partitioned the book into three parts. The first part, covering chapters 1-3, should give you an overview of key concepts in epidemiology. We will only tangentially talk about digital technologies in those chapters. The goal is to get you up to speed on general epidemiology. Chapter 1 will give you a short introduction to the main goals of epidemiology. Chapter 2 will introduce you to tests and diagnostics, and you’ll learn to deal with the fact that our measurements are typically not 100% accurate. Chapter 3 will provide you with an overview of the arsenal of epidemiological study types, from case series to randomized controlled trials and beyond. The second part moves us closer to technology and focuses on computational approaches, specifically those relevant to infectious diseases. Chapter 4 gives you an introduction to infectious disease epidemiology and summarizes all the relevant concepts. Chapter 5 will give you an overview of computational models to study infectious disease dynamics, deriving various key ideas from relatively simple models. In chapter 6, we’ll look at some of the relevant extensions, incorporating both spatial considerations and contact networks into the models. This serves as a segue to the third part of the book, which lands us squarely in the digital world. Chapter 7 discusses digital contact tracing, arguably the largest and fastest rollout of digital epidemiology technology in history. Chapter 8 gives you an overview of digital public health surveillance, using non-traditional data sources such as search queries, social media, mobile phone and wearable data, and others. Chapter 9 will discuss digital cohorts and trials, a relatively new development that will likely change many epidemiological and medical studies. Finally, in chapter 10, we will touch on some of the relevant social discussions around digital epidemiology, such as ethical considerations, privacy preservation, and the spread of health misinformation.

Let me finish by saying what I did not do. I did not write a book to show off. In my academic career, I had to read so many texts that were unclear, and where it was obvious that one of the main purposes of the text was to demonstrate the supposedly superior intelligence of the author. As a student, I often fell for the trick (“I don’t get it, but this person is an established scientist, so the problem must lie with me”). As I grew older, I began to realize that the result of teaching is a product of both teacher and student abilities. Nothing good can come out of teaching if things are made too complicated for the purpose of signaling cleverness. Unclear writing generally comes from unclear thinking. So the highest compliment an intelligent reader could pay me would be to say that this was both a relatively easy and an interesting read.

I’m grateful to EPFL for enabling my sabbatical in 2022/2023 at UCSD, where I wrote a large part of the book, and to UCSD (in particular the lab of Rob Knight) for hosting me. I also thank my family for graciously handling my “just another sentence!” excuses, and for patiently listening to my non-stop ramblings about whatever digital epidemiology details occupied me on any given day.

I am particularly grateful to the many people who have given feedback on early drafts or discussed with me the topics that the book covers. No matter our level of expertise, we all have blind spots and biases, especially in our own fields. Like many of my epidemiology colleagues, I became a sought-after interviewee during the COVID-19 pandemic, fielding press requests daily (I remember some days when I gave about a dozen interviews within 24 hours). I started noticing that good journalists would try to understand a particular topic from different angles, and systematically fill in their blind spots. I realized that we don’t do this often enough in science. Sure, we have discussions at conferences all the time. But systematically reaching out to colleagues from around the world and asking them about the bigger-picture questions they think about, the latest papers they found illuminating, and their experiences from interacting with other fields, is not something I had done before. While writing this book, I had these types of conversations with many people, which was a major source of joy and insight for me. I am immensely grateful to - in alphabetical order - Christian Althaus, John Brownstein, Ciro Cattuto, Laura Espinosa, Guy Fagherazzi, Christophe Fraser, Moritz Kraemer, Marc Lipsitch, Elaine Nsoesie, Daniela Paolotti, Jennifer Radin, Mauricio Santillana, Sam Scarpino, Lone Simonsen, Rohan Singh, Laura Symul, Effy Vayena, Alex Vespignani, and Cécile Viboud. As always, all errors in the book are mine alone. If you do happen to find any, or have suggestions for improvements for future editions, I would very much appreciate it if you would share them with me.

Marcel Salathé

Geneva, Switzerland, May 2023