If you want an example of bad data science, I strongly recommend the book I just finished, “Bad Blood: Secrets and Lies in a Silicon Valley Startup”. It’s a good read from a factual standpoint, and its author, journalist John Carreyrou, has excellent storytelling skills that will keep you riveted.
Briefly, the book chronicles the long rise and spectacular fall of a medical startup called Theranos. In the end, the laboratory test innovation Theranos claimed to have been developing all those years was shown to be a fraud, but only after it had been deployed for a short time at Walgreens and had harmed patients. Theranos is therefore an example of both bad business and bad data science. Its charismatic CEO, Elizabeth Holmes, was characterized as holding her board under a hypnotic spell: board members were persuaded by her and did not want to look at issues when Theranos’s laboratorians raised them.
Bad Data Science can also be Research Misconduct
Given my expertise and background, I was horrified numerous times throughout the book at the way Theranos executives abused and disregarded data, measurement, and various parts of the scientific process, which is basically the definition of bad data science. I give workshops and seminars for research and healthcare professionals that teach the connection between “keeping your data organized and protected” and “preventing research misconduct”.
Because Theranos is a real case study, I thought it would be helpful to identify and highlight a few places where their research misconduct could have been prevented through data science best practices, which I do in this series of blog posts. In reality, however, I believe probably nothing could have prevented what happened with Theranos: the book documents numerous times when Holmes just straight up lied.
Holmes’ lies led some junior laboratorians to quit. But even junior researchers can create and maintain data documentation. In fact, the book focuses on a couple of junior laboratorians who actually did finally take down the fraud. Therefore, some of the advice I will give in this blog post series is about ways to try to create order when the leader is cultivating chaos, and how to document truth when the leader is spewing lies.
Timeline of Theranos’ Bad Data Science
Because Theranos’ rise and fall spanned about 15 years, the company went through different phases of research misconduct and inappropriate data stewardship, depending on its stage of evolution. I separated the timeline into four phases, each characterized by the primary kind of misconduct committed during that phase:
Phase 1: No Product
Phase 2: No Data Stewardship
Phase 3: No Governance Structure
Phase 4: No Administrative Barrier Between Research and Clinical Data
My blog posts will demonstrate how this kind of overall business and research misconduct led to data science misconduct, and how leaders can take specific steps to prevent this kind of bad data science.
Updated June 12, 2021. Added blog menu July 19, 2021. Updated banners July 11, 2023. Updated banners July 8, 2024.
This is my first blog post in a series of five where I talk about data-related misconduct outlined in the book “Bad Blood”, and provide guidance on how to prevent it.