John Carreyrou’s book, “Bad Blood”, shows how bad leadership leads to bad data, which is fatal if you are running a business that will succeed or fail on the basis of its data. The book chronicles the rise and fall of the laboratory startup Theranos, and it holds so many lessons for data scientists about misconduct that I decided to write about them in a series of blog posts.
This post focuses on how Theranos’s CEO, Elizabeth Holmes, dismantled the checks and balances in the company’s leadership, which removed any kind of data governance structure. Bad leadership leads to bad data, but no leadership generally leads to no data. That is eventually what happened, and Theranos was exposed as a fraud. My last post described the lack of data stewardship at Theranos; this one describes the missing governance structure that should have supported that stewardship.
Theranos’s Bad Leadership Leads to Bad Data
In my last post, I described how junior laboratorian Erika Cheung struggled with demands from Theranos leaders to cherry-pick the data coming out of the Theranos machines. The directive came literally from the top, from CEO Holmes, so Cheung felt she had to comply. Holmes had essentially made everyone report directly to her, so there was no middle manager between Cheung and the CEO.
In most organizations, though, a directive to cherry-pick data (or to p-hack, which means running many analyses and reporting only the ones that come out statistically significant) comes from a lower level, often the laboratory director. That was the case in the well-known p-hacking scandal involving Brian Wansink, who was head of the Food and Brand Lab at Cornell. Published accounts describe him e-mailing his lab assistants and asking them to,
“…think of all the different ways you can cut the data and analyze subsets of it to see where [the relationship we are looking for] holds”.
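To see why cutting the data many ways is so dangerous, here is a minimal Python sketch that simulates studies in which there is no true effect at all. It is not based on Wansink's actual data; everything in it is an illustrative assumption: the hypothetical helper names, the 400 rows per study, the ten random yes/no attributes used to slice the data, and the large-sample z-test used in place of whatever test a real lab would run. Each simulated study is cut into subsets, each subset is tested, and the code counts how often at least one subset looks "significant" at p < 0.05.

```python
import math
import random


def two_sample_z_pvalue(a, b):
    """Two-sided p-value for a difference in means (large-sample z approximation)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    z = (mean_a - mean_b) / se
    # 2 * (1 - Phi(|z|)) simplifies to 1 - erf(|z| / sqrt(2))
    return 1 - math.erf(abs(z) / math.sqrt(2))


def hunt_for_significance(rng, n_rows=400, n_attributes=10, alpha=0.05):
    """Simulate one null study and count subsets that look 'significant'.

    Hypothetical setup: each row has a random treatment flag, an outcome that
    is pure noise (no true treatment effect), and n_attributes random yes/no
    attributes used to "cut the data" into subsets.
    """
    rows = []
    for _ in range(n_rows):
        is_treated = rng.random() < 0.5
        outcome = rng.gauss(0.0, 1.0)
        attrs = [rng.random() < 0.5 for _ in range(n_attributes)]
        rows.append((is_treated, outcome, attrs))

    hits = 0
    for j in range(n_attributes):
        for level in (True, False):  # two subsets per attribute
            subset = [r for r in rows if r[2][j] == level]
            treated = [r[1] for r in subset if r[0]]
            control = [r[1] for r in subset if not r[0]]
            if len(treated) < 2 or len(control) < 2:
                continue  # too small to test
            if two_sample_z_pvalue(treated, control) < alpha:
                hits += 1
    return hits


def share_of_studies_with_a_finding(n_studies=200, seed=1):
    """Fraction of simulated null studies with at least one 'significant' subset."""
    rng = random.Random(seed)
    found = sum(1 for _ in range(n_studies) if hunt_for_significance(rng) > 0)
    return found / n_studies


if __name__ == "__main__":
    k = 2 * 10  # subset tests per study with the default 10 attributes
    print(f"Subset tests per study: {k}")
    print(f"Chance of a false 'finding' if tests were independent: {1 - 0.95 ** k:.2f}")
    print(f"Simulated share of null studies with a 'finding': {share_of_studies_with_a_finding():.2f}")
```

With 20 subset tests per study, the chance of at least one spurious "finding" sits far above the nominal 5%; if the tests were independent it would be 1 − 0.95^20 ≈ 0.64. The subsets here overlap, so the simulated share will differ somewhat from that figure, but the point stands: slice noise enough ways and something will look significant.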
The Wansink scandal shows that Cornell's governance structure was no better than Theranos's, because the people at the top did not realize what Wansink was doing. Again, bad leadership leads to bad data all the way up the chain of command. If the people at the top had set data governance rules and made Wansink follow them, he would have been caught immediately, because his behavior was so egregious. The e-mails show that he didn’t even try to hide it.
Preventing Bad Data Throughout the Leadership Structure
Some may dismiss academia as already fraught with leadership problems, but analytics shops, financial management companies, and other data-intensive businesses have no excuse. They need to enforce data stewardship all the way up the leadership structure, yet many do not know how. And when you tell them, many complain and resist, which is itself poor leadership.
First, consider the data lake I worked on when I joined the US Army. When I arrived in 2008, I would have given its data stewardship and governance a B+, which is a high grade from me. Here is what was in place:
- An administrative manual laying out policies for requesting and using the data from the data lake
- An administrative process for providing data to researchers from the data lake
- Extensive data documentation and well-documented code, written by an expert SAS programmer who was a consultant hoping to retire
But critically, they were missing the following:
- Any plan to fund maintenance of the data lake from any budget
- Any business case or goal for the data lake
- A delineation of the functions of the data lake, and who did them
- A governance structure
- Data curation files that could be shared with the researchers (not just programmers) to provide data transparency
- Data use agreements
If you read my book on SAS data warehousing, you’ll learn how to do all of these things to harden your warehouse and make sure it thrives. In my short time at the US Army, I was able to put many of the missing items in place in an effort to save the data lake. Knowing that bad leadership leads to bad data, I also tried to assemble a business case for the data lake, as well as a governance structure.
However, I could not put enough in place to keep the data lake going. My boss actively fought both establishing a business case for the lake and formalizing its governance. I think people had acted like royalty around this data project for too long and did not want to be reined in by policy. But it was too late: as with Theranos, regulatory issues were already getting in our way.
In the process, I learned the hard way that at the US Army, men generally don’t want women to control data lakes. My boss and other men worked together to dissolve the data lake in an effort to get rid of me, and it no longer exists.
They had some kind of third-party data software and an “IT guy” who kludged it together, with help from the software vendor, to keep it running. They treated all the women, no matter what their role, as “secretaries”. When I tried to give a presentation, I had trouble getting my laptop to work with the projector. As you can see, you don’t have to be Theranos for bad leadership to lead to bad data, or to no data.
I keep watching this company out of the corner of my eye. While other finance companies are talking about fintech and analytics, I’m just hoping the building this company is in is adequately fireproofed!
Updated June 12, 2021.
As a data science leader, what should you put in place so your organization doesn’t end up a data mess like the startup Theranos? This blog post provides guidance.