Data science, as an emerging profession, is often treated as more of an art than a science. Many professions, such as law or accounting, have established deliverables with known outcomes - everyone knows what they’ll get when they ask for a contract or a P&L report. But in data science, there is still enormous variability in what is expected.
Variability can be useful, but it makes leaders uneasy. Without the knowledge or experience to understand the sources of their discomfort, leaders often start questioning the data on which insights are based. Once this happens, even the best data scientists in the world are not going to have an impact.
The feeling of discomfort brought about by this uncertainty is called ‘data anxiety’. It’s a creeping feeling that something is not correct in your data, or that opportunities are being missed. Uncertainties, anxieties, and concerns with data science are common.
Any anxiety, and data anxiety in particular, can be debilitating and an impediment to progress. But data anxiety can also be a call to action that inspires change. The good news is that trust in your data is attainable - data anxiety doesn’t have to be an ever-present and normal part of doing business.
Trust begins with a routine
Specific data routines can help to navigate uncertainty and reduce generalised, data-related anxieties.
A data routine is a set of actions that are performed every time a dataset is ingested or a data analysis is done. Examples include checking when the dataset was last updated, ensuring the correct filters have been applied, and assessing the risk of statistical bias.
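As a rough illustration, the three example checks above could be sketched in Python as follows. Everything here is hypothetical: the function names, the seven-day freshness window, the `region` and `segment` columns, and the 10% group-share threshold are assumptions chosen for the example, not a recommended standard. The group-balance check in particular is only a crude screen for one kind of sampling bias.

```python
from datetime import datetime, timedelta


def check_freshness(last_updated, now, max_age_days=7):
    """Pass if the dataset was updated within the allowed window (assumed: 7 days)."""
    return now - last_updated <= timedelta(days=max_age_days)


def check_filter(rows, column, allowed):
    """Pass if every row's value in `column` is one of the allowed values."""
    return all(row[column] in allowed for row in rows)


def check_group_balance(rows, column, min_share=0.1):
    """Crude bias screen: pass if each group makes up at least `min_share` of rows."""
    counts = {}
    for row in rows:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return bool(rows) and min(counts.values()) / len(rows) >= min_share


def run_routine(rows, last_updated, now):
    """Run the same checks, in the same order, every time a dataset is ingested."""
    return {
        "updated_recently": check_freshness(last_updated, now),
        "filter_applied": check_filter(rows, "region", {"AU", "NZ"}),
        "groups_balanced": check_group_balance(rows, "segment"),
    }
```

The point of returning a named result for each check is that the outcome can be shown alongside the analysis, rather than living only in the analyst's head.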
Data routines usually start off as manual tasks and become more automated as the business matures. A data routine can decrease data anxiety by letting people become comfortable with the steps of data analysis, and by clearly showing the assumptions and limitations of a dataset or an analysis technique. For the data scientist, adhering to a statistically robust data routine is the key to ensuring that your analyses and insights have the impact they deserve.
However, in most cases (even in massive tech companies), data science and analytics teams roll their own data routines as required, improvising with the resources they have available. The result is that every insight and analysis is based on different ‘truths’, as each improvised data preparation process morphs and transforms the data in a different way.
Without transparent, known, and standardised data routines across your business, it’s like your analytics are being presented through warped funhouse mirrors. This variability is what makes leaders question data science insights. The uncertainty of an improvised data routine creates data anxiety and paralyses an organisation’s decision-making capacity.
Even if you aren’t sure what exactly your data scientists are doing, if you know that they’re following a trustworthy data routine, then it is rational to trust their results. It’s like trusting a baker to create a cake, even if you might not understand the exact chemistry of baking.
Three recommendations for an effective data routine
1. Conduct routine checks for data quality before carrying out an analysis.
Every user of data science outputs should be able to see the results of the routine checks run by the analyst before preparing insights. This gives the user confidence that they understand the basis of any recommendations. If it’s not clear what routine checks need to be run, there are generalist tools that can give an overview of data trustworthiness.
2. Create a skeptical checklist for any analyses.
A skeptical checklist means a set of checks you can do that would defend your work against the most adversarial person you can imagine. Every insight should be defensible without relying on the trust of the user.
3. Note methodological flaws and limitations.
Being honest about the compromises made during an analysis builds confidence within the leadership team, and allows them to make decisions with the appropriate knowledge of uncertainty. Sometimes there’s a really interesting direction, but the path remains foggy with the available data. Clear communication, including communication of flaws and uncertainties, is key for the development of trust.
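One possible way to put the three recommendations into practice is to ship a small, readable report with every analysis. The sketch below is an assumption about how that might look, not an established format: the `AnalysisReport` class, its method names, and the example check names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    name: str
    passed: bool
    note: str = ""


@dataclass
class AnalysisReport:
    """Hypothetical record of checks and limitations shared with every insight."""
    title: str
    checks: list = field(default_factory=list)
    limitations: list = field(default_factory=list)

    def add_check(self, name, passed, note=""):
        # Recommendations 1 and 2: record each routine or skeptical check and its outcome.
        self.checks.append(CheckResult(name, passed, note))

    def add_limitation(self, text):
        # Recommendation 3: state known flaws and compromises explicitly.
        self.limitations.append(text)

    def is_defensible(self):
        # Defensible only if at least one check was run and none failed.
        return bool(self.checks) and all(c.passed for c in self.checks)

    def summary(self):
        lines = [self.title]
        for c in self.checks:
            status = "PASS" if c.passed else "FAIL"
            lines.append(f"[{status}] {c.name}" + (f" ({c.note})" if c.note else ""))
        for text in self.limitations:
            lines.append(f"Limitation: {text}")
        return "\n".join(lines)
```

A reader of the summary sees which checks were run, which failed, and what the analyst already knows to be uncertain, without having to take the result on trust.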
Trust in your data is crucial to implementing an effective data science strategy. However, trust doesn’t happen by default. Breakdowns in communication are unfortunately possible, and expecting the value of data science advancements to be “obvious” is a weak strategy. A careful approach to data health is needed to prevent data anxiety and to ensure a smooth integration between data science and business functions.
If you are interested in AI tools for improving confidence in your data, check out our Laboratory - a beta tool we’re working on to help the community improve its data health and data confidence.
Jac Davis and Jegar Pitchforth are data science consultants with a shared 15 years of experience in data science. We’ll be following up with a series of investigations into data anxiety and data leadership. If you’re interested, follow along to see posts on: strategies that excellent teams use to make data decisions, why performance reviews aren’t fixing your data problem, risk factors for a data-anxious organisation, and treatments for data anxiety.