Why can't we talk about data? (Part 2)

I wrote a blog post yesterday and got a really thought-provoking response on Twitter so I thought I'd expand on it some.

It wasn't much of a post. I could reduce it down to "I think this is hard and I want to get better" with a little bit of a call to arms thrown in.

Being connected to an expert hive-mind on the internet is pretty great though. I don't know much, but I know I love data, so I'll try to reflect what I picked up from others since yesterday and hopefully frame where I'm at for future discussions and work in real life.

What are we talking about?

'Data' is too broad. Also labouring the point about a distinction between data and information probably isn't helpful. There's a huge amount of varied work across the 'data and information' spectrum. I think I need a richer and more specific vocabulary here to be able to work with the experts, let alone my desired broad audience. Credit to Michael for the spectrum description, Paul for describing the variety of work, and Sophie for calling out the pointless data vs. information debate.

What am I talking about?

At the time of writing I'm the Head of Data and Search at the UK Parliament. One of the things I'm responsible for is a data strategy for our organisations. I believe it's both possible and necessary to have an appropriately consistent set of principles and practices that apply to all of our data and the varied ways in which it is used.

Still, I have some particular areas of interest:

This is structured, relatively low-volume, but high-complexity stuff. When I go somewhere else to be the Head of Data and Search I expect I'd still be particularly interested in this kind of data work. This is because, to me, these are the basics that you need to get right in order to get stuff done with the least pain possible. It is about maximising utility and reducing friction.

Services and infrastructure

However, a utilitarian approach to data can come with problems. I thought Sophie described this particularly well in the context of building 'digital' services [2]:

...in “digital” they expect us to iterate and release quickly, and we’re like “now we need to go away and spend many many months/years restructuring all our data and information at vast expense before you can have the [service]”

No controversy or criticism of anybody intended here.

I expect that going away to restructure all your data and information at vast expense isn't going to fly. So, I believe there can be a tension here between the 'digital' work and the 'data' work. Maybe lots of people have resolved this already, but hey, I haven't [3].

Let's say it's ok, or better yet let's say it's desirable, to release your new 'digital' service when it's 60% complete, and then you iterate. In contrast, you don't want to release with your reference data being 60% correct. That would be bad.

To me, 'minimum viable data' is a small but irreducible core. You iterate outwards rather than throwing it all away.

^ Note: this is my only strong idea. Maybe I haven't expressed it as well as I could have done, but I reckon it's good.

I advocate for working on fundamental data infrastructure and digital services at the same time. In an organisation that does more than one thing, there's going to be a need to share the kinds of data I've described across services. Starting from scratch for each service increases cost at the very least. Also, the amount of effort an organisation puts into data integration is an indicator of how bad its data is, because that effort should be as small as possible. High-effort data integration is like data failure demand.

I have views [4] on what a cross-cutting data infrastructure team looks like, and I think having data expertise in multidisciplinary 'digital' teams is important as well. I don't believe that multidisciplinary teams focused on a particular service are best placed to lead on data infrastructure.

I suppose distance from users for infrastructure teams comes with risks, particularly because several people responded to my initial blog post to say that context was essential, and that rather than trying to talk about data in the abstract it's better to talk about it in terms of problems to solve or outcomes to achieve (thanks Franklin, Amanda, Ann, and Chris). I'm sure there's a way to balance this effectively.

Tips and tactics

I had several responses to my post about sticking with the metaphors, and hey maybe I need better metaphors. Not everybody likes baking. My favourite advice was from Beck, who said that passion for the subject helps. I'll try ramping it up.

I was also reassured to see (and be reminded of) lots of experienced, brilliant people who are either going through the same thing or who've already had success. I look forward to finding out more.

If you're interested in talking about data with me in real life please get in touch.


Footnotes

[1] Full disclosure: maybe this is master data, I'm not sure. Either way, I'd advocate for a collaborative, conversational practice to derive and maintain this kind of data, regardless of the domain

[2] I defined 'digital' as a broad, rich, and difficult multidisciplinary design practice for the purpose of my initial post

[3] It's not just 'digital' work either. I work in waterfall / PRINCE2 environments as well and the tension with data there is the same

[4] Maybe something for a subsequent post

Why can't we talk about data?

This is an attempt to write down a conversation I've had with several people over the past couple of months.

For my next career goal I'd like to get really good at talking about data. I'm not very good at the moment. Maybe this post should be titled 'Why can't *I* talk about data?' but it feels more comfortable to suggest that everybody could be better at least, rather than just me.

If you're really good at talking about data please get in touch and tell me all your secrets.

Have you been trying to talk about data and experienced the blank stares? That moment when your carefully crafted analogy crumbles into dust? That conversation about the difference between data and information where people conclude that data is information and your inner voice says "just give it up sunshine it's no use"? Maybe somebody says "it's too technical for me"? Or "it sounds boring"? Perhaps you had to explain why data wasn't a subset of technology? Or maybe you like lording it over the uninitiated like some dark data mage, using your power to create eldritch management information dashboards that nobody but you understands (you are a bad person)?

My hunch is that talking about data in ways that resonate with the broadest possible audience is going to become increasingly important in society.

I can't understand why it seems to be so difficult.

My job broadly involves Digital, Data, and Technology. Oh, and people. Accepting that people are the most complicated of the four and putting them to one side, I think that data should be the easiest to understand and discuss of the three.

Technology is hard, by which I mean things like understanding what is actually going on inside a computer. That 'what is actually going on' has moved further and further away from the average person during my lifetime. In many ways that's great, with far less need to worry about the nuts and bolts, and more time to focus on doing the unreal dystopian science fiction convenience hellworld thing.

Digital is hard, if by digital for this purpose I mean a design practice that's really rich and involves things like (deep breath) understanding users and working to overcome bias and releasing things that aren't perfect and committing to iterative development and working collaboratively in multidisciplinary teams and trusting people and so on. You don't know what's going to happen.

Data is hard, but maybe it needn't be. Data isn't moving further and further away from the average person - it's right there next to you, all the time. Working with data should be less of a psychological workout than the digital thing too, I think?

For example, which of the following should be easiest to answer without using the internet?
  • How does a solid state drive work?
  • What is the impact on a user of this service not matching their mental model?
  • What might happen if somebody gets my postcode wrong?

I know data can be really messy and complicated and huge. I know that sometimes you need to do actual maths. However, for much of the fundamental infrastructure work that's required in the public sector, there are broad, human conversations to be had to help solve basic problems. The barrier to understanding them is an illusion.

I think talking about data is the answer, and maybe there's an opportunity to develop a better collective vocabulary for working with it as a result.