Why can't we talk about data? (Part 2)

I wrote a blog post yesterday and got a really thought-provoking response on Twitter so I thought I'd expand on it some.

It wasn't much of a post. I could reduce it down to "I think this is hard and I want to get better" with a little bit of a call to arms thrown in.

Being connected to an expert hive-mind on the internet is pretty great though. I don't know much, but I know I love data, so I'll try to reflect what I picked up from others since yesterday and hopefully frame where I'm at for future discussions and work in real life.

What are we talking about?

'Data' is too broad. Also, labouring the distinction between data and information probably isn't helpful. There's a huge amount of varied work across the 'data and information' spectrum. I think I need a richer and more specific vocabulary here to be able to work with the experts, let alone my desired broad audience. Credit to Michael for the spectrum description, Paul for describing the variety of work, and Sophie for calling out the pointless data vs. information debate.

What am I talking about?

At the time of writing I'm the Head of Data and Search at the UK Parliament. One of the things I'm responsible for is a data strategy for our organisations. I believe it's both possible and necessary to have an appropriately consistent set of principles and practices that apply to all of our data and the varied ways in which it is used.

Still, I have some particular areas of interest:

This is structured, relatively low-volume, but high-complexity stuff. If I went somewhere else to be the Head of Data and Search I expect I'd still be particularly interested in this kind of data work. This is because, to me, these are the basics that you need to get right in order to get stuff done with the least pain possible. It is about maximising utility and reducing friction.

Services and infrastructure

However, a utilitarian approach to data can come with problems. I thought Sophie described this particularly well in the context of building 'digital' services [2]:

...in “digital” they expect us to iterate and release quickly, and we’re like “now we need to go away and spend many many months/years restructuring all our data and information at vast expense before you can have the [service]”

No controversy or criticism of anybody intended here.

I expect that going away to restructure all your data and information at vast expense isn't going to fly. So, I believe there can be a tension here between the 'digital' work and the 'data' work. Maybe lots of people have resolved this already, but hey, I haven't [3].

Let's say it's ok, or better yet let's say it's desirable, to release your new 'digital' service when it's 60% complete, and then you iterate. In contrast, you don't want to release with your reference data being 60% correct. That would be bad.

To me, 'minimum viable data' is a small but irreducible core. You iterate outwards rather than throwing it all away.

^ Note: this is my only strong idea. Maybe I haven't expressed it as well as I could have done, but I reckon it's good.

I advocate for working on fundamental data infrastructure and digital services at the same time. In an organisation that does more than one thing, there's going to be a need to share the kinds of data I've described across services. Starting from scratch for each service increases cost at the very least. Also, the amount of effort an organisation puts into data integration is an indicator of how bad its data is, because that effort should be as little as possible. High-effort data integration is like failure demand for data.

I have views [4] on what a cross-cutting data infrastructure team looks like, and I think having data expertise in multidisciplinary 'digital' teams is important as well. I don't believe that multidisciplinary teams focused on a particular service are best placed to lead on data infrastructure.

I suppose distance from users comes with risks for infrastructure teams. Several people responded to my initial blog post to say that context is essential, and that rather than trying to talk about data in the abstract it's better to talk about it in terms of problems to solve or outcomes to achieve (thanks Franklin, Amanda, Ann, and Chris). I'm sure there's a way to balance this effectively.

Tips and tactics

I had several responses to my post about sticking with the metaphors, and hey maybe I need better metaphors. Not everybody likes baking. My favourite advice was from Beck, who said that passion for the subject helps. I'll try ramping it up.

I was also reassured to see (and be reminded of) lots of experienced, brilliant people who are either going through the same thing or who've already had success. I look forward to finding out more.

If you're interested in talking about data with me in real life please get in touch.


[1] Full disclosure: maybe this is master data, I'm not sure. Either way, I'd advocate for a collaborative, conversational practice to derive and maintain this kind of data regardless of the domain

[2] I defined 'digital' as a broad, rich, and difficult multidisciplinary design practice for the purpose of my initial post

[3] It's not just 'digital' work either. I work in waterfall / PRINCE2 environments as well and the tension with data there is the same

[4] Maybe something for a subsequent post