Data: 20 years of hurt

I went to the Institute for Government's 'data bites' event a couple of weeks ago - an evening of four short talks from folks working in public service with data. It was great, thank you to Gavin and the team.

I was particularly struck by the fourth talk, by Yvonne Gallagher from the National Audit Office (NAO). Yvonne was talking about the challenges of using data in government, in advance of an NAO report coming out later this month. You can watch the talk for yourself. It was eight minutes long with eight minutes for questions.

My main impression from Yvonne's talk was that 'data' has been a problem across government for over twenty years. I felt an overriding sensation of 'enough of this it's really time to sort it out now folks'. 

I heard Yvonne say that there had been multiple strategies launched over the last two decades to fix the problems, and I'm paraphrasing, but I heard that the time for launching strategies to fix the problem was over and it was time to actually do something. This made me uncomfortable, not because I don't do things but rather because I've been working on a data strategy [1].  

I also heard Yvonne say that thirty years ago, things had been better. There was an established practice of data modelling, data management, cataloguing and the like. I don't know if this is true, because I was busy watching 'Rude Dog and the Dweebs' and so on. Let's take it as truth. Yvonne was listing practices used back then that facilitated things working, and working together, through collaborative effort, documentation and widespread understanding. These practices are deeply uncool and unfashionable at the moment [2]. Forgotten, even.

I am increasingly convinced that it's time to be deeply uncool and unfashionable. To be boring [3].

Being boring

I am trying to work through an idea, or collection of ideas, where 'data' is a distinct practice that sits alongside 'digital' and 'technology', complementing them both. I've written about it already. Nobody has said "oh yeah Dan you are right take my money" yet, so clearly it needs further work.

I believe we [4] have problems with the language we use when we talk about 'data', and that it's too broad an umbrella to be meaningful. Leigh wrote a post about this (it's great). Leigh also proposed some data archetypes recently. Using Leigh's archetypes as a starting point, I think that the problems we face across government (or across public service, which I prefer) mostly come from:
  • The Register
  • The Database
  • The Description

In my opinion, these are the types of data that facilitate the absolute basics. If they aren't done well then impenetrable silos result.

Now, I believe that 'data' has this unwarranted mystique about it in general that people mistake for computer magic. At the root of the types of data I've listed above is a broad, human understanding and a set of human practices - like ownership for example. Not magic, just taking responsibility.

So, I suggest that over the past 20 years we outsourced too much of our collective responsibility to people who do (or claim to be able to do) computer magic. It's easy to point a finger at a Big IT company who've failed to deliver an outcome at great public expense, so let's do that.

*points finger at Big IT company who've failed to deliver an outcome at great public expense*

BUT WAIT I'm actually advocating for retaining a greater degree of ownership of the fundamentals of our organisations and how we describe them. That way a supplier doesn't imperfectly represent these things on our behalf when they don't have the expertise or long-term incentive to do so, and doesn't have to do the heavy lifting to work out what they are in the first place, probably from scratch.

The same applies when you have an in-house team, or are working closely with smaller agency partners. Hopefully in a multidisciplinary way. Who does the data modelling in a multidisciplinary team? Don't leave it to computer magic - it needs to have an explicit representation somewhere beyond a database only one person came up with [5]. How do the descriptions get done, beyond the content? Do you have a librarian [6]?

Librarians, yeah?

People joke about tech bros [7] coming up with ideas for things that already exist, like trains, and libraries. I think the libraries one often misses the point, because a library isn't just a quiet place where you can work that is free of charge (LOL). 

A library is an open, dynamic repository of knowledge that's been organised in a way that is collectively understood, with expert help on hand to aid discovery, and no technical barrier to entry.

The demise of libraries in our society is profoundly sad. And sorry but (as much as I love it) the internet / web hasn't satisfactorily replaced my definition above. There is a lack of balance and access for all, where 'access' means being able to contribute rather than just consume.

Make the thing

Everybody being able to contribute is central to my line of thought. I think that's where this 'data' practice will come from in a multidisciplinary delivery team. The opposite end of the pipe from developing a collective understanding of needs: developing and owning a collective understanding of your essential data. 

Will there be resistance? Probably. I could imagine people saying "hey man we're agile we don't do any work upfront that's waterfall UR so lame" or similar, but I don't think there's anything wrong with being prepared before diving in on the build. Design disciplines advocate for developing a rich understanding. The same should apply for data.

Beyond 'the thing', I do believe we gave away too much responsibility, when really we should have retained and maintained our corporate-level data assets which are more permanent than any individual system.

But, as Steve and Adam made me realise, you need to sweat these assets (so to speak) by putting them to use and managing them like any other product so that they are as good as they can possibly be - so that people want to use them rather than inventing their own.

Pull up your brain

There's work required here that isn't currently happening. It is unfashionable as I've said three times now. The benefits of doing it aren't immediately apparent, so it's a hard sell. We are also working in a time when "technology will save us" is considered a legitimate argument, and arguing the contrary will require some political courage and will [8]. 

Personally I think that's avoiding responsibility and avoiding putting the effort in, and a case could definitely be made for financial savings over time through reduced friction and increased re-use.

I'll continue to work on this. Let me know your thoughts, I'd really appreciate them.


[1] Yvonne Gallagher didn't say this, that's me.

[2] I was reminded of the scene in 'Black Dynamite' where Black Dynamite declares war on the people who deal drugs in the community and the drug dealer says "but Black Dynamite, I deal drugs in the community!"

[3] As in dull. Not like Elon Musk making big holes. Have a link to a song by the Pet Shop Boys. Have a link to a song by Deftones.

[4] I'm using 'we' as shorthand for people working in public service. Let's say all over the world, apart from Estonia.

[5] This isn't sour grapes at not being able to do the Jonny Lee Miller myself. I do know a bit about developing a shared understanding of things.

[6] EDIT: here is a post from Silver about Librarians and the web that I saw just after I pressed 'publish'.

[7] Not a robot version of Bros, sadly.

[8] I work at the UK Parliament at the moment so 'political' here means organisational rather than party.

What does 'data-driven' mean to me?

Words are important. Have you heard this phrase 'data-driven'? I expect you have. I don't like it, but hey there are lots of things I don't like that are really popular nowadays.

One of the issues with buzzy management shorthand phrases is that their use actively inhibits a shared understanding. One person's 'data-driven' might not be the same as another's, but they're all there in the boardroom [1] going

"Franck, our strategical approach is data-driven and leverages machine learning cloud AI capabilities"

"Yes Wayne, 100% data-driven insightisation of our asset base"

"Totally Sandra, leveragise our data-driven operating model"

or similar.

For a while, I was working on a 'data-driven' website, where I think 'data-driven' was being used to mean "there is data from internal applications that automatically goes onto our public facing website"

I always found that a bit strange, because (to my mind) that's just a legitimate way to approach making a website [2] and I don't understand why you would make part of what’s going on behind the pixels on the screen a thing of note. I don't think everybody involved understood that was what the phrase was being used to mean either. 'Data-driven' had become meaningless, and that meaning vacuum got filled with negative connotations and disdain, like a sad golem.

Say what you mean

I always preferred 'data-driven' meaning "we make decisions based on evidence". However, as I've worked on data strategy and (particularly) measurement in the past year, "we make decisions based on evidence" is also problematic.

Why are you making decisions?

Understanding intent and an organisation's direction and focus is essential. If this is lacking in any way it can be disproportionately hard to develop goals and measures. Clear statements of intent really help to frame decisions.

So now to me 'data-driven' means "we know where we're going, and we make decisions based on evidence". But it's still not right.

What evidence?

Maybe your data is absolute trash. It's worth really getting into where the data has come from. Maybe you've really gone to town on a rigorous qualitative approach that's disproportionate to the task, or that's going to be expensive to establish as a repeated measure.

So now to me 'data-driven' means "we know where we're going, and we make decisions based on sound evidence that is contextually appropriate".

Sure, it's a bit of a mouthful. This is why I'll never have a career in marketing. 

I believe it's always worth taking the time to make sure there's a common understanding though. What does 'data-driven' mean to you?


[1] This is a fictional scenario and Franck, Wayne, and Sandra are fictional characters

[2] Not all websites

Why can't we talk about data? (Part 3)

This is a continuation of my line of thought from my previous two blog posts.

I speculate that there's a gap to be filled with ways of working with data that aren't happening at the moment as far as I'm aware [1].

Isn't it exciting when you notice that ideas have taken on a life of their own? Not my ideas, to be clear.

Those moments when you catch a hint that similar conversations are happening somewhere else and you're hearing part of an underground transmission. Like a story being passed around from person to person before anybody actually writes the book.

Blog posts. Talks. Snatches of something new. I reckon people are going to start to crack this 'talk about data' business fairly soon.

I read the 'Rewired State: 10 years on' blog post by James Darling and Richard Pope. Two paragraphs stuck out for me in particular (emphasis mine):

Legacy IT is still a major problem that the GDS movement hasn’t really addressed. The data layer is still unreformed. It remains an unanswered question if the UK’s ‘service-layer down’ approach, the ‘registry-layer up’ approach of Estonia, or some totally different model will ultimately prove to be the best route to a transformed government.

Both legacy and data lack a place in the new orthodoxy, and in user centred design more broadly. That’s probably because they are difficult and require political capital. It’s hard to justify investment in digital infrastructure when the benefits come in the next electoral cycle or to another department’s budget.

There's a scene in 'Velvet Goldmine' where Christian Bale's character (a young, awkward glam rock devotee at that point in the film) points at his hero on the television and says to his parents "that's me! that's me that is!". I think the data layer is still unreformed! I think data lacks a place in the new orthodoxy and in user centred design more broadly! [2]

A new new orthodoxy?

I met with Leigh from the Open Data Institute this week. We spoke about this broad topic, and the conversation helped me on with my thoughts. Claire joined us, and suggested that 'data' is a decade behind 'digital' in terms of developing and embedding a multidisciplinary working practice. This resonated with me, and I've heard others suggest similar things in recent months (the underground transmission!).

I swear there's something here. Something distinct from 'digital', but complementary and with porous boundaries.

Technology is about computers. 'Digital' isn't about computers but lots of people still think it is. Most people think 'data' is 100% all about computers but actually it's even less about computers than 'Digital'.

In this multidisciplinary data practice I imagine being good with computers is a secondary skill - a means to an end. The engineering piece for services can be done collaboratively with others, and I'd expect increasingly over time it will become less bespoke [3]. If data lacks a place in that new orthodoxy maybe it's time to revisit some unfashionable roles, define new ones, and hire some librarians. Where I work we've got a couple of libraries. I'm a big fan of librarians.

So, maybe a practice featuring the full spectrum of data related roles, from the structural to the analytical.

A sort of infrastructure

In my second post I described the data I am most interested in, recognising that 'data' is a broad term which would benefit from more detailed definition:

When we talk about 'infrastructure', these ^ are the things that I think are most important.

What is 'infrastructure' to me, though? I'm not thinking of platforms, let alone individual services (no matter how large). There is something here that's more fundamental. Note that when I say fundamental I don't mean 'more important'. I've seen enough unnecessary inter-disciplinary disagreement about relative preeminence first- and second-hand over the past few years.

I just mean that there's something else there - something beyond the platform, or the service, or the contents of the database. Underneath, on top, all around it, in a mirror universe with twinkling stars and comets - you can draw the diagram however you like. It's there already, but it needs to be made more explicit.

Leigh and I spoke about domain modelling. I've got into a habit of avoiding using the term, but it's a great example of a collaborative, human practice for working with data involving a variety of different types of people.

Imagine these models were considered as a corporate-level asset [4]. This is the truth of your organisation [5]. You can use this asset to help build services. This asset is reflected in the platforms you build. It's not an academic exercise. This asset isn't static, and you would have feedback loops to and from your services and platforms because things change. In the public service context, outcomes for users traverse organisational boundaries, so your models would link out to those of other organisations.
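As a minimal sketch of what "explicit" could look like, assuming nothing about any real system: the snippet below writes a tiny model down as plain data, with a description and an owner for each concept, plus links out to other organisations' models. The `Concept` class, the `gaps` helper, the example definitions and the example.org link are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One thing the organisation cares about, described in plain language."""
    name: str
    definition: str   # the human-readable description (illustrative text only)
    owner: str        # the team that takes responsibility for it
    same_as: list = field(default_factory=list)  # links to other organisations' models


def gaps(model):
    """Return the names of concepts that lack an owner or a definition."""
    return [c.name for c in model if not c.owner or not c.definition]


# A hypothetical three-concept model; names and wording are invented.
model = [
    Concept("Committee", "A group of members appointed to examine a topic.",
            owner="Library", same_as=["https://example.org/other-org/committee"]),
    Concept("Sitting", "", owner="Library"),
    Concept("Member", "A person elected or appointed to a House.", owner=""),
]

print(gaps(model))  # ['Sitting', 'Member']
```

The code is not the point. Once the model lives somewhere other than one person's database, a missing description or a missing owner becomes something you can see and talk about.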

For the justifying investment point from James and Richard's post, I believe the case is there to be made. Not working to the map of your organisation's truth, and not maintaining that map, is one of the reasons the legacy issue builds up in the first place. It's a vicious cycle to be broken. Every system where the models are implicit, hidden in an undocumented database, introduces cost. Every local exception introduces duplication of effort and friction for teams and end users. What is the cost of not having this kind of infrastructure?

I wonder where in an organisation this infrastructure would sit? If you don't have a library you should get one. My point being that the technology area definitely isn't the natural home for this work, and I suggest the digital area isn't either. There would be a collaborative effort between the three, of course. Doing the hard work to break down silos and what have you.

Nonetheless, it would be a hard sell to build up a nationwide infrastructure before delivering any outcomes. I envisage a messy period of compromise and starting small, but hopefully there will be a start [6].


[1] I'm almost certainly channeling Michael and Robert at best here, and plagiarising at worst. Sorry blokes.

[2] As I recall, Christian Bale's parents ignore him. My mum and dad don't know what I do for a living either

[3] I do think there is a place for specialist engineering where a comprehensive understanding of complex data domains is required

[4] 'Corporate-level asset' was Leigh's term, as I recall

[5] Or a truth. Doesn't cover organisational culture, for example

[6] Or a restart, to be fair, with respect and recognition to colleagues who've been here before

Why can't we talk about data? (Part 2)

I wrote a blog post yesterday and got a really thought-provoking response on Twitter so I thought I'd expand on it some.

It wasn't much of a post. I could reduce it down to "I think this is hard and I want to get better" with a little bit of a call to arms thrown in.

Being connected to an expert hive-mind on the internet is pretty great though. I don't know much, but I know I love data, so I'll try to reflect what I picked up from others since yesterday and hopefully frame where I'm at for future discussions and work in real life.

What are we talking about?

'Data' is too broad. Also labouring the point about a distinction between data and information probably isn't helpful. There's a huge amount of varied work across the 'data and information' spectrum. I think I need a richer and more specific vocabulary here to be able to work with the experts, let alone my desired broad audience. Credit to Michael for the spectrum description, Paul for describing the variety of work, and Sophie for calling out the pointless data vs. information debate.

What am I talking about?

At the time of writing I'm the Head of Data and Search at the UK Parliament. One of the things I'm responsible for is a data strategy for our organisations. I believe it's both possible and necessary to have an appropriately consistent set of principles and practices that apply to all of our data and the varied ways in which it is used.

Still, I have some particular areas of interest:

This is structured, relatively low-volume, but high complexity stuff. When I go somewhere else to be the Head of Data and Search I expect I'd still be particularly interested in this kind of data work. This is because, to me, these are the basics that you need to get right in order to get the stuff done with the least pain possible. It is about maximising utility, and reducing friction.

Services and infrastructure

However, a utilitarian approach to data can come with problems. I thought Sophie described this particularly well in the context of building 'digital' services [2]: “digital” they expect us to iterate and release quickly, and we’re like “now we need to go away and spend many many months/years restructuring all our data and information at vast expense before you can have the [service]”

No controversy or criticism of anybody intended here.

I expect the chances are that going away to restructure all your data and information at vast expense isn't going to fly. So, I believe there can be a tension here between the 'digital' work and the 'data' work. Maybe lots of people have resolved this already, but hey I haven't [3].

Let's say it's ok, or better yet let's say it's desirable, to release your new 'digital' service when it's 60% complete, and then you iterate. In contrast, you don't want to release with your reference data being 60% correct. That would be bad.

To me, 'minimum viable data' is a small but irreducible core. You iterate outwards rather than throwing it all away.

^ note this is my only strong idea maybe I haven't expressed it as well as I could have done but I reckon it's good.
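One way to picture "iterate outwards", as a rough sketch rather than a recipe: treat the first release of a reference dataset as a core that later versions may add to but never rewrite. Everything here is an assumption made for illustration, including the `Record` shape, the `iterates_outwards` check and the example departments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One entry in a hypothetical reference dataset."""
    identifier: str    # stable identifier; never reused or changed
    name: str          # core field that has to be right at first release
    extra: tuple = ()  # optional (key, value) pairs added in later iterations


def iterates_outwards(core, candidate):
    """Return True if `candidate` keeps every core record intact.

    New records and new extra fields are allowed; dropping a core record
    or changing its core field is not.
    """
    by_id = {record.identifier: record for record in candidate}
    for record in core:
        kept = by_id.get(record.identifier)
        if kept is None or kept.name != record.name:
            return False
    return True


core = [Record("dept-1", "Library"), Record("dept-2", "Archives")]

# A later version that only grows: one new record, one new extra field.
grown = core + [Record("dept-3", "Research")]
grown[0] = Record("dept-1", "Library", extra=(("floor", "2"),))

# A later version that throws the core away: a typo and a missing record.
rewritten = [Record("dept-1", "Libary"), Record("dept-3", "Research")]

print(iterates_outwards(core, grown))      # True
print(iterates_outwards(core, rewritten))  # False
```

The check is deliberately strict about the core and relaxed about everything else, which is the opposite of releasing a service at 60% and reworking it.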

I advocate for working on fundamental data infrastructure and digital services at the same time. In an organisation that does more than one thing, there's going to be a need to share the kinds of data I've described across services. Starting from scratch for each service increases cost at the very least. Also, the amount of effort an organisation puts into data integration is an indicator of how bad its data is, because that effort should be as little as possible. High-effort data integration is like data failure demand.

I have views [4] on what a cross-cutting data infrastructure team looks like, and I think having data expertise in multidisciplinary 'digital' teams is important as well. I don't believe that multidisciplinary teams focused on a particular service are best placed to lead on data infrastructure.

I suppose distance from users for infrastructure teams comes with risks, particularly because several people responded to my initial blog post to say that context was essential, and that rather than trying to talk about data in the abstract it's better to talk about it in terms of problems to solve or outcomes to achieve (thanks Franklin, Amanda, Ann, and Chris). I'm sure there's a way to balance this effectively.

Tips and tactics

I had several responses to my post about sticking with the metaphors, and hey maybe I need better metaphors. Not everybody likes baking. My favourite advice was from Beck, who said that passion for the subject helps. I'll try ramping it up.

I was also reassured to see (and be reminded of) lots of experienced, brilliant people who are either going through the same thing or who've already had success. I look forward to finding out more.

If you're interested in talking about data with me in real life please get in touch.


[1] Full disclosure: Maybe this is master data, I'm not sure. Either way, I'd advocate for a collaborative, conversational practice to derive and maintain this kind of data regardless of the domain

[2] I defined 'digital' as a broad, rich, and difficult multidisciplinary design practice for the purpose of my initial post

[3] It's not just 'digital' work either. I work in waterfall / PRINCE2 environments as well and the tension with data there is the same

[4] Maybe something for a subsequent post

Why can't we talk about data?

This is an attempt to write down a conversation I've had with several people over the past couple of months.

For my next career goal I'd like to get really good at talking about data. I'm not very good at the moment. Maybe this post should be titled 'Why can't *I* talk about data?' but it feels more comfortable to suggest that everybody could be better at least, rather than just me.

If you're really good at talking about data please get in touch and tell me all your secrets.

Have you been trying to talk about data and experienced the blank stares? That moment when your carefully crafted analogy crumbles into dust? That conversation about the difference between data and information where people conclude that data is information and your inner voice says "just give it up sunshine it's no use"? Maybe somebody says "it's too technical for me"? Or "it sounds boring"? Perhaps you had to explain why data wasn't a subset of technology? Or maybe you like lording it over the uninitiated like some dark data mage, using your power to create eldritch management information dashboards that nobody but you understands (you are a bad person)?

My hunch is that talking about data in ways that resonate with the broadest possible audience is going to become increasingly important in society.

I can't understand why it seems to be so difficult.

My job broadly involves Digital, Data, and Technology. Oh, and people. Accepting that people are the most complicated of the four and putting them to one side, I think that data should be the easiest to understand and discuss of the three.

Technology is hard, by which I mean things like understanding what is actually going on inside a computer. That 'what is actually going on' has moved further and further away from the average person during my lifetime. In many ways that's great, with far less need to worry about the nuts and bolts, and more time to focus on doing the unreal dystopian science fiction convenience hellworld thing.

Digital is hard, if by digital for this purpose I mean a design practice that's really rich and involves things like (deep breath) understanding users and working to overcome bias and releasing things that aren't perfect and committing to iterative development and working collaboratively in multidisciplinary teams and trusting people and so on. You don't know what's going to happen.

Data is hard, but maybe it needn't be. Data isn't moving further and further away from the average person - it's right there next to you, all the time. Working with data should be less of a psychological workout than the digital thing too, I think?

For example, which of the following should be easiest to answer without using the internet?
  • How does a solid state drive work?
  • What is the impact on a user of this service not matching their mental model?
  • What might happen if somebody gets my postcode wrong?

I know data can be really messy and complicated and huge. I know that sometimes you need to do actual maths. However, for much of the fundamental infrastructure work that's required in the public sector, there are broad, human conversations to be had to help solve basic problems. The barrier to understanding them is an illusion.

I think talking about data is the answer, and maybe there's an opportunity to develop a better collective vocabulary for working with it as a result.