Start the week with data

6 March 2024

I am the Head of Data Science at Citizens Advice. In a job like mine, delivery and achievement tend to happen through the team; I don't have many things I can say I did all by myself. Thinking back on the last year or so, though, there is one thing I made happen that I think has been particularly successful. It's a weekly open forum called 'Start the week with data'.

I wrote about data conversations before. In my experience these kinds of forums evolve. Sometimes you decide they need a change or don't need to happen anymore. Sometimes you think they need to happen but others just aren't feeling it for whatever reason. Generally I like to stick at things for a while before deciding it's time to stop. Start the week with data has been going since May 2022, and it was last year that it really came into its own. So yes, sticking at it.

Start the week with data is informed by the principles I set out in that data conversations blog post, especially "conversation not presentation" and "keep it frequent".

Every Monday morning we have a 30 minute video call with a speaker on something to do with data at Citizens Advice. Usually it's 15 minutes speaking to some slides, then the rest of the call for that all-important conversation. I chair (or maybe it's conduct) the session. I sign off every session with "thank you for starting your week with data", which is neat and hasn't got old yet.

The invite list started as the Senior Leadership team but has grown organically to be wider than that. Every session is recorded and linked from a rolling log, which is a Google Doc. Slides are linked there too if they were used. All the presentations are good, but especially good ones are marked on the log with a fire emoji. Finally, there's an email at the end of every week to the invite list telling them what's coming up on Monday and linking to the materials from that week's session.

The emails are admittedly repetitive. Also, my inbox is mostly people declining Start the week with data. But every week the numbers are pretty good - 25 to 40 people from all over this fairly big and diverse organisation.

The thing that made Start the week with data in 2023 so good was the variety in the programme. This forum isn't just the data experts talking about the expert data work they've done (although we have that too). We had 35 sessions with 24 different speakers from teams like:

  • Operations
  • Finance
  • Business Development
  • Technology
  • Policy and Advocacy
  • Product and Delivery
  • Evaluation
  • Expert Advice and Content
  • and more

The point is that everybody is working with data and can tell an interesting story about it that will be relevant to others.

One of my favourite sessions last year was our Finance team talking about the most successful systems migration I've ever seen. I'd never known a systems migration go well. And there was a strong piece of practical data improvement at the heart of the work, taking the opportunity to change data structures so that they could provide more meaningful insight.

We also get brand new insights at the session, like investigating access gaps for our clients across England and Wales. And we see things that absolutely everybody in the organisation should know, like fundamentals about our client base or how we measure our financial and social impact.

Another benefit of sticking with a forum like this for a while is you get the opportunity to revisit topics and see how they've developed. Or you hear about the next steps in a piece of data work that have been facilitated by something delivered earlier.

Managing to do 35 sessions isn't bad, especially accounting for holidays. It is a fair amount of effort to curate a programme like this, but doing so helps to build connections and visibility across the organisation for me. There's also something rewarding about convincing people that they have an interesting story to tell. Every week I say please volunteer a topic if you want to, and on the occasions people do come forward it makes me happy. But most of the effort is on me to keep that forward plan going. I think it's worth it though.

Recently a colleague said how much they appreciate the session because it's always relevant to them and their work and it gives them insight into what's happening at Citizens Advice in an effective way. I will take that as a win.

This is the work. It's fairly simple but it does require sustained effort to keep it going. If you're in a role where you're leading data improvement in a fairly large organisation I recommend giving something similar a try.

Thanks for reading.

Data conversations: some practical examples

27 May 2022

We’re learning a great deal on our data journey at Citizens Advice. I think it’s worth sharing where we’re at.

18 months ago I would have recommended “talking about your data more”. I still recommend that of course!

Given experience in the intervening time I’m able to describe something more than that original intent, with some structure to it as well.

Maybe you work in an organisation where everything I describe here happens already. That’s great, I’d love to hear about your experiences.

Maybe the data you work with is different to ours, perhaps larger and updated more frequently, or smaller and updated less frequently. I think that the things I’m describing here could still be useful.

The ‘data’ that I’m talking about here are things like:

  • Channel activity, for example website and telephone
  • Client volumes and demographics
  • Topic trends at high and more detailed levels
  • Client outcomes: experience and satisfaction from survey data

Principles

There are a few guiding principles behind what we’re doing. I think the most important one is

Conversation, not presentation

Everything I’m describing here should encourage everybody to talk about the data. It’s not a one-shot pitch — this is ongoing practice and there’s no end to it. Shared understanding builds over time and feedback is essential.

Talk about the whole picture

We have a complex organisation with many channels and services. We try to make sure that we are talking about the relationships between things rather than focusing on isolated data sets.

Use as few products as possible

We have a main data product called the ‘Service Dashboard’ which brings a wide variety of our data together in one place. We try to use this product to meet the needs of as many audiences as we can.

We have a preference for presenting from this product or other dynamic reports and dashboards over transposing data into a slide deck.

Keep it frequent

We have a weekly pattern. This is appropriate because of the wide variety of data that we have, rather than because of the cadence of our data (where trends play out over months rather than weeks). This frequency keeps what’s happening with our data at the front of people’s attention though, and there is a wide variety of topics we can cover.

If you’re working with data and it doesn’t change that much I’d still recommend talking about it once a week.

Repeat and reuse

The forums are porous. We present the same material in multiple places. We use the forums to generate content that people can revisit and share.

Iterate

We learn by doing and regularly reflect on how things are progressing and make changes accordingly — both to our practices and to our data products.

Here are the four practices I recommend trying if you’re not doing them already. This isn’t an exhaustive list and it will develop further I’m sure.

1. The top team conversation

We have a rolling programme of short weekly data updates to our Executive and Directors team. They happen at the beginning of the week as part of an existing ‘start the week’ meeting.

When we started this James (who is part of the Executive team) gave these updates but increasingly we’ve brought in other voices. It’s a collective effort to achieve a weekly update and we have input from the Data Science team, our counterparts in the Impact team, and others particularly from Operations.

It is a conversation because this top team get to provide feedback and set priorities for questions to answer. This has been supported by a fairly regular retrospective.

The same material gets shared with all staff from the National Citizens Advice organisation on Workplace and we will be sharing more widely with the Network of 250+ local Citizens Advice across England and Wales too.

In order to keep to the weekly pattern we need a decent but flexible forward plan, and we try to keep a month ahead of ourselves on that. There can be significant lead time for some of the work required to answer the questions asked. Seemingly simple things can be complex and vice versa.

This practice has driven some of the most forward-thinking and fresh data work that I’ve been involved in since I started this role in late 2019. We know things that we didn’t know 6 months ago, which is itself a measure of improvement.

This practice also involves the most collective effort and preparation. That feels appropriate. This is the level of the organisation that can be supported through data to make the most significant decisions.

What are the benefits?

  • Builds a collective understanding
  • Drives improvement in our evidence base
  • An opportunity to prioritise based on the most important questions to answer
  • Should lead to more informed decision making

Examples from Citizens Advice

In recent weeks we’ve covered these topics:

  • High level client outcome and activity numbers from across our service for the past year.
  • New data that tells us about the impact of our online advice content.
  • New data that tells us about the variety of different telephone service models used by the Network of local Citizens Advice.
  • New analysis on depth of issues our clients experience and the strong relationships between different issues (for example housing and debt).

2. Data at the start of every meeting

Well, not every meeting. Let’s imagine you’ve got a regular team meeting, or an ‘all hands’ session. The practice here is to do a tight 5 minute update on data at the start of the meeting.

As an example, I do this at the weekly meeting for the leadership team I’m part of. 5 minutes translates to 3 or 4 talking points about our data. Committing to this regular practice means that I have to be engaged with the data that we’re working with — I have to look for patterns and trends to highlight. I can also bring in insights from the other forums.

I write up those 3 or 4 talking points and share the document with the team. This is a further commitment, but it’s worth it because it can be shared with the whole group. They can refer back to it and consider the points in their own time. It also means that nobody in the group is left out if they aren’t at the meeting. Finally, these documents are open; they can be shared more widely if my peers think there’s value in doing so.

What are the benefits?

  • Provides context and helps to break out of siloed thinking
  • Builds a shared understanding
  • Builds expertise in talking about data from a variety of sources and how it interrelates
  • Keeps you curious

Examples from Citizens Advice

Here’s an example Google Doc with 3 real data talking points I’ve covered recently.

3. The regular open forum

This is our most established practice. We began it soon after the start of the first pandemic lockdown in 2020. Tom (chief analyst) wrote about it. It is a fortnightly session that lasts around 45 minutes. It is open to all staff from the National organisation. We get around 30 people on the call each time, from a variety of teams and backgrounds.

We use the Service Dashboard, presenting this to the group and having the data specialists who are responsible for each category of data talking about the latest trends. For example Mankeet (senior data analyst) covers website trends.

We take questions from the group. One of the most valuable aspects of this session is that colleagues from Operations or Policy often provide valuable insight and context for what we’re seeing in the data. It’s very much a conversation.

Of the four practices this is the one that gives many people the opportunity to understand and describe the narrative of what’s happening across our service as trends play out over months and years. There’s an element of oral history to it, which could be seen as a weakness because some of the explanations for patterns that we’ve seen aren’t documented. However, we record the sessions and post them in a dedicated Workplace group so people who can’t attend can participate in their own time. And we see the narrative that gets developed reflected consistently in other work that we do, which is a strength.

What are the benefits?

  • Builds a collective understanding
  • A rich exploration of the data given the expertise involved
  • The narrative stays current and is reflected in other forums
  • Provides early sight of trends and issues that can be highlighted or escalated elsewhere

Examples from Citizens Advice

The Service Dashboard is updated weekly. This forum has established regular content. We look at client trends (numbers and demographics), advice topic trends, website trends (topics, top pages, volumes, search terms), and telephone and webchat trends (volumes). We can compare back to a variety of different time periods but we find comparing year on year to be most valuable because of strong seasonality in our data.

4. The deeper dive

We run a roughly weekly 30 minute session with an agenda that covers a wide variety of data topics. I say roughly weekly because it’s 4 weeks on and 1 week off with the slot being at a different time each week to encourage attendance.

The session is open to all National staff. There’s a fairly large invite list and we tell everybody what we’re going to be covering and share materials in advance. We get around 15 people at each session.

This practice provides an opportunity to go deeper on new analyses and insight, presented by the specialists who have done the work. It also provides an opportunity to talk about our data work ‘behind the scenes’, for example developing new services and standards. It has been really valuable for developing our data strategy work in an open and collaborative way. Finally we’ve had guests from other organisations telling us about their experiences — that’s particularly valuable when you get to hear about shared challenges and how people have approached them.

We record the sessions and post them in a Slack channel. Necessarily this forum generates a fair few slide decks. We share those too, and make sure that they contain links to other resources.

What are the benefits?

  • Showing an audience what really goes into the work, which they wouldn’t otherwise get to see
  • Doing justice to data specialists’ effort by having time to go into greater detail
  • Bringing in fresh perspectives for shared problems
  • Developing in the open, particularly strategic work

Examples from Citizens Advice

In recent weeks we’ve covered these topics:

  • Data strategy principles framework — how we’re getting owners for these principles from across the organisation
  • A new ‘service’ approach to data, building initial versions of services that we can iterate. Examples include a service for data about our volunteers and a service for data about the Network of local Citizens Advice.
  • Collaboration between Data Science and Product to decommission a legacy system, and establish a new primary source of data for reuse by multiple systems as a result.
  • A new way to visualise client volume data, showing it across our entire service at a high level for the first time.

One day I’d like to put together a ‘playbook’ for data specialist work but it’s a daunting task. I can break it down into smaller pieces though. This post is a first attempt at that.

Please get in touch if you’ve got any thoughts — you can find me on Twitter and occasionally on LinkedIn.

Thank you for reading.


Data conversations: some practical examples by Dan Barrett is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Citizens Advice Data Science team month-notes #3 (October)

29 November 2021

We are blogging about the work we do. We intend to write a post each month with contributions from various members of the team. Here’s our previous post.

We haven’t managed to post for a while because we’ve been especially busy. We’re working as a team to get around that in future. Dan’s going to try transcribing oral updates from the team so that we can all contribute to writing in the open on the internet.

As a team we believe in continuous improvement, building capability, and focusing on meeting data users’ needs. The opportunities we have for improving the way we work with data at Citizens Advice aren’t unique to our organisation. It’d be great if the practical approaches we take are useful to others.

Data Glossary

Josh (Data Architect)

Earlier this year we ran a survey for all staff to understand the experience of using data at Citizens Advice. This gave us valuable insights into opportunities for improvement, revealing that people want data to be quicker to find and easier to understand. 

One of the ways in which we’re trying to help is through the creation of our organisation-wide Data Glossary. This is a hybrid of a traditional business glossary, data dictionary, and data catalogue. It has a number of different components and aims to be a central hub for answering data-related questions from across the organisation. We’ve created this in Google Sheets for now, as it was quick to create, change, and share widely across the organisation.

Here’s an outline of the sections we have so far and their purpose:

Terms

This section is what you’d expect to see from a business glossary but also includes some high level technical information. It lists our main data entities and describes what they mean in the context of the organisation, their synonyms, which systems they can be found in, and the suggested way to uniquely identify them.

Its main purpose is to encourage consistent terminology and language, as well as promoting data standards. It’s also been an important collaboration tool, helping our team discuss with people from across the organisation what their understanding of these terms is.

Data reports 

We created this section as a way to make it easier for people to find the reports they’re looking for. We’ve listed our main products, explaining what they can help with and which team can help answer any questions. It’s particularly useful for new members of staff as it gives them one place to look when trying to understand what data is widely used. It also provides a single place for anyone trying to find a report that has been mentioned that they didn’t know about.

Data sets 

This part is the start of our journey towards building a shared metadata framework. It’s a collection of our data assets, describing what they are, where they can be found, and who has responsibility for maintaining them, with links to supporting sources such as the data asset information document we’ve recently launched (more to come on this in a later post!).

We think this section has the biggest potential for making data more accessible across the organisation, and it has opportunities for automation with our data platform.

The main purpose is for people to be able to search for the dataset they’re looking for, reducing the time spent searching for a link or the right team to speak to.

We also have sections with training videos, team-specific supporting documentation and links to external data sources. 

We’re continually revising these to improve their usefulness. 

What’s next?

We need to measure the current usefulness of the Data Glossary. For this we will start research with users on their experience using it. The data collected from this will influence any future changes we make. 

One proposed change we plan to make is to link out to data models for our main entities. The intent here is to raise the profile of these data models and provide a reference point when recreating existing data concepts in a new location.

We’ll also be monitoring the sustainability of keeping this data in Google Sheets, as well as exploring opportunities to automate the capturing and updating of metadata. 

Technology growth in this space has led to a number of emerging platforms offering great solutions for metadata management and we’ll explore whether we can make use of any of these in the future.

Though it’s early days for our Data Glossary, we’re seeing a consistent level of people using it and we think it’s making our data more accessible.


Learning from our regular open sessions

Dan (Head of Data Science)

Since April Josh and I have been running regular sessions over Zoom about data architecture and data strategy. These sessions are open to all our colleagues at National Citizens Advice. After a couple of months we reflected on how we could improve them. We thought the main issue was that they weren’t inclusive or diverse enough.

We had been running the session at the same time and day each week. We changed this so that the schedule is a 5 week cycle of Tuesday / Wednesday / Thursday / Friday on subsequent weeks and then a week off. The sessions are all at different times of day to accommodate different working patterns, and they're not all at lunch time. 

We also wanted to bring in external speakers so that it wasn’t just Josh and me doing the talking. We had 8 different speakers from other organisations on a broad and fascinating range of data topics. Several of these sessions resulted in follow-up calls to get into more detail on topics, like a session on data governance with Jenny.

We recognised that not everybody is comfortable asking questions on a call, so we allow for that when it’s a session with Q+A but we’ve also run more creative workshop sessions using Google Jamboard. We put more effort into communications and publicity too. Ahead of every session we email the invite list to tell them what we’re going to be covering so people can choose if they’re interested, and we include the recording and any materials from the previous session too.

Our open sessions are giving us a regular opportunity to showcase the wide range of data work that we do at Citizens Advice. It also provides an opportunity to get wider input - the sessions have been really helpful to me in developing a data strategy for example. We’ll keep reflecting on how they’re going and making improvements.

We want the barriers to entry for working with data to be as low as possible and work hard as a team to achieve that. As one of our guest speakers, Adam, put it: we are normalising talking about data.


Interesting links

“Why data scientists shouldn’t need to know Kubernetes” via Jon

Citizens Advice Data Science team month-notes #2 (August 2021)

10 September 2021

We wanted to start blogging about the work we do. We intend to write a post each month with contributions from various members of the team. Here’s our previous post from July.

As a team we believe in continuous improvement, building capability, and focusing on meeting data users’ needs. The opportunities we have for improving the way we work with data at Citizens Advice aren’t unique to our organisation. It’d be great if the practical approaches we take are useful to others.

Building better data products that meet users’ needs

Hamza (Senior Data Analyst)

The Citizens Advice service is made up of Citizens Advice - the national charity - and a network of over 260 local Citizens Advice members. Our network members are all independent charities delivering services to help people across England and Wales. At the national charity we have a responsibility to support each local charity to deliver their services in the best way possible.

There is a leadership self-assessment every year. This is where we ask each of the 260+ charities to rate themselves across organisational areas such as Governance, People Management, Equality, and Financial Management. At the same time, performance assessors from the national charity carry out the same assessment on each local charity. Where it’s evident that there are opportunities for improvement, the national charity works closely with the local charity and suggests specific courses of action.

The leadership self-assessment is vital because it embeds risk-based thinking, helping local charities to assure that their organisation is well run. It also accredits local charities to external quality standards that are recognised by funders.

In the national organisation we have a dedicated team that looks after all activities related to the leadership self-assessment. This team also includes a small number of performance assessors whose role is to assess each of our network members at least once a year. 

To help with monitoring these assessments, the team was using a tracker document in Google Sheets. The aim behind this tracker was to help plan resources, track how many assessments have been completed and for which charities, and analyse results from the assessments. 

As this tracker was also being used by several other teams, over time it became congested with multiple tabs showing different pieces of information in different formats. In some cases there were even multiple tabs showing the same information in different formats. It became evident that the tracker was losing focus: there was just too much information in too many formats, not to mention broken formulas in some places too! It sounded like a job for the Data Science team...

We collaborated extensively with the leadership self-assessment team and met with them to understand the problems they were facing and their data needs. Our task was to create a new, more consolidated and effective tracker tool that would enable its users to better plan resources and gain richer insights into the assessments. 

We discussed things such as:

  • which is the best platform to use (Google Sheets or Tableau)?
  • which data is needed?
  • which format should the data be presented in?
  • how should the data be updated?
  • who should have edit access to the data?

Having these initial discussions really helped us to build the right tool for their needs. For example, the stakeholders had a stronger preference for Google Sheets, so we built the new tool in Google Sheets and chose not to use Tableau.

We adopted an agile approach to building the tool. We didn’t just go away, build a complete tool in one go and then think ‘job done’. Instead, we built it in steps: we built version 1, presented it to the team and received feedback; then built version 2, presented it, received feedback, and so forth. With each version we kept refining the tool based on the feedback, making it easier to use.

We focused more on building clear visualisations as opposed to just building tables of data. We also focused on making the tool interactive, for example building features that allow users to extract specific data for their own needs. We try to encourage this kind of ‘self-service’ as much as possible.

In the end, after several iterations and meetings with the stakeholders, we had built a tracker tool in Google Sheets that could better serve the needs of the leadership self-assessment team. The tool has been created in such a way that it involves little to no maintenance effort, as all the data is updated automatically. The tracker tool is linked to a Google Form that the performance assessors complete to record scores from each of their assessments. As each new response comes in, all the visuals and data summaries in the tracker update automatically (one great benefit of using Google Forms: the form data can be analysed automatically in any way you want).

The new tracker tool we created has received great feedback from the end users. They feel they have something more focused, and that it helps them to ask the questions of the data that they need answered.


Connecting data to give us new insights

Sarah (Channel Reporting Manager)

Josh (Data Architect)

Jon (Data Analyst)

How we identified the opportunity

There have been a few new hires recently in the Data Science team which has helped increase our bandwidth and allowed us to take on additional work, like solving long-standing problems. The formation of the Channel Reporting team has resulted in improved reporting, and we found that in order to take it to the next level and better meet the needs of users across the organisation we needed a greater level of detail in the data about advisors. 

Alongside this, the team also has data architecture skills now, which resulted in a detailed system context map being drawn for how data moves between products and teams at Citizens Advice. We call it ‘the data landscape’. This map allows us to home in on improvement opportunities; for example, we discovered a reliance on spreadsheets for managing advisor data.

What we discovered

We’ve historically had decentralised management of advisor data. This has worked fine for reporting and analysis, but we’re always looking to improve. We didn’t have a unique way to identify the same advisors across various systems, which created inconsistencies in how the data is structured. Having a common identifier would mean that our users could get a better and more joined up view of the performance of their service. It would help show activity data across channels, rather than viewing each channel in relative isolation.

What we did

We needed to change how we viewed the relationship between an advisor and the systems they use. For this we created a conceptual data model for advisors, which helped show the commonality across systems and what we could use as a common identifier. We found a number of different systems in which advisor profiles existed. The one where most of our advisors could be found was our user authentication product. Advisors use this product in order to access various systems, and it provides the best opportunity to have a common identifier for an advisor.

The majority of our channel systems are connected to the user authentication product. This allowed us to quickly map the common ID to the advisors in those systems. In a system that isn’t currently connected, we used our data skills to map identifiers for around 80% of advisors by matching Google Sheets via various VLOOKUPs.
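
To make that step concrete, here’s a minimal sketch of the equivalent join in pandas, assuming the sheets are exported as CSVs. The file and column names are illustrative, not our real data.

```python
import pandas as pd

# Illustrative exports: one sheet of local advisor records, one of central
# user authentication IDs.
local = pd.read_csv("local_advisors.csv")      # columns: login, office, ...
central = pd.read_csv("central_auth_ids.csv")  # columns: login, auth_id

# The equivalent of the VLOOKUPs: a left join on the login name. Advisors
# with no exact match keep a missing auth_id and fall through to the fuzzy
# matching step described below.
mapped = local.merge(central, on="login", how="left")
unmatched = mapped[mapped["auth_id"].isna()]
print(f"{len(unmatched)} advisors still need a closer look")
```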

This left us with just 1,100 advisors in that system that didn’t have a direct login or name match to an ID. We knew that many of these advisors probably did have a central user authentication ID - but weren’t matching due to inconsistencies in naming conventions and data entry errors - for example an advisor with local login “London Jon L” might have an existing ID, but registered against “SELondon Jon L”, or “London JJon L”, or “London Jonathan L”. [1]

Rather than match all of these manually (which would have taken us about a minute per advisor, based on the advisors we did match by hand), we wrote a quick script to help us look for these loose matches that may have been caused by human error. Our script used local advisor metadata such as office location to narrow its search criteria to a few hundred possible IDs out of thousands, and then identified close matches using a fuzzy matching algorithm. 

Fuzzy matching is a simple technique for matching text that’s approximately but not quite the same. This is super useful for identifying close matches caused by typos and nickname variations, which is a pretty common problem faced by volunteering charities! We used a standard fuzzy matching package for Python, but the algorithm can be found implemented in most languages, and is even implemented as a standard formula in Excel.  For example, with the package we used, the typo “Josh Tedgett” matched up against “Josh Tredgett” with a similarity score of 0.97, and less immediately obvious mismatches caused by nicknames like “Becky Harlow” and “Rebecca Harlow” still returned a high similarity score of 0.77. [2]
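
As an illustration, here’s a minimal sketch of this kind of loose matching, using Python’s built-in difflib standing in for the dedicated fuzzy matching package we used (scores will differ between libraries). The names, fields, and threshold are all made up.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Score two names between 0 (no match) and 1 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def candidate_matches(local_advisor: dict, central_records: list[dict],
                      threshold: float = 0.75) -> list[tuple]:
    # Narrow the search space with metadata (e.g. office location)
    # before scoring, as described above.
    same_office = [c for c in central_records
                   if c["office"] == local_advisor["office"]]
    scored = [(c["id"], c["name"], similarity(local_advisor["name"], c["name"]))
              for c in same_office]
    # Keep the loose matches above the threshold, best first, for a human
    # to confirm or discard.
    return sorted((s for s in scored if s[2] >= threshold),
                  key=lambda s: -s[2])

# Hypothetical records, echoing the made up examples above.
central = [{"id": 1, "name": "Josh Tredgett", "office": "London"},
           {"id": 2, "name": "Rebecca Harlow", "office": "London"}]
print(candidate_matches({"name": "Josh Tedgett", "office": "London"}, central))
```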

While this method returned a fair number of false positives, it was much faster to work through these loose matches, confirming or discarding the script’s suggestions, than to match all 1,100 advisors by hand. After writing the script (which took about an hour) and running it, it took us about an hour to match 600 of the remaining 1,100 advisors and discard the rest as having no likely matches. At the original rate of about a minute per advisor, it would have taken us about 18 hours to work through the same list manually.

Once we’d completed the matching process, we still had the challenge of maintaining this data set going forward. How were we going to make sure new advisors had the ID assigned to them? This is where colleagues in Technology stepped in. They proposed an automated solution via our internal ticketing system. This will take the effort of maintaining the data away from our Operations colleagues. Instead, an orchestration tool will pick up new user requests and assign the ID in the consolidated dataset. It will also allow us to monitor the quality of the process, as it can pick up any errors that have occurred.

How it will work and what it helps with

Once this work is complete, we will be able to drill down to a greater level of detail in our data products. We recently set up a new report that analyses performance for local Citizens Advice charities across all channels (phone, chat, and email). This report gives a more joined up view than before, and with our improved data the next iteration will give our network members deeper insight.

Things to think about in the future

We think the new process will provide better insight for our users and more robust data management. There are still some aspects to work on. Part of the reason why we were able to progress on this work is that we weren’t aiming for perfection. We focused on making something better now and not trying to create anything that was difficult to upgrade or replace in the future.


Thanks for reading. Feel free to get in touch with any of us on Twitter or LinkedIn.


Footnotes

[1] This is not a real example

[2] This is not a real example

Citizens Advice Data Science team month-notes #1 (July 2021)

5 August 2021

We wanted to start blogging about the work we do. We intend to write a post each month with contributions from various members of the team. 

As a team we believe in continuous improvement, building capability, and focusing on meeting data users’ needs. The opportunities we have for improving the way we work with data at Citizens Advice aren’t unique to our organisation. It’d be great if the practical approaches we take are useful to others.

Enabling self-service for users

Sarah (Channel Reporting Manager)

The Citizens Advice service in England and Wales is made up of the National organisation and over 260 independent local Citizens Advice charities. Our work as the National charity involves helping to coordinate the efforts of the local charities so they can provide national level services as well as pooling expertise, tools, and insight. 

One example of this collaboration can be seen through our efforts to help manage Adviceline, our national phone service for England. The Data Science team currently provides an Adviceline Tableau dashboard to help local charities understand call volume and other performance and demand data. However, over 230 local Citizens Advice provide the service and they have a wide variety of reporting requirements. It’s impractical for the Channel Reporting team (currently 2 people) to provide over 230 unique daily reports. The Adviceline dashboard we provide is insufficient to meet the individual needs of local charities, even if we do our best to respond to user requests. 

So we decided to enable Tableau web editing, allowing all users to edit dashboards as they see fit. This will help to democratise data, putting additional power in local offices’ hands, whilst freeing up Data Science team capacity and allowing us to provide more advanced models and better quality data and reporting for other channels like web chat and email. 

Two potential barriers to successful implementation were the impact on report load times due to server capacity, and making the new features accessible to staff in local charities. To tackle the first problem we sent surveys to local charities to help estimate demand for the feature and the maximum number of concurrent Tableau users. We then used this information to simulate the potential impact on server load times. To tackle data skills, we will work with our Facilities team to deliver training via live sessions. These sessions will be recorded and later published on our internal learning platform so participants can revisit the information.
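
As a rough illustration of that kind of estimate, here’s a toy Monte Carlo sketch of peak concurrent Tableau users. Every number in it is an assumption for the sake of the example, not our real survey data.

```python
import random

def simulate_peak_concurrent(n_offices: int = 230, p_daily_use: float = 0.4,
                             session_minutes: int = 15,
                             window_minutes: int = 120,
                             trials: int = 1000) -> float:
    """Monte Carlo estimate of peak concurrent users in a busy window."""
    peaks = []
    for _ in range(trials):
        # Each office that edits today starts one session at a random
        # minute within the busy window.
        starts = [random.uniform(0, window_minutes)
                  for _ in range(n_offices) if random.random() < p_daily_use]
        # Peak concurrency: the most sessions overlapping any one minute.
        peak = max(sum(s <= t < s + session_minutes for s in starts)
                   for t in range(window_minutes))
        peaks.append(peak)
    return sum(peaks) / trials

print(f"Expected peak concurrent users: {simulate_peak_concurrent():.0f}")
```

An estimate like this can then be set against what the server handles comfortably in testing.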

Tableau web editing gives us an opportunity to facilitate the local charities’ data needs like never before. There’s also the opportunity to build skills in the local Citizens Advice network so they can use our trusted data to provide high quality reporting that meets their individual needs. We’ll continue to monitor the implementation to measure impact and look for further ways to improve reporting across the network. 

Automating away the busywork: reducing a 1-3 hour daily process to 10 minutes of automation

Jon (Data Analyst)

I'm part of the Channel Reporting Team. Among other things, we provide data and reporting for remote help services across phone, webchat and email. By making this information readily available, the 260+ local offices in our network can access the data required for deploying their staff and volunteers as effectively as possible.

Sarah wrote about our Adviceline reporting, and we also provide reporting for the Welsh Advicelink service. There's always been a pressing need for this kind of daily reporting - especially with the pandemic causing a historic rise in people seeking help through remote services - so we can understand how well our organisation is meeting clients’ needs. 

We want to be as efficient as possible in this work, and make sure that our data is trusted. We recently made some improvements to how we provide this data.

Our reporting workflow was previously very complicated due to a number of factors:

  • Access to even basic data from our telephony system requires manual effort.
  • Operational adjustments were (and still are) made every day to try and adapt to changing conditions, and our reporting needed to reflect that. The way in which the team had to deliver their reporting changed multiple times during the pandemic, making it difficult to ratify and streamline a single process for daily reporting.
  • With so many different teams and offices having input on the reporting, our reports needed to pull in information maintained by many different stakeholders across several disparate sources. These sources include various SQL servers, a bunch of different Google Sheets, the aforementioned manual effort for data access, and daily call record CSVs sent to an email account.

As a result of these constantly shifting conditions and ad hoc adjustments, we ended up with a highly manual process that was difficult to run, and error prone if you didn’t have the requisite skills and knowledge.

When everything went right, the process would take about 45 minutes to complete every day. If there were complications the process would take anywhere from 1 to 3 hours to execute.

With almost a hundred different manual steps along the way the process was error prone. If you made a mistake at any point during the process, you had to catch it immediately or face restarting from scratch. If you didn’t catch an error it could take an entire day to roll back.

For a process producing a daily report with a noon deadline, this was clearly unacceptable. We knew we had to make it easier on ourselves, and reduce the possibility of errors along the way.

Using some simple Python scripting and Google Apps Script we managed to distil the sequence into a largely automated process that takes ten minutes a day. Here are some of the things we did to achieve this (a condensed sketch follows the list):

  • We built a Selenium browser automation script that crawled the telephone records we needed, and dumped them straight into our Python scripts for ingest.
  • We used Google Apps Script to automate sourcing the data we needed from Google Sheets, and pandas to perform the necessary lookups and data manipulation.
  • We used pyodbc to give our Python scripts access to the SQL servers they needed to both draw data from and update.
  • We wrote a series of unit tests ensuring that the processes executed by the script all pass basic common-sense checks and don’t contradict our knowledge about our telephone systems’ operational setup. In addition to these unit tests, the script still asks for a human to sign off on the changes it’s about to make. While this means the script isn’t fully automated, it means that we’re aware of any recent changes made by the process, and can catch stuff that simple common-sense logic tests can’t.
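
Here’s a condensed sketch of how pieces like these can fit together. It isn’t our production script: the connection string, file, queries, and checks are all placeholders, and the browser automation step is reduced to reading a local CSV.

```python
import pandas as pd
import pyodbc

def load_call_records(csv_path: str) -> pd.DataFrame:
    # In the real process this file arrives via email / browser automation;
    # here we just read it from disk.
    return pd.read_csv(csv_path, parse_dates=["call_start"])

def sanity_checks(df: pd.DataFrame) -> None:
    # Unit-test style, common-sense checks before anything is written.
    assert not df.empty, "no call records for today"
    assert df["call_start"].dt.date.nunique() == 1, "mixed reporting dates"
    assert (df["duration_seconds"] >= 0).all(), "negative call durations"

def update_reporting_db(df: pd.DataFrame, conn_str: str) -> None:
    summary = df.groupby("office").agg(calls=("call_id", "count"))
    # The human sign-off described above: show the changes and ask.
    print(summary)
    if input("Write these figures to the reporting DB? [y/N] ").lower() != "y":
        raise SystemExit("aborted by user")
    with pyodbc.connect(conn_str) as conn:
        cursor = conn.cursor()
        for office, row in summary.iterrows():
            cursor.execute(
                "INSERT INTO daily_calls (office, calls) VALUES (?, ?)",
                office, int(row["calls"]))
        conn.commit()

if __name__ == "__main__":
    records = load_call_records("todays_call_records.csv")
    sanity_checks(records)
    update_reporting_db(records, "DSN=reporting_db")
```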

Pulling these changes into a Python prototype took about a day, but iterating on the script and getting it to a production-ready state took the better part of a week. The unit tests themselves were written iteratively over several weeks, and the scripts still get tweaks whenever we see room for improvement. The end result is that we no longer have to dread wrestling with our reporting processes in the morning, and we save hours of development time every week which we can spend on higher value work.

User feedback is invaluable though not always easy to get - data survey

Josh (Data Architect)

Throughout this month the data team has been running a survey to understand what it’s like for staff to use data at Citizens Advice. We used Google Forms. 

The survey is a way to gain insight into how accessible our data is, the level of data capability across the organisation, and which data products are currently helping people with decision making. It can be a challenge to get people to participate in surveys. Thankfully we had support from executives and our internal communications team which gave us the response rate we aimed for.

We’ve now been able to calculate baseline metrics that give us an indication of how easy our data is to find, how confident people feel using data, and how often data is used in decision making. Of these three things, finding data seems to be the biggest pain point. Though this isn’t unique to Citizens Advice it is a challenge for us to solve. The survey also has a lot of insightful qualitative data that we’ll be digging into next month and following up with more focused research on specific areas.

Overall it’s been a great way to collect thoughts about our data from across the organisation. It’s also providing an indication of the areas we should be focusing on, which will make the experience of using data better for all. We will repeat the survey quarterly so that we can measure the impact of the improvements we make. To encourage the same level of participation as in this round of the survey we’ll be demonstrating how we’ve acted on the feedback to make a meaningful difference. We intend to extend the survey to the network of local Citizens Advice as well.

Building the team

Dan (Head of Data Science)

I love recruitment. Having the opportunity to build and lead a team is a privilege. This month we’ve had a new starter, Rahul, who is a Web Analyst working in the multidisciplinary Content Platform team. This is a model that I want to establish when it’s the best fit for the work. Having a dedicated data specialist in a product team is much better for meeting that product’s and team’s data needs than being at arm’s length.

Rahul is being inducted remotely and I think that’s more challenging than being in person. On the other hand, I definitely prefer interviewing remotely. Maybe there’s something about it being a leveller, balancing out the dynamic and making it less intimidating than having to visit somebody’s office. Maybe a child will get locked out of somebody’s house in their pyjamas and it’ll be alright to rescue them midway through the call. 

Did I say somebody’s house? Ok I meant my house.

We had a successful interview campaign for a new Data Science Lead this month. I really enjoyed interviewing with Josh, and with Maz (Director of Content and Expert Advice). I have definitely learned from others  in the past year and improved my interviewing practice. There have been small things, like introducing yourself with your pronouns, or pasting the text of each question into the video chat function. There have been larger things, like putting the effort into producing a scoring guide for each question so the panel is working from a shared, explicit understanding. Writing the scoring guide takes me considerably longer than the questions. Finally there are things that mean you’re being explicit about your organisation’s values with candidates. I’ve included a question about equity, diversity and inclusion in interviews for a while but I’ve never been completely happy with how it was phrased. For this campaign Maz introduced a new formulation that was a big improvement.


Thanks for reading. Feel free to get in touch with any of us on Twitter or LinkedIn.

How can I make sure it's inclusive?

18 May 2021

The title of this post is a question for me to answer and nobody else. This isn't a request for help or advice. I haven't been writing on the internet recently but something came up last week and I thought I could do some of that 'working out loud' like I used to. 

The session

About six weeks ago I started a weekly 'open session' about data architecture and data strategy with my colleague Josh (data architect). The idea was to start small and see if a regular 30 minute video call with an open invite would be a useful vehicle for sharing work and progress, and testing our work with colleagues. Josh and I were alternating each week on who would take the lead, so data architecture one week and data strategy the next. 

We recorded the sessions so that people who couldn't attend could watch them back when convenient [1]. For the first session I invited people who had expressed an interest in the 'data' Slack channel. Over the next few weeks the invite list started to grow organically. We had positive feedback and the quality of conversation and contributions was really good. I hadn't advertised the forum widely but was planning to because it felt like we had something credible and constructive going.

At Citizens Advice we care deeply about inclusive working practices and diverse teams. We need to make sure we represent the population and communities that we serve. Our data work must be inclusive, and I'm particularly motivated to embed this into the development of our data strategy.

The problem

At the session last week my colleague Phil made a closing remark along the lines of "could we get more people in the group who aren't white men?".

And Phil was right of course. It had occurred to me already, but I hadn't called it out. I should have called it out myself by then, 6 weeks in to the work.

My boss had watched one of the videos and thought it was good, but when I mentioned Phil's corrective he said something along the lines of "yes it did look like six white men talking about data."

There were more than six people on the call (we were getting 15-20 attendees), but yes the white men were doing the talking and yes most of the people in the overall group were white men.

This isn't to diminish the thought and enthusiasm that my colleagues have put into this forum so far. It's made the work better, no question. But there is a diversity issue, and it's mine to resolve.

The working out loud

I need to work out how we got here. I am responsible for this, after all, and it's early enough to correct it. 

We have another regular forum where we talk about our data trends. That grew organically as well. I don't think there's an apparent lack of diversity on that call. Also I ran an open session on data strategy before. I don't recall a lack of diversity in that group either. And when I say 'lack of diversity' it doesn't mean it's good enough, just that it's not "six white men talking about data" bad.

Maybe there's a barrier to entry for the topics themselves. Perhaps the language is wrong. Are the words 'architecture' and 'strategy' creating a barrier to entry? I haven't found either of these areas of data work to be the sole preserve of white men though, either in my work experience or among the people I look up to in the sector. Pretty much all of the people I look up to aren't white men.

So no, I don't think the topics themselves are excluding people. But we could work to make them more accessible.

I suspect the forum itself isn't quite right, with the format being based on conversation and having the confidence to ask questions. There's also my role in this forum. I know the subject matter and it's a situation where I'm particularly confident. If a question is asked of me I will come up with an answer for it then. And it might not be a particularly good answer but it will probably sound like it is. 

I think when it's my turn to lead the session I put myself in the centre of it too much, rather than the substance of the work. I don't think Josh does that, to his credit.

So I think it would be good to try other ways to contribute: questions in advance, questions afterwards, splitting into smaller groups for discussion, not defaulting to answering questions myself, and listening more.

Then there's something about growing the audience. I should have acted sooner. So when I do start to publicise it more widely I need to watch how it develops. I could ask everybody on the invite list to invite one colleague who isn't a white man, for example.

Finally, while data architecture and data strategy are mine and Josh's responsibility to lead in our organisation, I will encourage other people to present and lead the discussion. This could include people from outside of the organisation.

That was pretty useful. Thanks for reading.

Footnotes

[1] If you've met me in real life you know that I can speak really slowly so watching a video where I'm talking at 1.5x speed works pretty well.

What I learned from building a data product in a crisis

19 May 2020

I work at Citizens Advice. The Covid-19 pandemic has had a dramatic impact across our services and seen an incredible response from staff in the organisation, for example

  • Unprecedented demand for our website content
  • Creation of new, trusted expert advice content at speed
  • Stopping the provision of advice in person at the 250+ independent local Citizens Advice locations due to lockdown measures
  • A resulting shift to providing advice through our other channels, such as telephone
  • A pronounced change in the patterns of issues that our clients are coming to us with

I lead the Data team. I work closely with my colleague Tom, who leads the Impact team. Broadly speaking, my team is responsible for making data available, and Tom's team are responsible for asking questions of it.

On 19 March Tom and I were asked to draw together a wide variety of operational data into a single place, to help management and leadership navigate the crisis. It would include activity and demand data for various channels, and data on client demographics and breakdown by issue. It would also include a summary for a quick, headline view.

Citizens Advice colleagues have spoken about our data and what it is telling us on social media and in the news in the past couple of months. In this post I wanted to reflect on the process and experience of "making the thing", rather than what's in that thing or what it means.

It's been a really rewarding experience and I have learned lessons from it that I thought would be worth sharing.

Get something out there

It was crisis time, and we received an understandably loose brief as a result. We brought together a group from both of our teams, and came up with a first iteration in 4 days.

We made a spreadsheet. Spreadsheets are great.

It is a spreadsheet that's intended to be read by humans, rather than a spreadsheet that's a data source. We collectively agreed on making a spreadsheet, having been given a steer not to build something in the proprietary data analysis tool that's widely used at Citizens Advice. Initially we thought it could be a slide deck, and I had a strong view it should be in Excel, but the consensus was to go with Google Sheets. G-Suite is what we use at Citizens Advice, and Sheets has the advantage of being easy to share and less of a daily overhead to maintain than a slide deck.

Ideally we would have had a better understanding of user needs, and some clearer questions to be asked of the data. Regardless, our first version was well received, and put us in a position to improve it regularly. Me aside, the team we put together has a really good understanding of how Citizens Advice works, and I think this helped with our initial and continued success.

Ask for expert input

We have a wide range of expertise at Citizens Advice, including a really strong digital team in the form of Customer Journey. I was able to ask for input from a designer and content expert before we circulated the first version of the report. This really improved the product, helping it to be clearer and easier to navigate. Later, when we were trying to understand how the report was being used, I had input from a user researcher on survey questions.

Even if you aren't working in a 'model' multidisciplinary team, it doesn't mean you shouldn't take a multidisciplinary approach. And if you don't have this kind of expertise to hand, just asking somebody for an objective view and opening yourself up to constructive criticism is always good.

Work to understand your users

Again, it wasn't ideal to get into this after the fact. But it's essential to try, and I thought what we ended up doing was neat and pragmatic.

We were able to get a log of all the staff who had accessed the report, and when. From this, we were able to build a picture of when the report was being used - early in the morning, justifying the demand for an early update of the data. It also gave us a list of people to survey.
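
The analysis can be that simple. Here's a minimal sketch in pandas, assuming the log exports as a CSV with one row per view; the file and column names are assumptions for the example.

```python
import pandas as pd

# Hypothetical export of the access log: one row per view.
log = pd.read_csv("report_access_log.csv", parse_dates=["accessed_at"])

# When is the report being read? Count views by hour of day.
views_by_hour = log["accessed_at"].dt.hour.value_counts().sort_index()
print(views_by_hour)

# Who should we survey? One row per distinct user.
users = log["user"].drop_duplicates()
print(f"{len(users)} staff accessed the report")
```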

I wrote a short survey in Google Forms and around 40% of our users responded. I was most interested in whether the report had resulted in people taking decisions, or doing things differently. For me, this is the benchmark for all reporting - if you're not taking decisions as a result, why go to the expense of doing it?

So seeing a pie chart [1] showing that over 70% of people had taken decisions or actions as a result of this work was really gratifying.

The follow-up question asked for detail about what those decisions were. This gave us an understanding of the breadth of decisions, the variety of our users, and the relevance of the report across multiple parts of our organisation.

The next thing I was most interested in was whether users could understand what was in the report. I think the barrier to entry for data products needs to be as low as possible, and that data focused teams can tend to take understanding for granted. A second pie chart indicated that the report was pretty legible, but that further work could be done.

Being able to do these two humble tests meant a great deal to me.

    Keep iterating

    The report has been constantly changed over the past couple of months, in response to new requests, changing circumstances, and a team attitude of continuous improvement. It's been good to be part of a group where it's understood from the outset that the thing will never be complete.

    I think this has been supported by the team practice, which is curious and conversational. Speaking of which...

    Have regular discussion

    I think the team practice has been the most valuable thing to emerge from this work. We have a weekly discussion about what the data in the report is telling us, facilitated by Tom. This encourages ownership and recognises expertise, because the people closest to the data are driving the conversation. Some really valuable lines of thought and investigation have come out of these meetings. We thought it was so good that we started inviting guests, and they've found it valuable too [2].

    We separated out the mechanics of developing the product from the discussion of the data, with a fortnightly meeting led by me. That's worked well, I think in particular because the team have a high degree of autonomy, with individuals trusted to work on their own parts of the report with some light oversight from the most experienced team members.

    Take things out and do less

    The first stage of the crisis called for daily updates. This is unsustainable over time, and developing the report helped us to understand the various states of the data we have. Some data is easy to automate, whereas some requires a large amount of manual intervention and also changes shape regularly, making it labour-intensive to report on. This has been a helpful secondary outcome of the work, because it can help inform where we put effort to improve our underlying systems and practices.

    Not everything we've done was useful, or used. So we've taken things out. In future, I will work to understand what's being used in a more methodical way. I missed an opportunity in the first survey to ask "which tabs are you using?" with a pick list of answers. We also tried to track usage using Google Analytics, but it was unsatisfactory.

    Due to the iterative nature of the work and the regular discussion of patterns, it also became clear to the team that the time periods for significant changes in the data were longer than daily. If we hadn't kept developing and discussing, we might have been saddled with a daily reporting burden for longer. This also gives us a sense of the various cadences of the different operational data we have at Citizens Advice. Not everything behaves like the instantaneous spike we'd expect to see on our website after a government announcement, for example. Our service is broad and varied, and to my mind that variety helps us to meet the complex needs of our clients.

    Help people to understand the questions they need to ask

    I think one of the most important areas of work for data maturity in public service is to help people to formulate the questions they want to ask of the data. It helps to focus both the discussion and the build of any data product. Generally speaking, "give me all the data" isn't the best starting point, and there's no intrinsic good in having large amounts of data and hoping that some incredible truth will emerge from it.

    In my experience over the past couple of months these questions have increasingly started coming from colleagues, which is great to see. A polite challenge of "what questions do you want to ask?" has been useful.

    You don't need to serve everybody with one thing

    I think we've resisted putting ever more measures in our report just because it's the first time so much of our data has been together in one place. The survey showed us that our users have different needs, and we recognised these might be better met by other products in future.

    I said earlier that the thing will never be complete, but that isn't the same as it never being finished - 'finished' as in set aside in order to move on to the next thing, taking what you learned with you.


    Footnotes

    [1] Better charts are available
    [2] This is what they tell us at least
    Dan Barrett
    tag:danbarrett.posthaven.com,2013:Post/1418024 2019-06-14T19:24:23Z 2019-07-05T15:00:48Z Data: 20 years of hurt
    I went to the Institute for Government's 'data bites' event a couple of weeks ago - an evening of four short talks from folks working in public service with data. It was great, thank you to Gavin and the team.

    I was particularly struck by the fourth talk, by Yvonne Gallagher from the National Audit Office (NAO). Yvonne was talking about the challenges of using data in government, in advance of an NAO report coming out later this month. You can watch the talk for yourself. It was eight minutes long with eight minutes for questions.

    My main impression from Yvonne's talk was that 'data' has been a problem across government for over twenty years. I felt an overriding sensation of 'enough of this it's really time to sort it out now folks'. 

    I heard Yvonne say that there had been multiple strategies launched over the last two decades to fix the problems and - I'm paraphrasing - that the time for launching strategies to fix the problem was over; it was time to actually do something. This made me uncomfortable, not because I don't do things but rather because I've been working on a data strategy [1].

    I also heard Yvonne say that thirty years ago, things had been better. There was an established practice of data modelling, data management, cataloguing and the like. I don't know if this is true, because I was busy watching 'Rude Dog and the Dweebs' and so on. Let's take it as truth. Yvonne was listing practices used back then that facilitated things working, and working together, through collaborative effort, documentation and widespread understanding. These practices are deeply uncool and unfashionable at the moment [2]. Forgotten, even.

    I am increasingly convinced that it's time to be deeply uncool and unfashionable. To be boring [3].

    Being boring

    I am trying to work through an idea, or collection of ideas, where 'data' is a distinct practice that sits alongside 'digital' and 'technology', complementing them both. I've written about it already. Nobody has said "oh yeah Dan you are right take my money" yet, so clearly it needs further work.

    I believe we [4] have problems with the language we use when we talk about 'data', and that it's too broad an umbrella to be meaningful. Leigh wrote a post about this (it's great). Leigh also proposed some data archetypes recently. Using Leigh's archetypes as a starting point, I think that the problems we face across government (or across public service, which I prefer) mostly come from:
    • The Register
    • The Database
    • The Description

    In my opinion, these are the types of data that facilitate the absolute basics. If they aren't done well then impenetrable silos result.

    Now, I believe that 'data' has this unwarranted mystique about it in general that people mistake for computer magic. At the root of the types of data I've listed above is a broad, human understanding and a set of human practices - like ownership for example. Not magic, just taking responsibility.

    So, I suggest that over the past 20 years we outsourced too much of our collective responsibility to people who do (or claim to be able to do) computer magic. It's easy to point a finger at a Big IT company who've failed to deliver an outcome at great public expense, so let's do that.

    *points finger at Big IT company who've failed to deliver an outcome at great public expense*

    BUT WAIT I'm actually advocating for retaining a greater degree of ownership of the fundamentals of our organisations and how we describe them - so that a supplier doesn't imperfectly represent these things on our behalf when they don't have the expertise or long-term incentive to do so, and doesn't have to do the heavy lifting of working out what they are in the first place, probably from scratch.

    The same applies when you have an in-house team, or are working closely with smaller agency partners. Hopefully in a multidisciplinary way. Who does the data modelling in a multidisciplinary team? Don't leave it to computer magic - it needs to have an explicit representation somewhere beyond a database only one person came up with [5]. How do the descriptions get done, beyond the content? Do you have a librarian [6]?

    Librarians, yeah?

    People joke about tech bros [7] coming up with ideas for things that already exist, like trains, and libraries. I think the libraries one often misses the point, because a library isn't just a quiet place where you can work that is free of charge (LOL). 

    A library is an open, dynamic repository of knowledge that's been organised in a way that is collectively understood, with expert help on hand to aid discovery, and no technical barrier to entry.

    The demise of libraries in our society is profoundly sad. And sorry but (as much as I love it) the internet / web hasn't satisfactorily replaced my definition above. There is a lack of balance and access for all, where 'access' means being able to contribute rather than just consume.

    Make the thing

    Everybody being able to contribute is central to my line of thought. I think that's where this 'data' practice will come from in a multidisciplinary delivery team. The opposite end of the pipe from developing a collective understanding of needs: developing and owning a collective understanding of your essential data. 

    Will there be resistance? Probably. I could imagine people saying "hey man we're agile we don't do any work upfront that's waterfall UR so lame" or similar, but I don't think there's anything wrong with being prepared before diving in on the build. Design disciplines advocate for developing a rich understanding. The same should apply for data.

    Beyond 'the thing', I do believe we gave away too much responsibility, when really we should have retained and maintained our corporate-level data assets, which are more permanent than any individual system.

    But, as Steve and Adam made me realise, you need to sweat these assets (so to speak) by putting them to use and managing them like any other product so that they are as good as they can possibly be - so that people want to use them rather than inventing their own.

    Pull up your brain

    There's work required here that isn't currently happening. It is unfashionable as I've said three times now. The benefits of doing it aren't immediately apparent, so it's a hard sell. We are also working in a time when "technology will save us" is considered a legitimate argument, and arguing the contrary will require some political courage and will [8]. 

    Personally I think that argument is an avoidance of responsibility and of putting the effort in, and a case could definitely be made for financial savings over time through reduced friction and increased re-use.

    I'll continue to work on this. Let me know your thoughts, I'd really appreciate them.


    Footnotes

    [1] Yvonne Gallagher didn't say this, that's me.

    [2] I was reminded of the scene in 'Black Dynamite' where Black Dynamite declares war on the people who deal drugs in the community and the drug dealer says "but Black Dynamite, I deal drugs in the community!"

    [3] As in dull. Not like Elon Musk making big holes. Have a link to a song by the Pet Shop Boys. Have a link to a song by Deftones.

    [4] I'm using 'we' as a shorthand for people working in public service. Let's say all over the world, apart from Estonia.

    [5] This isn't sour grapes at not being able to do the Jonny Lee Miller myself. I do know a bit about developing a shared understanding of things.

    [6] EDIT: here is a post from Silver about Librarians and the web that I saw just after I pressed 'publish'.

    [7] Not a robot version of Bros, sadly.

    [8] I work at the UK Parliament at the moment so 'political' here means organisational rather than party.

    Dan Barrett
    tag:danbarrett.posthaven.com,2013:Post/1410148 2019-05-18T15:38:19Z 2019-05-18T16:01:10Z What does 'data-driven' mean to me?

    Words are important. Have you heard this phrase 'data-driven'? I expect you have. I don't like it, but hey there are lots of things I don't like that are really popular nowadays.

    One of the issues with buzzy management shorthand phrases is that their use actively inhibits a shared understanding. One person's 'data-driven' might not be the same as another's, but they're all there in the boardroom [1] going

    "Franck, our strategical approach is data-driven and leverages machine learning cloud AI capabilities"

    "Yes Wayne, 100% data-driven insightisation of our asset base"

    "Totally Sandra, leveragise our data-driven operating model"

    or similar.

    For a while, I was working on a 'data-driven' website, where I think 'data-driven' was being used to mean "there is data from internal applications that automatically goes onto our public-facing website".

    I always found that a bit strange, because (to my mind) that's just a legitimate way to approach making a website [2] and I don't understand why you would make part of what’s going on behind the pixels on the screen a thing of note. I don't think everybody involved understood that was what the phrase was being used to mean either. 'Data-driven' had become meaningless, and that meaning vacuum got filled with negative connotations and disdain, like a sad golem.

    Say what you mean

    I always preferred 'data-driven' meaning "we make decisions based on evidence". However, as I've worked on data strategy and (particularly) measurement in the past year, "we make decisions based on evidence" is also problematic.

    Why are you making decisions?

    Understanding intent and an organisation's direction and focus is essential. If this is lacking in any way it can be disproportionately hard to develop goals and measures. Clear statements of intent really help to frame decisions.

    So now to me 'data-driven' means "we know where we're going, and we make decisions based on evidence". But it's still not right.

    What evidence?

    Maybe your data is absolute trash. It's worth really getting into where the data has come from. Maybe you've really gone to town on a rigorous quantitative approach that's disproportionate to the task. Maybe you've taken a qualitative approach that's disproportionate to the task, or that's going to be expensive to establish as a repeated measure.

    So now to me 'data-driven' means "we know where we're going, and we make decisions based on sound evidence that is contextually appropriate".

    Sure, it's a bit of a mouthful. This is why I'll never have a career in marketing. 

    I believe it's always worth taking the time to make sure there's a common understanding though. What does 'data-driven' mean to you?

    Footnotes

    [1] This is a fictional scenario and Franck, Wayne, and Sandra are fictional characters

    [2] Not all websites
    Dan Barrett
    tag:danbarrett.posthaven.com,2013:Post/1383303 2019-03-09T01:04:33Z 2019-03-09T23:20:55Z Why can't we talk about data? (Part 3)

    This is a continuation of my line of thought from my previous two blog posts.

    I speculate that there's a gap to be filled with ways of working with data that aren't happening at the moment as far as I'm aware [1].

    Isn't it exciting when you notice that ideas have taken on a life of their own? Not my ideas, to be clear.

    Those moments when you catch a hint that similar conversations are happening somewhere else and you're hearing part of an underground transmission. Like a story being passed around from person to person before anybody actually writes the book.

    Blog posts. Talks. Snatches of something new. I reckon people are going to start to crack this 'talk about data' business fairly soon.

    I read the 'Rewired State: 10 years on' blog post by James Darling and Richard Pope. Two paragraphs stuck out for me in particular (emphasis mine):

    Legacy IT is still a major problem that the GDS movement hasn’t really addressed. The data layer is still unreformed. It remains an unanswered question if the UK’s ‘service-layer down’ approach, the ‘registry-layer up’ approach of Estonia, or some totally different model will ultimately prove to be the best route to a transformed government.

    Both legacy and data lack a place in the new orthodoxy, and in user centred design more broadly. That’s probably because they are difficult and require political capital. It’s hard to justify investment in digital infrastructure when the benefits come in the next electoral cycle or to another department’s budget.

    There's a scene in 'Velvet Goldmine' where Christian Bale's character (a young, awkward glam rock devotee at that point in the film) points at his hero on the television and says to his parents "that's me! that's me that is!". I think the data layer is still unreformed! I think data lacks a place in the new orthodoxy and in user centred design more broadly! [2]

    A new new orthodoxy?

    I met with Leigh from the Open Data Institute this week. We spoke about this broad topic, and the conversation helped me on with my thoughts. Claire joined us, and suggested that 'data' is a decade behind 'digital' in terms of developing and embedding a multidisciplinary working practice. This resonated with me, and I've heard others suggest similar things in recent months (the underground transmission!).

    I swear there's something here. Something distinct from 'digital', but complementary and with porous boundaries.

    Technology is about computers. 'Digital' isn't about computers but lots of people still think it is. Most people think 'data' is 100% all about computers but actually it's even less about computers than 'Digital'.

    In this multidisciplinary data practice I imagine being good with computers is a secondary skill - a means to an end. The engineering piece for services can be done collaboratively with others, and I'd expect increasingly over time it will become less bespoke [3]. If data lacks a place in that new orthodoxy maybe it's time to revisit some unfashionable roles, define new ones, and hire some librarians. Where I work we've got a couple of libraries. I'm a big fan of librarians.

    So, maybe a practice featuring the full spectrum of data related roles, from the structural to the analytical.

    A sort of infrastructure

    In my second post I described the data I am most interested in, recognising that 'data' is a broad term which would benefit from more detailed definition:

    When we talk about 'infrastructure', these ^ are the things that I think are most important.

    What is 'infrastructure' to me, though? I'm not thinking of platforms, let alone individual services (no matter how large). There is something here that's more fundamental. Note that when I say fundamental I don't mean 'more important'. I've seen enough unnecessary inter-disciplinary disagreement about relative preeminence first- and second-hand over the past few years.

    I just mean that there's something else there - something beyond the platform, or the service, or the contents of the database. Underneath, on top, all around it, in a mirror universe with twinkling stars and comets - you can draw the diagram however you like. It's there already, but it needs to be made more explicit.

    Leigh and I spoke about domain modelling. I've got into a habit of avoiding using the term, but it's a great example of a collaborative, human practice for working with data involving a variety of different types of people.

    Imagine these models were considered as a corporate-level asset [4]. This is the truth of your organisation [5]. You can use this asset to help build services. This asset is reflected in the platforms you build. It's not an academic exercise. This asset isn't static, and you would have feedback loops to and from your services and platforms because things change. In the public service context, outcomes for users traverse organisational boundaries, so your models would link out to those of other organisations.
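
    To make 'explicit representation' a little more concrete, here is a toy sketch in Python (the entities and attributes are invented, not any real organisation's model) of a domain model held as a plain, reviewable asset rather than left implicit in one system's database:

        # A toy sketch of a domain model as an explicit, versionable asset.
        # Entity and attribute names are invented for illustration.
        from dataclasses import dataclass, field

        @dataclass
        class Attribute:
            name: str
            description: str  # human-readable, librarian-friendly
            required: bool = True

        @dataclass
        class Entity:
            name: str
            description: str
            attributes: list[Attribute] = field(default_factory=list)
            related_to: list[str] = field(default_factory=list)  # other entities

        # The model is just data: it can be reviewed, diffed, documented, and
        # used to generate database schemas or to validate feeds from services.
        MODEL = [
            Entity("Client", "A person who uses our services",
                   [Attribute("id", "Stable identifier"),
                    Attribute("postcode", "Where the client lives", required=False)],
                   related_to=["Enquiry"]),
            Entity("Enquiry", "A request for help from a client",
                   [Attribute("id", "Stable identifier"),
                    Attribute("topic", "What the enquiry is about")],
                   related_to=["Client"]),
        ]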

    For the justifying investment point from James and Richard's post, I believe the case is there to be made. Not working to, and maintaining, the map of your organisation's truth is one of the reasons the legacy issue builds up in the first place, and it's a vicious cycle to be broken. Every system where the models are implicit, hidden in an undocumented database, introduces cost. Every local exception introduces duplication of effort and friction for teams and end users. What is the cost of not having this kind of infrastructure?

    I wonder where in an organisation this infrastructure would sit? If you don't have a library you should get one. My point being that the technology area definitely isn't the natural home for this work, and I suggest the digital area isn't either. There would be a collaborative effort between the three, of course. Doing the hard work to break down silos and what have you.

    Nonetheless, it would be a hard sell to build up a nationwide infrastructure before delivering any outcomes. I envisage a messy period of compromise and starting small, but hopefully there will be a start [6].


    Footnotes

    [1] I'm almost certainly channeling Michael and Robert at best here, and plagiarising at worst. Sorry blokes.

    [2] As I recall, Christian Bale's parents ignore him. My mum and dad don't know what I do for a living either

    [3] I do think there is a place for specialist engineering where a comprehensive understanding of complex data domains is required

    [4] 'Corporate-level asset' was Leigh's term, as I recall

    [5] Or a truth. Doesn't cover organisational culture, for example

    [6] Or a restart, to be fair, with respect and recognition to colleagues who've been here before
    Dan Barrett
    tag:danbarrett.posthaven.com,2013:Post/1380659 2019-03-03T00:33:19Z 2019-03-03T10:22:07Z Why can't we talk about data? (Part 2)

    I wrote a blog post yesterday and got a really thought-provoking response on Twitter so I thought I'd expand on it some.

    It wasn't much of a post. I could reduce it down to "I think this is hard and I want to get better" with a little bit of a call to arms thrown in.

    Being connected to an expert hive-mind on the internet is pretty great though. I don't know much, but I know I love data, so I'll try to reflect what I picked up from others since yesterday and hopefully frame where I'm at for future discussions and work in real life.

    What are we talking about?

    'Data' is too broad. Also labouring the point about a distinction between data and information probably isn't helpful. There's a huge amount of varied work across the 'data and information' spectrum. I think I need a richer and more specific vocabulary here to be able to work with the experts, let alone my desired broad audience. Credit to Michael for the spectrum description, Paul for describing the variety of work, and Sophie for calling out the pointless data vs. information debate.

    What am I talking about?

    At the time of writing I'm the Head of Data and Search at the UK Parliament. One of the things I'm responsible for is a data strategy for our organisations. I believe it's both possible and necessary to have an appropriately consistent set of principles and practices that apply to all of our data and the varied ways in which it is used.

    Still, I have some particular areas of interest:

    This is structured, relatively low-volume but high-complexity stuff. When I go somewhere else to be the Head of Data and Search I expect I'd still be particularly interested in this kind of data work. This is because, to me, these are the basics that you need to get right in order to get the stuff done with the least pain possible. It is about maximising utility, and reducing friction.

    Services and infrastructure

    However, a utilitarian approach to data can come with problems. I thought Sophie described this particularly well in the context of building 'digital' services [2]:

    ...in “digital” they expect us to iterate and release quickly, and we’re like “now we need to go away and spend many many months/years restructuring all our data and information at vast expense before you can have the [service]”

    No controversy or criticism of anybody intended here.

    I expect that going away to restructure all your data and information at vast expense isn't going to fly. So, I believe there can be a tension here between the 'digital' work and the 'data' work. Maybe lots of people have resolved this already, but hey I haven't [3].

    Let's say it's ok, or better yet let's say it's desirable, to release your new 'digital' service when it's 60% complete, and then you iterate. In contrast, you don't want to release with your reference data being 60% correct. That would be bad.

    To me, 'minimum viable data' is a small but irreducible core. You iterate outwards rather than throwing it all away.

    ^ note: this is my only strong idea. Maybe I haven't expressed it as well as I could have done, but I reckon it's good.
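
    To try to pin the idea down, here's a toy illustration (the dataset and field names are invented) of releasing on a small but complete core, then iterating outwards:

        # A toy illustration of 'minimum viable data': the core is small but must
        # be complete and correct at release; everything else can come later.
        CORE_FIELDS = {"id", "name"}              # the irreducible core
        LATER_FIELDS = {"start_date", "website"}  # iterate outwards to these

        def ready_to_release(records: list[dict]) -> bool:
            """Release only when every record has every core field - 60% won't do."""
            return all(r.get(f) not in (None, "") for r in records for f in CORE_FIELDS)

        departments = [
            {"id": "D1", "name": "Department of Examples"},
            {"id": "D2", "name": "Ministry of Placeholders", "website": None},
        ]
        assert ready_to_release(departments)  # ship, then add the later fields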

    I advocate for working on fundamental data infrastructure and digital services at the same time. In an organisation that does more than one thing, there's going to be a need to share the kinds of data I've described across services. Starting from scratch for each service increases cost at the very least. Also, the amount of effort an organisation puts into data integration is an indicator of how bad its data is, because it should be as little as possible - high-effort data integration is like data failure demand.

    I have views [4] on what a cross-cutting data infrastructure team looks like, and I think having data expertise in multidisciplinary 'digital' teams is important as well. I don't believe that multidisciplinary teams focused on a particular service are best placed to lead on data infrastructure.

    I suppose distance from users for infrastructure teams comes with risks, particularly because several people responded to my initial blog post to say that context was essential, and that rather than trying to talk about data in the abstract it's better to talk about it in terms of problems to solve or outcomes to achieve (thanks Franklin, Amanda, Ann, and Chris). I'm sure there's a way to balance this effectively.

    Tips and tactics

    I had several responses to my post about sticking with the metaphors, and hey maybe I need better metaphors. Not everybody likes baking. My favourite advice was from Beck, who said that passion for the subject helps. I'll try ramping it up.

    I was also reassured to see (and be reminded of) lots of experienced, brilliant people who are either going through the same thing or who've already had success. I look forward to finding out more.

    If you're interested in talking about data with me in real life please get in touch.


    Footnotes

    [1] Full disclosure: Maybe this is master data, I'm not sure. Regardless, I'd advocate for a collaborative, conversational practice to derive and maintain this kind of data, whatever the domain

    [2] I defined 'digital' as a broad, rich, and difficult multidisciplinary design practice for the purpose of my initial post

    [3] It's not just 'digital' work either. I work in waterfall / PRINCE2 environments as well and the tension with data there is the same

    [4] Maybe something for a subsequent post
    Dan Barrett
    tag:danbarrett.posthaven.com,2013:Post/1380278 2019-03-02T00:22:56Z 2019-06-03T16:30:19Z Why can't we talk about data?
    This is an attempt to write down a conversation I've had with several people over the past couple of months.

    For my next career goal I'd like to get really good at talking about data. I'm not very good at the moment. Maybe this post should be titled 'Why can't *I* talk about data?' but it feels more comfortable to suggest that everybody could be better at least, rather than just me.

    If you're really good at talking about data please get in touch and tell me all your secrets.

    Have you been trying to talk about data and experienced the blank stares? That moment when your carefully crafted analogy crumbles into dust? That conversation about the difference between data and information where people conclude that data is information and your inner voice says "just give it up sunshine it's no use"? Maybe somebody says "it's too technical for me"? Or "it sounds boring"? Perhaps you had to explain why data wasn't a subset of technology? Or maybe you like lording it over the uninitiated like some dark data mage, using your power to create eldritch management information dashboards that nobody but you understands (you are a bad person)?

    My hunch is that talking about data in ways that resonate with the broadest possible audience is going to become increasingly important in society.

    I can't understand why it seems to be so difficult.

    My job broadly involves Digital, Data, and Technology. Oh, and people. Accepting that people are the most complicated of the four and putting them to one side, I think that data should be the easiest to understand and discuss of the three.

    Technology is hard, by which I mean things like understanding what is actually going on inside a computer. That 'what is actually going on' has moved further and further away from the average person during my lifetime. In many ways that's great, with far less need to worry about the nuts and bolts, and more time to focus on doing the unreal dystopian science fiction convenience hellworld thing.

    Digital is hard, if by digital for this purpose I mean a design practice that's really rich and involves things like (deep breath) understanding users and working to overcome bias and releasing things that aren't perfect and committing to iterative development and working collaboratively in multidisciplinary teams and trusting people and so on. You don't know what's going to happen.

    Data is hard, but maybe it needn't be. Data isn't moving further and further away from the average person - it's right there next to you, all the time. Working with data should be less of a psychological workout than the digital thing too, I think?

    For example, which of the following should be easiest to answer without using the internet?
    • How does a solid state drive work?
    • What is the impact on a user of this service not matching their mental model?
    • What might happen if somebody gets my postcode wrong?

    I know data can be really messy and complicated and huge. I know that sometimes you need to do actual maths. However, for much of the fundamental infrastructure work that's required in the public sector, there are broad, human conversations to be had to help solve basic problems. The barrier to understanding them is an illusion.

    I think talking about data is the answer, and maybe there's an opportunity to develop a better collective vocabulary for working with it as a result.

    Dan Barrett