Citizens Advice Data Science team month-notes #3 (October)

We are blogging about the work we do. We intend to write a post each month with contributions from various members of the team. Here’s our previous post.

We haven’t managed to post for a while because we’ve been especially busy. We’re working as a team to get around that in future. Dan’s going to try transcribing oral updates from the team so that we can all contribute to writing in the open on the internet.

As a team we believe in continuous improvement, building capability, and focusing on meeting data users’ needs. The opportunities we have for improving the way we work with data at Citizens Advice aren’t unique to our organisation. It’d be great if the practical approaches we take are useful to others.

Data Glossary

Josh (Data Architect)

Earlier this year we ran a survey for all staff to understand the experience of using data at Citizens Advice. This gave us valuable insights into opportunities for improvement, revealing that people want data to be quicker to find and easier to understand.

One of the ways in which we’re trying to help is through the creation of our organisation-wide Data Glossary. This is a hybrid of a traditional business glossary, data dictionary, and data catalog. It has a number of different components and aims to be a central hub for answering data-related questions from across the organisation. We’ve created this in Google Sheets for now, as it was quick to create, change, and share widely across the organisation.

Here’s an outline of the sections we have so far and their purpose:

Terms

This section is what you’d expect to see from a business glossary but also includes some high-level technical information. It lists our main data entities and, for each one, describes what it means in the context of the organisation, its synonyms, which systems it can be found in, and the suggested way to uniquely identify it.

Its main purpose is to encourage consistent terminology and language, and to promote data standards. It’s also been an important collaboration tool, giving our team a way to discuss with people from across the organisation what their understanding of these terms is.
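
As a rough illustration only, one row of the Terms section could be modelled along the lines of the sketch below. The field names, the example entry and the find_term helper are all assumptions made for this example, not the actual columns or contents of the Google Sheet.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlossaryTerm:
    """One hypothetical row of the Terms section."""
    name: str
    definition: str
    synonyms: List[str] = field(default_factory=list)
    systems: List[str] = field(default_factory=list)
    unique_identifier: str = ""


def find_term(terms: List[GlossaryTerm], query: str) -> Optional[GlossaryTerm]:
    """Return the term whose name or synonym matches the query, ignoring case."""
    wanted = query.strip().lower()
    for term in terms:
        names = [term.name, *term.synonyms]
        if wanted in (n.lower() for n in names):
            return term
    return None


# Made-up example entry, for illustration only.
TERMS = [
    GlossaryTerm(
        name="Client",
        definition="A person who has contacted us for advice.",
        synonyms=["Customer", "Service user"],
        systems=["Example case management system"],
        unique_identifier="client_id",
    )
]
```

Looking a term up by one of its synonyms, for example find_term(TERMS, "service user"), returns the same entry as looking it up by name, which is the kind of consistency the section is meant to encourage.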

Data reports

We created this section as a way to make it easier for people to find the reports they’re looking for. We’ve listed our main products, explaining what they can help with and which team can help answer any questions. It’s particularly useful for any new members of staff as it gives them one place to look when trying to understand what data is widely used. It also provides a single place for anyone trying to track down a report they’ve heard mentioned but didn’t know about.

Data sets

This part is the start of our journey towards building a shared metadata framework. It’s a collection of our data assets, describing what they are, where they can be found, and who is responsible for maintaining them, with links to supporting sources such as the data asset information document we’ve recently launched (more to come on this in a later post!).

We think this section has the biggest potential for making data more accessible across the organisation and has opportunities for automation with our data platform.

The main purpose is for people to be able to search for a dataset they’re looking for and reduce the time spent searching for a link or the right team to speak to.
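
A minimal sketch of that kind of search is shown below, assuming each row records a name, description, location, maintaining team and supporting link. The DataSetEntry fields, the search_data_sets helper and the entries themselves are hypothetical and made up for illustration; they are not taken from the real glossary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSetEntry:
    """One hypothetical row of the Data sets section."""
    name: str
    description: str
    location: str          # where the data can be found
    maintained_by: str     # team responsible for maintaining it
    supporting_link: str   # e.g. a data asset information document


def search_data_sets(entries: List[DataSetEntry], keyword: str) -> List[DataSetEntry]:
    """Return entries whose name or description contains the keyword, ignoring case."""
    needle = keyword.strip().lower()
    if not needle:
        return []
    return [
        e for e in entries
        if needle in e.name.lower() or needle in e.description.lower()
    ]


# Made-up entries, for illustration only.
DATA_SETS = [
    DataSetEntry("Advice issues", "Issues recorded when clients seek advice.",
                 "example-platform/advice_issues", "Example Team A",
                 "https://example.org/advice-issues-doc"),
    DataSetEntry("Website visits", "Daily page views for public advice pages.",
                 "example-platform/web_visits", "Example Team B",
                 "https://example.org/web-visits-doc"),
]
```

A call such as search_data_sets(DATA_SETS, "page views") returns only the matching entry, and with it the location and the team to speak to, so a single lookup answers both questions.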

We also have sections with training videos, team-specific supporting documentation and links to external data sources.

We’re continually revising these to improve their usefulness.

What’s next?

We need to measure the current usefulness of the Data Glossary. For this we will start research with users on their experience using it. The data collected from this will influence any future changes we make.

One change we plan to make is to link out to data models for our main entities. The intent here is to raise the profile of these data models and provide a place to reference when recreating existing data concepts in a new location.

We’ll also be monitoring the sustainability of keeping this data in Google Sheets, as well as exploring opportunities to automate the capturing and updating of metadata.

Technology growth in this space has led to a number of emerging platforms offering great solutions for metadata management and we’ll explore whether we can make use of any of these in the future.

Though it’s early days for our Data Glossary, we’re seeing a consistent number of people using it and we think it’s making our data more accessible.

Learning from our regular open sessions

Dan (Head of Data Science)

Since April Josh and I have been running regular sessions over Zoom about data architecture and data strategy. These sessions are open to all our colleagues at National Citizens Advice. After a couple of months we reflected on how we could improve them. We thought the main issue was that they weren’t inclusive or diverse enough.

We had been running the session at the same time and day each week. We changed this so that the schedule is a 5-week cycle of Tuesday / Wednesday / Thursday / Friday on successive weeks and then a week off. The sessions are all at different times of day to accommodate different working patterns, and they’re not all at lunch time.

We also wanted to bring in external speakers so that it wasn’t just Josh and me doing the talking. We had 8 different speakers from other organisations on a broad and fascinating range of data topics. Several of these sessions resulted in follow-up calls to get into more detail on topics, like a session on data governance with Jenny.

We recognised that not everybody is comfortable asking questions on a call. We allow for that in sessions with Q+A, and we’ve also run more creative workshop sessions using Google Jamboard. We put more effort into communications and publicity too. Ahead of every session we email the invite list to tell them what we’re going to be covering so people can choose if they’re interested, and we include the recording and any materials from the previous session too.

Our open sessions are giving us a regular opportunity to showcase the wide range of data work that we do at Citizens Advice. They also provide an opportunity to get wider input - the sessions have been really helpful to me in developing a data strategy, for example. We’ll keep reflecting on how they’re going and making improvements.

We want the barriers to entry for working with data to be as low as possible, and we work hard as a team to achieve that. As one of our guest speakers, Adam, put it, we are normalising talking about data.

Interesting links

“Why data scientists shouldn’t need to know Kubernetes” via Jon