Citizens Advice Data Science team month-notes #2 (August 2021)

We wanted to start blogging about the work we do. We intend to write a post each month with contributions from various members of the team. Here’s our previous post from July.

As a team we believe in continuous improvement, building capability, and focusing on meeting data users’ needs. The opportunities we have for improving the way we work with data at Citizens Advice aren’t unique to our organisation. It’d be great if the practical approaches we take are useful to others.

Building better data products that meet users’ needs

Hamza (Senior Data Analyst)

The Citizens Advice service is made up of Citizens Advice - the national charity - and a network of over 260 local Citizens Advice members. Our network members are all independent charities delivering services to help people across England and Wales. At the national charity we have a responsibility to support each local charity to deliver their services in the best way possible.

There is a leadership self-assessment every year. This is where we ask each of the 260+ charities to rate themselves across organisational areas such as Governance, People Management, Equality, and Financial Management. At the same time, performance assessors from the national charity carry out the same assessment on each local charity. Where it’s evident that there are opportunities for improvements the national charity works closely with the local charity and suggests specific courses of action. 

The leadership self-assessment is vital because it embeds risk-based thinking, helping local charities to assure that their organisation is well run. It also accredits local charities to external quality standards that are recognised by funders.

In the national organisation we have a dedicated team that looks after all activities related to the leadership self-assessment. This team also includes a small number of performance assessors whose role is to assess each of our network members at least once a year. 

To help with monitoring these assessments, the team was using a tracker document in Google Sheets. The aim behind this tracker was to help plan resources, track how many assessments have been completed and for which charities, and analyse results from the assessments. 

As this tracker was also being used by several other teams, over time it became congested with multiple tabs, showing different pieces of information in different formats. In some cases, there were even multiple tabs that would show the same information but in different formats. It became evident that the tracker was losing focus and there was just too much information in different formats, not to mention that there were also broken formulas in some places too! It sounded like a job for the Data Science team... 

We collaborated extensively with the leadership self-assessment team and met with them to understand the problems they were facing and their data needs. Our task was to create a new, more consolidated and effective tracker tool that would enable its users to better plan resources and gain richer insights into the assessments. 

We discussed things such as

  • which is the best platform to use (Google Sheets or Tableau)?

  • which data is needed? 

  • which format should the data be presented in? 

  • how should the data be updated?

  • who should have edit access to the data?

Having these initial discussions really helped us to build the right tool for their needs. For example, the stakeholders had a stronger preference for Google Sheets, therefore we built the new tool in Google Sheets and chose not to use Tableau. 

We adopted an agile approach to building the tool. We didn’t just go away and build a complete tool in one go for them and then think ‘job done’. Instead, we built it in steps. We built a version 1, presented this to the team, received feedback and then built version 2, presented this to the team, received feedback and so forth. In each version, we kept on refining the tool based on the feedback, making it easier to use. 

We focused more on building clear visualizations as opposed to just building tables of data. We also focused on making the tool interactive, for example building features that would allow users to extract specific data for their own needs. We try to encourage this kind of ‘self service’ as much as possible.

In the end, after several iterations and meetings with the stakeholders, we had built a tracker tool in Google Sheets that could better serve the needs of the leadership self-assessment team. The tool has been created in such a way that it involves little to no effort in terms of maintenance as all the data is updated automatically. The tracker tool is linked to a Google Form that is completed by the performance assessors for recording scores from each of their assessments. Therefore, as each new response comes in, all the visuals and data summaries in the tracker are updated automatically (this is one great benefit for using Google Forms for automatically analysing form data in any way you want). 

The new tracker tool we created has received great feedback from the end users. They feel they have something more focused, and that it helps them to answer the questions of the data that they need to ask.

Connecting data to give us new insights

Sarah (Channel Reporting Manager)

Josh (Data Architect)

Jon (Data Analyst)

How we identified the opportunity

There have been a few new hires recently in the Data Science team which has helped increase our bandwidth and allowed us to take on additional work, like solving long-standing problems. The formation of the Channel Reporting team has resulted in improved reporting, and we found that in order to take it to the next level and better meet the needs of users across the organisation we needed a greater level of detail in the data about advisors. 

Alongside this, the team also has data architecture skills now, which resulted in a detailed system context map being drawn for how data moves between products and teams at Citizens Advice. We call it ‘the data landscape’. This map allows us to hone in on improvement opportunities, for example we discovered a reliance on spreadsheets for managing advisor data.

What did we discover

We’ve historically had decentralised management of advisor data. This has worked fine for reporting and analysis, but we’re always looking to improve. We didn’t have a unique way to identify the same advisors across various systems and it has created inconsistencies in how the data is structured. Having this data would mean that our users could get a better and more joined up view of the performance of their service. It would help show activity data across channels, rather than viewing each channel in relative isolation.

What we did

We needed to change how we viewed the relationship between an advisor and the systems they use. For this we created a conceptual data model for advisors, which helped show the commonality across systems and what we could use as a common identifier. We found a number of different systems in which advisor profiles existed. The one where most of our advisors could be found was in our user authentication product. Advisors use this product in order to access various systems and it provides the best opportunity to have a common identifier for an advisor. 

The majority of our channel systems are connected to the user authentication product. This allowed us to quickly map the common ID to the advisors in those systems. In a system that isn’t currently connected we used our data skills to get to 80% of mapping identifiers to all advisors by matching google sheets via various VLOOKUPs.

This left us with just 1,100 advisors in that system that didn’t have a direct login or name match to an ID. We knew that many of these advisors probably did have a central user authentication ID - but weren’t matching due to inconsistencies in naming conventions and data entry errors - for example an advisor with local login “London Jon L” might have an existing ID, but registered against “SELondon Jon L”, or “London JJon L”, or “London Jonathan L”. [1]

Rather than match all of these manually (which would have taken us about a minute per advisor, based on the advisors we did match by hand), we wrote a quick script to help us look for these loose matches that may have been caused by human error. Our script used local advisor metadata such as office location to narrow its search criteria to a few hundred possible IDs out of thousands, and then identified close matches using a fuzzy matching algorithm. 

Fuzzy matching is a simple technique for matching text that’s approximately but not quite the same. This is super useful for identifying close matches caused by typos and nickname variations, which is a pretty common problem faced by volunteering charities! We used a standard fuzzy matching package for Python, but the algorithm can be found implemented in most languages, and is even implemented as a standard formula in Excel.  For example, with the package we used, the typo “Josh Tedgett” matched up against “Josh Tredgett” with a similarity score of 0.97, and less immediately obvious mismatches caused by nicknames like “Becky Harlow” and “Rebecca Harlow” still returned a high similarity score of 0.77. [2]

While this method returned a fair amount of false positives, it was much faster to work off these loose matches and either confirm or discard the script’s matches than it was to match all 1,100 advisors by hand. After writing the script (which took about an hour) and running it, it took us about an hour to match 600 of the remaining 1100 advisors, and discard the rest as having no likely matches. At the original rate of about a minute per advisor, it would have taken us about 18 hours to work through the same list manually.

Once we’d completed the matching process, we still had the challenge of maintaining this data set going forward. How were we going to make sure new advisors had the ID assigned to them? This is where colleagues in Technology have stepped in. They proposed an automated solution via our internal ticketing system. This will take the effort away from our Operations colleagues to maintain the data. Instead, an orchestration tool will pick up the new user requests and assign the ID in the consolidated dataset. It will also allow us to monitor the quality of the process as it can pick up any errors that have occurred.

How will it work and what this helps with

Once this work is complete, we will be able to drill down to a greater level of detail in our data products. We recently set up a new report that analyses performance for local Citizen Advice charities across all channels (phone, chat, and email). This report gives a more joined up view than before, and with our improved data the next iteration will give our network members deeper insight. 

Things to think about in the future

We think the new process will provide better insight for our users and more robust data management. There are still some aspects to work on. Part of the reason why we were able to progress on this work is that we weren’t aiming for perfection. We focused on making something better now and not trying to create anything that was difficult to upgrade or replace in the future.

Thanks for reading. Feel free to get in touch with any of us on Twitter or LinkedIn.


[1] This is not a real example

[2] This is not a real example