Citizens Advice Data Science team month-notes #1 (July 2021)

We wanted to start blogging about the work we do. We intend to write a post each month with contributions from various members of the team. 

As a team we believe in continuous improvement, building capability, and focusing on meeting data users’ needs. The opportunities we have for improving the way we work with data at Citizens Advice aren’t unique to our organisation. It’d be great if the practical approaches we take are useful to others.

Enabling self-service for users

Sarah (Channel Reporting Manager)

The Citizens Advice service in England and Wales is made up of the National organisation and over 260 independent local Citizens Advice charities. Our work as the National charity involves helping to coordinate the efforts of the local charities so they can provide national-level services, as well as pooling expertise, tools, and insight.

One example of this collaboration is our work to help manage Adviceline, our national phone service for England. The Data Science team currently provides an Adviceline Tableau dashboard to help local charities understand call volume and other performance and demand data. However, over 230 local Citizens Advice charities provide the service, and they have a wide variety of reporting requirements. It’s impractical for the Channel Reporting team (currently 2 people) to provide over 230 unique daily reports, so however well we respond to individual requests, a single Adviceline dashboard can’t meet every local charity’s needs.

So we decided to enable Tableau web editing, allowing all users to edit dashboards as they see fit. This will help to democratise data, putting additional power in local offices’ hands, whilst freeing up Data Science team capacity and allowing us to provide more advanced models and better quality data and reporting for other channels like web chat and email. 

Two potential barriers to successful implementation were the impact of extra server load on report load times, and making sure the new features were accessible to staff in local charities. To tackle the first problem, we surveyed local charities to estimate demand for the feature and the likely maximum number of concurrent Tableau users, then used this information to simulate the potential impact on server load times. To tackle data skills, we’ll work with our Facilities team to deliver training via live sessions. These sessions will be recorded and later published on our internal learning platform to allow participants to revisit the material.
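As a rough illustration of the kind of simulation we ran, here’s a minimal Monte Carlo sketch in Python. The linear load model and every number in it are hypothetical stand-ins; the real inputs came from the survey responses and our own server metrics.

```python
import random

# All figures below are made up for the example.
BASE_LOAD_SECONDS = 3.0   # hypothetical dashboard load time with no contention
PENALTY_PER_USER = 0.15   # hypothetical slowdown per concurrent web editor
TRIALS = 10_000

def simulate_peak_load(expected_users: int, stdev: float) -> list[float]:
    """Sample plausible load times given a survey-based estimate of concurrent users."""
    samples = []
    for _ in range(TRIALS):
        # Draw a plausible number of concurrent editors around the survey estimate
        concurrent = max(0, round(random.gauss(expected_users, stdev)))
        samples.append(BASE_LOAD_SECONDS + PENALTY_PER_USER * concurrent)
    return samples

loads = sorted(simulate_peak_load(expected_users=40, stdev=10))
print(f"median load time: {loads[len(loads) // 2]:.1f}s")
print(f"95th percentile:  {loads[int(len(loads) * 0.95)]:.1f}s")
```

A model along these lines helps answer the practical question: does the worst plausible case stay within an acceptable load time, or do we need more server capacity before rolling the feature out?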

Tableau web editing gives us an opportunity to facilitate the local charities’ data needs like never before. There’s also the opportunity to build skills in the local Citizens Advice network so they can use our trusted data to provide high quality reporting that meets their individual needs. We’ll continue to monitor the implementation to measure impact and look for further ways to improve reporting across the network. 

Automating away the busywork - reducing a 1-3 hour daily process to 10 minutes

Jon (Data Analyst)

I'm part of the Channel Reporting Team. Among other things, we provide data and reporting for remote help services across phone, webchat and email. By making this information readily available, the 260+ local offices in our network can access the data required for deploying their staff and volunteers as effectively as possible.

Sarah wrote about our Adviceline reporting, and we also provide reporting for the Welsh Advicelink service. There's always been a pressing need for this kind of daily reporting - especially with the pandemic causing a historic rise in people seeking help through remote services - so we can understand how well our organisation is meeting clients’ needs. 

We want to be as efficient as possible in this work, and make sure that our data is trusted. We recently made some improvements to how we provide this data.

Our reporting workflow was previously very complicated due to a number of factors:

  • Access to even basic data from our telephony system requires manual effort.

  • Operational adjustments were (and still are) made every day to adapt to changing conditions - and our reporting needed to reflect that. The way the team had to deliver their reporting changed multiple times during the pandemic, making it difficult to settle on and streamline a single process for daily reporting.

  • With so many different teams and offices having input on the reporting, our reports needed to pull in information maintained by many different stakeholders across several disparate sources: various SQL servers, a number of different Google Sheets, the manually retrieved telephony data mentioned above, and daily call record CSVs sent to an email account.

As a result of these constantly shifting conditions and ad hoc adjustments, we ended up with a highly manual process that was difficult to run, and error prone if you didn’t have the requisite skills and knowledge.

When everything went right, the process would take about 45 minutes to complete every day. If there were complications the process would take anywhere from 1 to 3 hours to execute.

With almost a hundred manual steps along the way, there was plenty of room for mistakes. If you made an error at any point, you had to catch it immediately or face restarting from scratch. If you didn’t catch it, it could take an entire day to roll back.

For a process producing a daily report with a noon deadline, this was clearly unacceptable. We knew we had to make it easier on ourselves, and reduce the possibility of errors along the way.

Using some simple Python scripting and Google Apps Script, we managed to distill the sequence into a largely automated process that takes ten minutes a day. Here are some of the things we did to achieve this (a simplified sketch of the pipeline follows the list):

  • We built a Selenium browser automation script that crawled the telephone records we needed and fed them straight into our Python scripts for ingest.

  • We used Google Apps Script to automate sourcing the data we needed from Google Sheets, and used pandas to perform the necessary lookups and data manipulation.

  • We used pyodbc to give our Python scripts access to the SQL servers they needed to both draw data from and update.

  • We wrote a series of unit tests ensuring that the steps executed by the script all pass basic common-sense checks and don’t contradict our knowledge of our telephone systems’ operational setup. In addition to these tests, the script still asks a human to sign off on the changes it’s about to make - while this means the script isn’t fully automated, it means we’re aware of any changes the process makes, and can catch things that simple common-sense logic tests can’t.
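To give a flavour of how these pieces fit together, here’s a heavily simplified sketch of the pipeline. Every URL, element ID, column name, connection string, and query below is a hypothetical placeholder - the real scripts are specific to our telephony system and internal databases.

```python
from io import StringIO

import pandas as pd
import pyodbc
from selenium import webdriver
from selenium.webdriver.common.by import By


def fetch_call_records() -> pd.DataFrame:
    """Crawl the day's call records from the telephony portal (placeholder URL/IDs)."""
    driver = webdriver.Chrome()
    try:
        driver.get("https://telephony.example.com/reports/daily")  # hypothetical
        table_html = driver.find_element(By.ID, "call-records").get_attribute("outerHTML")
        return pd.read_html(StringIO(table_html))[0]
    finally:
        driver.quit()


def sanity_checks(df: pd.DataFrame) -> None:
    """Common-sense checks before anything is written back."""
    assert not df.empty, "No call records found - did the crawl fail?"
    assert (df["calls_answered"] <= df["calls_offered"]).all(), (
        "More calls answered than offered - something is wrong upstream"
    )


def load_to_sql(df: pd.DataFrame) -> None:
    """Write the cleaned records to the reporting database (placeholder DSN/table)."""
    conn = pyodbc.connect("DSN=reporting;Trusted_Connection=yes")  # hypothetical
    cursor = conn.cursor()
    for row in df.itertuples(index=False):
        cursor.execute(
            "INSERT INTO daily_calls (office, calls_offered, calls_answered) "
            "VALUES (?, ?, ?)",
            row.office, row.calls_offered, row.calls_answered,
        )
    conn.commit()


records = fetch_call_records()
sanity_checks(records)

# The script pauses for human sign-off before touching the database.
print(records.head())
if input(f"Write {len(records)} rows to SQL? [y/N] ").lower() == "y":
    load_to_sql(records)
```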

Pulling these changes into a Python prototype took about a day, but iterating the script into a production-ready state took the better part of a week. The unit tests themselves were written iteratively over several weeks, and the scripts still get tweaks whenever we see room for improvement. The end result is that we no longer dread wrestling with our reporting processes in the morning, and we save hours every week that we can spend on higher value work.

User feedback is invaluable, though not always easy to get - data survey

Josh (Data Architect)

Throughout this month the data team has been running a survey, built in Google Forms, to understand what it’s like for staff to use data at Citizens Advice.

The survey is a way to gain insight into how accessible our data is, the level of data capability across the organisation, and which data products are currently helping people with decision making. It can be a challenge to get people to participate in surveys. Thankfully we had support from executives and our internal communications team, which gave us the response rate we aimed for.

We’ve now been able to calculate baseline metrics that give us an indication of how easy our data is to find, how confident people feel using data, and how often data is used in decision making. Of these three things, finding data seems to be the biggest pain point. Though this isn’t unique to Citizens Advice, it is a challenge for us to solve. The survey also produced a lot of insightful qualitative data that we’ll be digging into next month, following up with more focused research on specific areas.
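For illustration, the baseline calculation itself is straightforward once the Google Forms responses are exported - something along these lines, with hypothetical file and column names standing in for our actual questions:

```python
import pandas as pd

# Hypothetical export of the Google Forms responses
responses = pd.read_csv("data_survey_responses.csv")

# Column names are illustrative stand-ins for our Likert-scale questions (1-5)
baseline = {
    "ease of finding data": responses["How easy is it to find the data you need?"].mean(),
    "confidence using data": responses["How confident do you feel using data?"].mean(),
    "data-informed decisions": responses["How often does data inform your decisions?"].mean(),
}

for metric, score in baseline.items():
    print(f"{metric}: {score:.2f} / 5")
```

Keeping the questions and scoring constant between survey rounds is what lets these numbers serve as a baseline to measure improvement against.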

Overall it’s been a great way to collect thoughts about our data from across the organisation. It’s also giving us an indication of the areas we should focus on to make the experience of using data better for everyone. We’ll repeat the survey quarterly so that we can measure the impact of the improvements we make. To encourage the same level of participation as this round, we’ll be demonstrating how we’ve acted on the feedback to make a meaningful difference. We intend to extend the survey to the network of local Citizens Advice as well.

Building the team

Dan (Head of Data Science)

I love recruitment. Having the opportunity to build and lead a team is a privilege. This month we welcomed a new starter, Rahul, a Web Analyst working in the multidisciplinary Content Platform team. This is a model I want to establish wherever it’s the best fit for the work: having a dedicated data specialist embedded in a product team is much better for meeting that product’s and team’s data needs than working at arm’s length.

Rahul is being inducted remotely and I think that’s more challenging than being in person. On the other hand, I definitely prefer interviewing remotely. Maybe there’s something about it being a leveller, balancing out the dynamic and making it less intimidating than having to visit somebody’s office. Maybe a child will get locked out of somebody’s house in their pyjamas and it’ll be alright to rescue them midway through the call. 

Did I say somebody’s house? Ok I meant my house.

We had a successful interview campaign for a new Data Science Lead this month. I really enjoyed interviewing with Josh, and with Maz (Director of Content and Expert Advice). I have definitely learned from others in the past year and improved my interviewing practice. There have been small things, like introducing yourself with your pronouns, or pasting the text of each question into the video chat. There have been larger things, like putting the effort into producing a scoring guide for each question so the panel works from a shared, explicit understanding. Writing the scoring guide takes me considerably longer than writing the questions. Finally, there are things that make your organisation’s values explicit to candidates. I’ve included a question about equity, diversity and inclusion in interviews for a while, but I’ve never been completely happy with how it was phrased. For this campaign Maz introduced a new formulation that was a big improvement.


Thanks for reading. Feel free to get in touch with any of us on Twitter or LinkedIn.
