Dealing with messy data

Every month, our Nonprofit Datafolk Club gets together to share experiences and learning. It’s a chance for data folk working in or with nonprofits to network and discuss matters of mutual interest.

In November, we discussed ‘Dealing with messy data’. Our nonprofit data folk split off into small groups to discuss their most common quality issues, their go-to approaches for dealing with messy data, and how to prevent getting messy data in the first place.

Common data quality issues

One of the most prevalent issues with data quality is the entry of incorrect or inconsistent data into systems. For example, frontline staff entering data into a customer relationship management (CRM) system may face no restrictions on the fields, allowing them to input incorrect data formats. As well as consistency errors, folk mentioned both gaps in their data and duplication. Surveys often have missing information, and the phrasing of data entries varies – especially if free text boxes are offered.

Collecting data from vulnerable participants or those with limited capacity to provide information is challenging. It may not be appropriate to ask for certain information, and this may lead to inconsistent or missing data.

Additionally, if you are collaborating with multiple partners, or using external datasets (such as from the government), it can be difficult to integrate data and processes.

And of course, sometimes the data is inherently messy! When the data describes people or complex situations, messiness is difficult to avoid.

Cleaning messy data

When cleaning messy data, artificial intelligence (AI) was mentioned as a tool that could be helpful – although the risk around confidentiality needs to be investigated. If data collection is digitised, you may be able to automate some data cleaning tasks. Another suggestion was to clean datasets using the same rules before trying to connect them.
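
As an illustration of cleaning datasets with the same rules before connecting them, here is a minimal Python sketch. The field names (`name`, `postcode`), the sample records and the cleaning rules themselves are assumptions made up for this example, not something the group prescribed; your own rules will depend on your data.

```python
import re


def clean_name(value):
    """Shared rule: trim, collapse repeated spaces, lower-case."""
    if value is None:
        return ""
    return re.sub(r"\s+", " ", str(value)).strip().lower()


def clean_postcode(value):
    """Shared rule: remove all whitespace and upper-case."""
    if value is None:
        return ""
    return re.sub(r"\s+", "", str(value)).upper()


def clean_record(record):
    """Return a copy of the record with the shared rules applied."""
    cleaned = dict(record)
    cleaned["name"] = clean_name(record.get("name"))
    cleaned["postcode"] = clean_postcode(record.get("postcode"))
    return cleaned


def link(dataset_a, dataset_b):
    """Pair up records whose cleaned name and postcode both match."""
    index = {}
    for rec in map(clean_record, dataset_b):
        key = (rec["name"], rec["postcode"])
        if all(key):  # never match on blank values
            index[key] = rec
    pairs = []
    for rec in map(clean_record, dataset_a):
        key = (rec["name"], rec["postcode"])
        if all(key) and key in index:
            pairs.append((rec, index[key]))
    return pairs


# Hypothetical sample data: the same person, entered two different ways.
crm = [{"name": "  Sam  Jones ", "postcode": "ab1 2cd", "service": "advice"}]
survey = [{"name": "sam jones", "postcode": "AB12CD", "score": 4}]

matches = link(crm, survey)
```

Because both datasets pass through the same `clean_record` function, differences in spacing and capitalisation no longer stop the records from matching. Exact matching like this will still miss misspellings, so treat it as a starting point.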

Preventing messy data

There were a few key ways people suggested to address the problem of messy data at source.

Firstly, train staff on data collection and why it’s important. It was suggested that this should happen during onboarding and then regularly for existing staff. Sharing positive feedback with frontline workers, and demonstrating how data reflects their work and why it’s important, can lead to better data input. Communication is key!

Building restrictions into data collection tools can help ensure data is collected in consistent ways. Other data validation techniques mentioned were mandatory fields and warnings or reminders for certain fields. One person also noted you should be careful not to be too restrictive, as you don’t want to stop people inputting data that could be useful.
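
The sketch below shows one way these ideas could fit together in Python: mandatory fields and format restrictions produce errors that block an entry, while softer checks only produce warnings, so useful data is not turned away. The field names, the list of contact types and the choice of what counts as an error or a warning are all hypothetical. In practice these rules would usually live in your CRM or form tool rather than in a separate script.

```python
from datetime import date

# Hypothetical field names and allowed values, for illustration only.
MANDATORY_FIELDS = ("client_id", "contact_date")
KNOWN_CONTACT_TYPES = {"phone", "email", "in person"}


def is_blank(value):
    """Treat None, empty strings and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def validate_entry(entry):
    """Return (errors, warnings) for one data entry.

    Errors should block saving; warnings only prompt the person to check.
    """
    errors = []
    warnings = []

    # Mandatory fields: a blank value is an error.
    for field in MANDATORY_FIELDS:
        if is_blank(entry.get(field)):
            errors.append(f"{field} is required")

    # Format restriction: the date must parse as an ISO date.
    contact_date = entry.get("contact_date")
    if not is_blank(contact_date):
        try:
            date.fromisoformat(str(contact_date).strip())
        except ValueError:
            errors.append("contact_date should be an ISO date such as 2024-01-05")

    # Soft check: an unfamiliar value is allowed, but flagged.
    contact_type = entry.get("contact_type")
    if not is_blank(contact_type):
        if str(contact_type).strip().lower() not in KNOWN_CONTACT_TYPES:
            warnings.append(f"contact_type '{contact_type}' is not on the usual list")

    # Reminder: an optional field left empty.
    if is_blank(entry.get("postcode")):
        warnings.append("postcode is empty")

    return errors, warnings


errors, warnings = validate_entry(
    {"client_id": "C001", "contact_date": "05/01/2024", "contact_type": "Home visit"}
)
```

Here the badly formatted date is an error, while the unfamiliar contact type and the empty postcode are only warnings, so the person entering data is nudged rather than blocked.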

Finally, pulling together a list of data gaps or a data quality report was mentioned as a useful way to identify, visualise, and communicate the problem. It can also be used to get endorsement from management or leadership.
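
A very simple data quality report can be little more than a count of blank values per field and a list of values that appear more than once. The Python sketch below assumes records held as a list of dictionaries; the field names and the choice of `email` as the field to check for duplicates are hypothetical.

```python
from collections import Counter


def is_blank(value):
    """Treat None, empty strings and whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def quality_report(records, fields, key_field):
    """Count blank values per field and find repeated key values."""
    missing = {
        field: sum(1 for rec in records if is_blank(rec.get(field)))
        for field in fields
    }
    # Compare keys after trimming and lower-casing, so that
    # "A@example.org " and "a@example.org" count as the same value.
    key_counts = Counter(
        str(rec.get(key_field)).strip().lower()
        for rec in records
        if not is_blank(rec.get(key_field))
    )
    duplicates = {key: n for key, n in key_counts.items() if n > 1}
    return {"total": len(records), "missing": missing, "duplicates": duplicates}


# Hypothetical sample data.
records = [
    {"email": "a@example.org", "postcode": "AB1 2CD"},
    {"email": "A@example.org ", "postcode": ""},
    {"email": None, "postcode": "EF3 4GH"},
]

report = quality_report(records, fields=["email", "postcode"], key_field="email")

for field, count in report["missing"].items():
    print(f"{field}: {count} of {report['total']} blank")
for key, count in report["duplicates"].items():
    print(f"possible duplicate: {key} appears {count} times")
```

Counts like these can then be charted or tracked over time to show leadership where the gaps are and whether they are closing.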

Finding and tackling your messy data

If you don’t know where to start with your messy data, you may be interested in our ‘Mapping and evaluating data assets’ service. Sometimes we do this as part of a data strategy, sometimes as a separate project. Either way, it’s about working out where all the data is, who’s responsible for it, and what state it’s in. For many organisations, this is one of the first steps (along with a data maturity assessment) towards building a roadmap to get better with data. Take a look at our data support services and get in touch if you want to find out more.

About our Nonprofit Datafolk Club

Our Nonprofit Datafolk Club is a friendly group of like-minded data folk working in or with nonprofits. Every month we get together online for an interactive workshop to share expertise, ask questions and discuss anything data-related.

What’s more, it’s free.

Previous attendees have said:

"What I like most about Nonprofit Datafolk Club is its relaxed and friendly nature."

"The format is good and the topics are always interesting."

"It was useful to speak to people from other organisations to hear what they are doing. Talking about the challenges we're facing here was helpful and I came away with a clearer understanding of our position."

Join the Nonprofit Datafolk Club

If you found this resource interesting, or if you are curious about nonprofit data more generally, please come and join us at our next workshop. Each month has a different topic – you’ll find details on our events page. Previous topics have included:

Tips for adopting new data tools

Statistics in nonprofits 

Data disasters and how to avoid them

Measuring impact in nonprofits

AI in nonprofits