In April, a team of developers, data scientists and user experience experts descended on the Centre for Humanitarian Data in The Hague to build a new tool. We are calling it ‘Data Check’ and it is the next installment in the HDX Tools domain. The previous tools include Quick Charts and HXL Tag Assist.

If you work with humanitarian data, it won’t be long before you receive a 3W activity list that you want to check for errors, or a long set of survey results that you need a little help cleaning and correcting. Data Check can speed up that process, just as spell check speeds up finding errors in written reports. Drop your humanitarian data into the tool and get a report back highlighting the potential errors.

The Data Check tool highlighting errors in a sample activity dataset.

How does this magic work? HXL!

The Humanitarian Exchange Language (HXL) is a simple standard for messy data. The HDX Tools domain uses HXL to improve data processing. Once you have added HXL hashtags to your dataset, Data Check will know what kind of information to expect in each column and can check it for errors.
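
To make that concrete, here is a minimal sketch of what a HXL-tagged dataset looks like and how a program might pick up the hashtags. In HXL, the hashtag row sits between the human-readable headers and the data. The sample rows, organisation names and codes below are invented for illustration, and `read_hxl` is a hypothetical helper written with the Python standard library only. It is not how Data Check itself is implemented.

```python
import csv
import io

# Invented sample data: a header row, a HXL hashtag row, then the records.
SAMPLE = """Organisation,Sector,Province,Province code,People affected
#org,#sector,#adm1+name,#adm1+code,#affected
Org A,WASH,Coast,XX01,1200
Org B,Health,Plains,XX02,850
"""


def read_hxl(text):
    """Return (hashtags, records) from CSV text that contains a hashtag row.

    The hashtag row is taken to be the first row whose non-empty cells
    all start with '#'. Records are dicts keyed by hashtag.
    """
    rows = list(csv.reader(io.StringIO(text)))
    for i, row in enumerate(rows):
        cells = [cell.strip() for cell in row]
        non_empty = [cell for cell in cells if cell]
        if non_empty and all(cell.startswith("#") for cell in non_empty):
            records = [dict(zip(cells, data_row)) for data_row in rows[i + 1:]]
            return cells, records
    raise ValueError("no HXL hashtag row found")


tags, records = read_hxl(SAMPLE)
print(tags)
print(records[0]["#adm1+name"], records[0]["#affected"])
```

Because each record is keyed by hashtag rather than by the header text, a check written for `#adm1+code` works no matter what the column is called in a given spreadsheet.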

The initial version of Data Check can perform several validations on your data, including:

  • Comparing administrative division codes (#adm1+code) and place names (#adm1+name) against official sources (which are included in the COD Services).
  • Checking for consistent data types (string, number, date).
  • Identifying outlier values.
  • Comparing values against your own custom lists of acceptable values.
  • Checking for spelling consistency of the values in a column.
  • Identifying leading and trailing whitespace.
  • Ensuring values match the data type their hashtag implies (#date, #affected, etc.).
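
For readers who like to see the idea in code, several of the checks listed above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function names are hypothetical, the outlier rule (distance from the median measured in median absolute deviations, with an arbitrary threshold `k`) is one common choice rather than the method Data Check uses, and the spelling check simply groups values that differ only in letter case or spacing.

```python
import statistics
from collections import defaultdict


def whitespace_issues(values):
    """Indexes of values with leading or trailing whitespace."""
    return [i for i, v in enumerate(values) if v != v.strip()]


def non_numeric(values):
    """Indexes of non-empty values that do not parse as numbers."""
    bad = []
    for i, v in enumerate(values):
        text = v.strip()
        if not text:
            continue  # empty cells are not treated as type errors here
        try:
            float(text)
        except ValueError:
            bad.append(i)
    return bad


def outliers(numbers, k=5.0):
    """Indexes of numbers more than k median absolute deviations from the median."""
    if not numbers:
        return []
    median = statistics.median(numbers)
    mad = statistics.median([abs(n - median) for n in numbers])
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, n in enumerate(numbers) if abs(n - median) > k * mad]


def not_in_list(values, allowed):
    """Indexes of values missing from a custom list of acceptable values."""
    return [i for i, v in enumerate(values) if v not in allowed]


def spelling_variants(values):
    """Group values that differ only by letter case or spacing."""
    groups = defaultdict(set)
    for v in values:
        groups[" ".join(v.split()).casefold()].add(v)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


# Invented sample column values.
affected = ["1200", " 850", "n/a", "900 "]
print(whitespace_issues(affected))
print(non_numeric(affected))
print(outliers([10, 12, 11, 13, 12, 500]))
print(not_in_list(["WASH", "Health", "Helth"], {"WASH", "Health"}))
print(spelling_variants(["Health", "health", "WASH", "Health "]))
```

Each function returns row positions (or groups of variants) rather than changing the data, which mirrors the spell-check idea: the tool points at likely problems and leaves the decision to you.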

You can submit your data for validation as CSV, XLS, or XLSX, either by file upload or from any publicly accessible URL, such as a Dropbox or Google Drive link. In particular, syncing your local file to Dropbox and having Data Check access it there is a great way to work: Data Check will reflect your edits with a simple refresh of the page.

A couple of other things to keep in mind: 1) you don’t need to be a registered user to access Data Check or any of the HDX Tools, and 2) you should not use Data Check to clean data with personal attributes or personally identifiable information.

We want to hear from you

This is just the beginning. We have many ideas for improvements and new validation types. Do you need help with cleaning merged cells? Or correcting your data directly in the Data Check tool? What about automatically detecting the type of data in use (3W, assessments, facilities lists, etc.)?

Let us know by taking the following steps:

  • Add HXL hashtags to your data (get in touch if you need help!).
  • Use Data Check to clean your data.
  • Tell us what to improve or add next.

Learn more about Data Check features in the introductory slides below, or just try it!