We love building new tools and features to make the Humanitarian Data Exchange (HDX) more useful. However, as HDX has grown to now host over 17,000 datasets, we are mindful that we risk becoming a graveyard of broken features and stale data. As a result, we find ourselves more focused on the often overlooked and underappreciated task of maintenance. As the author Tom Robbins said, “There’s birth, there’s death and in between, there’s maintenance.”

A quick survey of the HDX team found that we spend 75% of our time on maintenance activities. When we talk about maintenance, we are focused on the following categories: 

  1. Datasets and data visualizations
  2. Technical infrastructure
  3. User requests 
  4. Internal processes 

Below, I provide an overview of what is involved in each area. 

1. Datasets and data visualizations

Datasets are the central “product” of HDX and they constitute the biggest overall investment in maintenance. This involves manual work to ensure that datasets shared by partners have complete and correct metadata. We also check that features like geopreview work on geospatial datasets and that Quick Charts are generated automatically when a dataset includes HXL hashtags. Sometimes we need to help a contributor change the format or schema of their data to make it more useful to users. 
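To make the HXL check concrete, here is a minimal sketch of how a QA tool might detect whether a CSV carries HXL hashtags. It assumes the HXL convention that hashtags (e.g. `#adm1`, `#affected`) sit in the row immediately after the column headers; the function name and sample data are illustrative, not HDX's actual code.

```python
import csv
import io

def has_hxl_hashtags(csv_text: str) -> bool:
    """Return True if the CSV's second row looks like an HXL hashtag row.

    By HXL convention, the hashtag row (e.g. #adm1, #affected) appears
    immediately after the column-header row.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return False
    # Every non-empty cell in the candidate tag row should start with '#'
    cells = [c.strip() for c in rows[1] if c.strip()]
    return bool(cells) and all(c.startswith("#") for c in cells)

sample = "Region,People in need\n#adm1,#affected\nNorth,1200\n"
print(has_hxl_hashtags(sample))  # True
```

A check like this is what lets features such as Quick Charts trigger automatically: if the hashtag row is present, the columns can be interpreted without human intervention.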

Every time an organization adds a new dataset or updates an existing one, we check each resource to make sure that no personal data is included. A couple of times a month or more, contributors share assessment data that contains no personal data but carries a risk of re-identification, given the location of the groups or people surveyed and their needs. These datasets need to be evaluated more closely and taken through a process for statistical disclosure control. In these cases, we contact the data contributor to get more information about the survey methodology. 

To keep up with this quality assurance (QA) workload, HDX’s Data Partnerships Team has a weekly duty roster in which one team member is responsible for evaluating all newly created or edited datasets and addressing issues that need to be resolved. This work has become complex enough that our programmers have developed a tracking system with a QA dashboard that everyone can follow. 

The release of our Data Grid feature, which shows which core datasets are available or missing for a given crisis, has added further maintenance work. Although we have automated some of the evaluations that determine if a dataset can be included in the Data Grid, such as checking if the dataset is up-to-date, many checks are subjective and require the attention of our Data Partnerships Team. 
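The up-to-date check is one of the evaluations that lends itself to automation. As a rough sketch (the function and parameters are illustrative, not the Data Grid's actual logic), a dataset can be considered fresh if its last update falls within its expected update frequency, plus an optional grace period:

```python
from datetime import date, timedelta

def is_fresh(last_modified: date, expected_update_days: int,
             today: date, grace_days: int = 0) -> bool:
    """One automatable Data Grid check: has the dataset been updated
    within its expected frequency (plus an optional grace period)?"""
    deadline = last_modified + timedelta(days=expected_update_days + grace_days)
    return today <= deadline

# A dataset expected to update every 30 days, last touched 45 days ago:
print(is_fresh(date(2019, 10, 1), 30, date(2019, 11, 15)))  # False
```

Subjective checks, such as whether a dataset actually covers the crisis it claims to, cannot be reduced to a rule like this, which is why human review remains part of the workload.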

Many of the visualizations you see on HDX are automatically updated based on the HXL hashtags in the datasets. In some cases, we develop custom visualizations and we need to make sure that the data stays up-to-date. We have also created a reminder that prompts us to update our extensive FAQs as the HDX platform evolves. 

2. Technical infrastructure

You have probably noticed that your phone and computer frequently download and install updates. Behind each update is the work of dozens or hundreds of coders, not just on the particular app that is upgraded, but across the many software libraries on which it relies. This complex web of dependencies, maintained by dispersed and disparate individuals and groups, enables faster software development by freeing the application’s programmers to concentrate on the code that makes their product unique. However, it can also resemble a house of cards in its fragility.

HDX is no different: we regularly update our code and the underlying infrastructure. Since launching HDX, we have made 638 updates to the platform, which is just one of many applications we maintain. Sometimes an update breaks something, necessitating a quick response from our software development team to determine the cause and implement a fix. More significant upgrades are planned and tested in advance on separate instances of HDX where we can identify problems and resolve them before releasing a new version. The upcoming January 2020 end-of-life for the Python 2 programming language will trigger this sort of upgrade once the community supporting CKAN, the platform on which HDX is built, delivers a Python 3-based version.

HDX also relies on the information systems of others. We have automated processes that pull data from other organizations’ databases, such as the World Bank, the Armed Conflict Location & Event Data Project, and the World Food Programme. If something changes in one of these systems, we have to adapt our code accordingly. 
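When pulling from someone else's system, a useful defensive habit is to validate the shape of the incoming records so that an upstream schema change fails loudly in our pipeline instead of silently ingesting incomplete data. A minimal sketch, assuming a hypothetical expected field set (the field names here are made up for illustration):

```python
# Hypothetical schema we expect from an upstream API; if the provider
# renames or drops a field, we want an error, not silent data loss.
EXPECTED_FIELDS = {"country", "indicator", "value", "date"}

def validate_records(records: list) -> list:
    """Raise ValueError if any upstream record is missing an expected
    field, so a schema change upstream surfaces immediately."""
    for i, rec in enumerate(records):
        missing = EXPECTED_FIELDS - rec.keys()
        if missing:
            raise ValueError(
                f"record {i}: upstream schema changed, "
                f"missing fields {sorted(missing)}"
            )
    return records
```

A check like this turns "something changed in their system" into a specific, actionable error message, which is most of the battle when adapting the ingestion code.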

3. User requests

There are several ways for users to communicate with us, and registered users can also communicate with each other (for example, with the contributor of a dataset or the administrator of an organization on HDX). We respond to all user requests and make sure that registered users follow up with each other so that issues get resolved swiftly. This is especially true for the HDX Connect feature, which allows organizations to share only a dataset’s metadata; registered users can then request access to the underlying data. We make sure these requests are taken care of. 

Whether we are contacted by humanitarians, journalists, academics or students, we remain on standby to reply with the desired information. 

4. Internal processes

All of the activities listed above have associated procedures that we try to document in detail. This helps ensure that different members of the HDX team are able to complete tasks in a consistent way. It also provides a way for work to continue smoothly if someone has to take over an unfamiliar task. Similarly with software, good documentation is invaluable to the person charged with modifying a piece of code they didn’t write or haven’t looked at in a long time.

All our procedures, documentation, and our software must be updated as our work evolves. We aim to maintain the right balance of storing institutional knowledge in our heads (which is efficient but fragile) and in documents (which is time consuming, but durable).

_ _ _ 

While it is tempting to concentrate on new features and challenges, the effort focused on upkeep makes for a better experience for our users, and, we hope, will lead to more use of data in humanitarian action. Is there somewhere on the site that requires more maintenance? Get in touch at hdx@un.org.