Data from household surveys, needs assessments and other forms of microdata make up an increasingly significant volume of data in the humanitarian sector. This type of data is critical to determining the needs and perspectives of people affected by crises but it also presents unique risks. Understanding how to assess and manage the sensitivity of this data is essential to ensuring its safe, ethical and effective use in different response contexts.

In 2019, the Centre conducted user research to better understand how humanitarian organisations manage and share microdata, exploring both their internal processes and their experience sharing data on HDX. We found that organisations collecting microdata do not always have the in-house expertise to assess the disclosure risk of this type of data or to use methods, such as Statistical Disclosure Control, to minimise those risks. Without a support system in place to facilitate data anonymisation, organisations often do not share this valuable data.

In this blog, we summarise developments in three areas that we hope will help our team and our partners manage microdata more responsibly:

  1. Improvements to the HDX data contributor flow
  2. A new ‘learning path’ on disclosure risk assessment
  3. Documentation on tools and methods for sensitive data management

1. Improvements to the HDX data contributor flow

Earlier this year, we announced our plans to improve the management of sensitive data on HDX. We have since conducted user research with our collaborators at Oblo Design to explore possible solutions with our community.

Since HDX was created in 2014, our team has manually reviewed every dataset uploaded to the platform as part of a standard quality assurance (QA) process. This has included basic checks on the sensitivity of each resource (a data file). In the past, when we found microdata, we would run a disclosure risk assessment on the resource and inform the data contributor if we detected a potential disclosure risk.

During our research, we found that users were either unaware of this process or confused and frustrated because the work we were doing was not adequately communicated or explained. As part of our improved process, when a new resource is added to HDX, the contributor will be asked whether the data could contain personally identifiable information (PII) or microdata and will then be taken through one of the following steps:

  • If the contributor selects PII, then a message will appear saying that the dataset cannot be uploaded since this type of data is not allowed on HDX.
  • If the contributor selects microdata, then a message will appear explaining that the dataset will be placed under review while the HDX team conducts a disclosure risk assessment. This review will take place within 24 hours and the results will be shared by email.
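The branching described in this list can be sketched in a few lines of Python. This is only an illustration of the logic, not the code that runs on HDX: the `Sensitivity` enum, the `contributor_flow` function, the status labels and the handling of resources flagged as neither PII nor microdata are all assumptions made for the example.

```python
from enum import Enum


class Sensitivity(Enum):
    """What the contributor says the resource could contain (hypothetical names)."""
    NONE = "none"
    PII = "pii"
    MICRODATA = "microdata"


def contributor_flow(selection: Sensitivity) -> dict:
    """Return the outcome of the upload step for a given contributor selection."""
    if selection is Sensitivity.PII:
        # PII is not allowed on HDX, so the upload stops here.
        return {
            "accepted": False,
            "status": "rejected",
            "message": "This type of data is not allowed on HDX.",
        }
    if selection is Sensitivity.MICRODATA:
        # Microdata is held while a disclosure risk assessment is carried out.
        return {
            "accepted": True,
            "status": "under_review",
            "message": "Under review; results will be shared by email within 24 hours.",
        }
    # Assumption: anything else goes through the standard QA checks only.
    return {"accepted": True, "status": "standard_qa", "message": "Uploaded."}


if __name__ == "__main__":
    for choice in Sensitivity:
        print(choice.value, "->", contributor_flow(choice)["status"])
```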

We have created a new page documenting the HDX QA Process to help our users better understand the actions we take to support responsible data exchange.

Our new page introducing the HDX QA Process.

2. A new learning path on disclosure risk assessment

Our team will continue to assess the disclosure risk of any microdata shared on HDX but those closest to the data collection process will always have the most in-depth knowledge of the context and are therefore best placed to assess the disclosure risk. Nevertheless, the breadth of our experience in assessing humanitarian microdata has provided us with valuable insight that we want to share.

We have developed an introduction to disclosure risk assessment that includes a step-by-step guide, a series of short videos and links to additional resources. Soon we will add a technical tutorial to walk users through the process of conducting a risk assessment and applying disclosure control techniques using the sdcMicro package in R.
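To give a flavour of what a disclosure risk assessment measures, here is a minimal sketch of one widely used indicator, k-anonymity: a record is considered at risk when fewer than k records share its combination of quasi-identifiers (for example location, age group and sex). The sketch is in plain Python rather than sdcMicro, and the function name, the threshold of k = 3, the column names and the sample records are all invented for illustration. It does not describe the exact checks our team runs.

```python
from collections import Counter


def k_anonymity_report(records, quasi_identifiers, k=3):
    """Flag records whose quasi-identifier combination occurs fewer than k times."""
    # Build one key per record from the chosen quasi-identifier columns.
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    # A record is at risk if its combination is shared by fewer than k records.
    at_risk_rows = [i for i, key in enumerate(keys) if counts[key] < k]
    return {
        "smallest_group": min(counts.values()) if counts else 0,
        "at_risk_rows": at_risk_rows,
        "share_at_risk": len(at_risk_rows) / len(records) if records else 0.0,
    }


# Synthetic example data; not drawn from any real survey.
SAMPLE = [
    {"district": "A", "age_group": "18-29", "sex": "F"},
    {"district": "A", "age_group": "18-29", "sex": "F"},
    {"district": "A", "age_group": "18-29", "sex": "F"},
    {"district": "B", "age_group": "60+", "sex": "M"},
    {"district": "B", "age_group": "30-59", "sex": "F"},
]

if __name__ == "__main__":
    print(k_anonymity_report(SAMPLE, ["district", "age_group", "sex"], k=3))
```

In this toy dataset the two respondents in district B each have a unique combination of values, so they would be flagged. A dedicated tool such as sdcMicro goes further, with additional risk measures and techniques for reducing the risk once it is found.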

The goal of these resources is to provide an accessible on-ramp to what can be a dense subject and to build a common understanding of how disclosure risk assessment applies in humanitarian response.

3. Documentation on tools and methods for sensitive data management

We have published documentation on the tools and methods that we use on HDX for statistical disclosure control and for data loss prevention (automated screening for sensitive information). We will continue to update this documentation based on what we learn and will explore additional topics of interest based on demand from our community.

Thanks and next steps

We would like to acknowledge the support from the Directorate-General for European Civil Protection and Humanitarian Aid Operations, the Government of the Netherlands, and the United Kingdom Foreign, Commonwealth and Development Office’s COVIDAction programme. We would also like to thank our partners at Ground Truth Solutions, IOM, JIPS, MSF, OHCHR, REACH, UNESCO, UNHCR, UNICEF, and WHO for participating in the user research that informed this work.

To learn more about how the Centre and our partners are working together to advance safe, ethical, and effective data management, join us on December 17th for a High-Level Event on Data Responsibility in Humanitarian Action.

Let us know if you have questions or would like to share your own experience in managing sensitive data responsibly by contacting us at