What measures can I take to reduce the risk of re-identification of individuals and groups before publishing data?

Question

Alexandru Artimon · Accepted Answer

Data on the characteristics of units of a population (e.g. individuals, households or establishments), collected through a census, survey or experiment, are referred to in statistics as ‘microdata’. In humanitarian response, this type of data is gathered through exercises such as household surveys, needs assessments and programme monitoring activities. Microdata make up an increasingly significant share of the data in the humanitarian sector, and will play a key role in the COVID-19 response.

In its raw form, microdata can contain both personal data and non-personal data on a range of topics. Most humanitarian organisations acknowledge the sensitivity of personal data such as names, biometric data, or ID numbers and anonymise data sets accordingly as a matter of standard practice. However, it is often still possible to re-identify individual respondents or groups by combining answers to different questions, even after such ‘anonymisation’ is applied.
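To illustrate how combining answers can single out a respondent, here is a minimal Python sketch. The records, variable names and helper functions are all hypothetical and invented for this example; they are not taken from any real survey or tool. It counts how many records share each combination of answers: a record whose combination appears only once is unique in the data set and therefore a candidate for re-identification.

```python
from collections import Counter

# Hypothetical survey records with direct identifiers (names, IDs) already removed.
records = [
    {"district": "North", "age": 34, "sex": "F", "household_size": 5},
    {"district": "North", "age": 34, "sex": "F", "household_size": 5},
    {"district": "North", "age": 61, "sex": "M", "household_size": 2},
    {"district": "South", "age": 34, "sex": "F", "household_size": 5},
    {"district": "South", "age": 29, "sex": "M", "household_size": 4},
    {"district": "South", "age": 29, "sex": "M", "household_size": 4},
]


def group_sizes(rows, key_vars):
    """For each row, count the rows that share its combination of key variables."""
    combos = [tuple(row[v] for v in key_vars) for row in rows]
    counts = Counter(combos)
    return [counts[c] for c in combos]


def unique_records(rows, key_vars):
    """Return the rows whose combination of key variables occurs exactly once."""
    sizes = group_sizes(rows, key_vars)
    return [row for row, size in zip(rows, sizes) if size == 1]


# On a single variable nobody stands out: every respondent shares their sex
# with at least two others.
print(min(group_sizes(records, ["sex"])))

# Combining three variables isolates two respondents.
for row in unique_records(records, ["district", "age", "sex"]):
    print(row)
```

In this toy data set no single answer is identifying on its own, but the combination of district, age and sex is enough to isolate two of the six respondents.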

Depending on the type of data you’re managing, there are various tools available to determine and reduce the risk of re-identification in the data. For microdata, one such approach is Statistical Disclosure Control (SDC).

SDC is a technique used to assess and lower the risk of a person or organisation being re-identified from the analysis of microdata. The purpose of applying disclosure control to microdata is to be able to share the data more widely in a responsible manner. An SDC process can lower the risk of re-identification to an acceptable level, but the acceptable risk threshold may vary depending on the context to which the data relates. There are a variety of free and open source tools available for conducting SDC, including sdcMicro. Read this guidance note from the Centre for Humanitarian Data for more information on how to start using SDC.
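The assess-then-lower cycle can be sketched in a few lines of Python. This is an illustration only, not sdcMicro's interface (sdcMicro is an R package): the records, the function names `at_risk` and `recode_age`, the threshold `k=2` and the 20-year band width are all assumptions chosen for the example. The sketch counts the records whose combination of key variables is shared by fewer than `k` respondents, replaces exact ages with age bands, and counts again.

```python
from collections import Counter

# Hypothetical records; exact age makes every respondent unique.
records = [
    {"district": "North", "age": 34, "sex": "F"},
    {"district": "North", "age": 37, "sex": "F"},
    {"district": "North", "age": 61, "sex": "M"},
    {"district": "North", "age": 65, "sex": "M"},
    {"district": "South", "age": 29, "sex": "M"},
    {"district": "South", "age": 33, "sex": "M"},
    {"district": "South", "age": 52, "sex": "F"},
]
keys = ["district", "age", "sex"]


def at_risk(rows, key_vars, k=2):
    """Count the rows whose key-variable combination occurs fewer than k times."""
    combos = [tuple(row[v] for v in key_vars) for row in rows]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] < k)


def recode_age(rows, width=20):
    """Replace each exact age with a band such as '20-39' (assumed band width)."""
    recoded = []
    for row in rows:
        low = (row["age"] // width) * width
        recoded.append({**row, "age": f"{low}-{low + width - 1}"})
    return recoded


print(at_risk(records, keys))              # 7: every record is unique on exact age
print(at_risk(recode_age(records), keys))  # 1: only the 52-year-old in South remains unique
```

Recoding trades detail for protection: analysts lose exact ages, but six of the seven records now share their combination with at least one other respondent. The record that remains at risk would need further treatment, for example wider bands or suppressing one of its values, and what counts as an acceptable `k` depends on the context.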
