The Humanitarian Exchange Language (HXL) is a simple data standard developed collaboratively by representatives from 15 humanitarian organisations and released in March 2016. HXL’s goal is to improve data quality, automation, and interoperability without placing a major burden on the people and organisations who provide humanitarian data (see Simon Johnson’s blog post “How HXL is being used at the British Red Cross” for an example of these benefits in practice). The standard continues to be guided by an informal working group.

We recently analysed the 2,359 HXL data files available on the Humanitarian Data Exchange (HDX) to see how often each HXL hashtag, attribute, and hashtag+attribute combination appears. Where the same hashtag appears on several columns of one dataset, we counted each column separately. These datasets represent contributions since late 2014 by 13 different organisations, and cover 233 countries and other locations. The full results are in the publicly readable Google Sheet “HXL hashtag stats on HDX.”
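For readers who want to try a similar count on their own files, here is a minimal Python sketch of the tallying step. It is an illustration under stated assumptions, not the script used for this analysis: the function names (parse_tagspec, count_tags) are hypothetical, each dataset is assumed to be already reduced to its HXL hashtag row, and hashtags are lower-cased with attributes sorted so that differently ordered attributes count as the same combination.

```python
from collections import Counter


def parse_tagspec(spec):
    """Split a tag spec such as '#affected +f +children' into
    ('#affected', ('children', 'f')). Hypothetical helper."""
    parts = spec.strip().lower().replace(" ", "").split("+")
    hashtag = parts[0]
    # Assumption: attribute order is not significant, so sort for a stable key.
    attributes = tuple(sorted(a for a in parts[1:] if a))
    return hashtag, attributes


def count_tags(hashtag_rows):
    """Tally hashtags, attributes, and hashtag+attribute combinations.

    hashtag_rows: one list of tag specs per dataset (its HXL hashtag row).
    Every tagged column counts, so a hashtag repeated on several columns
    of the same dataset is counted once per column.
    """
    hashtags, attributes, combos = Counter(), Counter(), Counter()
    for row in hashtag_rows:
        for cell in row:
            if not cell.strip().startswith("#"):
                continue  # untagged column
            tag, attrs = parse_tagspec(cell)
            hashtags[tag] += 1
            attributes.update("+" + a for a in attrs)
            combos[tag + "".join("+" + a for a in attrs)] += 1
    return hashtags, attributes, combos


# Usage with two made-up hashtag rows (not real HDX data)
rows = [
    ["#country+origin", "#date+year", "#affected+f", "#affected+m", ""],
    ["#affected", "#country +origin"],
]
hashtags, attributes, combos = count_tags(rows)
print(hashtags.most_common())
print(attributes.most_common())
print(combos.most_common())
```

Keeping three separate counters mirrors the three rankings reported below: a column tagged #affected+f adds one to #affected, one to +f, and one to the combination #affected+f.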

We hoped that knowing the hashtags and attributes that data contributors use most commonly would help us understand more about the humanitarian-data ecosystem, and the opportunities for interoperability and analysis.

The top five HXL hashtags were #affected (10,124×), #country (4,137×), #date (3,312×), #meta (metadata, 2,205×), and #loc (a specific location, such as a health facility or refugee camp, 1,572×).

The high frequency of the #affected hashtag suggests a strong focus on data that describes the human impact of a crisis. The focus on #country is not surprising, given that humanitarian organisations often organise themselves and their responses by country. The absence from the top five of subnational geographical hashtags like #adm1 (an administrative level-one subdivision, such as a province or governorate) is a concern, however: it points to a need for more granular, subnational data that can inform an international or local response. That said, the frequency of the #loc hashtag does point to geographically specific data of a different type.

HXL attributes modify hashtags to make them more specific. The top five attributes were +f (applies to female people, 2,840×), +m (applies to male people, 2,840×), +children (applies to children, 2,132×), +total (represents a total number, 1,963×), and +origin (describes an affected group’s place of origin, 1,950×). The first three attributes all point to a focus on Sex- and Age-Disaggregated Data (SADD), especially for disaggregating numbers of people affected and in need of assistance.
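To illustrate how attributes narrow a hashtag, the short Python sketch below checks whether a column’s tag satisfies a pattern: the hashtag must be the same, and every attribute in the pattern must be present on the column, in any order. The helper names (split_tag, matches) are hypothetical and written only for this example; they are not taken from any HXL library.

```python
def split_tag(spec):
    """Split '#affected+f+children' into ('#affected', {'f', 'children'})."""
    parts = spec.lower().replace(" ", "").split("+")
    return parts[0], {a for a in parts[1:] if a}


def matches(column_tag, pattern):
    """True if the column has the pattern's hashtag and all of its attributes.

    Extra attributes on the column are allowed, so the bare pattern
    '#affected' matches every #affected column, however specific.
    """
    col_tag, col_attrs = split_tag(column_tag)
    pat_tag, pat_attrs = split_tag(pattern)
    return col_tag == pat_tag and pat_attrs <= col_attrs


# Usage: pick out the columns that describe affected children
columns = ["#affected+f+children", "#affected+m+children", "#affected+total", "#country+origin"]
children_columns = [c for c in columns if matches(c, "#affected+children")]
print(children_columns)
```

Treating attributes as a set means a consumer can ask for the general case (#affected) or a narrower one (#affected+f) without knowing exactly how each contributor combined the attributes.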

The top five hashtag+attribute combinations were #country+origin (country of origin [for displaced people], 1,950×), #date+year (1,948×), #country+residence (current country of residence, 1,098×), #affected (with no attributes, 1,040×), and #country+asylum (country of asylum, 850×). These combinations reinforce the earlier evidence that much of the existing HXL-tagged humanitarian data focuses on the national level and an annual timeframe. Data at that level is useful for spotting long-term geopolitical trends, but is often not granular enough to support an aid response in progress.

Does your organisation have any data you would like to share with HXL hashtags added? Would you like to join the HXL conversation? Please join us on the HXL community mailing list, hxlproject@googlegroups.com.