
On 23 August 2007, Chris Messina posted this message to Twitter:

“how do you feel about using # (pound) for groups. As in #barcamp [msg]?”

Messina didn’t invent the idea of using simple tags to identify topics, but this tweet started a chain of events that eventually led to Twitter supporting hashtags officially in 2009, and marked the moment when information tagging broke out from small, specialised technical communities to seize the public’s imagination. Seven years later, hashtags have become one of the main ways we connect online, not only around events (#barcamp), but also around sports teams (#realmadrid), places (#nairobi), politics (#IndiaPoli), social advocacy (#CARCrisis), special interests (#knitting), and even shared jokes (#CatsSaveTigers).

Hashtags and spreadsheets

The members of the HXL Working Group have focused on stripping layer after layer of complexity from the proposed Humanitarian Exchange Language (HXL) standard, until we finally asked ourselves whether labeling information in shared data could be as simple as tagging topics is in social media posts. We realized that the core problem — helping machines understand how to classify information — is the same in both cases. Consider these two (made-up) tweets:

“When I can’t be in Manchester, los Blancos’ll do for me.”

“Tengo muchas ganas de ver Real esta noche.” (“I really want to see Real tonight.”)

A sophisticated natural-language system might (or might not) be able to guess that these tweets are both about the Spanish football team Real Madrid, but adding hashtags makes it obvious and simple:

“When I can’t be in Manchester, los Blancos’ll do for me. #realmadrid”

“Tengo muchas ganas de ver #RealMadrid esta noche.”
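To make the point concrete, here is a minimal Python sketch (not part of HXL or of any Twitter API) that pulls hashtags out of a post with a regular expression. It lowercases each tag, which is an assumption of this sketch, so that #realmadrid and #RealMadrid are treated as the same topic. The function name `hashtags` is hypothetical.

```python
import re

# A hashtag is '#' followed by one or more word characters (Unicode-aware in Python 3).
HASHTAG = re.compile(r"#(\w+)")


def hashtags(text):
    """Return the set of hashtags in a post, lowercased so that case differences don't matter."""
    return {tag.lower() for tag in HASHTAG.findall(text)}


tweets = [
    "When I can’t be in Manchester, los Blancos’ll do for me. #realmadrid",
    "Tengo muchas ganas de ver #RealMadrid esta noche.",
]

# No language understanding needed: both posts carry the same (normalised) tag.
tagged = [t for t in tweets if "realmadrid" in hashtags(t)]
print(len(tagged))
```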

Coordinating the response to a humanitarian crisis might seem a long way from tweeting about a football game, but we run into the same kind of problem. Consider these spreadsheet excerpts:

Activité | Secteur | Organisation | Pays | Admin niveau 1
Train teachers/other educational personnel in life skills and psycho-social support | Education | UNICEF | Mali | Gao
Monitoring internal and cross-border movements of people (disaggregated by sex and age data), including the return movements of IDPs and refugees, in partnership with the government | Protection | UNHCR | Mali | Segou

Implementor | Region | Cluster | Project
Agronomes et Vétérinaires Sans Frontières | Gao | WASH | Reinvigorate / put in place the structures for managements of water points / network
OXFAM | Gao | WASH | Drinking water supply for the populations affected site

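These two shortened spreadsheets describe the same kind of information but, like the sample tweets earlier, they do not use the same words (or even the same language) for their column headers: the first uses French headers meaning “activity”, “sector”, “organisation”, “country”, and “administrative level 1”. So why not add a hashtag to each column?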

Activité | Secteur | Organisation | Pays | Admin niveau 1
#activity | #sector | #org | #country | #adm1
Train teachers/other educational personnel in life skills and psycho-social support | Education | UNICEF | Mali | Gao
Monitoring internal and cross-border movements of people (disaggregated by sex and age data), including the return movements of IDPs and refugees, in partnership with the government | Protection | UNHCR | Mali | Segou

Implementor | Region | Cluster | Project
#org | #adm1 | #sector | #activity
Agronomes et Vétérinaires Sans Frontières | Gao | WASH | Reinvigorate / put in place the structures for managements of water points / network
OXFAM | Gao | WASH | Drinking water supply for the populations affected site

With this small change, it’s now obvious how the columns in the two spreadsheets are related to each other, even though they use different languages and appear in a different order. Just as Twitter or Google+ can gather related postings that use the same hashtags, HXL-aware software can merge data from multiple sources and then visualise, validate, analyse, and summarise it.
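The Python sketch below illustrates the idea; it is not the HXL project's own tooling. It assumes each spreadsheet has been saved as CSV with the row of hashtags directly below the human-readable headers, takes the first row whose non-empty cells all start with “#” as the hashtag row, and returns each data row as a dictionary keyed by hashtag. The function names are hypothetical. Because rows are keyed by tag rather than by header text or column position, the two excerpts above line up without any manual mapping.

```python
import csv
import io


def find_hashtag_row(rows):
    """Return the index of the first row whose non-empty cells all start with '#'."""
    for i, row in enumerate(rows):
        cells = [c.strip() for c in row if c.strip()]
        if cells and all(c.startswith("#") for c in cells):
            return i
    raise ValueError("no hashtag row found")


def read_tagged_csv(text):
    """Parse CSV text into a list of dicts keyed by hashtag.

    Rows above the hashtag row (the human-readable headers) are ignored,
    untagged columns are dropped, and blank rows are skipped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    tag_index = find_hashtag_row(rows)
    tags = [c.strip() for c in rows[tag_index]]
    records = []
    for row in rows[tag_index + 1:]:
        if not any(c.strip() for c in row):
            continue
        records.append({tag: value.strip() for tag, value in zip(tags, row) if tag})
    return records


# The two excerpts: different header text, different languages, different column order.
sheet_a = """Activité,Secteur,Organisation,Pays,Admin niveau 1
#activity,#sector,#org,#country,#adm1
Train teachers/other educational personnel in life skills and psycho-social support,Education,UNICEF,Mali,Gao
"""

sheet_b = """Implementor,Region,Cluster,Project
#org,#adm1,#sector,#activity
Agronomes et Vétérinaires Sans Frontières,Gao,WASH,Reinvigorate / put in place the structures for managements of water points / network
OXFAM,Gao,WASH,Drinking water supply for the populations affected site
"""

# Because rows are keyed by hashtag, merging is just concatenation.
merged = read_tagged_csv(sheet_a) + read_tagged_csv(sheet_b)
for record in merged:
    print(record["#org"], "|", record["#adm1"], "|", record["#sector"])
```

A real parser would need to handle more cases (for example, two columns sharing the same hashtag, which this dictionary-based sketch would silently collapse), so treat it as a starting point rather than a reference implementation.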

The current list of standard HXL hashtags appears in the HXL tag dictionary. There are also some special conventions for applying tags to different types of data, described in the HXL tagging conventions.

What can you do with tagged data?

If we all start tagging our humanitarian data spreadsheets, what happens next?

On the simplest level, we can take a spreadsheet of humanitarian data from anywhere, without knowing anything about it in advance, and extract information from it. For example, a software application can note that four different values (“Education”, “Food Security”, “Protection”, and “Water Sanitation & Hygiene”) appear in the column tagged “#sector”, and, because of the tag, it knows that those values are the names of humanitarian sectors. Without any user intervention, the application can count how often each sector appears in the data and generate a chart, like the one illustrated above.
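As a rough sketch of that counting step, assuming the spreadsheet has already been turned into one dictionary per row keyed by hashtag, Python's standard `collections.Counter` is enough to tally the values under #sector and print a crude text chart. The helper names `count_values` and `text_chart` and the sample rows are invented for illustration.

```python
from collections import Counter


def count_values(records, tag):
    """Count how often each non-empty value appears under the given hashtag."""
    return Counter(r[tag] for r in records if r.get(tag))


def text_chart(counts):
    """Render a crude horizontal bar chart, most frequent value first."""
    width = max(len(value) for value in counts)
    return "\n".join(f"{value:<{width}} {'*' * n} ({n})" for value, n in counts.most_common())


# Hypothetical rows, already keyed by hashtag; the values are invented for illustration.
records = [
    {"#sector": "Education"},
    {"#sector": "Protection"},
    {"#sector": "Water Sanitation & Hygiene"},
    {"#sector": "Food Security"},
    {"#sector": "Education"},
    {"#sector": ""},  # empty cells are ignored
]

sector_counts = count_values(records, "#sector")
print(text_chart(sector_counts))
```

The code never looks at column headers or positions: the #sector tag alone tells it which values are sector names.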

A HXL-aware application can also use some hashtags (such as those for geography) to filter the data before analysing the values under other hashtags. For example, assume that a spreadsheet about refugee camps in Haiti has one column tagged “#adm3” (administrative level 3), one column tagged “#adm4” (administrative level 4), and one column tagged “#loctype” (location type). A HXL-aware application can use the #adm3 and #adm4 columns to filter the dataset, and then use the #loctype column to count the different camp-like locations in 2eme Varreux, Cité Soleil, as in the accompanying illustration.
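A minimal sketch of this filter-then-count pattern follows, assuming rows keyed by hashtag and, as in the example, #adm3 holding “Cité Soleil” and #adm4 holding “2eme Varreux”. The helper name `filter_records`, the location types, and the placeholder “Other commune” and “Other section” values are hypothetical; each row stands for one location.

```python
from collections import Counter


def filter_records(records, criteria):
    """Keep only the records whose value under every hashtag in `criteria` matches exactly."""
    return [r for r in records if all(r.get(tag) == value for tag, value in criteria.items())]


# Hypothetical location rows, one per site; the location types and the
# "Other ..." place names are invented for illustration.
records = [
    {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux", "#loctype": "Camp"},
    {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux", "#loctype": "Camp"},
    {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux", "#loctype": "Spontaneous site"},
    {"#adm3": "Cité Soleil", "#adm4": "Other section", "#loctype": "Camp"},
    {"#adm3": "Other commune", "#adm4": "Other section", "#loctype": "Camp"},
]

# Use the geographical hashtags to narrow the data, then summarise another hashtag.
in_area = filter_records(records, {"#adm3": "Cité Soleil", "#adm4": "2eme Varreux"})
by_type = Counter(r["#loctype"] for r in in_area)
for loctype, n in by_type.most_common():
    print(loctype, n)
```

Matching both #adm3 and #adm4 guards against lower-level place names that repeat in different higher-level areas.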

Of course, all of these analytics could be custom-written for different situations, but the benefit of HXL is that by adding intelligence to spreadsheets using tags, you get some of this analysis for free, across the whole humanitarian community.

To see these examples and many more live, please visit the #HXL showcase (we’ll be adding more humanitarian datasets, visualisations, and analysis every week). The next step, and perhaps the most valuable one, will be demonstrating how HXL tags allow data from multiple sources to be combined into a single common operating picture to help coordinate a crisis response. Tags alone will take us part of the way there, but the next section describes some of the new challenges that the HXL standards community will be addressing in 2015 and beyond.

Beyond tags

Adding simple hashtags to humanitarian data will bring huge benefits, but there will still be challenges for merging data from different sources. For example, if one spreadsheet has the value “WASH” under the column tagged “#sector”, and another spreadsheet has the value “Water Sanitation & Hygiene” under the same tag, how can a HXL-aware application know that both refer to the same sector when it merges the data? Are “United Nations Children’s Fund” and “UNICEF” the same organisation? Are “Ivory Coast” and “Côte d’Ivoire” different places? The answers to these questions are often obvious to humans, but not to software.

HXL also defines hashtags for columns that contain unique, machine-readable codes: for example, while “#sector” refers to the name of a sector or cluster, “#sector_id” refers to a unique code for a sector or cluster. However, someone, somewhere, needs to define those codes, and the humanitarian community has to agree on their use.
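One possible approach, sketched below in Python under loudly stated assumptions, is to map the names that appear under #sector onto codes from a shared list and store them under #sector_id, so that “WASH” and “Water Sanitation & Hygiene” end up with the same code. The code list, its codes, the agency names, and the helper `add_sector_id` are all made up; the real list would have to come from the kind of community agreement described above. Names the list does not recognise get no code, so a person can review them rather than have the software guess.

```python
# Hypothetical shared code list. Real codes would have to be agreed by the
# humanitarian community; these keys and codes are made up for illustration.
SECTOR_CODES = {
    "wash": "WASH",
    "water sanitation & hygiene": "WASH",
    "water, sanitation and hygiene": "WASH",
    "education": "EDU",
    "protection": "PRO",
}


def add_sector_id(record, code_list=SECTOR_CODES):
    """Return a copy of `record` with a #sector_id added when its #sector name is known.

    Names are matched case-insensitively after trimming whitespace; unknown names
    get no code, so a person can review them instead of the software guessing.
    """
    name = record.get("#sector", "").strip().lower()
    code = code_list.get(name)
    return {**record, "#sector_id": code} if code else dict(record)


# Two hypothetical sources that name the same sector differently.
source_1 = [{"#org": "OXFAM", "#sector": "WASH"}]
source_2 = [
    {"#org": "Agency X", "#sector": "Water Sanitation & Hygiene"},
    {"#org": "Agency Y", "#sector": "Shelter"},
]

merged = [add_sector_id(r) for r in source_1 + source_2]
same_sector = merged[0].get("#sector_id") == merged[1].get("#sector_id")
needs_review = [r for r in merged if "#sector_id" not in r]
print(same_sector, needs_review)
```

The lookup itself is trivial; the hard part, as the paragraph notes, is getting everyone to agree on the list.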

There are initiatives outside the HXL community working on some of these problems. For example, the International Aid Transparency Initiative (IATI) maintains a large set of code lists and identifiers for development aid, the BRIDGE project aims to create a global registry of identifiers for aid organisations, and OCHA’s Common Operational Datasets (CODs) include geographical codes down to a very local level for many countries. In future years, the HXL community will work with these organisations (and many others) to get agreement on the common codes, identifiers, and taxonomies needed for fully automated data sharing at a detailed level.

But that’s the future. For now, we are working with agencies such as UNHCR and IOM to introduce HXL hashtags into their data, and will soon provide more tools to help the larger humanitarian community create, manage, analyse, and visualise HXL-tagged data. If you’d like to experiment with tagging your own humanitarian data, please get in touch!

This blog post can also be found on the website of the Humanitarian Innovation Fund, which provides financial support for the HXL work.
