Metadata and Data Quality
| January 9, 2021
The HDX team manually reviews every dataset uploaded to the platform as part of a standard quality assurance (QA) process. This process exists to ensure compliance with the HDX Terms of Service, which prohibit the sharing of personal data. It also serves as a means to check different quality criteria, including the completeness of metadata, the relevance of the data to humanitarian action, and the integrity of the data file(s).
If an issue is found, the resource(s) requiring additional review will be temporarily unavailable for download and marked as 'under review' on the dataset page in the public HDX interface.
| January 9, 2021
No. HDX will never make changes to the data that has been shared. We may add tags (see the full list of approved tags) or edit dataset titles to make your data more discoverable by HDX users. We may also add a data visualization for the data in the dataset showcase. A list of changes appears in the activity stream in the left-hand column of the dataset page.
| January 9, 2021
The green leaf symbol indicates that a dataset is up to date: the data in the dataset (not the dataset metadata) has been updated within the expected update frequency plus some leeway. For more information on the expected update frequency metadata field and the number of days a dataset qualifies as being fresh, see here.
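As a rough illustration of the rule described above, the check can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function name `is_fresh` and the day counts passed to it are hypothetical, and the actual thresholds HDX uses are the ones given in the documentation linked above.

```python
from datetime import date, timedelta


def is_fresh(last_data_update: date, expected_days: int,
             leeway_days: int, today: date) -> bool:
    """Return True if the data (not the metadata) was updated within
    the expected update frequency plus the leeway.

    expected_days and leeway_days are illustrative inputs, not
    HDX's real thresholds.
    """
    allowed_age = timedelta(days=expected_days + leeway_days)
    return today - last_data_update <= allowed_age


# A weekly dataset with an assumed 7-day leeway, last updated on 1 January 2021.
print(is_fresh(date(2021, 1, 1), 7, 7, today=date(2021, 1, 9)))
print(is_fresh(date(2021, 1, 1), 7, 7, today=date(2021, 1, 16)))
```

In this sketch the first call reports the dataset as fresh (8 days old against a 14-day allowance) and the second does not (15 days old).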
| January 9, 2021
This metadata field indicates how often you expect the data in your dataset to be updated. It should reflect the frequency with which you believe your data will change. This can be different from how often you check your data. It includes values like "Every day" and "Every year" as well as the following:
- Live - for datasets where updates are continuous and ongoing
- As needed - for datasets with an unpredictable, widely varying update frequency
- Never - for datasets with data that will never be changed
We recommend you choose the nearest less frequent regular value instead of "As needed" or "Never". This helps with our monitoring of data freshness. For example, if your data will be updated every 1 to 6 days, pick "Every week"; if every 2 to 9 weeks, choose "Every three months".
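The recommendation above amounts to rounding your longest expected gap between updates up to the next regular value. A minimal Python sketch of that idea follows. It lists only the regular values named in this FAQ, the day counts attached to them are assumptions made for illustration, and `suggest_frequency` is a hypothetical helper rather than part of any HDX tool.

```python
# Regular values named in this FAQ, paired with assumed lengths in days.
# The full list of values offered on HDX may be longer.
REGULAR_FREQUENCIES = [
    ("Every day", 1),
    ("Every week", 7),
    ("Every three months", 90),
    ("Every year", 365),
]


def suggest_frequency(longest_gap_days: int) -> str:
    """Return the nearest regular value that is no more frequent than
    the longest expected gap between updates."""
    if longest_gap_days < 1:
        raise ValueError("longest_gap_days must be at least 1")
    for label, days in REGULAR_FREQUENCIES:
        if longest_gap_days <= days:
            return label
    raise ValueError("gap is longer than the least frequent listed value")


print(suggest_frequency(6))   # data updated every 1 to 6 days
print(suggest_frequency(63))  # data updated every 2 to 9 weeks
```

With these assumed day counts, the two calls reproduce the examples in the text: "Every week" for a gap of up to 6 days and "Every three months" for a gap of up to 9 weeks.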
| January 9, 2021
Data quality is important to us, so we manually review every new dataset for relevance, timeliness, interpretability and comparability. We contact data contributors if we have any concerns or suggestions for improvement. You can learn more about our definition of the dimensions of data quality and our quality-assurance processes here.
| January 9, 2021
All data on HDX must include a minimum set of metadata fields. You can read our Guide to Metadata to learn more. We encourage data contributors to include as much metadata as possible to make their data easier to understand and use for analysis.