Introduction to Metadata
Return to GEMINI 2.3 home page
These guidelines cover the basics of metadata for geospatial data resources and services. They are intended for general use in the UK geographic information environment and serve as a preamble to the more detailed and specific guidelines Metadata Guidelines for Geospatial Data Resources - Part 2. They are primarily concerned with geospatial data (i.e. that which references data to a location on the surface of the Earth), and which has a limited geographic extent (i.e. is restricted to a defined territory). Services which provide access to such data are also in scope. The Guidelines have been developed within the context of the capture and maintenance of metadata for INSPIRE using the UK’s interpretation of the INSPIRE implementing rules, referred to as UK GEMINI2. However, they are sufficiently broadly based to be applicable in a wider context of geospatial metadata creation and management.
The Guidelines are aimed at data managers and creators of metadata, providers of metadata services and general data users. They include some guidance on quality management such that they could be used in the context of a national metadata service. A UK discovery metadata service forms part of data.gov.uk. As well as the human usable portal, this has a machine readable OGC “Catalogue Services for the Web” interface (OGC Capabilities document link).
The Guidelines have been revised during 2018 in the light of other developments, including revisions to UK GEMINI2. Any comments on these Guidelines or on UK GEMINI2 should be sent to firstname.lastname@example.org.
These guidelines for the creation, maintenance and quality management of metadata for geospatial data resources, including the services which provide access to the resources, provide a general introduction to the principles and concepts of metadata. They are aimed at data managers and creators of metadata, providers of metadata services and general data users.
The data resources may be datasets, dataset series, services delivering geographic data, or any other information resource with a geospatial content. This includes datasets that relate to a limited geographic area. The data resources may be graphical or textual (tabular or free text), hardcopy or digital. Geospatial data is data containing a positional or locational element relative to the Earth. Many data resources that at first sight do not appear to be geospatial nevertheless do have a geospatial component, in that they apply to a limited geographic area, for example statistics for a local authority area. Geospatial data contains spatial references which may take the form of coordinates, for example in latitude and longitude, or references to geographic place names, for example street data.
Metadata is data about data. It provides additional information about the data resource, to enable it to be better understood and used to good effect.
In an organisation, metadata is required for both internal and external purposes. It can add significantly to the value of an organisation’s data holdings, and failure to create it at the appropriate time can lead to greater hidden costs later. Lack of knowledge about available data can lead to costly duplication of effort.
Within an organisation, metadata is required to organise and maintain the internal investment in data. As staff change, institutional knowledge leaves the organisation. Undocumented data can lose its value. Subsequent staff may have little understanding of the contents and use of the data and may find that they cannot trust results generated using this data.
Externally, metadata is required to provide information about an organisation’s data holdings and data access services. Data resources are a major national asset, and information of what data resources exist within different organisations, particularly in the public sector, is required to improve efficiencies and reduce data duplication. Data catalogues and data discovery services enable potential users to find, evaluate and use that data, thereby increasing its value. In addition, data received from an external source requires further information, supplied by metadata, to process and interpret it.
The requirements for metadata for internal and external purposes are different. Metadata for external purposes will be at a general level, providing basic information about the data resources. Having this in a standardised form enables metadata services to be set up for widespread use. Standardisation of this aspect is thus important, and most metadata standards are designed for this purpose. Metadata for internal purposes will be much more detailed. Internal standards for metadata will generally be extensions of those required for external purposes, and often will contain items particular to the organisation or business sector. Whilst the details of these Guidelines relate to external metadata, and in particular the UK GEMINI2 metadata set designed for discovery metadata services, the principles are equally applicable to internal metadata.
A glossary of terms is provided in here.
There are a range of uses for metadata, for discovery, for evaluation and for use:
- discovery: the user aims to find out what resources are available and whether these are able to satisfy their requirements. This is typically what a search engine can process, using basic search criteria to identify the available resources corresponding to the user requirements, and providing to the user basic metadata (name, content description, geographic area of applicability etc.) about the candidate resources.
- evaluation: the user needs to go deeper in the metadata (e.g. looking at the quality of the information) to ascertain whether a candidate resource is fit for the intended purpose.
- use: the user has selected a candidate resource, but needs to access it and to configure a system or software to process it.
Metadata services provide one of the fundamental parts of a Spatial Data Infrastructure (SDI). They are used within organisations as part of the information management facilities, and on a national basis for discovery purposes. Essentially, they work on the basis of a user defining parameters such as Topic Category and Extent, to carry out a search to discover data resources that might be suitable and return information about their source, content and availability.
A simple process model is presented at Figure 1 and shows the flow from metadata creation through to metadata query with quality related processes built-in. It can represent the processes within one organisation or split between separate organisations. The flow is idealised and could relate to any discovery-level metadata service.
Although metadata creation and maintenance are shown as separate operations, it cannot be over-emphasised that they will be far more successful if they are fully integrated into the other business processes of an organisation. Thus, when a dataset or other form of data resource is produced, it should be routinely documented as metadata as part of the production or distribution process. Further, when the data resource is updated or subject to some form of change, the metadata should also be updated.
If the main purpose of creating metadata is to document and enable discovery of an organisation’s own data resources and the exposure of all or part of that metadata to some external service is secondary, then there will be far greater chance of support and resources for metadata in the business.
However, if metadata is seen as an additional activity to support some external service, it is likely to be ignored or forgotten. When it is finally picked up, it may well be assigned to someone who has no knowledge of the data resource and no interest in the quality of the metadata.
Three generic roles in the metadata process can be recognised, these are illustrated above:
- Metadata creator – responsible for creating and maintaining the metadata and for its quality. Although the metadata creator is usually the data producer or distributor, this is not always so.
- Service provider – runs the metadata service, and may be part of the same organisation as the metadata creator or quite separate. A broad view of the role is taken here which may be made up of several actual roles in the real world - for example:
- a contractor hosting the service and providing IT support who is contracted to
- a service owner who is ultimately responsible for the quality of the service and has service level agreements (SLAs) with the contractor and metadata creators.
- Service user - the consumer of the service who selects the search criteria matching their requirements, performs the searches and finds data resources meeting their requirements or, at least, meriting further investigation.
There are many metadata standards in existence. These have been produced at different times by different bodies for different purposes. The main ones that are relevant to geospatial data resources are:
- ISO 19115
This describes all aspects of geospatial metadata and provides a comprehensive set of metadata elements. It is designed for electronic metadata services, and the elements are designed to be searchable wherever possible. It is widely used as the basis for geospatial metadata services. However, because of the large number of metadata elements and the complexity of its data model, it is difficult to implement.
The INSPIRE metadata Implementing Rules defines the minimum set of metadata elements necessary to comply with the INSPIRE Directive. In essence it is a profile of ISO 19115 for discovery purposes. It allows a variety of possible implementations.
- UK GEMINI
The UK Geospatial Metadata Interoperability Initiative (GEMINI) was produced originally through a collaboration between the Association for Geographic Information (AGI), the e-Government Unit (eGU) and the UK Data Archive. In a subsequent revision, it was made compatible with the core elements of ISO 19115. In its current version it is conformant with the INSPIRE Implementing Rules and Technical Guidance for metadata, and designed to meet the requirements of INSPIRE in a UK context. This latest revision has been sponsored by Defra and managed by AGI. UK GEMINI remains an AGI publication.
In common with INSPIRE, UK GEMINI remains an implementation of ISO 19115:2005, not the more recent version ISO 19115-1:2014.
- Dublin Core
This was originally developed by librarians for cataloguing information resources. It uses free-text fields, which makes automatic searching difficult. Consequently, it not ideally suited for discovery purposes using electronic data services. It is severely limited in its ability to handle the geospatial aspects of data, and also in how it handles the geographic extent of non-geographic data, e.g. data that applies to one country or region rather than another.
However, Dublin Core provides the basic vocabulary used in W3C’s Data Catalogue Vocabulary (DCAT), on which the European Commission has based their GeoDCAT Application Profile.
At present, GEMINI makes no direct use of DCAT or GeoDCAT, but it does share some of the Dublin Core vocabulary.
Formal references for these standards and implementing rules are given in the Bibliography.
This section describes some general high-level rules that apply to the creation of metadata. Whilst primarily applying to UK GEMINI2, they are also relevant to other metadata specifications.
The aim of a discovery metadata such as specified in UK GEMINI2 is to define metadata in a form that provides easily understood information for potential users of the data resource, and is searchable in a computerised discovery metadata service such as the INSPIRE Geoportal. It is difficult to carry out searches on free text, since the same thing can be written in different ways, for example “OS”, “Ordnance Survey”, “The Ordnance Survey of Great Britain”, “OSGB”. Hence, free text entries are discouraged where the metadata is searchable. Where only a limited number of options is available, code lists are specified. For some metadata elements, such as Topic, these may not correspond exactly to the theme of the dataset, and the nearest option should be chosen. Where code lists are used, input and output facilities in a metadata service should identify the corresponding element value.
As with other data specifications, elements in UK GEMINI2 are given as mandatory, conditional or optional. Mandatory elements must be completed, and most software tools do not allow the metadata to be published if any mandatory elements are missing. Conditional elements have a condition associated with them, and should be supplied when that condition is fulfilled. Such conditions usually define the applicability of the element. Optional elements need not always be completed. This can be for several reasons, for example if the element is not relevant or its value is not known. In practice, optional elements are conditional, and there is a set of circumstances in which a value should be given, roughly corresponding to “is it relevant?” and “is its value known?” Optional elements should not be ignored.
Each metadata element has a range of allowable values, called its domain. These may be values in a given range (e.g. positive integers) or a set of code values chosen from a list. Specifying the domain helps with both the initial creation of data and quality checking [For further details on quality checking, see Part 3 of these Guidelines]. In some cases, default values may be used, for example for elements relating to the metadata itself.
By definition, geographic data contains some form of spatial reference. The spatial reference identifies the position of the object of interest in the real world. Spatial references can be given in a number of forms, not just as Grid References. They can take the form of the name or identifier of a geographic location which can be described in a gazetteer. Examples are property addresses, postcodes and census areas. These spatial references are a key means of searching the data by location, not only within the data resource, but also positioning the data resource in the world (e.g. data for Scotland). A consistent set of spatial references enables spatial searches to be made for datasets in a metadata service.
Metadata can contain a multitude of dates identifying the different stages in the life cycle of the data. Key dates are included in metadata. To enable searches to be carried out (e.g. date relating to the period 1990 to 1999), these dates need to be recorded in standardised form. Unfortunately, different standards are used in different places. Here it is recommended that the extended format defined in ISO 8601 (YYYY-MM-DD) is used.
The purpose of this section is to provide guidance on
(i) what types of geospatial data can be documented using UK GEMINI2,
(ii) the applicability of UK GEMINI2 and
(iii) how to find an appropriate level for the individual documentation of data resources.
UK GEMINI2 “specifies a set of metadata elements for describing geographic data resources” but provides no other information about what types of data or services are in or out of scope.
In practice it can be difficult to decide what data or service resources are in scope and how they should be documented. For example, what constitutes a suitable body of data for individual documentation? Should it be an individual map or the whole map series?
There is no simple answer to these questions; it is likely to be a compromise. However, some general guidance can be given.
The general nature of geospatial data and access services is described in the Introduction. As is emphasised there, many data resources that at first sight do not appear to be geospatial nevertheless have a geospatial component. They are geographically constrained in some way, in that the data only refers to certain areas or locations on the surface of the Earth, whether on land or sea. The way that these locations are referenced can take many forms such as coordinates (e.g. latitude and longitude, National Grid), or geographic place names (e.g. street, locality, town, administrative area).
Although it is common to think of geospatial data as being synonymous with maps, this is far from the case. Geospatial data can equally well be textual - whether tabular or free text – or images taken from the air or from the ground. The data can be as hardcopy or in digital or even video form.
All these types can be documented using UK GEMINI2.
The primary purpose of UK GEMINI2 is to provide the requirements for documenting data resources within the United Kingdom that conforms with the INSPIRE Metadata Implementing Rules.
There are no absolute rules for deciding on an appropriate level for the individual documentation of a data resource. The overriding consideration is that the data resource has been documented with sufficient granularity to yield a useful result if discovered using a metadata service. Too coarse a granularity will result in too generalised a result, too fine a granularity is likely to overwhelm the user (and the metadata creator!). The granularity can relate to geographical extent, temporal extent or subject.
There are some questions that can be posed which may help the metadata creator to find an answer.
- How is the data resource used and how is it made available? Is it a product, dataset, document that may be used and combined with other datasets or is it an integral part of a larger data resource?
- Has the data resource been captured using a single data specification? Are there other data resources captured using the same data specification?
- Is the data resource part of a time series? Is the data resource covering the same extent periodically updated to the same specification?
- Does the data resource relate to, or reference, a continuous area or contiguous areas, or does it reference specific locations?
- Does the data resource relate to one or many subjects, topics or themes?
An approach to resolving these questions is found in the table below.
The key points are:
- Finding the appropriate level will be a compromise between what appears to be fit for purpose for the user and the granularity which can be supported by the metadata creator.
- If the metadata creator has to aggregate what is documented for practical reasons then they should ensure that the resource is adequately documented in terms of Extent, Dataset reference date and Topic category. There should also an Additional Information Source included so that the service user can understand the finer granularity lying behind the single entry.
Nature of data resource
How to document
Stand-alone product or identifiable dataset or document
Individually with one metadata record.
Products or identifiable datasets or documents forming a series with a common specification;
Together with one metadata record.
Products or identifiable datasets or documents forming a series with a common specification;
Each location or area individually with a metadata record.
Last updated: July 2018
This work is licensed under a Creative Commons Attribution 4.0 International License