The purpose of these guidelines is to explain, with the aid of examples, how to encode UK GEMINI metadata using XSD schemas of ISO / TC 211. Examples are in the form of fragments of XML.
UK GEMINI metadata may be for a dataset, dataset series or a service. The encoding of all types is covered.
The description of GEMINI2 metadata items, their content, obligation and meaning is outside the scope of this document. Readers seeking this information should consult the GEMINI2 standard.
It is assumed that readers will be familiar with XML. Readers who require background information are referred to the W3Schools introduction to XML.
Readers requiring an introduction to XML Schemas are referred to the W3Schools XML Schema tutorial.
A Glossary is provided here.
The core section, Encoding Guidelines, is split into four principal sections
There is some overlap, and therefore duplication, between the sections. This approach was chosen because it clearly indicates the requirements for each type of metadata.
The 'Metadata for datasets and dataset series' and 'Metadata for services' sections list metadata items from GEMINI2 in the order that they appear in GEMINI2. XML elements in an XML document must follow the order expressed in the XSD schema to which the XML document conforms. The order of XML elements expressed by the XSD schema will not be the same as the order of metadata items in GEMINI2.
Examples are provided by way of XML fragments throughout the document. Full metadata instances are shown here:
This document contains examples of XML encoding. The examples are considered fragments in that they are not complete XML documents. The following conventions are used:
Figure 1 shows an example of an XML fragment. Note that it starts with the XML element gmd:MD_Metadata. The next XML element in order is gmd:fileIdentifier and its start-tag is on the next line and is tabbed in. The following line is the content of gmd:fileIdentifier.
An ellipsis follows the end-tag of the XML element gmd:fileIdentifier, indicating that other content is missing.
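As a rough illustration of these conventions, a fragment of the kind described might look like the sketch below. The identifier value is illustrative only, and the namespace declarations are left out for brevity.

```xml
<gmd:MD_Metadata>
  <gmd:fileIdentifier>
    <gco:CharacterString>98e25be5-388d-4be3-bc5f-ba07ef6009b2</gco:CharacterString>
  </gmd:fileIdentifier>
  ...
</gmd:MD_Metadata>
```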
Also deliberately omitted from XML fragments is:
In the example in Figure 1 CharacterString is an XML element in the namespace gco. It has the start-tag <gco:CharacterString> and the end-tag </gco:CharacterString>. The string 98e25be5-388d-4be3-bc5f-ba07ef6009b2 is the element’s content. The CharacterString element forms the content of another element: fileIdentifier. Its start-tag is <gmd:fileIdentifier> and its end-tag is </gmd:fileIdentifier>.
In the example below, names such as codeSpace are XML attributes. XML attributes are encoded in the start-tag of an element with the form name="value".
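A minimal sketch of an element carrying an attribute is given here. The gml:identifier element is used only because it takes a codeSpace attribute; both the attribute value and the element content are hypothetical.

```xml
<!-- codeSpace is an attribute, written inside the start-tag as name="value" -->
<gml:identifier codeSpace="https://example.org/ids">dataset-0001</gml:identifier>
```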
Note that there is no reason in practice to tab in XML and to present each XML element on a new line other than to aid humans in reading raw XML. XML parsers, on the other hand, would have no problem reading the XML were it encoded without carriage returns as shown in Figure 3.
The UK Location Information Infrastructure will accept any valid XML document that conforms to these guidelines. This includes canonical XML encodings, files laid out with additional white space for human readability, and other variants in between. Similarly, XML attribute values may be delimited using single or double quotes.
The schemas defining the structure of GEMINI2 metadata instances must implement:
Several schema sets have been identified as meeting these requirements:
Both 1 & 2 import the same dataset schemas, which are authoritatively at the ISO TC211 resource site: https://schemas.isotc211.org/schemas/19139/
The schema files that shall be used for validating GEMINI2 metadata instances are:
The AP-ISO schema files given above import both of these, so a validator does not need to determine what type of record it is validating.
With schema set 3, the dataset and service schemas have separate entry points: gmx/gmx.xsd and srv/srv.xsd
The XML schemaLocation property officially provides only a ‘hint’ to the validator, but many validators do use the schemas at that location.
The INSPIRE Geoportal Harvesting Checker at http://inspire-geoportal.ec.europa.eu/validator2/ emulates the checks performed by the INSPIRE geoportal during its harvesting process, including schema validation.
The INSPIRE Validator at http://inspire-sandbox.jrc.ec.europa.eu/validator/ includes metadata validation. Currently the schema validation uses the schemas hinted at in the schemaLocation property of the file.
Metadata instances are XML documents. XML documents should, but do not have to, begin with an XML declaration. If a metadata instance has an XML declaration then it must be the first line in the document. It must not be preceded by anything else, other than an invisible Unicode byte-order mark.
Figure 4 shows an XML declaration. The version attribute must always have the value 1.0. The encoding attribute is optional. Its value specifies which character encoding is in use in the document. By default (i.e. if the encoding attribute is omitted) XML documents are assumed to be encoded in the UTF-8 encoding of the Unicode character set. Care should be taken, when editing XML in text editing software or writing XML from bespoke software code, that the actual physical encoding of the XML matches the encoding stated in this attribute. It is expected that UTF-8 will be sufficient in nearly all cases.
An XML declaration may include a “standalone” attribute. However, this attribute is only relevant if an XML document is using a DTD. Metadata instances of GEMINI2 shall not use a DTD so it is out of scope.
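A declaration of the kind described would typically read as follows. This is a sketch that states the encoding explicitly and leaves out the standalone attribute; nothing other than an optional byte-order mark may come before it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
```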
The root element of a GEMINI2 metadata instance shall be gmd:MD_Metadata. The root element shall contain namespace references to, at least, gmd, gco, gml and xlink. Metadata for services shall, in addition, contain a namespace reference to srv. Reference may also be made to the gmx namespace if XML elements such as gmx:Anchor are used.
An example is shown in Figure 5. Subsequent examples omit the namespace references for brevity. An ellipsis is used to indicate that required content has been omitted.
The namespace identifier for gmd shall be: http://www.isotc211.org/2005/gmd
The namespace identifier for gco shall be: http://www.isotc211.org/2005/gco
The namespace identifier for srv shall be: http://www.isotc211.org/2005/srv
The namespace identifier for gmx shall be: http://www.isotc211.org/2005/gmx
The namespace identifier for xlink shall be: http://www.w3.org/1999/xlink
The namespace identifier for gml shall be: http://www.opengis.net/gml/3.2
Note that http://www.opengis.net/gml/3.2 refers to GML version 3.2.1 not GML version 3.2.0.
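Putting the namespace identifiers listed above together, a root element for a service record might be declared as in this sketch. The srv and gmx declarations are included on the assumption that the record is for a service and uses gmx:Anchor; a dataset record that uses neither could omit them.

```xml
<gmd:MD_Metadata
  xmlns:gmd="http://www.isotc211.org/2005/gmd"
  xmlns:gco="http://www.isotc211.org/2005/gco"
  xmlns:srv="http://www.isotc211.org/2005/srv"
  xmlns:gmx="http://www.isotc211.org/2005/gmx"
  xmlns:gml="http://www.opengis.net/gml/3.2"
  xmlns:xlink="http://www.w3.org/1999/xlink">
  ...
</gmd:MD_Metadata>
```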
The root element, and in fact any element in an XML instance, may have an attribute called xsi:schemaLocation which contains a value or set of values hinting at the physical location of schemas which may be used for validation. Since this attribute provides only a hint, validating parsers are allowed to ignore it and use other means of locating the relevant schemas.
Figure 6 shows a root element containing an xsi:schemaLocation attribute. Here the schemas referenced are in the INSPIRE Metadata XSD repository.
Since the xsi:schemaLocation attribute exists in the xsi namespace, this namespace must be referenced. The xsi:schemaLocation attribute contains a pair of space separated values when one schema is identified. The first value specifies the namespace and the second value specifies the schema to use to validate elements in that namespace. When more than one schema is identified, as would be the case for validating a service metadata instance, the attribute contains a space separated sequence of namespace / schema pairs. The xsi:schemaLocation attribute is not required in a GEMINI2 metadata instance.
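The namespace / schema pairing can be sketched as below for a service record. The two schema URLs on example.org are placeholders, not real schema locations, and should be replaced with the location of the schema set actually in use. The other namespace declarations are omitted for brevity.

```xml
<gmd:MD_Metadata
  xmlns:gmd="http://www.isotc211.org/2005/gmd"
  xmlns:srv="http://www.isotc211.org/2005/srv"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.isotc211.org/2005/gmd https://example.org/schemas/gmd/gmd.xsd
                      http://www.isotc211.org/2005/srv https://example.org/schemas/srv/srv.xsd">
  ...
</gmd:MD_Metadata>
```

Each namespace identifier is followed by the schema used to validate elements in that namespace, with white space separating every value.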
Dates and date-times shall be expressed in the Gregorian calendar and UTC as per ISO 8601. The formatting shall be as follows, in order of increasing precision:
The ISO 8601 encoding also allows negative dates to represent BC. However, gco:Date and gco:DateTime XML elements do not accept negative values.
The GEMINI2 standard states that temporal extents may be given with as coarse a granularity as century (e.g. yy or 19). However, this cannot be encoded in ISO 19139 XML and will result in a schema validation error. The coarsest granularity allowed is the year.
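The sketch below shows date values at increasing precision, on the assumption that gco:Date accepts a year, a year and month, or a full date, and that gco:DateTime carries a full date and time. The values themselves are illustrative.

```xml
<!-- Year only: the coarsest granularity the schema accepts -->
<gco:Date>2019</gco:Date>
<!-- Year and month -->
<gco:Date>2019-03</gco:Date>
<!-- Full date -->
<gco:Date>2019-03-01</gco:Date>
<!-- Date and time, with Z marking UTC -->
<gco:DateTime>2019-03-01T09:30:00Z</gco:DateTime>
```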
XML elements in a metadata instance must follow the order in which the elements are defined in an XSD schema. Failure to do so will result in schema validation errors. The order of XML elements and their corresponding GEMINI2 metadata items is shown here.
Some metadata items, such as alternative title, have a cardinality greater than one. This means that more than one instance of the item can be encoded in a metadata instance. The general approach in ISO 19139 XML is that an XML element expressing the property (gmd:alternateTitle in Figure 7) contains an XML element which expresses the data type and contains the value (in this case gco:CharacterString). Note that more than one alternative title is expressed by repeating the gmd:alternateTitle XML element, not the gco:CharacterString XML element (shown in an invalid example in Figure 8). This pattern is followed throughout ISO 19139 XML, including for XML elements that have complex content, such as gmd:identificationInfo (Figure 9).
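A minimal sketch of the valid pattern, with two hypothetical alternative titles, is given here. The property element is repeated and each occurrence holds exactly one gco:CharacterString.

```xml
<gmd:alternateTitle>
  <gco:CharacterString>First alternative title</gco:CharacterString>
</gmd:alternateTitle>
<gmd:alternateTitle>
  <gco:CharacterString>Second alternative title</gco:CharacterString>
</gmd:alternateTitle>
```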
The first XML child element of any GEMINI2 metadata instance shall be gmd:fileIdentifier. The content of this XML element is the identifier of the metadata instance. File identifier is not to be confused with the metadata item Resource Identifier.
The content of the XML element shall be a unique managed identifier, such as a system generated UUID. Once the identifier has been set for a metadata instance it shall not change.
External resources, such as publications and controlled vocabularies, are expressed using the ISO 19115 class CI_Citation and its XML element instance, gmd:CI_Citation. This is a common structure that is used to encode:
A citation must include at least a title, a date and a date type. Figure 11 shows the citation structure used to encode information about the GEMET Concepts dictionary.
In any one citation there may be more than one date. However, there shall be only one date with a date type of ‘creation’ and there shall be only one date with type 'revision'.
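The minimum citation content (title, date and date type) could be encoded along the lines of this sketch. The title and date are invented for illustration, and the codeList URL is a placeholder standing in for the location of the code list catalogue.

```xml
<gmd:CI_Citation>
  <gmd:title>
    <gco:CharacterString>Example controlled vocabulary</gco:CharacterString>
  </gmd:title>
  <gmd:date>
    <gmd:CI_Date>
      <gmd:date>
        <gco:Date>2019-03-01</gco:Date>
      </gmd:date>
      <gmd:dateType>
        <!-- Placeholder catalogue URL; the code list name follows the hash -->
        <gmd:CI_DateTypeCode codeList="https://example.org/codelists.xml#CI_DateTypeCode" codeListValue="publication">publication</gmd:CI_DateTypeCode>
      </gmd:dateType>
    </gmd:CI_Date>
  </gmd:date>
</gmd:CI_Citation>
```

A second date would be added by repeating the outer gmd:date element with a different date type.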
Addresses are expressed using the ISO 19115 class CI_ResponsibleParty and its XML element instance, gmd:CI_ResponsibleParty. This is a common structure that is used to encode:
In the context of GEMINI2 a responsible party set shall include at least the organisation name (encoded using gmd:organisationName), an email address (encoded using gmd:electronicMailAddress) and a role (encoded using gmd:role).
The XML element role takes values from the ISO 19115 codelist CI_RoleCode. Any value in the code list may be chosen.
Additionally, the contact position (encoded using gmd:positionName), the postal address (encoded using a combination of gmd:deliveryPoint, gmd:city, gmd:administrativeArea, gmd:postalCode and gmd:country), telephone number (encoded using gmd:voice) and facsimile number (encoded using gmd:facsimile) may be provided.
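A responsible party with the required items and a few of the optional ones might be encoded as sketched below. All names, numbers and addresses are hypothetical, the codeList URL is a placeholder, and the nesting of gmd:CI_Contact, gmd:CI_Telephone and gmd:CI_Address follows the ISO 19139 schema order as understood here.

```xml
<gmd:CI_ResponsibleParty>
  <gmd:organisationName>
    <gco:CharacterString>Example Mapping Agency</gco:CharacterString>
  </gmd:organisationName>
  <gmd:positionName>
    <gco:CharacterString>Data Manager</gco:CharacterString>
  </gmd:positionName>
  <gmd:contactInfo>
    <gmd:CI_Contact>
      <gmd:phone>
        <gmd:CI_Telephone>
          <gmd:voice>
            <gco:CharacterString>+44 0000 000000</gco:CharacterString>
          </gmd:voice>
        </gmd:CI_Telephone>
      </gmd:phone>
      <gmd:address>
        <gmd:CI_Address>
          <gmd:city>
            <gco:CharacterString>Exampletown</gco:CharacterString>
          </gmd:city>
          <gmd:electronicMailAddress>
            <gco:CharacterString>enquiries@example.org</gco:CharacterString>
          </gmd:electronicMailAddress>
        </gmd:CI_Address>
      </gmd:address>
    </gmd:CI_Contact>
  </gmd:contactInfo>
  <gmd:role>
    <!-- Placeholder catalogue URL -->
    <gmd:CI_RoleCode codeList="https://example.org/codelists.xml#CI_RoleCode" codeListValue="pointOfContact">pointOfContact</gmd:CI_RoleCode>
  </gmd:role>
</gmd:CI_ResponsibleParty>
```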
Where a sub-item takes its value from a code list, which may or may not be expressed in ISO 19115, the source code list catalogue and code list value shall be expressed using the attributes codeList and codeListValue respectively. These attributes are not namespace-qualified in ISO 19139 XML, so they are written without a prefix.
Figure 13 shows the encoding where a code list is specified in ISO 19115. The value of the codeList attribute should be the URL for the ISO 19115 code list catalogue that is published on the ISO website:
The URL is followed by a hash character acting as a delimiter, and then by the identifier of the code list that contains the code list value used, in this case ‘MD_ScopeCode’. This information could be used to validate the code list value and ensure that it is a member of the code list.
The value of the codeListValue attribute shall be a valid entry from the specified code list dictionary.
The element value (i.e. in Figure 13, <gmd:MD_ScopeCode ...>dataset</gmd:MD_ScopeCode>) is human readable text. It can be omitted or given a value different from that of the attribute codeListValue (e.g. Dataset). Developers of GEMINI-aware applications should rely on the value of the attribute codeListValue rather than on the element value of code list elements.
Figure 14 shows a fragment of the code list catalogue with the entries of MD_ScopeCode that are relevant to GEMINI2 metadata.
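A code list element of the kind discussed in this section can be sketched as follows. The catalogue URL before the hash is a placeholder, and the element value is deliberately capitalised differently from codeListValue to show that the two need not match.

```xml
<gmd:hierarchyLevel>
  <!-- Applications should read codeListValue, not the element text -->
  <gmd:MD_ScopeCode codeList="https://example.org/codelists.xml#MD_ScopeCode" codeListValue="dataset">Dataset</gmd:MD_ScopeCode>
</gmd:hierarchyLevel>
```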
The ISO 19139 XML schemas provide a means for indicating that the contents of an element may be unknown or withheld, through the use of the gco:nilReason attribute. This attribute can be added to any element in the gmd namespace. It can take the following values:
Empty XML elements (see Figure 15) are not permitted in ISO 19139 metadata instances. Although this is not checked by the “Table A” schematron rules in use in UK Location, those creating metadata records should avoid creating empty XML elements if at all possible. If an optional element is not required, do not include it; if a mandatory element is not available, use gco:nilReason.
The following metadata items shall not be nillable:
The content of a metadata instance may be expressed by value or by reference. By value means that the metadata instance carries all the necessary information. By reference means that a metadata instance indicates that content is to be found in an external repository or another place within the same instance. The by reference case is supported by the object reference (gco:ObjectReference) attribute group. This provides two mechanisms for referencing remote resources:
Figure 16 shows the use of the XLink href attribute to specify a vertical CRS by reference to the EPSG Geodetic parameter dataset while Figure 17 shows the same information encoded by value (note however, that in this case the domain of validity (gml:domainOfValidity), vertical coordinate system (gml:verticalCS) and vertical datum (gml:verticalDatum) are themselves encoded by reference).
Encoding information by reference is clearly advantageous in the sense that it is more efficient (in terms of file size but also avoiding data duplication) than by value. However, it presupposes that an XML software application will ‘know’ how to dereference the reference. Dereferencing is the act of obtaining the externally referenced information. It is also important that the referenced information is universally available in a structured machine readable form so that it can be incorporated by value. In the case of the examples below the EPSG web service endpoint can be used to dereference the EPSG URN to return the GML encoded vertical CRS. The GML can be directly incorporated in an XML metadata instance, where the metadata element accepts a GML value (noting that there will be a difference in the GML namespace identifier – EPSG returning GML 3.1.1 while metadata instances shall identify the GML 3.2.1 namespace – in the case of the CRS XML elements of GML there is no difference between these versions of GML).
Typically, by reference shall be used only for identifying the vertical CRS of a vertical extent and for the implementation of coupled resource (following INSPIRE guidelines). The XLink mechanism shall be used (see Figure 16 for vertical CRS and Figure 18 for coupled resource). Note that when encoding a coupled resource by reference, the uuidref attribute may also be used in addition to XLink. All other metadata items shall be implemented by value.
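The two by-reference cases could look like the sketch below. The EPSG code 5701 and the metadata URL on example.org are assumptions chosen for illustration, and the uuidref value is a hypothetical file identifier of the coupled dataset record.

```xml
<!-- Vertical CRS given by reference to an EPSG URN -->
<gmd:verticalCRS xlink:href="urn:ogc:def:crs:EPSG::5701"/>

<!-- Coupled resource: XLink reference, with the optional uuidref alongside -->
<srv:operatesOn
  xlink:href="https://example.org/metadata/98e25be5-388d-4be3-bc5f-ba07ef6009b2"
  uuidref="98e25be5-388d-4be3-bc5f-ba07ef6009b2"/>
```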
GML XML elements which are used in metadata have an optional gml:id attribute. The value domain of the identifier is referred to as XML name. XML names have certain restrictions. They may contain any alphanumeric character, non-English alphanumeric characters, ideograms and the underscore, hyphen and period. They may not contain any other punctuation characters. The colon is allowed, but its use is reserved for namespaces, so it cannot appear in an identifier. XML names may not include any whitespace including spaces and carriage returns. All names beginning with the letters XML (in uppercase, lowercase or any mixture thereof) are reserved (see  pages 18 and 19).
XML names may only start with letters, ideograms and the underscore character. Consequently, care must be taken when using the value of a UUID as the value of an identifier because these can begin with numeric characters. If using UUIDs as the basis of such an identifier best practice is to prefix the UUIDs with an underscore.
An identifier must be unique within the scope of any XML document in which the metadata record might occur, such as a result set from a CSW query, and not just within the metadata document itself. That is, there shall not be more than one id type attribute with a particular identifier value in any such document.
If an id type attribute contains an illegally formed XML name the result will be a schema validation error.
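As an illustration of the underscore prefix, a temporal extent whose gml:id is derived from a UUID might be written as in this sketch. The UUID and the dates are hypothetical; without the leading underscore the value would begin with a digit and would not be a legal XML name.

```xml
<gml:TimePeriod gml:id="_98e25be5-388d-4be3-bc5f-ba07ef6009b2">
  <gml:beginPosition>2018-01-01</gml:beginPosition>
  <gml:endPosition>2018-12-31</gml:endPosition>
</gml:TimePeriod>
```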
In GEMINI2 there are two ways of encoding free text. The basic element for providing text of unrestricted length with no internal XML structure is gco:CharacterString. This element is appropriate when the text does not refer to a specific external resource or registry. When the provided text is a term or code referring to an externally defined explanation or registry value, the gmx:Anchor element is recommended over gco:CharacterString. gmx:Anchor carries an additional attribute group that links the provided text to an external resource describing it. The most important of these attributes in this context is xlink:href, which contains the actual reference in Uniform Resource Identifier (URI) format.
For example, when the unique resource identifier is referenceable, the identifier/code value should be encoded with gmx:Anchor as in Figure 19, rather than with gco:CharacterString (Figure 20).
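The difference between the two encodings can be sketched as follows, using a hypothetical identifier on example.org.

```xml
<!-- Referenceable identifier: gmx:Anchor carries the link in xlink:href -->
<gmd:code>
  <gmx:Anchor xlink:href="https://example.org/id/dataset-0001">https://example.org/id/dataset-0001</gmx:Anchor>
</gmd:code>

<!-- Identifier with nothing to link to: plain text -->
<gmd:code>
  <gco:CharacterString>dataset-0001</gco:CharacterString>
</gmd:code>
```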
GEMINI contains two elements, 25 Limitations on public access and 26 Use constraints. These represent the two INSPIRE elements Limitations on public access and Conditions applying to access and use, and are therefore both encoded with ISO 19115 MD_LegalConstraints elements. INSPIRE requires that they are encoded in separate MD_LegalConstraints elements. This means that a GEMINI metadata record must contain at least two MD_LegalConstraints elements.
INSPIRE requires that both of these elements use MD_RestrictionCode = otherRestrictions. Because GEMINI element 26 is for use constraints, it makes sense for both elements to place this inside a useConstraints element. Having specified otherRestrictions, each shall then use one or more otherConstraints elements to specify the actual constraints.
For Limitations on public access, at least one of the otherConstraints elements shall use gmx:Anchor to indicate the kind of constraint, by reference to the appropriate entry in the INSPIRE registry (http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess). The free text within the anchor can provide human readable detail.
For Use constraints, if there are no conditions, or the conditions are unknown, then use gmx:Anchor to reference the appropriate entry in the INSPIRE registry (http://inspire.ec.europa.eu/metadata-codelist/ConditionsApplyingToAccessAndUse). Similarly, if the conditions are documented in a licence, use gmx:Anchor to reference the full licence text. The free text within the anchor can provide a human readable summary. If the conditions are not available at a URL, they can be entered as plain text; see Figure 22b.
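A Use constraints block for a resource with no conditions might be encoded along these lines. This is a sketch only: the registry entry name noConditionsApply is assumed to be the appropriate entry under the registry URL given above, and the codeList URL is a placeholder.

```xml
<gmd:resourceConstraints>
  <gmd:MD_LegalConstraints>
    <gmd:useConstraints>
      <!-- Placeholder catalogue URL -->
      <gmd:MD_RestrictionCode codeList="https://example.org/codelists.xml#MD_RestrictionCode" codeListValue="otherRestrictions">otherRestrictions</gmd:MD_RestrictionCode>
    </gmd:useConstraints>
    <gmd:otherConstraints>
      <gmx:Anchor xlink:href="http://inspire.ec.europa.eu/metadata-codelist/ConditionsApplyingToAccessAndUse/noConditionsApply">no conditions apply</gmx:Anchor>
    </gmd:otherConstraints>
  </gmd:MD_LegalConstraints>
</gmd:resourceConstraints>
```

Limitations on public access would go in a second, separate gmd:resourceConstraints / gmd:MD_LegalConstraints block.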
Figure 21 describes a commercial product, not available to the public for IPR reasons, and with a web page describing licences.
Figure 22a describes an open data product, with no limitations on public access, the Open Government Licence referenced, and summarised in plain text.
Figure 22b describes an open data product, with no limitations on public access, but plain text conditions of use.
XML elements for encoding metadata for datasets and series of datasets are drawn, primarily, from the gmd and gco namespaces and also the gml and xlink namespaces. Identification information is encoded using the gmd:MD_DataIdentification type (Figure 23).
Metadata instances may include more than one gmd:identificationInfo XML element. The first gmd:identificationInfo XML element in a GEMINI metadata instance for datasets or series shall have as its first and only child XML element gmd:MD_DataIdentification. The ISO 19115 hierarchyLevel element shall be set to “dataset” or “series”. For a series, ISO 19115 hierarchyLevelName element must also be set, to "dataset" or "series" as appropriate.
XML elements for encoding metadata for services are drawn from the gmd, gco, gml, xlink and srv namespaces. Identification information is encoded using the srv:SV_ServiceIdentification type (Figure 24).
Metadata may include more than one gmd:identificationInfo XML element. The first gmd:identificationInfo XML element in a GEMINI metadata instance for services shall have as its first child XML element srv:SV_ServiceIdentification. The ISO 19115 hierarchyLevel element shall be set to “service”. The ISO 19115 hierarchyLevelName element must also be set, to “service”.
The ISO 19119 class SV_ServiceIdentification includes two mandatory properties that are out of scope of GEMINI2 metadata. These are srv:couplingType and srv:containsOperations. Both shall be implemented with null values with the nil reason being missing (Figure 23).
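The two null-valued properties could be encoded as in the sketch below, with the rest of the service identification content elided. The position shown, with srv:couplingType before srv:containsOperations, reflects the schema order as understood here.

```xml
<srv:SV_ServiceIdentification>
  ...
  <srv:couplingType gco:nilReason="missing"/>
  <srv:containsOperations gco:nilReason="missing"/>
  ...
</srv:SV_ServiceIdentification>
```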
Last technical update: March 2019
This work is licensed under a Creative Commons Attribution 4.0 International License