Introduction

This document is for all those interested in promoting efficient exchange and re-use of multi-media news content within their own organisations and with information partners, using open standards and technologies.

Whilst a good deal of the content of this document is aimed at technical architects and software writers, business influencers and decision-makers are encouraged to read the Executive Summary, which gives a broadly non-technical justification for the use of IPTC Standards, and how they may be applied to solve the real-world issues of all organisations that create or consume news.

Purpose and Audience

The Guidelines are intended to provide implementers of the NewsML-G2 Standards with a thorough knowledge of the XML data structures used to manage and describe content, and an appreciation of the issues involved in implementing the standards in their organisation, whether they are a content provider, content customer, or software vendor.

Terms of Use

Copyright © 2017-18 IPTC, the International Press Telecommunications Council. All Rights Reserved.

The Guidelines are published under the Creative Commons Attribution 4.0 license - see the full license agreement at http://creativecommons.org/licenses/by/4.0/. By obtaining, using and/or copying this document, you (the licensee) agree that you have read, understood, and will comply with the terms and conditions of the license.

This document uses materials that are either in the public domain or are made available by permission of their respective copyright holders. Permission of the copyright holder must be obtained prior to the use of protected material. All materials of this IPTC standard covered by copyright shall be licensable at no charge.

If you do not agree to the Terms of Use, you must cease all use of the NewsML-G2 specifications and materials now. If you have any questions about the terms, please contact the Managing Director of the International Press Telecommunications Council (contact details below).

While every care has been taken in creating this document, it is not warranted to be error-free, and is subject to change without notice. Check for the latest version and applicable NewsML-G2 Standards and Documentation by visiting www.newsml-g2.org/doc. The versions of the NewsML-G2 Standards covered by this document are listed in About the NewsML-G2 Standards.

Contacting the IPTC

IPTC, International Press Telecommunications Council
Web address: www.iptc.org
Follow us on Twitter: @IPTC and @IPTCupdates
Email: office@iptc.org
Business address:
25 Southampton Buildings
London WC2A 1AL
United Kingdom

The company is registered in England at 10 Portland Business Centre, Datchet, Slough, Berks, SL3 9EG
as Comité International des Télécommunications de Presse
Registration No. 1010968, Limited by Guarantee, Not Registered for VAT

Acknowledgments

IPTC member delegates past and present who have contributed to this documentation project:

BABY, Vincent           Thomson Reuters
BEYNET, Yannick         Agence France Presse
CARD, Tony              BBC Monitoring
COMPTON, Dave           Thomson Reuters
CRAIG-BENNETT, Honor    The Press Association*
EVAIN, Jean-Pierre      European Broadcasting Union
EVANS, John             Transtel Communications Ltd
GEBHARD, Andreas        Getty Images
GORTAN, Philipp         APA (Austrian News Agency)
GULIJA, Darko           HINA (Croatia News Agency)*
HARMAN, Paul            Bloomberg
HUSO, Trond             NTB (Norwegian News Agency)*
KELLY, Paul             XML Team*
LE MEUR, Laurent        Agence France Presse*
LINDGREN, Johan         TT Nyhetsbyrån (Swedish News Agency)
LORENZEN, Jayson        Business Wire
MOUGIN, Philippe        Agence France Presse
MYLES, Stuart           Associated Press
RATHJE, Kalle           Deutsche Presse Agentur
SCHMIDT-NIA, Robert     Deutsche Presse Agentur
SERGENT, Benoît         European Broadcasting Union
STEIDL, Michael         Managing Director, IPTC
WHESTON, Siobhán        BBC Monitoring
WOLF, Misha             Thomson Reuters

* Delegate is no longer with the member company in 2018.

About the NewsML-G2 Standards

This is the 10th edition of the NewsML-G2 Implementation Guidelines. The IPTC Standards covered by these Guidelines are:

  • NewsML-G2, version 2.25

  • SportsML-G2, version 3.0

Guidelines author: Kelvin Holland, Point House Media Ltd.

Conventions used by this document

Links to cross-referenced resources within this document are indicated by this style

Links to external resources are indicated by this style

Code examples are shown thus:

<element attribute="attribute_value">Data</element>

In NewsML-G2, all XML elements that consist of two (or more) concatenated words are in lowerCamel case. For example:

<catalog>
<catalogRef>

Where a word is normally capitalized, it remains so. Thus:

<inlineXML>

Attribute names are always all lowercase. For example:

standardversion="2.25"

In these Guidelines, an "Item" with capitalised "I" indicates a NewsML-G2 Item (for example News Item, Planning Item, and so on).

Admonitions indicate especially important notes or warnings to implementers.

Note on Spelling (English)

The IPTC convention for documents in English is to use UK English spelling. In general, U.S. English is used for property names and values used in IPTC XML Standards (for example, canceled, color, catalog).

A common sense approach dictates that there may be exceptions to this convention.

Terminology: MUST and SHOULD

There are few mandatory features in NewsML-G2. This document uses the terms MUST (NOT), SHOULD (NOT) and MAY as defined in RFC 2119. A MUST instruction in this document refers to mandatory actions; SHOULD refers to recommended actions or best practice to be used unless there is a very good reason not to do so; MAY refers to optional actions.

Note on Time and Date-Time properties

The XML Specification for time-based properties is based on ISO 8601 and permits the omission of time zone/time offset information. However, these values MUST be present in NewsML-G2 timestamps that express a date AND time, such as Item Metadata timestamps, because the exchange of news information may cross time zones, and timing information must be unambiguous. The following comply:

<versionCreated>2017-11-06T12:12:12+01:00</versionCreated> (1)
<versionCreated>2017-11-06T12:12:12-01:00</versionCreated> (2)
<versionCreated>2017-11-06T12:12:12Z</versionCreated> (3)
(1) positive offset
(2) negative offset
(3) indicates UTC

The following does NOT comply:

<versionCreated>2017-11-06T12:12:12</versionCreated> // no time offset

Administrative content metadata properties such as <contentCreated> and descriptive date-time properties such as <founded> and <dissolved> use a Truncated DateTime data type permitting parts of the date-time to be truncated from the right, for example:

<founded>2017-11-06</founded>

complies, but this does not:

<founded>2017-11-06T12:00:00</founded>
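The rules above can be approximated in a few lines of Python. This is an illustrative sketch only: the two regular expressions and the function names is_valid_timestamp and is_valid_truncated are assumptions made for this example, not part of NewsML-G2, and the XML Schema remains the authority on what validates.

```python
import re

# Simplified patterns, assumed for illustration; the XML Schema is authoritative.
_OFFSET = r"(?:Z|[+-]\d{2}:\d{2})"
_FULL = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?" + _OFFSET)
_TRUNCATED = re.compile(r"\d{4}(?:-\d{2}(?:-\d{2})?)?")


def is_valid_timestamp(value: str) -> bool:
    """True if value is a full date-time that carries a time zone offset."""
    return _FULL.fullmatch(value) is not None


def is_valid_truncated(value: str) -> bool:
    """True if value is a year, year-month or date, or a full date-time with offset."""
    return _TRUNCATED.fullmatch(value) is not None or is_valid_timestamp(value)
```

Under these assumptions, is_valid_timestamp accepts the three complying <versionCreated> values and rejects the one without an offset, while is_valid_truncated accepts 2017-11-06 and rejects 2017-11-06T12:00:00.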

Note on IPTC NewsCodes vocabularies in NewsML-G2 examples

The NewsML-G2 examples used in this document and in the accompanying files illustrate the use of QCodes and Controlled Vocabularies. Although the examples include IPTC NewsCodes and Scheme Aliases as defined in a Catalog, some other QCodes are for example only and not controlled by the IPTC. In each of the Listings in the document, it will be made clear which are NOT values from the IPTC NewsCodes vocabularies.

1. How-To Index

The Implementation Guide is written using worked examples and use cases in order to give implementers an insight into the practical application of NewsML-G2 features. It may be helpful to have the NewsML-G2 Specification at hand; this will contain further detailed information about features discussed. This document can be downloaded by visiting www.newsml-g2.org/spec.

Implementers picking up these full Guidelines after reading the separately available “Quick Start” guides may wish to go straight to the How To topics for answers to specific questions.

1.7. Receiver view

This edition of the Guidelines contains a new Quick Start guide to Receiving NewsML-G2. The content is not included in the full Guidelines document, but as a separate document in Microsoft Word format. It is intended that this standalone document can be used by NewsML-G2 Providers as a template to create customer guides to their own implementations of NewsML-G2.

2. What’s New in NewsML-G2 2.25 (including version 2.24)

This chapter summarises the changes to NewsML-G2 since version 2.23, which was covered in the previous revision of this Guidelines document. A full change history of NewsML-G2 up to and including version 2.23 is documented in 28 Changes to NewsML-G2 and related Standards.

2.1. Development Freeze for Core Conformance (CCL)

There is to be no further development of the Core Conformance Level (CCL) of NewsML-G2, so the latest version that supports CCL will be 2.25. Later versions of the standard will only support Power Conformance (PCL). CCL remains as a minimal set of features that create a useful working implementation of NewsML-G2, but all recent developments have focused on adding features at PCL only, so the IPTC feels this is a useful and pragmatic step to simplify the standard going forward.

The default conformance level remains CCL. Implementers must therefore explicitly specify the @conformance attribute of any NewsML-G2 Item, even when using a version beyond 2.25, as in:
conformance="power"
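A receiver may want to determine the effective conformance level of an incoming Item. The following sketch uses Python's standard ElementTree parser; the function name conformance_level and the two sample documents are hypothetical, and the fallback value "core" is an assumption about how the default (CCL) is written in the attribute.

```python
import xml.etree.ElementTree as ET

NS = "http://iptc.org/std/nar/2006-10-01/"


def conformance_level(item_xml: str) -> str:
    """Return the effective conformance level of a NewsML-G2 Item."""
    root = ET.fromstring(item_xml)
    # When @conformance is absent, the default level (assumed "core") applies.
    return root.get("conformance", "core")


# Hypothetical, minimal Items used only to exercise the function.
explicit = f'<newsItem xmlns="{NS}" conformance="power"/>'
implicit = f'<newsItem xmlns="{NS}"/>'
```

Here conformance_level(explicit) yields "power", whereas the Item without the attribute falls back to the default, which is why providers using Power features need to state the attribute explicitly.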

2.2. Add attributes for standard version of contenttype or format

When content has an IANA Media Type expressed, for example, as @contenttype, there may be a need to qualify the attribute according to the standard used. The following attributes are added to all elements having a @contenttype, a @contenttypevariant, or a @format attribute:

  • @contenttypestandardversion

  • @contenttypevariantstandardversion

  • @formatstandardversion

The following summary gives the current definitions and illustrates some use cases.

Current definition by the XML Schema:

  • @contenttype: The IANA (Internet Assigned Numbers Authority) MIME type of the target resource.

  • @contenttypevariant: A refinement of a generic content type (i.e. IANA MIME type) by a literal string value.

  • @format: A refinement of a generic content type (i.e. IANA MIME type) by a value from a controlled vocabulary, expressed by a QCode.

Use case examples, where the content is:

  • NITF 3.4: @contenttype "text/vnd.IPTC.NITF", @contenttypestandardversion "3.4"

  • G2 v2.24 News Item: @contenttype "application/vnd.iptc.g2.newsitem+xml", @contenttypestandardversion "2.24"

  • SportsML 3: @contenttype "application/xml", @contenttypestandardversion "1.0" (for XML), @format "fmt:SportsML", @formatstandardversion "3.0" (for SportsML)
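The use cases above can be captured as attribute sets and applied to an element programmatically. The sketch below is an illustration only: the USE_CASES dictionary and the build_wrapper function are hypothetical names, and the choice of <remoteContent> as the carrying element is an assumption made for the example (any element that has the corresponding @contenttype or @format attribute could be used).

```python
import xml.etree.ElementTree as ET

# Attribute sets taken from the use cases listed above.
USE_CASES = {
    "NITF 3.4": {
        "contenttype": "text/vnd.IPTC.NITF",
        "contenttypestandardversion": "3.4",
    },
    "G2 v2.24 News Item": {
        "contenttype": "application/vnd.iptc.g2.newsitem+xml",
        "contenttypestandardversion": "2.24",
    },
    "SportsML 3": {
        "contenttype": "application/xml",
        "contenttypestandardversion": "1.0",  # version of XML
        "format": "fmt:SportsML",
        "formatstandardversion": "3.0",  # version of SportsML
    },
}


def build_wrapper(use_case: str, tag: str = "remoteContent") -> ET.Element:
    """Create an element carrying the attributes of the named use case."""
    # The tag name is an assumption for illustration, not mandated by the text.
    return ET.Element(tag, dict(USE_CASES[use_case]))
```

Calling ET.tostring(build_wrapper("SportsML 3"), encoding="unicode") serialises the element so that the attribute combination can be inspected.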

2.3. Adding urgency to planning

The NewsML-G2 Planning Item already supports contentMeta/urgency, which indicates the Editorial Urgency of the content of the <planningItem>; this refers to the entire structure within the <newsCoverageSet>.

There is a need to be able to flag the Urgency (Importance) of News Coverage per Item Class associated with a Planning Item. Providers will not necessarily group <planning> by Item Class within <newsCoverage>, so this Urgency indication needs to be supported at the <planning> level, not the <newsCoverage> level. Example:

<newsCoverageSet>
    <newsCoverage>
        <planning>
            <itemClass qcode="icls:video"/>
            <scheduled>2013-10-12T11:00:00.000Z</scheduled>
            ...
            <newsContentCharacteristics ... />
            <urgency>3</urgency>
            <planningExtProperty ... />
        </planning>
    </newsCoverage>
    ...
</newsCoverageSet>
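A receiver could read the per-Item-Class urgency as sketched below. This is a minimal illustration using Python's standard library: the SAMPLE document is a simplified stand-in for the example above (the elided parts are omitted, and the second <planning> with "icls:text" is a hypothetical addition), and urgency_by_item_class is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Simplified, partly hypothetical sample; not a complete Planning Item.
SAMPLE = """
<newsCoverageSet xmlns="http://iptc.org/std/nar/2006-10-01/">
  <newsCoverage>
    <planning>
      <itemClass qcode="icls:video"/>
      <urgency>3</urgency>
    </planning>
    <planning>
      <itemClass qcode="icls:text"/>
    </planning>
  </newsCoverage>
</newsCoverageSet>
"""


def urgency_by_item_class(xml_text: str) -> dict:
    """Map each planning Item Class QCode to its urgency (None if absent)."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for planning in root.iterfind(".//nar:planning", NS):
        item_class = planning.find("nar:itemClass", NS)
        urgency = planning.findtext("nar:urgency", default=None, namespaces=NS)
        if item_class is not None:
            # A later <planning> with the same Item Class overwrites an earlier one.
            result[item_class.get("qcode")] = int(urgency) if urgency else None
    return result
```

Because urgency is read at the <planning> level, the sketch works whether or not the provider groups <planning> elements by Item Class within <newsCoverage>.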

2.4. Add Publication History

The Publication History element <pubHistory> is a new child element of News Item and Package Item. It sits between <hopHistory> and <rightsInfo> and may be qualified by any of the Common Power Attributes group. It has a single child element <published>, which is qualified by a combination of @qcode, @uri, or @literal attributes according to the rules set forth in Change to cardinality of QCode Type attributes. There must be AT LEAST one <published> element per <pubHistory>. The <published> element may contain the following (optional) elements:

  • timestamp

  • name

  • related

  • publishedExtProperty

Example:

<pubHistory>
    <published qcode="..">
        <timestamp>2017-11-02T23:34:00Z</timestamp>
        <name>SNAP</name>
    </published>
</pubHistory>
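The structural rules for <pubHistory> lend themselves to a simple check. The sketch below is illustrative: check_pub_history is a hypothetical function, the QCode "ex-pubchan:wire" in SAMPLE is a made-up value, and the test that each <published> carries at least one of @qcode, @uri or @literal is an assumed simplification of the cardinality rules referred to above.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
ID_ATTRS = ("qcode", "uri", "literal")

# The qcode value is hypothetical and used only for illustration.
SAMPLE = """
<pubHistory xmlns="http://iptc.org/std/nar/2006-10-01/">
  <published qcode="ex-pubchan:wire">
    <timestamp>2017-11-02T23:34:00Z</timestamp>
    <name>SNAP</name>
  </published>
</pubHistory>
"""


def check_pub_history(pub_history: ET.Element) -> list:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    published = pub_history.findall("nar:published", NS)
    if not published:
        problems.append("pubHistory has no published element")
    for index, entry in enumerate(published, start=1):
        # Assumed simplification: at least one identifying attribute is present.
        if not any(entry.get(attr) for attr in ID_ATTRS):
            problems.append(f"published #{index} lacks @qcode, @uri and @literal")
    return problems
```

For example, check_pub_history(ET.fromstring(SAMPLE.strip())) reports no problems, while an empty <pubHistory> is flagged.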

2.5. Adding role and version attribute to altId

The @role attribute is added so that information about the role of a specific Alternative Identifier among other Alternative Identifiers may be expressed in addition to the @type attribute, which provides information about the context of the <altId>.

The @version attribute is added because a genuine identifier may need to be accompanied by a version, and this was missing from <altId>.

2.6. Refine event date confirmation

The current <confirmation> child element of eventDetails/dates is DEPRECATED from NewsML-G2 v2.24 on because it only supports a single value that applies to all dates, but providers need further granularity to support the confirmation status of any of the date and duration values for an event.

Therefore the optional attributes @confirmationstatus and @confirmationstatusuri are added to the following eventDetails/dates properties:

  • start

  • end

  • duration

To accompany the change the IPTC NewsCodes for Event Date Confirmation (Scheme URI http://cv.iptc.org/newscodes/eventdateconfirm/, recommended alias "edconf") is changed. The revised CV is shown in the table below:

Code         Name         Definition
approximate  approximate  The confirmation status of the date/period/duration is: approximate.
confirmed    confirmed    The status of the date/period/duration is: confirmed.
undefined    undefined    The confirmation status of the date/period/duration is undefined.

See Event Details Group for further details.
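As an illustration of the per-property confirmation status, the sketch below reads @confirmationstatus from the start, end and duration properties and checks the code against the three values of the revised vocabulary. The SAMPLE fragment and the function name confirmation_statuses are hypothetical, and the sketch assumes the recommended scheme alias "edconf" is in use.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}
EDCONF_CODES = {"approximate", "confirmed", "undefined"}

# Hypothetical fragment of eventDetails/dates, for illustration only.
SAMPLE = """
<dates xmlns="http://iptc.org/std/nar/2006-10-01/">
  <start confirmationstatus="edconf:approximate">2017-11-06</start>
  <end confirmationstatus="edconf:confirmed">2017-11-08</end>
</dates>
"""


def confirmation_statuses(dates_xml: str) -> dict:
    """Map each date property that carries @confirmationstatus to its code."""
    dates = ET.fromstring(dates_xml.strip())
    statuses = {}
    for name in ("start", "end", "duration"):
        prop = dates.find(f"nar:{name}", NS)
        if prop is None or prop.get("confirmationstatus") is None:
            continue  # the attribute is optional
        alias, _, code = prop.get("confirmationstatus").partition(":")
        if alias != "edconf" or code not in EDCONF_CODES:
            raise ValueError(f"unexpected confirmation status on {name}")
        statuses[name] = code
    return statuses
```

A production implementation would resolve the alias through the Catalog rather than compare it with a fixed string; the fixed alias here keeps the example short.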

3. Executive Summary

3.1. Why News Exchange standards matter

Information is valuable. Many major financial decisions rely on split-second delivery of news about companies and markets; successful businesses have been built on the ability to target individuals and groups with information which is relevant to their needs. News organisations and information providers have also invested heavily in the people and the technology needed to gather and disseminate news to their customers.

Without standards for news exchange, most of the value of this information would be lost in a confusion of customised feeds and competing formats. The huge volumes of content now being exchanged not only demand a common format, or mark-up, for the content itself but also a common framework for information about the content - the so-called metadata.

3.2. The purpose of NewsML-G2

NewsML-G2 is an open standard for the exchange of all kinds of news information. This can be content, such as text, pictures, audio and video, or information about Events and News Planning. A sister standard, SportsML-G2, conveys rich sports information. The content can be in any format or encoding. It is conveyed with semantically-precise metadata that matches the needs of a professional news workflow and the way that content is consumed both in a B2B and B2C context.

The standard is not concerned with the presentation or mark-up of the content that it conveys; this is the role of standards such as HTML5 and NITF (News Industry Text Format, an IPTC XML standard) that are used to mark up the payload of a NewsML-G2 Item. Microformats such as hNews are complementary; the IPTC has its own semantic mark-up standard rNews, which is compatible with NewsML-G2.

NewsML-G2 models the way that professional news organisations work, but goes beyond this by standardising the handling of the metadata that ultimately enables all types of content to be linked, searched, and understood by end users. NewsML-G2 metadata properties are designed to be mapped to RDF, the language of the Semantic Web, enabling the development of new applications and opportunities for news organisations in evolving digital markets.

3.3. Business Drivers

Many of the business challenges faced by media organisations are related to the development of the World Wide Web, which has not only increased the availability of news content, but is constantly creating new ways to consume it. These challenges are not a once-in-a-lifetime event, but a continuing fact of life.

Businesses need to:

  • Control, and if possible reduce, the cost of developing and maintaining services.

  • Quickly develop new media-rich products and services that can exploit emerging trends and new business models.

  • Give customers access to added-value assets, including archives and metadata repositories.

  • Allow innovation by third-party vendors and partners.

  • Enhance IT investment by enabling the sharing of complex content across separate systems.

3.4. Business Requirements

In response to these business challenges, an information exchange standard needs to:

  • Fit an MMM strategy (Multi-media, Multi-channel, Multi-platform)

  • Handle text, pictures, graphics, animation, audio or video news.

  • Be a lightweight container for news, simple to implement and extend, yet offer powerful features for advanced applications.

  • Be useful at all stages of the lifecycle of news, from initial event planning, through content gathering, syndication, to archiving.

3.5. Design and Benefits

The NewsML-G2 standards are based on a common framework – the News Architecture – that is independent of any technical implementation. It may be implemented using object-oriented software, such as Java, or in a database.

Figure: The family genes of NewsML-G2 are inherited from the News Architecture (NAR)

The IPTC has implemented the NAR specification in XML Schema to create NewsML-G2 and SportsML-G2, because of the need to facilitate news exchange using W3C standards. XML provides continuity with existing standards, and also has an existing large community of experts.

The standard enables all parties involved in news – providers, receivers and software vendors – to send and receive information quickly, accurately, and appropriately.

  • A common framework maximises the value of investments and provides a path into the future, with maximum inter-operability between different information partners.

  • Machine-readable metadata enables automation of standard processes, cutting costs, speeding delivery, and increasing quality.

  • Innovative solutions are possible because NewsML-G2 complements the work of companies working on search and navigation technologies to realise the vision of the Semantic Web.

4. About the IPTC

IPTC members are technologists and thought leaders from the world’s main news agencies and leading media players. They are expert in the field of news production and dissemination. IPTC Standards today play an essential role in efficient news exchange between the world’s news and media organisations.

The text transmission standards IPTC7901 and its cousin ANPA1312 have been a key enabler of news exchange for news agencies and their newspaper and broadcaster customers, as has the IIM standard for pictures. All media organisations have benefited from these standards; they have been essential to the adoption of digital production technologies. NewsML was first launched in 2000, and the G2 version of NewsML in 2006.

5. Unification of NewsML-G2 and EventsML-G2

Originally, NewsML-G2 conveyed content, while its sibling EventsML-G2 was a separate Standard for News Events, although both are children of the NAR model. With the introduction of the Planning Item with its News Coverage payload in NewsML-G2 v2.7 and EventsML-G2 v1.6, it became difficult to make a distinction between the two as separate Standards.

The IPTC members therefore decided to unify the Standards. From a non-technical “brand” standpoint, NewsML-G2 is now the “senior” umbrella Standard for all Items, whether they convey News or Events. SportsML-G2 continues to be a completely separate Standard, although it is always conveyed by NewsML-G2. From v2.9, all EventsML-G2 structures were merged in NewsML-G2.

From a technical implementation perspective, there is a simplification of XML Schemas into just two: an “All-Core” Schema for Core Conformance, and an “All-Power” Schema for the Power Conformance Level. These cover all Item types in both News and Events.

Throughout this document, the term “NewsML-G2” is used, unless referring to a specific feature of an earlier version of EventsML-G2. The term “G2” on its own refers to the G2 Family of Standards.

The term “Item” with capitalised “I” is used to indicate a NewsML-G2 Item (News Item, Planning Item, Package Item etc.).

6. How News Happens

6.1. Introduction

NewsML-G2 represents a content and processing model for news that aligns with the way that professional news organisations work. It is therefore important that implementers have at least a high-level understanding of how the news business works, in order to appreciate the rationale behind its features.

An event becomes news when someone decides to create a record of it, and place that record in the public domain. Professional news production is not a haphazard or random process, but a highly organised activity, shaped by a number of influences:

  • The publishing of news originally centred on printing, an industrial process which imposes time and logistical constraints. Print remains an important channel for news dissemination.

  • The selection of what is, and what is not, news to any given audience is vital to the success of any publishing venture, whether in print, broadcasting, web or other media.

  • For legal and ethical reasons, professional news organisations ensure that standards are maintained in the selection and production of news, and that content is reviewed before being authorised for release to the public.

These constraints and considerations lead to the news production process being divided into five generic domains:

  • Planning and Assignment

  • Information Gathering

  • Verification

  • Dissemination

  • Archiving

6.2. Assignment

News organisations need to plan their operations, based on prior knowledge of newsworthy events that are expected to occur in any given time frame (daily, weekly, monthly and so on). The resulting schedule of events is called a variety of names, according to custom, such as schedule, budget, day book or diary.

Unexpected events (breaking news) will cause this schedule to change at short notice.

According to the schedule, people and resources will be assigned to “cover” the news events, and those who are dependent on the timely gathering of the news, such as co-workers and customers, will be kept informed of expected coverage, deadlines and any updates.

Large organisations may have several schedules for different categories of news, for example General News, Sport, Finance, Features etc.

Increasingly, text and pictures are being augmented by dynamic content: video, audio, animated graphics, and the availability of this material needs to be signalled in the schedule to interested parties in a way that is amenable to software processing.

These business processes are addressed by Events in NewsML-G2 and Editorial Planning – the Planning Item.

6.3. Information Gathering

Most people recognise the model – beloved of Hollywood – of reporters, photographers, film/video and sound personnel rushing to the scene of a news event and generating content based on material they are able to obtain as the event unfolds.

In fact, news is gathered by an endless variety of means, such as press releases, reports from news agencies and freelance journalists, tip-offs from the public, statements on web sites, blogs etc. Generally, information gathered in this way is incomplete and needs to be augmented by additional material. Sometimes this material is gathered and prepared by contributors, working with the original creator.

This information gathering process ultimately results in journalists submitting event coverage: written copy, photographs, video footage and so on, to the Verification Stage of an editorial workflow.

6.4. Verification

The process of verifying the authenticity of news often starts before the content is generated, as part of the selection and assignment process. However, the detail of the content needs to be checked before the content can be released.

Responsible news organisations take steps to ensure that the facts of any news coverage are correct, and that they are presented in a fair, balanced and impartial way. It is also surprisingly easy to break the law by the inappropriate release of content. Lawyers or legally-trained staff routinely work with editors to ensure that content does not transgress the civil or criminal law, and that it is not gratuitously offensive to individuals or groups.

Clear and consistent writing, spelling and grammar are considered important and an organisation’s rules will often be written down in a Style Guide which journalists are required to use when writing and editing.

Only when content meets all of the required standards will it be authorised for release. Completing these essential tasks under time pressure is one of the major operational challenges faced by news organisations.

6.5. Dissemination

Although seen conceptually as a physical “publication” process, the dissemination of information and news assets in digital form is pre-eminent today.

When news is received electronically, the recipient needs to be able to process the information quickly and reliably. When one considers that each day, a large news organisation may receive, from multiple sources, thousands of images, and hundreds of thousands of words of text, plus video, audio and graphics, the scale of the processing required becomes apparent.

The management of news requires organisations to know whether any given piece of content is useable, and in what context. Media organisations often receive content under embargo. This is information that has been released to professional journalists in advance so that they may complete any work needed to make it ready for dissemination to the public. Only when the embargo time has passed may the content be published. These informal protocols work because it is in the interests of all parties to co-operate. If they break an embargo, journalists know that their job may become more difficult because the provider will withhold information in future.

When content is transmitted electronically, it cannot be physically deleted by the provider. There must therefore be a means for providers to inform their customers that a piece of content must be deleted (“cancelled” or “killed”), and it is vital that all copies of the content are deleted from all systems, including archives, often for copyright or serious legal reasons.

The right to use a piece of content is an important aspect of news. Picture and video rights can be particularly complex. Although formal, machine-readable rights languages such as RightsML (www.rightsml.org) are available, many organisations continue to indicate rights using a natural-language statement.

This management and administrative information must also be accompanied by descriptive information – metadata – that enables the receiver to direct the content to the appropriate workflow and users, retrieve related content, and if necessary re-purpose it for a variety of media channels.

Descriptive metadata will include some type of classification of the news so that its relevance to a sphere of interest(s) can be determined. Ad-hoc tags or keywords are useful, but their value is increased if they form part of a formal classification scheme, or taxonomy.

The use of taxonomies enables searches to yield consistent predictable results across a wide range of content and further enables accurate processing of content by software.

6.6. Archiving

A comprehensive digital archive of news, people and organisations plays an increasingly active role in the news process because of the features offered by electronic media such as the World Wide Web.

Today it is desirable to publish news which contains links to related news and information assets, allowing the consumer to view any aspect of a news story, including details of the people and organisations involved, and the concepts at issue.

The archiving process completes the news production cycle and accurate, comprehensive metadata is the key to unlocking the value of this information asset. The value of content is in direct proportion to the quality and quantity of its metadata; one can imagine that content with no metadata could be almost valueless.

7. Quick Start: NewsML-G2 Basics

7.1. About the Quick Start Guides

Quick Start Guides are intended to give implementers enough information about NewsML-G2 to begin creating a useful working set of model NewsML-G2 documents for their organisation, or to begin working with NewsML-G2 documents provided by another organisation. This Basics Guide covers the NewsML-G2 features that are common to most types of content. The other Quick Start Guides give more specific information about NewsML-G2 for text, pictures, video, and news packages.

7.2. Introduction

The basic structure of a NewsML-G2 Item document is common to all applications. The available types of Item are:

  • News Item: for all kinds of news content.

  • Package Item: for structured collections of news content.

  • Concept Item: for expressing knowledge about entities, abstract concepts, and events.

  • Knowledge Item: for collections of concepts, often grouped for a specific purpose such as Controlled Vocabularies.

  • Planning Item: for exchanging information about news coverage and fulfilment.

  • Catalog Item: for managing references to Controlled Vocabularies.
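Since each type of Item has its own root element, a receiver can identify the Item type from the root of the document. The sketch below is a minimal illustration in Python: the item_type function is hypothetical, and the mapping assumes the root element names follow the lowerCamelCase form of the Item names (as <newsItem> and <planningItem> do elsewhere in this document).

```python
import xml.etree.ElementTree as ET

# Root element names assumed to mirror the Item names in lowerCamelCase.
ITEM_TYPES = {
    "newsItem": "News Item",
    "packageItem": "Package Item",
    "conceptItem": "Concept Item",
    "knowledgeItem": "Knowledge Item",
    "planningItem": "Planning Item",
    "catalogItem": "Catalog Item",
}


def item_type(xml_text: str):
    """Return the Item type named by the root element, or None if unknown."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespaced tags as "{namespace}localName".
    local_name = root.tag.rsplit("}", 1)[-1]
    return ITEM_TYPES.get(local_name)
```

For instance, a document whose root is <newsItem> in the NewsML-G2 namespace is reported as "News Item", and an unrecognised root yields None so that the caller can decide how to handle it.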

7.3. Item structure

The building blocks of a NewsML-G2 Item are shown in the diagram below:

Figure: All NewsML-G2 Items share this basic container structure

Every Item has a root element, specific to the type of Item, that contains identification, version and some basic information to initiate the NewsML-G2 processor.

The Catalog information is required to resolve QCodes, a fundamental feature of NewsML-G2 that enables partners to guarantee that codes used within an Item are globally unique.
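The principle of QCode resolution can be sketched as follows: the part of the QCode before the colon is a scheme alias, which the Catalog maps to a scheme URI, and appending the code to that URI gives a globally unique identifier. In this illustration the CATALOG dictionary stands in for a parsed Catalog, the function name resolve_qcode is hypothetical, and the "medtop" scheme URI is an assumption that is not stated in this section.

```python
# Stand-in for a parsed Catalog: scheme alias -> scheme URI.
CATALOG = {
    "edconf": "http://cv.iptc.org/newscodes/eventdateconfirm/",
    "medtop": "http://cv.iptc.org/newscodes/mediatopic/",  # assumed for this example
}


def resolve_qcode(qcode: str, catalog: dict) -> str:
    """Expand a QCode (alias:code) to a full URI using the catalog."""
    alias, separator, code = qcode.partition(":")
    if not separator:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in catalog:
        raise KeyError(f"scheme alias {alias!r} is not in the catalog")
    return catalog[alias] + code
```

With this catalog, resolve_qcode("edconf:confirmed", CATALOG) expands to the scheme URI followed by "confirmed". Two providers may use different aliases for the same scheme, and the resolved URIs will still match.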

The Rights Information block allows publishers to assert fine-grained information about copyright and usage terms as human-readable statements, or by inclusion of a machine-readable rights expression language such as RightsML.

The Item Metadata wrapper contains metadata about the Item as a whole, and this is followed by metadata about the whole of the content <contentMeta>, and optionally by the <partMeta> wrapper, which enables publishers to express metadata about specific parts of the content.

Optional "helper" structures are available for specialised processing needs.

Each type of NewsML-G2 Item has a specific wrapper element for content, shown in the diagram below that also shows the basic top level elements common to all NewsML-G2 Items. The colours of the wrapper elements in the diagram are repeated in the code example in order to highlight the relevant sections of the News Item:

Figure: The basic XML elements associated with each part of a NewsML-G2 Item

The example in this Quick Guide to NewsML-G2 Basics uses a News Item with text content to illustrate the basic principles in action. Read this guide first, and proceed to further Quick Guides specific to Text, Pictures, Video, and Packages, as needed.

LISTING 1: A NewsML-G2 News Item with Text
<?xml version="1.0" encoding="UTF-8"?>
<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-GB">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/newsml-g2/catalog.enews_2.xml" />
    <rightsInfo>
        <copyrightHolder uri="http://www.example.com/about.html#copyright" >
            <name>Example Enews LLP</name>
        </copyrightHolder>
        <copyrightNotice>
            Copyright 2016-17 Example Enews LLP, all rights reserved
        </copyrightNotice>
        <usageTerms>
            Not for use outside the United States
        </usageTerms>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="ninat:text" />
        <provider qcode="nprov:REUTERS" />
        <versionCreated>2017-10-21T16:25:32-05:00</versionCreated>
        <firstCreated>2016-10-18T13:12:21-05:00</firstCreated>
        <embargoed>2017-10-23T12:00:00Z</embargoed>
        <pubStatus qcode="stat:usable" />
        <service qcode="svc:uknews">
            <name>UK News Service</name>
        </service>
        <edNote>
            Note to editors: STRICTLY EMBARGOED. Not for public release until 12noon
            on Monday, October 23, 2017.
        </edNote>
        <signal qcode="sig:update" />
        <link rel="irel:seeAlso"
            href="http://www.example.com/video/20081222-PNN-1517-407624/index.html"/>
    </itemMeta>
    <contentMeta>
        <contentCreated>2016-10-18T11:12:00-05:00</contentCreated>
        <contentModified>2017-10-21T16:22:45-05:00</contentModified>
        <located type="cptype:city" qcode="geo:345678">
            <name>Berlin</name>
            <broader type="cptype:statprov" qcode="prov:2365">
                <name>Berlin</name>
            </broader>
            <broader type="cptype:country" qcode="iso3166-1a2:DE">
                <name>Germany</name>
            </broader>
        </located>
        <creator uri="http://www.example.com/staff/mjameson" >
            <name>Meredith Jameson</name>
        </creator>
        <infoSource uri="http://www.example.com" />
        <subject type="cpnat:abstract" qcode="medtop:04000000">
            <name xml:lang="en-GB">economy, business and finance</name>
        </subject>
        <subject type="cpnat:abstract" qcode="medtop:20000523">
            <name xml:lang="en-GB">labour market</name>
            <name xml:lang="de">Arbeitsmarkt</name>
            <broader qcode="medtop:04000000" />
        </subject>
        <genre qcode="genre:interview">
            <name xml:lang="en-GB">Interview</name>
        </genre>
        <slugline>US-Finance-Fed</slugline>
        <headline> Fed to halt QE to avert "bubble"</headline>
    </contentMeta>
    <contentSet>
        <inlineXML contenttype="application/nitf+xml"> <!--  A VALID MIME TYPE      -->
            <!--   Inline XML must contain well-formed XML such as NITF or XHTML       -->
        </inlineXML>
    </contentSet>
</newsItem>

7.4. Root element <newsItem>

Each NewsML-G2 Item Type uses a specific root element name as shown in the diagram above. In the example News Item the root element is <newsItem> (note camel case spelling).

7.4.1. Root element attributes

Item Identifier

All NewsML-G2 Items must have a @guid, an identifier that should be globally unique for all time and independent of location. The IPTC has registered a URN namespace for the purpose of creating GUIDs for NewsML-G2 Items using a specification based on RFC3085. The syntax for a @guid using this scheme is:

guid="urn:newsml:[ProviderId]:[DateId]:[NewsItemId]"

Use an internet domain name owned by your organisation as a ProviderId, for example:

<newsItem guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"

Do not try to "reverse engineer" the DateId part of a GUID to create a time-stamp. This may have unintended consequences and result in errors. Use the appropriate NewsML-G2 timestamp property instead.
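As an illustration only, the following Python sketch assembles and splits a @guid using the syntax shown above. The helper names make_guid and split_guid are hypothetical, and the pattern assumes an eight-digit DateId, as in the example. The DateId is returned as an opaque string and is deliberately not converted to a timestamp.

```python
import re

# Assumption: DateId is eight digits (as in the example) and neither
# ProviderId nor DateId contains a colon.
GUID_PATTERN = re.compile(r"^urn:newsml:([^:]+):(\d{8}):(.+)$")


def make_guid(provider_id, date_id, news_item_id):
    """Build a guid string and check that it has the expected shape."""
    guid = f"urn:newsml:{provider_id}:{date_id}:{news_item_id}"
    if GUID_PATTERN.match(guid) is None:
        raise ValueError(f"not a well-formed NewsML URN: {guid!r}")
    return guid


def split_guid(guid):
    """Return (ProviderId, DateId, NewsItemId) as opaque strings."""
    match = GUID_PATTERN.match(guid)
    if match is None:
        raise ValueError(f"not a well-formed NewsML URN: {guid!r}")
    return match.groups()


print(make_guid("acmenews.com", "20161018", "US-FINANCE-FED"))
```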
Version

A simple indicator of the version of the Item:

version="10"

Version numbers need not be consecutive, but must start at 1, and each new version must have a higher number than the previous version. If @version is missing, the value is assumed to be 1. See Hidden Values of NewsML-G2.

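A receiver might apply the version rules along the following lines. This is a minimal sketch: the function names effective_version and is_newer are hypothetical and are not part of NewsML-G2.

```python
def effective_version(version_attr):
    """Return the Item version, assuming 1 when @version is absent."""
    if version_attr is None:
        return 1
    version = int(version_attr)
    if version < 1:
        raise ValueError("version numbers start at 1")
    return version


def is_newer(candidate_attr, current_attr):
    """True if the candidate version is higher than the current one."""
    return effective_version(candidate_attr) > effective_version(current_attr)


print(is_newer("10", "7"))   # versions need not be consecutive
print(is_newer("2", None))   # a missing @version counts as 1
```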
Standard

A string denoting the IPTC Standard, in this case "NewsML-G2".

standard="NewsML-G2"

Standard Version

A string denoting the major and minor version of the Standard being used:

standardversion="2.25"

Conformance

There are two levels of Conformance to the NewsML-G2 standard. The "Core" conformance level (CCL) represents a minimal sub-set of NewsML-G2 that can usefully get work done, and is the default if @conformance is omitted. In practice, many implementers use the properties of "Power" conformance (PCL). If an implementer stays within the CCL, the @conformance of the NewsML-G2 Item can be assumed and omitted, but must be declared if using PCL:

conformance="power"

Development of NewsML-G2 at Core Conformance has stopped at 2.25; versions beyond this are at Power Conformance only, but the @conformance property must continue to be explicitly stated.

Language

Sets the default language of XML elements with text content in the NewsML-G2 Item; in this example, UK English:

xml:lang="en-GB"

IPTC namespace

Sets the default namespace for elements:

xmlns="http://iptc.org/std/nar/2006-10-01/"

Putting this together, the required root element attributes are:

<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-GB">

7.4.2. Validating code

When developing a NewsML-G2 processing application, implementers will need to validate the generated NewsML-G2 code against the appropriate schema. To validate code at PCL against the latest version of NewsML-G2 covered by these Guidelines (2.25), add the following code to the <newsItem> element:

<newsItem
    //...
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
http://www.iptc.org/std/NewsML-G2/2.25/specification/NewsML-G2_2.25-spec-All-Power.xsd"

To validate at CCL, substitute the name of the "Core" schema:

NewsML-G2_2.25-spec-All-Core.xsd

The IPTC does NOT permit validation of production documents against the XML Schema files hosted by the IPTC.

7.4.3. Catalog wrapper <catalogRef>

Codes, short mnemonics used to express the value of properties such as Category, are a long-established feature of news exchange. QCodes are the NewsML-G2 mechanism that enables partners in news exchange to guarantee that codes are globally unique. Without going into details of the mechanism here, the News Item <catalog> enables a NewsML-G2 processor to resolve QCodes, and guarantee that uniqueness, by mapping the code to a globally-unique URI. It is recommended that this URI locates a web resource.

One of the few mandatory NewsML-G2 elements, <itemClass>, uses QCodes issued by the IPTC to identify the business intention of the Item. For a News Item, the scheme is News Item Nature, with a recommended Scheme Alias of "ninat". Values from the scheme include (not limited to) "ninat:text" and "ninat:picture". The catalog reference is:

<catalog>
    <scheme alias="ninat" uri="http://cv.iptc.org/newscodes/ninature/" />
</catalog>

Other types of NewsML-G2 Item use specific schemes for the <itemClass> property.

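The mapping from a QCode to a globally unique URI can be sketched in Python as follows. This is an illustration under stated assumptions, not a complete NewsML-G2 processor: the function names load_catalog and resolve_qcode are hypothetical, and the catalog is supplied as an inline string rather than fetched from a web resource.

```python
import xml.etree.ElementTree as ET

G2 = "{http://iptc.org/std/nar/2006-10-01/}"


def load_catalog(catalog_xml):
    """Map each Scheme Alias in a <catalog> to its Scheme URI."""
    root = ET.fromstring(catalog_xml)
    return {s.get("alias"): s.get("uri") for s in root.iter(G2 + "scheme")}


def resolve_qcode(qcode, schemes):
    """Split 'alias:code' at the first colon and append code to the Scheme URI."""
    alias, separator, code = qcode.partition(":")
    if not separator or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in schemes:
        raise KeyError(f"no scheme with alias {alias!r} in the catalog")
    return schemes[alias] + code


catalog = """<catalog xmlns="http://iptc.org/std/nar/2006-10-01/">
    <scheme alias="ninat" uri="http://cv.iptc.org/newscodes/ninature/" />
</catalog>"""

print(resolve_qcode("ninat:text", load_catalog(catalog)))
```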
All Scheme Aliases used in the example listing indicate IPTC NewsCodes vocabularies, except for the following alias values: svc, cptype, geo, prov.

As the CVs used by a provider are usually quite consistent across the NewsML-G2 Items they publish, the IPTC recommends that the <catalog> references are aggregated into a stand-alone file which is made available as a web resource referenced by <catalogRef>. This is how the IPTC publishes its Catalogs:

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml"
/>

The use of stand-alone web resources is preferable because all of the QCode mappings are shared across many NewsML-G2 Items; a local <catalog> can only be used by the single Item.

It’s likely that provider-specific catalogs will be needed to resolve QCodes used in the Item, for example:

<catalogRef
    href="http://www.example.com/newsml-g2/catalog.enews_2.xml" />

Adding the Catalog information to the example results in the following:

<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-GB">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef
        href="http://www.example.com/newsml-g2/catalog.enews_2.xml" />

7.4.4. Rights information wrapper <rightsInfo>

The optional <rightsInfo> wrapper holds copyright information and usage terms, such as the following example:

<rightsInfo>
    <copyrightHolder uri="http://www.example.com/about.html#copyright">
        <name>Example Enews LLP</name>
    </copyrightHolder>
    <copyrightNotice>
        Copyright 2016-17 Example Enews LLP, all rights reserved
    </copyrightNotice>
    <usageTerms>Not for use outside the United States</usageTerms>
</rightsInfo>

7.5. Item Metadata <itemMeta>

7.5.1. Mandatory Properties

The mandatory <itemMeta> section has four mandatory elements, present in the following order:

Item Class

As previously mentioned, the <itemClass> property describes the type of content conveyed by the Item. It is mandatory to use one of the IPTC News Item Nature NewsCodes (recommended Scheme Alias: "ninat") for the Item Class of News Items and Package Items, expressed as a QCode:

<itemClass qcode="ninat:text" />

Other possible values from this scheme include (not limited to) "ninat:picture", "ninat:video" and "ninat:audio".

Provider

The provider can be represented by a QCode or a URI. If the value of this property is NOT taken from a controlled vocabulary, the @qcode or @uri is omitted and the child <name> element is used to give a human-readable value for the property. The IPTC recommends using a QCode with the Provider NewsCodes, a controlled vocabulary of providers registered with the IPTC with recommended Scheme Alias of "nprov":

<provider qcode="nprov:REUTERS" />

Version Created

This contains the date, time and time zone (or UTC) at which this version of the NewsML-G2 Item was created. The value must be expressed as an XML Schema dateTime, in the form YYYY-MM-DDThh:mm:ss±hh:mm:

<versionCreated>2017-10-21T16:25:32-05:00</versionCreated>

The -05:00 denotes the offset of U.S. Eastern Standard Time from UTC.

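As a hedged illustration, timestamps in this form can be read with the Python standard library. The function name parse_g2_datetime is hypothetical. The sketch rejects values without a time zone, following the description above, and rewrites a trailing "Z" because datetime.fromisoformat only accepts it from Python 3.11 onwards.

```python
from datetime import datetime, timezone


def parse_g2_datetime(value):
    """Parse an XML Schema dateTime that carries an offset or 'Z'."""
    text = value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"   # 'Z' means UTC
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        raise ValueError(f"time zone missing in {value!r}")
    return parsed


created = parse_g2_datetime("2017-10-21T16:25:32-05:00")
print(created.astimezone(timezone.utc).isoformat())
```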
Publication Status

Every NewsML-G2 Item must have a publication status. The value defaults to "usable", which permits the <pubStatus> property to be omitted; however, it is recommended that a value is explicitly included:

<pubStatus qcode="stat:usable" />

Publication status is highly likely to be used by most news agencies, because the ability to explicitly signal the status of news is essential. The use of the IPTC Publishing Status NewsCodes is mandatory. Its recommended alias is "stat". Other values permitted by the scheme are:

  • stat:canceled (note U.S. spelling) means that the content of the newsItem must not be used, ever.

  • stat:withheld means the content must not be used until further notice.

When an Item is cancelled, it must never be used, but an Item that has been withheld may subsequently have its status updated to "usable".

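A receiving system could act on the publication status roughly as shown below. This is a sketch only: the function names are hypothetical, the recommended Scheme Alias "stat" is assumed (a full processor would resolve the QCode through the catalog), and a <pubStatus> expressed with @uri alone is not handled.

```python
import xml.etree.ElementTree as ET

G2 = "{http://iptc.org/std/nar/2006-10-01/}"


def publication_status(item_meta):
    """Return the pubStatus QCode, defaulting to 'stat:usable' if absent."""
    element = item_meta.find(G2 + "pubStatus")
    if element is None:
        return "stat:usable"
    qcode = element.get("qcode")
    if qcode is None:
        raise ValueError("pubStatus without @qcode is not handled by this sketch")
    return qcode


def may_publish(item_meta):
    """Only 'usable' content may be used; canceled and withheld may not."""
    return publication_status(item_meta) == "stat:usable"


withheld = ET.fromstring(
    '<itemMeta xmlns="http://iptc.org/std/nar/2006-10-01/">'
    '<pubStatus qcode="stat:withheld" /></itemMeta>'
)
print(may_publish(withheld))
```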
7.5.2. Use of URIs in place of QCodes

The original NewsML-G2 developers wanted to use URIs as the preferred way to identify concepts, as this would enable the resulting concept identifiers to be globally unique and optionally to be a reference to a web resource. The constraints on network bandwidth (compared to today) led the developers to propose QCodes to represent URIs, because:

  • they are lightweight and economise file sizes,

  • although they are designed to reference web resources, the delivery of data in response to requesting QCodes that are resolved to full URIs is optional; the codes may be used "as is" without de-referencing to the full URI.

Later, some providers asked for the flexibility to use full URIs in property values, so from NewsML-G2 v2.11 a feature was added enabling @uri to be used in place of, or in parallel with, @qcode (where both are used, @qcode takes precedence). The <pubStatus> assertion:

<pubStatus qcode="stat:usable" />

could be expressed using a URI as:

<pubStatus uri="http://cv.iptc.org/newscodes/pubstatusg2/usable" />

Further, in NewsML-G2 v2.18 onwards, other properties with QCode Type values had "URI siblings" added, for example properties with @role may now have a value expressed as @roleuri. In NewsML-G2 v2.20, properties with a mandatory @qcode were changed so that @uri may be used instead.

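The equivalence of the two forms can be sketched in Python. The function concept_uri is hypothetical; it takes a property's attributes as a dictionary, plus a mapping of Scheme Aliases to Scheme URIs such as a catalog would provide, and gives @qcode precedence when both attributes are present, as described above.

```python
def concept_uri(attributes, schemes):
    """Return the Concept URI for a property given its @qcode and/or @uri."""
    qcode = attributes.get("qcode")
    if qcode is not None:
        # @qcode takes precedence over @uri when both are present.
        alias, _, code = qcode.partition(":")
        return schemes[alias] + code
    return attributes.get("uri")  # None if neither attribute is present


# Scheme URI inferred from the <pubStatus> URI example above.
schemes = {"stat": "http://cv.iptc.org/newscodes/pubstatusg2/"}

by_qcode = concept_uri({"qcode": "stat:usable"}, schemes)
by_uri = concept_uri(
    {"uri": "http://cv.iptc.org/newscodes/pubstatusg2/usable"}, schemes
)
print(by_qcode == by_uri)
```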
7.5.3. Optional Properties

The following optional properties are frequently used by NewsML-G2 providers.

First Created

The <firstCreated> element indicates when the first version of the Item (not the content) was created.

<firstCreated>2016-10-18T13:12:21-05:00</firstCreated>

Embargoed

Business-to-business news organisations often use an embargo to release information in advance, on the strict understanding that it may not be released into the public domain until after the embargo time has expired, or until some other form of permission has been given.

Embargoed is NOT the same as the Publishing Status; embargoed content should have a publishing status of "usable".

<embargoed>2017-10-23T12:00:00Z</embargoed>

It is not required to give any further information about the embargo conditions, but some providers may provide a natural language <edNote>, see below.

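A receiver's embargo check might look like the following sketch. The function name is_under_embargo is hypothetical. Only the time-based condition is modelled; release by "some other form of permission" is a business decision outside the scope of the code. The caller must supply a time-zone-aware value for the current time.

```python
from datetime import datetime, timezone


def is_under_embargo(embargoed_value, now):
    """True while the embargo time given in <embargoed> has not been reached.

    embargoed_value is the text of <embargoed>, or None if the element
    is absent (no embargo). now must be a time-zone-aware datetime.
    """
    if embargoed_value is None:
        return False
    text = embargoed_value.strip()
    if text.endswith("Z"):
        text = text[:-1] + "+00:00"   # 'Z' means UTC
    release_time = datetime.fromisoformat(text)
    return now < release_time


version_created = datetime(2017, 10, 21, 21, 25, 32, tzinfo=timezone.utc)
print(is_under_embargo("2017-10-23T12:00:00Z", version_created))
```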
Service

The <service> element allows the provider to declare which of its services delivered this package, using a Controlled Vocabulary:

<service qcode="svc:uknews">
    <name>UK News Service</name>
</service>

Editorial Note

The <edNote> element contains a note that is intended to be read by internal staff at the receiving organisation, but not published to the end-user; in this example it conveys some optional information about the release condition imposed by the <embargoed> element:

<edNote>
 Note to editors: STRICTLY EMBARGOED. Not for public release until 12noon
 on Monday, October 23, 2017.
</edNote>

Signal

Additional processing instructions can be given using <signal> and its @qcode. This example uses the IPTC Signal NewsCodes (recommended Scheme Alias "sig") to advise the end-user that this Item updates a previous version of the Item:

<signal qcode="sig:update" />

The other value in the Signal scheme is "correction". There is further NewsML-G2 functionality for expressing fine-grained information about the reason and impact of updates and also for applying signals to different parts of an Item’s content.

Link

The <link> element has two basic purposes:

  • To assert relationships to other Items, such as a previous version of an Item

  • To create a navigable link from an Item to some supporting or additional resource.

This example provides a "see also" link to a resource on the Web that end-users can view to get further information about the event. @rel is used to denote the reason that the link is provided. In this example, the QCode uses the recommended IPTC Item Relation NewsCodes with a recommended Scheme Alias of "irel" and the code value is "seeAlso":

<link rel="irel:seeAlso" href="http://www.example.com/video/20081222-PNN-1517-407624/index.html"/>

Completed Item Metadata

<itemMeta>
    <itemClass qcode="ninat:text" />
    <provider qcode="nprov:REUTERS" />
    <versionCreated>2017-10-21T16:25:32-05:00</versionCreated>
    <firstCreated>2016-10-18T13:12:21-05:00</firstCreated>
    <embargoed>2017-10-23T12:00:00Z</embargoed>
    <pubStatus qcode="stat:usable" />
    <service qcode="svc:uknews">
        <name>UK News Service</name>
    </service>
    <edNote>
        Note to editors: STRICTLY EMBARGOED. Not for public release until 12noon
        on Monday, October 23, 2017.
    </edNote>
    <signal qcode="sig:update" />
    <link rel="irel:seeAlso" href="http://www.example.com/video/20081222-PNN-1517-407624/index.html"/>
</itemMeta>

7.6. Content Metadata <contentMeta>

Conceptually, there are two kinds of content metadata: Administrative and Descriptive.

7.6.1. Administrative Metadata

This is information about the content that cannot necessarily be deduced by examining it, for example: when it was created and/or modified. Administrative properties that are widely used by implementers of NewsML-G2 for editorial content are:

  • Timestamps (Created, Modified)

  • Story location (Located)

  • Creator

  • Information Source

Timestamps

The <contentCreated> timestamp corresponds to the "Created on" field of the story. It is expressed in NewsML-G2 as a Truncated DateTime data type, meaning that the date-time elements may optionally be stripped, starting from the right. If required, the <contentModified> property may also be used to contain the "Last Edit" timestamp. This must be later than the Created timestamp.

<contentCreated>2016-10-18T11:12:00-05:00</contentCreated>
<contentModified>2017-10-21T16:22:45-05:00</contentModified>
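
To illustrate truncation, the Python sketch below reports the precision of a Truncated DateTime value. It assumes four levels of precision (year, year and month, full date, full date-time) and ignores any time zone on a date-only value. The function name truncated_precision is hypothetical.

```python
from datetime import datetime


def truncated_precision(value):
    """Return 'datetime', 'date', 'month' or 'year' for a truncated value."""
    text = value.strip()
    if "T" in text:
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"   # 'Z' means UTC
        datetime.fromisoformat(text)       # raises ValueError if malformed
        return "datetime"
    # Try the longest form first, then progressively shorter ones.
    for precision, pattern in (
        ("date", "%Y-%m-%d"),
        ("month", "%Y-%m"),
        ("year", "%Y"),
    ):
        try:
            datetime.strptime(text, pattern)
        except ValueError:
            continue
        return precision
    raise ValueError(f"not a truncated date-time: {value!r}")


for example in ("2016-10-18T11:12:00-05:00", "2016-10-18", "2016-10", "2016"):
    print(example, truncated_precision(example))
```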
Located

The place that the content was created uses the <located> element. Note that this is not necessarily the place of the event or subject. For example, for a UK story written in the London office, <located> would be "London"; a picture of Mount Fuji taken from downtown Tokyo would have a <located> value of "Tokyo".

The semantics of <located> are similar to the natural-language location carried in the dateline that often prefaces news (such as "BERLIN, October 24") but can be conveyed more precisely, and in terms that may be more readily processed by software, using a @qcode or @uri:

<located type="cptype:city" qcode="geo:345678">
    <name>Berlin</name>
</located>

The optional @type uses a controlled vocabulary to indicate the nature of the location being expressed; in the example this indicates that <located> refers to a city.

Broader, Narrower

Both Located and Subject (below) contain child elements that express a specific relationship between entities or concepts. For example, the content originated in the city of Berlin and the <located> element shows that the city of Berlin has a "broader" relationship – that is a child-to-parent relationship – to Berlin the state, and to Germany the country:

<located type="cptype:city" qcode="geo:345678">
    <name>Berlin</name>
    <broader type="cptype:statprov" qcode="prov:2365">
        <name>Berlin</name>
    </broader>
    <broader type="cptype:country" qcode="iso3166-1a2:DE">
        <name>Germany</name>
    </broader>
</located>

Creator

The writer, photographer or other author of content is expressed using the <creator> element:

<creator uri="http://www.example.com/staff/mjameson">
    <name>Meredith Jameson</name>
</creator>

Information Source

The <infoSource> element, together with its optional @role, enables finely-grained identification of the various parties who provided information used to create and develop an item of news. If absent, the default value of @role is the originator of the information used to create or enhance the content.

<infoSource uri="http://www.example.com/pressreleases/201610/newproducts.html"/>

7.6.2. Descriptive Metadata

These properties set the context of news content in relation to other news by describing and classifying it. Information that has historically been carried within the content itself, such as the headline and by-line (for text) or embedded metadata (for pictures) may also be specified as metadata. The practical benefit is that the end user no longer needs to scan or retrieve the actual content in order to process it. None of the Descriptive Metadata elements are mandatory, but the following feature frequently in NewsML-G2 implementations.

Subject

The subject matter of content uses the <subject> element. When the value of the Subject is taken from a Controlled Vocabulary, this is identified using either a @qcode or @uri:

<subject type="cpnat:abstract" qcode="medtop:20000523" />
<subject uri="http://cv.iptc.org/newscodes/mediatopic/20000523" />

For concepts not taken from a CV, the identifier is omitted and the name of the concept is given in the child <name> element, for example:

<subject>
    <name>Labour Market</name>
</subject>

The optional @type uses the IPTC "nature of the concept" NewsCodes (recommended scheme alias "cpnat") to indicate the type of concept being expressed, for example, an abstract concept, that is a concept that does not represent a real-world entity, but something like an idea, or news category.

<subject type="cpnat:abstract" qcode="medtop:04000000">
    <name xml:lang="en-GB">economy, business and finance</name>
</subject>

The above example uses a concept from the IPTC Media Topic NewsCodes. Also note the use of the W3C XML attribute xml:lang that expresses the language used for the element’s value. It is also possible to add relationships to related concepts, as shown above in <located>. For example:

<subject type="cpnat:abstract" qcode="medtop:04000000">
    <name xml:lang="en-GB">economy, business and finance</name>
</subject>
<subject type="cpnat:abstract" qcode="medtop:20000523">
    <name xml:lang="en-GB">labour market</name>
    <name xml:lang="de">Arbeitsmarkt</name>
    <broader qcode="medtop:04000000" /> <!-- relationship property -->
</subject>

The IPTC highly recommends that providers use the Media Topic NewsCodes unless there is an over-riding requirement to use proprietary codes. This promotes inter-operability and standardisation in news exchange. It is also recommended that if a <name> is used in conjunction with a QCode, then the value of the <name> agrees with the value in the Scheme for the language specified in xml:lang. IPTC NewsCodes are provided in UK English (en-GB) and translations in French and German have been provided by IPTC members.

The "medtop" code prefix is the recommended Scheme Alias for the Media Topic NewsCodes. It is resolved via the IPTC Catalog (see Catalog wrapper <catalogRef>, above) to the Scheme URI http://cv.iptc.org/newscodes/mediatopic/, which is guaranteed to be globally unique because it is part of the Internet Domain controlled by the IPTC. Appending the code "04000000" to the Scheme URI forms the Concept URI http://cv.iptc.org/newscodes/mediatopic/04000000, which cannot be confused with a concept with the same code 04000000 from another source.

When using @type to indicate the nature of the concept, the possible values from the IPTC Concept Nature NewsCodes are:

  • Abstract: a concept that does not represent a real-world entity

  • Event

  • geoArea: a geo-political area

  • Object: a real-world object, such as a painting or an aircraft

  • Organisation

  • Person

  • Point of Interest

Genre

The <genre> element indicates the style of the content (in this example, "interview"). It is distinct from <subject>, which indicates the subject matter of the content. In the example, the IPTC Genre NewsCodes value "interview" is used:

<genre qcode="genre:interview">
    <name xml:lang="en-GB">Interview</name>
</genre>

Slugline

Some news services implemented in NewsML-G2 retain the <slugline> property as a human-readable index for legacy reasons; therefore receivers may sometimes see this property. However, it has never been a completely reliable identifier, and it is recommended that more purposeful identifiers that are also machine-readable are implemented in its place.

<slugline>US-Finance-Fed</slugline>

Headline

Even if the Headline is carried inline in text content, it is useful also to place it explicitly in metadata so that it can more easily be identified and extracted by the end-user:

<headline> Fed to halt QE to avert "bubble"</headline>

Completed Content Metadata

<contentMeta>
    <contentCreated>2016-10-18T11:12:00-05:00</contentCreated>
    <contentModified>2017-10-21T16:22:45-05:00</contentModified>
    <located type="cptype:city" qcode="geo:345678">
        <name>Berlin</name>
        <broader type="cptype:statprov" qcode="prov:2365">
            <name>Berlin</name>
        </broader>
        <broader type="cptype:country" qcode="iso3166-1a2:DE">
            <name>Germany</name>
        </broader>
    </located>
    <creator uri="http://www.example.com/staff/mjameson">
        <name>Meredith Jameson</name>
    </creator>
    <infoSource uri="http://www.example.com" />
    <subject type="cpnat:abstract" qcode="medtop:04000000">
        <name xml:lang="en-GB">economy, business and finance</name>
    </subject>
    <subject type="cpnat:abstract" qcode="medtop:20000523">
        <name xml:lang="en-GB">labour market</name>
        <name xml:lang="de">Arbeitsmarkt</name>
        <broader qcode="medtop:04000000" />
    </subject>
    <genre qcode="genre:interview">
        <name xml:lang="en-GB">Interview</name>
    </genre>
    <slugline>US-Finance-Fed</slugline>
    <headline> Fed to halt QE to avert "bubble"</headline>
</contentMeta>

7.7. Content <contentSet>

The content of a NewsML-G2 document varies according to its Item type: the example below shows a News Item with a <contentSet> wrapper containing a trivial text payload. Following this code example are skeletal examples showing the other options for conveying content in NewsML-G2 News Items and Package Items:

7.7.1. News Content options for News Items and Package Items

The News Item <contentSet> contains a single logical piece of content, but allows alternative renditions of the SAME content to be carried in a single NewsML-G2 Item:

<contentSet>
    <inlineXML contenttype="application/nitf+xml"><!-- A VALID MEDIA TYPE -->
        <!-- Inline XML must contain well-formed XML such as NITF or XHTML -->
    </inlineXML>
</contentSet>

or

<contentSet>
    <inlineData>
        <!-- Inline Data contains plain text -->
    </inlineData>
</contentSet>

or

<contentSet>
    <!-- Remote Content contains a reference to a binary asset such as -->
    <!-- a PDF or image file -->
    <remoteContent rendition="rnd:one" />
    <remoteContent rendition="rnd:two" />
    <remoteContent rendition="rnd:three" />
</contentSet>

Package Item content is wrapped by the <groupSet> element, which can contain one or more <group> children. Each Group contains references to Items that make up the News Package, using the <itemRef> element. A Group can also reference other Groups via the <groupRef> element.

<groupSet ... >
    <group ...>
        <itemRef ... >
        <!-- Properties extracted from the packaged Item -->
        </itemRef>
        <groupRef />
    </group>
    <group ... >
        <!-- ... -->
    </group>
</groupSet>

7.8. Summary and Next Steps

This section has covered the basic structure that is common to all NewsML-G2 Items, and also outlined properties that are commonly used for news content. Further Quick Start Guides show how to build upon this foundation:

Quick Start – Text takes an example news story and shows how the information on an editor’s screen would be implemented in NewsML-G2.

Quick Start – Pictures takes an example image and its embedded metadata and converts this to NewsML-G2 properties, with several image renditions carried in a single NewsML-G2 Item. The guide also shows how to express the various technical characteristics of images using NewsML-G2 properties.

Quick Start – Video is split into two sections: the first covers a simple case of a standalone video file with various technical renditions expressed in NewsML-G2; the second uses a more comprehensive structure that separates the metadata for multiple segments of a video, using the <partMeta> wrapper.

Quick Start – Packages shows how NewsML-G2 Items and other kinds of content can be assembled into Packages of managed objects with an explicit structure.

7.9. Hidden Values of NewsML-G2

There are some default values set by the specification which allow an element or attribute to be omitted and the default assumed. The list below shows NewsML-G2 elements and attributes which optionally appear in an Item but for which a usable value or status exists.

7.9.1. All NewsML-G2 items

  • @version of the root element = "1"

  • @conformance of the root element = "core"

  • <embargoed> = no embargo

  • <pubStatus> = "usable"

  • <catalog> and <catalogRef> = one is required; many of either, or a mix of both may be used

  • @scope of <hash> = "content". Hash value is a message digest included in the Item for security purposes. A hash scope of "content" indicates that the hash value was derived by hashing some/all of the content only.

  • @why (attribute of many elements) = "direct". The attribute value indicates that the value is directly related to the content.

  • @how (attribute of many elements) = "person". The attribute indicates how the value was extracted from the content: by a person. (See Why and How metadata has been added: @why and @how for essential guidance on the use of @why and @how.)

  • @custom (attribute of many elements) = "false". The attribute indicates whether the property was added specifically for a customer or group of customers; the default value means that it was not.

  • @dir (many elements) = "ltr". The directionality of the script of the language of the property is left to right.

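As an illustration, a receiver could apply some of these defaults when reading an Item, as in the Python sketch below. The function name effective_values and the keys of the returned dictionary are hypothetical. Only four of the defaults listed above are covered, and the <pubStatus> default is written with the recommended Scheme Alias "stat".

```python
import xml.etree.ElementTree as ET

NS = {"g2": "http://iptc.org/std/nar/2006-10-01/"}


def effective_values(item_xml):
    """Return selected values of an Item with the hidden defaults applied."""
    root = ET.fromstring(item_xml)
    pub_status = root.find("g2:itemMeta/g2:pubStatus", NS)
    embargoed = root.find("g2:itemMeta/g2:embargoed", NS)
    return {
        "version": int(root.get("version", "1")),
        "conformance": root.get("conformance", "core"),
        "pubStatus": (
            pub_status.get("qcode") if pub_status is not None else "stat:usable"
        ),
        # None stands for "no embargo".
        "embargoed": (
            embargoed.text.strip()
            if embargoed is not None and embargoed.text
            else None
        ),
    }


minimal_item = """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:acmenews.com:20161018:US-FINANCE-FED"
    standard="NewsML-G2" standardversion="2.25">
    <itemMeta>
        <itemClass qcode="ninat:text" />
    </itemMeta>
</newsItem>"""

print(effective_values(minimal_item))
```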
7.9.2. News Items only

  • @timeunit (when @duration is used) = "seconds"

  • @dimensionunit (when @width/@height is used) = (for example) pixels for a still picture. See the Quick Start Guide – Pictures for details.

  • @encoding of <inlineData> = "base64". This is the default binary encoding for Inline Data.

7.9.3. Package Item only

  • @mode of the <group> = "bag" – an unordered collection of complementary components

8. Quick Start: Text

8.1. Introduction

One of the most fundamental needs of a news organisation is to handle text. This chapter covers the basics of a simple NewsML-G2 News Item containing text content.

We recommend reading the Quick Start Guide to NewsML-G2 Basics before this Quick Start Guide to Text.

8.2. Example

Below is an example story and supporting information as might be displayed on the journalist’s editing screen at a fictional news provider, Acme News and Media (ANM):

Acme News and Media - Content Editing System

Slugline

US-Finance-Fed

Created on

2016-11-21 15:21:06

Source

ANM

Author

mjameson

Latest edit

2017-11-21 16:22:45

Latest editor

moiras

Categories

economy, finance, business, central bank, monetary policy

Headline

Fed to halt QE to avert "bubble"

Byline

By Meredith Jameson

(Location) Date

(Washington) 21/11/2017

Body Text

Et, sent luptat luptat, commy nim zzriureet vendreetue modo dolenis ex euisis nosto et lan ullandit lum doloreet vulla feugiam coreet, cons eleniam il ute facin veril et aliquis ad minis et lor sum del iriure dit la feugiamcommy nostrud min ullapat velisl duisismodip ero dipit nit utpatum sandrer cipisim nit lortis augiat nulla faccum at am, quam velenis nulput la auguerostrud magna commolore eliquatie exerate facilis modiamconsed dion henisse quipit at..

This screen contains nearly all of the information needed to create the NewsML-G2 document below:

LISTING 2: NewsML-G2 Text Document

(All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: geoloc, is)

<?xml version="1.0" encoding="UTF-8" ?>
<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Core.xsd"
    guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    xml:lang="en-US">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef
        href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
    <rightsInfo>
        <copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
            <name>Acme News and Media LLC</name>
        </copyrightHolder>
        <copyrightNotice>Copyright 2016-17 Acme News and Media LLC</copyrightNotice>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="ninat:text" />
        <provider uri="http://www.acmenews.com/about.html" />
        <versionCreated>2017-11-21T16:25:32-05:00</versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
        <contentModified>2017-11-21T16:22:45-05:00</contentModified>
        <located qcode="geoloc:NYC">
            <name>New York, NY</name>
        </located>
        <creator uri="http://www.acmenews.com/staff/mjameson">
            <name>Meredith Jameson</name>
        </creator>
        <infoSource qcode="is:AP">
            <name>Associated Press</name>
        </infoSource>
        <language tag="en-US" />
        <subject qcode="medtop:04000000">
            <name>economy, business and finance</name>
        </subject>
        <subject qcode="medtop:20000350">
            <name>central bank</name>
        </subject>
        <subject qcode="medtop:20000379">
            <name>money and monetary policy</name>
        </subject>
        <slugline>US-Finance-Fed</slugline>
        <headline>Fed to halt QE to avert "bubble"</headline>
    </contentMeta>
    <contentSet>
        <inlineXML contenttype="application/nitf+xml">
            <nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
                <body>
                   <body.head>
                       <hedline>
                           <hl1>Fed to halt QE to avert "bubble"</hl1>
                       </hedline>
                       <byline>By Meredith Jameson, <byttl>Staff Reporter</byttl></byline>
                   </body.head>
                   <body.content>
                       <p>(New York, NY - November 21) Et, sent luptat luptat, commy
                           Nim zzriureet vendreetue modo
                           dolenis ex euisis nosto et lan ullandit lum doloreet vulla
                           feugiam coreet, cons eleniam il ute facin veril et aliquis ad
                           minis et lor sum del iriure dit la feugiamcommy nostrud min ulla
                           autpat velisl duisismodip ero dipit nit utpatum sandrer cipisim
                           nit lortis augiat nulla faccum at am, quam velenis nulput la
                           auguerostrud magna commolore eliquatie exerate facilis
                           modiamconsed dion henisse quipit at. Ut la feu facilla feu
                           faccumsan ecte modoloreet ad ex el utat.
                       </p>
                       <p>Ugiating ea feugait utat, venim velent nim quis nulluptat num
                           Volorem inci enim dolobor eetuer sendre ercin utpatio dolorpercing
                           Et accum nullan voluptat wisis alit dolessim zzrilla commy nonulpu
                           tpatinis exer sequatueros adit verit am nonse exerili quismodion
                           esto cons dolutpat, si.
                       </p>
                   </body.content>
                </body>
            </nitf>
        </inlineXML>
    </contentSet>
</newsItem>

8.3. Document structure

The building blocks of the text document shown above are the <newsItem> root element, with additional wrapping elements for metadata about the News Item (itemMeta), metadata about the content (contentMeta) and the content itself (contentSet). The top level (root) element <newsItem> attributes are:

<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-US">

This is followed by references to the Catalogs used to resolve QCodes in the Item, and Rights information:

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml"
    />
<catalogRef
    href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
<rightsInfo>
    <copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
        <name>Acme News and Media LLC</name>
    </copyrightHolder>
    <copyrightNotice>Copyright 2016-17 Acme News and Media LLC</copyrightNotice>
</rightsInfo>
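
As a rough illustration of how a processor might use the Catalogs, the following Python sketch expands a QCode into a full concept URI by appending the code to the scheme URI registered for its alias. The CATALOG dictionary is a hand-written stand-in for the entries a real processor would load from the <catalogRef> files, the function name resolve_qcode is an assumption made for this example, and the "geoloc" scheme URI is hypothetical.

```python
# Hand-written stand-in for the scheme entries a processor would load from
# the catalogs named in <catalogRef>. The "geoloc" URI is hypothetical.
CATALOG = {
    "medtop": "http://cv.iptc.org/newscodes/mediatopic/",
    "geoloc": "http://cv.acmenews.com/geoloc/",
}


def resolve_qcode(qcode, catalog):
    """Expand 'alias:code' to a full concept URI using the catalog."""
    alias, sep, code = qcode.partition(":")
    if not sep or not alias or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in catalog:
        raise KeyError(f"scheme alias {alias!r} is not in any catalog")
    # The concept URI is the scheme URI followed directly by the code.
    return catalog[alias] + code


concept_uri = resolve_qcode("medtop:04000000", CATALOG)
print(concept_uri)
```

A receiver that cannot find the alias in any referenced Catalog has no way to resolve the QCode, which is why the sketch raises an error rather than guessing.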

8.3.1. Item Metadata <itemMeta>

Note the three mandatory child elements of the mandatory <itemMeta> element:

  • Item Class

  • Provider

  • Version Created

Every Item also has a publication status, but the <pubStatus> element itself may be omitted, in which case the publication status defaults to "usable". However, it is recommended that the publication status is explicitly given, as in this example. As Acme News and Media is fictional, the Provider property does not use one of the IPTC Provider NewsCodes, and is expressed by a URI:

<itemMeta>
    <itemClass qcode="ninat:text" />
    <provider uri="http://www.acmenews.com/about.html" />
    <versionCreated>2017-11-21T16:25:32-05:00</versionCreated>
    <pubStatus qcode="stat:usable" />
</itemMeta>
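
The rules above can be sketched in Python as a small check that reports any missing mandatory child of <itemMeta> and applies the "usable" default when <pubStatus> is absent. This is a minimal sketch using the standard library only; the function name check_item_meta and the trimmed sample (with <pubStatus> deliberately left out) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"
MANDATORY = ("itemClass", "provider", "versionCreated")

# Sample with <pubStatus> deliberately omitted to show the default.
ITEM_META = """
<itemMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
    <itemClass qcode="ninat:text" />
    <provider uri="http://www.acmenews.com/about.html" />
    <versionCreated>2017-11-21T16:25:32-05:00</versionCreated>
</itemMeta>
"""


def check_item_meta(item_meta):
    """Return (missing mandatory children, effective publication status)."""
    missing = [name for name in MANDATORY if item_meta.find(NAR + name) is None]
    pub_status = item_meta.find(NAR + "pubStatus")
    # An omitted <pubStatus> means the publication status is "usable".
    status = pub_status.get("qcode") if pub_status is not None else "stat:usable"
    return missing, status


missing, status = check_item_meta(ET.fromstring(ITEM_META))
print(missing, status)
```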

8.3.2. Content Metadata <contentMeta>

Administrative Metadata

The administrative properties of the example text story are:

 <contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
 <contentModified>2017-11-21T16:22:45-05:00</contentModified>

The place that the content was created uses the <located> element:

<located qcode="geoloc:NYC">
    <name>New York, NY</name>
</located>

(Note that this is where the story was written, not where the events described in the story took place. That would be expressed using <subject>, part of Descriptive Metadata.)

The author of the article is expressed using the <creator> element:

<creator uri="http://www.acmenews.com/staff/mjameson">
    <name>Meredith Jameson</name>
</creator>

The Information Source for the article is also given. When used without a @role, <infoSource> is used to denote the person or party that provided the original information on which the content is based. This is the relationship to be expressed here:

<infoSource qcode="is:AP">
    <name>Associated Press</name>
</infoSource>

The default language for the content is given as U.S. English:

<language tag="en-US" />

Descriptive Metadata

In the example, the Subject properties are expressed as QCodes from the Media Topic NewsCodes, a Controlled Vocabulary owned and maintained by the IPTC. Thus:

<subject qcode="medtop:04000000">
    <name>economy, business and finance</name>
</subject>
<subject qcode="medtop:20000350">
    <name>central bank</name>
</subject>
<subject qcode="medtop:20000379">
    <name>money and monetary policy</name>
</subject>

The <slugline> property contains the value of the "Slugline" field of the story:

<slugline>US-Finance-Fed</slugline>

In a similar fashion, the <headline> property will contain the value of the "Headline" field:

<headline>Fed to halt QE to avert "bubble"</headline>

Complete Content Metadata

<contentMeta>
    <contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
    <contentModified>2017-11-21T16:22:45-05:00</contentModified>
    <located qcode="geoloc:NYC">
        <name>New York, NY</name>
    </located>
    <creator uri="http://www.acmenews.com/staff/mjameson">
        <name>Meredith Jameson</name>
    </creator>
    <infoSource qcode="is:AP">
        <name>Associated Press</name>
    </infoSource>
    <language tag="en-US" />
    <subject qcode="medtop:04000000">
        <name>economy, business and finance</name>
    </subject>
    <subject qcode="medtop:20000350">
        <name>central bank</name>
    </subject>
    <subject qcode="medtop:20000379">
        <name>money and monetary policy</name>
    </subject>
    <slugline>US-Finance-Fed</slugline>
    <headline>Fed to halt QE to avert "bubble"</headline>
</contentMeta>
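
A receiver typically reads these properties back out of <contentMeta>. The following Python sketch, which assumes only the standard library and a shortened copy of the example, collects the slugline, headline and subject QCodes with their names. The function name summarise is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Shortened copy of the example <contentMeta>.
CONTENT_META = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
    <subject qcode="medtop:04000000">
        <name>economy, business and finance</name>
    </subject>
    <subject qcode="medtop:20000350">
        <name>central bank</name>
    </subject>
    <slugline>US-Finance-Fed</slugline>
    <headline>Fed to halt QE to avert "bubble"</headline>
</contentMeta>
"""


def summarise(content_meta):
    """Collect the descriptive properties used in the text example."""
    subjects = [
        (s.get("qcode"), s.findtext("nar:name", default="", namespaces=NS))
        for s in content_meta.findall("nar:subject", NS)
    ]
    return {
        "slugline": content_meta.findtext("nar:slugline", namespaces=NS),
        "headline": content_meta.findtext("nar:headline", namespaces=NS),
        "subjects": subjects,
    }


summary = summarise(ET.fromstring(CONTENT_META))
print(summary["headline"])
```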

8.4. Text content choices

8.4.1. Inline XML

The content of the NewsML-G2 document is enclosed by the <contentSet> wrapper. In the example, IPTC’s news text mark-up language NITF (News Industry Text Format) is used to format the text content. As an XML standard, it is contained in an <inlineXML> child element of <contentSet>, and uses @contenttype to denote the XML-based standard, using the IANA Media Type.

HTML5 and XHTML are also popular text mark-up choices among NewsML-G2 providers. Alternatively, the contents of <inlineXML> may be any XML language that can express generic or specialised news information, including SportsML-G2 and rNews. Other languages such as XBRL (eXtensible Business Reporting Language) may also be used. The content inside <inlineXML> must be valid XML; in other words, it could stand alone as a valid XML document in its own namespace.

<contentSet>
    <inlineXML contenttype="application/nitf+xml">
        <nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">

        <!--STORY CONTENT HERE -->

        </nitf>
    </inlineXML>
</contentSet>
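
Because the NITF payload lives in its own namespace, a processor has to switch namespaces when it descends into <inlineXML>. The Python sketch below shows one way this might be done with the standard library; the function name nitf_paragraphs and the two-paragraph sample text are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "nar": "http://iptc.org/std/nar/2006-10-01/",
    "nitf": "http://iptc.org/std/NITF/2006-10-18/",
}

CONTENT_SET = """
<contentSet xmlns="http://iptc.org/std/nar/2006-10-01/">
    <inlineXML contenttype="application/nitf+xml">
        <nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
            <body>
                <body.content>
                    <p>First   paragraph
                       of the story.</p>
                    <p>Second paragraph.</p>
                </body.content>
            </body>
        </nitf>
    </inlineXML>
</contentSet>
"""


def nitf_paragraphs(content_set):
    """Return the text of each NITF <p>, with whitespace normalised."""
    paragraphs = []
    for inline in content_set.findall("nar:inlineXML", NS):
        # @contenttype tells the processor which XML language to expect.
        if inline.get("contenttype") != "application/nitf+xml":
            continue
        for p in inline.iterfind(".//nitf:p", NS):
            paragraphs.append(" ".join("".join(p.itertext()).split()))
    return paragraphs


paragraphs = nitf_paragraphs(ET.fromstring(CONTENT_SET))
print(paragraphs)
```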

8.4.2. Inline data

The <inlineData> wrapper element holds plain-text or base64 encoded content. Plain text or CDATA content MUST be identified by the "text/plain" content type. Binary content, like images, audio clips or even PDF or Word documents may be exchanged after proper encoding, but it is strongly recommended to use this structure for small assets only. The encoding algorithm MAY be indicated using the encoding attribute. The following example uses plain text:

<contentSet>
    <inlineData contenttype="text/plain">

        Et, sent luptat luptat ...

    </inlineData>
</contentSet>
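
A receiver might handle the two kinds of <inlineData> payload as in the following Python sketch. It assumes that anything not labelled "text/plain" is base64 encoded, which is a simplification made for this example; a provider may signal the encoding explicitly. The function name read_inline_data and the tiny GIF signature used as binary content are hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

NAR = "http://iptc.org/std/nar/2006-10-01/"


def read_inline_data(inline_data):
    """Return str for text/plain content, decoded bytes otherwise."""
    text = inline_data.text or ""
    if inline_data.get("contenttype") == "text/plain":
        return text.strip()
    # Assumption for this sketch: any other payload is base64 encoded.
    return base64.b64decode("".join(text.split()))


plain = ET.fromstring(
    f'<inlineData xmlns="{NAR}" contenttype="text/plain">\n'
    "    Et, sent luptat luptat ...\n"
    "</inlineData>"
)
icon = ET.fromstring(
    f'<inlineData xmlns="{NAR}" contenttype="image/gif">'
    + base64.b64encode(b"GIF89a").decode("ascii")
    + "</inlineData>"
)

print(read_inline_data(plain))
print(read_inline_data(icon))
```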

9. Quick Start: Pictures and Graphics

9.1. Introduction

Image content, including pictures and graphics, can be conveyed in a standard NewsML-G2 document. Picture providers and consumers need a rich vocabulary for descriptive and technical metadata, and for administrative metadata such as rights and usage terms. There is also a long-established use of embedded metadata, such as the IPTC/IIM Fields in JPEG and TIFF files. This Quick Start guide addresses some aspects of embedded metadata; the mapping of embedded metadata and IIM fields to NewsML-G2 is described in detail in the Guidelines section Mapping Embedded Photo Metadata to NewsML-G2.

The example in this Quick Start guide is a simple but complete document that shows how to implement in NewsML-G2 the most common requirements of a professional picture workflow:

  • Rights and usage instructions

  • Descriptive and administrative properties such as Location and Categorisation

  • Separate technical renditions of a picture

We recommend reading the Quick Start Guide to NewsML-G2 Basics before this Quick Start Guide.

The picture and the metadata used in the example are courtesy of Getty Images. Note that the sample code is NOT intended to be a guide to receiving NewsML-G2 from Getty Images.

9.2. About the example

A library picture is provided to customers in three sizes: a large image intended for high resolution and/or large size display, a medium-sized image intended for web use, and a small image for use as a thumbnail or icon. These are three alternative renditions of the same picture and can be contained in a single NewsML-G2 document.

Listing 3: Photo in NewsML-G2

(All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: crol, crel, gyibt, ctrol, gyimeid, gycon, gyiid)

<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="tag:gettyimages.com,2010:GYI0062134533"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-US">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://cv.gettyimages.com/nml2catalog4customers-1.xml" />
    <rightsInfo>
        <copyrightHolder uri="http://www.gettyimages.com">
            <name>Getty Images North America</name>
        </copyrightHolder>
        <copyrightNotice
            href="http://www.gettyimages.com/Corporate/LicenseInfo.aspx">
                Copyright 2010 Getty Images. --
                http://www.gettyimages.com/Corporate/LicenseInfo.aspx
        </copyrightNotice>
        <usageTerms>Contact your local office for all commercial or
            promotional uses. Full editorial rights UK, US, Ireland, Canada (not
            Quebec). Restricted editorial rights for daily newspapers elsewhere,
            please call.</usageTerms>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="ninat:picture"/>
        <provider qcode="nprov:GYI" >
            <name>Getty Images Inc.</name>
        </provider>
        <versionCreated>2017-10-12T06:42:04Z</versionCreated>
        <firstCreated>2010-10-20T20:58:00Z</firstCreated>
    </itemMeta>
    <contentMeta>
        <contentCreated>2010-10-20T19:45:58-04:00</contentCreated>
        <creator role="crol:photographer">
            <name>Spencer Platt</name>
            <related rel="crel:isA" qcode="gyibt:staff" />
        </creator>
        <contributor role="ctrol:descrWriter">
            <name>sp/lrc</name>
        </contributor>
        <creditline>Getty Images</creditline>
        <subject type="cpnat:event" qcode="gyimeid:104530187" />
        <subject type="cpnat:abstract" qcode="medtop:20000523">
            <name xml:lang="en-GB">labour market</name>
            <name xml:lang="de">Arbeitsmarkt</name>
        </subject>
        <subject type="cpnat:abstract" qcode="medtop:20000533">
            <name xml:lang="en-GB">unemployment</name>
            <name xml:lang="de">Arbeitslosigkeit</name>
        </subject>
        <subject type="cpnat:geoArea">
            <name>Las Vegas Boulevard</name>
        </subject>
        <subject type="cpnat:geoArea" qcode="gycon:89109">
            <name>Las Vegas</name>
            <broader qcode="iso3166-1a2:US-NV">
                <name>Nevada</name>
            </broader>
            <broader qcode="iso3166-1a3:USA">
                <name>United States</name>
            </broader>
        </subject>
        <keyword>business</keyword>
        <keyword>economic</keyword>
        <keyword>economy</keyword>
        <keyword>finance</keyword>
        <keyword>poor</keyword>
        <keyword>poverty</keyword>
        <keyword>gamble</keyword>
        <headline>Variety Of Recessionary Forces Leave Las Vegas
            Economy Scarred</headline>
        <description role="drol:caption">A general view of part of downtown,
            including Las Vegas Boulevard, on October 20, 2010 in Las Vegas,
            Nevada. Nevada once had among the lowest unemployment rates in the
            United States at 3.8 percent but has since fallen on difficult times.
            Las Vegas, the gaming capital of America, has been especially hard
            hit with unemployment currently at 14.7 percent and the highest
            foreclosure rate in the nation. Among the sparkling hotels and
            casinos downtown are dozens of dormant construction projects and
            hotels offering rock bottom rates. As the rest of the country slowly
            begins to see some economic progress, Las Vegas is still in the midst
            of the economic downturn. (Photo by Spencer Platt/Getty Images)
        </description>
    </contentMeta>
    <contentSet>
        <remoteContent rendition="rnd:highRes"
            href="./GYI0062134533.jpg" version="1"
            size="346071" contenttype="image/jpeg" width="1500"
            height="1001" colourspace="colsp:AdobeRGB" orientation="1"
            layoutorientation="loutorient:horizontal">
            <altId type="gyiid:masterID">105864332</altId>
        </remoteContent>
        <remoteContent rendition="rnd:web"
            href="file:///./GYI0062134533-web.jpg" version="1"
            size="28972" contenttype="image/jpeg" width="480"
            height="320" colourspace="colsp:AdobeRGB" orientation="1"
            layoutorientation="loutorient:horizontal"/>
        <remoteContent rendition="rnd:thumb"
            href="file:///./GYI0062134533-thumb.gif" version="1"
            size="6381" contenttype="image/gif" width="80"
            height="53" colourspace="colsp:AdobeRGB" orientation="1"
            layoutorientation="loutorient:horizontal"/>
    </contentSet>
</newsItem>

9.3. Document structure

The building blocks of the NewsML-G2 document are the <newsItem> root element, with additional wrapping elements for metadata about the News Item (itemMeta), metadata about the content (contentMeta) and the content itself (contentSet).

The root <newsItem> attributes are:

<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="tag:gettyimages.com,2010:GYI0062134533"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-US">

Note that this example uses a Tag URI (see the TAG URI home page for details).

This is followed by references to the Catalogs used to resolve QCodes in the Item, and Rights information:

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml"
/>
<catalogRef
    href="http://cv.gettyimages.com/nml2catalog4customers-1.xml" />
<rightsInfo>
    <copyrightHolder uri="http://www.gettyimages.com">
        <name>Getty Images North America</name>
    </copyrightHolder>
    <copyrightNotice
        href="http://www.gettyimages.com/Corporate/LicenseInfo.aspx">
        Copyright 2010 Getty Images. -- http://www.gettyimages.com/Corporate/LicenseInfo.aspx
    </copyrightNotice>
    <usageTerms>Contact your local office for all commercial or promotional uses. Full editorial rights UK, US, Ireland, Canada (not Quebec). Restricted editorial rights for daily newspapers elsewhere, please call.</usageTerms>
</rightsInfo>

9.3.1. Source

Note that the IIM "Source" field maps to the NewsML-G2 <copyrightHolder> element of the <rightsInfo> block.

9.4. Item Metadata <itemMeta>

<itemMeta>
    <itemClass qcode="ninat:picture" />
    <provider qcode="nprov:GYI">
        <name>Getty Images Inc.</name>
    </provider>
    <versionCreated>2017-10-12T06:42:04Z</versionCreated>
    <firstCreated>2010-10-20T20:58:00Z</firstCreated>
</itemMeta>

The <itemClass> property uses a QCode from the IPTC News Item Nature NewsCodes to denote that the Item conveys a picture.

The Z suffix denotes UTC. Note the <firstCreated> property refers to the creation of the Item, NOT the content.
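
As an illustration of reading these timestamps, the Python sketch below parses the values from Listing 3 and treats the Z suffix as UTC. The helper name parse_timestamp is an assumption made for this example, and the sketch handles full date-time values only.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a full NewsML-G2 date-time; a trailing 'Z' means UTC."""
    value = value.strip()
    if value.endswith("Z"):
        # datetime.fromisoformat() only accepts 'Z' from Python 3.11 on,
        # so it is rewritten as an explicit zero offset.
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


first_created = parse_timestamp("2010-10-20T20:58:00Z")    # creation of the Item
version_created = parse_timestamp("2017-10-12T06:42:04Z")  # this version of the Item
content_created = parse_timestamp("2010-10-20T19:45:58-04:00")  # the picture itself

# Values with a local offset can be normalised to UTC for comparison.
print(content_created.astimezone(timezone.utc).isoformat())
```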

9.5. Embedded metadata

For many years IPTC metadata fields have been embedded in JPEG or TIFF image files. From 1995 on, the IPTC Information Interchange Model (IIM) defined the semantics of the fields and the technical format for saving them in image files. In 2003 Adobe introduced a new format for saving metadata, namely XMP (Extensible Metadata Platform), and many IPTC IIM fields were specified as the "IPTC Core" metadata schema. This defined identical semantics but allowed the values to be saved in IIM and XMP formats in parallel. Later the "IPTC Extension" metadata schema was added; its fields are stored in XMP only. Thus, many people work with IPTC photo metadata regardless of how the values are saved in the files; this is handled by the software they use.

The transfer of IPTC Photo Metadata fields to NewsML-G2 properties has a focus on the equivalence of the semantics of fields. The retrieval of the embedded values from the files is a secondary issue and documents like the Guidelines for Handling Image Metadata, produced by the Metadata Working Group (http://www.metadataworkinggroup.org/specs/) help in this area.

This Quick Start guide provides the basics of this mapping; for more details see Mapping Embedded Photo Metadata to NewsML-G2. You can also learn more from the IPTC website by visiting https://www.iptc.org/standards/ and following the link to Photo Metadata.

The screenshot below shows the panel for the IPTC Core fields as displayed by Adobe’s Photoshop CS File Info screen; note the IPTC Extension tab that displays the additional IPTC Extension metadata.

Figure: IPTC Core Metadata fields in the File Info panel of Adobe Photoshop

There are advantages, in a professional workflow, to carrying metadata independently of the binary asset:

  • There is no need to retrieve and open the file to read essential information about the picture

  • An editor may not have access to the original picture to modify its metadata

  • A library picture used to illustrate a news event may have inappropriate embedded metadata.

A situation may arise where the metadata expressed in the NewsML-G2 Item and the embedded metadata in the photo are different. Some providers choose to strip all embedded metadata from objects, to avoid potential confusion. If not, a provider should specify any processing rules in its terms of use.

The IPTC recommends that descriptive metadata properties that exist in the NewsML-G2 Item (in Content Metadata) ALWAYS take precedence over the equivalent embedded metadata (if it exists). These properties include genre, subject, headline, description and creditline.
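
The precedence rule can be sketched in Python as a simple merge in which, for the descriptive properties named above, the NewsML-G2 value replaces the embedded one. The dictionary representation, the function name effective_metadata and the embedded sample values are all assumptions made for illustration.

```python
# Descriptive properties for which the NewsML-G2 Item takes precedence.
DESCRIPTIVE = ("genre", "subject", "headline", "description", "creditline")


def effective_metadata(g2_props, embedded_props):
    """Merge embedded metadata with the Item's descriptive properties."""
    merged = dict(embedded_props)
    for name in DESCRIPTIVE:
        if name in g2_props:
            # The value carried in Content Metadata always wins.
            merged[name] = g2_props[name]
    return merged


g2_props = {
    "headline": "Variety Of Recessionary Forces Leave Las Vegas Economy Scarred",
    "creditline": "Getty Images",
}
# Hypothetical values read from the picture file itself.
embedded_props = {
    "headline": "Las Vegas skyline",
    "keywords": ["economy", "finance"],
}

merged = effective_metadata(g2_props, embedded_props)
print(merged["headline"])
```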

9.6. Content Metadata <contentMeta>

This example shows how embedded metadata from the example picture are translated into NewsML-G2, and includes the equivalent IPTC Core metadata schema property highlighted thus:

IPTC Core Schema equivalent: name

9.6.1. Administrative metadata

Timestamp

The <contentCreated> element is used to give the creation date of the picture:

<contentCreated>2010-10-20T19:45:58-04:00</contentCreated>

Note that this value refers to the creation of the original content; for a scanned picture this is always the date (and optionally the time) of the original photograph. The property type is Truncated Date Time, so that when the precise date-time is unknown, for example for an historic photograph, the value can be truncated (from the right) to a simple date or just a year.

IPTC Core Schema equivalent: Date Created
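
A receiver needs to cope with each level of truncation. The following Python sketch returns the parsed value together with its precision, so that later processing does not mistake a bare year for an exact date. It assumes that truncated values carry no time zone, which is a simplification; the function name parse_truncated and the sample year are hypothetical.

```python
from datetime import date, datetime


def parse_truncated(value):
    """Return (parsed value, precision) for a Truncated Date Time string."""
    value = value.strip()
    if "T" in value:
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        return datetime.fromisoformat(value), "datetime"
    parts = value.split("-")
    # Missing month or day are filled with 1 as placeholders only;
    # the precision tells the caller which parts are real.
    if len(parts) == 1:
        return date(int(parts[0]), 1, 1), "year"
    if len(parts) == 2:
        return date(int(parts[0]), int(parts[1]), 1), "month"
    if len(parts) == 3:
        return date.fromisoformat(value), "day"
    raise ValueError(f"unsupported date value: {value!r}")


print(parse_truncated("2010-10-20T19:45:58-04:00"))
print(parse_truncated("2010-10-20"))
print(parse_truncated("1932"))
```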

Creator

The example uses a <creator> element without an identifier, but includes an optional @role that contains a QCode qualifying the creator as a photographer:

<creator role="crol:photographer">
    <name>Spencer Platt</name>
    <related rel="crel:isA" qcode="gyibt:staff" />
</creator>

The <related> child element of <creator> further qualifies the photographer as a member of staff (as distinct from, say, a freelance photographer).

IPTC Core Schema equivalent: Creator

Contributor

A <contributor> identifies people or organisations who did not originate the content, but have added value to it. In this case, the @role value is a hint that the contributor added descriptive metadata:

<contributor role="ctrol:descrWriter">
    <name>sp/lrc</name>
</contributor>

IPTC Core Schema equivalent: Description Writer

Creditline

The <creditline> is a natural-language string that must be used by the receiver to indicate the credit(s) for the content, as directed in the business terms agreed with the provider or copyright holder:

<creditline>Getty Images</creditline>

IPTC Core Schema equivalent: Credit Line

9.6.2. Descriptive metadata

Subject

As described in the Quick Start Guide to NewsML-G2 Basics, the subject matter of content is expressed using the <subject> element. The optional @type uses the IPTC Concept Nature NewsCodes (recommended scheme alias "cpnat") to indicate the type of concept being expressed. The following example uses a value of "cpnat:event" to indicate that the concept is an Event, and the QCode identifies the Event in the scheme with an alias "gyimeid":

<subject type="cpnat:event" qcode="gyimeid:104530187" />

The provider can use this Event ID to "tag" each of the pictures that relate to this topic, enabling receivers to group them via the Event ID.
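
On the receiving side, the grouping might look like the following Python sketch. Each picture is represented as a GUID plus a list of (type, QCode) pairs taken from its <subject> elements; this representation, the function name group_by_event and the second picture are assumptions made for illustration.

```python
from collections import defaultdict


def group_by_event(pictures):
    """Map each event QCode to the GUIDs of the pictures tagged with it."""
    groups = defaultdict(list)
    for guid, subjects in pictures:
        for subject_type, qcode in subjects:
            # Only subjects typed as events take part in the grouping.
            if subject_type == "cpnat:event":
                groups[qcode].append(guid)
    return dict(groups)


pictures = [
    ("tag:gettyimages.com,2010:GYI0062134533",
     [("cpnat:event", "gyimeid:104530187"),
      ("cpnat:abstract", "medtop:20000533")]),
    # Hypothetical second picture from the same event.
    ("tag:example.com,2010:PIC0000000002",
     [("cpnat:event", "gyimeid:104530187")]),
]

groups = group_by_event(pictures)
print(groups)
```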

The picture of Las Vegas Boulevard illustrates a story about unemployment. This example uses codes and associated <name> child elements from the IPTC Media Topic NewsCodes:

<subject type="cpnat:abstract" qcode="medtop:20000523">
    <name xml:lang="en-GB">labour market</name>
    <name xml:lang="de">Arbeitsmarkt</name>
</subject>
<subject type="cpnat:abstract" qcode="medtop:20000533">
    <name xml:lang="en-GB">unemployment</name>
    <name xml:lang="de">Arbeitslosigkeit</name>
</subject>

City, State/Province, Country

The <located> element in the <contentMeta> block describes the place where the picture was created. This may be the same location as the event portrayed in the picture, but this cannot be assumed. The location of the event is logically part of the subject matter – the City, State/Province, Country fields in the IPTC Photo Metadata are defined as "the location shown" – so should use the <subject> element. To summarise:

  • Use <located> to describe where the camera was located when taking the picture.

  • Use <subject> to describe the location shown in the picture. It is recommended that @type is used to indicate the property identifies a geographical area.

The location shown in the example picture is Las Vegas Boulevard. Child elements of <subject> may be used to add further details, including:

  • <name> gives the place name in plain text, and

  • <broader> expresses the concept of Las Vegas Boulevard as part of the broader entity of Las Vegas which in turn is part of broader entities of Nevada state and of the United States.

<broader> is only available at Power Conformance Level, which is why we set @conformance to "power" in <newsItem>.

It is recommended that the nature of the concept is indicated by @type using a value from the IPTC Concept Nature NewsCodes, in this case "cpnat:geoArea" to show that the concept identifies a geographical area.

The completed <subject> structure for the geographical information is:

<subject type="cpnat:geoArea">
    <name>Las Vegas Boulevard</name>
</subject>
<subject type="cpnat:geoArea" qcode="gycon:89109">
    <name>Las Vegas</name>
    <broader qcode="iso3166-1a2:US-NV">
        <name>Nevada</name>
    </broader>
    <broader qcode="iso3166-1a3:USA">
        <name>United States</name>
    </broader>
</subject>
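
To show how a receiver might read this structure, the Python sketch below lists each location shown in the picture together with the names of its broader entities. It uses the standard library only; the function name locations_shown is hypothetical, and an abstract subject is included in the sample to show that it is skipped.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

CONTENT_META = """
<contentMeta xmlns="http://iptc.org/std/nar/2006-10-01/">
    <subject type="cpnat:abstract" qcode="medtop:20000533">
        <name>unemployment</name>
    </subject>
    <subject type="cpnat:geoArea">
        <name>Las Vegas Boulevard</name>
    </subject>
    <subject type="cpnat:geoArea" qcode="gycon:89109">
        <name>Las Vegas</name>
        <broader qcode="iso3166-1a2:US-NV">
            <name>Nevada</name>
        </broader>
        <broader qcode="iso3166-1a3:USA">
            <name>United States</name>
        </broader>
    </subject>
</contentMeta>
"""


def locations_shown(content_meta):
    """Return the geographical subjects with their broader entities."""
    places = []
    for subject in content_meta.findall("nar:subject", NS):
        if subject.get("type") != "cpnat:geoArea":
            continue
        places.append({
            "name": subject.findtext("nar:name", namespaces=NS),
            "qcode": subject.get("qcode"),
            "broader": [
                b.findtext("nar:name", namespaces=NS)
                for b in subject.findall("nar:broader", NS)
            ],
        })
    return places


places = locations_shown(ET.fromstring(CONTENT_META))
print(places)
```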

Keywords

QCodes and relationship properties are powerful tools, but keywords are still widely used by picture archives. The NewsML-G2 <keyword> property is mapped from the "Keywords" field in XMP. The semantics of "keyword" can vary from provider to provider, but should not present problems in the news industry, which is familiar enough with their use:

<keyword>business</keyword>
<keyword>economic</keyword>
<keyword>economy</keyword>
<keyword>finance</keyword>
<keyword>poor</keyword>
<keyword>poverty</keyword>
<keyword>gamble</keyword>

IPTC Core Schema equivalent: Keywords

Headline, Description

These two IPTC/IIM fields map directly to elements of the same name in NewsML-G2. Both <headline> and <description> also have an optional @role. The IPTC maintains a set of NewsCodes for Description Role (recommended scheme alias "drol"). In this case, as the description is of a photograph, the role will be "caption". Description is a Block type element, meaning it may contain line breaks.

Both elements have optional attributes which may be used to support international use: @xml:lang, @dir (text direction):

<headline>Variety Of Recessionary Forces Leave Las Vegas
Economy Scarred</headline>
<description role="drol:caption">A general view of part of downtown,
    including Las Vegas Boulevard, on October 20, 2010 in Las Vegas,
    Nevada. Nevada once had among the lowest unemployment rates in the
    United States at 3.8 percent but has since fallen on difficult times.
    Las Vegas, the gaming capital of America, has been especially hard
    hit with unemployment currently at 14.7 percent and the highest
    foreclosure rate in the nation. Among the sparkling hotels and
    casinos downtown are dozens of dormant construction projects and
    hotels offering rock bottom rates. As the rest of the country slowly
    begins to see some economic progress, Las Vegas is still in the midst
    of the economic downturn. (Photo by Spencer Platt/Getty Images)
</description>

IPTC Core Schema equivalent: Headline, Description

9.6.3. Completed <contentMeta>

<contentMeta>
    <contentCreated>2010-10-20T19:45:58-04:00</contentCreated>
    <creator role="crol:photographer">
        <name>Spencer Platt</name>
        <related rel="crel:isA" qcode="gyibt:staff" />
    </creator>
    <contributor role="ctrol:descrWriter">
        <name>sp/lrc</name>
    </contributor>
    <creditline>Getty Images</creditline>
    <subject type="cpnat:event" qcode="gyimeid:104530187" />
    <subject type="cpnat:abstract" qcode="medtop:20000523">
        <name xml:lang="en-GB">labour market</name>
        <name xml:lang="de">Arbeitsmarkt</name>
    </subject>
    <subject type="cpnat:abstract" qcode="medtop:20000533">
        <name xml:lang="en-GB">unemployment</name>
        <name xml:lang="de">Arbeitslosigkeit</name>
    </subject>
    <subject type="cpnat:geoArea">
        <name>Las Vegas Boulevard</name>
    </subject>
    <subject type="cpnat:geoArea" qcode="gycon:89109">
        <name>Las Vegas</name>
        <broader qcode="iso3166-1a2:US-NV">
            <name>Nevada</name>
        </broader>
        <broader qcode="iso3166-1a3:USA">
            <name>United States</name>
        </broader>
    </subject>
    <keyword>business</keyword>
    <keyword>economic</keyword>
    <keyword>economy</keyword>
    <keyword>finance</keyword>
    <keyword>poor</keyword>
    <keyword>poverty</keyword>
    <keyword>gamble</keyword>
    <headline>Variety Of Recessionary Forces Leave Las Vegas Economy Scarred</headline>
    <description role="drol:caption">A general view of part of downtown,
        including Las Vegas Boulevard, on October 20, 2010 in Las Vegas,
        Nevada. Nevada once had among the lowest unemployment rates in the
        United States at 3.8 percent but has since fallen on difficult times.
        Las Vegas, the gaming capital of America, has been especially hard
        hit with unemployment currently at 14.7 percent and the highest
        foreclosure rate in the nation. Among the sparkling hotels and
        casinos downtown are dozens of dormant construction projects and
        hotels offering rock bottom rates. As the rest of the country slowly
        begins to see some economic progress, Las Vegas is still in the midst
        of the economic downturn. (Photo by Spencer Platt/Getty Images)
    </description>
</contentMeta>

9.7. Picture data

Binary content is conveyed within the NewsML-G2 <contentSet> wrapper by one or more <remoteContent> elements, enabling multiple alternate renditions of a picture within the same Item.

9.7.1. Remote Content

The <remoteContent> element references objects that exist independently of the current NewsML-G2 Item. In the example there is an instance of <remoteContent> for each of the three separate binary renditions of the picture.

Remote Content Renditions

Figure: Each <remoteContent> wrapper references a separate rendition of the binary resource

Each remote content instance contains attributes that conceptually can be split into three groups:

  • Target resource attributes enable the receiver to accurately identify the remote resource, its content type and size;

  • Content attributes enable the processor to distinguish the different business purposes of the content using @rendition;

  • Content Characteristics contain technical metadata such as dimensions, colour values and resolution.

Frequently used attributes from these groups are described below, but note that the NewsML-G2 XML structure that delimits the groups may not be visible in all XML editors. For details of these attribute groups, see the NewsML-G2 Specification, which can be downloaded from www.newsml-g2.org/spec.

9.7.2. Target Resource Attributes

This group of attributes express administrative metadata, such as identification and versioning, for the referenced content, which could be a file on a mounted file system, a Web resource, or an object within a CMS. NewsML-G2 flexibly supports alternative methods of identifying and locating the externally-stored content. For this example, the picture renditions are located in the same folder as the NewsML-G2 document.

The two attributes of <remoteContent> available to identify and locate the content are Hyperlink (@href) and Resource Identifier Reference (@residref). Either one MUST be used to identify and locate the target resource. They MAY optionally be used together. Their intended use is:

  • @href locates any resource, using an IRI.

  • @residref identifies a managed resource, using an identifier that may be globally unique.

Hyperlink (@href)

An IRI, for example:

<remoteContent href="http://example.com/2008-12-20/pictures/foo.jpg"

Or (amongst other possibilities):

<remoteContent href="file:///./GYI0062134533-web.jpg"
Resource Identifier Reference (@residref)

An XML Schema string, such as:

<remoteContent residref="tag:example.com,2008:PIX:FOO20081220098658"

It is up to the provider to specify how @residref may be resolved to retrieve the actual content.
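
The choice between the two attributes can be sketched from the receiver's side. The following is a minimal Python illustration, not part of the NewsML-G2 Standard: the function name target_reference is hypothetical, and preferring @href over @residref when both are present is an assumption made for this example only. Resolving a @residref remains provider-specific and is not shown.

```python
import xml.etree.ElementTree as ET


def target_reference(remote_content):
    """Return (kind, value) describing how a <remoteContent> points at its target.

    Assumption for this sketch: @href is preferred because it can be
    dereferenced directly; @residref needs a provider-specific resolver.
    """
    href = remote_content.get("href")
    residref = remote_content.get("residref")
    if href:
        return ("href", href.strip())
    if residref:
        return ("residref", residref.strip())
    # Either @href or @residref MUST be present
    raise ValueError("remoteContent needs @href or @residref")


by_href = ET.fromstring(
    '<remoteContent href="http://example.com/2008-12-20/pictures/foo.jpg"/>'
)
by_id = ET.fromstring(
    '<remoteContent residref="tag:example.com,2008:PIX:FOO20081220098658"/>'
)
print(target_reference(by_href))
print(target_reference(by_id))
```
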

Version

An XML Schema positive integer denoting the version of the target resource. In the absence of this attribute, recipients should assume that the target is the latest available version:

<remoteContent href="file:///./GYI0062134533-web.jpg"
    version="1"
Content Type

The Media Type of the target resource:

contenttype="image/jpeg"
Size

Indicates the size of the target resource in bytes.

size="346071"
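
As an illustration of how a receiver might read the target resource attributes described above, the sketch below converts them into typed values. The class and function names (TargetResource, read_target) are hypothetical; the attribute values are those of the high resolution rendition in the example. A missing @version is kept as None, which the caller would interpret as "latest available version".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class TargetResource:
    href: Optional[str]
    residref: Optional[str]
    version: Optional[int]      # None: assume the latest available version
    contenttype: Optional[str]
    size: Optional[int]         # size of the target resource in bytes


def read_target(rc):
    """Collect the target resource attributes of one <remoteContent> element."""
    version = rc.get("version")
    size = rc.get("size")
    return TargetResource(
        href=rc.get("href"),
        residref=rc.get("residref"),
        version=int(version) if version is not None else None,
        contenttype=rc.get("contenttype"),
        size=int(size) if size is not None else None,
    )


high_res = ET.fromstring(
    '<remoteContent href="./GYI0062134533.jpg" version="1" '
    'contenttype="image/jpeg" size="346071"/>'
)
target = read_target(high_res)
print(target)
```

A receiver could, for instance, compare target.size with the number of bytes actually retrieved before accepting the file.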

9.7.3. News Content Attributes

This group of attributes of <remoteContent> enables a processor or human-reader to distinguish between different components; in this case the alternative resolutions of the picture. The principal attribute of this group is @rendition, described below.

Rendition

The rendition attribute MUST use a QCode, either proprietary or using the IPTC NewsCodes for rendition, which has a Scheme URI of http://cv.iptc.org/newscodes/rendition/ and recommended Scheme Alias of "rnd" and contains (amongst others) the values that we need: highRes, web, thumbnail. Thus using the appropriate NewsCode, the high resolution rendition of the picture may be identified as:

<remoteContent rendition="rnd:highRes"

To avoid processing ambiguity, each specific rendition value should be used only once per News Item, except when the same rendition is available from multiple remote locations. In this case, the same value of rendition may be given to several Remote Content elements.
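
The rule above implies that a receiver may find several locations for one rendition value. A possible way to group them is sketched below; the function name and the mirror URL are hypothetical and serve only to show a rendition that is available from two locations.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

# Reduced sample; mirror.example.com is an invented second location
SAMPLE = """
<contentSet xmlns="http://iptc.org/std/nar/2006-10-01/">
  <remoteContent rendition="rnd:highRes" href="./GYI0062134533.jpg"/>
  <remoteContent rendition="rnd:highRes"
      href="http://mirror.example.com/GYI0062134533.jpg"/>
  <remoteContent rendition="rnd:web" href="file:///./GYI0062134533-web.jpg"/>
</contentSet>
"""


def locations_by_rendition(content_set):
    """Map each @rendition QCode to the list of @href values that carry it."""
    grouped = defaultdict(list)
    for rc in content_set.iter(NAR + "remoteContent"):
        grouped[rc.get("rendition")].append(rc.get("href"))
    return dict(grouped)


grouped = locations_by_rendition(ET.fromstring(SAMPLE))
for rendition, hrefs in grouped.items():
    print(rendition, hrefs)
```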

9.7.4. News Content Characteristics

This group of attributes describes the physical properties of the referenced object that are specific to its media type. Text, for example, may use @wordcount. Audio and video are provided with attributes appropriate to streamed media, such as @audiobitrate and @videoframerate. The appropriate attributes for pictures are described below.

Picture Width and Picture Height

The dimension attributes @width and @height are optionally qualified by @dimensionunit, which specifies the units being used. Its value is a QCode, and it is recommended that the value is taken from the IPTC Dimension Unit NewsCodes, whose URI is http://cv.iptc.org/newscodes/dimensionunit/ (recommended Scheme Alias is "dimensionunit").

If @dimensionunit is absent, the default units for each content type are:

Content Type                Height Unit (default)   Width Unit (default)
Picture                     pixels                  pixels
Graphic: Still / Animated   points                  points
Video (Analog)              lines                   pixels
Video (Digital)             pixels                  pixels

As the dimensions of the example picture are expressed in pixels, @dimensionunit is not needed:

width="1500"
height="1001"
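
The default-unit rules can be expressed as a small lookup. This is only a sketch: the content-kind keys ("picture", "video-analog" and so on) are labels invented for the example, and the QCode "dimensionunit:points" assumes that the code name in the vocabulary matches the unit name used in the table.

```python
# (height unit, width unit) applied when @dimensionunit is absent
DEFAULT_UNITS = {
    "picture": ("pixels", "pixels"),
    "graphic": ("points", "points"),
    "video-analog": ("lines", "pixels"),
    "video-digital": ("pixels", "pixels"),
}


def dimension_units(content_kind, dimensionunit=None):
    """Return (height_unit, width_unit) for a piece of content.

    An explicit @dimensionunit qualifies both @width and @height;
    otherwise the defaults for the content kind apply.
    """
    if dimensionunit is not None:
        return (dimensionunit, dimensionunit)
    return DEFAULT_UNITS[content_kind]


print(dimension_units("picture"))
print(dimension_units("video-analog"))
print(dimension_units("picture", "dimensionunit:points"))
```
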
Picture Orientation

This indicates that the image requires an orientation change before it can be properly viewed, using values of 1 to 8 (inclusive), where 1 (the default) is "upright": that is, the visual top of the picture is at the top, and the visual left side of the picture is on the left.

The application of these orientation values is described in detail in the News Content Characteristics section of the NewsML-G2 Specification. (This can be downloaded by visiting https://iptc.org/standards/ and following the link to NewsML-G2.)

The picture used in our example has an orientation value of 1:

width="1500"
height="1001"
orientation="1"
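
A receiver may wish to validate the orientation value before acting on it. The sketch below applies only the rules stated above (values 1 to 8, default 1); the function names are hypothetical, and the meaning of each value from 2 to 8 is deliberately left to the NewsML-G2 Specification.

```python
def read_orientation(attrs):
    """Return the @orientation value as an integer, defaulting to 1 (upright)."""
    value = int(attrs.get("orientation", "1"))
    if not 1 <= value <= 8:
        raise ValueError("orientation must be between 1 and 8, got %d" % value)
    return value


def needs_reorientation(attrs):
    """True when the image must be transformed before it can be viewed properly."""
    return read_orientation(attrs) != 1


example = {"width": "1500", "height": "1001", "orientation": "1"}
print(needs_reorientation(example))         # the example picture is upright
print(needs_reorientation({"width": "1500", "height": "1001"}))  # default is 1
```
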
Layout Orientation

It is possible to calculate the best way to use a picture in a page layout using the combined technical characteristics of Height, Width and Orientation, but many implementers are reluctant to rely on technical characteristics to make editorial judgements (determining whether a video is SD or HD is another example). The @layoutorientation attribute is a way to express editorial advice on the best way to use a picture in a layout. The value for the example picture is:

layoutorientation="loutorient:horizontal"

Values in the Layout Orientation Scheme are:

Code         Definition
horizontal   The human interpretation of the top of the image is aligned to the long side.
vertical     The human interpretation of the top of the image is aligned to the short side.
square       Both sides of the image are about identical; there is no short and long side.
unaligned    There is no human interpretation of the top of the image.
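
The preference for editorial advice over calculation might be implemented along the following lines. This is a sketch under stated assumptions: the function name is hypothetical, the 5% tolerance used to decide that the sides are "about identical" is an arbitrary choice, and the computed fallback assumes an upright picture (orientation 1). The value "unaligned" can only come from the provider, never from the calculation.

```python
def layout_orientation(attrs, tolerance=0.05):
    """Return (code, source) where source is "editorial" or "computed".

    The provider's @layoutorientation wins when present; otherwise a
    rough guess is derived from @width and @height.
    """
    explicit = attrs.get("layoutorientation")
    if explicit:
        # Keep only the code part of the QCode, e.g. "horizontal"
        return explicit.split(":", 1)[-1], "editorial"
    width, height = int(attrs["width"]), int(attrs["height"])
    if abs(width - height) <= tolerance * max(width, height):
        return "square", "computed"
    return ("horizontal" if width > height else "vertical"), "computed"


print(layout_orientation({"width": "1500", "height": "1001",
                          "layoutorientation": "loutorient:horizontal"}))
print(layout_orientation({"width": "1500", "height": "1001"}))
```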

Picture Colour Space

Indicates the colour space of the target resource; the value MUST be a QCode. The recommended scheme is the IPTC Colour Space NewsCodes (recommended scheme alias "colsp"). Note the UK English spelling of colour.

colourspace="colsp:AdobeRGB"
Colour Depth

The optional @colourdepth attribute is a non-negative integer indicating the number of bits used to define the colour of each pixel in a still image, graphic or video.

colourdepth="24"
Content Hints

At the Power conformance level, the provider is able to express metadata from the target resource as an aid to processing.

It is not mandatory for the metadata to be extracted from the target resource, but it MUST agree with any existing metadata within the target resource.

In this case, the provider has added an <altId> – an alternative identifier – for the resource.

Alternative identifiers may be needed by customer systems. The <altId> element may optionally be refined using a QCode to describe the context – in this case a "master ID" that is proprietary to the provider. This makes clear the purpose of the alternative identifier. Also note that Alternative Identifiers are useful only to other applications and are not intended to be used by the NewsML-G2 processor itself. The provider MUST tell receivers how to interpret alternative identifiers, otherwise they are meaningless.

<altId type="gyiid:masterID">105864332</altId>

Note that in this example only the high resolution rendition has an <altId>.

Signal

The signal property instructs the NewsML-G2 processor to process an Item or its content in a specific way. As a child element of itemMeta, the scope of <signal> is the whole of the document and/or its contents. If alternative renditions of content have specific processing needs, use <signal> as a child element of <remoteContent> to specify the processing instructions.

9.7.5. Completed <remoteContent> wrapper

The <remoteContent> wrapping element in full for the "High Res" picture in the example:

<remoteContent rendition="rnd:highRes"
    href="./GYI0062134533.jpg" version="1"
    size="346071" contenttype="image/jpeg" width="1500"
    height="1001" colourspace="colsp:AdobeRGB" orientation="1"
    layoutorientation="loutorient:horizontal">
    <altId type="gyiid:masterID">105864332</altId>
</remoteContent>

10. Quick Start: Video

10.1. Introduction

Now that streamed media is part of everyone’s day-to-day experience on the Web, organisations with little or no tradition of "broadcast media" production need to be able to process audio and video.

NewsML-G2 allows all media organisations, whether traditional broadcasters or not, to access and exchange audio and video in a professional workflow, by providing features and Extension Points that enable proprietary formats to be "mapped" to NewsML-G2 to achieve freedom of exchange amongst a wider circle of information partners.

This Quick Start guide is split into two parts:

  • Part I deals with a video that is available in multiple different renditions and the example focuses on expressing the technical characteristics of each rendition of the content.

  • Part II shows an example of video content that has been assembled from multiple sources, each with distinct metadata.

We recommend reading the Quick Start Guide to NewsML-G2 Basics before this Quick Start Guide.

10.2. Part I – Multiple Renditions of a Single Video

The following example is based on a sample NewsML-G2 video item from Agence France Presse (but is not a guide to processing AFP’s NewsML-G2 news services).

LISTING 4: Multiple Renditions of a Video in NewsML-G2

All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for afpdescRole and the provider-specific vidrnd.

<?xml version="1.0" encoding="utf-8"?>
<newsItem 
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/ 
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:afp.com:20140131:CNG.3424d3807bc.391@video_1359566"
    version="10" 
    standard="NewsML-G2" 
    standardversion="2.25" 
    conformance="power" 
    xml:lang="en-US">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://cv.afp.com/std/catalog/catalog.AFP-IPTC-G2_3.xml" />
    <itemMeta>
        <itemClass qcode="ninat:video" />
        <provider qcode="nprov:AFP" />
        <versionCreated>2017-10-31T11:37:23+01:00</versionCreated>
        <firstCreated>2014-01-30T13:29:38+00:00</firstCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <icon contenttype="image/jpeg" height="62"
            href="http://spar-iris-p-sco-http-int-vip.afp.com/components/9601ac3"
            rendition="rnd:thumbnail" width="110" />
        <creditline>AFP</creditline>
        <description role="afpdescRole:synthe">- Amir Hussein Abdullahian, Iranian
            foreign ministry's undersecretary for Arab and African affairs -
            Panos Moumtzis (man), UNHCR regional coordinator for Syrian refugees
        </description>
        <description role="afpdescRole:script">SHOTLIST: KUWAIT. JANUARY 30,
            2014. SOURCE: AFPTV -VAR inside the conference room -VAR of Ban
            Ki-moon -MS of King Abdullah II of Jordan -MS of Michel Sleiman,
            president of Lebanon -MS of Tunisian president Moncef Marzouki
            SOUNDBITE 1 - Amir Hussein Abdullahian (man), Iranian foreign
            ministry's undersecretary for Arab and African affairs (Farsi, 10
            sec): "Those who send arms to Syria are behind the daily killings
            there." SOUNDBITE 2 - Amir Hussein Abdullahian (man), Iranian foreign
            ministry's undersecretary for Arab and African affairs (Farsi, 9
            sec): "We regret that some countries, such as the United States, have
            created a very high level of extremism in Syria." SOUNDBITE 3 - Panos
            Moumtzis (man), UNHCR regional coordinator for Syrian refugees
            (Arabic, 12 sec): "The United Nations is providing humanitarian
            assistance to more than four million people inside Syria, two million
            of them displaced." SOUNDBITE 4 - Panos Moumtzis (man), UNHCR
            regional coordinator for Syrian refugees (Arabic, 17 sec): "The
            funding will first go to UN relief organizations, who are working
            inside Syria and in neighbouring countries. Funding will also go to
            the more than 55 NGOs in Syria with whom we cooperate and coordinate
            to deliver aid." 
        </description>
        <language tag="en" />
    </contentMeta>
    <contentSet>
        <remoteContent contenttype="video/mpeg-2" 
            href="http://components.afp.com/ab652af034e.mpg"
            rendition="vidrnd:dvd" 
            size="54593540" 
            width="720"
            height="576"
            duration="69"
            durationunit="timeunit:seconds"
            videocodec="vcdc:c015"
            videoframerate="25" 
            videodefinition="videodef:sd"
            colourindicator="colin:colour"
            videoaspectratio="4:3"
            videoscaling="sov:letterboxed" />
        <remoteContent contenttype="video/mp4-1920x1080"
            href="http://components.afp.com/3e353716caa.1920x1080.mp4"
            rendition="vidrnd:HD1080"
            size="87591736" 
            width="1920" height="1080" 
            duration="69" 
            durationunit="timeunit:seconds"
            videocodec="vcdc:c041"
            videoframerate="25" 
            videodefinition="videodef:hd"
            colourindicator="colin:colour"
            videoaspectratio="16:9"
            videoscaling="sov:unscaled" />
        <remoteContent contenttype="video/mp4-1280x720"
            href="http://components.afp.com/5ba0d14a64f.1280x720.mp4"
            rendition="vidrnd:HD720" 
            size="71010540" 
            width="1280" height="720"
            duration="69" 
            durationunit="timeunit:seconds"
            videocodec="vcdc:c041"
            videoframerate="25" 
            videodefinition="videodef:hd"
            colourindicator="colin:colour"
            videoaspectratio="16:9"
            videoscaling="sov:unscaled" />
    </contentSet>
</newsItem>

10.2.1. Document structure

The building blocks of the NewsML-G2 Item are the <newsItem> root element, with additional wrapping elements for metadata about the News Item (itemMeta), metadata about the content (contentMeta) and the content itself (contentSet).

The root <newsItem> attributes are:

<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:afp.com:20140131:CNG.3424d3807bc.391@video_1359566"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-US">

This is followed by Catalog references:

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml"
/>
<catalogRef
    href="http://cv.afp.com/std/catalog/catalog.AFP-IPTC-G2_3.xml" />

10.2.2. Item Metadata <itemMeta>

The <itemClass> property uses a QCode from the IPTC News Item Nature NewsCodes to denote that the Item conveys a video. Note that <provider> uses the recommended IPTC Provider NewsCodes, a controlled vocabulary of providers registered with the IPTC, recommended scheme alias "nprov":

<itemMeta>
    <itemClass qcode="ninat:video" />
    <provider qcode="nprov:AFP" />
    <versionCreated>2017-10-31T11:37:23+01:00</versionCreated>
    <firstCreated>2014-01-30T13:29:38+00:00</firstCreated>
    <pubStatus qcode="stat:usable" />
</itemMeta>

10.2.3. Content Metadata <contentMeta>

The <icon> element tells receivers how to retrieve an image to use as an iconic image for the content, for example a still image extracted from the video. It’s possible to have multiple icons to suit different applications, each qualified by @rendition.

Two <description> elements are qualified by @role: first a summary, second a more detailed shotlist:

<contentMeta>
    <icon contenttype="image/jpeg" height="62"
    href="http://spar-iris-p-sco-http-int-vip.afp.com/components/9601ac3"
    rendition="rnd:thumbnail" width="110" />
    <creditline>AFP</creditline>
    <description role="afpdescRole:synthe">- Amir Hussein Abdullahian, Iranian
    foreign ministry's undersecretary for Arab and African affairs -
    Panos Moumtzis (man), UNHCR regional coordinator for Syrian refugees
    </description>
    <description role="afpdescRole:script">SHOTLIST: KUWAIT. JANUARY 30,
        2014. SOURCE: AFPTV -VAR inside the conference room -VAR of Ban
        Ki-moon -MS of King Abdullah II of Jordan -MS of Michel Sleiman,
        president of Lebanon -MS of Tunisian president Moncef Marzouki
        SOUNDBITE 1 - Amir Hussein Abdullahian (man), Iranian foreign
        ministry's undersecretary for Arab and African affairs (Farsi, 10
        sec): "Those who send arms to Syria are behind the daily killings
        there." SOUNDBITE 2 - Amir Hussein Abdullahian (man), Iranian foreign
        ministry's undersecretary for Arab and African affairs (Farsi, 9
        sec): "We regret that some countries, such as the United States, have
        created a very high level of extremism in Syria." SOUNDBITE 3 - Panos
        Moumtzis (man), UNHCR regional coordinator for Syrian refugees
        (Arabic, 12 sec): "The United Nations is providing humanitarian
        assistance to more than four million people inside Syria, two million
        of them displaced." SOUNDBITE 4 - Panos Moumtzis (man), UNHCR
        regional coordinator for Syrian refugees (Arabic, 17 sec): "The
        funding will first go to UN relief organizations, who are working
        inside Syria and in neighbouring countries."
    </description>
    <language tag="en" />
</contentMeta>

10.2.4. Video Content

Video is conveyed within the NewsML-G2 <contentSet> using the <remoteContent> element; where there are multiple alternate renditions of the SAME content, <remoteContent> can be repeated for each rendition within the same Item.

The <remoteContent> element references binary objects that exist independently of the current NewsML-G2 document. In this example there is an instance of <remoteContent> for each of three renditions of the video.

Each remote content instance contains attributes that can conceptually be split into three groups:

  • Target resource attributes enable the receiver to accurately identify the remote resource, its content type and size;

  • Content attributes enable the processor to distinguish the different business purposes of the content using @rendition;

  • Content Characteristics contain technical metadata such as dimensions, duration and format.

Frequently used attributes from these groups are described below, but note that the NewsML-G2 XML structure that delimits the groups may not be visible in all XML editors. For a detailed description of these attribute groups, see the NewsML-G2 Specification. (This can be downloaded by visiting www.newsml-g2.org/spec and following the link to NewsML-G2.)

10.2.5. Target Resource Attributes

This group of attributes express administrative metadata, such as identification and versioning, for the referenced content, which could be a file on a mounted file system, a Web resource, or an object within a content management system. NewsML-G2 flexibly supports alternative methods of identifying and locating the externally-stored content.

The two attributes of <remoteContent> that identify and optionally locate the content are Hyperlink (@href) and Resource Identifier Reference (@residref). Either one MUST be used to identify the target resource. They MAY optionally be used together.

Although @href and @residref are superficially similar, their intended use is:

  • @href locates any resource, using an IRI.

  • @residref identifies a managed resource, using an identifier that may be globally unique. It is up to the provider to specify how @residref may be resolved to retrieve the actual content.

Hyperlink (@href)

An IRI, for example:

<remoteContent href="http://components.afp.com/ab652af034e5f7acc131f8f122b274a5ef8ee37e.mpg"
Resource Identifier Reference (@residref)

An XML Schema string, for example:

<remoteContent residref="tag:example.com,2008:PIX:FOO20081220098658"
Version

An XML Schema positive integer denoting the version of the target resource. In the absence of this attribute, recipients should assume that the target is the latest available version:

version="10"
Content Type

The Media Type of the target resource:

contenttype="video/3gpp"
Format

A refinement of a Content Type using a value from a controlled vocabulary:

format="fmt:video"
Content Type Variant (@contenttypevariant)

A refinement of a Content Type using a string:

contenttype="video/3gpp"
contenttypevariant="MPEG-4 Simple Profile"
Size

Indicates the size of the target resource in bytes.

size="54593540"

10.2.6. News Content Attributes

This group of attributes of <remoteContent> enables a processor or human operator to distinguish between different components; in this case the alternative resolutions of the video.

Rendition

The rendition attribute MUST use a QCode. Providers may have their own schemes, or use the IPTC NewsCodes for rendition, which has a Scheme URI of http://cv.iptc.org/newscodes/rendition/ and recommended Scheme Alias of "rnd". This example uses a provider-specific scheme with a Scheme Alias of "vidrnd":

<remoteContent rendition="vidrnd:dvd"

To avoid processing ambiguity, each specific rendition value should be used only once per News Item, except when the same rendition is available from multiple remote locations. In this case, the same value of rendition may be given to several Remote Content elements.
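
Because @rendition is a QCode, a receiver resolves its Scheme Alias through the catalogs referenced by the Item. The sketch below shows the idea with an in-memory catalog: the "rnd" entry uses the Scheme URI given above, while the URI for "vidrnd" is invented for the example (the real one would come from the provider's catalog). It assumes the usual QCode convention that the concept URI is the Scheme URI followed by the code.

```python
CATALOG = {
    "rnd": "http://cv.iptc.org/newscodes/rendition/",
    # Hypothetical Scheme URI for the provider-specific alias
    "vidrnd": "http://cv.example.com/vidrnd/",
}


def expand_qcode(qcode, catalog):
    """Expand "alias:code" into a full concept URI using the catalog."""
    alias, separator, code = qcode.partition(":")
    if not separator:
        raise ValueError("not a QCode: %r" % qcode)
    if alias not in catalog:
        raise KeyError("Scheme Alias %r is not in the catalog" % alias)
    return catalog[alias] + code


print(expand_qcode("rnd:highRes", CATALOG))
print(expand_qcode("vidrnd:dvd", CATALOG))
```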

10.2.7. News Content Characteristics

This third group of attributes of <remoteContent> enables further efficiencies in processing by describing the physical characteristics of the referenced object that are specific to its media type. Text, for example, may use @wordcount; audio and video are provided with attributes appropriate to streamed media, such as @audiobitrate and @videoframerate. The appropriate attributes for video are described below.

Duration (@duration and @durationunit)

Indicates the duration of the content in seconds by default, but can be expressed by some other measure of temporal reference (e.g. frames) when using the optional @durationunit. From NewsML-G2 2.14, the data-type of @duration is a string; earlier versions use non-negative integer. The reason for the change is that video duration is often expressed using non-integer values.

For example, expressing duration as an SMPTE time code requires the following NewsML-G2:

duration="00:06:32:12" durationunit="timeunit:timeCode"

The recommended CV for @durationunit is the IPTC Time Unit NewsCodes whose URI is http://cv.iptc.org/newscodes/timeunit/. The recommended alias for the scheme is "timeunit".
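
Since @duration is a string whose meaning depends on @durationunit, a receiver needs to interpret the pair together. The following sketch handles only the two units shown in this chapter (seconds and timeCode). The function name is hypothetical, and the time code conversion assumes a non-drop-frame time code in the form hours:minutes:seconds:frames, with the frame rate supplied by the caller (for example from @videoframerate).

```python
def duration_in_seconds(duration, durationunit="timeunit:seconds", fps=None):
    """Convert @duration to seconds for the units covered by this sketch."""
    unit = durationunit.split(":", 1)[-1]
    if unit == "seconds":
        return float(duration)
    if unit == "timeCode":
        if fps is None:
            raise ValueError("a frame rate is needed to interpret a time code")
        hours, minutes, seconds, frames = (int(p) for p in duration.split(":"))
        return hours * 3600 + minutes * 60 + seconds + frames / fps
    raise ValueError("unit not handled in this sketch: %s" % durationunit)


print(duration_in_seconds("69"))
print(duration_in_seconds("00:06:32:12", "timeunit:timeCode", fps=25))
```

With a frame rate of 25 fps, the time code 00:06:32:12 corresponds to 392 seconds plus 12 frames.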

Video Codec (@videocodec)

A QCode value indicating the encoding of the video – for example one of the encodings used in this example is MPEG-2 Video Simple Profile. This is indicated by the IPTC Video Codec NewsCodes with a recommended Scheme Alias "vcdc", and the corresponding code is "c015".

videocodec="vcdc:c015"
Video Frame Rate (@videoframerate)

A decimal value indicating the rate, in frames per second [fps] at which the video should be played out to achieve the correct visual effect. Common values (in fps) are 25, 50, 60 and 29.97 (drop-frame rate):

videoframerate="25"
Video Aspect Ratio (@videoaspectratio)

A string value, for example 4:3 or 16:9:

videoaspectratio="4:3"
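
One possible use of @videoaspectratio is to compare it with the stored frame dimensions. The sketch below, whose function names are hypothetical, derives the pixel aspect ratio as the display aspect ratio divided by the storage ratio (width divided by height). For the 720 x 576 rendition declared as 4:3, the pixels are not square; for the 1920 x 1080 rendition declared as 16:9, they are.

```python
from fractions import Fraction


def parse_aspect_ratio(value):
    """Turn a string such as "4:3" into an exact fraction."""
    width_part, height_part = value.split(":")
    return Fraction(int(width_part), int(height_part))


def pixel_aspect_ratio(width, height, videoaspectratio):
    """Display aspect ratio divided by storage aspect ratio."""
    return parse_aspect_ratio(videoaspectratio) / Fraction(width, height)


print(pixel_aspect_ratio(720, 576, "4:3"))      # non-square pixels
print(pixel_aspect_ratio(1920, 1080, "16:9"))   # square pixels
```
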
Video Scaling (@videoscaling)

The @videoscaling attribute describes how the aspect ratio of a video has been changed from the original in order to accommodate a different display dimension:

videoscaling="sov:letterboxed"

The value of the property is a QCode; the recommended CV is the IPTC Video Scaling NewsCodes (Scheme URI: http://cv.iptc.org/newscodes/videoscaling/).

The recommended Scheme Alias is "sov", and the codes and their definitions are as follows:

Code          Definition
unscaled      no scaling applied
mixed         two or more different aspect ratios are used in the video over the timeline
pillarboxed   bars to the left and right
letterboxed   bars to the top and bottom
windowboxed   pillar boxed plus letter boxed
zoomed        scaling to avoid any borders

Video Definition (@videodefinition)

Editors may need to know whether video content is HD or SD, as this may not be obvious from the technical specification ("HD", for example, is an umbrella term covering many different sets of technical characteristics). The @videodefinition attribute carries this information:

videodefinition="videodef:sd"

The value of the property can be either "hd" or "sd", as defined by the Video Definition NewsCodes CV. The Scheme URI is http://cv.iptc.org/newscodes/videodefinition/ and the recommended scheme alias is "videodef".

Colour Indicator (@colourindicator)

Indicates whether the still or moving image is coloured or black and white (note the UK spelling of colour). The recommended vocabulary is the IPTC Colour Indicator NewsCodes (Scheme URI: http://cv.iptc.org/newscodes/colourindicator/) with a recommended Scheme Alias of "colin". The value of the property is "bw" or "colour":

colourindicator="colin:colour"

The completed Remote Content wrapper will be:

<remoteContent contenttype="video/mpeg-2"
    href="http://components.afp.com/ab652af034e5f7acc131f8f122b274a5ef8ee37e.mpg"
    rendition="vidrnd:dvd"
    size="54593540"
    width="720" height="576"
    duration="69"
    durationunit="timeunit:seconds"
    videocodec="vcdc:c015"
    videoframerate="25"
    videodefinition="videodef:sd"
    colourindicator="colin:colour"
    videoaspectratio="4:3"
    videoscaling="sov:letterboxed" />
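
With the technical characteristics available on each <remoteContent>, a receiver can choose a rendition without retrieving any binary content. The sketch below uses a reduced copy of the three renditions from Listing 4 and picks the widest one that fits a given display width. The function name and the selection policy (fall back to the narrowest rendition when nothing fits) are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

# Reduced copy of the <contentSet> in Listing 4 (only the attributes used here)
CONTENT_SET = """
<contentSet xmlns="http://iptc.org/std/nar/2006-10-01/">
  <remoteContent rendition="vidrnd:dvd" width="720" height="576"
      href="http://components.afp.com/ab652af034e.mpg"/>
  <remoteContent rendition="vidrnd:HD1080" width="1920" height="1080"
      href="http://components.afp.com/3e353716caa.1920x1080.mp4"/>
  <remoteContent rendition="vidrnd:HD720" width="1280" height="720"
      href="http://components.afp.com/5ba0d14a64f.1280x720.mp4"/>
</contentSet>
"""


def pick_rendition(content_set, max_width):
    """Return the widest rendition not exceeding max_width, else the narrowest."""
    candidates = list(content_set.iter(NAR + "remoteContent"))
    fitting = [rc for rc in candidates if int(rc.get("width")) <= max_width]
    if fitting:
        return max(fitting, key=lambda rc: int(rc.get("width")))
    return min(candidates, key=lambda rc: int(rc.get("width")))


root = ET.fromstring(CONTENT_SET)
chosen = pick_rendition(root, 1280)
print(chosen.get("rendition"), chosen.get("href"))
```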

10.2.8. Audio metadata

There are specific properties for describing the technical characteristics of audio, for example:

Audio Bit Rate (@audiobitrate)

A positive integer indicating kilobits per second (kbps):

audiobitrate="32"
Audio Sample Rate (@audiosamplerate)

A positive integer indicating the sample rate in Hertz (Hz):

audiosamplerate="44100"
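
As a simple illustration of how these values might be used, the sketch below estimates the size of an audio stream from its bit rate and duration. The function name is hypothetical, and the calculation assumes a constant bit rate and that one kilobit is 1000 bits; it ignores container overhead, so the result is only a rough guide.

```python
def estimated_audio_bytes(audiobitrate_kbps, duration_seconds):
    """Rough size in bytes of a constant-bit-rate audio stream."""
    bits = audiobitrate_kbps * 1000 * duration_seconds
    return bits // 8


# 32 kbps audio lasting 69 seconds
print(estimated_audio_bytes(32, 69))
```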

For a detailed description of all of the News Content Characteristics for Video and Audio content, see section News Content Characteristics in the NewsML-G2 Specification Document.

10.3. Part II – Multi-part video

We recommend reading the Quick Start Guide to NewsML-G2 Basics and the preceding Part I of this guide to video before reading Part II.

Audio and video, including animation, have a temporal dimension: the nature of the content is expected to change over its duration. In this example, a single piece of video has been created from a number of shots (shorter segments of content from different creators) that were combined during an editing process.

Note that this complies with the basic NewsML-G2 rule that "one piece of content = one newsItem". Although the video may be composed of material from many sources, it remains a single piece of journalistic content created by the video editor. This is analogous to a text story that is compiled by a single reporter or editor from several different reports.

NewsML-G2 supports this by enabling the expression of metadata about separate identifiable parts of content using <partMeta> in addition to metadata structures that apply to the whole content.

The example video is about a retrospective exhibition in Berlin of works by the German humorist and animator Vicco von Bülow. It consists of a number of shots, so the News Item provides a shotlist summarising the visual content of each shot, and a dopesheet giving an editorial summary of the video’s content.

The document structure and the NewsML-G2 properties included in the example have been previously described, except for the <partMeta> wrapper, which is described in detail below. A full code listing for the example is included at the end.

The example is based on a sample NewsML-G2 video item from the European Broadcasting Union (EBU). The News Item references a multi-part broadcast video and contains separate metadata for each segment of the content, including a keyframe, and additionally describes the technical characteristics of the video.

Please note that it may resemble but does NOT represent the EBU’s NewsML-G2 implementation.
LISTING 5: Multi-part Video in NewsML-G2

All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for the following: addressType, codeorigin, codesource, cptype, descrole, geo, ISOCountryCode, langusecode, prov, providercode, rolecode, servicecode, lrol and vidrnd.

<?xml version="1.0" encoding="ISO-8859-1"?>
<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    standardversion="2.25"
    guid="tag:example.com,2008:407624"
    version="10"
    standard="NewsML-G2"
    conformance="power"
    xml:lang="en">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef
        href="http://www.example.com/metadata/newsml-g2/catalog.NewsML-G2.xml" />
    <rightsInfo>
        <usageTerms>
            Access only for Eurovision Members and EVN / EVS Sub-Licensees.
            <br />
            Coverage cannot be used by a national competitor of the contributing
            broadcaster.
        </usageTerms>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="ninat:video" />
        <provider qcode="providercode:EBU">
            <name>European Broadcasting Union - EVN</name>
            <organisationDetails>
                <contactInfo>
                   <web>http://www.eurovision.net</web>
                   <phone>+41 22 717 2869</phone>
                   <email>features@eurovision.net</email>
                   <address role="AddressType:Office">
                       <line>Eurovision Sports News Exchanges</line>
                       <line>L Ancienne Route 17 A</line>
                       <line>CH-1218</line>
                       <locality>
                           <name>Grand-Saconnex</name>
                       </locality>
                       <country qcode="ISOCountryCode:ch">
                           <name xml:lang="en">Switzerland</name>
                       </country>
                   </address>
                </contactInfo>
            </organisationDetails>
        </provider>
        <versionCreated>2017-10-19T10:54:04Z</versionCreated>
        <firstCreated>2008-11-06T10:22:28Z</firstCreated>
        <pubStatus qcode="stat:usable" />
        <service qcode="servicecode:EUROVISION">
            <name>Eurovision services</name>
        </service>
        <edNote>Originally broadcast in Germany</edNote>
        <link rel="irel:associatedWith"
            href="http://www.example.com/video/407624/index.html"/>
    </itemMeta>
    <contentMeta>
        <contentCreated>2008-11-05T19:04:00-08:00</contentCreated>
        <located type="cptype:city" qcode="city:345678">
            <name>Berlin</name>
            <broader type="cptype:statprov" qcode="state:2365">
                <name>Berlin</name>
            </broader>
            <broader type="cptype:country" qcode="iso3166-1a2:DE">
                <name>Germany</name>
            </broader>
        </located>
        <creator qcode="codesource:DEZDF">
            <name>Zweites Deutsches Fernsehen</name>
            <organisationDetails>
                <location>
                   <name>MAINZ</name>
                </location>
            </organisationDetails>
        </creator>
        <contributor qcode="codeorigin:DEZDF" role="rolecode:TechnicalOrigin">
            <name>Zweites Deutsches Fernsehen</name>
        </contributor>
        <creator qcode="codesource:GBRTV">
            <name>Reuters Television Ltd</name>
        </creator>
        <language tag="en" role="langusecode:VoiceOver" >
            <name>English</name>
        </language>
        <genre qcode="genre:biog">
            <name xml:lang="en-GB">Biography</name>
            <name xml:lang="fr">biographie</name>
        </genre>
        <subject type="cpnat:abstract" qcode="medtop:01000000">
            <name xml:lang="en-GB">Arts, Culture and Entertainment</name>
            <name xml:lang="fr">Arts, culture, et spectacles</name>
            <narrower type="cpnat:abstract" qcode="medtop:20000003">
                <name xml:lang="en-GB">Animation</name>
                <name xml:lang="fr">Dessin animé</name>
            </narrower>
        </subject>
        <headline>Loriot retrospective</headline>
        <description role="descrole:dopesheet">
            Yesterday evening (November 5) an exhibition opened in Berlin in
            honour of German humorist Vicco von Bülow, better known under the
            pseudonym "Loriot", to commemorate his 85th birthday. He was born
            November 12, 1923 in Brandenburg an der Havel and comes from an old
            German aristocratic family. He is most well-known for his cartoons,
            television sketches alongside late German actress Evelyn Hammann and
            a couple of movies. Under the name "Loriot" in 1971 he created a
            cartoon dog named "Wum", which he voice acted himself. In 1976 the
            first episode of the TV series "Loriot" was produced.
            <br />
        </description>
        <description role="descrole:shotlist">
            Berlin, 05/11/2008
            <br />
            - vs. Vicco von Bülow entering exhibition
            <br />
            - vs. Loriot and media
            <br />
            - sot Vicco von Bülow
            <br />
            "Since 85 years I didn't succeed in pursuing a job that could be
            called a profession."
            <br />
            - vs exhibition
            <br />
            - sot Irm Herrmann, actress
            <br />
            "Loriot is timeless. You always can watch him and I can always
            laugh."
            <br />
            - actor Ulrich Matthes in exhibition
            <br />
            sot Ulrich Matthes, actor
            <br />
            " I would say: one of the great German classics. Goethe, Kleist,
            Schiller, Thomas Mann, Loriot. That's the way I would say it."
            <br />
        </description>
    </contentMeta>
    <partMeta partid="Part1_ID" seq="1">
        <icon href="http://www.example.com/video/Keyframes/407624.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="0" end="446" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:VoiceOver" />
        <description>Vicco von Bülow entering exhibition </description>
    </partMeta>
    <partMeta partid="Part2_ID" seq="2">
        <icon href="http://www.example.com/video/Keyframes/407624-447.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="446" end="831" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:VoiceOver" />
        <description>Loriot and media </description>
    </partMeta>
    <partMeta partid="Part3_ID" seq="3">
        <icon href="http://www.example.com/video/Keyframes/407624-832.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="831" end="1081" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:Interlocution" />
        <description>Vicco von Bülow interview</description>
    </partMeta>
    <partMeta partid="Part4_ID" seq="4">
        <icon href="http://www.example.com/video/Keyframes/407624-1082.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="1081" end="1313" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:NaturalSound" />
        <description>Exhibition panorama </description>
    </partMeta>
    <partMeta partid="Part5_ID" seq="5">
        <icon href="http://www.example.com/video/Keyframes/407624-1314.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="1313" end="1616" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:Interlocution" />
        <description>Irm Herrmann, actress, interview</description>
    </partMeta>
    <partMeta partid="Part6_ID" seq="6">
        <icon href="http://www.example.com/video/Keyframes/407624-1617.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="1616" end="2109" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:VoiceOver" />
        <description>Ulrich Matthes, actor, in exhibition</description>
    </partMeta>
    <partMeta partid="Part7_ID" seq="7">
        <icon href="http://www.example.com/video/Keyframes/407624-2110.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="2109" end="2732" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:Interlocution" />
        <description>Ulrich Matthes, actor, interview</description>
    </partMeta>
    <partMeta partid="Part8_ID" seq="8">
        <icon href="http://www.example.com/video/Keyframes/407624-2733.jpeg"/>
        <timeDelim renditionref="vidrnd:avi25" start="2732" end="2775" timeunit="timeunit:editUnit"/>
        <language tag="en" role="langusecode:VoiceOver" />
        <description>"I would say: one of the great German classics. Goethe, Kleist,
            Schiller, Thomas Mann, Loriot. That's the way I would say it."</description>
    </partMeta>
    <contentSet>
        <remoteContent href="http://www.example.com/video/407624.avi"
            rendition="vidrnd:avi25"
            format="fmt:avi"
            duration="111" durationunit="timeunit:seconds"
            videocodec="vcdc:c155"
            videoframerate="25"
            videoaspectratio="16:9" />
    </contentSet>
</newsItem>

10.3.1. Part Metadata

NewsML-G2 Items can have many <partMeta> wrappers, each expressing properties for an identifiably separate part of the content; in this example each of the shots, or segments, which make up the video. The properties for each segment include:

  • an ID for the segment, and a sequence number

  • a keyframe, or icon that may help to visually identify the content of the segment

  • the start and end positions of the segment within the content

It is also possible to assert any Administrative or Descriptive Metadata for each <partMeta> element, if required.

The id and sequence number for the shot are expressed as attributes of <partMeta> and the <partMeta> element is repeated for each video segment. Below is a complete example of a single segment:

<partMeta partid="Part1_ID" seq="1">
    <icon href="http://www.example.com/video/Keyframes/407624.jpeg"/>
    <timeDelim start="0" end="446" timeunit="timeunit:editUnit"/>
    <language tag="en" role="langusecode:VoiceOver" />
    <description>Vicco von Bülow entering exhibition</description>
</partMeta>

These elements of video <partMeta> are discussed below.

Add keyframe using <icon>

A keyframe for the video segment is expressed as the child element <icon> with @href pointing to the keyframe image as a resource on the Web:

<icon href="http://www.example.com/video/Keyframes/407624.jpeg"/>

Timing metadata

The <timeDelim> property indicates the start and end positions of this segment within the video, and the units being used to express these values, as shown for example:

<timeDelim start="0" end="446" timeunit="timeunit:editUnit"/>

This @timeunit uses a QCode to indicate that @start and @end are expressed in Edit Units, the smallest editable units of the content; in the case of video this is frames. Edit Unit is the assumed default value of @timeunit if this attribute is not present. It is one of the values of the IPTC Time Unit NewsCodes (recommended Scheme Alias "timeunit"), which is used in this example.

The values in the scheme are:

  • editUnit: the time delimiter is expressed in the smallest editable unit of the content: frames (video) or samples (audio). This requires the frame rate or sampling rate to be known, and it must be defined by the referenced rendition of the content.

  • timeCode: the format of the timestamp is hh:mm:ss:ff (ff for frames).

  • timeCodeDropFrame: the format of the timestamp is hh:mm:ss:ff (ff for frames).

  • normalPlayTime: the format of the timestamp is hh:mm:ss.sss (milliseconds).

  • seconds.

  • milliseconds.

In the example @start and @end are expressed as integers, but their datatype is XML String, because start and end can be expressed as integers, time values, or SMPTE time codes.
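
Because @start and @end are strings whose meaning depends on @timeunit, a processor has to interpret them before comparing segments. The following Python sketch shows one possible way to normalise a value to seconds. The function name to_seconds and the default frame rate are assumptions made for illustration; drop-frame timecode is deliberately left unsupported because its conversion depends on the drop-frame rules of the rendition.

```python
def to_seconds(value: str, timeunit: str, frame_rate: int = 25) -> float:
    """Convert a timeDelim @start or @end string to seconds.

    `timeunit` is a QCode such as "timeunit:editUnit"; only the code part
    after the scheme alias is inspected. `frame_rate` is assumed to be the
    frame rate of the referenced rendition.
    """
    unit = timeunit.split(":", 1)[-1]
    if unit == "editUnit":
        return int(value) / frame_rate
    if unit == "seconds":
        return float(value)
    if unit == "milliseconds":
        return int(value) / 1000
    if unit == "timeCode":
        # hh:mm:ss:ff, where ff counts frames within the second
        hours, minutes, seconds, frames = (int(p) for p in value.split(":"))
        return hours * 3600 + minutes * 60 + seconds + frames / frame_rate
    if unit == "normalPlayTime":
        # hh:mm:ss.sss
        hours, minutes, seconds = value.split(":")
        return int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    raise ValueError("unsupported time unit in this sketch: " + timeunit)
```

With these assumptions, "750" in Edit Units at 25 frames per second and "30000" in milliseconds both resolve to 30 seconds.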

The value of @start expresses the non-inclusive start of the segment of the timeline; the value of @end expresses the inclusive end of the segment of the timeline. For example, a 30 second segment at 25 frames per second may be expressed using Edit Unit as:

<timeDelim start="0" end="750" timeunit="timeunit:editUnit"/>

A following 30 second segment would start at "750" and end at "1500".

The same segment would be expressed using milliseconds as:

<timeDelim start="0" end="30000" timeunit="timeunit:milliseconds"/>

and the following 30 second segment would start at "30000" and end at "60000".

When processing a time delimiter expressed as frames, use the following example as a guide:

<timeDelim start="3" end="4" timeunit="timeunit:editUnit"/>

Q: Does it mean that two frames are included, or just one frame, and which one(s) exactly?

A: One frame, the fourth.
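
The non-inclusive start and inclusive end convention can be sketched in a few lines of Python. The helper names below are hypothetical and are not part of NewsML-G2; frames are numbered from 1, and the frame rate is assumed to be known from the referenced rendition.

```python
from fractions import Fraction
from typing import List


def frames_in_segment(start: int, end: int) -> List[int]:
    """Return the 1-based frame numbers covered by a timeDelim in Edit Units.

    @start is non-inclusive and @end is inclusive, so the covered frames
    are start+1 up to and including end.
    """
    if end < start:
        raise ValueError("end must not be smaller than start")
    return list(range(start + 1, end + 1))


def segment_seconds(start: int, end: int, frame_rate: int) -> Fraction:
    """Duration of the segment in seconds at the given frame rate."""
    return Fraction(end - start, frame_rate)


one_frame = frames_in_segment(3, 4)           # only the fourth frame
first_half_minute = segment_seconds(0, 750, 25)
second_half_minute = segment_seconds(750, 1500, 25)
```

Under this convention, consecutive segments share a boundary value (the @end of one equals the @start of the next) without any frame being counted twice.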

When specifying the start and end points of a segment of video, be aware that these are unlikely to be frame-accurate for the same segment rendered in different technical formats; if frame-rates are different, the viewer is likely to see a different result for each rendition.

It is therefore highly recommended when expressing time delimiters using frames or timecodes that @renditionref is used to specify separate time delimiters corresponding to alternative renditions of the same shot, as follows:

<partMeta....>
    <!-- 10 seconds in frames at 25 fps = 250 frames -->
    <timeDelim renditionref="vidrnd:avi25" start="0" end="250"
        timeunit="timeunit:editUnit"/>
    <!-- 10 seconds in frames at 30 fps = 300 frames -->
    <timeDelim renditionref="vidrnd:avi30" start="0" end="300"
        timeunit="timeunit:editUnit"/>
</partMeta>
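
A receiving application might select the time delimiter that matches the rendition it intends to play. The Python sketch below illustrates this with the standard library XML parser. The function name time_delim_for is hypothetical, and the sample fragment omits the NewsML-G2 namespace for brevity; a real document would declare it and the element lookups would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Simplified fragment: no namespace, two renditions of the same shot.
PART_META = """
<partMeta>
    <timeDelim renditionref="vidrnd:avi25" start="0" end="250"
        timeunit="timeunit:editUnit"/>
    <timeDelim renditionref="vidrnd:avi30" start="0" end="300"
        timeunit="timeunit:editUnit"/>
</partMeta>
"""


def time_delim_for(part_meta, rendition):
    """Return (start, end) of the timeDelim whose @renditionref matches.

    Returns None when no time delimiter refers to the given rendition.
    """
    for delim in part_meta.findall("timeDelim"):
        if delim.get("renditionref") == rendition:
            return delim.get("start"), delim.get("end")
    return None


part = ET.fromstring(PART_META)
avi30 = time_delim_for(part, "vidrnd:avi30")
```

The values are returned as strings, reflecting the XML String datatype of @start and @end.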

Each @renditionref identifies a corresponding @rendition in <remoteContent>:

<contentSet>
    <remoteContent contenttype="..."
        href="..."
        rendition="vidrnd:avi25"
        ... />
    <remoteContent contenttype="..."
        href="..."
        rendition="vidrnd:avi30"
        ... />
</contentSet>

Description and Language

The example also indicates the language being used in the shot, and the context in which it is used. In this case, @role uses a QCode from a proprietary EBU scheme to indicate that the soundtrack of the shot is a voiceover in English.

<language tag="en" role="langusecode:VoiceOver" />

Implementers may also use the IPTC Language Role NewsCodes (recommended Scheme Alias "lrol") for this purpose.

Using <description>, we can also indicate what the viewer can expect to see in this segment:

<description>Vicco von Bülow entering exhibition</description>

10.3.2. Video Content

The <contentSet> wrapper contains a single rendition of the video inside the <remoteContent> element. Note that the video frame rate is included, as this is required to calculate points in the timeline when using time delimiters based on Edit Unit:

<contentSet>
    <remoteContent href="http://www.example.com/video/407624.avi"
        format="fmt:avi"
        duration="111" durationunit="timeunit:seconds"
        videocodec="vcdc:c155"
        videoframerate="25"
        videoaspectratio="16:9" />
</contentSet>
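
As a worked check of why the frame rate matters, the duration and frame rate given here can be compared with the time delimiters of the eight segments in the listing. The Python sketch below is illustrative only; the variable names are assumptions, and the segment values are copied from the <timeDelim> elements of the example.

```python
DURATION_SECONDS = 111   # @duration with durationunit="timeunit:seconds"
FRAME_RATE = 25          # @videoframerate

# (start, end) pairs in Edit Units, one per <partMeta> in the listing
SEGMENTS = [
    (0, 446), (446, 831), (831, 1081), (1081, 1313),
    (1313, 1616), (1616, 2109), (2109, 2732), (2732, 2775),
]

total_frames = DURATION_SECONDS * FRAME_RATE

# Each segment should start where the previous one ended.
contiguous = all(
    prev_end == next_start
    for (_, prev_end), (next_start, _) in zip(SEGMENTS, SEGMENTS[1:])
)

# The first segment starts at 0 and the last ends on the final frame.
covers_whole_video = SEGMENTS[0][0] == 0 and SEGMENTS[-1][1] == total_frames
```

At 25 frames per second, 111 seconds corresponds to 2775 frames, which is the @end value of the final segment.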

10.4. Further Resources

The IPTC Video Metadata Hub Recommendation (VMHub) was launched in October 2016 as a comprehensive solution to the exchange of video metadata between multiple existing standards. Visit the IPTC website www.iptc.org and follow the links to Video Metadata Hub.

11. Quick Start - Packages

We recommend reading the Quick Start Guide to NewsML-G2 Basics
before this Quick Start Guide to News Packages.

11.1. Introduction

The ability to package together items of news content is important to news organisations and customers. Using packages, different facets of the coverage of a news story can be viewed in a named relationship, such as "Main Article", "Sidebar", "Background". Another frequent application of packages is to aggregate content for news products, for example "Top Ten" news packages such as that illustrated below.

A description of how to create this type of package with ordered components can be found further on in this document.

Figure: A Top Ten News Package displayed on the Web

Packages can range from simple collections on a common theme, to rich hierarchical structures.

NewsML-G2 is flexible in allowing a provider to package content that has already been published, or a package may be sent together with all of its content resources in a single News Message. See the Guidelines section on News Messages.

The NewsML-G2 <link> property is a useful way to indicate optional supplementary resources that may be retrieved by the end-user when processing or consuming a NewsML-G2 Item. Links should not be used as a lightweight method of packaging news; a NewsML-G2 processor would not be able to distinguish between News Items with some optional resources, and News Items that are intended to be pseudo-packages using links. It is also a basic NewsML-G2 rule that a News Item only conveys one piece of content.

By contrast, Packages:

  • Express structure, allowing news to be packaged as a list, or as a named hierarchy of content resources.

  • Have a mode property that enables the expression of a relationship between the components of a package group.

11.3. Package Structure

A simple Package has a structure as shown in the example below. The top level for content of a Package Item is one and only one <groupSet> element, which contains at least one <group> structure holding one or more <itemRef> references to content. The <group> structure may be repeated, but this example has only one. The diagram below shows a skeleton of the XML elements in a simple package and a visualisation of the relationship that this structure creates:

Simple package structure
Figure: Top-level element view of a simple package, and (right) a visualisation of the structure

LISTING 6: Simple NewsML-G2 Package

The following NewsML-G2 document illustrates the package structure above.

(All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: staffjobs, mystaff, svc, group.)

<?xml version="1.0" encoding="UTF-8"?>
<packageItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    guid="tag:example.com,2008:UK-NEWS-TOPTEN:UK20081220098658" version="10">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef
        href="http://www.example.com/customer/cv/catalog4customers-1.xml" />
    <itemMeta>
        <itemClass qcode="ninat:composite" />
        <provider qcode="nprov:AcmeNews" />
        <versionCreated>2017-11-17T12:30:00Z</versionCreated>
        <firstCreated>2008-12-20T12:25:35Z</firstCreated>
        <pubStatus qcode="stat:usable" />
        <profile versioninfo="1.0.0.2">simple_text_with_picture.xsl</profile>
        <service qcode="svc:uktop">
            <name>Top UK News stories hourly</name>
        </service>
        <title>UK-TOPNEWS</title>
        <edNote>Updates the previous version</edNote>
        <signal qcode="sig:update" />
    </itemMeta>
    <contentMeta>
        <contributor jobtitle="staffjobs:cpe" qcode="mystaff:MDancer">
            <name>Maurice Dancer</name>
            <name>Chief Packaging Editor</name>
            <definition validto="2017-11-17T17:30:00Z">
                Duty Packaging Editor
            </definition>
            <note validto="2017-11-17T17:30:00Z">
                Available on +44 207 345 4567 until 17:30 GMT today
            </note>
        </contributor>
        <headline xml:lang="en">UK</headline>
    </contentMeta>
    <groupSet root="G1">
        <group id="G1" role="group:main">
            <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A"
                contenttype="application/vnd.iptc.g2.newsitem+xml"
                size="2345">
                <itemClass qcode="ninat:text" />
                <provider qcode="nprov:AcmeNews"/>
                <pubStatus qcode="stat:usable"/>
                <title>Obama annonce son équipe</title>
                <description role="drol:summary">Le rachat il y a deux ans de la
                   propriété par Alan Gerry, magnat local de la télévision câblée, a
                   permis l'investissement des 100 millions de dollars qui étaient
                   nécessaires pour le musée et ses annexes, et vise à favoriser le
                   développement touristique d'une région frappée par le chômage.
                </description>
            </itemRef>
            <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-B"
                contenttype="application/vnd.iptc.g2.newsitem+xml"
                size="300039">
                <itemClass qcode="ninat:picture" />
                <provider qcode="nprov:AcmeNews"/>
                <pubStatus qcode="stat:usable"/>
                <title>Barack Obama arrive à Washington</title>
                <description role="drol:caption">Si nous avons aujourd'hui un
                   afro-américain et une femme dans la course à la présidence.
                </description>
            </itemRef>
        </group>
    </groupSet>
</packageItem>

11.4. Document structure

The building blocks of the Package Item are the <packageItem> root element, with additional wrapping elements for metadata about the Package (itemMeta), metadata about the content (contentMeta) and the package content (groupSet). The top level (root) element <packageItem> attributes are:

<packageItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="tag:example.com,2008:UK-NEWS-TOPTEN:UK20081220098658"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power">

This is followed by Catalog information:

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
<catalogRef
    href="http://www.example.com/customer/cv/catalog4customers-1.xml" />

11.5. Item Metadata

The <itemMeta> wrapper contains properties that are aids to processing the package contents.

11.5.1. Profile

The <profile> element allows a provider to name a pre-arranged template or transformation stylesheet that can be used to process the package. For example, "text and picture" could be the name of a template, whereas "textpicture.xsl" would be an XSL stylesheet. The @versioninfo of a <profile> enables the template or stylesheet to be versioned:

<profile versioninfo="1.0.0.2">simple_text_with_picture.xsl</profile>

11.5.2. Item Metadata in full

<itemMeta>
    <itemClass qcode="ninat:composite" />
    <provider qcode="nprov:AcmeNews" />
    <versionCreated>2017-11-17T12:30:00Z</versionCreated>
    <firstCreated>2008-12-20T12:25:35Z</firstCreated>
    <pubStatus qcode="stat:usable" />
    <profile versioninfo="1.0.0.2">simple_text_with_picture.xsl</profile>
    <service qcode="svc:uktop">
        <name>Top UK News stories hourly</name>
    </service>
    <title>UK-TOPNEWS</title>
    <edNote>Updates the previous version</edNote>
    <signal qcode="sig:update" />
</itemMeta>

11.6. Content Metadata

The <contentMeta> wrapper in this example contains extended metadata about the person who compiled the package, including hours of duty and contact telephone number.

<contentMeta>
    <contributor jobtitle="staffjobs:cpe" qcode="mystaff:MDancer">
        <name>Maurice Dancer</name>
        <name>Chief Packaging Editor</name>
        <definition validto="2017-11-17T17:30:00Z">Duty Packaging Editor</definition>
        <note validto="2017-11-17T17:30:00Z">Available on +44 207 345 4567 until 17:30 GMT today</note>
    </contributor>
    <headline xml:lang="en">UK</headline>
</contentMeta>

11.7. Group Set

The <groupSet> has a mandatory root attribute that references the primary child <group> element. The primary <group> element must identify itself using an @id that matches the @root of <groupSet>.

<groupSet root="G1">

11.7.1. Group

Although the id attribute is optional, in practice one must be provided to match the mandatory root attribute of the <groupSet>, even if there is only one <group>. If there is more than one <group> element, one and only one can be identified as the root group.

Each <group> element must also contain a role attribute to declare its role within the package structure. The role is a QCode; a Scheme of Roles would typically contain values representing "main", "sidebar" or other editorial terms that express how the content is intended to be used in the package.

<group id="G1" role="group:main">
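
The rule that @root must match the @id of exactly one <group> lends itself to a simple check on receipt of a package. The Python sketch below is one possible approach using the standard library; the function name root_group is hypothetical and the sample Group Set is reduced to the minimum needed for the illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by NewsML-G2 documents, as declared on <packageItem>
NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

SAMPLE = """
<groupSet xmlns="http://iptc.org/std/nar/2006-10-01/" root="G1">
    <group id="G1" role="group:main">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A"/>
    </group>
</groupSet>
"""


def root_group(group_set):
    """Return the <group> whose @id matches the @root of <groupSet>.

    Raises ValueError unless exactly one group matches.
    """
    root_id = group_set.get("root")
    matches = [
        group for group in group_set.findall("nar:group", NS)
        if group.get("id") == root_id
    ]
    if len(matches) != 1:
        raise ValueError("expected exactly one group with id " + repr(root_id))
    return matches[0]


primary = root_group(ET.fromstring(SAMPLE))
```

A processor could run such a check before attempting to resolve any further structure in the package.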

11.7.2. Item Reference

The <itemRef> element identifies an Item or a Web resource using @href and/or @residref. The IPTC recommends that Package Items should reference NewsML-G2 Items if they are available (typically News Items) rather than other types of resource, such as "raw" news objects. Referring to other kinds of Web-accessible resource is allowed and is a legitimate use-case; however, it has some disadvantages. Resources referred to in this way cannot be managed or versioned: if one of the resources is changed, the entire package may need to be re-compiled and sent, whereas a reference to a managed object such as a <newsItem> may refer to the latest (or a specific) version.

The example gives processing or usage hints using @contenttype and @size. It does not reference a specific version of the Items; where this is needed, @version can be used. The @contenttype uses the registered IANA Media Type for a NewsML-G2 News Item:

<itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A"
    contenttype="application/vnd.iptc.g2.newsitem+xml"
    size="2345">

The Item Reference includes properties from the referenced Item that have been extracted as an aid to processing:

    <itemClass qcode="ninat:text" />
    <provider qcode="nprov:AcmeNews"/>
    <pubStatus qcode="stat:usable"/>
    <title>Obama annonce son équipe</title>
    <description role="drol:summary">Le rachat il y a deux ans de la
        propriété par Alan Gerry, magnat local de la télévision câblée, a
        permis l'investissement des 100 millions de dollars qui étaient
        nécessaires pour le musée et ses annexes, et vise à favoriser le
        développement touristique d'une région frappée par le chômage.
    </description>
</itemRef>

11.8. Hierarchical Package Structure

Hierarchies of Groups and Item References can be created by adding multiple Groups to Packages and using <groupRef>, to reference other Groups by @idref, as illustrated by the following diagram:

parent-child package
Figure: Code outline of hierarchical package with two groups, visualising parent-child structure (right)

The code listing below shows how such a hierarchical package would be fully expressed in XML in a NewsML-G2 Group Set:

LISTING 7: Group Set example showing Hierarchical Package Structure

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for group.

<groupSet root="G1">
    <group id="G1" role="group:main">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="2345">
            <itemClass qcode="ninat:text" />
            <title>Obama annonce son équipe</title>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-B"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="300039">
            <itemClass qcode="ninat:picture" />
            <title>Barack Obama arrive à Washington</title>
        </itemRef>
        <groupRef idref="G2" />
    </group>
    <group id="G2" role="group:sidebar">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-C"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="1503">
            <itemClass qcode="ninat:text" />
            <title>Clinton reprend son rôle de chef de la santé</title>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-D"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="350280">
            <itemClass qcode="ninat:picture" />
            <title>Hillary Clinton à une rassemblement à New York</title>
        </itemRef>
    </group>
</groupSet>

In the example, the "root" group is identified as the group with id="G1". This group has a role of "main" and consists of a text story and a picture of Barack Obama. The group with id="G2" has the role of "sidebar" and contains a text and picture of Hillary Clinton. It is referenced by a <groupRef> in Group G1.
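
The parent-child relationship described above can be resolved by following each <groupRef> from the root group. The Python sketch below shows one way to build a nested structure from a Group Set; the function name resolve and the dictionary layout are assumptions for illustration, and the sample is a trimmed version of the listing with only @residref retained on each <itemRef>.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the NewsML-G2 namespace
Q = "{http://iptc.org/std/nar/2006-10-01/}"

GROUP_SET = """
<groupSet xmlns="http://iptc.org/std/nar/2006-10-01/" root="G1">
    <group id="G1" role="group:main">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A"/>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-B"/>
        <groupRef idref="G2"/>
    </group>
    <group id="G2" role="group:sidebar">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-C"/>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-D"/>
    </group>
</groupSet>
"""


def resolve(group_set):
    """Expand the root group into nested dicts, following groupRef/@idref."""
    groups = {g.get("id"): g for g in group_set.findall(Q + "group")}

    def expand(group_id, seen):
        if group_id in seen:
            raise ValueError("circular groupRef: " + group_id)
        group = groups[group_id]
        children = []
        for child in group:  # document order is preserved
            if child.tag == Q + "itemRef":
                children.append(child.get("residref"))
            elif child.tag == Q + "groupRef":
                children.append(expand(child.get("idref"), seen | {group_id}))
        return {"role": group.get("role"), "children": children}

    return expand(group_set.get("root"), frozenset())


tree = resolve(ET.fromstring(GROUP_SET))
```

The result for this sample is a "main" group with two item references followed by a nested "sidebar" group holding two further item references.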

11.9. List Type Package Structure

The @mode indicates the relationship between components of a group using one of three values from the IPTC Package Group Mode NewsCodes (recommended Scheme Alias "pgrmod"):

  • pgrmod:bag – an unordered collection of components, for example different components of a web news page with no special order, as in the example below. This is the default @mode.

  • pgrmod:seq – denotes a sequential package group set in descending order, for example a "Top Ten" list: each sub-group would provide references to a text article and a related picture.

  • pgrmod:alt – an unordered collection. Each sub-group is an alternative to its peer groups in the set, for example coverage of a news event supplied in different languages.

Alternative package components
Figure: Code skeleton view of package with alternative components, with visualisation of structure

The diagram above shows a package containing two Items in the root group, and a group reference to a "group of groups" with package mode set to "alt" indicating that the child groups contain alternative content. The example uses groups of associated video suitable for different Android device screen sizes as indicated by the @role of each group.

The code overview shows the root group referencing the two Items and the <groupRef> element referencing the group with @id "G2". Group G2 has its package mode set to "alt" and its components are references to alternate groups G3, G4 and G5, which reference videos at the required rendition for each screen type.

The right-hand image in the diagram is a visual representation of the relationship expressed through this package structure.

Note the <group> that has its Mode set to "alt" – not the "main" group but the second group with @id "G2". The components of this group are alternatives: each references a group containing the video content. The code example below shows how this relationship is fully expressed in NewsML-G2:

LISTING 8: Group Set example showing an "alt" Package Mode

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for group.

<groupSet root="G1">
    <group id="G1" role="group:main">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="2345">
            <itemClass qcode="ninat:text" />
            <title>Obama annonce son équipe</title>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-B"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="1503">
            <itemClass qcode="ninat:text" />
            <title>Clinton reprend son rôle de chef de la santé</title>
        </itemRef>
        <groupRef idref="G2" />
    </group>
    <group id="G2" role="group:video" mode="pgrmod:alt">
        <groupRef idref="G3" />
        <groupRef idref="G4" />
        <groupRef idref="G5" />
    </group>
    <group id="G3" role="group:mdpi">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-C"
            contenttype="video/mp4" width="480" height="320">
            <itemClass qcode="ninat:video" />
            <title>Barack Obama arrive à Washington</title>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-D"
            contenttype="video/mp4" width="480" height="320">
            <itemClass qcode="ninat:video" />
            <title>Hillary Clinton à une rassemblement à New York</title>
        </itemRef>
    </group>
    <group id="G4" role="group:hdpi">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-E"
            contenttype="video/mp4" width="720" height="480">
            <itemClass qcode="ninat:video" />
            <title>Barack Obama arrive à Washington</title>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-F"
            contenttype="video/mp4" width="720" height="480">
            <itemClass qcode="ninat:video" />
            <title>Hillary Clinton à une rassemblement à New York</title>
        </itemRef>
    </group>
    <group id="G5" role="group:xhdpi">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-G"
            contenttype="video/mp4" width="960" height="640">
            <itemClass qcode="ninat:video" />
            <title>Barack Obama arrive à Washington</title>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-H"
            contenttype="video/mp4" width="960" height="640">
            <itemClass qcode="ninat:video" />
            <title>Hillary Clinton à une rassemblement à New York</title>
        </itemRef>
    </group>
</groupSet>
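
A consumer of an "alt" package typically needs only one of the alternative groups. The Python sketch below shows how a client might choose the alternative whose @role matches its screen type. The function name pick_alternative is hypothetical, and the sample Group Set is a shortened form of the listing with two alternatives and one video each.

```python
import xml.etree.ElementTree as ET
from typing import List

# Clark-notation prefix for the NewsML-G2 namespace
Q = "{http://iptc.org/std/nar/2006-10-01/}"

GROUP_SET = """
<groupSet xmlns="http://iptc.org/std/nar/2006-10-01/" root="G1">
    <group id="G1" role="group:main">
        <groupRef idref="G2"/>
    </group>
    <group id="G2" role="group:video" mode="pgrmod:alt">
        <groupRef idref="G3"/>
        <groupRef idref="G4"/>
    </group>
    <group id="G3" role="group:mdpi">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-C"/>
    </group>
    <group id="G4" role="group:hdpi">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-E"/>
    </group>
</groupSet>
"""


def pick_alternative(group_set: ET.Element, alt_id: str,
                     wanted_role: str) -> List[str]:
    """Return the residrefs of the alternative group with the wanted role.

    The group identified by alt_id must have mode "pgrmod:alt"; a missing
    @mode is treated as the default "pgrmod:bag".
    """
    groups = {g.get("id"): g for g in group_set.findall(Q + "group")}
    alt_group = groups[alt_id]
    if alt_group.get("mode", "pgrmod:bag") != "pgrmod:alt":
        raise ValueError(alt_id + " is not an 'alt' group")
    for ref in alt_group.findall(Q + "groupRef"):
        candidate = groups[ref.get("idref")]
        if candidate.get("role") == wanted_role:
            return [item.get("residref")
                    for item in candidate.findall(Q + "itemRef")]
    return []


hdpi_items = pick_alternative(ET.fromstring(GROUP_SET), "G2", "group:hdpi")
```

Returning an empty list when no role matches leaves the fallback policy (for example, choosing the first alternative) to the calling application.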

11.10. A Sequential "Top Ten" Package

The screenshot at the start of this Chapter shows a "Top Ten" list of news items in order of importance. The package mode of "seq" indicates that the components are in descending order and a code skeleton and visual representation of the package structure is shown in the diagram below:

Sequential mode package
Figure: Code skeleton of a sequential mode package and (right) the resulting relationship structure

Note how the <group> sets the Mode for its components; in this case, the component group references of the "main" group are sequentially ordered. The relationship is fully expressed in XML in NewsML-G2 as shown below:

LISTING 9: Group Set example showing a "seq" Package Mode

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for group.

<groupSet root="G1">
    <group id="G1" role="group:main" mode="pgrmod:seq">
        <groupRef idref="G2" />
        <groupRef idref="G3" />
        <groupRef idref="G4" />
        <groupRef idref="G5" />
        <groupRef idref="G6" />
        <groupRef idref="G7" />
        <groupRef idref="G8" />
        <groupRef idref="G9" />
        <groupRef idref="G10" />
        <groupRef idref="G11" />
    </group>
    <group id="G2" role="group:top" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="3452">
            <itemClass qcode="ninat:text" />
            <provider qcode="nprov:AcmeNews"/>
            <pubStatus qcode="stat:usable"/>
            <title>Bank cuts interest rates to record low</title>
            <description role="drol:summary">London (Reuters) - The Bank of England cut
                interest rates by half a percentage point on Thursday to a record low of
                1.5 percent and economists expect it to cut again in February as it
                battles to prevent Britain from falling into a deep slump.
            </description>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-B"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="230003">
            <itemClass qcode="ninat:picture" />
            <provider qcode="nprov:AcmeNews"/>
            <pubStatus qcode="stat:usable"/>
            <title>BoE Rate Decision</title>
        </itemRef>
    </group>
    <group id="G3" role="group:two" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-C"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="2345">
            <itemClass qcode="ninat:text" />
            <provider qcode="nprov:AcmeNews"/>
            <pubStatus qcode="stat:usable"/>
            <title>Government denies it will print more cash</title>
            <description role="drol:summary">London (Reuters) – Chancellor
                Alistair Darling dismissed reports on Thursday that the government was about
                to boost the money supply to ease the impact of recession.
            </description>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-D"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="24065">
            <itemClass qcode="ninat:picture" />
            <provider qcode="nprov:AcmeNews"/>
            <pubStatus qcode="stat:usable"/>
            <title>Sterling notes and coin</title>
        </itemRef>
    </group>
    <group id="G4" role="group:three" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-E"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="2345">
            <itemClass qcode="ninat:text" />
            <provider qcode="nprov:AcmeNews"/>
            <pubStatus qcode="stat:usable"/>
            <title>Rugby's Mike Tindall banned for drink-driving</title>
            <description role="drol:summary">London (Reuters) - England rugby player Mike
                Tindall was banned from driving for three years and fined £500 on Thursday
                for his second drink-drive offence.
            </description>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-F"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="25346">
            <itemClass qcode="ninat:picture" />
            <provider qcode="nprov:AcmeNews"/>
            <pubStatus qcode="stat:usable"/>
            <title>Mike Tindall in rugby action for England</title>
        </itemRef>
    </group>
    <group id="G5" role="group:four" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-G"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="3654">
            <itemClass qcode="ninat:text" />
            <title>Crunch forces employees to work unpaid overtime</title>
        </itemRef>
    </group>
    <group id="G6" role="group:five" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-H"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="5123">
            <itemClass qcode="ninat:text" />
            <title>Government warns of tax fraudsters</title>
        </itemRef>
    </group>
    <group id="G7" role="group:six" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-I"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="4323">
            <itemClass qcode="ninat:text" />
            <title>Nissan to cut 1,200 jobs at Sunderland plant</title>
        </itemRef>
    </group>
    <group id="G8" role="group:seven" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-J"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="3122">
            <itemClass qcode="ninat:text" />
            <title>Sainsbury sales tops forecast</title>
        </itemRef>
    </group>
    <group id="G9" role="group:eight" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-K"
            contenttype="video/mp4-480x320"
            size="322443">
            <itemClass qcode="ninat:video" />
            <title>Cause of wind turbine damage unknown</title>
        </itemRef>
    </group>
    <group id="G10" role="group:nine" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-L"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="4123">
            <itemClass qcode="ninat:text" />
            <title>Muslims warn Gaza crisis could provoke extremism</title>
        </itemRef>
    </group>
    <group id="G11" role="group:ten" >
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-M"
            contenttype="application/vnd.iptc.g2.newsitem+xml"
            size="8192">
            <itemClass qcode="ninat:text" />
            <title>Banks hiring young Britons to prepare for upturn</title>
        </itemRef>
    </group>
</groupSet>
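
A receiver could resolve a "seq" package like the one above by starting at the group named in @root and following each <groupRef> in document order. The following Python sketch uses only the standard library. The abbreviated sample document and the function name ordered_titles are illustrative assumptions, not part of NewsML-G2, and the sample omits the NewsML-G2 namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Abbreviated, namespace-free sample modelled on Listing 9 (illustrative only).
SAMPLE = """
<groupSet root="G1">
    <group id="G1" role="group:main" mode="pgrmod:seq">
        <groupRef idref="G2" />
        <groupRef idref="G3" />
    </group>
    <group id="G2" role="group:top">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-A">
            <title>Bank cuts interest rates to record low</title>
        </itemRef>
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-B">
            <title>BoE Rate Decision</title>
        </itemRef>
    </group>
    <group id="G3" role="group:two">
        <itemRef residref="urn:newsml:iptc.org:20081007:tutorial-item-C">
            <title>Government denies it will print more cash</title>
        </itemRef>
    </group>
</groupSet>
"""


def ordered_titles(group_set):
    """Return (group role, [item titles]) pairs in the order set by the root group."""
    groups = {g.get("id"): g for g in group_set.findall("group")}
    root = groups[group_set.get("root")]
    result = []
    # In "seq" mode the document order of the <groupRef> elements is significant.
    for ref in root.findall("groupRef"):
        group = groups[ref.get("idref")]
        titles = [item.findtext("title") for item in group.findall("itemRef")]
        result.append((group.get("role"), titles))
    return result


package = ordered_titles(ET.fromstring(SAMPLE))
for role, titles in package:
    print(role, titles)
```

Indexing the groups by @id first keeps the lookup independent of where each <group> appears in the document; only the order of the <groupRef> elements in the root group determines the sequence.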

11.11. Package Processing Considerations

11.11.1. Other NewsML-G2 Items

In the above examples, the referenced resources in the package have been News Items, but <itemRef> may also refer to other Items, such as Package Items. The following example of <itemRef> shows how a Package Item can be used as part of another Package Item. This type of "Super Package" could be used to send a "Top Ten" package (a themed list of news) where each referenced item is also a package consisting of references to the text, picture and video coverage of each news story.

The advantage of using this "package of packages" approach is that it promotes more efficient re-use of content. Once created, any of the "sub-packages" can be easily referenced by more than one "super-package": a package about a given story could be used by both "Top News This Hour" and by "Today’s Top News". If the individual News Items that make up a sub-package were referenced directly, these references would have to be assembled each time the story is used, either by software or by a journalist, which would be less efficient.

As these sub-packages are managed objects, we use @residref to identify and locate the referenced items. Each referenced item may be a Package Item, shown by the Item Class of "composite" and the Content Type of "application/vnd.iptc.g2.packageitem+xml". Each <itemRef> would then resemble the following:

<itemRef residref="tag:afp.com,2008:TX-PAR:20080529:JYC99"
    contenttype="application/vnd.iptc.g2.packageitem+xml"
    size="28047">
    <itemClass qcode="ninat:composite" />
    <provider qcode="nprov:AFP"/>
    <pubStatus qcode="stat:usable"/>
    <title>Tiger Woods cherche son retour</title>
    <description role="drol:summary"> Tiger Woods lorem ipsum dolor sit amet,
        consectetur adipiscing elit. Etiam feugiat. Pellentesque ut enim eget
        eros volutpat consectetur. Quisque sollicitudin, tortor ut dapibus
        porttitor, augue velit vulputate eros, in tempus orci nunc vitae nunc.
        Nam et lacus ut leo convallis posuere. Nullam risus.
    </description>
</itemRef>
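
A receiver processing a "Super Package" needs to tell references to Package Items apart from references to News Items, so that it can retrieve and process the sub-packages in turn. The following is a minimal Python sketch of that check. The helper name is_package_ref is hypothetical, the fragment omits the NewsML-G2 namespace, and comparing only the code part of the QCode is a simplification: a full implementation would resolve the Scheme Alias through the catalog.

```python
import xml.etree.ElementTree as ET

PACKAGE_TYPE = "application/vnd.iptc.g2.packageitem+xml"

# Namespace-free fragment modelled on the example above (illustrative only).
ITEM_REF = ET.fromstring("""
<itemRef residref="tag:afp.com,2008:TX-PAR:20080529:JYC99"
    contenttype="application/vnd.iptc.g2.packageitem+xml"
    size="28047">
    <itemClass qcode="ninat:composite" />
    <title>Tiger Woods cherche son retour</title>
</itemRef>
""")


def is_package_ref(item_ref):
    """True if the reference points to a Package Item.

    Checks both hints described in the text: the Content Type and the
    Item Class of "composite".
    """
    item_class = item_ref.find("itemClass")
    qcode = (item_class.get("qcode") or "") if item_class is not None else ""
    return (item_ref.get("contenttype") == PACKAGE_TYPE
            or qcode.endswith(":composite"))


print(is_package_ref(ITEM_REF))
```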

11.11.2. Facilitating the Exchange of Packages

There needs to be some consideration of how such a "Super Package" should be processed by the receiver. The power and flexibility inherent in NewsML-G2 Packages could lead to confusion and processing complexity unless provider and receiver agree on a method for specifying the structure of packages and signalling this to the receiving application. Processing hints such as the <profile> property (described above) are intended to help resolve this issue.

In the example below, we maintain flexibility and inter-operability with potential partner organisations by defining any number of standard package "templates" – termed Profiles – for the Package, among other processing hints. Partners would agree in advance on the Profiles and rules for processing them. All that the provider then needs to do is place the pre-arranged Profile name, or the name of a transformation script, in the <profile> property.

Package profiles could be represented as diagrams like those shown below:

Figure: Diagrams of Package Profiles. The numbers in brackets indicate the required items

In this example, the Profile Name is intended to signal to the processor that references to each member of the Top Ten list are placed in their own group, and that the Top Ten list itself is created in the "root" group of the Package Item as an ordered list of <groupRef> elements (as in the "Top Ten" list profile shown in the above diagram).

The properties in <itemMeta> that can be used to provide information on processing are:

<generator>, a versioned string denoting the name of the process or service that created the package:

<generator versioninfo="3.0">MyNews Top Ten Packager</generator>

<profile>, as discussed, sets the template or transformation stylesheet of the package:

<profile versioninfo="1.0.0.2">ranked_idref_list</profile>

<signal> is a QCode type property that instructs the receiver to perform any required actions upon receiving the Item. An <edNote> may contain natural-language instructions, if necessary, and a <link> property denotes the previous version of the package.

<signal qcode="action:replacePrev" />
<edNote>Replace the previous package</edNote>
<link
    rel="irel:previousVersion"
    residref="tag:example.com,2008:UK-NEWS-TOPTEN:UK20081220098658"
    version="1"
/>
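
On the receiving side, these hints could be read from <itemMeta> and used to select a pre-agreed processing routine. The Python sketch below assumes a namespace-free <itemMeta> fragment assembled from the snippets above (mandatory properties omitted). The function processing_hints and the HANDLERS registry are hypothetical names introduced for illustration only.

```python
import xml.etree.ElementTree as ET

# Namespace-free <itemMeta> fragment built from the snippets above (illustrative only).
ITEM_META = ET.fromstring("""
<itemMeta>
    <generator versioninfo="3.0">MyNews Top Ten Packager</generator>
    <profile versioninfo="1.0.0.2">ranked_idref_list</profile>
    <signal qcode="action:replacePrev" />
    <edNote>Replace the previous package</edNote>
    <link rel="irel:previousVersion"
        residref="tag:example.com,2008:UK-NEWS-TOPTEN:UK20081220098658"
        version="1" />
</itemMeta>
""")


def processing_hints(item_meta):
    """Collect the processing hints a receiver might act on."""
    profile = item_meta.find("profile")
    previous = item_meta.find("link[@rel='irel:previousVersion']")
    return {
        "generator": item_meta.findtext("generator"),
        "profile": profile.text if profile is not None else None,
        "profile_version": profile.get("versioninfo") if profile is not None else None,
        "signals": [s.get("qcode") for s in item_meta.findall("signal")],
        "previous_version": previous.get("residref") if previous is not None else None,
    }


# Hypothetical registry: profile names agreed in advance between partners,
# mapped to the routine that knows how to process that package structure.
HANDLERS = {
    "ranked_idref_list": lambda hints: "render ordered Top Ten list",
}

hints = processing_hints(ITEM_META)
handler = HANDLERS.get(hints["profile"])
print(handler(hints) if handler else "unknown profile: manual handling")
```

Keeping the profile-to-handler mapping in one table reflects the idea that partners agree the set of Profiles in advance: an unrecognised profile name is then an explicit, detectable condition rather than a silent processing error.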

12. Concepts and Concept Items

12.1. Introduction

Concepts in NewsML-G2 are a method of describing real-world entities, such as people, events and organisations, and also of describing thoughts or ideas: abstract notions such as subject classifications or facial expressions. Using concepts, we can classify news, and the entities and ideas found in news, to make the content more accessible and relevant to people’s particular information needs.

Content originators who make up the IPTC membership constantly strive to increase the value proposition of their products. The need to extract and properly express the meaning of news using concepts is a major reason for moving to NewsML-G2.

Clear and unambiguously-defined concepts enable receivers of information to categorize and otherwise handle news more effectively, routing content and archiving it accurately and quickly using automated processes.

NewsML-G2 Concepts are powerful because they bring meaning to news content in a way that can be understood by humans and processed by machines. The concept model aligns with work being done at the W3C and elsewhere to realize the Semantic Web.

Concepts are conveyed individually in Concept Items, or (more commonly) are collected as groups of Concepts in Knowledge Items. These can be collections with a common purpose, such as Controlled Vocabularies.

This Chapter gives details of the Concept element that is common to both types of Item, and also describes the Concept Item. Chapter 13, Knowledge Items, which follows this one, describes Knowledge Items in detail.

Concepts are also used to convey event information, which is described in detail in Events in NewsML-G2.

12.2. What is a Concept?

A NewsML-G2 Concept is anything about which we can express knowledge in some formal way, and which may also have a named relationship with other concepts:

  • "Mario Draghi" is a concept about which, or whom, we can express knowledge, for example, date of birth (September 3, 1947), job title (President of the European Central Bank).

  • "The European Central Bank" is a concept. It has an address, a telephone number, and other inherent characteristics of an organisation.

  • We can express a named relationship: "Mario Draghi" is a member of "The European Central Bank". NewsML-G2 concept expressions thus conform with an RDF triple of subject, predicate and object.

Concepts are either global in scope, when they are identified by a URI using @uri (optionally taking the format of a QCode using @qcode), or local in scope to the containing document, when they are identified by a string value using @literal (where permitted). The use of a @literal identifier is a special case that matches the identifier of an <assert> in the NewsML-G2 document that contains a localised concept structure. See 20.2 The Assert Wrapper for more details. This Chapter describes Concepts identified by QCodes and URIs.

12.3. Creating Concepts – the <concept> element

The <concept> element contains the properties that express the concept in detail and identify it so that it can be used and re-used:

12.3.1. Concept ID <conceptId>

A concept MUST contain a <conceptId>, which takes the form of a QCode (@qcode) attribute. Optionally, the full URI may be added using @uri. If the URI resolved from @qcode is not the same as the @uri value, then the URI resolved from the @qcode takes precedence. Optionally, the identifier can be refined using date-time values for @created and @retired.

<concept>
    <conceptId created="2009-01-01T12:00:00Z" qcode="foo:bar" />
    ...
</concept>

When a concept is retired by use of the @retired attribute of <conceptId>, the authority behind the concept is indicating that it is no longer actively using this concept (for example, it may have been merged with another concept), but resources that were created before the change must continue to be able to resolve the concept.
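
The precedence rule for @qcode and @uri can be sketched in Python as follows. The CATALOG mapping, the "foo" Scheme URI and the function names are assumptions made for illustration; in practice the mapping of Scheme Aliases to Scheme URIs comes from the catalog(s) referenced by the Item.

```python
# Hypothetical catalog: Scheme Alias -> Scheme URI. "foo" is a made-up scheme.
CATALOG = {
    "cpnat": "http://cv.iptc.org/newscodes/cpnature/",
    "foo": "http://example.com/cv/foo/",
}


def resolve_qcode(qcode, catalog):
    """Expand a QCode to a full URI by appending the code to the Scheme URI."""
    alias, sep, code = qcode.partition(":")
    if not sep or alias not in catalog:
        raise ValueError(f"cannot resolve QCode {qcode!r}")
    return catalog[alias] + code


def concept_uri(qcode=None, uri=None, catalog=CATALOG):
    """Return the identifying URI, letting @qcode win over a conflicting @uri."""
    if qcode is not None:
        return resolve_qcode(qcode, catalog)
    return uri


print(concept_uri(qcode="foo:bar"))
print(concept_uri(qcode="foo:bar", uri="http://example.com/other/bar"))
```

Both calls print the URI resolved from the QCode, because a conflicting @uri value is ignored whenever @qcode is present.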

12.3.2. Concept Name <name>

A concept MUST contain at least one <name>, a natural language name for the concept, with optional attributes of @xml:lang and @dir (text direction):

<concept>
    <conceptId qcode="foo:bar" />
    <name>Mario Draghi</name>
</concept>

Concepts are designed to be useable in multiple languages:

<concept>
    <conceptId created="2000-10-30T12:00:00+00:00" qcode="medtop:01000000" />
    <type qcode="cpnat:abstract" />
    <name xml:lang="en-GB">arts, culture and entertainment</name>
    <name xml:lang="de">Kultur, Kunst, Unterhaltung</name>
    <name xml:lang="fr">Arts, culture, et spectacles</name>
    <name xml:lang="es">arte, cultura y espectáculos</name>
    <name xml:lang="ja-JP">文化</name>
    <name xml:lang="it">Arte, cultura, intrattenimento</name>
</concept>
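
A receiving application will typically pick one of these names according to the reader's language preferences. The following Python sketch shows one possible selection strategy (exact language tag first, then primary subtag, then the first name as a fallback). The strategy and the function name name_for are assumptions for illustration, not rules defined by NewsML-G2, and the fragment omits the NewsML-G2 namespace.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

CONCEPT = ET.fromstring("""
<concept>
    <conceptId created="2000-10-30T12:00:00+00:00" qcode="medtop:01000000" />
    <name xml:lang="en-GB">arts, culture and entertainment</name>
    <name xml:lang="de">Kultur, Kunst, Unterhaltung</name>
    <name xml:lang="fr">Arts, culture, et spectacles</name>
</concept>
""")


def name_for(concept, preferred):
    """Pick the best <name> for a list of preferred language tags, in order.

    An exact tag match wins; otherwise the primary subtag is compared, so a
    request for "en" is satisfied by "en-GB". Falls back to the first name.
    """
    names = concept.findall("name")
    for wanted in preferred:
        wanted = wanted.lower()
        for name in names:
            if (name.get(XML_LANG) or "").lower() == wanted:
                return name.text
        for name in names:
            tag = (name.get(XML_LANG) or "").lower()
            if tag.split("-")[0] == wanted.split("-")[0]:
                return name.text
    return names[0].text if names else None


print(name_for(CONCEPT, ["de"]))
print(name_for(CONCEPT, ["en"]))
print(name_for(CONCEPT, ["it", "fr"]))
```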

12.3.3. Concept Type

The optional <type> element expresses the "nature of the Concept", for example, using the recommended IPTC Concept Nature NewsCodes to identify that this concept is of type "abstract". We can also use <related> to extend this notion into further characteristics of the concept (see Relationships between Concepts, below).

<type> demonstrates the use of the subject, predicate, object triple of RDF to express a named relationship with another concept; <type> can only express one kind of relationship – "is a(n)". It is used to express the most obvious, or primary, inherent characteristic of a concept, as in:

arts, culture and entertainment (Subject) is a(n) (Predicate) abstract concept (Object):

<type qcode="cpnat:abstract" />

The current types agreed by the IPTC and contained in the "concept nature" CV at http://cv.iptc.org/newscodes/cpnature/ are:

  • abstract concept (cpnat:abstract),

  • person (cpnat:person),

  • organisation (cpnat:organisation),

  • geopolitical area (cpnat:geoArea),

  • point of interest (cpnat:poi),

  • object (cpnat:object),

  • event (cpnat:event).

12.3.4. Concept Definition

The optional <definition> element allows more extensive natural language information with some mark-up, if required. Block type elements may use an optional @role QCode to differentiate repeating Definition statements such as "summary" and "long":

<definition xml:lang="en-GB" role="definitionrole:short">
    Matters pertaining to the advancement and refinement of the human mind,
    of interests, skills, tastes and emotions
</definition>

Note that although much of this information could be, and may be, duplicated in machine-readable XML, it is still useful to carry some core information in human-readable form.

12.3.5. Note

The <note> element may be used to add supplemental natural-language information on the concept as a block of text with some optional mark-up, again with an optional @role:

<note>
    This is a top-level concept from the IPTC Media Topic NewsCodes
</note>

12.4. Conveying Concepts: the Concept Item structure

A Concept Item conveys knowledge about a single concept, whether a real-world entity such as a person, or an abstract concept such as a subject. It shares the basic structure of all NewsML-G2 Items and therefore uses the same methods for identification, versioning and conformance levels.

Item Metadata is mandatory and contains the mandatory properties for Item Class, Provider and Version Created (note that Publication Status is optional but the Item’s publication status must be assumed to be the default "usable" if the property is absent).

Content Metadata is optional and is not included in this example.

Note that the <itemClass> property for a Concept Item must use the IPTC Concept Item Nature NewsCodes, with a recommended Scheme Alias of "cinat", and denotes that this Item conveys a NewsML-G2 Concept.

12.4.1. Completed Concept Item

This example is a Concept Item that describes one of the IPTC Media Topic NewsCodes:

LISTING 10: Abstract Concept conveyed in a NewsML-G2 Concept Item

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies.

<?xml version="1.0" encoding="UTF-8"?>
<conceptItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20080229:ncdci-subjectcode"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <rightsInfo>
        <copyrightHolder>
            <name>IPTC International Press Telecommunications Council, 20 Garrick Street, London WC2E 9BT, UK</name>
        </copyrightHolder>
        <copyrightNotice>Copyright 2016-17, IPTC, www.iptc.org, All
            Rights Reserved</copyrightNotice>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="cinat:concept" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-11-07T12:35:21+01:00</versionCreated>
        <firstCreated>2008-02-29T12:00:00+00:00</firstCreated>
        <pubStatus qcode="stat:usable" />
        <title xml:lang="en">Concept Item delivering a
             concept requested from the IPTC Media Topic NewsCodes</title>
    </itemMeta>
    <concept>
        <conceptId created="2000-10-30T12:00:00+00:00" qcode="medtop:01000000" />
        <type qcode="cpnat:abstract" />
        <name xml:lang="en-GB">arts, culture and entertainment</name>
        <name xml:lang="de">Kultur, Kunst, Unterhaltung</name>
        <name xml:lang="fr">Arts, culture, et spectacles</name>
        <name xml:lang="es">arte, cultura y espectáculos</name>
        <name xml:lang="ja-JP">文化</name>
        <name xml:lang="it">Arte, cultura, intrattenimento</name>
        <definition xml:lang="en-GB">Matters pertaining to the advancement and refinement
            of the human mind, of interests, skills, tastes and emotions</definition>
        <definition xml:lang="de">Sachverhalte, die die Veränderung und Weiterentwicklung
            des menschlichen Geistes, der Interessen, des Geschmacks, der Fähigkeiten und
            der Gefühle betreffen.</definition>
        <definition xml:lang="fr">Tout ce qui est relatif à la création d'œuvres, au
            développement des facultés intellectuelles, et à leur représentation
            publique</definition>
        <definition xml:lang="es">Asuntos pertinentes al avance y refinamiento de la mente
            humana, intereses, habilidades, gustos y emociones.</definition>
        <definition xml:lang="ja-JP">
            人間の精神や興味、技能、嗜好、感情の進歩や洗練に関係する事柄</definition>
        <definition xml:lang="it">Creazione e rappresentazione dell'opera d'arte, gli
            Interessi intellettuali, il gusto e le emozioni umane</definition>
        <note xml:lang="en-GB">
            This is a top-level concept from the IPTC Media Topic NewsCodes
        </note>
    </concept>
</conceptItem>
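
Before processing a Concept Item, a receiver may wish to confirm that the properties named as mandatory in this chapter are present, and to apply the default publication status. The Python sketch below is a simple structural check under those assumptions. It is not a substitute for validation against the NewsML-G2 XML Schema, and the function names and the deliberately incomplete sample document are illustrative only.

```python
import xml.etree.ElementTree as ET

NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Paths checked are those named as mandatory in this chapter.
REQUIRED = [
    "nar:itemMeta/nar:itemClass",
    "nar:itemMeta/nar:provider",
    "nar:itemMeta/nar:versionCreated",
    "nar:concept/nar:conceptId",
    "nar:concept/nar:name",
]


def missing_properties(concept_item):
    """Return the required paths that are absent from a <conceptItem>."""
    return [path for path in REQUIRED if concept_item.find(path, NAR) is None]


def publication_status(concept_item):
    """Return the pubStatus QCode, assuming "usable" when the property is absent.

    The "stat" alias in the default value is the alias used in the listings.
    """
    status = concept_item.find("nar:itemMeta/nar:pubStatus", NAR)
    return status.get("qcode") if status is not None else "stat:usable"


# Deliberately incomplete document used to exercise the check.
DOC = ET.fromstring("""
<conceptItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:iptc.org:20080229:ncdci-subjectcode" version="10"
    standard="NewsML-G2" standardversion="2.25">
    <itemMeta>
        <itemClass qcode="cinat:concept" />
        <provider qcode="nprov:IPTC" />
    </itemMeta>
    <concept>
        <conceptId qcode="medtop:01000000" />
        <name xml:lang="en-GB">arts, culture and entertainment</name>
    </concept>
</conceptItem>
""")

print(missing_properties(DOC))
print(publication_status(DOC))
```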

12.5. Concepts for real-world entities

For each of the types of named entities agreed by the IPTC: person, organisation, geographical area, point of interest, object and event, there is a specific group of additional properties. The following example is a Concept Item for a person.

12.5.1. Document Structure

The document structure is as previously described, with a root <conceptItem> element and <itemMeta>. The <contentMeta> element is optional and may only contain Administrative metadata properties, such as <contentModified> (not included in the example).

<?xml version="1.0" encoding="UTF-8"?>
<conceptItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:iptc.org:20080229:ncdci-person"
    version="1010181123618"
    standard="NewsML-G2"
    standardversion="2.25">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml"
    />
    <rightsInfo>
        <copyrightHolder>
            <name>IPTC - International Press Telecommunications Council, 20 Garrick Street, London WC2E 9BT, UK</name>
        </copyrightHolder>
        <copyrightNotice>Copyright 2008, IPTC, www.iptc.org, All
        Rights Reserved</copyrightNotice>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="cinat:concept" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-11-07T12:38:18Z</versionCreated>
        <firstCreated>2008-12-29T11:00:00Z</firstCreated>
        <pubStatus qcode="stat:usable" />
        <title xml:lang="en">Concept Item describing Mario Draghi</title>
    </itemMeta>

12.5.2. Top-level concept details

The <concept> wrapper starts with the properties common to all types of concepts:

<concept>
    <conceptId created="2009-01-10T12:00:00Z" qcode="people:329465"/>
    <type qcode="cpnat:person" />
    <name xml:lang="en-GB">Mario Draghi</name>
    <definition xml:lang="en-GB" role="definitionrole:biog">
        Mario Draghi, born 3 September 1947,
        is an Italian banker and economist who succeeded Jean-Claude Trichet as the
        President of the European Central Bank on 1 November 2011. He was previously
        the governor of the Bank of Italy from January 2006 until October 2011.
        In 2013 Forbes nominated Draghi 9th most powerful person in the world.<br />
    </definition>
    <note xml:lang="en-GB">
        Not Mario D’roggia, international powerboat racer
    </note>
    <related rel="relation:occupation" qcode="jobtypes:puboff" />
    <sameAs type="cpnat:person" qcode="pers:567223">
        <name>DRAGHI, Mario</name>
    </sameAs>
    ....
</concept>

Note the inclusion of Concept Relationship properties: the <related> element indicates that the person who is the subject of the concept "has occupation of" the related concept expressed by the @qcode "jobtypes:puboff". The <sameAs> element indicates that this concept is the same as AFP’s (note: fictitious) concept expressed by the @qcode "pers:567223".
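
To illustrate how these properties map onto subject, predicate, object statements, the following Python sketch lists the triples implied by a <concept>. The predicate labels "is-a" and "sameAs" are names chosen for this sketch, not NewsML-G2 or RDF identifiers. The function name triples is hypothetical, and the fragment omits the NewsML-G2 namespace.

```python
import xml.etree.ElementTree as ET

CONCEPT = ET.fromstring("""
<concept>
    <conceptId created="2009-01-10T12:00:00Z" qcode="people:329465" />
    <type qcode="cpnat:person" />
    <name xml:lang="en-GB">Mario Draghi</name>
    <related rel="relation:occupation" qcode="jobtypes:puboff" />
    <sameAs type="cpnat:person" qcode="pers:567223">
        <name>DRAGHI, Mario</name>
    </sameAs>
</concept>
""")


def triples(concept):
    """List (subject, predicate, object) statements made by a <concept>.

    <type> always expresses "is a(n)"; <related> carries its own
    predicate in @rel; <sameAs> asserts equivalence.
    """
    subject = concept.find("conceptId").get("qcode")
    result = []
    for t in concept.findall("type"):
        result.append((subject, "is-a", t.get("qcode")))
    for r in concept.findall("related"):
        result.append((subject, r.get("rel"), r.get("qcode")))
    for s in concept.findall("sameAs"):
        result.append((subject, "sameAs", s.get("qcode")))
    return result


for triple in triples(CONCEPT):
    print(triple)
```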

12.5.3. Person details

The <personDetails> element is a container for additional properties that are specifically designed to convey information about people:

Born <born> and Died <died>

The date of birth and date of death of the person, for example:

<born>1947-09-03</born>

The data type is "TruncatedDateTime", which means that the value is a date, with an optional time part. The date value may be truncated from the right to a minimum of YYYY. If used, the time must be present in full, with time zone, and ONLY in the presence of the full date.

<born>1947</born>
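
The truncation rules described above can be expressed as a pattern. The Python sketch below is derived from the prose rules only, not from the XML Schema definition of the data type. It checks the shape of a value but not calendar validity, and the function name is_truncated_date_time is hypothetical.

```python
import re

# Pattern derived from the prose rules above; it checks the shape of the value
# only, not calendar validity (for example, it would accept month 13).
TRUNCATED_DATE_TIME = re.compile(
    r"\d{4}"                            # YYYY is the minimum
    r"(-\d{2}"                          # optional -MM
    r"(-\d{2}"                          # optional -DD
    r"(T\d{2}:\d{2}:\d{2}(\.\d+)?"      # full time, only after a full date
    r"(Z|[+-]\d{2}:\d{2})"              # a time zone is required with a time
    r")?)?)?"
)


def is_truncated_date_time(value):
    """True if the whole value has the shape of a truncated date-time."""
    return TRUNCATED_DATE_TIME.fullmatch(value) is not None


for sample in ("1947", "1947-09-03", "1947-09-03T12:00:00Z", "1947-09-03T12:00:00"):
    print(sample, is_truncated_date_time(sample))
```

The last sample is rejected because a time part is given without a time zone.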

Affiliation <affiliation>

An affiliation of the person to an organisation.

<affiliation type="orgnat:employer" qcode="org:ECB">
    <name>European Central Bank</name>
</affiliation>

Note that the @type refers to the type of organisation – not the type of relationship with the person. In the example we use scheme "orgnat" to describe the Nature of the Organisation as a Bank.

Contact Info <contactInfo>

Contact information associated with the person. The <contactInfo> element wraps a structure with the properties outlined below. A "person" concept may have many instances of <contactInfo>, each with a @role indicating its purpose, for example work or home. These are controlled values, so a provider may create their own CV of address types if required. The IPTC NewsCodes scheme for the roles of parts of <contactInfo> is http://cv.iptc.org/newscodes/contactinfopartrole/, with a recommended scheme alias of "ciprol" (used in the example below).

Each of the child elements of <contactInfo> may be repeated as often as needed to express different @roles, for example different "office" and "mobile" phone numbers.

| Property Name | Element | Type | Notes |
| --- | --- | --- | --- |
| Email Address | <email> | Electronic Address | An "Electronic Address" type allows the expression of @role (QCode) to qualify the information, for example: <email role="addressrole:office">info@ecb.eu</email> |
| Instant Message Address | <im> | Electronic Address | <im role="imsrvc:reuters">jc.trichet.ecb.eu@reuters.net</im> |
| Phone Number | <phone> | Electronic Address | |
| Fax Number | <fax> | Electronic Address | |
| Web site | <web> | IRI | <web role="webrole:corporate">www.ecb.eu</web> |
| Postal Address | <address> | Address | The Address may have a @role to denote the type of address it contains (e.g. work, home) and may be repeated as required to express each address @role. |
| Other information | <note> | Block | Any other contact-related information, such as "annual vacation during August" |

For example:

<contactInfo>
    <email role="ciprol:office">info@ecb.eu</email>
    <im role="imsrvc:reuters">president.ecb.eu@reuters.net</im>
    <phone role="ciprol:office">+49 69 13 44 0</phone>
    <phone role="ciprol:mobile">+49 69 13 44 60 00</phone>
    <web>www.ecb.eu</web>
    <address role="ciprol:office">
        <!--  see below  -->
    </address>
</contactInfo>
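
Because each child of <contactInfo> may repeat with a different @role, a receiver will often want the values grouped by element and role. The following Python sketch shows one way to do this. The function name contacts_by_role and the shape of the result are assumptions for illustration, and the fragment omits the NewsML-G2 namespace.

```python
import xml.etree.ElementTree as ET

CONTACT_INFO = ET.fromstring("""
<contactInfo>
    <email role="ciprol:office">info@ecb.eu</email>
    <phone role="ciprol:office">+49 69 13 44 0</phone>
    <phone role="ciprol:mobile">+49 69 13 44 60 00</phone>
    <web>www.ecb.eu</web>
</contactInfo>
""")


def contacts_by_role(contact_info):
    """Group contact values as {element name: {role: [values]}}.

    Children without @role are filed under None. <address> and <note>
    hold structured or block content and are skipped here.
    """
    grouped = {}
    for child in contact_info:
        if child.tag in ("address", "note"):
            continue
        roles = grouped.setdefault(child.tag, {})
        roles.setdefault(child.get("role"), []).append((child.text or "").strip())
    return grouped


contacts = contacts_by_role(CONTACT_INFO)
print(contacts["phone"]["ciprol:mobile"])
```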

Postal address <address>

The Address Type property may have a @role to indicate its purpose. The following table shows the available child properties. Apart from <line>, which is repeatable, each element may be used once for each <address>.

| Property Name | Element | Type | Notes/Example |
| --- | --- | --- | --- |
| Address Line | <line> | Internationalized string | As many as are needed |
| Locality | <locality> | Flexible Property | May be a URI, QCode or Literal value, or no value with a <name> child element |
| Area | <area> | Flexible Property | |
| Country | <country> | Flexible Property | |
| Postal Code | <postalCode> | | |
| World Region | <worldRegion> | Flexible Property | |

For example:

<address role="ciprol:office">
    <line>Postfach 16 03 19</line>
    <locality>
        <name>Frankfurt am Main</name>
    </locality>
    <country qcode="iso3166-1a2:DE">
        <name xml:lang="en">Germany</name>
    </country>
    <postalCode>D-60066</postalCode>
    <worldRegion qcode="maxmindcc:EU">
        <name>Europe</name>
    </worldRegion>
</address>

12.5.4. Putting it together

The complete concept listing for this example:

LISTING 11: Person Concept conveyed in a NewsML-G2 Concept Item

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: relation, jobtypes, pers, definitionrole, orgnat, org, imsrvc, maxmindcc.

<?xml version="1.0" encoding="UTF-8"?>
<conceptItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20080229:ncdci-person"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <rightsInfo>
        <copyrightHolder>
            <name>IPTC - International Press Telecommunications Council, 20 Garrick Street, London WC2E 9BT, UK</name>
        </copyrightHolder>
        <copyrightNotice>Copyright 2016-17, IPTC, www.iptc.org, All
            Rights Reserved</copyrightNotice>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="cinat:concept" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-11-07T12:38:18Z</versionCreated>
        <firstCreated>2008-12-29T11:00:00Z</firstCreated>
        <pubStatus qcode="stat:usable" />
        <title xml:lang="en">Concept Item describing Mario Draghi</title>
    </itemMeta>
    <concept>
        <conceptId created="2009-01-10T12:00:00Z" qcode="people:329465" />
        <type qcode="cpnat:person" />
        <name xml:lang="en-GB">Mario Draghi</name>
        <definition xml:lang="en-GB" role="definitionrole:biog">
            Mario Draghi, born 3 September 1947,
            is an Italian banker and economist who succeeded Jean-Claude Trichet as the
            President of the European Central Bank on 1 November 2011. He was previously
            the governor of the Bank of Italy from January 2006 until October 2011. In 2013
            Forbes nominated Draghi 9th most powerful person in the world.<br />
        </definition>
        <note xml:lang="en-GB" role="nrol:disambiguation">
            Not Mario D’roggia, international powerboat racer
        </note>
        <related rel="relation:occupation" qcode="jobtypes:puboff" />
        <sameAs type="cpnat:person" qcode="pers:567223">
            <name>DRAGHI, Mario</name>
        </sameAs>
        <personDetails>
            <born>1947-09-03</born>
            <affiliation type="orgnat:employer" qcode="org:ECB">
                <name>European Central Bank</name>
            </affiliation>
            <contactInfo>
                <email role="ciprol:office">info@ecb.eu</email>
                <im role="imsrvc:reuters">president.ecb.eu@reuters.net</im>
                <phone role="ciprol:office">+49 69 13 44 0</phone>
                <phone role="ciprol:mobile">+49 69 13 44 60 00</phone>
                <web>www.ecb.eu</web>
                <address role="ciprol:office">
                    <line>Kaiserstrasse 29</line>
                    <locality>
                        <name>Frankfurt am Main</name>
                    </locality>
                    <country qcode="iso3166-1a2:DE">
                        <name xml:lang="en">Germany</name>
                    </country>
                    <postalCode>D-60311</postalCode>
                </address>
            </contactInfo>
        </personDetails>
    </concept>
</conceptItem>

12.6. More real-world entities

12.6.1. Organisation Details <organisationDetails>

A concept of type "organisation" may hold the following additional properties:

Founded <founded> and Dissolved <dissolved>

The date of foundation or dissolution of the organisation, equivalent to born/died for a person, for example:

<founded>1998-06-01</founded>

Or

<founded>1998</founded>

Location <location>

A place where the organisation is located, expressed as a Flexible Property (NOT an address) and repeated as many times as needed. For example:

<location type="loctypes:regoff" qcode="poi:75001">
    <name>Paris</name>
</location>

Contact Information <contactInfo>

Contact information associated with the organisation, using the same structure as described in Contact Information.

12.6.2. Geopolitical Area Details <geoAreaDetails>

A "geoArea" concept may have the following additional properties:

Position <position>

This expresses the coordinates of the concept using the following attributes:

| Attribute Name | Attribute | Type | Notes/Example |
| --- | --- | --- | --- |
| Latitude | @latitude | XML Decimal | The latitude in decimal degrees. Positive value = north of the Equator; negative value = south of the Equator |
| Longitude | @longitude | XML Decimal | The longitude in decimal degrees. Positive value = east of the Greenwich Meridian; negative value = west of the Greenwich Meridian |
| Altitude | @altitude | XML Integer | The absolute altitude in metres with reference to mean sea level |
| GPS Datum | @gpsdatum | XML String | The GPS datum associated with the position measurement, default is WGS84 |
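
As an illustration of these attributes, the following Python sketch reads a <position> element, applies the WGS84 default and derives the hemispheres from the signs of the coordinates. The sample coordinate values and the function name read_position are assumptions made for this sketch, which also assumes that both @latitude and @longitude are present.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Sample coordinates are illustrative values, not taken from the Guidelines.
POSITION = ET.fromstring(
    '<position latitude="-33.8568" longitude="151.2153" altitude="4" />'
)


def read_position(position):
    """Convert the attributes of <position> into typed Python values."""
    lat = Decimal(position.get("latitude"))
    lon = Decimal(position.get("longitude"))
    alt = position.get("altitude")
    return {
        "latitude": lat,
        "longitude": lon,
        "altitude_m": int(alt) if alt is not None else None,
        # WGS84 is the default datum when @gpsdatum is absent.
        "gpsdatum": position.get("gpsdatum", "WGS84"),
        # Positive latitude is north, positive longitude is east.
        "hemispheres": ("N" if lat >= 0 else "S", "E" if lon >= 0 else "W"),
    }


pos = read_position(POSITION)
print(pos["hemispheres"], pos["gpsdatum"])
```

Decimal is used rather than float so that the coordinate values are preserved exactly as written in the XML.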

Founded <founded>

The date, and optionally the time plus time zone, on which the geopolitical area was founded:

<founded>1998</founded>

Dissolved <dissolved>

The date, and optionally the time plus time zone, on which the geopolitical area was dissolved.

12.6.3. Point of Interest <poiDetails>

A Point of Interest (POI) is a place "on the map" of interest to people, which is not necessarily a geographical feature, for example a concert venue, cinema or sports stadium. As such, it has different properties from a purely geographical point. A POI may have the additional properties listed below.

Address

The location of the point of interest expressed as a postal address. The <address> element is a wrapper for child elements described in Postal Address. In this context, the address is expressly the location of the POI, whereas the <address> wrapper when used as a child of <contactInfo> (see Contact Information) expresses the address of the entity that should be contacted about the POI, which could be an office some distance away.

Position <position>

The coordinates of the location, as described in Position.

Opening Hours <openHours>

The opening hours of the POI are expressed as a Label type, which is an internationalized string – a natural language expression – extended to include @role if required. Example:

<openHours>9.30am to 5.30pm, closed for lunch from 1pm to 2pm</openHours>
Capacity <capacity>

The capacity of the POI is expressed as a Label:

<capacity>10,000 seats</capacity>
Contact Information <contactInfo>

Contact information for the POI uses the <contactInfo> structure as described in Contact Information. It expresses who should be contacted regarding the POI. This could be an organisation located miles away from the location of the POI.

Access details <access>

Methods of accessing the POI, including directions. This is a Block type of element, allowing some mark-up and may be repeated as often as needed:

<access role="traveltype:public">
    The Jubilee Line is recommended as the quickest route to ExCeL London. At Canning
    Town change to the DLR (upstairs on platform 3) for the quick two-stop journey
    to Custom House for ExCeL Station.
</access>
<access role="traveltype:road">
    When driving to ExCeL London follow signs for Royal Docks, City Airport and ExCeL.
    There is easy access to the M25, M11, A406 and A13.
</access>
Detailed Information <details>

Detailed information about the location of the POI expressed as a Block type:

<details>Room M345, 3rd Floor</details>
Creation of POI <created>

The date (and optionally a time) on which the Point Of Interest was created.

<created>2016-06-23</created>
Destruction or teardown of POI <ceasedToExist>

The date (and optionally a time) on which the Point Of Interest ceased to exist; perhaps in reference to a temporary POI:

<ceasedToExist>2016-06-23</ceasedToExist>

12.6.4. Object Details <objectDetails>

Objects that may be expressed as a concept include works of art, books, inventions and industrial artefacts. The IPTC provides three properties for Objects as part of NewsML-G2, but as with any of the types of concept discussed, providers are able to extend the standard. Note these are properties of the object described by the Concept, NOT properties of <itemMeta> which apply to the Concept Item conveying the Concept. The standard additional properties of an Object concept are:

Creation Date <created>

The date and, optionally, the time and time zone when the object was created. Non-repeatable.

<created>1994-06-14</created>
Creator <creator>

A party (person or organisation) that created the object, expressed as a Flexible Property type. Repeatable.

<creator type="cpnat:organisation" qcode="nyse:ba">
    <name>The Boeing Company</name>
</creator>

In this case, the object is a Boeing 777 airliner.

Copyright Notice <copyrightNotice>

Any necessary copyright notice for claiming the intellectual property of the object. A repeatable Label type:

<copyrightNotice role="iprole:company">
    Copyright 2008 Boeing Aircraft, all rights reserved
</copyrightNotice>

12.7. Relationships between Concepts

This is a group of four properties, <broader>, <narrower>, <related> and <sameAs>, that enable the creation of particular types of relationship to another concept. For example, our subject was born in Rome. We could create a concept for Rome as follows, with a <broader> property that denotes that the city is part of the region of Lazio:

<concept>
    <conceptId qcode="urban:roma" />
    <type qcode="cpnat:geoArea" />
    <definition role="definitionrole:short">
        Rome (Italian: Roma) is a city and special commune
        (named "Roma Capitale") in Italy. Rome is the capital of Italy and also of the
        homonymous province and of the region of Lazio.<br />
    </definition>
    <broader type="cpnat:geoArea" qcode="locale:lazio">
        <name xml:lang="en">Lazio (region)</name>
    </broader>
</concept>

<narrower> expresses the reverse relationship. A concept for Rhône could have a <narrower> property linking it to Lyon, and a <broader> link to the concept of its parent region, or to the concept of the country, France.
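
A receiver can follow <broader> links to work out where a concept sits in a hierarchy. The Python sketch below is an illustration only: it omits the NewsML-G2 namespace for brevity, the QCode "country:it" and the function names are hypothetical, and it follows only the first <broader> of each concept.

```python
import xml.etree.ElementTree as ET

# Namespace omitted for brevity; "country:it" is a hypothetical QCode.
SAMPLE = """
<conceptSet>
  <concept>
    <conceptId qcode="urban:roma"/>
    <broader qcode="locale:lazio"/>
  </concept>
  <concept>
    <conceptId qcode="locale:lazio"/>
    <broader qcode="country:it"/>
  </concept>
  <concept>
    <conceptId qcode="country:it"/>
  </concept>
</conceptSet>
"""


def broader_map(xml_text: str) -> dict:
    """Map each concept QCode to the list of its <broader> QCodes."""
    root = ET.fromstring(xml_text)
    parents = {}
    for concept in root.iter("concept"):
        qcode = concept.find("conceptId").get("qcode")
        parents[qcode] = [b.get("qcode") for b in concept.findall("broader")]
    return parents


def ancestors(qcode: str, parents: dict) -> list:
    """Follow the first <broader> link upwards until none is left."""
    chain, seen, current = [], {qcode}, qcode
    while parents.get(current):
        current = parents[current][0]
        if current in seen:  # guard against accidental cycles
            break
        seen.add(current)
        chain.append(current)
    return chain


print(ancestors("urban:roma", broader_map(SAMPLE)))
```

The same map could be inverted to derive <narrower> relationships when a provider supplies only one direction.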

<sameAs> allows the provider to inform the recipient that this concept has an equivalent concept in some other taxonomy. For example, we may know that AFP’s knowledge base of people has an entry for Mario Draghi that can be referenced using the appropriate alias.

<sameAs type="cpnat:person" qcode="AFPpers:567223">
    <name>DRAGHI, Mario</name>
</sameAs>

The sameAs property also assists inter-operability because it can be used to enable recipients to choose the CV, or standard, they employ.

For example, the document may have a concept for Germany identified by the provider’s QCode "country:de". Some recipients may have standardized on using ISO-3166 Country Codes to classify nationality. The provider can assist recipients to make a direct reference to their preferred scheme using "sameAs":

<sameAs qcode="iso3166-1a2:DE" />
<sameAs qcode="iso3166-1a3:DEU" />
<sameAs qcode="iso3166n:276" />
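
As an illustration of how a recipient might use these <sameAs> properties, the Python sketch below returns the code of a concept in the recipient's preferred scheme. It is a simplification under stated assumptions: the function name equivalent_in_scheme is hypothetical, the namespace is omitted, and schemes are matched by alias text alone, whereas a full implementation would resolve each alias to its scheme URI through the catalog.

```python
import xml.etree.ElementTree as ET
from typing import Optional

CONCEPT = """
<concept>
  <conceptId qcode="country:de"/>
  <sameAs qcode="iso3166-1a2:DE"/>
  <sameAs qcode="iso3166-1a3:DEU"/>
  <sameAs qcode="iso3166n:276"/>
</concept>
"""


def equivalent_in_scheme(concept_xml: str, preferred_alias: str) -> Optional[str]:
    """Return the code part of the QCode that uses the preferred scheme alias."""
    concept = ET.fromstring(concept_xml)
    # The concept's own identifier is checked first, then each <sameAs>.
    candidates = [concept.find("conceptId").get("qcode")]
    candidates += [s.get("qcode") for s in concept.findall("sameAs")]
    for qcode in candidates:
        alias, _, code = qcode.partition(":")
        if alias == preferred_alias:
            return code
    return None


print(equivalent_in_scheme(CONCEPT, "iso3166-1a3"))
```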
<related> allows the expression of a relationship with another concept that cannot be expressed using <broader>, <narrower>, or <sameAs>. For example, the "European Central Bank" may be "related to" "Mario Draghi" – thus the ECB concept may include:

<related rel="relation:hasPresident" type="cpnat:person" qcode="people:329465">
    <name>Mario Draghi</name>
</related>

The nature of the relationship is expressed using @rel; the example above indicates that the European Central Bank "has a President" Mario Draghi. This relationship must be part of a CV of relationships, which might include "has a CEO", "has a Finance Director". The IPTC recommends that the <related> property should always contain a @rel.

At PCL, the property may be extended by adding @rank, a numeric ranking of the current concept amongst other concepts related to the target concept.

For example, if the European Central Bank is the second most important concept related to Mario Draghi, amongst other concepts related to him, we can express this as follows:

<related rank="2" rel="relation:hasPresident" type="cpnat:person" qcode="people:329465">
    <name>Mario Draghi</name>
</related>

The <related> property also allows the expression of quantitative values, for example a share price or a sport score, in addition to the concept relationships described above.

The three attributes of <related> that enable this feature are @value, @valueunit and @valuedatatype. Implementers can type the @value by applying an XML Schema datatype using @valuedatatype, and can optionally declare the units of the value (e.g. the currency) using @valueunit.

For example, to express scores in a sports game where the named team won by 4 goals to 2 and gained 3 points:

<concept xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <conceptId qcode="ukprem:WBA" />
    <name>West Bromwich Albion</name>
    ...
    <related rel="crel:scoreFor" value="4"
        valueunit="valunits:goals"
        valuedatatype="xs:nonNegativeInteger" />
    <related rel="crel:scoreAgainst" value="2"
        valueunit="valunits:goals"
        valuedatatype="xs:nonNegativeInteger" />
    <related rel="crel:pointsAdded" value="3"
        valueunit="valunits:points"
        valuedatatype="xs:nonNegativeInteger" />
    ...
</concept>
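
To show how a receiver might consume such quantitative values, the Python sketch below converts each @value according to its @valuedatatype. It is not a complete implementation: the converter table covers only a few XML Schema types, the function name typed_values is hypothetical, and the code assumes the datatype is written with a prefix such as "xs:" as in the example above rather than resolving the prefix against the namespace declarations.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Partial mapping from XML Schema local type names to Python converters.
CONVERTERS = {
    "nonNegativeInteger": int,
    "integer": int,
    "decimal": Decimal,
    "string": str,
}

SAMPLE = """
<concept xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <conceptId qcode="ukprem:WBA"/>
  <related rel="crel:scoreFor" value="4"
      valueunit="valunits:goals" valuedatatype="xs:nonNegativeInteger"/>
  <related rel="crel:scoreAgainst" value="2"
      valueunit="valunits:goals" valuedatatype="xs:nonNegativeInteger"/>
  <related rel="crel:pointsAdded" value="3"
      valueunit="valunits:points" valuedatatype="xs:nonNegativeInteger"/>
</concept>
"""


def typed_values(concept_xml: str) -> dict:
    """Map each @rel to a (typed value, unit) pair for <related> with @value."""
    concept = ET.fromstring(concept_xml)
    result = {}
    for related in concept.findall("related"):
        raw = related.get("value")
        if raw is None:
            continue  # this <related> points to a concept, not a value
        # Keep only the local part of the datatype name, e.g. "decimal".
        local_type = related.get("valuedatatype", "").rpartition(":")[2]
        convert = CONVERTERS.get(local_type, str)  # unknown types stay as text
        result[related.get("rel")] = (convert(raw), related.get("valueunit"))
    return result


print(typed_values(SAMPLE))
```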

A further example expresses a recommendation from an analyst in the financial markets, where the price changes from EUR 39 to EUR 44 (expressed as a value) and the rank changes from Hold to Buy (expressed as a value or as a QCode):

<concept xmlns:xs="http://www.w3.org/2001/XMLSchema">
    ...
    <related rel="crel:price_new" value="44"
        valueunit="iso4217a:EUR"
        valuedatatype="xs:decimal" />
    <related rel="crel:price_old" value="39"
        valueunit="iso4217a:EUR"
        valuedatatype="xs:decimal" />
    <related rel="crel:rank_old" value="Hold"
        valueunit="valunits:trRanks"
        valuedatatype="xs:string" />
    <related rel="crel:rank_new" value="Buy"
        valueunit="valunits:trRanks"
        valuedatatype="xs:string" />
    <related rel="crel:rank_new" qcode="trRanks:Buy" />
    ...
</concept>

When using <related>, ONLY ONE of @qcode, @uri, @literal or @value MUST be used (i.e. these attributes are mutually exclusive).

The @value attribute has the datatype XML Schema String. If @value is used, a @valuedatatype MUST also be used, and its value must be one of the datatypes defined by the W3C XML Schema specification. The inclusion of @valueunit is optional: for example, if @rel="crel:noOfSiblings" and @value="2", the type of units is obvious.
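
The two rules above lend themselves to an automated check. The following Python sketch is one possible reading of them, in which a <related> element with none of the four attributes is also reported; the function name check_related and the message texts are hypothetical.

```python
import xml.etree.ElementTree as ET

# The four attributes that must not be combined on one <related> element.
EXCLUSIVE = ("qcode", "uri", "literal", "value")


def check_related(related_xml: str) -> list:
    """Return a list of rule violations for one <related> element."""
    elem = ET.fromstring(related_xml)
    problems = []
    present = [name for name in EXCLUSIVE if elem.get(name) is not None]
    if len(present) != 1:
        found = ", ".join("@" + name for name in present) or "none"
        problems.append("expected one of @qcode/@uri/@literal/@value, found: " + found)
    if "value" in present and elem.get("valuedatatype") is None:
        problems.append("@value requires @valuedatatype")
    return problems


ok = '<related rel="crel:noOfSiblings" value="2" valuedatatype="xs:nonNegativeInteger"/>'
bad = '<related rel="crel:rank_new" qcode="trRanks:Buy" value="Buy"/>'
print(check_related(ok))
print(check_related(bad))
```

Note that @valueunit is deliberately not checked, because its inclusion is optional.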

12.8. Supplementary information about a Concept

Links can be used to enhance the information carried by a NewsML-G2 Concept. For example, a Concept may represent a person in the news; it may also contain some key facts about the person and relationships to other concepts (e.g. membership of an organisation). Links to other resources can also be used to add articles, pictures and other objects to the Concept.

However, the use of <link> as a child of <itemMeta> in a Concept Item would create a problem if a number of Concept Items containing Links were to be aggregated into a Knowledge Item: only the content of the <concept> wrapper would be carried across into the Knowledge Item, and the Concept Item metadata, including any Links, would be lost.

To resolve this issue, a <remoteInfo> property may be added to <concept>, with a datatype of LinkType (CCL) and Link1Type (PCL), matching that of <link>. This enables implementers to provide links to supplementary information inside the <concept> wrapper, and thus into a Knowledge Item:

<concept>
    <conceptId created="2009-01-10T12:00:00Z" qcode="people:329465"/>
    <type qcode="cpnat:person" />
    <name xml:lang="en-GB">Mario Draghi</name>
    <definition xml:lang="en-GB" role="definitionrole:biog">
        Mario Draghi, born 3 September
        1947, is an Italian banker and economist who succeeded Jean-Claude Trichet
        as President of the European Central Bank on 1 November 2011.<br />
    </definition>
    <related rel="relation:occupation" qcode="jobtypes:puboff" />
    <!-- "link" start -->
    <remoteInfo
        rel="irel:seeAlso"
        contenttype="image/jpeg"
        residref="tag:acmenews.com,2008:TX-PAR:20090529:JYC90"> <!-- Item ref -->
        <title>ECB official portrait picture of Mario Draghi</title>
    </remoteInfo>
    <!-- "link" end -->
    <personDetails>
        <born>1947-09-03</born>
    ....
        <contactInfo role="contactrole:official">
    ....
        </contactInfo>
    </personDetails>
</concept>
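
A receiver could collect the supplementary links carried inside a <concept> along the following lines. This Python sketch is illustrative only: it omits the NewsML-G2 namespace for brevity, the function name remote_infos is hypothetical, and it reads only the attributes and the <title> child used in the example above.

```python
import xml.etree.ElementTree as ET

# Reduced, namespace-free version of the concept shown above.
CONCEPT = """
<concept>
  <conceptId qcode="people:329465"/>
  <name xml:lang="en-GB">Mario Draghi</name>
  <remoteInfo rel="irel:seeAlso" contenttype="image/jpeg"
      residref="tag:acmenews.com,2008:TX-PAR:20090529:JYC90">
    <title>ECB official portrait picture of Mario Draghi</title>
  </remoteInfo>
</concept>
"""


def remote_infos(concept_xml: str) -> list:
    """Return one dictionary per <remoteInfo> child of the concept."""
    concept = ET.fromstring(concept_xml)
    links = []
    for info in concept.findall("remoteInfo"):
        title = info.find("title")
        links.append({
            "rel": info.get("rel"),
            "contenttype": info.get("contenttype"),
            "residref": info.get("residref"),
            "title": title.text.strip() if title is not None and title.text else None,
        })
    return links


for link in remote_infos(CONCEPT):
    print(link["rel"], link["contenttype"], link["residref"])
```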

Following the rules given in Hints and Extension Points, when properties of the target NewsML-G2 Item are added to <remoteInfo>, the parent element of each property must be included unless the property comes from <contentMeta> or <itemMeta>. For example, a <description> element extracted from <contentMeta> (no parent needed):

<remoteInfo
    rel="irel:seeAlso"
    contenttype="video/mpeg"
    residref="tag:acmenews.com,2008:TX-PAR:20090529:JYC90">
    <description>
        ECB official video of Mario Draghi working with senior colleagues at the Bank
    </description>
</remoteInfo>

Contrast this with a <description> from <partMeta>, where <partMeta> must be included as the parent element:

<remoteInfo>
    <partMeta partid="part1" seq="1">
        <description>The first part shows...</description>
    </partMeta>
    <partMeta partid="part2" seq="2">
        <description>The second part shows...</description>
    </partMeta>
</remoteInfo>

12.9. Concepts in Practice

The more common method of exchanging Concepts is as part of a Controlled Vocabulary (otherwise known as a Taxonomy, Thesaurus or Dictionary, for example), which is conveyed in NewsML-G2 as a set of concepts in a Knowledge Item. This is discussed in Knowledge Items.

The use of Concepts to convey Event information is discussed in Events in NewsML-G2.

13. Knowledge Items

13.1. Introduction

When news happens, the event rarely takes place in isolation. There will be a series of relationships between the news event and the people, places and organisations that are directly or indirectly involved. Many of these entities will be well-known, and readers of the news may expect to be able to navigate to further information about these entities, or to find other events in which they are involved. This expectation is heavily fostered by people’s familiarity with the Web: they expect to be able to click on the name of, say, a company and see more information about it.

There will also be references to abstract notions such as subject classifications that enable the news event to be searched and sorted according to user preferences.

To fully exploit the value of their services, news organisations need to be able to exchange this supporting information in an industry-standard way that can be processed using standard technology.

As described in the previous chapter Concepts and Concept Items, NewsML-G2 has powerful features for encapsulating this detailed information about entities and notions, but the Concept Item can convey only a single concept, while the Knowledge Item is able to convey many of them, even a full taxonomy.

For example, a profile of, say, Vladimir Putin can be conveyed in a Concept. By including this Concept in a set of similar Concepts profiling world leaders, we can create and exchange a Knowledge Base of political personalities that can be updated and referred to over time.

The level of detail of information that a provider may make available in a Knowledge Item will depend on its business model and relationship with the receiving customer(s). Providers may make variable levels of information available according to subscription since it is clear that the content of their Knowledge Bases is likely to be valuable.

There are also opportunities for third-party providers of specialist information to partner with providers and customers to create value-added knowledge services using a NewsML-G2 infrastructure.

IPTC NewsCodes are controlled vocabularies maintained as Knowledge Items and are available on the IPTC web site at http://www.iptc.org/newscodes/. Choose View NewsCodes and follow instructions for downloading any of the CVs.

13.2. Example: a Knowledge Item for <accessStatus>

One of the available properties for describing a news event is <accessStatus> to provide information about the physical accessibility of the place where an event is due to occur.

The property takes a QCode value:

<accessStatus qcode="access:restricted" />

As there is no IPTC NewsCodes scheme currently defined for Access Status, any provider wishing to include this information would need to create a Controlled Vocabulary.

This example shows a Knowledge Item which defines a Controlled Vocabulary of Access Status terms. This CV would be available as a Knowledge Item, with names and definitions in English, French and German, as shown below:

| Value | Names | Definitions |
|---|---|---|
| easy | Easy access | Unrestricted access for vehicles and equipment. Loading bays and/or lifts for unimpeded access to all levels |
| | Facile d’accès | Un accès sans restriction pour les véhicules et l’équipement. Les quais de chargement et / ou des ascenseurs pour l’accès sans entrave à tous les niveaux |
| | Der Zugang ist einfach | Ungehinderten Zugang für Fahrzeuge und Ausrüstung. Laderampen und / oder Aufzüge für den uneingeschränkten Zugang zu allen Ebenen |
| restricted | Access is Restricted | Access for vehicles and equipment possible but restricted. There may be obstacles, height or width restrictions that will impede large or heavy items. Advise checking with the organizers. |
| | Accès Restreint | L’accès des véhicules et de matériel possible, mais limitée. Il y mai être des obstacles, la hauteur ou la largeur des restrictions qui empêchent les grandes ou d’objets lourds. Conseiller à la vérification avec les organisateurs. |
| | Der Zugang ist eingeschränkt | Zugang für Fahrzeuge und Ausrüstung möglich, aber eingeschränkt. Möglicherweise gibt es Hindernisse, Höhe und Breite, die Beschränkungen behindern große oder schwere Gegenstände. Beraten Sie mit dem Veranstalter in Verbindung |
| difficult | Access is difficult | Access includes stairways with no lift or ramp available. It will not be possible to install bulky or heavy equipment that cannot be safely carried by one person |
| | L’accès est difficile | Comprend l’accès aux escaliers ou la rampe sans ascenseur disponible. Il ne sera pas possible d’installer des équipements lourds ou volumineux qui ne peuvent pas être transportés en sécurité par une seule personne |
| | Der Zugang ist schwierig | Access enthält Treppen ohne Fahrstuhl oder Rampe zur Verfügung. Es wird nicht möglich sein, installieren sperrige oder schwere Geräte, die sich nicht sicher befördert werden von einer Person |

13.3. Structure and Properties

Knowledge Items share a common structure with News Items, Package Items and Concept Items.

This Chapter assumes that the reader is familiar with the chapter on Concepts and Concept Items.

13.3.1. The <knowledgeItem> element

The top level element of a Knowledge Item is <knowledgeItem>, which contains id, versioning and catalog information.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<knowledgeItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:iptc.org:20090202:ncdki-accesscode"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />

13.3.2. Item Metadata

The <itemMeta> block contains management metadata for the Knowledge Item document. Below is a minimum set of properties.

The Item Class property should use the IPTC "Nature of Concept Item" NewsCodes (scheme alias "cinat"). The appropriate value in the case of sending a CV or taxonomy is "scheme", denoting that this is a full scheme of concepts contained in this Knowledge Item.

<itemMeta>
    <itemClass qcode="cinat:scheme" />
    <provider qcode="nprov:IPTC" />
    <versionCreated>2017-11-08T00:00:00Z</versionCreated>
    <pubStatus qcode="stat:usable" />
</itemMeta>

13.3.3. Content Metadata

The optional <contentMeta> block contains Administrative Metadata and Descriptive Metadata shared by the concepts conveyed by the <conceptSet>.

Administrative Metadata

This example timestamps the content:

<contentMeta>
    <contentModified>2009-01-28T13:00:00Z</contentModified>
...

More details about informing receivers about changes to Knowledge Item content are contained in Handling Updates to Knowledge Items using @modified.

Descriptive Metadata

The descriptive metadata properties <subject> and <description> may be used by Knowledge Items, in any order. They are optional and repeatable. This example uses the <description> element:

<description xml:lang="en">
    Classification of the ease of gaining physical access to the location of
    a news event for the purpose of deploying personnel, vehicles and equipment.
</description>
<description xml:lang="fr">
    Classification de la facilité d'obtenir un accès physique à l'emplacement d'un événement pour
    le déploiement de personnel, de véhicules et d'équipements.
</description>
<description xml:lang="de">
    Klassifikation der physischen Zugriff auf den Standort eines News Termine für Die Zwecke der
    Bereitstellung von Personal, Fahrzeugen und Ausrüstungen.
</description>

13.3.4. Concept Set

A single <conceptSet> element wraps zero or more <concept> components. The order of the Concepts is not important. Properties of <concept> are optional and described in Concepts and Concept Items.

Each member of the CV requires its own <concept> wrapper with a Concept ID and Name within the Concept Set:

<conceptSet>
    <concept>
        <conceptId qcode="access:easy" />
        <name xml:lang="en">Easy access</name>
    ...
    </concept>
    <concept>
        <conceptId qcode="access:difficult" />
        <name xml:lang="en">Access is difficult</name>
    ...
    </concept>
    <concept>
        <conceptId qcode="access:restricted" />
        <name xml:lang="en">Access is Restricted</name>
    ...
    </concept>
</conceptSet>

Each Concept also has a <definition> in three languages that gives further details in natural language, for example the English definition of the "restricted" concept:

<definition xml:lang="en">
    Access for vehicles and equipment possible but restricted. There may be
    obstacles, height or width restrictions that will impede large or heavy
    items. Advise checking with the organisers.
</definition>

This completes one Concept in the <conceptSet>; the other two concepts in the CV are added in a similar fashion.

13.3.5. Scheme Metadata

In NewsML-G2 v 2.17, the <schemeMeta> element was added to enable a Knowledge Item to support similar properties to the <scheme> in a Catalog. It should be noted that <schemeMeta> is used to express metadata about the scheme being conveyed, and should ONLY be used where the Knowledge Item contains all the concepts from a single scheme, as denoted by:

<itemClass qcode="cinat:scheme"/>

The <schemeMeta> element is used after <conceptSet> and contains the same attributes and child properties as <scheme> in a catalog, with the exception of @alias, but has the following additional properties:

  • The optional child elements <related>, which enables implementers to express the top-level concept(s) of a scheme (a requirement of SKOS), and <concepttype>, which lists the concept types that are used within the Knowledge Item. Please note the IPTC recommends that if <concepttype> is used, ALL concept types in the scheme are listed.

  • The attributes @authority, which indicates the party that controls the scheme, and @preferredalias, which indicates the scheme authority’s recommended scheme alias to be used with QCodes.

<schemeMeta uri="http://cv.example.org/newscodes/access/"
    authority="http://www.example.org" preferredalias="access">
    <definition xml:lang="en-GB">Classification of the ease of gaining physical access
    to the location of a news event for the purpose of deploying personnel,
    vehicles and equipment.</definition>
    <name xml:lang="en-GB">Ease of Access</name>
    <related qcode="access:easy" rel="skos:hasTopConcept"/>
    <related qcode="access:difficult" rel="skos:hasTopConcept"/>
    <related qcode="access:restricted" rel="skos:hasTopConcept"/>
</schemeMeta>
LISTING 12: Knowledge Item for Access Codes

All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for access.

<?xml version="1.0" encoding="UTF-8"?>
<knowledgeItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20090202:ncdki-accesscode" version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power" >
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <itemMeta>
        <itemClass qcode="cinat:scheme" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-11-08T00:00:00Z</versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <contentCreated>2009-01-28T12:00:00Z</contentCreated>
        <contentModified>2009-01-28T13:00:00Z</contentModified>
        <description xml:lang="en">
            Classification of the ease of gaining physical access to the location of a news
            event for the purpose of deploying personnel, vehicles and equipment.
        </description>
        <description xml:lang="fr">
            Classification de la facilité d'obtenir un accès physique à l'emplacement d'un
            événement pour le déploiement de personnel, de véhicules et d'équipements.
        </description>
        <description xml:lang="de">
            Klassifikation der physischen Zugriff auf den Standort eines News Termine für
            Die Zwecke der Bereitstellung von Personal, Fahrzeugen und Ausrüstungen.
        </description>
    </contentMeta>
    <conceptSet>
        <concept>
            <conceptId qcode="access:easy" />
            <name xml:lang="en">Easy access</name>
            <name xml:lang="fr">Facile d'accès</name>
            <name xml:lang="de">Der Zugang ist einfach</name>
            <definition xml:lang="en">
                Unrestricted access for vehicles and equipment. Loading bays
                and/or lifts for unimpeded access to all levels
            </definition>
            <definition xml:lang="fr">
                Un accès sans restriction pour les véhicules et l'équipement. Les quais de
                chargement et / ou des ascenseurs pour l'accès sans entrave à tous les
                niveaux
            </definition>
            <definition xml:lang="de">
                Ungehinderten Zugang für Fahrzeuge und Ausrüstung. Laderampen und / oder
                Aufzüge für den uneingeschränkten Zugang zu allen Ebenen
            </definition>
        </concept>
        <concept>
            <conceptId qcode="access:difficult" />
            <name xml:lang="en">Access is difficult</name>
            <name xml:lang="fr">L'accès est difficile</name>
            <name xml:lang="de">Der Zugang ist schwierig</name>
            <definition xml:lang="en">
                Access includes stairways with no lift or ramp available. It will not be
                possible to install bulky or heavy equipment that cannot be safely carried
                by one person
            </definition>
            <definition xml:lang="fr">
                Comprend l'accès aux escaliers ou la rampe sans ascenseur disponible. Il ne
                sera pas possible d'installer des équipements lourds ou volumineux qui ne
                peuvent pas être transportés en sécurité par une seule personne
            </definition>
            <definition xml:lang="de">
                Access enthält Treppen ohne Fahrstuhl oder Rampe zur Verfügung. Es wird nicht
                möglich sein, installieren sperrige oder schwere Geräte, die sich nicht
                sicher befördert werden von einer Person
            </definition>
        </concept>
        <concept>
            <conceptId qcode="access:restricted" />
            <name xml:lang="en">Access is Restricted</name>
            <name xml:lang="fr">Accès Restreint</name>
            <name xml:lang="de">Der Zugang ist eingeschränkt</name>
            <definition xml:lang="en">
                Access for vehicles and equipment possible but restricted. There may be
                obstacles, height or width restrictions that will impede large or heavy
                items. Advise checking with the organisers.
            </definition>
            <definition xml:lang="fr">
                L'accès des véhicules et de matériel possible, mais limitée. Il y mai être
                des obstacles, la hauteur ou la largeur des restrictions qui empêchent les
                grandes ou d'objets lourds. Conseiller à la vérification avec les
                organisateurs.
            </definition>
            <definition xml:lang="de">
                Zugang für Fahrzeuge und Ausrüstung möglich, aber eingeschränkt.
                Möglicherweise gibt es Hindernisse, Höhe und Breite, die Beschränkungen
                behindern große oder schwere Gegenstände. Beraten Sie mit dem Veranstalter in
                Verbindung
            </definition>
        </concept>
    </conceptSet>
    <schemeMeta uri="http://cv.example.org/newscodes/access/"
        authority="http://www.example.org" preferredalias="access">
        <definition xml:lang="en-GB">Classification of the ease of gaining physical access
            to the location of a news event for the purpose of deploying personnel,
            vehicles and equipment.</definition>
        <name xml:lang="en-GB">Ease of Access</name>
        <related qcode="access:easy" rel="skos:hasTopConcept"/>
        <related qcode="access:difficult" rel="skos:hasTopConcept"/>
        <related qcode="access:restricted" rel="skos:hasTopConcept"/>
    </schemeMeta>
</knowledgeItem>
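
Once received, a Knowledge Item such as this can be loaded into a simple lookup table, for example to populate a menu in the user's language. The Python sketch below shows one possible approach using only the standard library; the function name load_cv is hypothetical and the embedded sample is a reduced version of the listing above, with most properties left out.

```python
import xml.etree.ElementTree as ET

NS = {"g2": "http://iptc.org/std/nar/2006-10-01/"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Reduced version of the listing: item metadata and definitions are left out.
KNOWLEDGE_ITEM = """
<knowledgeItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:iptc.org:20090202:ncdki-accesscode" version="10">
  <conceptSet>
    <concept>
      <conceptId qcode="access:easy"/>
      <name xml:lang="en">Easy access</name>
      <name xml:lang="fr">Facile d'accès</name>
    </concept>
    <concept>
      <conceptId qcode="access:restricted"/>
      <name xml:lang="en">Access is Restricted</name>
      <name xml:lang="fr">Accès Restreint</name>
    </concept>
  </conceptSet>
</knowledgeItem>
"""


def load_cv(xml_text: str) -> dict:
    """Map each concept QCode to a {language: name} dictionary."""
    root = ET.fromstring(xml_text)
    names = {}
    for concept in root.findall("g2:conceptSet/g2:concept", NS):
        qcode = concept.find("g2:conceptId", NS).get("qcode")
        names[qcode] = {
            name.get(XML_LANG): (name.text or "").strip()
            for name in concept.findall("g2:name", NS)
        }
    return names


cv = load_cv(KNOWLEDGE_ITEM)
print(cv["access:restricted"]["fr"])
```

The same pattern extends to <definition> elements, which carry @xml:lang in the same way.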

13.4. Knowledge Workflow

The diagram below shows a possible information flow for news information that exploits the possibilities of NewsML-G2 Concepts and Knowledge Items to add value to news:

Knowledge Workflow
Figure: Information Flow for Concepts and Knowledge Items

Increasingly, news organisations are using entity extraction engines to find "things" mentioned in news objects. The results of these automated processes may be checked and refined by journalists. The goal is to classify news as richly as possible and to identify people, organisations, places and other entities before sending it to customers, in order to increase its value and usefulness.

This entity extraction process will throw up exceptions – unrecognised and potentially new concepts – that may need to be added to the Knowledge Base. Some news organisations have dedicated documentation departments to research new concepts and maintain the Knowledge Base.

When new concepts are submitted to the Knowledge Base, they are added to the appropriate taxonomy and may be made available to customers (depending on the business model adopted) either partially or fully as Knowledge Items.

13.5. Using NewsML-G2 Knowledge Items with SKOS

The Simple Knowledge Organization System (SKOS) is a W3C standard for sharing information about knowledge organization systems using RDF (see www.w3.org/2004/02/skos/).

The IPTC Media Topic NewsCodes are a working example of how a NewsML-G2 Knowledge Item may have features added that can align a NewsML-G2 Scheme to SKOS using the <related> child element of <concept> and <schemeMeta>.

For example, the Media Topic for the sport of biathlon, part of the Media Topics scheme (http://cv.iptc.org/newscodes/mediatopic/):

<concept id="medtop20000852" modified="2010-12-14T21:53:19+00:00">
    <conceptId qcode="medtop:20000852" created="2009-10-22T02:00:00+00:00"/>
    <type qcode="cpnat:abstract"/>
    <name xml:lang="en-GB">biathlon</name>
    <definition xml:lang="en-GB">A combination of cross country skiing and target
    shooting on a 12.5 K course in a pursuit format. </definition>
    <broader qcode="medtop:20000822"/>
    <related qcode="medtop:20000822" rel="skos:broader"/>
    <related qcode="subj:15009000" rel="skos:exactMatch"/>
    <related uri="http://cv.iptc.org/newscodes/mediatopic/" rel="skos:inScheme"/>
</concept>

This uses the NewsML-G2 <broader> property to express that the "biathlon" topic (medtop:20000852) is a child of the "competition discipline" topic (medtop:20000822), and this is complemented by a <related> element that indicates the same relationship using the SKOS term "skos:broader". The second <related> element indicates that the legacy IPTC Subject Code NewsCodes 15009000 is the exact match (G2 = <sameAs>) for this Media Topic.
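
The SKOS-aligned <related> elements of such a concept can be read as simple subject, predicate, object statements. The Python sketch below illustrates the idea; it omits the NewsML-G2 namespace, the function name skos_statements is hypothetical, and it assumes that SKOS relationships are always written with the scheme alias "skos", as in the example above.

```python
import xml.etree.ElementTree as ET

# Namespace-free copy of the biathlon concept, reduced to the relevant parts.
CONCEPT = """
<concept id="medtop20000852">
  <conceptId qcode="medtop:20000852"/>
  <broader qcode="medtop:20000822"/>
  <related qcode="medtop:20000822" rel="skos:broader"/>
  <related qcode="subj:15009000" rel="skos:exactMatch"/>
  <related uri="http://cv.iptc.org/newscodes/mediatopic/" rel="skos:inScheme"/>
</concept>
"""


def skos_statements(concept_xml: str) -> list:
    """Return (subject, predicate, object) tuples for SKOS-typed <related>."""
    concept = ET.fromstring(concept_xml)
    subject = concept.find("conceptId").get("qcode")
    statements = []
    for related in concept.findall("related"):
        rel = related.get("rel", "")
        if not rel.startswith("skos:"):
            continue  # not a SKOS relationship
        # The target is identified either by a QCode or by a full URI.
        target = related.get("qcode") or related.get("uri")
        statements.append((subject, rel, target))
    return statements


for statement in skos_statements(CONCEPT):
    print(statement)
```

Producing real RDF would additionally require each QCode to be expanded to its full concept URI using the catalog.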

The CV also contains a Scheme Metadata element as follows:

<schemeMeta uri="http://cv.iptc.org/newscodes/mediatopic/"
    authority="http://www.iptc.org" preferredalias="medtop">
    <definition xml:lang="en-GB">Indicates a subject of an item.</definition>
    <name xml:lang="en-GB">Media Topic</name>
    <note xml:lang="en-GB">The Media Topic NewsCodes is IPTC's new (as of December
        2010) 1100-term taxonomy with a focus on text. The development started with the
        Subject Codes and extended the tree to 5 levels and reused the same 17 top
        level terms. The terms below the top level have been revised and rearranged.
        Each Media Topic provides a mapping back to one of the Subject Codes.
    </note>
    <related qcode="medtop:01000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:02000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:03000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:04000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:05000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:06000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:07000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:08000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:09000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:10000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:11000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:12000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:13000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:14000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:15000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:16000000" rel="skos:hasTopConcept"/>
    <related qcode="medtop:17000000" rel="skos:hasTopConcept"/>
</schemeMeta>

This satisfies the SKOS requirement to indicate the top-level concepts of any scheme.

The current structure of NewsML-G2 Knowledge Items delivered by IPTC is documented at http://dev.iptc.org/NewsCodes-G2-Knowledge-Items-by-IPTC

14. Controlled Vocabularies and QCodes

14.1. Introduction

One of the fundamental ideas underpinning NewsML-G2 is the use of Controlled Vocabularies (CVs) or taxonomies to enable two basic operations:

  • To restrict the allowed values of certain properties in order to maintain the consistency and inter-operability of machine-readable information – supplying data to populate menus, for example.

  • To provide a concise method of unambiguously identifying any abstract notion (e.g. subject classification) or real-world entity (person, organisation, place etc.) present in, or associated with, an item. This enables links to be made to external resources that can provide the consumer with further information or processing options.

A Controlled Vocabulary (CV) is a set of concepts usually controlled by an authority which is responsible for its maintenance, i.e. adding and removing vocabulary entries. In NewsML-G2, CVs are also known as Schemes. The person or organisation responsible for maintaining a Scheme is the Scheme Authority.

Examples of CVs include the set of country codes maintained by the International Organization for Standardization (ISO), and the NewsCodes maintained by the IPTC. An application of a CV could be a drop-down list of countries in an application interface.

Many CVs are dedicated to a specific metadata property, for example there are CVs for <subject> and for <genre>; others are dedicated to a specific attribute that refines a property, e.g. a CV for the @role of <description>.

In news distributions that use NewsML-G2, it is recommended that Controlled Vocabularies are exchanged as Knowledge Items, with members of the CV contained in individual <concept> structures.

Members of a Scheme are each identified by a concept identifier expressed as a QCode (note capitalisation), which is resolved via the Catalog information in the NewsML-G2 Item to form a URI that is globally unique.

14.2. Business Case

Controlled Vocabularies are needed in information exchange because they establish a common ground for understanding content that is language-independent. Schemes and QCodes enable CVs to be exchanged and referenced using Web technology, and provide a lightweight, flexible and reliable model for sharing concepts and information about concepts.

For example, the IPTC Media Topics Scheme is a language-independent taxonomy for classifying the subject matter of news. A consumer receiving news classified using this scheme can discover the meaning of this classification, using a publicly-accessible URL.

Examples abound of non-IPTC CVs in everyday use: IANA Media Types, ISO Country Codes or ISO Currency Codes to name but three.

News providers use CVs to add value to their content:

  • News can be accurately processed by software if it adheres to known (i.e. controlled) parameters expressed as a CV, for example the publishing status of a news item.

  • By establishing CVs of people, places and organisations, the identity of entities in the news can be unambiguously affirmed.

  • CVs can be extended to store further information about entities in the news, for example biographies of people, contact details for organisations.

14.3. How QCodes work

A QCode is a string with three parts, all of which MUST be present:

  • Scheme Alias: the prefix, for example "stat"

  • Scheme-Code Separator: a separator, which MUST be a colon ":" (ASCII 58dec = 003Ahex)

  • Code Value: the suffix, for example "usable"

This produces a complete QCode of "stat:usable" represented in XML as

qcode="stat:usable"
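The three-part structure can be checked with a few lines of code. The sketch below is illustrative only: the function name split_qcode is hypothetical, and it assumes that the first colon is the separator, so that any later colon is treated as part of the Code Value.

```python
def split_qcode(qcode):
    """Split a QCode into (scheme_alias, code_value).

    Assumption of this sketch: the first colon is the Scheme-Code Separator,
    so any later colon belongs to the Code Value.
    """
    alias, separator, code = qcode.partition(":")
    # All three parts must be present.
    if not separator or not alias or not code:
        raise ValueError(f"not a valid QCode: {qcode!r}")
    return alias, code


alias, code = split_qcode("stat:usable")

# A string without a separator is rejected.
try:
    split_qcode("usable")
    rejected = False
except ValueError:
    rejected = True
```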

14.3.1. Scheme Alias to Scheme URI

The key to resolving scheme aliases is the Catalog information, a child of the root element of every NewsML-G2 Item. A scheme alias may be resolved directly using the <catalog> element:

<catalog>
    <scheme alias="stat" uri="http://cv.iptc.org/newscodes/pubstatusg2/" />
</catalog>

Using the information in <catalog>, a processor now has a Scheme URI that can be used as the next step in resolving the QCode.

Catalogs

The <catalog> element may contain many <scheme> components. Catalog information can be stored in one of two ways:

  • directly in a NewsML-G2 Item using the <catalog> element, or

  • remotely in a file containing the catalog information, referenced by the @href of a <catalogRef> element.

There is likely to be more than one <catalogRef> in an Item:

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
<catalogRef
    href="http://www.xmlteam.com/specification/xts-SportsCodesCatalog_1.xml"/>

These remote catalogs are hosted by specific authorities, in this case by the IPTC, and by the information provider XML Team. Each remote catalog file will contain a <catalog> element and a series of <scheme> components that map the scheme aliases used in the item to their scheme URIs. For a more detailed description of managing Catalogs, see Creating and Managing Catalogs.

14.3.2. Concept URI

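Appending the QCode Code Value to the Scheme URI produces the Concept URI. For the QCode "stat:usable" and the Scheme URI shown above, this gives http://cv.iptc.org/newscodes/pubstatusg2/usable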

The IPTC recommends that Scheme/Concept URIs can be resolved to a Web resource that contains information in both machine-readable and human-readable form (This is also a recommendation for the Semantic Web), i.e. they are URLs. The concept resolution mechanism used by the IPTC is http-based, and the NewsML-G2 Specification describes how an http-URL should be resolved.

Entering the above Concept URI in a web browser results in the following page being displayed:

Figure: Human-readable browser page of a Concept URI
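The two resolution steps described in this section (Scheme Alias to Scheme URI, then Scheme URI plus Code Value to Concept URI) can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name resolve_qcode is hypothetical, and the catalog is assumed to have been read already into a plain alias-to-URI mapping.

```python
def resolve_qcode(qcode, catalog):
    """Resolve a QCode to a Concept URI using an {alias: scheme_uri} mapping."""
    alias, separator, code = qcode.partition(":")
    if not separator:
        raise ValueError(f"not a valid QCode: {qcode!r}")
    try:
        scheme_uri = catalog[alias]
    except KeyError:
        raise LookupError(f"scheme alias {alias!r} is not in the catalog") from None
    # The Concept URI is the Scheme URI with the Code Value appended.
    return scheme_uri + code


# Mapping taken from the <catalog> example in this section.
catalog = {"stat": "http://cv.iptc.org/newscodes/pubstatusg2/"}
concept_uri = resolve_qcode("stat:usable", catalog)
```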

A question often asked by implementers is: "What happens if I receive files from two providers who inadvertently have a clash of scheme aliases?"

The scenarios they envisage are either:

  • Provider A and Provider B use the same scheme alias to represent different schemes. For example the alias "pers" is used by both providers to represent their own proprietary CVs of people, or

  • Provider A and Provider B use a different scheme alias to represent the same scheme. For example, A uses "subj" to represent the IPTC Subject NewsCodes, and B uses "tema" to represent the same CV.

The answer is "everything works fine!"; QCode to Concept URI mappings must be unique only within the scope of each document in which they appear.

A processor should correctly process two files with different aliases to the same Concept URI:

<!-- First Document – scheme alias "subj" -->
<catalog>
    <scheme alias="subj" uri="http://cv.iptc.org/newscodes/subjectcode/" />
    ...
</catalog>
    <subject type="cpnat:abstract" qcode="subj:1500000" />
<!-- Second Document – scheme alias "tema" -->
<catalog>
    <scheme alias="tema" uri="http://cv.iptc.org/newscodes/subjectcode/" />
    ...
</catalog>
    <subject type="cpnat:abstract" qcode="tema:1500000" />

This is because the concept resolution process is local to each document. The processor can unambiguously resolve the QCode to a Concept URI via the <catalog> in each case.

The following example does NOT work because the same alias is mapped to two different URIs within the same document and the processor is unable to resolve the QCode to a single Concept URI:

<catalog>
    <scheme alias="subject" uri="http://cv.iptc.org/newscodes/subjectcode/" />
...
    <scheme alias="subject" uri="http://cv.example.com/subjectcodes/codelist/" />
</catalog>
...
    <subject type="cpnat:abstract" qcode="subject:1500000" />
...

But the following is CORRECT because it is possible to have different aliases within the same document pointing to the same URI and the processor can resolve both QCodes:

<catalog>
    <scheme alias="subject" uri="http://cv.iptc.org/newscodes/subjectcode/" />
...
    <scheme alias="subj" uri="http://cv.iptc.org/newscodes/subjectcode/" />
</catalog>
...
    <subject type="cpnat:abstract" qcode="subject:1500000" />
...
    <subject type="cpnat:abstract" qcode="subj:1500000" />
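A processor could enforce these rules when it reads a catalog. The following sketch is one possible approach under stated assumptions: the function name load_catalog is hypothetical, and the catalogs are parsed without an XML namespace, as in the excerpts above. It accepts two aliases for the same Scheme URI but rejects one alias mapped to two different URIs.

```python
import xml.etree.ElementTree as ET


def load_catalog(catalog_xml):
    """Build {alias: scheme_uri} from a <catalog>, rejecting ambiguous aliases."""
    mapping = {}
    for scheme in ET.fromstring(catalog_xml).iter("scheme"):
        alias, uri = scheme.get("alias"), scheme.get("uri")
        # The same alias must not point at two different Scheme URIs.
        if alias in mapping and mapping[alias] != uri:
            raise ValueError(
                f"alias {alias!r} maps to both {mapping[alias]} and {uri}"
            )
        mapping[alias] = uri
    return mapping


# Two aliases for one Scheme URI: allowed.
OK_CATALOG = """<catalog>
    <scheme alias="subject" uri="http://cv.iptc.org/newscodes/subjectcode/" />
    <scheme alias="subj" uri="http://cv.iptc.org/newscodes/subjectcode/" />
</catalog>"""

# One alias for two Scheme URIs: cannot be resolved.
BAD_CATALOG = """<catalog>
    <scheme alias="subject" uri="http://cv.iptc.org/newscodes/subjectcode/" />
    <scheme alias="subject" uri="http://cv.example.com/subjectcodes/codelist/" />
</catalog>"""

ok = load_catalog(OK_CATALOG)
try:
    load_catalog(BAD_CATALOG)
    bad_rejected = False
except ValueError:
    bad_rejected = True
```

Because a new mapping is built for each document, two providers using the same alias for different schemes do not interfere with each other.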

In this document, there are many references to IPTC NewsCodes and their scheme aliases. From the above, it will be obvious that these specific alias values are not mandatory, although the IPTC recommends the consistent use of scheme alias values by implementers.

14.4. QCodes and Taxonomies

Taxonomies, also known as thesauri, knowledge bases and so on, are repositories of information about notions or ideas, and about real-world "things" such as people, companies and places.

For example, a processor might encounter the following XML in a NewsML-G2 document:

<subject type="cpnat:person" qcode="pol:rus12345">

The subject property shown here has two QCodes, one for @type, and the other as @qcode. The "cpnat" alias is for a controlled vocabulary of allowed categories of concept, which includes values of "person", "organisation", "POI" (point of interest). Using @type in this way enables further processing such as "find all of the people identified in the document".

The second QCode encountered in the subject is "pol:rus12345". Resolving this (fictional) scheme alias and suffix might result in the following concept URI:

Fetching the information at the above resource may return the following information:

PUTIN, Vladimir – Prime Minister of the Russian Federation

Name (FAMILY, Given): PUTIN, Vladimir Vladimirovitch
Name (known as): Vladimir Vladimirovitch Putin
Summary: Former Soviet intelligence officer who has served terms as Russian president and prime minister.
Background: Born 7 October 1952
Place of birth: Leningrad (St Petersburg)
Other:

The IPTC recommends that providers should make schemes containing concepts such as the above available to recipients as Knowledge Items. Each concept should include at least a name; the amount of further knowledge about the concepts may differ between customer classes and depend on contracts.

14.5. Managing Controlled Vocabularies as NewsML-G2 Schemes

14.5.1. Knowledge Items

In a workflow where partners are exchanging news information using NewsML-G2, Knowledge Items are the most compliant method of distributing a new CV: first by creating a Scheme (see Creating a New Scheme) and next creating a Knowledge Item from a set of existing Concept Items (see Creating a new Knowledge Item for distributing as a CV) for distribution to customers and partners.

Knowledge Items do not necessarily contain all of the information that a provider possesses about any given set of concepts. This, after all, may be commercially valuable information that the provider makes available on a per-subscriber basis. For example, a lower fee might entitle the subscriber to basic information about a concept, say a person, while a higher fee might give access to full biographical details and pictures.

It is not mandatory that information about CVs be stored or distributed in the technical format of a Knowledge Item. It is sufficient, for the correct processing of a NewsML-G2 Item, only that a Scheme Alias/Code Value pair (defining the Concept URI) is unambiguous. The IPTC makes the following recommendations about CVs:

  • Knowledge Items SHOULD be used to distribute CVs. Other means such as paper, fax or email are permissible but at the price of less efficient automated processing.

  • Concept URIs SHOULD resolve to a Web resource; this is a requirement for the Semantic Web.

  • In the case where a Scheme Authority does not make the concepts of a CV available as a Web resource, the Scheme URI SHOULD resolve to a Web resource, such as a human-readable Web page giving information about the purpose of the CV, and where details of the Scheme can be obtained.

14.5.2. Creating a new Scheme

A NewsML-G2 controlled vocabulary is a set of concepts. To create a CV as a NewsML-G2 Scheme:

  • Choose a Scheme Alias and a Scheme URI for the new Scheme and add them to the catalog:

<catalog>
    <scheme alias="abc" uri="http://cv.example.org/schemeA/" />
</catalog>

If using a remote catalog, change the catalog URI to reflect a new version of the catalog (so that recipients know that they should add this to their cache of catalogs) and ensure that all NewsML-G2 Items using the new Scheme refer to the new version of the remote catalog.

  • Create Concepts as required. You must use the Scheme Alias of the new scheme with the identifier of this new concept. For example:

<concept>
    <conceptId created="2009-09-22" qcode="abc:concept-x" />
...

The above Concept Identifier resolves to the Concept URI http://cv.example.org/schemeA/concept-x

14.5.3. Creating a new Knowledge Item for distributing as a CV

The Chapter on Knowledge Items shows an example Controlled Vocabulary expressed in XML. A Knowledge Item contains concepts from one or more Schemes. The steps to begin creating a KI are:

  • Identify the set of Concept Items that contain the concepts that will be part of the Knowledge Item, which may be only from this new CV or also from other CVs.

  • Create the metadata properties for the Knowledge Item that express the rules used to create it, for example, a <title> and <description> such as "Concepts extracted from Schemes A and B based on criteria X and Y".

  • For each Concept: Copy all or part of the selected concept details (the <concept> wrapper and associated properties) into the Knowledge Item.

14.5.4. Managing Schemes

Changes to Schemes

Scheme URIs MUST persist over time, and any changes to a Scheme which involve the creation or deprecation of concepts MUST be backwardly compatible with existing concepts. (For example, the code of a retired Concept MUST NOT be reused for a new and different concept.)

Scheme Authorities can indicate that a member of a CV should no longer be applied as a new value. This must be expressed by adding a @retired attribute to the <conceptId> of the Concept that is no longer to be used.

Both @created and @retired attributes are of datatype Date with optional Time and Time Zone (DateOptTime) and their use is optional. The @retired date can be a date in the future when a Scheme Authority knows that the Concept ID should no longer be used for new NewsML-G2 Items.

Example of a retired concept:

<conceptId created="2006-09-01" retired="2009-12-31"
    qcode="foo:bar" />

  • Concepts MUST NOT be deleted from a Scheme; this could cause processing errors for NewsML-G2 Items that pre-date the changes. Use of @retired ensures that Items that pre-date a CV change will continue to correctly resolve "legacy" concept identifiers.

  • For the same reason, Concept IDs MUST NOT be re-cycled, i.e. the same identifier MUST NOT be used for a different concept.

  • Schemes themselves MUST NOT be deleted, as archived content is likely to use the concepts contained in a retired CV.
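An application that assigns concepts to new Items could check @retired before offering a value. The sketch below is illustrative only: the function name is_retired is hypothetical, it compares only the date part of the DateOptTime value (ignoring any time and time zone), and it assumes that a concept counts as retired from the @retired date itself.

```python
from datetime import date
import xml.etree.ElementTree as ET


def is_retired(concept_id_xml, on):
    """Return True if the <conceptId> has a @retired date on or before `on`.

    Simplifications of this sketch: only the first ten characters (the date
    part) of the DateOptTime value are compared, and the retired date itself
    is treated as already retired.
    """
    retired = ET.fromstring(concept_id_xml).get("retired")
    if retired is None:
        # No @retired attribute: the concept may still be applied.
        return False
    return date.fromisoformat(retired[:10]) <= on


CONCEPT_ID = '<conceptId created="2006-09-01" retired="2009-12-31" qcode="foo:bar" />'

before = is_retired(CONCEPT_ID, date(2009, 6, 1))
after = is_retired(CONCEPT_ID, date(2010, 1, 1))
```

Note that a retired concept remains in the Scheme and must still resolve for older Items; the check only guards against applying it to new ones.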

Recommendations for non-complying Schemes

Some Scheme Authorities may fail to comply with the NewsML-G2 Specification, and this could be beyond the control of the end-user or the provider. Guidelines for handling various scenarios are:

  1. The authority of the vocabulary governs the scheme URI and the code – but does not comply with the NewsML-G2 Specification
    Reusing a Concept URI which was assigned to one concept with another concept is a breach of the NewsML-G2 specifications. If there are requirements that drive the authority to do so, the authority should give a clear warning notification about that fact to all receivers at the time of the publication of the reused Concept URI.

  2. The authority of the vocabulary governs only the code but not the scheme URI
    This may be the case for Controlled Vocabularies of codes only and if a news provider assigns a scheme URI of its own domain to enable the CV to be used with NewsML-G2. Good examples are the scheme URIs defined by the IPTC for the ISO vocabularies for country names or currencies – see http://cvx.iptc.org
    The party who assigned the scheme URI has the responsibility of making users of this scheme URI together with the vocabulary aware of any reuse cases – and should post a generic warning about the potential threat of reused codes.
    In addition the party who assigned the scheme URI may consider changing the scheme URI in the case of the reuse of a code. This would avoid having the same Concept URI for different concepts but would require careful management of the vocabularies as actually a completely new Controlled Vocabulary is created by the use of the new scheme URI.

  3. Another code is assigned to the same concept or a very similar concept
    This use case does not violate the NewsML-G2 specifications. But care should be taken when establishing relationships between Concept URIs:

    • If a new code is assigned to the same concept, a sameAs or a SKOS exactMatch relationship should be established from this new URI to all other existing URIs identifying this concept.

    • If a slightly modified concept is created and given a new Concept URI, establishing a SKOS closeMatch relationship may be considered.

14.6. Creating and Managing Catalogs

As previously described, Catalogs are essential to the resolution of NewsML-G2 Controlled Vocabularies and constituent QCodes.

14.6.1. The <catalog> element

Each QCode Scheme Alias that is used in a NewsML-G2 Item must have a reference in the Item’s <catalog> to the Controlled Vocabulary in which it is included. A <catalog> contains the Scheme Alias and the Scheme URI:

<catalog>
    <scheme alias="timeunit" uri="http://cv.iptc.org/newscodes/timeunit/" />
</catalog>

14.6.2. Remote Catalogs

As the CVs used by a provider are usually quite consistent across the NewsML-G2 Items they publish, the IPTC recommends that the <catalog> references are aggregated into a stand-alone file which is made available as a web resource, known as a Remote Catalog. This file is referenced by a <catalogRef> in the Item:

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />

Note: the <catalog> element in such a stand-alone file needs an XML namespace definition:

<catalog xmlns="http://iptc.org/std/nar/2006-10-01/">

The use of stand-alone web resources is preferable because all of the QCode mappings are shared across many NewsML-G2 Items; the local <catalog> can only be used by the single Item.

14.6.3. Managing Catalog files

Simple management of Remote Catalogs over time is relatively straightforward: whenever a new scheme is added, or the alias or the uri of any of the existing schemes is changed, a new Catalog must be published with a new URL. This URL may reflect a version number of the catalog. This is the method that the IPTC uses to maintain the NewsCodes, simply by increasing the file suffix digit by one:

href="..Standards_29.xml" --> href="..Standards_30.xml"

Note that ALL versions of a Remote Catalog must continue to be available as a web resource, otherwise existing NewsML-G2 Items and QCodes that reference it will not be able to resolve related Scheme Aliases to Scheme URIs.

Receiving applications MUST use the catalog information contained in the NewsML-G2 document being processed. If a provider updates a catalog, this is likely to be because new schemes have been added. Using a catalog other than that indicated in the document could cause errors or unintended results.

Additional <catalog> features

To improve the management features of Catalogs, new (optional) properties were added to the <catalog> at Power Conformance Level (PCL) in NewsML-G2 v2.14:

  • @url defines the location of the Catalog as a remote resource. (This must be the same as the URL used with the href attribute of a catalogRef in NewsML-G2 Items that use this Catalog.)

  • @authority uses a URI to define the company or organisation controlling this Catalog.

  • @guid is a globally unique identifier, expressed as a URI, for this kind of Catalog as managed by a provider. (This must be the @guid of the Catalog Item, see below, that manages this Catalog) If present, the version attribute should also be used.

  • @version is the version of the @guid as a non-negative integer; a version attribute must always be accompanied by a guid attribute.

<scheme> properties

Further information about schemes is expressed using the <scheme> child elements <name>, <definition>, and <note>. In NewsML-G2 v2.14 @roleuri was added to these child elements to allow the role of the element to be defined using a full URI instead of the QCode used by the existing @role attribute.

@roleuri simplifies processing by avoiding the situation where a QCode used in a Catalog relies on an alias defined in some other Catalog, making resolution difficult or impossible.

In NewsML-G2 v2.17, the <schemeMeta> element was added to the Knowledge Item, enabling the metadata previously expressed in the <scheme> elements of a Catalog to be added directly to a Knowledge Item that conveys all the concepts in a Scheme (aka Controlled Vocabulary). See Scheme Metadata.

14.6.4. Catalog Item

For providers who wish to use the same basic means for managing a Catalog as are available for news content, the Catalog Item is introduced to NewsML-G2 in version 2.14. The Catalog Item inherits the additions and changes to <catalog> and <scheme> described above. The example below shows how a Scheme Authority (in this example, the IPTC) might distribute its catalog to subscribers.

The Catalog Item shares the generic identification and processing instruction attributes associated with all NewsML-G2 items, for example:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<catalogItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:iptc.org:20160517:catalog"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-GB">

The child properties of Catalog Item are restricted to a basic set of essentials required for Catalog management:

Definition          Name                 Cardinality  Child properties
Catalog             catalogRef, catalog  1..n         As per all NewsML-G2 Items
Hop History         hopHistory           0..1         As per all NewsML-G2 Items
Rights Information  rightsInfo           0..n         As per all NewsML-G2 Items
Item Metadata       itemMeta             1            As per all NewsML-G2 Items
Content Metadata    contentMeta          0..1         contentCreated (0..1), contentModified (0..1), creator (0..1), contributor (0..1), altId (0..1) (power conformance only)
Catalog Container   catalogContainer     1            catalog (1)

The catalogRef, rights information and itemMeta elements follow normal practice. Note that the Item Class is set to "cainature:catalog"; this uses the IPTC Catalog Item Nature NewsCodes, recommended Scheme Alias "cainature":

<catalogRef
    href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
<rightsInfo>
    <copyrightHolder uri="http://www.iptc.org">
        <name>IPTC</name>
    </copyrightHolder>
</rightsInfo>
<itemMeta>
    <itemClass qcode="cainature:catalog" />
    <provider qcode="nprov:IPTC">
        <name>International Press Telecommunications Council </name>
    </provider>
    <versionCreated>2017-05-17T12:00:00Z</versionCreated>
    <pubStatus qcode="stat:usable" />
</itemMeta>

The Catalog information conveyed by the Item is wrapped in the <catalogContainer> element, which must contain one and only one <catalog>. The Catalog contains one or more <scheme> elements, as previously described:

<catalogContainer>
    <catalog xmlns="http://iptc.org/std/nar/2006-10-01/"
        additionalInfo="http://www.iptc.org/goto?G2cataloginfo">
        <scheme alias="app" uri="http://cv.iptc.org/newscodes/application/">
        <definition xml:lang="en-GB">Indicates how the metadata
            value was applied.</definition>
            <name xml:lang="en-GB">Application of Metadata Values</name>
        </scheme>
    </catalog>
</catalogContainer>

LISTING 13: Complete Catalog Item

All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies.

<?xml version="1.0" encoding="UTF-8" ?>
<catalogItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20130517:catalog"
    version="30"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en-GB">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <rightsInfo>
        <copyrightHolder uri="http://www.iptc.org">
            <name>IPTC</name>
        </copyrightHolder>
    </rightsInfo>
    <itemMeta>
        <itemClass qcode="cainature:catalog" />
        <provider qcode="nprov:IPTC">
            <name>International Press Telecommunications Council</name>
        </provider>
        <versionCreated>2017-10-17T12:00:00Z</versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <catalogContainer>
        <catalog xmlns="http://iptc.org/std/nar/2006-10-01/"
            additionalInfo="http://www.iptc.org/goto?G2cataloginfo">
            <scheme alias="app" uri="http://cv.iptc.org/newscodes/application/">
                <definition xml:lang="en-GB">Indicates how the metadata
                   value was applied.</definition>
                <name xml:lang="en-GB">Application of Metadata Values</name>
            </scheme>
        </catalog>
    </catalogContainer>
</catalogItem>

14.7. Processing Catalogs and CVs

In practice, from a receiver’s point of view, it makes no sense to look up the contents of CVs over the network every time a document is processed, since this would consume considerable computing and network resources and probably degrade performance. Also, as discussed, some providers might not make a scheme or its contents available at all.

The NewsML-G2 Specification requires that remote Catalogs – the file(s) that map Scheme Aliases to Scheme URIs – are retrieved by processors and the IPTC highly recommends that the Catalogs are cached at the receiver’s site. They can be cached indefinitely because catalog URIs must remain unchanged over time. Whenever Schemes are created or deleted, an updated catalog must be provided under a new URI. This ensures that Items that pre-date the catalog changes can continue to be processed using the previous catalog.
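Because a catalog URL never changes its contents, a cache keyed on the URL never needs to expire an entry. The sketch below illustrates this idea only: the class name CatalogCache and the injected fetch callable are hypothetical, and the stand-in fetcher returns a made-up one-entry mapping rather than the real contents of the IPTC catalog file.

```python
class CatalogCache:
    """Cache of remote catalogs, keyed by catalog URL and never expired."""

    def __init__(self, fetch):
        # fetch: callable taking a catalog URL and returning {alias: scheme_uri}
        self._fetch = fetch
        self._catalogs = {}

    def get(self, url):
        # Retrieve each catalog version once; later lookups are local.
        if url not in self._catalogs:
            self._catalogs[url] = self._fetch(url)
        return self._catalogs[url]


# Stand-in for a network retrieval, recording how often it is called.
calls = []


def fake_fetch(url):
    calls.append(url)
    return {"stat": "http://cv.iptc.org/newscodes/pubstatusg2/"}


cache = CatalogCache(fake_fetch)
url = "http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml"
first = cache.get(url)
second = cache.get(url)
```

A new catalog version arrives under a new URL and therefore becomes a new cache entry, leaving the earlier version available for older Items.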

14.7.1. Resolving Scheme Aliases

Some NewsML-G2 properties are important for the correct processing of an Item, for example the Item Class property tells a receiving application the type of content being conveyed by a NewsML-G2 Item: a processor may expect to apply some rule according to the value present in the <itemClass>, for example to route all pictures to the Picture Desk.

Other CVs may be important for correctly processing an Item, for example the presence of specific subject codes could cause an Item’s content to be routed to certain staff or departments in a workflow.

The schemes used by the <itemClass> property are mandatory, and the IPTC recommends that implementers use the scheme aliases appropriate to the Item Type, for example "ninat" for News Item Nature or "cinat" for Concept Item Nature. Note that the use of these specific alias values is NOT mandatory; they could already be in use by a provider as aliases for other CVs.

This illustrates the flexibility of the NewsML-G2 model: consistency of scheme aliases between different providers – or even by the same provider – cannot be guaranteed, and in NewsML-G2 they do not have to be guaranteed. For this reason it would be unwise for NewsML-G2 processor implementers to assume that a given scheme alias can be "hard coded" into their applications.

However, this flexibility does not mean that these "needed for processing" CVs must be accessed every time an Item is processed. This could be an unnecessary overhead and performance burden.

Processing rules such as those described above would be based on acting in response to expected values. In the case of the News Item Nature Scheme, these values include "text", "picture", "audio" etc. The problem is not in obtaining the contents of the CV in real time, but in verifying that it is the correct CV.

For example:

  • A receiver knows that providers use the IPTC Media Topic NewsCodes for classifying news content by subject matter, and that the scheme URI for these NewsCodes is http://cv.iptc.org/newscodes/mediatopic/

  • The business requires that incoming content is routed to the appropriate department, according to the Media Topic NewsCodes found in the Items.

  • A routing table is set up in the processor with a configurable rule "all items with a Media Topic NewsCode 04000000 to be routed to the Business News department".

How does the processor "know" that a <subject> property with a QCode containing 04000000 is an IPTC Media Topic NewsCode? The processor should not rely on the scheme alias "medtop"; it could be an alias to another CV, or the provider may use another alias:

<subject type="cpnat:abstract" qcode="sc:04000000" />

By following the IPTC advice to retrieve all catalog information used by Items, and cache the information indefinitely, CVs can be processed without reference to external resources.

In the example, the catalog used by the Item resolves the scheme alias "sc" contained in the QCode. The Item contains the line pointing to the catalog file:

<catalogRef href="http://www.example.org/std/catalog/catalog.example_10.xml" />

The processor should have retrieved and cached the contents of the file at this URL, and would have in memory the mapping of this alias to the Scheme URI:

<scheme alias="sc" uri="http://cv.iptc.org/newscodes/mediatopic/" />

This verifies that the QCode value is from the Media Topic NewsCodes scheme. A rule "all items with a Media Topic NewsCode of 04000000 to be routed to the Business News department" is satisfied and the NewsML-G2 Item processed appropriately.
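The routing rule in this example can be expressed so that it depends on the Scheme URI and never on the alias. The following is a minimal sketch under stated assumptions: the names ROUTING_RULES and route are hypothetical, the cached catalog is represented as a plain alias-to-URI mapping, and the second catalog uses an invented example.org scheme purely to show a non-match.

```python
MEDIA_TOPIC_SCHEME = "http://cv.iptc.org/newscodes/mediatopic/"

# Rules are keyed on (Scheme URI, Code Value), never on the scheme alias.
ROUTING_RULES = {(MEDIA_TOPIC_SCHEME, "04000000"): "Business News"}


def route(qcode, catalog, rules=ROUTING_RULES):
    """Return the department for a subject QCode, or None if no rule applies."""
    alias, _, code = qcode.partition(":")
    # Look up the alias in the catalog of the Item being processed.
    scheme_uri = catalog.get(alias)
    return rules.get((scheme_uri, code))


# Catalog of the example Item: "sc" is an alias for the Media Topic scheme.
provider_catalog = {"sc": MEDIA_TOPIC_SCHEME}
department = route("sc:04000000", provider_catalog)

# Same alias and code, but a different (invented) scheme: no rule matches.
other_catalog = {"sc": "http://cv.example.org/other/"}
unrelated = route("sc:04000000", other_catalog)
```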

14.7.2. Resolving Concept URIs

The IPTC recommends that Schemes SHOULD resolve to a Web resource, and that Scheme Authorities who disseminate news using NewsML-G2 should make their Schemes available as Knowledge Items.

Access Models

Making a CV available as a Web resource does not mean it must be accessible on the public Web; only that Web technology should be used to access it. The resource may be on the public Web, on a VPN, or internal network.

Providers may also wish to use Schemes to add value to content, using a subscription model. In this case, the contents of a Scheme may not be available to non-subscribers, but they could continue to resolve the QCodes to a unique and persistent Concept URI.

Concept Resolution: Provider View

In following the IPTC recommendation that CVs should be accessible as a Web resource, providers may be concerned about the implications for providing sufficient access capacity and reliability guarantees. If receivers were to interrogate CVs each time they processed a NewsML-G2 document, that could act like a Denial of Service attack.

The IPTC makes no recommendations about this issue other than to advise the use of industry-standard methods of mitigating these risks. Organisations hosting CVs could also define an acceptable use policy that places limits on the load that individual subscribers can place on the service.

Concept Resolution: Receiver View

As the IPTC recommends that CVs should be available as Web resources, it follows that the Scheme Authority may host its Schemes as Knowledge Items on a Web server. However, a Scheme Authority does not guarantee the availability and capacity of connections to its hosted Knowledge Items.

In addition, from the receiver’s point of view, it could be unwise for business-critical news applications to rely on a third-party system beyond the receiver’s control.

Processors are therefore recommended to retrieve and cache the contents of third-party Knowledge Items. Providers should advise their customers on the recommended frequency for refreshing the third-party cached Knowledge Items.

Handling updates to Knowledge Items using @modified

When cached concepts are synchronized with those in the latest received Knowledge Item, the receiver can find out which concepts have been modified by using the @modified attribute of <concept>.

Use Case and Example

1. A news site is using a CV maintained by a third-party Scheme Authority, for example a CV maintained by the IPTC.

2. The site retrieves a Knowledge Item about the concepts in the CV from the third-party Scheme Authority’s web server and stores them within its internal cache.

3. Some time later the site wants to check the validity of the cache. It again downloads or receives a Knowledge Item from the third-party Scheme Authority, containing the relevant concepts which may have been updated in the meantime.

4. The site’s NewsML-G2 processor checks the @modified timestamp (date-time) of each concept conveyed within the Knowledge Item against the modification timestamp of the corresponding cached concept. Any concepts within the Knowledge Item with a modification timestamp later than the corresponding cached concept’s modification timestamp are processed as updates to the cache. (Note: this assumes that the Scheme Authority always flags Concepts conveyed within a Knowledge Item with a modification timestamp, see below.)

Updates Processing Notes

In the above use case, it was assumed that the Scheme Authority always flags Concepts with a modification timestamp. In cases where modification timestamps are missing from some or all of the concepts, either in the KI or in the cache, a receiver can be less certain about whether or not a concept has been modified. The following matrix outlines the IPTC recommendations for processing updates for each individual concept in the KI:

Concept in local cache     | Concept received in KI     | Processor should:
---------------------------|----------------------------|------------------
No modification timestamp  | No modification timestamp  | Update cache from KI
No modification timestamp  | Has modification timestamp | Update cache from KI, now the concept in the cache has a modification timestamp!
Has modification timestamp | No modification timestamp  | Update cache from KI, now the concept in the cache has lost its modification timestamp!
Has modification timestamp | Has modification timestamp | Compare timestamps and update the concept in the cache IF the timestamp from the KI is later.
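
The matrix above can be sketched as a single decision function. This is a minimal illustration, not part of the NewsML-G2 Specification: the function name should_update_cache is hypothetical, and each argument is assumed to be a timezone-aware datetime, or None where the concept carries no modification timestamp.

```python
from datetime import datetime, timezone


def should_update_cache(cached_modified, received_modified):
    """Return True if the cached concept should be replaced from the KI.

    Each argument is a datetime, or None when that copy of the concept
    has no modification timestamp.
    """
    # Rows 1 to 3 of the matrix: if either side lacks a timestamp,
    # the cache is updated from the Knowledge Item.
    if cached_modified is None or received_modified is None:
        return True
    # Row 4: both timestamps exist, update only if the KI is later.
    return received_modified > cached_modified


cached = datetime(2009, 11, 23, 13, 0, tzinfo=timezone.utc)
received = datetime(2010, 1, 28, 13, 0, tzinfo=timezone.utc)
print(should_update_cache(cached, received))
```

Note that in the third row of the matrix the cached concept loses its timestamp, so the next synchronisation of that concept will again result in an update.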

The code snippet below shows how the Scheme Authority would inform receivers that concepts in a Knowledge Item have been updated: the @modified attribute of each concept gives the timestamp (date-time) of the change.

<?xml version="1.0" encoding="UTF-8"?>
<knowledgeItem ...>
    ...
    <itemMeta>
    ...
    </itemMeta>
    <contentMeta>
    ...
    </contentMeta>
    <conceptSet>
        <concept modified="2010-01-28T13:00:00Z">
            <conceptId qcode="access:easy" />
            ...
        </concept>
        <concept modified="2010-01-28T13:00:00Z">
            <conceptId qcode="access:difficult" />
            ...
        </concept>
        <concept modified="2009-11-23T13:00:00Z">
            <conceptId qcode="access:restricted" />
            ...
        </concept>
    </conceptSet>
</knowledgeItem>
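
A receiver could read the @modified values from such a Knowledge Item as sketched below. This is an illustrative example only: the fragment is a simplified, namespace-free copy of the concept set above, and the names KI_FRAGMENT and read_modified are hypothetical. A real Knowledge Item declares the NewsML-G2 namespace, so element lookups would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Simplified fragment (assumption: no XML namespace, other properties omitted).
KI_FRAGMENT = """
<conceptSet>
    <concept modified="2010-01-28T13:00:00Z">
        <conceptId qcode="access:easy" />
    </concept>
    <concept modified="2009-11-23T13:00:00Z">
        <conceptId qcode="access:restricted" />
    </concept>
</conceptSet>
"""


def read_modified(xml_text):
    """Map each concept's QCode to its @modified datetime (None if absent)."""
    result = {}
    for concept in ET.fromstring(xml_text).iter("concept"):
        qcode = concept.find("conceptId").get("qcode")
        stamp = concept.get("modified")
        # Older Python versions reject a trailing "Z" in fromisoformat(),
        # so it is rewritten as an explicit UTC offset first.
        result[qcode] = (
            datetime.fromisoformat(stamp.replace("Z", "+00:00")) if stamp else None
        )
    return result


for qcode, modified in read_modified(KI_FRAGMENT).items():
    print(qcode, modified)
```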
Notifying receivers of changes to Knowledge Items

This issue is not necessarily specific to NewsML-G2 news exchange: Most news providers have CVs that pre-date NewsML-G2, for example those CVs typically used with IPTC 7901. Channels and conventions for advising customers of changes to CVs will already exist. Generally, providers notify customers in advance about changes to CVs, especially if it is likely that a CV is used for content processing.

The IPTC hosts and maintains a large number of CVs and provides an RSS feed that notifies of changes to the IPTC Schemes. Details at www.iptc.org.

14.8. Private versions and extensions of CVs

News providers are encouraged to use pre-existing or well-known CVs, such as those maintained by the IPTC, where possible to promote interoperability and standardisation of the exchange of news. Sometimes a provider will use a CV that is maintained by some other Scheme Authority (e.g. the IPTC), but may need to add its own information. The following are typical potential business cases:

  • Case 1: A national news agency wishes to use all of the codes in a CV that is maintained by the IPTC, without alteration, except to provide local language versions of names and definitions.

  • Case 2: An organisation receives news objects from information partners and uses a CV that defines the stages in a shared workflow. The CV is maintained by a third-party organisation (the Scheme Authority), but the receiver needs to add further workflow stages for its own internal purposes.

14.8.1. Use <sameAsScheme> to provide a local language version of a well-known CV

Some useful and well-known CVs do not have the values of properties such as name and definition in the local language of a news provider. For example, although some IPTC NewsCodes contain concept information in several languages, the IPTC does not have the resources to provide concept details in every language being used by news providers.

Yet some NewsCodes schemes are recommended or mandatory. How can a provider use a local language version of these NewsCodes AND conform to the NewsML-G2 Specification?

Using the <sameAsScheme> property, a news provider can create its own CV, containing all of the NewsCodes it wishes to use with local language names and definitions, while making it clear to receivers that these codes identify the same concepts as the original IPTC Scheme.

For example, the IPTC Item Relation NewsCodes irel:seeAlso resolves via the Catalog to a Concept URI http://cv.iptc.org/newscodes/itemrelation/seeAlso which hosts the following information about the seeAlso concept:

<concept>
    <conceptId created="2008-01-29T00:00:00+00:00" qcode="irel:seeAlso" />
    <type qcode="cpnat:abstract" />
    <name xml:lang="en-GB">See also</name>
    <definition xml:lang="en-GB">
        To fully understand the content of this item see also the content
        of the related item.
    </definition>
    ...
</concept>

A provider wishes to provide this same information in the Czech language. As a first step, it creates a new Controlled Vocabulary containing the required concepts from the original scheme with translated name and definition, for example:

<concept>
    <conceptId created="2010-01-29T00:00:00+00:00" qcode="itemrel:seeAlso" />
    <type qcode="cpnat:abstract" />
    <name xml:lang="cs">Viz také</name>
    <definition xml:lang="cs">
        Chcete-li plně pochopit obsah této položky viz též obsah
        související položky.
    </definition>
    ...
</concept>

The provider then creates a Catalog file, or a new version of an existing Catalog file, containing a Scheme Alias and Scheme URI for the new CV, thus:

<catalog>
    <scheme alias="itemrel" uri="http://cv.example.org/codes/itemrelation/">
        <sameAsScheme>http://cv.iptc.org/newscodes/itemrelation/</sameAsScheme>
    </scheme>
</catalog>

This asserts that ALL of the codes in the private scheme identified by the Scheme URI are semantically the "same as" the corresponding codes in the original scheme indicated by the <sameAsScheme> child element of <scheme>.

In this example the provider MUST give the new CV a Scheme Alias – here it is "itemrel" – that is different to the recommended Scheme Alias "irel" of the IPTC Scheme. This is because some IPTC schemes are mandatory, so a reference to the IPTC catalog would always be present in the Item. When there is no reference to an original scheme, there is no need to use a different Scheme Alias for the private scheme.

Finally, the provider adds a reference to the new Catalog file to NewsML-G2 Items that it publishes.

By adding the <sameAsScheme> element to the Catalog, the provider is able to apply a Same As relationship at the level of a set of Concepts. So for example, this code:

<link
    rel="itemrel:seeAlso"
    contenttype="image/jpeg"
    residref="tag:acmenews.com,2008:TX-PAR:20090529:JYC90"
/>

can be efficiently resolved: a processor does not have to search for a Same As relationship at the individual concept level but can map this relationship directly from the @rel value’s scheme (alias "itemrel") to the scheme identified by the <sameAsScheme> property.
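
The scheme-level mapping described above might be implemented along the following lines. This is a sketch under stated assumptions: the dictionary CATALOG is a hypothetical in-memory form of the catalog entry shown earlier, and equivalent_concept_uris is an illustrative name, not a NewsML-G2 API.

```python
# Hypothetical in-memory form of the <scheme> entry with its <sameAsScheme>.
CATALOG = {
    "itemrel": {
        "uri": "http://cv.example.org/codes/itemrelation/",
        "same_as": ["http://cv.iptc.org/newscodes/itemrelation/"],
    },
}


def equivalent_concept_uris(qcode, catalog=CATALOG):
    """Return the private Concept URI followed by its 'same as' URIs."""
    alias, _, code = qcode.partition(":")
    scheme = catalog[alias]
    # The same code is appended to each Scheme URI: no per-concept
    # lookup of <sameAs> properties is needed.
    return [scheme["uri"] + code] + [uri + code for uri in scheme["same_as"]]


print(equivalent_concept_uris("itemrel:seeAlso"))
```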

Rules for <sameAsScheme>

The semantics of <sameAsScheme> are:

"all of the concepts in the scheme identified by the private scheme/@uri have a ‘same as’ relationship to concepts with the same code in the original scheme identified by the URI in the <sameAsScheme> element."

So in the example:

<scheme alias="itemrel" uri="http://cv.example.org/codes/itemrelation/">
    <sameAsScheme>http://cv.iptc.org/newscodes/itemrelation/</sameAsScheme>
</scheme>

In practice, this means:

  • The Scheme identified by scheme/@uri (the provider’s private scheme) must NOT use a code that does not exist in ALL of the original Schemes identified by the <sameAsScheme> elements. In other words, a provider cannot add new concepts to its private Scheme that have the effect of extending the set of concepts of the original Scheme(s).

  • Some codes and concepts of the original Scheme MAY not exist in the provider’s private Scheme. This could happen if for example the original Scheme has new terms added which the provider has not yet included in the private Scheme.

  • Each concept identified by a code in the provider’s private Scheme MUST be semantically equivalent to its corresponding concept in the original Scheme(s), and MUST be identified by the same code as in the original Scheme(s).
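
The first of these rules can be checked mechanically, as in the sketch below; semantic equivalence of the concepts cannot. The function name and the sample code sets are hypothetical and serve only to illustrate the check.

```python
def codes_not_in_original(private_codes, original_codes):
    """Return the private codes that break the <sameAsScheme> rule
    because they do not exist in the original scheme."""
    return sorted(set(private_codes) - set(original_codes))


# Illustrative code sets (assumed values, not a real scheme listing).
original = {"seeAlso", "translatedFrom", "derivedFrom"}
private_ok = {"seeAlso", "translatedFrom"}    # a subset is permitted
private_bad = {"seeAlso", "localExtension"}   # extends the original scheme

print(codes_not_in_original(private_ok, original))
print(codes_not_in_original(private_bad, original))
```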

The <sameAsScheme> property was introduced to solve some issues that news providers had encountered, such as adding translations of free-text properties (for example name, definition, note), which are not available within the original scheme, or adding additional information, e.g. usage notes.

14.8.2. Adding further concepts to a well-known CV

A news provider needs to add concepts to a workflow role CV which is shared with information partners, but is maintained by some other organisation (the Scheme Authority).

The news provider would have two courses of action:

  1. Ask the Scheme Authority to add the new concepts to the CV. For example, IPTC members are entitled to request the addition and/or retirement of terms in IPTC Schemes with the agreement of other members.

  2. Create a new scheme that complements the original scheme, but uses properties such as <broader> and <sameAs> to link the concepts in the new scheme to concepts in the original scheme. A concept is the Same As another concept if their semantics are the same, but it MAY contain more details, such as a translation in another language. A concept with a Broader relationship to another concept is a new concept with semantics narrower than those of the broader concept.

Example using a new scheme

Using the shared workflow role example, the original scheme contains three concepts for defined roles in a workflow:

<conceptSet>
    <concept>
        <conceptId qcode="wflow:draft" />
        ...
    </concept>
    <concept>
        <conceptId qcode="wflow:review" />
        ...
    </concept>
    <concept>
        <conceptId qcode="wflow:release" />
        ...
    </concept>
</conceptSet>

Properties that use this scheme are resolved through the <catalog> in the Item in which they appear, e.g. the Item contains the following property in <itemMeta>:

<role qcode="wflow:release" />

and the catalog statement:

<catalog>
    <scheme alias="wflow" uri="http://cv.example.org/schemes/wfroles/" />
</catalog>

The receiver needs to add an intermediate role in the workflow, representing a "final review" stage. Thus the private scheme in Case 2 is extending the original scheme. Unlike Case 1, where the codes in the private scheme were identical to codes contained in the original scheme, in Case 2 the concepts in the private scheme use a <sameAs> property at the level of each code in the new private scheme.

<conceptSet>
    <concept>
        <conceptId qcode="iwf:draft" />
        <sameAs qcode="wflow:draft" />
        ...
    </concept>
    <concept>
        <conceptId qcode="iwf:review" />
        <sameAs qcode="wflow:review" />
        ...
    </concept>
    <concept>
        <conceptId qcode="iwf:finalreview" />
        ...
    </concept>
    <concept>
        <conceptId qcode="iwf:release" />
        <sameAs qcode="wflow:release" />
        ...
    </concept>
</conceptSet>

G2 Items that use this scheme must use a <catalog> statement to enable the processor to resolve both the private "iwf" scheme alias and the original "wflow" scheme alias:

<catalog>
    <scheme alias="iwf" uri="http://support.myorg.com/cv/workflow/" />
    <scheme alias="wflow" uri="http://cv.example.org/schemes/wfroles/" />
</catalog>

A Knowledge Item containing <sameAs>, <broader>, or <narrower> properties like the above must also contain a <catalog> allowing the QCodes to be resolved, in this case:

<catalog>
    <scheme alias="wflow" uri="http://cv.example.org/schemes/wfroles/" />
</catalog>
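
In contrast to <sameAsScheme>, these relationships must be looked up concept by concept. The following sketch assumes the <sameAs> properties of the private "iwf" scheme have already been loaded into a dictionary; the names SAME_AS and to_shared_role are hypothetical.

```python
# <sameAs> relationships taken from the private concept set above.
SAME_AS = {
    "iwf:draft": "wflow:draft",
    "iwf:review": "wflow:review",
    "iwf:release": "wflow:release",
    # "iwf:finalreview" has no <sameAs>: it exists only in the private scheme.
}


def to_shared_role(qcode):
    """Return the equivalent QCode in the shared scheme, or None."""
    return SAME_AS.get(qcode)


print(to_shared_role("iwf:review"))
print(to_shared_role("iwf:finalreview"))
```

A receiver exchanging Items with partners could use such a lookup to decide whether a private role can be expressed in the shared scheme before an Item leaves the organisation.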

14.9. Best Practice in QCode exchange

NewsML-G2 specifies that concepts must be identified by a full URI conforming to RFC 3986. The IPTC also recommends that a URI identifying a scheme and concept should resolve to a resource providing information about the scheme or the concept and which is either human or machine readable. In other words, a Concept URI should be a URL.

The unreserved characters that are permitted in a URL are:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9 - _ . ~

the reserved characters are:

! * ' ( ) ; : @ & = + $ , / ? % # [ ]

In order to promote unambiguous processing of QCodes, the IPTC specifies that providers are responsible for encoding Concept URIs in the form expected by the system that resolves them, rather than relying on end-users to apply per-cent encoding rules for URI processing. Therefore:

  • Providers should decide whether or not to per-cent encode reserved characters used in QCodes distributed to customers in NewsML-G2 documents.

  • Receivers should not perform per-cent encoding/decoding, when resolving QCodes according to the rules outlined below.

The reason for the recommendation is illustrated by the following example of a provider distributing a document containing:

<subject qcode="fc:3#FTSE" />

The catalog entry for a parent Scheme is (say):

<scheme alias="fc" uri="http://cv.example.org/schemes/fc/" />

The receiver could transform this QCode literally to this URI:

http://cv.example.org/schemes/fc/3#FTSE

Or the receiver could percent-encode the # (%23), yielding:

http://cv.example.org/schemes/fc/3%23FTSE

The resources identified by these two URIs are both valid by RFC 3986 but they are different!

Following the IPTC definition, the provider must include the per-cent encoding into the QCode of the first example if this is required by the system resolving the URIs:

<subject qcode="fc:3%23FTSE" />
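
The difference between the two candidate URIs can be demonstrated with a standard URI parser. The sketch below uses Python's urllib.parse purely as an illustration; the variable names are hypothetical. It shows that the literal form is split into a path and a fragment, whereas the encoded form keeps the whole code in the path.

```python
from urllib.parse import quote, urlsplit

SCHEME_URI = "http://cv.example.org/schemes/fc/"
CODE = "3#FTSE"

literal = SCHEME_URI + CODE
# safe="" forces every reserved character, including "#", to be encoded.
encoded = SCHEME_URI + quote(CODE, safe="")

for candidate in (literal, encoded):
    parts = urlsplit(candidate)
    print(candidate, "path:", parts.path, "fragment:", repr(parts.fragment))
```

Because a fragment is normally not sent to the server when a URI is dereferenced, the two forms would request different resources, which is why the provider, not the receiver, must decide on the encoding.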

14.9.1. Non-ASCII characters

The encoding described above assumes that the character(s) to be per-cent encoded are from the US-ASCII character set (consisting of 94 printable characters plus the space).

If a code contains non-ASCII characters, for example accented characters, the Unicode encoding UTF-8 must be used, in line with normal practice.

For example, the UTF-8 encoding of the å character is a two-byte value of C3hex A5hex which would be percent-coded as %C3%A5.
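
As a quick check of this example, the following sketch uses Python's urllib.parse, which encodes non-ASCII characters as UTF-8 by default. The code value "bår" is simply an illustrative string.

```python
from urllib.parse import quote, unquote

# The character å is encoded in UTF-8 as the two bytes C3 A5.
utf8_bytes = "å".encode("utf-8")
print(utf8_bytes.hex())

# Per-cent encoding a code that contains the character, and decoding it again.
encoded_code = quote("bår")
print(encoded_code)
print(unquote(encoded_code))
```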

14.9.2. White Space in Codes

Codes in controlled vocabularies which have been created without regard to the NewsML-G2 specifications may contain spaces. As the NewsML-G2 Specification does not allow white space characters in Codes, this section recommends a workaround.

Whitespace characters in Codes - in practice, only spaces (20hex) - are replaced by a sequence of one or more unreserved characters that is reused for this purpose according to the practices of the provider; it is recommended that such a sequence is not part of any of the codes used by the provider.

For example, if a code contains a space, the space character might be replaced by ~~. Receivers would be informed to translate this string back to a space character in order to match the QCode against a list of codes that contain spaces.
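
A minimal sketch of this workaround follows, assuming the provider has chosen ~~ as the replacement sequence and communicated it to receivers. The function names and the sample code "New York" are hypothetical.

```python
SPACE_TOKEN = "~~"  # replacement sequence agreed between provider and receiver


def code_to_qcode_form(code):
    """Provider side: replace spaces so the code is valid in a QCode."""
    if SPACE_TOKEN in code:
        # The sequence must not already occur in any code, otherwise
        # the receiver could not reverse the replacement reliably.
        raise ValueError(f"code already contains {SPACE_TOKEN!r}: {code!r}")
    return code.replace(" ", SPACE_TOKEN)


def qcode_form_to_code(value):
    """Receiver side: restore the spaces to match the original code list."""
    return value.replace(SPACE_TOKEN, " ")


sent = code_to_qcode_form("New York")
print(sent, "->", qcode_form_to_code(sent))
```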

14.10. Syntactic Processing of QCodes

This section provides a summary of the processing model. Please also see the NewsML-G2 Specification for a full technical description. (This can be downloaded by visiting www.newsml-g2.org/spec.)

14.10.1. Creating QCodes from Scheme Aliases and Codes

Scheme Aliases

These do not have to be encoded as they will never be part of the full Concept URI. A Scheme Alias may contain any character except a colon (3Ahex) or white space characters (20hex, 09hex, 0Dhex or 0Ahex).
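
The character restriction on Scheme Aliases can be expressed as a simple check. In this sketch the function name is hypothetical, and rejecting an empty alias is an assumption that goes beyond the rule stated above.

```python
# Colon (3A hex) and the white space characters 20, 09, 0D and 0A hex.
FORBIDDEN_ALIAS_CHARS = set(":\x20\x09\x0d\x0a")


def is_valid_scheme_alias(alias):
    """Return True if the alias contains none of the forbidden characters.

    An empty alias is also rejected (assumption, not stated in the text).
    """
    return bool(alias) and not (FORBIDDEN_ALIAS_CHARS & set(alias))


for candidate in ("sc", "fôô", "my alias", "a:b"):
    print(repr(candidate), is_valid_scheme_alias(candidate))
```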

14.10.2. Processing Received Codes

To resolve a QCode received in an Item to a Concept URI, use the following steps:

  1. Apply any XML decoding to the string (this should be performed by your XML processor)

  2. Retrieve the QCode value from the document
    example: fôô:bår

  3. Identify the first colon starting from the left; the string on the left of the colon is the scheme alias, the string on the right of the colon is the code. If there is no colon, the QCode is invalid. In the example, therefore:
    Scheme Alias = fôô
    Code Value = bår

  4. Check whether the alias is defined in a catalog. If not, the QCode is invalid.
    example: <scheme alias="fôô" uri="http://cv.example.org/cv/somecodes/" />

  5. Append the Code Value to the Scheme URI to make the full Concept URI:
    example: http://cv.example.org/cv/somecodes/bår

  6. It is highly recommended to use only full Concept URIs to compare identifiers of concepts.
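
The steps above can be sketched as follows. This is an illustrative outline rather than a reference implementation: the exception class InvalidQCode and the function resolve_qcode are hypothetical names, the catalog is assumed to be available as a dictionary of Scheme Alias to Scheme URI, and XML decoding (step 1) is assumed to have been done by the XML processor.

```python
class InvalidQCode(ValueError):
    """Raised when a QCode cannot be resolved to a Concept URI."""


def resolve_qcode(qcode, catalog):
    """Resolve a QCode to a full Concept URI using a catalog mapping."""
    # Step 3: split at the FIRST colon only; later colons belong to the code.
    alias, colon, code = qcode.partition(":")
    if not colon:
        raise InvalidQCode(f"no colon in QCode {qcode!r}")
    # Step 4: the alias must be defined in a catalog.
    if alias not in catalog:
        raise InvalidQCode(f"unknown scheme alias {alias!r}")
    # Step 5: append the code to the Scheme URI without further encoding.
    return catalog[alias] + code


catalog = {"fôô": "http://cv.example.org/cv/somecodes/"}
print(resolve_qcode("fôô:bår", catalog))
```

In line with step 6, two QCodes should be compared by comparing the values returned by such a function, since different aliases may point to the same Scheme URI.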

14.11. A generic way to express concept identifiers as URIs

When NewsML-G2 was originally designed, communication capacity constraints drove implementers to find a compact way to express the typically long globally unique identifier (URI) of a concept; this led to the creation of the compact QCode notation. In NewsML-G2 version 2.11, @uri was introduced for properties having a concept identifier value type. This was restricted to properties that used @qcode or @literal to hold the identifier, but additionally allowed providers to express the full URI if required. Example:

<subject uri="http://example.com/people?id=12345&amp;group=223" />

The same flexibility was made available to properties with QCode or QCodeList attribute types in version 2.18. New URI sibling attributes (IRIType) were added to these properties by appending "uri" to the QCode name. For example @rendition gets a @renditionuri sibling, such that:

<remoteContent rendition="rnd:highRes"
... />

may now be alternatively expressed as:

<remoteContent renditionuri="http://cv.iptc.org/newscodes/rendition/highRes"
... />

Implementers are advised NOT to use both QCode and URI type attributes for the same property; however, if they are used together, the QCode type attribute takes precedence. The following extract shows how the <contentSet> of the code listing from Quick Start: Pictures and Graphics would use the URI sibling properties:

<contentSet>
    <remoteContent renditionuri="http://cv.iptc.org/newscodes/rendition/highRes"
        href="./GYI0062134533.jpg"
        version="1"
        size="346071"
        contenttype="image/jpeg"
        width="1500"
        height="1001"
        colourspaceuri="http://cv.iptc.org/newscodes/colorspace/AdobeRGB"
        orientation="1"
        layoutorientationuri="http://cv.iptc.org/newscodes/layoutorientation/horizontal">
        <altId type="gyiid:masterID">105864332</altId>
        ...
    </remoteContent>
</contentSet>

This code is listed in full in the file LISTING 3A Photo in NewsML-G2 (URI sibling attributes).xml, which is part of the set of code examples in the Guidelines download package.

For a full list of QCode type attributes and their URI siblings, see the table Full list of the QCode type attributes and their URI siblings below.

14.11.1. Change to cardinality of QCode Type attributes

After the introduction of @uri and later the "URI Sibling" attributes, a @qcode and other QCodeType attributes (collectively termed "QCode") remained mandatory for some properties. From v2.20 this constraint is lifted and the cardinality of QCode attributes in these properties is changed from (1) to (0..1) in order to allow @uri and "URI sibling" attributes to be used on their own. The table Properties affected by changed cardinality of QCode attributes below lists the properties affected by lifting the mandatory use of QCodes. The rules are now as follows, according to the version of NewsML-G2 being used:

NewsML-G2 version | Rule
------------------|-----
Up to 2.11        | A QCode is mandatory for the affected attributes.
2.11 to 2.19      | A QCode AND a full URI MAY be used. The QCode is mandatory. (From 2.18, some URI Sibling attributes were constrained by this rule.)
2.20 onwards      | A QCode OR a full URI MUST be used for the affected properties. A QCode AND a URI MAY be used together but this is not recommended and in this case the QCode value takes precedence.
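
The rule for version 2.20 onwards could be applied by a receiver as sketched below. The function name effective_concept_uri is hypothetical, and the mapping of the alias "rnd" to the rendition Scheme URI is an assumption made for the example.

```python
def effective_concept_uri(catalog, qcode=None, uri=None):
    """Return the Concept URI for a property under the 2.20 rules."""
    if qcode is not None:
        # The QCode takes precedence, even if a URI is also present.
        alias, _, code = qcode.partition(":")
        return catalog[alias] + code
    if uri is not None:
        return uri
    raise ValueError("either a QCode or a full URI must be present")


# Assumed catalog entry for the rendition scheme.
catalog = {"rnd": "http://cv.iptc.org/newscodes/rendition/"}

print(effective_concept_uri(catalog, qcode="rnd:highRes"))
print(effective_concept_uri(
    catalog, uri="http://cv.iptc.org/newscodes/rendition/highRes"))
```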

14.12. Literal Identifiers

The NewsML-G2 Standard recognises that it is not always possible to use a QCode or URI as an identifier, therefore the NewsML-G2 Flexible Property type allows a @literal identifier or no identifier at all. For example, a CV may be understood by both receivers and providers, but the mapping of identifiers to concepts is managed and communicated outside NewsML-G2. Many long-established CVs such as these pre-date NewsML-G2. In other circumstances, the identifier may add no value, because only some basic property, such as <name> needs to be conveyed.

A @literal is an identifier which is intended to be processed by software; it is not intended to be a natural-language label. If the name of a concept identified by @literal is intended for display, the IPTC recommends that providers SHOULD add the <name> child element for inter-operability and language-independent processing. If no human-readable property is available, receivers MAY use the @literal value for display purposes.

14.12.1. Use of @literal in a NewsML-G2 Item

Literals may be used in the following cases:

1. As an identifier for linking with an assert element inside a NewsML-G2 document. In this case the literal value could be a random one. If a literal value is used with an assert element then all instances of that literal value in that item must identify the same concept.

2. When a code from a vocabulary which is known to the provider and the recipient is used without a reference to the vocabulary. The details of the vocabulary are, in this case, communicated outside NewsML-G2. Such a contract could express that a specific vocabulary of literals is used with a specific property.

3. When importing metadata which may contain codes which have not yet been checked to be from an identified vocabulary: the code values are represented as literals until the vocabulary is identified; thereafter, a controlled identifier can be used.

The following rules govern the use of @literal:

  • A @literal value can only be used to identify a concept within the local scope of an Item.

  • The use of @literal and a controlled identifier (either @qcode, @uri or both) is mutually exclusive.

  • There can be no guarantee that all instances of a @literal value used in an Item identify the same concept. However, when @literal is used with an assert element, providers MUST ensure all instances of that literal value in the Item identify the same concept. If the provider uses the same literal value for different concepts, an assert element using this literal value MUST NOT be used, as the concept is indeterminate.
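
The mutual exclusivity rule lends itself to a simple validation step. The sketch below assumes a property's attributes are available as a dictionary; the function name identifier_kind and its return values are hypothetical.

```python
def identifier_kind(attributes):
    """Classify a property's identifier and reject invalid combinations."""
    has_literal = "literal" in attributes
    has_controlled = "qcode" in attributes or "uri" in attributes
    if has_literal and has_controlled:
        # @literal and a controlled identifier are mutually exclusive.
        raise ValueError("@literal cannot be combined with @qcode or @uri")
    if has_literal:
        return "literal"
    if has_controlled:
        return "controlled"
    return "none"


print(identifier_kind({"literal": "org-123"}))
print(identifier_kind({"qcode": "sc:04000000"}))
print(identifier_kind({}))
```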

14.12.2. Properties with no identifier

It is permissible for a NewsML-G2 element with a Flexible Property type to have no concept identifier:

<provider>
    <name>Getty Images North America</name>
</provider>

As a special case, when a <bag> child element is used with a property to create a composite concept, a concept identifier MUST NOT be used with the parent property (neither @qcode, nor @uri nor @literal). The new composite concept is created by the multiple existing concepts identified by a @qcode in each <bit> element of the <bag>:

<subject>
    <name>Bread</name>
    <bag>
        <bit qcode="ingredient:flour"/>
        <bit qcode="ingredient:water"/>
        <bit qcode="ingredient:yeast"/>
    </bag>
</subject>

14.13. Full list of the QCode type attributes and their URI siblings

Attribute Name     | URI Sibling Attribute | Attribute Name  | URI Sibling Attribute
-------------------|-----------------------|-----------------|----------------------
qcode              | uri                   | radunit         | radunituri
accesstype         | accesstypeuri         | ratertype       | ratertypeuri
aspect             | aspecturi             | ratingtype      | ratingtypeuri
audiochannels      | audiochannelsuri      | rel             | reluri
audiocodec         | audiocodecuri         | rendition       | renditionuri
colourindicator    | colourindicatoruri    | renditionref    | renditionrefuri
colourspace        | colourspaceuri        | reposrole       | reposroleuri
confirmationstatus | confirmationstatusuri | representation  | representationuri
creator            | creatoruri            | role            | roleuri
durationunit       | durationunituri       | scaleunit       | scaleunituri
encoding           | encodinguri           | scope           | scopeuri
environment        | environmenturi        | severity        | severityuri
format             | formaturi             | symbolsrc       | symbolsrcuri
hashtype           | hashtypeuri           | target          | targeturi
heightunit         | heightunituri         | tech            | techuri
how                | howuri                | timeunit        | timeunituri
idformat           | idformaturi           | type            | typeuri
interactiontype    | interactiontypeuri    | valcalctype     | valcalctypeuri
jobtitle           | jobtitleuri           | valueunit       | valueunituri
layoutorientation  | layoutorientationuri  | videocodec      | videocodecuri
market             | marketuri             | videodefinition | videodefinitionuri
marketlabelsrc     | marketlabelsrcuri     | videoscaling    | videoscalinguri
mode               | modeuri               | why             | whyuri
part               | parturi               | widthunit       | widthunituri
pubconstraint      | pubconstrainturi      |                 |
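
The naming pattern in this table is regular enough to be derived rather than stored. The helper below is a hypothetical sketch: every attribute gains the suffix "uri", with @qcode as the one exception, whose sibling is simply @uri.

```python
def uri_sibling(attribute_name):
    """Return the name of the URI sibling for a QCode type attribute."""
    if attribute_name == "qcode":
        return "uri"  # the only entry that does not follow the suffix rule
    return attribute_name + "uri"


for name in ("qcode", "rendition", "colourspace", "layoutorientation"):
    print(name, "->", uri_sibling(name))
```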

14.14. Properties affected by changed cardinality of QCode attributes

(The mandatory use of the QCode Attribute was lifted.)

Property Name               | QCode Attribute Name | URI Attribute Name
----------------------------|----------------------|----------------------
action                      | qcode                | uri
itemClass                   | qcode                | uri
pubStatus                   | qcode                | uri
…./itemMeta/role            | qcode                | uri
…./partMeta/role            | qcode                | uri
…./itemMeta/service         | qcode                | uri
…./newsCoverage/service     | qcode                | uri
signal                      | qcode                | uri
bit                         | qcode                | uri
conceptId                   | qcode                | uri
type                        | qcode                | uri
confirmation*               | qcode                | uri
occurStatus                 | qcode                | uri
newsCoverageStatus          | qcode                | uri
accessStatus                | qcode                | uri
hash                        | hashtype             | hashtypeuri
userinteraction             | interactiontype      | interactiontypeuri
circle                      | radunit              | radunituri
all <..ExtProperty>         | rel                  | reluri
group                       | role                 | roleuri
rating                      | scaleunit            | scaleunituri
timedelim                   | timeunit             | timeunituri
eventDetails/dates/start    | confirmationstatus   | confirmationstatusuri
eventDetails/dates/end      | confirmationstatus   | confirmationstatusuri
eventDetails/dates/duration | confirmationstatus   | confirmationstatusuri

*Note: property deprecated

15. Events in NewsML-G2

15.1. A standard for exchanging news event information

The sharing of event-related information and planning of news coverage is a core activity of news organisations, without which they cannot function effectively.

News agencies need to keep their customers informed of upcoming events and planned coverage. News organisations publishing on paper and digital media need to plan and co-ordinate their operations in order to make optimum use of their available resources and ensure their target audiences will be properly served.

Historically, this was a paper-based exercise, with news desks maintaining a Day Book, or Diary, and circulating information to colleagues and partners using written memoranda, sometimes referred to as the Schedule, or Budget.

Many organisations have moved, or are moving, to electronic scheduling applications. With software developers and vendors working independently on these applications, there is a risk that incompatibility will inhibit the exchange of information and reduce efficiency.

Consequently, there has been a drive among IPTC members to formalise a standard for exchanging this information in a machine-readable events format using XML, allowing it to be processed using standard tools and enabling compatibility with other XML-based applications and popular calendaring applications such as Microsoft Outlook.

Originally a separate EventsML-G2 standard, the Events Calendar and Scheduling model is merged with NewsML-G2 to create a single standard focused on the needs of a professional news industry workflow.

The related NewsML-G2 Planning Item can be combined with Event information so that news organisations can plan their response to news events, such as job assignments, content planning and content fulfilment, sharing this if required with partners in a workflow. (See Editorial Planning – the Planning Item)

NewsML-G2 Events may be used to send and receive all, or part of, the information about:

  • a specific news event.

  • a range of news events filtered according to some criteria – an event listing.

  • updates to news events.

  • people, organisations, objects and other concepts linked to news events.

15.1.1. Business Advantages

An event managed using NewsML-G2 can serve as the "glue" that binds together all of the content related to a news story. For example, a news organisation learns of the imminent merger of two companies. This story can be pre-planned using Events in NewsML-G2 and assigned a unique Event ID by the event planning workflow, in the form of a QCode. When content related to this event is created (text, pictures, graphics, audio, video and packages), all of the separately-managed content can be associated using the QCode as a reference.

The result is that when users view content about this story, they can be provided with navigation to any other related content, or they can search for the related content.

The NewsML-G2 Events standard is the result of detailed collaborative work by IPTC experts operating in diverse markets throughout North America, Europe and Asia Pacific, highly experienced in the planning of news operations and the issues involved.

Adopters therefore have access to an "off the shelf" data model built on the specific needs of the news industry that can nevertheless be extended by individual organisations where necessary to add specialised features. The Events model is evolving as new requirements become known, and the IPTC endeavours to maintain compatibility with previous versions, giving users a straightforward upgrade path.

The NewsML-G2 Events model complies with the Resource Description Framework (RDF) promulgated by the W3C, which is a basic building block of the Semantic Web, and aligns with the iCalendar specification.

A mapping of iCalendar properties to NewsML-G2 properties can be found in the NewsML-G2 Specification Page on the NewsML-G2 web site. See also the IETF iCalendar Specification (RFC 5545), which is supported by popular calendar and scheduling applications.

There is considerable scope for using events planning to improve the efficiency and quality of news production, since an estimated 50-80% of news provision is of events that are known about in advance. When a pre-planned event is accessible from an editorial system, metadata from the event may be inherited by the news content associated with that event; this makes the handling of the news faster and more consistent. There is also an improvement in quality, since the appropriateness and accuracy of metadata may be checked at the planning stage, rather than under the pressure of a deadline.

The advantages of using a common standard to promote the efficient exchange of information are well understood. Using NewsML-G2 Events, providers can develop planning and scheduling products with greater confidence that the information can be consumed by their customers; recipients can cut development costs and time to market for the savings and services that flow from an efficient resource planning system that aligns with their operational model.

15.1.2. Events Structure

Events in NewsML-G2 use the same building blocks as all other NewsML-G2 Items:

  • Events use NewsML-G2 identification and versioning properties.

  • The <itemMeta> block holds management information about the Item.

  • The <contentMeta> block holds common administrative metadata about the event, or events, conveyed by the NewsML-G2 object.

Please read the Quick Start Guide to NewsML-G2 Basics before reading this Chapter.

There are two methods for expressing events using NewsML-G2, each suited to a particular type of information, or application, as shown in the diagram below:

  • Persistent event information that will be referenced by other events and by other NewsML-G2 Items is instantiated as Concept Items containing an <eventDetails> wrapper, or as members of a set of Event Concepts in a Knowledge Item. Events expressed as a Concept with a Concept ID are persistent and can be unambiguously referenced by other Items over time.

  • Transient event information that is "standalone" and volatile can be conveyed in NewsML-G2 using an <events> structure inside a News Item. Events expressed as an <event> in a News Item have no Concept ID and cannot be referenced from other Items.

These differences are illustrated in the following diagram:

[Diagram: Event as a Concept or as a News Item]
Figure: Persistent Event as a Concept (left) and transient Events in a News Item (right)

Managed (persistent) events would be appropriate when a provider makes each of the announced events uniquely identifiable. The event information can then be stored by the receivers and any updates to events can be managed. This model also enables content to be linked to events using the unique event identifier, and this enables linking and navigation of content related to news stories.

A "standalone" implementation would be used when an organisation periodically announces to its partners and customers lists of forthcoming events. These may be for information only and not managed by the provider. So, for example, a daily list may repeat items that appeared in a weekly list, but there is no link between them, and nothing to indicate when an event has been updated.

15.2. Event Information – What, Where, When and Who

In a news context, events are newsworthy happenings that may result in the creation of journalistic content. Since news involves people, organisations and places, NewsML-G2 has a flexible set of properties that can convey details of these concepts. There is also a fully-featured date-time structure to express event occurrences, which conforms to the iCalendar specification.

15.2.1. Complete Listing for an example Event

The following example shows a news event expressed as a Concept and conveyed within a Concept Item.

LISTING 14: Event sent as a Concept Item

All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: event, frel, ventyp.

<?xml version="1.0" encoding="UTF-8"?>
<conceptItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20160422:qqwpiruuew4711"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/events/event-catalog.xml" />
    <itemMeta>
        <itemClass qcode="cinat:concept" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-10-18T13:05:00Z</versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <urgency>5</urgency>
        <contentCreated>2016-01-16T12:15:00Z</contentCreated>
        <contentModified>2017-06-12T13:35:00Z</contentModified>
    </contentMeta>
    <concept>
        <conceptId created="2017-01-16T12:15:00Z" qcode="event:1234567" />
        <type qcode="cpnat:event" />
        <name>IPTC Autumn Meeting 2017</name>
        <eventDetails>
            <dates>
                <start>2017-10-26T09:00:00Z</start>
                <duration>P2D</duration>
            </dates>
            <location>
                <name>86, Edgeware Road, London W2 2EA, United Kingdom</name>
                <related rel="frel:venuetype" qcode="ventyp:confcentre" />
                <POIDetails>
                   <position latitude="51.515659" longitude="-0.163346" />
                   <contactInfo>
                       <web>https://www.etcvenues.co.uk</web>
                   </contactInfo>
                </POIDetails>
            </location>
            <participant qcode="eprol:director">
                <name>Michael Steidl</name>
                <personDetails>
                   <contactInfo>
                       <email>mdirector@iptc.org</email>
                   </contactInfo>
                </personDetails>
            </participant>
        </eventDetails>
    </concept>
</conceptItem>

The top level element of the Concept Item is <conceptItem>. The document must be uniquely identified using a GUID. Event information that is re-sent using the same GUID and an incremented version number allows the receiver to manage, update or replace the conveyed concept (event) information.

@guid and @version uniquely identify the Concept Item, for the purpose of managing and updating the event information. Items that reference the event itself MUST use the Concept ID. This is because the Concept ID uniquely references a persistent Web resource, whereas the GUID only identifies a document that may or may not persist.
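The update behaviour described above can be sketched from the receiver's side. This is a minimal illustration, not part of the standard: the class and method names are hypothetical, and the rule of ignoring an item whose version is not higher than the stored one is an assumption about how a receiver might choose to behave.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConceptItemVersion:
    guid: str      # value of @guid on <conceptItem>
    version: int   # value of @version on <conceptItem>
    payload: str   # the serialised item, kept opaque in this sketch


class EventStore:
    """Hypothetical receiver-side store of Concept Items, keyed by GUID."""

    def __init__(self) -> None:
        self._items: Dict[str, ConceptItemVersion] = {}

    def receive(self, item: ConceptItemVersion) -> bool:
        """Store the item if it is new or newer; return True if it was stored."""
        current = self._items.get(item.guid)
        if current is not None and item.version <= current.version:
            return False  # stale or duplicate delivery: keep what we have
        self._items[item.guid] = item
        return True

    def latest(self, guid: str) -> Optional[ConceptItemVersion]:
        """Return the most recent version held for this GUID, if any."""
        return self._items.get(guid)
```

Note that the store is keyed by GUID only for document management; content that refers to the event itself would carry the Concept ID QCode, not the GUID.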

To enable concepts to be identified by a Concept ID QCode, a reference to the provider’s catalog (or a catalog statement containing the scheme URI) MUST be included:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<conceptItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:iptc.org:20160422:qqwpiruuew4711"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en">
    <catalogRef
        href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef
        href="http://www.example.com/events/event-catalog.xml" />
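To show why the catalog reference matters, the following sketch resolves a QCode to a full concept URI by looking up its scheme alias and appending the code to the scheme URI. The function name and the in-memory catalog are hypothetical; the "cpnat" scheme URI is the one given in this chapter, while the URI shown for the "event" alias is an invented example value standing in for whatever the provider's catalog declares.

```python
from typing import Mapping

# Alias-to-scheme-URI pairs, as a receiver might hold them after loading catalogs.
# The "event" entry is an assumed example value, not a real scheme.
CATALOG = {
    "cpnat": "http://cv.iptc.org/newscodes/cpnature/",
    "event": "http://www.example.com/events/",
}


def resolve_qcode(qcode: str, catalog: Mapping[str, str]) -> str:
    """Expand 'alias:code' into the concept URI (scheme URI followed by code)."""
    alias, separator, code = qcode.partition(":")
    if not separator or not alias or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in catalog:
        raise KeyError(f"scheme alias {alias!r} is not declared in any catalog")
    return catalog[alias] + code
```

Without a catalog entry for the alias, the receiver has no way to turn the Concept ID into a globally unique identifier, which is why the sketch raises an error in that case.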

In the mandatory <itemMeta> wrapper the IPTC "Nature of Concept Item" NewsCodes expresses the type of Concept Item. (This is complementary to the "Nature of News Item" NewsCodes used with a News Item.) There are currently two values: "concept" and "scheme". (Scheme is used for Knowledge Items.)

<itemMeta>
    <itemClass qcode="cinat:concept" />
    <provider qcode="nprov:IPTC" />
    <versionCreated>2017-10-18T13:05:00Z</versionCreated>
    <pubStatus qcode="stat:usable" />
</itemMeta>

The Content Metadata for a Concept Item may contain only Administrative Metadata:

<contentMeta>
    <urgency>5</urgency>
    <contentCreated>2016-01-16T12:15:00Z</contentCreated>
    <contentModified>2017-06-12T13:35:00Z</contentModified>
</contentMeta>

15.2.2. What is the Event?

In order to convey event information, we first need to describe "what" the event is. In NewsML-G2, events MUST have at least one Event Name, a natural language name for the event. They MAY additionally have one or more natural language Definitions, properties that describe characteristics of the event, and Notes.

This Event content is contained in the <concept> wrapper. A Concept Item has one, and only one, <concept> wrapper. Each Concept must carry a <conceptId>, a QCode value that will provide a way for other Items to reference the Concept. The type of concept being conveyed is expressed using the <type> element with a QCode to tell the receiver that it contains Event information, using the IPTC "Nature of Concept" NewsCodes. (The applicable value for an Event Concept is "event".) The Scheme URI is http://cv.iptc.org/newscodes/cpnature/ and the recommended Scheme Alias is "cpnat".

The event information is placed directly inside an <eventDetails> wrapper:

<concept>
    <conceptId created="2017-01-16T12:15:00Z" qcode="event:1234567"/>
    <type qcode="cpnat:event" />
    <name>IPTC Autumn Meeting 2017</name>
    <eventDetails>
        ...
    </eventDetails>
</concept>
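As an illustration of how a receiver might read the "what" of an event, the sketch below parses a <concept> with Python's standard library and checks the two requirements stated above: the concept type must identify an Event, and at least one name must be present. The function name is hypothetical, and comparing the literal string "cpnat:event" assumes the recommended scheme alias is in use; a production receiver would resolve the QCode through the catalog instead.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

SAMPLE = """<concept xmlns="http://iptc.org/std/nar/2006-10-01/">
    <conceptId qcode="event:1234567" />
    <type qcode="cpnat:event" />
    <name>IPTC Autumn Meeting 2017</name>
    <eventDetails>
        <dates>
            <start>2017-10-26T09:00:00Z</start>
            <duration>P2D</duration>
        </dates>
    </eventDetails>
</concept>"""


def summarise_event(xml_text: str) -> dict:
    """Extract the identifier, names and dates of an Event Concept."""
    concept = ET.fromstring(xml_text)
    concept_type = concept.find("nar:type", NS)
    # Assumption: the recommended alias "cpnat" is used for Nature of Concept.
    if concept_type is None or concept_type.get("qcode") != "cpnat:event":
        raise ValueError("not an Event Concept")
    names = [n.text.strip() for n in concept.findall("nar:name", NS) if n.text]
    if not names:
        raise ValueError("an Event must have at least one <name>")
    concept_id = concept.find("nar:conceptId", NS)
    if concept_id is None:
        raise ValueError("a Concept must carry a <conceptId>")
    dates = "nar:eventDetails/nar:dates/"
    return {
        "id": concept_id.get("qcode"),
        "names": names,
        "start": concept.findtext(dates + "nar:start", namespaces=NS),
        "duration": concept.findtext(dates + "nar:duration", namespaces=NS),
    }
```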

The generic properties of Events are similar to those of Concepts, covered in Concepts and Concept Items. In this framework, we can expand the "what" information of an Event by indicating one or more relationships to other events, using the properties of Broader, Narrower, Related and SameAs. (See Event Relationships)

15.2.3. When does the Event take place?

The "when" of an event uses the Dates wrapper to express the dates and times of events: the start, end, duration and recurrence.

<dates>
    <start>2017-10-26T09:00:00Z</start>
    <duration>P2D</duration>
</dates>

Although start and end times may be specified precisely, in the real world of news, the timing of events is often imprecise. In the early stages of planning an event, the day, month or even just the year of occurrence may be the only information available. Providers also need to be able to indicate a range of dates and/or times of an event, with a "best guess" at the likely date-time. (See Dates and times for details)

15.2.4. Where does the Event take place?

The "where" of an event is expressed using Location, a rich structure containing detailed information of the event venue (or venues), including GPS coordinates, seating capacity, travel routes etc.

<location>
    <name>86, Edgeware Road, London W2 2EA, United Kingdom</name>
    <related rel="frel:venuetype" qcode="ventyp:confcentre" />
    <POIDetails>
        <position latitude="51.515659" longitude="-0.163346" />
        <contactInfo>
            <web>https://www.etcvenues.co.uk</web>
        </contactInfo>
    </POIDetails>
</location>

Note that if in-depth location details are given in the Event structure, this is a "one-time" use of the information. This structure might be better used as part of a controlled vocabulary of locations, in which case the structures may be copied from a referenced concept containing the location information.

See Further Properties of Event Details for details of Location and a comprehensive list of other Event properties.

15.2.5. Who is involved?

Details of "who" will be present at an event are given using the Participant property. This can be expressed simply using a QCode, URI or literal value, supplemented by a human-readable Name property, or optionally using additional properties describing the participants, their roles at the event, and a wealth of related information.

<participant qcode="eprol:director">
    <name>Michael Steidl</name>
    <personDetails>
        <contactInfo>
            <email>mdirector@iptc.org</email>
        </contactInfo>
    </personDetails>
</participant>

Note the use of the IPTC Event Participant Role NewsCodes (Scheme URI http://cv.iptc.org/eventparticipantrole/), with a recommended Scheme Alias of "eprol".

The event organisers are also an important part of the "who" of an event. A set of Organiser and Contact Information properties can give precise details of the people and organisations responsible for an event, and how to contact them. See Further Properties of Event Details for details.

15.3. Event Coverage

The NewsML-G2 Planning Item is used to inform customers about content they can expect to receive and if necessary the disposition of staff and resources. (See Editorial Planning – the Planning Item).

The link between an Event and the optional Planning Item is created using the Event ID. The Planning Item conveys information about the content that is planned in response to the Event, including the types of content and quantities (e.g. expected number of pictures). The Planning Item also has a Delivery structure which enables the tracking and fulfilment of content related to an Event.

15.4. Event Properties in more detail

The examples below show the basic properties of events, starting with the Name and Definition of the event, with other descriptive details, moving on to show how relationships, location, dates and other details may be expressed.

15.4.1. Event Description

In the code sample that follows, we will begin to create an event using four properties:

  • Event Name is an internationalised string giving a natural language name of the event. More than one may be used, for example if the Name is expressed in multiple languages.

  • Two Definitions, differentiated by @role ("short" and "long"), will be created using the Block element template (which allows some mark-up).

  • Notes – also Block type elements – give some additional natural language information, which is not naturally part of the event definition, again using @role if required.

  • Using Related, the nature of the Event can be refined and other properties of an Event can be expressed.

| Property | Type | Cardinality | Notes |
|---|---|---|---|
| <name> | Intl String | 1..n | The name of the event, with optional @role for each <name> used. |
| <definition> | Block | 0..n | A natural language definition. |
| <related> | Flexible | 0..n | The nature of the event, and other information. The property is repeatable and may have a @rel attribute to indicate the relationship to the characteristic. See the examples for alternative ways to express related concepts. |
| <note> | Block | 0..n | A repeatable natural language note of additional information about the event, with an optional @role. |

For example:

<name xml:lang="en">
    Bank of England Monetary Policy Committee
</name>
<definition xml:lang="en" role="definitionrole:short">
    Monthly meeting of the Bank of England committee
    that decides on bank lending rates for the UK.
</definition>
<definition xml:lang="en" role="definitionrole:long">
    The Bank of England Monetary Policy Committee meets each
    month to decide on the minimum rate of interest that will
    be charged for inter-bank lending in the UK financial
    markets. <br />
</definition>
<!-- Using related with a QCode -->
<related qcode="eventrel:meeting" />
<!-- alternative using URI -->
<related uri="http://www.example.com/meeting" />
<!-- alternative using value and valuedatatype -->
<related rel="csrel:admission" value="7.0" valueunit="iso4217:EUR"
    valuedatatype="xs:decimal" />
<note role="noterole:toeditors">
    Note to editors: an embargoed press release of the
    minutes of the meeting will be released by the COI
    within two weeks.
</note>
<note role="noterole:general">
    The meeting was delayed by two days due to illness.
</note>

15.4.2. Event Relationships

Large news events may be split into a series of smaller, manageable events, arranged in a hierarchy which can express parent-child and peer relationships.

A "master" event may be notionally split into sub-events, which in turn may be split into further sub-events without limit. Each event instance can be managed separately, yet handled and conveyed within the context of the larger realm of events of which it forms a part, using the concept relationship properties <broader>, <narrower>, <related> and <sameAs>.

Figure: Hierarchy of Events created using Event Relationships

It is important to note from the above diagram that Broader, Narrower and Same As express relationships between Concepts of the same type, in this case Events. Related has no such restriction. Thus:

  • Aquatics is Broader than Breast Stroke or Swimming or Diving, but not Opening Speech.

  • Diving is Narrower than Aquatics, but not Archery.

  • Aquatics may be the Same As Aquatics in some other taxonomy, but not necessarily the Same As Swimming or Diving in some other taxonomy.

  • Breast Stroke may be Related to an Event, a Person or other Concept Type.

The examples below show the use of the event relationship properties for a fictional swimming event, the Men’s 100m Freestyle:

| Property | Type | Notes |
|---|---|---|
| <broader> | Flexible Property (CCL) / Related Concept (PCL) | Repeatable. The event may be part of another event, in which case this can be denoted by <broader>. |
| <narrower> | Flexible Property / Related Concept (PCL) | Repeatable. May be used to indicate that the event has related child events. In this case we want to notify the receiver that a child of this event is the scheduled medal ceremony. |
| <related> | Flexible Property / Related Concept (PCL) | Repeatable. May be used to denote a relationship to another concept or event. In this case, we want to link the event to an organisation which is not a participant, but may later form part of the coverage of the event. We qualify the relationship using @rel and the IPTC Item Relation NewsCodes. |
| <sameAs> | Flexible Property / Related Concept (PCL) | Repeatable. May be used to denote that this event is the same event in another taxonomy, for example another governing body’s taxonomy of events. |

For example:

<broader
    type="cpnat:event" qcode="events:TR2012-34625">
    <name>Olympic Swimming Gala</name>
</broader>
<narrower qcode="events:TR2012-34593">
    <name>Men’s 100m Freestyle Medal Ceremony</name>
</narrower>
<related
    rel="irel:seeAlso"
    type="cpnat:organisation"
    qcode="org:asa">
    <name>Amateur Swimming Association</name>
</related>
<sameAs qcode="iocevent:xxxxxx">
    <name>Men’s 100m Freestyle</name>
</sameAs>

15.4.3. Event Details Group

The first set of properties of Event Details is date-time information, as described below. The further properties are described in Further Properties of Event Details.

Dates and times

The IPTC intends that event dates and times in NewsML-G2 align with the iCalendar (iCal) specification. This does not mean that dates and times are expressed in the same FORMAT as iCalendar, but the implementation of NewsML-G2 properties that match iCalendar properties should be as set forth in the iCalendar specification.

The <dates> property contains the dates and times of the event, expressed using the child elements <start>, <end> and <duration>:

  • The <start> is the date, optionally with a time, on which the event starts.

  • The <end> is the date, optionally with a time, at which the event finishes. Note that the end dates/times of events are non-inclusive. Therefore a one-day event on September 14, 2011 would have a start date (2011-09-14) and would have EITHER an end date of September 15 (2011-09-15) OR a duration of one day (for the syntax, see the table below).

  • The <duration> of the event may be used in place of <end>. Either may be used, but not both.
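The non-inclusive end date can be illustrated with a short calculation. The sketch below, which uses hypothetical function names and covers only all-day events measured in whole days, derives the exclusive end date from a start date and shows the two alternative children of <dates> that express the same one-day event.

```python
from datetime import date, timedelta
from typing import Tuple


def exclusive_end(start: date, days: int) -> date:
    """Non-inclusive end date of an all-day event that lasts `days` days."""
    if days < 1:
        raise ValueError("an all-day event lasts at least one day")
    return start + timedelta(days=days)


def dates_alternatives(start: date, days: int) -> Tuple[str, str]:
    """Return the two equivalent forms: an <end> element or a <duration> element.

    Only one of the two may appear inside <dates>, never both.
    """
    end = exclusive_end(start, days)
    return (f"<end>{end.isoformat()}</end>", f"<duration>P{days}D</duration>")
```

For the one-day event on September 14, 2011 this yields an end date of 2011-09-15 or, equivalently, a duration of P1D.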

The NewsML-G2 syntax for expressing start and end times of events is a valid calendar date with optional time and offset; the following are valid:

  • 2011-09-22T22:32:00Z (UTC)

  • 2011-09-22T22:32:00-05:00

  • 2011-09-22

The following are NOT valid:

  • 2011-09-22T22:32Z (if time is used, all parts MUST be present).

  • 2011-09-22T22:32:00 (if time is used, time zone MUST be present).
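A receiver could screen incoming values against these rules before processing them. The following is a minimal sketch with a hypothetical function name. It covers only full calendar dates with an optional full time, not the truncated dates that some properties permit, and it assumes time zone offsets are written in the XML Schema form ±hh:mm.

```python
import re
from datetime import date, datetime

_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
_DATETIME = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(Z|[+-]\d{2}:\d{2})")


def is_valid_event_datetime(value: str) -> bool:
    """True for a full date, or a full date with full time and time zone."""
    try:
        if _DATE.fullmatch(value):
            date.fromisoformat(value)          # rejects e.g. 2011-02-30
            return True
        if _DATETIME.fullmatch(value):
            # Python's parser wants an explicit offset, so map Z to +00:00.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
            return True
    except ValueError:
        return False
    return False
```

The regular expressions enforce the shape (seconds and time zone present) and the standard-library parsers then reject impossible calendar values.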

When specifying the duration of an event, the date-part values permitted by iCalendar are W(eeks) and D(ays). In XML Schema, the only permitted date-part values are Y(ears), M(onths) and D(ays). (The permitted time parts H(ours), M(inutes) and S(econds) are the same in both XML Schema and iCalendar.)

The following table shows the permitted values for date-part in both standards:

| Duration | XML Schema | iCalendar |
|---|---|---|
| D(ays) | Yes | Yes |
| W(eeks) | No | Yes |
| M(onths) | Yes | No |
| Y(ears) | Yes | No |

Since NewsML-G2 uses XML Schema Duration, ONLY the values listed as permitted under the XML Schema column can be used, and therefore W(eeks) MUST NOT be used.

In addition the IPTC recommends that to promote inter-operability with applications that use the iCalendar standard, Y(ears) or M(onths) SHOULD NOT be used; only D(ays) should be used for the date part of duration values.

Duration units can be combined, in descending order from left to right, for example:

  • P2D (duration of 2 days)

  • P1DT3H (1 day and 3 hours – note the "T" separator)
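The duration rules above can be sketched as a small parser. The function name is hypothetical, and the sketch is deliberately stricter than XML Schema: it rejects Y(ears) and M(onths) date parts in line with the IPTC recommendation, rejects W(eeks) because XML Schema does not allow them, and does not handle fractional seconds or negative durations.

```python
import re
from datetime import timedelta

# Date part limited to days; time part may have hours, minutes and seconds.
_DURATION = re.compile(
    r"P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'P1DT3H' into a timedelta."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    parts = {name: int(value) for name, value in match.groupdict().items() if value}
    # 'P', 'PT' and a dangling 'T' (as in 'P1DT') carry no usable information.
    if not parts or text.endswith("T"):
        raise ValueError(f"incomplete duration: {text!r}")
    return timedelta(**parts)
```

Because the units must appear in descending order, a single pattern is enough; a value such as P3HT1D does not match and is rejected.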

iCalendar specifies that if the start date of an event is expressed as a Calendar date with no time element, then the end date MUST be to the same scale (i.e. no time) or if using duration, the ONLY permitted values are W(eeks) or D(ays). In this case, only D(ays) may be used in NewsML-G2.

The following table lists the date-time properties of events details:

Property Type Notes/Example

<start>

Approximate Date Time

Mandatory, non-repeatable property with optional attributes: @approxstart, @approxend and @confirmationstatus (or its URI sibling @confirmationstatusuri).

The date part may be truncated, starting on the right (days) according to the precision required, but MUST, at minimum, have a year, for example:

<start>2011-06-12T12:30:00Z</start>

or

<start>2011-06-12</start>
If used, the time must be present in full, with time zone, and ONLY in the presence of the full date.

The value of <start> expresses the date, optionally with a time, on which the event starts. With the information available, this might be a "best guess".

By using @approxstart and @approxend it is possible to qualify the start date-time by indicating the range of date-times within which the start will fall. (Note: these are NOT the approximate start and end of the event itself, only the range of start date-times)

@approxstart indicates the start of the range. If used on its own, the end of the range of dates is the date-time value of <start>

For example, a possible start of an event on June 12, 2011, not before June 11, 2011, and no later than June 14 2011, would be expressed as:

<start
    approxstart="2011-06-11"
    approxend="2011-06-14">
    2011-06-12
</start>
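A receiver might turn such a <start> element into an explicit range as sketched below. The function name is hypothetical and the sketch handles full calendar dates only. When only @approxstart is present, the range ends at the value of <start>, as stated above; treating a lone @approxend symmetrically (range starting at the value of <start>) is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Tuple


def start_range(start_xml: str) -> Tuple[date, date, date]:
    """Return (earliest start, best guess, latest start) for a <start> element."""
    element = ET.fromstring(start_xml)
    best_guess = date.fromisoformat(element.text.strip())
    earliest = element.get("approxstart")
    latest = element.get("approxend")
    return (
        date.fromisoformat(earliest) if earliest else best_guess,
        best_guess,
        date.fromisoformat(latest) if latest else best_guess,
    )
```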

The Confirmation Status of <start> is expressed by a QCode using one of the recommended IPTC Event Date Confirmation NewsCodes (Scheme URI http://cv.iptc.org/newscodes/eventdateconfirm/, recommended Scheme Alias "edconf"). This has values:

  • approximate

  • confirmed

  • undefined

The @confirmationstatus may also be applied to <end> or <duration>, described below.

<end>

Approximate Date Time

Non-repeatable element to indicate the non-inclusive end time of the event, and optionally a range of values in which it may fall, using the same property type and syntax as for <start>

The <dates> wrapper may contain either an <end> date, or a <duration> but not both. A non-inclusive <end> date means, for example, that a one-day event starting on November 12, 2012 would have an end date of November 13, 2012.

<duration>

XML Schema Duration

Non-repeatable. The time period during which the event takes place is expressed in the form:

PnYnMnDTnHnMnS

P indicates the Period (required)

nY = number of Years*

nM = number of Months*

nD = number of Days

T indicates the start of the Time period (required if a time part is specified)

nH = number of Hours

nM = number of Minutes

nS = number of Seconds

* The IPTC recommends that these units should NOT be used.

Example:

<duration>PT3H</duration>

The event will last for three hours. Note use of the "T" time separator even though no Date part is present.

The <dates> wrapper may contain either an <end> date, or a <duration> but not both.

<confirmation>

QCode

DEPRECATED as of NewsML-G2 v2.24. The <confirmation> property is replaced by a @confirmationstatus attribute (alternative: @confirmationstatusuri) on the Event Details properties <start>, <end> and <duration>. See the description of these properties, above.

Recurrence Properties

This is a group of optional properties that may be used to specify the recurring instances of an event, and conforms to the iCalendar specification, including the use of the same enumerated values for properties such as Frequency (@freq). Recurrence MUST be expressed using EITHER

<rDate>: one or more explicit date-times that the event is repeated, OR

<rRule>: one or more rules of recurrence.

Property Type Notes/Example

<rDate>

Date with optional Time

Recurrence Date. Repeatable. If the recurrence occurs on a specific date, with an optional time part, or on several specific dates and times.

<rDate>2011-03-27T14:00:00Z</rDate>
<rDate>2011-04-03T16:00:00Z</rDate>

<rRule>

Recurrence Rule

Repeatable. The property has a number of attributes that may be used to define the rules of recurrence for the event.

The only mandatory attribute is @freq, an enumerated string denoting the frequency of recurrence.

<rRule freq="MONTHLY" />

The enumerated values of @freq are:

  • YEARLY

  • MONTHLY

  • WEEKLY

  • DAILY

  • HOURLY

  • MINUTELY

  • SECONDLY

@interval indicates how often the rule repeats, as a positive integer. The default is "1", indicating that, for example, an event with a frequency of DAILY is repeated EACH day. To repeat an event every four years, such as the Summer Olympics, the Frequency would be set to "YEARLY" with an Interval of "4":

<rRule
    freq="YEARLY"
    interval="4"
/>

@until sets a Date with optional Time after which the recurrence rule expires:

<rRule
    freq="MONTHLY"
    until="2011-12-31"
/>

@count indicates the number of occurrences of the rule. For example, an event taking place daily for seven days would be expressed as:

<rRule
    freq="DAILY"
    count="7"
/>

A group of @byxxx attributes (as per the iCalendar BYxxx properties) are evaluated after @freq and @interval to further determine the occurrences of an event: @bymonth, @byweekno, @byyearday, @bymonthday, @byday, @byhour, @byminute, @bysecond, @bysetpos.

The following code and explanation is based on an example from the iCalendar Specification at http://www.ietf.org/rfc/rfc5545.txt

<start>2011-01-11T08:30:00Z</start>
<rRule
    freq="YEARLY"
    interval="2"
    bymonth="1"
    byday="SU"
    byhour="8 9"
    byminute="30"
/>

First, the interval="2" would be applied to freq="YEARLY" to arrive at "every other year". Then, bymonth="1" would be applied to arrive at "every January, every other year". Then, byday="SU" would be applied to arrive at "every Sunday in January, every other year".

Then, byhour="8 9" (note that all multiple values are space separated) would be applied to arrive at "every Sunday in January at 8am and 9am, every other year". Then, byminute="30" would be applied to arrive at "every Sunday in January at 8:30am and 9:30am, every other year". Then, lacking information from rRule, the second is derived from <start>, to end up in "every Sunday in January at 8:30:00am and 9:30:00am, every other year".

Similarly, if any of the @byminute, @byhour, @byday, @bymonthday or @bymonth rule parts were missing, the appropriate minute, hour, day or month would have been retrieved from the <start> property.
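The step-by-step evaluation described above can be followed in code. The sketch below is hard-wired to this one example rule rather than being a general recurrence engine, and its function name is hypothetical. It also assumes that candidate date-times falling before the value of <start> are not part of the recurrence set.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def expand_example_rule(start: datetime, last_year: int) -> List[datetime]:
    """Expand freq=YEARLY interval=2 bymonth=1 byday=SU byhour="8 9" byminute=30."""
    occurrences = []
    for year in range(start.year, last_year + 1, 2):        # every other year
        day = datetime(year, 1, 1, tzinfo=start.tzinfo)      # bymonth="1"
        while day.month == 1:
            if day.weekday() == 6:                           # byday="SU" (Monday is 0)
                for hour in (8, 9):                          # byhour="8 9"
                    candidate = day.replace(
                        hour=hour,
                        minute=30,                           # byminute="30"
                        second=start.second,                 # seconds come from <start>
                    )
                    if candidate >= start:                   # assumption: nothing before <start>
                        occurrences.append(candidate)
            day += timedelta(days=1)
    return occurrences
```

With a start of 2011-01-11T08:30:00Z, the first occurrences produced are 08:30 and 09:30 on Sunday, January 16, 2011, and the next year generated is 2013.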

The @bysetpos attribute contains a non-zero integer "n" between ‑366 and 366 to specify the nth occurrence within a set of events specified by the rule. Multiple values are space separated. It can only be used with other @by* attributes.

For example, a rule specifying monthly on any working day would be

<rRule
    freq="MONTHLY"
    byday="MO TU WE TH FR"
/>

The same rule to specify the last working day of the month would be

<rRule
    freq="MONTHLY"
    byday="MO TU WE TH FR"
    bysetpos="-1"
/>
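The effect of @bysetpos on this rule can be checked with a short calculation: the working days of a month form the set, and the position selects one member, counting from the end when negative. This is a sketch for this one rule with a hypothetical function name, not a general implementation of recurrence rules.

```python
import calendar
from datetime import date


def working_day_by_setpos(year: int, month: int, setpos: int) -> date:
    """Pick the nth working day (Monday to Friday) of a month; -1 is the last."""
    if setpos == 0:
        raise ValueError("bysetpos must be a non-zero integer")
    days_in_month = calendar.monthrange(year, month)[1]
    working_days = [
        date(year, month, day)
        for day in range(1, days_in_month + 1)
        if date(year, month, day).weekday() < 5     # Monday=0 ... Friday=4
    ]
    # Positive positions count from the start, negative ones from the end.
    index = setpos - 1 if setpos > 0 else setpos
    return working_days[index]
```

For December 2011, which ends on a Saturday, a position of -1 selects Friday, December 30.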

@wkst indicates the day on which the working week starts, using enumerated values corresponding to the first two letters of the days of the week in English, for example "MO" (Monday) or "SA" (Saturday), as specified by iCalendar.

<rRule
    freq="WEEKLY"
    wkst="MO" />

<exDate>

Date with optional Time

Excluded Date of Recurrence. An explicit Date or Dates, with optional Time, excluded from the Recurrence rule. For example, if a regular monthly meeting coincides with public holidays, these can be excluded from the recurrence set using <exDate>

<rRule
    freq="MONTHLY"
    until="2011-12-31"
/>
<exDate>
    2011-04-06
</exDate>

<exRule>

Recurrence Rule

Excluded Recurrence Rule. The same attributes as <rRule> may be used to create a rule for excluding dates from a recurring series of events. For example, a regular weekly meeting may be suspended during the summer.

<rRule
    freq="WEEKLY"
    until="2011-07-23"
/>
<rRule
    freq="WEEKLY"
    until="2011-12-24"
/>
<exRule
    freq="WEEKLY"
    until="2011-09-03"
/>

Note the order of the above statements: the <rRule> elements must come before <exRule>.

The meaning being expressed is:

"The event occurs weekly until Dec 24, 2011 with a break from after July 23, 2011 until September 3, 2011."

Further Properties of Event Details

The properties of the event details group are wrapped by the <eventDetails> element.

Property Type Notes/Example

<occurStatus>

QCode

Optional, non-repeatable property to indicate the provider’s confidence that the event will occur. The IPTC Event Occurrence Status NewsCodes scheme has values indicating the provider’s confidence of the status of the event, for example "Planned, occurrence uncertain":

<occurStatus qcode="eocstat:eos5" />

<newsCoverageStatus>

Qualified Property

Optional, non-repeatable element to indicate the status of planned news coverage of the event by the provider, using a QCode and (optional) <name> child element:

<newsCoverageStatus qcode="ncstat:int">
    <name>
        Coverage Intended
    </name>
</newsCoverageStatus>

<registration>

Block

Optional, repeatable indicator of any registration details required for the event:

<registration>
    Register online at <br />
    http://www.example.com/registration.aspx/
    <br />
</registration>

The property optionally takes a @role attribute. The IPTC Registration Role NewsCodes may be used; the scheme currently has four values:

  • Exhibitor

  • Media

  • Public

  • Student

<registration role="eregrol:exhibReg">
    Exhibitors must register online at <br />
    http://www.example.com/exhibitor/register.aspx/
    before May 1, 2011<br />
</registration>
<registration role="eregrol:pubReg">
    The public may pre-register online at <br />
    http://www.example.com/public/register.aspx/
    to receive a special bonus pack.<br />
</registration>

<accessStatus>

QCode

Optional, repeatable property indicating the accessibility, the ease (or otherwise) of gaining physical access to the event, for example, whether easy, restricted, difficult. The QCodes represent a CV that would define these terms in more detail. For example, "difficult" may be defined as "Access includes stairways with no lift or ramp available. It will not be possible to install bulky or heavy equipment that cannot be safely carried by one person".

<accessStatus qcode="access:easy" />

<subject>

Flexible Property

Optional, repeatable. The subject classification(s) of the event, for example, using the IPTC Subject NewsCodes:

<subject
    type="cpnat:abstract"
    qcode="medtop:04000000">
    <name xml:lang="en-GB">
        Economy, Business and Finance
    </name>
</subject>
<subject
    type="cpnat:abstract"
    qcode="medtop:20000271">
    <name xml:lang="en-GB">
        Financial and Business Service
    </name>
</subject>
<subject
    type="cpnat:abstract"
    qcode="medtop:20000274">
    <name xml:lang="en-GB">
        Banking
    </name>
</subject>

<location>

Flexible Property/
Flexible Location Property (PCL)

Repeatable property indicating the location of the event with an optional <name>.

<location
    type="cpnat:poi">
    <name>The Bank of England, Threadneedle Street, London, EC2R 8AH, UK</name>
</location>

At PCL, a rich Concept-style structure may be used. (See the NewsML-G2 Specification by visiting www.newsml-g2.org/spec.)

<participant>

Flexible Property/
Flexible Party Property (PCL)

Optional, repeatable. The people and/or organisations taking part in the event. The type of participant is identified by @type and a QCode. The following example indicates a person; an organisation would be indicated (using the IPTC NewsCodes) by type="cpnat:organisation".

<participant
    type="cpnat:person"
    qcode="pers:32965">
    <name xml:lang="en">Paul Tucker</name>
</participant>

An IPTC Event Participant Role NewsCodes scheme is available that holds roles such as "moderator", "director" and "presenter".

<participationRequirement>

Flexible Property

Optional, repeatable element for expressing any required conditions for participation in, or attendance at, the event, expressed by a URI or QCode.

<participationRequirement
    qcode="partreq:accredited">
    <name>Accreditation required</name>
</participationRequirement>

<organiser>

Flexible Property/
Flexible Party Property (PCL)

Optional, repeatable. Describes the organiser of the event.

<organiser
    type="cpnat:organisation"
    uri="http://www.iptc.org/">
    <name xml:lang="en">
        International Press
        Telecommunications Council
    </name>
    <name xml:lang="fr">
        Comité International de
        Télécommunications de Presse
    </name>
</organiser>

The IPTC Event Organiser Role NewsCodes, viewable at http://cv.iptc.org/newscodes/eventorganiserrole/, list types of organiser such as "venue organiser", "general organiser" and "technical organiser".

<contactInfo>

Wrapper element

Indicates how to get in contact with the event. This may be a web site, or a temporary office established for the event, not necessarily the organiser or any participant. See Contact Information

<language>

-

Optional, repeatable element describing the language(s) associated with the event, using @tag with values that must conform to the IETF’s BCP 47. An optional child element <name> may be added.

<language tag="en">
    <name>English</name>
</language>

<newsCoverage>

DEPRECATED. Should not be used with Items conforming to EventsML-G2 1.6 and later.

News Coverage is now part of the NewsML-G2 Planning Item. (See Editorial Planning – the Planning Item)

Contact Information

Contact information associated solely with the event, not any organiser or participant. For example, events often have a special web site and an event office which is independent of the organisers’ permanent web site or office address.

The <contactInfo> element wraps a structure with the properties outlined below. An event may have many instances of <contactInfo>, each with @role indicating the purpose. These are controlled values, so a provider may create their own CV of address types if required, or use the IPTC Event Contact Info Role NewsCodes (http://cv.iptc.org/newscodes/eventcontactinforole/ with recommended Scheme Alias of "ecirol"), which has values of "general contact", "media contact" and "ticketing contact".

Each of the child elements of <contactInfo> may be repeated as often as needed to express different @roles.

Property Name Element Type Notes

Email Address

<email>

Electronic Address

An “Electronic Address” type allows the expression of @role (QCode) to qualify the information, for example: <email role="addressrole:office"> info@ecb.eu </email>

Instant Message Address

<im>

Electronic Address

<im role="imsrvc:reuters"> jc.trichet.ecb.eu@reuters.net </im>

Phone Number

<phone>

Electronic Address

Fax Number

<fax>

Electronic Address

Web site

<web>

IRI

<web role="webrole:corporate"> www.ecb.eu </web>

Postal Address

<address>

Address

The Address may have a @role to denote the type of address it contains (e.g. work, home) and may be repeated as required to express each address @role.

Other information

<note>

Block

Any other contact-related information, such as “annual vacation during August”

For example:

<email role="ecirol:media">
    office@iptc.org
</email>
<im role="ecirol:general">
    jdoe.iptc.org@reuters.net
</im>
<phone role="ecirol:general">
    1-123-456-7899
</phone>
<phone role="ecirol:media">
    1-123-456-7898
</phone>
<web role="ecirol:ticketing">
    www.iptc.org/springmeeting.html
</web>
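
The fragment above shows the child elements on their own. As an illustrative sketch only, the same kind of information could be grouped into two <contactInfo> instances, each with its own @role, so that general and media contacts are kept apart. The "ecirol" values are those used in the example above; the e-mail addresses, web address and phone number are placeholders invented for this sketch.

<contactInfo role="ecirol:general">
    <!-- placeholder values, not real contact details -->
    <email>info@example.com</email>
    <web>http://www.example.com/event/</web>
</contactInfo>
<contactInfo role="ecirol:media">
    <!-- placeholder values, not real contact details -->
    <email>press@example.com</email>
    <phone>1-123-456-7898</phone>
</contactInfo>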

Address details <address>

The Address Type property may have a @role to indicate its purpose; the following table shows the available child properties. Apart from <line>, which is repeatable, each element may be used once for each <address>

For example:

<address role="ciprol:office">
    <line>20 Garrick Street</line>
    <locality>
        <name xml:lang="en">London</name>
    </locality>
    <country qcode="iso3166-1a2:GB">
        <name xml:lang="en">United Kingdom</name>
    </country>
    <postalCode>WC2E 9BT</postalCode>
</address>

15.5. Multiple Event Concepts in a Knowledge Item

Information about multiple events can be assembled in a Knowledge Item, which conveys a set of one or more Concepts. In this example, we will use a Knowledge Item to convey two related events.

The top-level <knowledgeItem> element contains identification, version and catalog information, in common with other NewsML-G2 Items:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<knowledgeItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:iptc.org:20110126:qqwpiruuew4712"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/events/event-catalog.xml" />

The <itemMeta> block conveys Management Metadata about the document, and the <contentMeta> block can convey Administrative Metadata about the concepts being conveyed. In this case, <contentMeta> also carries Descriptive Metadata that is common to all of the concepts in the Knowledge Item.

The Core Descriptive Metadata properties common to all NewsML-G2 Items that may be used in a Knowledge Item are: <language>, <keyword>, <subject>, <slugline>, <headline> and <description>.

<itemMeta>
    <itemClass qcode="cinat:concept" />
    <provider qcode="nprov:IPTC" />
    <versionCreated>2017-10-18T12:00:00Z</versionCreated>
    <pubStatus qcode="stat:usable" />
</itemMeta>
<contentMeta>
    <urgency>5</urgency>
    <contentCreated>2016-01-16T12:15:00Z</contentCreated>
    <contentModified>2017-10-12T14:35:00Z</contentModified>
    <subject qcode="medtop:20000304">
        <name>media</name>
    </subject>
    <subject qcode="medtop:20000309">
        <name>news agency</name>
    </subject>
    <subject qcode="medtop:20000763">
        <name>IT/computer sciences</name>
    </subject>
</contentMeta>

The two events in the listing are related. The relationship is indicated by the second event, which uses the <broader> property to show that it is part of the multi-day event listed first.

First event:

<conceptSet>
    <concept>
    <!-- FIRST EVENT! -->
        <conceptId created="2017-10-26T12:15:00Z" qcode="event:1234567" />
        <name>IPTC Autumn Meeting 2017</name>
        <eventDetails>
    ...

Second event: a session on news gathering and verification below (event:91011123) has a ‘broader’ relationship to the IPTC Autumn Meeting above (event:1234567).

<concept>
    <!-- SECOND EVENT! -->
    <conceptId created="2017-01-16T12:00:00Z" qcode="event:91011123" />
    <name>Newsgathering and Verification Strategy</name>
    <broader type="cpnat:event" qcode="event:1234567">
        <name>IPTC Autumn Meeting 2017</name>
    </broader>
    <eventDetails>
    ...

15.5.1. Full listing of the Event Knowledge Item

Note that although Knowledge Items are a convenient way to convey a set of related events, there is no requirement that all of the events in a KI must be related, or even that other concepts conveyed by the Knowledge Item are events; they may be people, organisations or other types of concept.

LISTING 15: Two Related Events in a Knowledge Item

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: accst, frel, ventyp, event.

<?xml version="1.0" encoding="UTF-8"?>
<knowledgeItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20101019:qqwpiruuew4712"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/events/event-catalog.xml" />
    <itemMeta>
        <itemClass qcode="cinat:concept" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-10-18T13:05:00Z
        </versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <urgency>5</urgency>
        <contentCreated>2016-01-16T12:15:00Z</contentCreated>
        <contentModified>2017-10-12T13:35:00Z</contentModified>
        <subject qcode="medtop:20000304">
            <name>media</name>
        </subject>
        <subject qcode="medtop:20000309">
            <name>news agency</name>
        </subject>
        <subject qcode="medtop:20000763">
            <name>IT/computer sciences</name>
        </subject>
    </contentMeta>
    <conceptSet>
        <concept>
            <!--               FIRST EVENT!                -->
            <!--                      x                      -->
            <conceptId created="2017-01-16T12:15:00Z" qcode="event:1234567" />
            <name>IPTC Autumn Meeting 2017</name>
            <eventDetails>
                <dates>
                   <start>2017-10-26T09:00:00Z</start>
                   <duration>P2D</duration>
                </dates>
                <registration>Registration with the IPTC office is required.
                   A <a href="https://iptc.org/moz/events/annual-general-meeting-2017/">
                   web form</a> may be used until 24 September 2017
                </registration>
                <participationRequirement>
                   <name>Membership</name>
                   <definition>Only members of the IPTC and their invited
                       guests may attend.
                   </definition>
                </participationRequirement>
                <accessStatus qcode="accst:easy" />
                <language tag="en" />
                <organiser qcode="nprov:IPTC" role="eorol:general">
                   <name>International Press Telecommunications Council</name>
                   <organisationDetails>
                       <founded>1965</founded>
                   </organisationDetails>
                </organiser>
                <contactInfo>
                   <email>mdirector@iptc.org</email>
                   <note>Michael Steidl, Managing Director</note>
                   <web>http://www.iptc.org</web>
                </contactInfo>
                <location>
                   <name>86, Edgware Road, London W2 2EA, United Kingdom</name>
                   <related rel="frel:venuetype" qcode="ventyp:confcentre" />
                   <POIDetails>
                       <position latitude="51.515659" longitude="-0.163346" />
                       <contactInfo>
                           <web>https://www.etcvenues.co.uk</web>
                       </contactInfo>
                   </POIDetails>
                </location>
                <participant qcode="eprol:moderator">
                   <name>Stuart Myles</name>
                   <definition>Chairman of the Board of the IPTC </definition>
                </participant>
                <participant qcode="eprol:director">
                   <name>Michael Steidl</name>
                   <definition>Managing Director</definition>
                </participant>
            </eventDetails>
        </concept>
        <concept>
            <!--                SECOND EVENT!              -->
            <!--                       x                      -->
            <conceptId created="2017-01-16T12:00:00Z" qcode="event:91011123" />
            <name>Newsgathering and Verification Strategy</name>
            <broader type="cpnat:event" qcode="event:1234567">
                <name>IPTC Autumn Meeting 2017</name>
            </broader>
            <eventDetails>
                <dates>
                   <start>2017-10-27T14:30:00Z</start>
                   <duration>PT30M</duration>
                </dates>
                <participationRequirement>
                   <name>Registration</name>
                   <definition>Pre-registration required for all attendees
                   </definition>
                </participationRequirement>
                <accessStatus qcode="accst:easy" />
                <language tag="en" />
                <participant qcode="eprol:presenter">
                   <name>Evi Varsou</name>
                   <definition>Presenter</definition>
                </participant>
                <participant qcode="eprol:moderator">
                   <name>Michael Steidl</name>
                   <definition>Moderator</definition>
                </participant>
            </eventDetails>
        </concept>
    </conceptSet>
</knowledgeItem>

15.5.2. Indicating changes to part of a Knowledge Item

When multiple Events are conveyed as Concepts in a Knowledge Item, and a sub-set of the Event Concepts is updated, providers may use the <partMeta> helper element to inform customers WHAT has changed in the new version of the Knowledge Item.

The <partMeta> element uses the standard XML ID/IDREF mechanism; the example shows that the Event with @id="eventA" was modified on 2017-09-15, as indicated by the Part Meta with @contentrefs="eventA":

<knowledgeItem
        ...>
    <itemMeta>
        ...
    </itemMeta>
    <contentMeta>
        ...
    </contentMeta>
    <partMeta contentrefs="eventA">
        <contentModified>2017-09-15</contentModified>
    </partMeta>
    <conceptSet>
        <concept id="eventA">
        <!-- FIRST EVENT! -->
        <!-- x -->
            <conceptId created="2017-08-30T12:00:00+00:00"
                qcode="event:1234567" />
        ...
        </concept>
        <concept id="eventB">
        <!-- SECOND EVENT! -->
        <!-- x -->
            <conceptId created="2017-08-30T12:00:00+00:00"
                qcode="event:91011123" />
        ...
        </concept>
    </conceptSet>
</knowledgeItem>

15.6. Conveying Events in a NewsML-G2 Package

A news provider may wish to create a service that consists of collections of events that are significant to a specific editorial theme, for example the day’s Top Finance Events.

When such Events are available as Concept Items, they may be referenced in a Package using <itemRef> and @residref. (See the Quick Start - Packages chapter for details of the Package Item structure and Item references)
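
As a minimal sketch of the <itemRef> approach, the group below references one event Concept Item by its GUID. The GUID and version number are hypothetical values invented for illustration, and the media type shown in @contenttype is assumed to be the one used for Concept Items; the <itemClass> and <title> children are Hint properties copied from the referenced Item.

<groupSet root="G1">
    <group id="G1" role="group:main">
        <!-- residref and version are hypothetical -->
        <itemRef residref="urn:newsml:example.com:20171018:event-1234567"
            version="3"
            contenttype="application/vnd.iptc.g2.conceptitem+xml">
            <itemClass qcode="cinat:concept" />
            <title>IPTC Autumn Meeting 2017</title>
        </itemRef>
    </group>
</groupSet>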

Alternatively, because they are managed and persistent Events identified using URIs, or their short format QCodes, collections of event concepts may be referenced within a Package <group> using Concept Reference <conceptRef> as follows:

<groupSet root="G1">
    <group id="G1" role="group:main">
    ...
    <conceptRef type="cpnat:event"
        qcode="iptcevents:20081007135637.12">
            <name xml:lang="fr">Barack Obama arrive à Washington</name>
    </conceptRef>
    ...
    </group>
</groupSet>

Note the optional @type indicating that the referenced concept is an Event, and the optional <name> child element, which is a natural-language name for the event extracted from the concept being referenced.

Note the following guidelines on packaging Event Concepts:

  • "Hint" properties – these are properties extracted from the original concept and conveyed with the reference to the concept – are restricted to <name> only when using <conceptRef>. When using <itemRef>, ANY property may be extracted from the referenced Concept Item.

  • The provider must take care that "Hint" properties do not re-define the original concept. When a provider distributes concepts using Concept Items or Knowledge Items, changes to the original concept MUST be issued via an updated Concept Item and/or Knowledge Item, and if @version is used with the <itemRef>, it must be updated to correspond with the revised Concept Item’s version.

  • If a Concept Item is available for the referenced concept, then this Concept Item MAY be referenced by <itemRef>; this is the preferred option. The reference MAY include Hint properties copied from the Concept Item.

  • If a Concept Item is available for the referenced concept, then the concept MAY instead be referenced using <conceptRef> as a valid alternative. This may be useful, for example, when a package includes a mix of concepts, only some of which have Concept Items, and the provider wants to express all the references in a consistent way. In such a case, however, the constraint on the use of Hints has to be accepted.

  • If there is no Concept Item for a referenced concept and only a URI is available as an identifier, then <conceptRef> MUST be used; no Hint properties except a name are available, as there is no Concept Item from which to copy Hints.

The standard method for exchanging concept information in the NewsML-G2 context is the Knowledge Item, which is a container for concepts and the detailed properties of each concept being conveyed. A provider would use Knowledge Items to send comprehensive information about many events, and receivers might incorporate this information into their own editorial diary, or Day Book.

By contrast, when a Package is employed to reference Event concepts, this represents an Editorial product, in which a discrete number of events, considered to be significant or pertinent to a particular topic, are selected and published.

The Events Package has the following characteristics:

  • The value of the package is in the journalistic judgement used to select the events, not in the compilation of the event information itself.

  • When using <conceptRef>, the package is a lightweight container that references each event and a human-readable name; no further details of the events are given. In this case, the package is transient in nature: it will not be referenced over time except as an archived item of editorial content.

  • It may be ordered to indicate the relative significance of the events in the context of the package, but no other relationship between the events is expressed or implied.

  • New versions of the Package may be published; events may be replaced or deleted from the package, but details of the events themselves cannot be changed.
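
A sketch of an ordered Events Package body follows. It assumes that the IPTC Package Group Mode NewsCodes are declared in the catalog under the alias "pgrmod", with pgrmod:seq marking the group as a sequence whose order is meaningful. The two event identifiers are re-used from Listing 15 purely for illustration.

<groupSet root="G1">
    <!-- mode="pgrmod:seq" is an assumption: it indicates that order matters -->
    <group id="G1" role="group:main" mode="pgrmod:seq">
        <!-- most significant event first -->
        <conceptRef type="cpnat:event" qcode="event:1234567">
            <name>IPTC Autumn Meeting 2017</name>
        </conceptRef>
        <conceptRef type="cpnat:event" qcode="event:91011123">
            <name>Newsgathering and Verification Strategy</name>
        </conceptRef>
    </group>
</groupSet>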

The characteristics of a KI are:

  • It represents Knowledge; the value is in the compilation of the details about each event in the KI: the "when and where" and other practical information about each event.

  • The KI contains persistent information about each event concept that may be expected to be frequently used and if necessary updated over time.

  • The concepts in the KI cannot be ordered, but may be related.

  • New versions of the KI can be published and the concepts that it contains can be updated, but they must not be deleted, as this would break existing references to the concepts.

15.7. Events Workflow

Figure: Event Flow using Package Items and Knowledge Items

The flow diagram shows how the News Provider creates and compiles event information and publishes it in a Knowledge Item, and that the Customer’s events management system subscribes to this information.

Later, the News Provider’s editorial team reviews the events database and creates an ordered list of the "Top Ten" news events of the day. The News Provider publishes the list as a Package Item. The Package Item is ingested by the Customer’s editorial system, and the Customer’s journalists use it as a guide in planning the news coverage of the day, looking up details of each event (hosted in the Knowledge Item) as needed.

15.8. Events in a News Item

Many news providers, particularly news agencies, provide their customers with event information as a list of events, for example a list of cultural events taking place in a city’s theatres. In many cases, these were provided as a text story, with minimal mark-up.

Conveying these events in a News Item as described below can be useful, as the structured mark-up enables them to be re-formatted by software, for example as tables or listings in print and online media. Note, however, that this is a limited and specific use case; events in this form are NOT persistent, and cannot be managed as part of the News Planning process (as described in Editorial Planning – the Planning Item).

The Content Set of the Events News Item uses the <inlineXML> element to convey an <events> wrapper containing one or more <event> instances, each of which is a separate self-contained set of event information:

<contentSet>
    <inlineXML>
        <events>
            <event>
                <name>Newsgathering and Verification Strategy</name>
                <eventDetails>
                   <dates>
                       <start>2017-10-27T14:00:00Z</start>
                       <end>2017-10-27T14:30:00Z</end>
                   </dates>
                </eventDetails>
            </event>
        </events>
    </inlineXML>
</contentSet>

The full listing is shown below. Note the Item Class of this News Item is "ninat:composite".

LISTING 16: A Set of Events carried in a News Item

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: frel, ventyp, facilncd

<?xml version="1.0" encoding="UTF-8"?>
<newsItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20090122:qqwpiruuew4711"
    version="10" standard="NewsML-G2" standardversion="2.25" conformance="power"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <itemMeta>
        <itemClass qcode="ninat:composite" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-10-18T12:44:00Z
        </versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentSet>
        <inlineXML>
            <events>
            <!--                       x                      -->
            <!--               FIRST EVENT!                -->
            <!--                      x                      -->
                <event>
                   <name>IPTC Autumn Meeting 2017</name>
                   <eventDetails>
                       <dates>
                           <start>2017-10-26T09:00:00Z</start>
                           <duration>P2D</duration>
                       </dates>
                       <location>
                           <name>86, Edgware Road, London W2 2EA, United Kingdom</name>
                           <related rel="frel:venuetype" qcode="ventyp:confcentre" />
                           <POIDetails>
                               <position latitude="51.515659" longitude="-0.163346" />
                               <contactInfo>
                                   <web>https://www.etcvenues.co.uk</web>
                               </contactInfo>
                           </POIDetails>
                       </location>
                   </eventDetails>
                </event>
            <!--                       x                      -->
            <!--                SECOND EVENT!              -->
            <!--                       x                      -->
                <event>
                   <name>Annomarket text analytics EU project</name>
                   <eventDetails>
                       <dates>
                           <start>2017-10-22T10:00:00-04:00</start>
                           <end>2017-10-22T11:00:00-04:00</end>
                       </dates>
                   </eventDetails>
                </event>
            <!--                      x                      -->
            <!--               THIRD EVENT!                 -->
            <!--                     x                      -->
                <event>
                   <name>Accidental Heroes</name>
                   <definition>
                       News stories and random incidents provide the inspiration behind
                       this new production from the Lyric Young Company, which blends the
                       inconsequential with the life-defining in a physical and visually
                       arresting new show.
                       <br />
                       The Lyric Young Company has worked with award-winning
                       writer/director Mark Murphy to create Accidental Heroes.
                       <br />
                   </definition>
                   <related rel="frel:facility" qcode="facilncd:Food" />
                   <related rel="frel:facility" qcode="facilncd:AirConditioning" />
                   <eventDetails>
                       <dates>
                           <start>2017-10-23T19:30:00Z</start>
                           <end>2017-02-05</end>
                           <rRule freq="DAILY" byday="TH FR SA" />
                       </dates>
                   </eventDetails>
                </event>
            </events>
        </inlineXML>
    </contentSet>
</newsItem>

15.8.1. Adding event concept details to a News Item

Expressing Events as Concepts in a Concept Item gives implementers great scope for including and/or referencing events information in other Items, such as News Items conveying content. This is achieved by adding the event as a <subject> element of the <contentMeta> container. For example, in Listing 15 an event was conveyed as a concept within the Concept Set of a Knowledge Item:

<knowledgeItem
    ...>
    <conceptSet>
        <concept>
        <!-- FIRST EVENT! -->
        <!-- -->
            <conceptId created="2017-01-16T12:15:00Z" qcode="event:1234567" />
            <name>IPTC Autumn Meeting 2017</name>
            <eventDetails>
                ...
            </eventDetails>
    ...
</knowledgeItem>

The QCode event:1234567 uniquely identifies the event, and later news coverage of the event can reference it using the <subject> element, as shown below:

<newsItem
    ...>
    <contentMeta>
        ...
        <subject type="cpnat:event" qcode="event:1234567">
            <name>IPTC Autumn Meeting 2017</name>
        </subject>
        ...
    </contentMeta>
    ...
</newsItem>

Referencing an event using Subject enables News Items to be searched and/or grouped by event, and also helps end-users manage the content for events with a wide coverage and many delivered Items.

15.8.2. Using <bag> to create a composite concept

Providers can put an event and related concepts together to create a new composite concept using a <bag> child element of <subject>.

For example, a financial news service sends a News Item conveying content about a takeover of a small company by a larger global company. This takeover story has an event concept, which enables the provider to add a subject property containing the QCode of the Event:

<subject type="cpnat:event" qcode="finevent:takeover123AB" />

together with subject properties for each company:

<subject type="cpnat:organisation" qcode="isin:SmallCompany" />
<subject type="cpnat:organisation" qcode="isin:GlobalCompany" />

However, a new, richer, composite concept combining the concept IDs for the Event and the two companies could be created instead using <bag>:

<newsItem ...>
    ...
    <contentMeta>
        ...
        <subject type="cpnat:abstract">
            <name>GlobalCompany takes over SmallCompany</name>
            <bag>
                <bit type="cpnat:event" qcode="finevent:takeover123AB" />
                <bit type="cpnat:organisation" qcode="isin:SmallCompany" />
                <bit type="cpnat:organisation" qcode="isin:GlobalCompany" />
            </bag>
        </subject>
        ...
    </contentMeta>
    ...
</newsItem>

Note that in this case, the <subject> does not have its own identifier: the identifier attributes (@qcode or @uri) must NOT be used when <subject> has a <bag> element.

Using @significance with <bag>

An advantage of using the composite structure of <bag> is that implementers can "fine tune" the subject properties using @significance. This enables receiving applications to be more discerning on behalf of end-users, about how they filter and select news.

The significance of an event to all the concepts in a <bag> is not necessarily equal: for an end-user interested in news about a small company, the fact that it has been taken over by a large company is very significant. To the follower of news about the large company, the story may be less significant.

The code snippet below shows how the significance attribute would be used to refine the previous example, such that the event is more significant to the SmallCompany (100) compared to the GlobalCompany (10):

<subject type="cpnat:abstract">
    <name>GlobalCompany takes over SmallCompany</name>
    <bag>
        <bit type="cpnat:event" qcode="finevent:takeover123AB" />
        <bit type="cpnat:organisation" qcode="isin:SmallCompany" significance="100"/>
        <bit type="cpnat:organisation" qcode="isin:GlobalCompany" significance="10"/>
    </bag>
</subject>

The <bag> MUST contain one <bit> of type "cpnat:event". This feature is available from NewsML-G2 2.7 onwards: a @significance attribute may be added to the other <bit> members of a <bag> ONLY in the special case where the <bag> contains a <bit> referencing an event.

16. Editorial Planning – the Planning Item

News-gathering is not an ad-hoc process; as outlined in How News Happens, professional news organisations need to be well-organised so that resources are used effectively, customers get the news they need, in time, and editorial standards and quality are maintained.

News planning information can improve collaboration, and thus efficiency, in B2B news exchange, and for some news agencies and their media customers, this has become an important area of development.

The Planning Item addresses this need. As a "top level" NewsML-G2 Item, it is a sibling of News Item, Concept Item, Package Item and Knowledge Item.

The Planning Item carries the News Coverage information that was previously expressed by Event Concept Items under the <eventDetails> wrapper, and has also been expanded to include additional features, such as the ability to track news deliverables against a previously-announced manifest of objects.

There are a number of advantages to the separation of Events and Planning information, including:

  • News coverage information may change frequently in response to a news event. Previously, this would have meant re-sending all of the event information whenever the news planning changed. Now only the planning information is re-sent – a lighter processing overhead.

  • News coverage is not always in response to a news event taking place at a specific place and time; it is also topic-based, such as "The best ski resorts for this winter". Without the Planning Item, topic-based coverage such as this would require the creation of a "dummy" event as a placeholder for the coverage information. Now, no event information is required, merely a Planning Item that can carry the description of the topic and can be linked to the content once it is created.

  • Organisations needed a way to announce planned coverage of an event, in terms of (say) a number of pictures, then inform receivers of progress and fulfilment. A Planning Item can provide a list of Items that have been delivered. The complementary "Delivery Of" property, which has been added as a property of other Items, can also link individual Items back to a Planning Item.

  • Having news coverage information tightly-bound to an event caused issues when events span more than one news cycle. A single event may have multiple Planning Items spanning different time periods, so multiple <planning> child elements of <newsCoverage> can specify @coversfrom and @coversto attributes.
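
As a sketch of the last point, and assuming that @coversfrom and @coversto are carried on <planning> as described above, coverage of a two-day event could be split into one <planning> element per news cycle. The @id value, dates and item counts are hypothetical.

<newsCoverage id="cov-text">
    <!-- first news cycle: one text expected -->
    <planning coversfrom="2017-10-26T00:00:00Z" coversto="2017-10-26T23:59:59Z">
        <itemClass qcode="ninat:text" />
        <itemCount rangefrom="1" rangeto="1" />
    </planning>
    <!-- second news cycle: two texts expected -->
    <planning coversfrom="2017-10-27T00:00:00Z" coversto="2017-10-27T23:59:59Z">
        <itemClass qcode="ninat:text" />
        <itemCount rangefrom="2" rangeto="2" />
    </planning>
</newsCoverage>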

Planning Items typically focus on the delivery of coverage for a single event or topic, but can be linked to other Planning Items to facilitate the coverage of more complex or long-term events.

The structure of a Planning Item is common to other NewsML-G2 Items:

  • The top level <planningItem> properties of conformance, identification and versioning that we associate with "anyItem"

  • Item Metadata contained in the <itemMeta> wrapper

  • Administrative and Descriptive information related to the content wrapped in the <contentMeta> element.

  • Special-purpose elements <assert> and <inlineRef> to group details of concepts and to provide references from content to metadata (PCL only) (see 25.2 and 25.3 for details on the use of these elements)

  • Extension Points for providers to add properties from non-IPTC namespaces.

  • A wrapper for the content payload, in this case a <newsCoverageSet>.

16.1. Item Metadata <itemMeta>

The standard properties of <itemMeta> are available for use in a Planning Item. The <itemClass> property is mandatory and takes its value from the planning-specific IPTC "Planning Item Nature" NewsCodes. The Scheme URI is http://cv.iptc.org/newscodes/plinature/ and the recommended Scheme Alias is "plinat":

<itemClass qcode="plinat:newscoverage" />
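
As an illustration of how a Scheme Alias and Scheme URI work together, the following minimal Python sketch expands a QCode into a full concept URI by appending the code to the Scheme URI. The CATALOG dictionary and the resolve_qcode function are hypothetical names used only for this example; a real implementation would read the alias-to-URI mapping from the catalogs referenced by <catalogRef>.

```python
# Sketch only: CATALOG is a hand-written stand-in for the alias/URI pairs
# that would normally be loaded from the catalogs named in <catalogRef>.
CATALOG = {
    "plinat": "http://cv.iptc.org/newscodes/plinature/",
}


def resolve_qcode(qcode, catalog):
    """Return the concept URI for a QCode of the form 'alias:code'."""
    alias, sep, code = qcode.partition(":")
    if not sep or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    if alias not in catalog:
        raise KeyError(f"unknown scheme alias: {alias!r}")
    # The concept URI is the Scheme URI followed by the code.
    return catalog[alias] + code


print(resolve_qcode("plinat:newscoverage", CATALOG))
```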

16.2. Content Metadata <contentMeta>

The standard properties of the Administrative Metadata group may be used. A restricted set of Descriptive Metadata properties (the Core Descriptive Metadata Group) may also be used. The Core Group is: <language>, <keyword>, <subject>, <slugline>, <headline>, <description>.

16.3. <newsCoverageSet>

The <newsCoverageSet> wraps one or more <newsCoverage> components (see the diagram below). Typically, each <newsCoverage> component is bound to each different class of Item to be delivered, i.e. a Planning Item for two texts, ten pictures and one graphic could have three <newsCoverage> components.

Figure: Structure of Planning Item. The newsCoverageSet has one or more newsCoverage components.

16.4. <newsCoverage>

Each <newsCoverage> component MUST contain at least one <planning> element and optionally a <delivery> element.
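
A receiver may wish to check this rule before processing a Planning Item. The sketch below is one possible approach using Python's standard xml.etree.ElementTree module; the function name coverage_problems and the SAMPLE document are assumptions made for illustration, and the check is no substitute for validation against the NewsML-G2 schema.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Deliberately faulty sample: the second <newsCoverage> has no <planning>.
SAMPLE = """<planningItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <newsCoverageSet>
    <newsCoverage>
      <planning><edNote>Text 250 words</edNote></planning>
    </newsCoverage>
    <newsCoverage>
      <delivery/>
    </newsCoverage>
  </newsCoverageSet>
</planningItem>"""


def coverage_problems(root):
    """Return a list of problems found in the <newsCoverage> components."""
    problems = []
    coverages = root.findall("nar:newsCoverageSet/nar:newsCoverage", NS)
    for index, coverage in enumerate(coverages, start=1):
        # At least one <planning> element is required.
        if not coverage.findall("nar:planning", NS):
            problems.append(f"newsCoverage #{index}: no <planning> element")
        # <delivery> is optional, but at most one is expected.
        if len(coverage.findall("nar:delivery", NS)) > 1:
            problems.append(f"newsCoverage #{index}: more than one <delivery> element")
    return problems


problems = coverage_problems(ET.fromstring(SAMPLE))
print(problems)
```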

16.5. <planning>

At CCL, planning information is expressed using a repeatable Block type <edNote> to give a natural language description of the planned coverage, optionally with some mark-up:

<edNote>Picture scheduled 2017-11-06T13:00:00+02:00</edNote>

Below is a complete example:

LISTING 17: Planning Item at CCL

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies.

<?xml version="1.0" encoding="UTF-8"?>
<planningItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Core.xsd"
    guid="urn:newsml:iptc.org:20101025:gbmrmdreis4711"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="core"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/events/event-catalog.xml" />
    <itemMeta>
        <itemClass qcode="plinat:newscoverage" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-10-18T13:45:00Z
        </versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <urgency>5</urgency>
        <contentCreated>2016-10-16T12:15:00Z</contentCreated>
        <contentModified>2017-10-16T13:35:00Z</contentModified>
    </contentMeta>
        <newsCoverageSet>
                <newsCoverage>
                   <planning>
                       <edNote>Text 250 words</edNote>
                   </planning>
                </newsCoverage>
                <newsCoverage>
                   <planning>
                       <edNote>Picture scheduled 2017-11-06T13:00:00+02:00</edNote>
                   </planning>
                   <delivery>
                       <deliveredItemRef
                           guidref="tag:example.com,2008:ART-ENT-SRV:USA20081220098658">
                       </deliveredItemRef>
                   </delivery>
                </newsCoverage>
        </newsCoverageSet>
</planningItem>

16.5.1. <planning> at Power Conformance Level

Using PCL capabilities for Planning can populate a customer’s resource management applications with machine-readable information and detailed descriptive metadata, ready to be inherited by the arriving content, thus speeding up news handling and potentially increasing consistency and quality.

Used in this way, the Planning Item can bridge the workflows of provider and customer: the provider is seen as an available resource on the customer system with coverage information and news metadata capable of being updated in near real-time.

A key feature of the <planning> element is the ability to use <newsContentCharacteristics> to express comprehensive information about physical characteristics of the planned content, using the News Content Characteristics group of attributes. These are described in detail in the News Content Characteristics section of the NewsML-G2 Specification Document, which may be obtained by following the link at the IPTC NewsML-G2 resource page.

The attributes include physical properties of still images, video and audio, the line count of text content and the page count for content rendered as pages. Examples of their usage are given in the table below.

From NewsML-G2 version 2.23, the <planning> and <assignedTo> elements of <newsCoverage> have been extended to enable the planning of events to be split into multiple parts. This may be needed if, for example, a long-running event spans more than one news cycle. The cardinality of <planning> is extended from (1) to (1..∞) and new attributes @coversfrom and @coversto are added to the <planning> and <assignedTo> elements of the <newsCoverage> wrapper. These express the date (and optionally the time) of the start and end of coverage. See the code listing below for example usage.
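
To show how a receiver might use these attributes, the following Python sketch finds which assignee covers a given moment. It makes several assumptions that are not part of the standard: the function names are hypothetical, each window is treated as including its start and excluding its end, and only full date-time values with a time zone are handled (date-only values would need extra code).

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

SAMPLE = """<newsCoverage xmlns="http://iptc.org/std/nar/2006-10-01/">
  <planning>
    <assignedTo coversfrom="2017-12-24T06:00:00Z" coversto="2017-12-24T23:00:00Z"
                qcode="santastaff:ceo"/>
  </planning>
  <planning>
    <assignedTo coversfrom="2017-12-24T23:00:00Z" coversto="2017-12-25T12:00:00Z"
                qcode="santastaff:santa"/>
  </planning>
</newsCoverage>"""


def parse_utc(value):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so normalise it to an explicit offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def assignees_at(coverage, instant):
    """Return the QCodes of the parties whose coverage window contains instant."""
    result = []
    for assigned in coverage.findall("nar:planning/nar:assignedTo", NS):
        start = assigned.get("coversfrom")
        end = assigned.get("coversto")
        if start is None or end is None:
            continue  # no window stated for this assignment
        # Assumption: window is [coversfrom, coversto).
        if parse_utc(start) <= instant < parse_utc(end):
            result.append(assigned.get("qcode"))
    return result


coverage = ET.fromstring(SAMPLE)
print(assignees_at(coverage, parse_utc("2017-12-24T12:00:00Z")))
```

With the half-open window assumed here, the handover time 2017-12-24T23:00:00Z belongs to the second assignment only, so the two windows never both match.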

Additional properties of <planning> at PCL:

Property Type Notes/Example

<g2contentType>

String

Optional, non-repeatable element to indicate the Media Type of the intended coverage. The example below indicates that the content to be delivered is a NewsML-G2 News Item.

<g2contentType>
    application/vnd.iptc.g2.newsitem+xml
</g2contentType>

<itemClass>

QCode

Optional, non-repeatable element that indicates the type of content to be delivered, using the IPTC News Item Nature NewsCodes. Since the example will show a text article, the Item Class is "text":

<itemClass qcode="ninat:text"/>

<itemCount>

Empty

The number of items to be delivered, expressed as a range:

<itemCount rangefrom="1" rangeto="1" />

Both attributes @rangefrom (non-negative integer) and @rangeto (positive integer) MUST be used. The values are inclusive: rangefrom="2" rangeto="5" means 2 to 5 items will be delivered.

<assignedTo>

Flex1 Party Property

The Flex1 Party Property Type extends the Flex Party Property Type by allowing a @role attribute to be a space-separated list of QCodes.

<assignedTo> is an optional, non-repeatable element that holds the details of a person or organisation who has been assigned to create the announced content. It may hold as child elements any property from <personDetails> or <organisationDetails>, and properties from the Concept Definitions Group and the Concept Relationships Group (see Concepts and Concept Items for details).

The example shows the details of a person assigned to create the content:

<assignedTo
    role="erole:editor"
    type="cpnat:person"
    qcode="pers:54321">
    <name>Stilton Cheesewright</name>
    <personDetails>
        <contactInfo>
            <phone>1-418-4567</phone>
            <email>stilton@iptc.org</email>
        </contactInfo>
    </personDetails>
</assignedTo>

This information may be required internally by a news organisation as part of its event planning process, but perhaps may not be distributed to customers.

Customers may be informed of the intended author/creator of planned coverage using the <by> property (see below).

<scheduled>

Approx. Date Time

Optional, non-repeatable. Indicates the scheduled time of delivery, and may be truncated if the precise date and time is not known. For example, if the content is scheduled to arrive at some unspecified time on a given day, the value would be:

<scheduled>2017-10-16</scheduled>

<service>

Qualified Property

Optional, repeatable. The editorial service to which the content has been assigned by the provider and on which the receiver should expect to receive the planned content.

<service qcode="srv:intwire">
    <name>International Wire Service</name>
</service>

<newsContentCharacteristics>

Empty

An element that enables providers to express physical properties of the planned item using attributes from the News Content Characteristics group:

  • linecount. The number of lines in text content:

<newsContentCharacteristics linecount="205" />

  • pagecount. The number of pages of the planned content:

<newsContentCharacteristics pagecount="4" />

<planningExtProperty>

For example, the planned item has a proprietary content rating. The rating is expressed using @rel with a QCode indicating the nature of the proprietary property, a @value and a @valuedatatype:

<planningExtProperty rel="mediarel:hasParentalAdvisory" value="6" valuedatatype="xs:positiveInteger"/>
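
The <itemCount> and <scheduled> properties in the table above lend themselves to simple automated checks on the receiving side. The Python sketch below is illustrative only: the function names are hypothetical, and the precision check merely distinguishes a date-only value from a full date-time by looking for the "T" separator.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

SAMPLE = """<planning xmlns="http://iptc.org/std/nar/2006-10-01/">
  <itemCount rangefrom="2" rangeto="5"/>
  <scheduled>2017-10-16</scheduled>
</planning>"""


def count_in_range(planning, delivered):
    """True if the number of delivered items falls within <itemCount>."""
    item_count = planning.find("nar:itemCount", NS)
    if item_count is None:
        return True  # no range announced, so nothing to check
    low = int(item_count.get("rangefrom"))
    high = int(item_count.get("rangeto"))
    # Both ends of the range are inclusive.
    return low <= delivered <= high


def scheduled_precision(planning):
    """Return 'date-time', 'date', or None if <scheduled> is absent or empty."""
    text = planning.findtext("nar:scheduled", default="", namespaces=NS).strip()
    if not text:
        return None
    # Simplification: a "T" separator means a time part is present.
    return "date-time" if "T" in text else "date"


planning = ET.fromstring(SAMPLE)
print(count_in_range(planning, 3), scheduled_precision(planning))
```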

Descriptive Metadata for <planning>

Metadata "hints" may also be added to assist receivers in preparing for the planned item, using the Descriptive Metadata Properties group:

Property Type Notes/Example

<by>

Label

Optional, repeatable. Natural language author/creator information:

<by>By Stilton Cheesewright</by>

<creditline>

String

Optional, repeatable. A freeform expression of the credit(s) for the content:

<creditline>Additional reporting by Bertram Wooster</creditline>

<dateline>

Label

Optional, repeatable. Natural language information traditionally placed at the start of a text by some news agencies, indicating the place and time that the content was created:

<dateline>
    Totley Towers, January 27, 2017.(Reuters)
</dateline>

<description>

Block

Optional, repeatable. A free form textual description of the intended news coverage, with minimal mark-up permitted. The optional @role may use a value from the IPTC Description Role NewsCodes:

<description role="drol:summary">
    Stilton Cheesewright will report on the proceedings from the NewsCodes Working Party
</description>

<genre>

Flex1 Concept Property

Optional, repeatable. The nature of the journalistic content that is intended for the news coverage. May be expressed by a QCode or URI value, with optional @type.

Child elements may be any from the Concept Definition Group and the Concept Relationships Group (see Concepts and Concept Items):

<genre qcode="mygenre:main">
    <name>Main article </name>
</genre>

<headline>

Label

Optional, repeatable. Headline that will apply to the content:

<headline>NewsCodes Working Party</headline>

<keyword>

String

Optional, repeatable. A freeform term to assist in indexing the content:

<keyword>IPTC</keyword>

<language>

-

Optional, repeatable. The language of the intended coverage. May have a @role to inform the receiver of the use of the language. The IPTC Language Role NewsCodes currently has two values, "Subtitle" and "Voice Over", that apply to video content.

The language @tag MUST be expressed using IETF BCP 47 and may have a child element of <name>:

<language tag="en-GB">
    <name>UK English</name>
</language>

<slugline>

Internationalised String

Optional, repeatable. May have a @role and a @separator which indicates the character used as a delimiter between words or tokens used in the slugline:

<slugline separator="-">US-AUTO-BAILOUT</slugline>

<subject>

Flex1 Concept Property

Optional, repeatable. Indicates the subject matter of the intended coverage. Child elements may be any from the Concept Definition Group and the Concept Relationships Group (see Concepts and Concept Items):

<subject qcode="medtop:20000304">
    <name>media</name>
</subject>
<subject qcode="medtop:20000309">
    <name>news agency</name>
</subject>
<subject qcode="medtop:20000763">
    <name>IT/computer sciences</name>
</subject>

Or

<subject qcode="medtop:20000309">
    <name>news agency</name>
    <broader qcode="medtop:20000304">
        <name>Media</name>
    </broader>
</subject>
<subject qcode="medtop:20000763">
    <name>IT/computer sciences</name>
</subject>

LISTING 18: Planning Item at PCL

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<planningItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20101025:gbmrmdreis4711"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/events/event-catalog.xml" />
    <itemMeta>
        <itemClass qcode="plinat:newscoverage" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-10-18T13:45:00Z
        </versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <urgency>5</urgency>
        <contentCreated>2015-10-16T12:15:00Z</contentCreated>
        <contentModified>2017-10-16T13:35:00Z</contentModified>
        <headline>The 12 Days of Christmas</headline>
    </contentMeta>
        <newsCoverageSet>
                <newsCoverage id="ID_1234568" modified="2017-09-26T13:19:11Z">
                   <planning>
                       <g2contentType>application/nitf+xml</g2contentType>
                       <itemClass qcode="ninat:text"/>
                       <assignedTo
                           coversfrom="2017-12-24T06:00:00Z"
                           coversto="2017-12-24T23:00:00Z"
                           qcode="santastaff:ceo">
                           <name>Chief Elf Officer</name>
                       </assignedTo>
                       <scheduled>2017-12-24T23:30:00Z</scheduled>
                       <headline>All Wrapped Up in Lapland</headline>
                       <edNote>Text 250 words</edNote>
                   </planning>
                   <planning>
                       <g2contentType>application/nitf+xml</g2contentType>
                       <itemClass qcode="ninat:text"/>
                       <assignedTo
                           coversfrom="2017-12-24T23:00:00Z"
                           coversto="2017-12-25T12:00:00Z"
                           qcode="santastaff:santa">
                           <name>Santa Claus</name>
                       </assignedTo>
                       <scheduled>2017-12-25T06:30:00Z</scheduled>
                       <headline>Santa’s Sleigh Ride</headline>
                       <edNote>Text 250 words</edNote>
                   </planning>
                </newsCoverage>
                <newsCoverage id="ID_1234569" modified="2017-09-26T15:19:11Z">
                   <planning>
                       <g2contentType>image/jpeg</g2contentType>
                       <itemClass qcode="ninat:picture"></itemClass>
                       <scheduled>2017-12-25T00:00:00Z</scheduled>
                       <edNote>Picture will be Santa Claus departing with reindeer</edNote>
                   </planning>
                </newsCoverage>
        </newsCoverageSet>
</planningItem>

16.6. The <delivery> component

The optional <delivery> component tells the receiver which parts of the planned coverage have been delivered. The delivered item(s) are referenced by one or more <deliveredItemRef> elements, each of which points to a delivered Item.

The complementary <deliverableOf> property may be added to the <itemMeta> of the corresponding delivered Item. This enables receivers to track back from delivered content to a specific News Coverage component. Providers should keep the two sets of references synchronised, but having both gives receivers a bi-directional way to track the deliverables of a Planning Item, which is useful if, for example, a News Item is delivered before the associated Planning Item is updated.
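
As a rough illustration of tracking from the Planning Item side, the following Python sketch builds a map from each <newsCoverage> @id to the identifiers of the Items delivered against it. The function name delivered_by_coverage is hypothetical, and falling back from @guidref to @residref is an assumption made for this example rather than a rule of the standard.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

SAMPLE = """<planningItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <newsCoverageSet>
    <newsCoverage id="ID_1234568">
      <planning><edNote>Text 250 words</edNote></planning>
    </newsCoverage>
    <newsCoverage id="ID_1234569">
      <planning><edNote>Picture scheduled 2017-11-06T13:00:00+02:00</edNote></planning>
      <delivery>
        <deliveredItemRef guidref="tag:example.com,2008:ART-ENT-SRV:USA20081220098658"/>
      </delivery>
    </newsCoverage>
  </newsCoverageSet>
</planningItem>"""


def delivered_by_coverage(root):
    """Map each newsCoverage @id to a list of delivered Item identifiers."""
    status = {}
    for coverage in root.findall("nar:newsCoverageSet/nar:newsCoverage", NS):
        refs = coverage.findall("nar:delivery/nar:deliveredItemRef", NS)
        # Assumption: prefer @guidref and fall back to @residref.
        status[coverage.get("id")] = [
            ref.get("guidref") or ref.get("residref") for ref in refs
        ]
    return status


status = delivered_by_coverage(ET.fromstring(SAMPLE))
for coverage_id, items in status.items():
    print(coverage_id, "delivered" if items else "pending", items)
```

An empty list marks a coverage component that is still pending, which gives a receiver a simple way to report fulfilment against the announced plan.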

16.7. Delivered Items - <deliveredItemRef>

A set of properties to identify, locate and describe the delivered Item.

16.7.1. Delivered Item Reference: Core Conformance

The permitted attributes of <deliveredItemRef> are listed in the table below. In addition, a child <title> element may be added as a metadata "hint" to the receiver.

Property Type Notes/Example

@rel

QCode

Indicates the relationship between the current Planning Item and the target Item

@href

URL

Locator for the target resource:

href="http://example.com/2008-12-20/pictures/foo.jpg"

@residref

String

The provider’s identifier for the target resource, i.e. the @guid of the Item:

residref="tag:example.com,2008:PIX:FOO20081220098658"

@version

String

The version of the target resource: the @version of the Item

@contenttype

String

The Media Type of the target resource:

contenttype="image/jpeg"

@format

QCode

A refinement of the Media Type, taken from a CV

@size

String

Indicates the size of the target resource in bytes:

size="3764418"

LISTING 19: Planning Item with delivery at CCL

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies.

<?xml version="1.0" encoding="UTF-8"?>
<planningItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Core.xsd"
    guid="urn:newsml:iptc.org:20101025:gbmrmdreis4711"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="core"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/events/event-catalog.xml" />
    <itemMeta>
        <itemClass qcode="plinat:newscoverage" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-10-18T13:45:00Z
        </versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <urgency>5</urgency>
        <contentCreated>2016-10-16T12:15:00Z</contentCreated>
        <contentModified>2017-10-16T13:35:00Z</contentModified>
    </contentMeta>
        <newsCoverageSet>
                <newsCoverage id="ID_1234568" modified="2017-09-26T13:19:11Z">
                   <planning>
                       <edNote>Text 250 words</edNote>
                   </planning>
                </newsCoverage>
                <newsCoverage id="ID_1234569" modified="2017-09-26T15:19:11Z">
                   <planning>
                       <edNote>Picture scheduled 2017-12-25T12:00:00-05:00</edNote>
                   </planning>
                   <delivery>
                       <deliveredItemRef>
                           <title>Henry Robinson pictured today in New York</title>
                       </deliveredItemRef>
                   </delivery>
                </newsCoverage>
        </newsCoverageSet>
</planningItem>

16.7.2. Delivered Item Reference: Power Conformance

Additional attributes of <deliveredItemRef> may be used at PCL:

Property Type Notes/Example

@persistidref

String

Reference to an element inside the target resource bearing an @id attribute, whose value must be persistent for all versions, i.e. for its entire lifecycle.

@validfrom, @validto

DateOptTime

Date range (with optional time) for which the Item Reference is valid:

validto="2010-11-20T17:30:00Z"

@id

XML ID

A local identifier for the <deliveredItemRef>

@creator

QCode

The person or organisation responsible for creating or editing this <deliveredItemRef> (i.e. not the referenced Item)

@modified

DateOptTime

The date with optional time that this <deliveredItemRef> was last changed

@xml:lang

BCP47

The language used for this <deliveredItemRef>

@dir

Enumeration

The directionality of text, either "ltr" or "rtl" (left-to-right or right-to-left)

@rank

Non-negative integer

The rank of the <deliveredItemRef> among others in the Planning Item

Hint and Extension Point

At PCL, child elements from the NAR namespace or any other namespace may optionally be added. When using elements from the NAR, follow the rules set out in Adding Hints from the NAR namespace.
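
The @rank attribute listed in the table above can be used to present delivered Items in a preferred order. The Python sketch below assumes that a lower @rank value means an earlier position and that references without @rank sort last in their original order; both are choices made for this example. The residref values and the function name refs_in_rank_order are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Hypothetical identifiers, used only to make the ordering visible.
SAMPLE = """<delivery xmlns="http://iptc.org/std/nar/2006-10-01/">
  <deliveredItemRef residref="urn:example:item-a" rank="2"/>
  <deliveredItemRef residref="urn:example:item-b"/>
  <deliveredItemRef residref="urn:example:item-c" rank="1"/>
</delivery>"""


def refs_in_rank_order(delivery):
    """Return the <deliveredItemRef> elements sorted by ascending @rank."""
    refs = delivery.findall("nar:deliveredItemRef", NS)

    def sort_key(ref):
        rank = ref.get("rank")
        # Unranked references sort after ranked ones; sorted() is stable,
        # so they keep their document order.
        return (rank is None, int(rank) if rank is not None else 0)

    return sorted(refs, key=sort_key)


ordered = [ref.get("residref") for ref in refs_in_rank_order(ET.fromstring(SAMPLE))]
print(ordered)
```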

LISTING 20: Planning Item with delivery at PCL

All Scheme Aliases used in listing below indicate IPTC NewsCodes vocabularies.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<planningItem
    xmlns="http://iptc.org/std/nar/2006-10-01/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
        ./NewsML-G2_2.25-spec-All-Power.xsd"
    guid="urn:newsml:iptc.org:20101025:gbmrmdreis4711"
    version="10"
    standard="NewsML-G2"
    standardversion="2.25"
    conformance="power"
    xml:lang="en">
    <catalogRef href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_30.xml" />
    <catalogRef href="http://www.example.com/events/event-catalog.xml" />
    <itemMeta>
        <itemClass qcode="plinat:newscoverage" />
        <provider qcode="nprov:IPTC" />
        <versionCreated>2017-10-18T13:45:00Z
        </versionCreated>
        <pubStatus qcode="stat:usable" />
    </itemMeta>
    <contentMeta>
        <urgency>5</urgency>
        <contentCreated>2016-10-16T12:15:00Z</contentCreated>
        <contentModified>2017-10-16T13:35:00Z</contentModified>
     </contentMeta>
    <newsCoverageSet>
        <newsCoverage id="ID_1234568" modified="2017-09-26T13:19:11Z">
            <planning>
                <g2contentType>application/nitf+xml</g2contentType>
                <itemClass qcode="ninat:text"/>
                <scheduled>2017-12-25T12:30:00-05:00</scheduled>
                <headline>Robinson: Must preserve bailout funds</headline>
                <edNote>Text 250 words</edNote>
            </planning>
        </newsCoverage>
        <newsCoverage id="ID_1234569" modified="2017-09-26T15:19:11Z">
            <planning>
                <g2contentType>image/jpeg</g2contentType>
                <itemClass qcode="ninat:picture" />
                <scheduled>2017-12-25T12:00:00-05:00</scheduled>
                <edNote>Picture will be Robinson at today's news conference</edNote>
            </planning>
            <delivery>
                <deliveredItemRef
                    guidref="tag:example.com,2008:ART-ENT-SRV:USA20081220098658">
                    <title>Henry Robinson pictured today in New York</title>
                </deliveredItemRef>
            </delivery>
        </newsCoverage>
    </newsCoverageSet>
</planningItem>

17. SportsML-G2

17.1. Introduction

This guide refers to SportsML-G2 v3.0. For implementation of earlier versions of SportsML refer to earlier versions of these guidelines. Please visit iptc.org/standards/sportsml-g2 for further links to SportsML-G2 resources. There is also a SportsML-G2 user group forum at https://groups.yahoo.com/neo/groups/sportsml/info for all those interested in SportsML-G2 wishing to seek advice and share experiences.

Sports fixtures and results have long been an important part of the output of news agencies and media organisations and this activity continues to grow in line with the increasing world-wide interest in sport.

Historically, providers had to balance the need for detailed information with the constraints imposed by having to deliver timely transmissions of data over the (then) low-speed data circuits available. This led to sparse plain text or field-delimited data feeds that required precise formatting and processing in order to produce the required output, which was driven by the space constraints of print media.

Today, the picture is very different. The rise of the Web combined with the emergence of a global sports industry has created a demand for more detailed results and statistics. Although timeliness remains a key priority, modern communications and processing power have removed many of the old restrictions.

The legacy formats are therefore no longer adequate, but if the response to modern needs is a proliferation of specialised data formats, this will over time make the exchange and application of sports data more costly and difficult.

SportsML-G2 is designed to offer a flexible, extensible framework built on the re-usable components of NewsML-G2 that can handle all types of sports information using standard technology, including:

  • Schedules (fixtures)

  • Results

  • Multi-media sports news

  • Standings (for example league tables, player order-of-merit, rankings)

  • Team statistics, including actions such as "fumbles", "tackles"

  • Player statistics, including career statistics and play statistics.

  • Match statistics, including "plays", or "actions"

  • Betting or wagering information

  • Venue information, including weather

Rather than try to fit all sports into a single generic model, special add-in modules enable widely-differing sports, such as golf, baseball and motor racing, to be accommodated within the standard framework.

17.2. Business Advantages

The NewsML-G2 data model is well suited to the exchange of sports information. By its nature, sport has many entities: teams, players, officials, leagues and so on that are routinely stored as structured data and can be used, updated, and re-used over time. These can be exchanged as Concepts and Knowledge Items.

Sports Events may therefore be modelled as actions and relationships involving these known entities, and by coding these in XML, the full value of this information can be harnessed using standard technology:

  • Information can be easily imported into databases

  • Using XSLT, information can be transformed to specific formats for the Web, print, mobile etc., or to other XML formats.

  • The information can be used directly by dynamic applications using Java or similar tools.

  • Providers and receivers are not restricted in their choice of taxonomies. This is crucial in sport where value-added knowledge bases may be maintained by different data owners.

  • Extension points allow the standard to be adapted and customised to special needs within the standard framework

Finally, by building expertise around a single, extensible standard, information owners, providers, and consumers can benefit from reduced costs, greater reliability and faster time-to-market for compelling applications based on the rich variety of sports statistics and content that is increasingly available.

17.3. Structure

SportsML-G2 content is conveyed within NewsML-G2 Items (that is, News Items, Package Items, Concept Items, Knowledge Items and Planning Items).

The structure of a SportsML-G2 document conveying a single piece of content, or multiple renditions of the same content, matches that of a NewsML-G2 News Item, as shown in the diagram below.

The <newsItem>, <itemMeta> and <contentMeta> components are implemented as discussed in Quick Start: NewsML-G2 Basics, which readers are encouraged to study first. Although a full description of all of the many features of SportsML-G2 is beyond the scope of this Guide, this Chapter documents the implementation of specific SportsML-G2 features and shows by example how its use may be extended.