Quick Start: Text
1. Introduction
One of the most fundamental needs of a news organisation is to handle text. This chapter covers the basics of a simple NewsML-G2 News Item containing text content.
We recommend reading the Quick Start Guide to NewsML-G2 Basics before this Quick Start Guide.
2. Example
Below is an example story and supporting information as might be displayed on the journalist’s editing screen at a fictional news provider, Acme News and Media (ANM):
This screen contains nearly all of the information needed to create the NewsML-G2 document below:
(All Scheme Aliases used in the listing below indicate IPTC NewsCodes vocabularies, except for the following alias values: geoloc, is)
<?xml version="1.0" encoding="UTF-8" ?>
<newsItem
xmlns="http://iptc.org/std/nar/2006-10-01/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://iptc.org/std/nar/2006-10-01/
./NewsML-G2_2.24-spec-All-Core.xsd"
guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
version="12"
standard="NewsML-G2"
standardversion="2.24"
xml:lang="en-US">
<catalogRef
href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_34.xml" />
<catalogRef
href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
<rightsInfo>
<copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
<name>Acme News and Media LLC</name>
</copyrightHolder>
<copyrightNotice>Copyright 2018-19 Acme News and Media LLC</copyrightNotice>
</rightsInfo>
<itemMeta>
<itemClass qcode="ninat:text" />
<provider uri="http://www.acmenews.com/about/" />
<versionCreated>2019-11-21T16:25:32-05:00</versionCreated>
<pubStatus qcode="stat:usable" />
</itemMeta>
<contentMeta>
<contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
<contentModified>2019-11-21T16:22:45-05:00</contentModified>
<located qcode="geoloc:NYC">
<name>New York, NY</name>
</located>
<creator uri="http://www.acmenews.com/staff/mjameson">
<name>Meredith Jameson</name>
</creator>
<infoSource qcode="is:AP">
<name>Associated Press</name>
</infoSource>
<language tag="en-US" />
<subject qcode="medtop:04000000">
<name>economy, business and finance</name>
</subject>
<subject qcode="medtop:20000350">
<name>central bank</name>
</subject>
<subject qcode="medtop:20000379">
<name>money and monetary policy</name>
</subject>
<slugline>US-Finance-Fed</slugline>
<headline>Fed to halt QE to avert "bubble"</headline>
</contentMeta>
<contentSet>
<inlineXML contenttype="application/nitf+xml">
<nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
<body>
<body.head>
<hedline>
<hl1>Fed to halt QE to avert "bubble"</hl1>
</hedline>
<byline>By Meredith Jameson, <byttl>Staff Reporter</byttl></byline>
</body.head>
<body.content>
<p>(New York, NY - October 21) Et, sent luptat luptat, commy
Nim zzriureet vendreetue modo
dolenis ex euisis nosto et lan ullandit lum doloreet vulla
feugiam coreet, cons eleniam il ute facin veril et aliquis ad
minis et lor sum del iriure dit la feugiamcommy nostrud min ulla
autpat velisl duisismodip ero dipit nit utpatum sandrer cipisim
nit lortis augiat nulla faccum at am, quam velenis nulput la
auguerostrud magna commolore eliquatie exerate facilis
modiamconsed dion henisse quipit at. Ut la feu facilla feu
faccumsan ecte modoloreet ad ex el utat.
</p>
<p>Ugiating ea feugait utat, venim velent nim quis nulluptat num
Volorem inci enim dolobor eetuer sendre ercin utpatio dolorpercing
Et accum nullan voluptat wisis alit dolessim zzrilla commy nonulpu
tpatinis exer sequatueros adit verit am nonse exerili quismodion
esto cons dolutpat, si.
</p>
</body.content>
</body>
</nitf>
</inlineXML>
</contentSet>
</newsItem>
3. Document structure
The building blocks of the text document shown above are the <newsItem> root element and three wrapping elements: one for metadata about the News Item (itemMeta), one for metadata about the content (contentMeta) and one for the content itself (contentSet). The attributes of the top-level (root) element <newsItem> are:
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
version="12"
standard="NewsML-G2"
standardversion="2.24"
xml:lang="en-US">
The @conformance attribute is not present, so conformance defaults to "core". Consequently, @standardversion MUST be NO LATER than 2.24, the last version of NewsML-G2 for which a Core Conformance XML Schema is available.
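By contrast, a provider that needs a later version of the standard would declare Power conformance explicitly. The fragment below is a sketch only: the version number 2.28 is chosen purely as an example of a later release, and the other attributes are copied from the example above.

```xml
<!-- Sketch: the same root element declared with Power conformance.
     standardversion="2.28" is an illustrative assumption. -->
<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
  guid="urn:newsml:acmenews.com:20161121:US-FINANCE-FED"
  version="12"
  standard="NewsML-G2"
  standardversion="2.28"
  conformance="power"
  xml:lang="en-US">
  <!-- catalogRef, rightsInfo, itemMeta, contentMeta, contentSet as before -->
</newsItem>
```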
This is followed by references to the Catalogs used to resolve QCodes in the Item, and Rights information:
<catalogRef
href="http://www.iptc.org/std/catalog/catalog.IPTC-G2-Standards_34.xml" />
<catalogRef
href="http://catalog.acmenews.com/news/ANM_G2_CODES_2.xml" />
<rightsInfo>
<copyrightHolder uri="http://www.acmenews.com/about.html#copyright">
<name>Acme News and Media LLC</name>
</copyrightHolder>
<copyrightNotice>Copyright 2018-19 Acme News and Media LLC</copyrightNotice>
</rightsInfo>
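A Catalog is itself a small XML document that maps each Scheme Alias to a Scheme URI. As a rough sketch, the Acme catalog referenced above might declare the two non-IPTC aliases used in the example like this. The scheme URIs shown are hypothetical and are not taken from any published catalog.

```xml
<!-- Sketch of a provider catalog; both scheme URIs are invented for illustration -->
<catalog xmlns="http://iptc.org/std/nar/2006-10-01/">
  <!-- alias used by <located qcode="geoloc:NYC"> -->
  <scheme alias="geoloc" uri="http://cv.acmenews.com/geoloc/" />
  <!-- alias used by <infoSource qcode="is:AP"> -->
  <scheme alias="is" uri="http://cv.acmenews.com/infosource/" />
</catalog>
```

With such a catalog, a receiver would resolve the QCode geoloc:NYC by appending the code NYC to the Scheme URI registered for the alias geoloc.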
3.1. Item Metadata <itemMeta>
Note the three mandatory child elements of the mandatory <itemMeta>:
- Item Class
- Provider
- Version Created
A publication status is always required, but the <pubStatus> element itself may be omitted, in which case the publication status defaults to "usable". It is nevertheless recommended that the publication status is given explicitly, as in this example. Because Acme News and Media is fictional, the Provider property cannot use one of the IPTC Provider NewsCodes and is instead expressed by a URI:
<itemMeta>
<itemClass qcode="ninat:text" />
<provider uri="http://www.acmenews.com/about/" />
<versionCreated>2019-11-21T16:25:32-05:00</versionCreated>
<pubStatus qcode="stat:usable" />
</itemMeta>
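To illustrate the default described above, the following sketch shows the same <itemMeta> with <pubStatus> left out. A receiver would be expected to treat this Item as "usable", exactly as if the element were present with the value stat:usable.

```xml
<!-- Sketch: pubStatus omitted, so the publication status defaults to "usable" -->
<itemMeta>
  <itemClass qcode="ninat:text" />
  <provider uri="http://www.acmenews.com/about/" />
  <versionCreated>2019-11-21T16:25:32-05:00</versionCreated>
</itemMeta>
```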
3.2. Content Metadata <contentMeta>
3.2.1. Administrative Metadata
The administrative properties of the example text story are:
<contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
<contentModified>2019-11-21T16:22:45-05:00</contentModified>
The place that the content was created uses the <located> element:
<located qcode="geoloc:NYC">
<name>New York, NY</name>
</located>
(Note that this is where the story was written, not where the events described in the story took place. The latter would be expressed using <subject>, part of Descriptive Metadata.)
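As a sketch of that distinction, a story written in New York about events in Washington could keep the <located> element above and add a geographic <subject>. The code geoloc:WAS is hypothetical (it is assumed to exist in Acme's own geoloc scheme), and the @type value assumes the IPTC Concept Nature NewsCodes alias cpnat.

```xml
<!-- Sketch: where the story was written -->
<located qcode="geoloc:NYC">
  <name>New York, NY</name>
</located>
<!-- Sketch: where the events took place; geoloc:WAS is an invented code -->
<subject type="cpnat:geoArea" qcode="geoloc:WAS">
  <name>Washington, DC</name>
</subject>
```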
The author of the article is expressed using the <creator> element:
<creator uri="http://www.acmenews.com/staff/mjameson">
<name>Meredith Jameson</name>
</creator>
The Information Source for the article is also given. When used without a @role, <infoSource> denotes the person or party that provided the original information on which the content is based. This is the relationship to be expressed here:
<infoSource qcode="is:AP">
<name>Associated Press</name>
</infoSource>
The default language for the content is given as U.S. English:
<language tag="en-US" />
3.2.2. Descriptive Metadata
In the example, the Subject properties are expressed as QCodes taken from the Media Topics NewsCodes, a Controlled Vocabulary owned and maintained by the IPTC. Thus:
<subject qcode="medtop:04000000">
<name>economy, business and finance</name>
</subject>
<subject qcode="medtop:20000350">
<name>central bank</name>
</subject>
<subject qcode="medtop:20000379">
<name>money and monetary policy</name>
</subject>
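Each of these QCodes is shorthand for a full concept URI: the IPTC catalog maps the alias medtop to the Scheme URI http://cv.iptc.org/newscodes/mediatopic/, and the code is appended to it. As a sketch, the first subject could equivalently be identified with @uri in place of @qcode:

```xml
<!-- Sketch: the same concept as medtop:04000000, identified by its full URI -->
<subject uri="http://cv.iptc.org/newscodes/mediatopic/04000000">
  <name>economy, business and finance</name>
</subject>
```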
The <slugline> property contains the value of the "Slugline" field of the story:
<slugline>US-Finance-Fed</slugline>
In a similar fashion, the <headline> property will contain the value of the "Headline" field:
<headline>Fed to halt QE to avert "bubble"</headline>
3.2.3. Complete Content Metadata
<contentMeta>
<contentCreated>2016-11-21T15:21:06-05:00</contentCreated>
<contentModified>2019-11-21T16:22:45-05:00</contentModified>
<located qcode="geoloc:NYC">
<name>New York, NY</name>
</located>
<creator uri="http://www.acmenews.com/staff/mjameson">
<name>Meredith Jameson</name>
</creator>
<infoSource qcode="is:AP">
<name>Associated Press</name>
</infoSource>
<language tag="en-US" />
<subject qcode="medtop:04000000">
<name>economy, business and finance</name>
</subject>
<subject qcode="medtop:20000350">
<name>central bank</name>
</subject>
<subject qcode="medtop:20000379">
<name>money and monetary policy</name>
</subject>
<slugline>US-Finance-Fed</slugline>
<headline>Fed to halt QE to avert "bubble"</headline>
</contentMeta>
4. Text content choices
4.1. Inline XML
The content of the NewsML-G2 document is enclosed by the <contentSet> wrapper. In the example, IPTC’s news text mark-up language NITF (News Industry Text Format) is used to format the text content. Because NITF is an XML standard, it is contained in an <inlineXML> child element of <contentSet>, with @contenttype giving the IANA Media Type of the XML-based standard.
HTML5 and XHTML are also popular text mark-up choices among NewsML-G2 providers. Alternatively, the contents of <inlineXML> may be any XML language that can express generic or specialised news information, including SportsML-G2 and rNews. Other languages such as XBRL (eXtensible Business Reporting Language) may also be used. The content inside <inlineXML> must be valid XML; in other words, it could stand alone as a valid XML document in its own namespace.
<contentSet>
<inlineXML contenttype="application/nitf+xml">
<nitf xmlns="http://iptc.org/std/NITF/2006-10-18/">
<!--STORY CONTENT HERE -->
</nitf>
</inlineXML>
</contentSet>
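For comparison, a provider using XHTML rather than NITF might structure the content as sketched below. The mark-up inside <html> is illustrative only and simply reuses the headline and placeholder text from the example story.

```xml
<!-- Sketch: the same story carried as XHTML instead of NITF -->
<contentSet>
  <inlineXML contenttype="application/xhtml+xml">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title>Fed to halt QE to avert "bubble"</title>
      </head>
      <body>
        <h1>Fed to halt QE to avert "bubble"</h1>
        <p>Et, sent luptat luptat ...</p>
      </body>
    </html>
  </inlineXML>
</contentSet>
```

As with NITF, the XHTML declares its own namespace on its root element, so it could be extracted and processed as a stand-alone document.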
4.2. Inline data
The <inlineData> wrapper element holds plain-text or base64-encoded content. Plain text or CDATA content MUST be identified by the "text/plain" content type. Binary content, such as images, audio clips or even PDF or Word documents, may be exchanged after proper encoding, but it is strongly recommended to use this structure for small assets only. The encoding algorithm MAY be indicated using the @encoding attribute. The following example uses plain text:
<contentSet>
<inlineData contenttype="text/plain">
Et, sent luptat luptat ...
</inlineData>
</contentSet>
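A binary asset follows the same pattern, with the encoded bytes as the element content. The sketch below carries a tiny GIF image; the attribute value "base64" is an assumption about how a provider would name the algorithm, and the payload is a minimal one-pixel image used only to keep the example short.

```xml
<!-- Sketch: a small binary asset, base64-encoded; encoding="base64" is assumed -->
<contentSet>
  <inlineData contenttype="image/gif" encoding="base64">R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7</inlineData>
</contentSet>
```

Because base64 inflates the payload by roughly a third, this approach is best kept to small assets, as recommended above.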