An updated version 2.27 of NewsML-G2 is available as Developer Release

  •  XML Schemas and the corresponding documentation are updated

Packages of version 2.27 files can be downloaded:

All changes of version 2.27 can be found on that page: http://dev.iptc.org/G2-Approved-Changes

The NewsML-G2 Implementation Guidelines are a web document now at https://www.iptc.org/std/NewsML-G2/guidelines

 

Reminder of an important decision taken for version 2.25 that also applies to version 2.27: the Core Conformance Level will not be developed any further, as all recent Change Requests were in fact aimed at features of the Power Conformance Level; changes to the Core Level were only a side effect.

The Core Conformance Level specifications of version 2.24 will stay available and valid; find them at http://dev.iptc.org/G2-Standards#CCLspecs

The new Video Metadata Hub Recommendation 1.2 supports videos as delivered by professional video cameras by mapping their key properties to the common properties of the VMHub – see  https://iptc.org/std/videometadatahub/mapping/1.2 – the new mappings are shown in columns at the right end of the table.

IPTC developed the Video Metadata Hub as common ground for metadata across already existing video formats with their own specific metadata properties. The VMHub is comprised of a single set of video metadata properties, which can be expressed by multiple technical standards, in full as reference implementation in XMP, EBU Core and JSON. These properties can be used for describing the visible and audible content, rights data, administrative details and technical characteristics of a video.

Recommendation 1.2 adds new properties for the camera device used for recording a video and for referencing an item of a video planning system. All properties are shown at https://iptc.org/std/videometadatahub/recommendation/1.2

“Chasing the SmartPhoto” is the theme of the IPTC Photo Metadata Conference 2018. In the day-long conference, session speakers will examine the image business in a changing environment as new technologies, new devices and Artificial Intelligence will be game changers in the coming  years. The Conference will be held in Berlin (Germany) on 31 May 2018. More details and how to register can be found at www.phmdc.org.

In the afternoon session titled “SmartPhotos and Smart Search Engines”, speakers from Google and Qwant will show how their search engines process photos found on the web and how they present search results. This session will include a discussion with conference participants about how photo businesses may critically perceive presentation of copyright protected photos.

“Protecting Images Against Infringements” is the topic of another conference session. Publishing a photo on the Web opens up the possibility of anyone downloading and republishing it, without permission or paying for a license. Speakers from photo businesses and service providers will show how to implement copyright protection and how to track downloaded and reused photos using blockchain and other technologies.

The Photo Metadata Conference is organised by the International Press Telecommunications Council (IPTC, iptc.org), the body behind the industry standard for administrative, descriptive, and copyright information about images. It brings together photographers, photo editors, managers of metadata and system vendors to discuss how technical means can help improve the use of images. The Conference is held in conjunction with the annual CEPIC Congress (www.cepic.org).

The International Press Telecommunications Council (IPTC) has named Brendan Quinn as its new managing director.

Brendan Quinn


Quinn joins the IPTC with two decades of experience in managing technology for media companies. In June 2018, he will succeed Michael Steidl, who will retire this summer after 15 years with the organisation. IPTC made the announcement today at their Spring Meeting in Athens.

Quinn brings to the IPTC a vast well of real-life experience in media industry technology, including leading the team that crafted the Associated Press’ APVideoHub.com video syndication platform, implementing content management systems at Fairfax Media in Australia, and handling an array of architecture and R&D roles over nine years with the BBC.

“I’m very much looking forward to my new role as MD for IPTC,” Quinn said. “I have huge respect for the organisation, in fact one of my first open source projects as a developer was writing a Perl module for NewsML v1 back in 2001 while I was a developer in Australia. I’m very proud to now be able to take the lead on defining the role of the IPTC in the challenging environment now faced by the media industry.”

Stuart Myles, Chairman of the Board of IPTC and Director of Information Management at the Associated Press, said he was “thrilled” to welcome Quinn to the organisation.

“He brings with him a wealth of news technology experience, with organisations from around the world and of all sizes. He has a unique combination of strategic insight into the challenges faced by the news industry and the technical know-how to help guide our work in technical standards and beyond.”

IPTC develops technical standards that address challenges in the news and photo industries, and other related fields. Recent IPTC initiatives are the Video Metadata Hub for mapping metadata across multiple existing standards; a major revision of RightsML for expressing machine readable licenses, now aligned with the new W3C standard ODRL; and a comprehensive update of SportsML for covering more efficiently a wide range of sports results and statistics. The Media Topics taxonomy for categorizing news now provides descriptions in four major languages.

Quinn says he looks forward to meeting IPTC members and learning as much as he can about the organization’s standards and outreach work.

“From iconic standards such as IPTC Photo Metadata and NewsML-G2 to emerging standards work like the Video Metadata Hub,” he said, “the IPTC aims to stay relevant in a changing media climate.”

About IPTC:

The IPTC, based in London, brings together the world’s leading news agencies, publishers, and industry vendors. It develops and promotes efficient technical standards to improve the management and exchange of information between content providers, intermediaries, and consumers. The standards enable easy, cost-effective, and rapid innovation and include the Photo Metadata and the Video Metadata Hub standards, the news exchange formats NewsML-G2, SportsML-G2 and NITF, rNews for marking up online news, the rights expression language RightsML, and NewsCodes taxonomies for categorizing news.

IPTC: www.iptc.org
Twitter: @IPTC
Brendan Quinn: @brendanquinn
Stuart Myles: @smyles 

Tagging tool at The New York Times

The New York Times uses a software tool for rules-based categorization to assign metadata to content. This is followed by human supervised review and tagging. Source: The New York Times

 

By Jennifer Parrucci
Senior Taxonomist at The New York Times
Lead of IPTC’s NewsCodes Working Group

The New York Times has a proud history of metadata. Every article published since The Times’s inception in 1851 contains descriptive metadata. The Times continues this tradition by incorporating metadata assignment into our publishing process today so that we can tag content in real-time and deliver key services to our readers and internal business clients.

I shared an overview of The Times’s tagging process at a recent conference held by the International Press Telecommunications Council in Barcelona. One of the purposes of IPTC’s face-to-face meetings is for members and prospective members to gain insight on how other member organizations categorize content, as well as handle new challenges as they relate to metadata in the news industry.

Why does The New York Times tag content today?

The Times doesn’t tag content just for tradition’s sake. Tags play an important role in today’s newsroom. Tags are used to create collections of content and send out alerts on specific topics. In addition, tags help boost relevance on our site search and send a signal to external search engines, as well as inform content recommendations for readers. Tags are also used for tracking newsroom coverage, archive discovery, advertising and syndication.

How does The New York Times tag content?

The Times employs rules-based categorization, rather than purely statistical tagging or hand tagging, to assign metadata to all published content, including articles, videos, slideshows and interactive features.

Rules-based classification involves the use of software that parses customized rules that look at text and suggest tags based on how well they match the conditions of those rules. These rules might take into account things like the frequency of words or phrases in an asset; the position of words or phrases, for example whether a phrase appears in the headline or lead paragraph; a combination of words appearing in the same sentence; or a minimum number of names or phrases associated with a subject appearing in an asset.
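To make the idea concrete, here is a minimal Python sketch of a rule that weighs phrase frequency and headline position. It is not The Times's software: the `Rule` class, its thresholds, the weights and the sample rules are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical rule: a tag plus the phrases that count as evidence for it."""
    tag: str
    phrases: list[str]
    min_hits: int = 2        # minimum number of phrase occurrences in the body
    headline_bonus: int = 2  # extra weight for each phrase found in the headline


def count_hits(phrase: str, text: str) -> int:
    """Count case-insensitive whole-phrase occurrences of `phrase` in `text`."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def suggest_tags(headline: str, body: str, rules: list[Rule]) -> list[tuple[str, int]]:
    """Return (tag, score) pairs for every rule whose conditions are met, best first."""
    suggestions = []
    for rule in rules:
        body_hits = sum(count_hits(p, body) for p in rule.phrases)
        headline_hits = sum(count_hits(p, headline) for p in rule.phrases)
        score = body_hits + rule.headline_bonus * headline_hits
        # A rule matches on enough body evidence or on any headline mention.
        if body_hits >= rule.min_hits or headline_hits > 0:
            suggestions.append((rule.tag, score))
    return sorted(suggestions, key=lambda pair: pair[1], reverse=True)


# Illustrative rules and text, not taken from any real vocabulary.
rules = [
    Rule(tag="Opera", phrases=["opera", "libretto", "soprano"]),
    Rule(tag="Soccer", phrases=["soccer", "goalkeeper", "penalty kick"]),
]
headline = "A New Opera Opens in Houston"
body = "The opera features a soprano in the lead role. The libretto is new."
suggestions = suggest_tags(headline, body, rules)
print(suggestions)
```

The score here plays a role loosely comparable to a relevancy score attached to a suggestion; a production rule language would also need sentence-level co-occurrence conditions, which this sketch leaves out.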

Unlike many other publications that use rules-based classification, The Times adds a layer of human supervision to tagging. While the software suggests the relevant subject terms and entities, the metadata is not assigned to the article until someone in the newsroom selects and assigns tags from that list of suggestions to an asset.

Why does The Times use rules-based and human supervised tagging?

This method of tagging allows for more transparency in rule writing to see why a rule has or has not matched. Additionally it gives the ability to customize rules based on patterns specific to our publication. For example, The Times has a specific style for obituaries, whereby the first sentence usually states someone died, followed by a short sentence stating his or her age. This language pattern can be included in the rule to increase the likelihood of obituaries matching with the term “Deaths (Obituaries).” Rules-based classification also allows for the creation of tags without needing to train a system. This option allows taxonomists to create rules for low-frequency topics and breaking news, for which sufficient content to train the system is lacking.

These rules can then be updated and modified as a topic or story changes and develops. Additionally, giving the newsroom rule suggestions and a controlled vocabulary to choose from ensures a greater consistency in tagging, while the human supervision of the tagging ensures quality.

What does the tagging process at The New York Times look like?

Once an asset (an article, slideshow, video or interactive feature) is created in the content management system, the categorization software is called. This software runs the text against the rules for subjects and then through the rules for entities (proper nouns). Once this process is complete, editors are presented with suggestions for each term type within our schema: subjects, organizations, people, locations and titles of creative works. The subject suggestions also contain a relevancy score. The editor can then choose tags from these suggestions to be assigned to an article. If they do not see a tag that they know is in the vocabulary suggested to them, the editors have the option to search for that term within the vocabulary. If there are new entities in the news, the editors can request that they be added as new terms. Once the article is published/republished the tags chosen from the vocabulary are assigned to the article and the requested terms are sent to the Taxonomy Team.

The Taxonomy Team receives all of the tag requests from the newsroom in a daily report. Taxonomists review the suggestions and decide whether they should be added to the vocabulary, taking into account factors such as: news value, frequency of occurrence, and uniqueness of the term. If the verdict is yes, then the taxonomist creates a new entry for the tag in our internal taxonomy management tool and disambiguates the entry using Boolean rules. For example, there cannot be two entries both named “Adams, John” for the composer and the former United States president of the same name. To solve this, disambiguation rules are added so that the software knows which one to suggest based on context.

John Adams,_IF:{(OR,"composer","Nixon in China","opera"…)}::Adams, John (1947- )
John Adams,_IF:{(OR,"federalist","Hamilton","David McCullough"…)}:Adams, John (1735-1826)
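The logic of these two rules can be sketched in Python as follows. This is an illustration only, not the internal taxonomy tool: the `DISAMBIGUATION_RULES` structure and the `disambiguate` function are hypothetical, and only the context cues visible in the rules above are used (the elided ones are left out).

```python
# Hypothetical in-memory form of the two rules shown above: each surface name
# maps to candidate vocabulary entries, each with its context cues (OR logic).
DISAMBIGUATION_RULES = {
    "John Adams": [
        ("Adams, John (1947- )", ["composer", "Nixon in China", "opera"]),
        ("Adams, John (1735-1826)", ["federalist", "Hamilton", "David McCullough"]),
    ],
}


def disambiguate(name: str, text: str) -> list[str]:
    """Return the vocabulary entries for `name` whose context cues occur in `text`."""
    lowered = text.lower()
    if name.lower() not in lowered:
        return []
    matches = []
    for entry, cues in DISAMBIGUATION_RULES.get(name, []):
        # OR condition: a single matching cue is enough to suggest the entry.
        # Plain substring matching is a simplification of this sketch.
        if any(cue.lower() in lowered for cue in cues):
            matches.append(entry)
    return matches


composer_text = "John Adams conducted his opera Nixon in China."
president_text = "John Adams, a federalist, clashed with Hamilton."
print(disambiguate("John Adams", composer_text))
print(disambiguate("John Adams", president_text))
```

When no cue matches, the function returns an empty list, which corresponds to the software making no suggestion for that name and leaving the choice to an editor.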

Once all of these new terms are added into the system, the Taxonomy Team retags all assets with the new terms.

In addition to these term updates, taxonomists also review a selection of assets from the day for tagging quality. Taxonomists read the articles to identify whether the asset has all the necessary tags or has been over-tagged. The general rule is to tag the focus of the article and not everything mentioned. This method ensures that the tagging really gets to the heart of what the piece is about. When doing this review, taxonomists will notice subject terms that are either not suggesting or suggesting improperly. The taxonomist uses this opportunity to tweak the rules for that subject so that the software suggests the tag properly next time.

After this review of the tagging process at the New York Times, the Taxonomy Team compiles a daily report back to the newsroom that includes shoutouts for good tagging examples, tips for future tagging and a list of all the new term updates for that day. This email keeps the newsroom and the Taxonomy Team in contact and acts as a continuous training tool for the newsroom.

All of these procedures come together to ensure that The Times has a high quality of metadata upon which to deliver highly relevant, targeted content to readers.

Read more about taxonomy and IPTC standard Media Topics.

Follow IPTC on LinkedIn and Twitter: @IPTC

Contact IPTC

The comprehensive NewsML-G2 Guidelines are available now in an updated version and anybody can read them on the web: https://www.iptc.org/std/NewsML-G2/guidelines/

What has been modified:

How to use the new Guidelines:

We welcome feedback on the format and the content of the Guidelines. Use the Contact Us form.

IPTC's AGM 2017 in Barcelona

Photo credit: Jill Laurinaitis

By Stuart Myles
Chairman of the Board of Directors, IPTC

IPTC holds face-to-face meetings in several locations throughout the year, although most of the detailed work of the IPTC is now conducted via teleconferences and email discussions. Our Annual General Meeting for 2017 was held in Barcelona in November. As well as being the time for formal votes and elections, the AGM is a chance for the IPTC to look back over the last year and to look ahead to what is in store. What follows is a slightly edited version of my remarks at IPTC’s AGM 2017 in Barcelona.

IPTC has had a good year – the 52nd year for the organization!

We’ve updated our veteran standards, Photo Metadata – our most widely-used standard – and NewsML-G2 – our most comprehensive XML standard, marking its 10th year of development.

We’re continuing to work in partnership with other organizations, to maximize the reach and benefits of our work for the news and media industry. In coordination with CEPIC we organized the 10th annual Photo Metadata Conference, looking to the future of auto tagging and search, examining advanced AI techniques – and considering both their benefits and their drawbacks for publishers. With the W3C we have crafted the ODRL rights standard and are launching plans to create RightsML as the official profile of the ODRL standard, endorsed by both the IPTC and W3C.

We’ve also tackled problems that matter to the media industry with technology solutions which are founded on standards, but go beyond them. The Video Metadata Hub is a comprehensive solution for video metadata management that allows exchange of metadata over multiple existing standards. The EXTRA engine is a Google DNI sponsored project to create an open source rules based classification engine for news.

We’ve had some changes in the make-up of IPTC. Johan Lindgren of TT joined the Board. Bill Kasdorf has taken over as the PR Chair. And we were thrilled to add Adobe as a voting member of IPTC, after many years of working together on photo metadata standards. Of course, with more mixed emotions, we have also learnt that Michael Steidl, the IPTC Managing Director for 15 years, will retire next summer. As has been clear throughout this meeting and, indeed, every day between the meetings on numerous emails and phone calls, Michael is the backbone of the work of the IPTC. Once again, I ask you to join me in acknowledging the amazing contributions and dedication that Michael displays towards the IPTC.

Later today, we will discuss in detail our plans to recruit a successor for the crucial role of the Managing Director. And this is not the only challenge that the IPTC faces. We describe ourselves as “the global standards body of the news media” and that “we provide the technical foundation for the news ecosystem”. As such, just as the wider news industry is facing a challenging business and technical environment, so is the IPTC.

During this meeting, we’ve talked about some of the technical challenges – including the continuing evolution of file formats and supporting technologies, whilst many of us are still working to adopt the technologies from 5 or 10 years ago. We’ve also talked about the erosion of trust in media organizations and whether a combination of editorial and technical solutions can help.

But I thought I would focus on a particular shift in the business and technical environment for news that may well have a bigger impact than all of those. That shift can be traced back to 2014 which, by coincidence, is when I became Chairman of the IPTC. Last week, Andre Staltz published an interesting and detailed article called “The Web Began Dying in 2014, Here’s How”. If you haven’t read it, I recommend it. The article makes a number of interesting points and backs them up with numerous charts and statistics. I will not attempt to summarize the whole thing, but a few key points are worth highlighting.

Staltz points out that, prior to 2014, Google and Facebook accounted for less than 50% of all of the traffic to news publisher websites. Now those two companies alone account for over 75% of referral traffic. Also, through various acquisitions, Google and Facebook properties now share the top ten websites with news publishers – in the USA 6 of the 10 most popular websites are media properties. In Brazil it is also 6 out of 10. In the UK it is 5 out of 10. The rest all belong to Facebook and Google.

Both Facebook and Google reorganized themselves in 2014, to better focus on their core strengths. In 2014, Facebook bought WhatsApp and terminated its search relationship with Bing, effectively relinquishing search to Google and doubling down on social. Also in 2014, Google bought DeepMind and shut down Orkut, its most successful social product. This, along with the reorganization into Alphabet, meant that Google relinquished social to Facebook, allowing it to focus on search and – even more – artificial intelligence. Thus, each company seems happy to dominate its own massive part of the web.

But … does that matter to media companies? Well, Facebook said if you want optimal performance on our website, you must adopt Instant Articles. Meanwhile, Google requires publishers to use its Accelerated Mobile Pages or “AMP” format for better performance on mobile devices. And, worldwide, Internet traffic is shifting from the desktop to mobile devices.

Then, if you add in Amazon, Apple and Microsoft, it is clear that another huge shift is going on. All of the Frightful Five are turning away from the Web as a source of growth and instead turning to building brand loyalty via high end devices. Following the successful strategy of Apple, they are all becoming hardware manufacturers with walled gardens. Already we have Siri, Cortana, Alexa and Google Home. But also think about the investments going on by these companies in AR and VR as ways to dominate social interactions, e-commerce and machine learning over the Internet.

So, just as news companies must confront these shifts in the global business and technology environment, so must the IPTC. During this meeting, we’ve talked about our initial efforts to grapple with metadata for AR, VR and 360 degree imagery. We’ve also discussed techniques which are relevant to news taxonomy and classification, including machine learning and artificial intelligence. At the same time, Facebook, Google and others are not totally in control, as they – along with Twitter – found themselves having to explain the spread of disinformation on their platforms and coming under increased government scrutiny, particularly in the EU. So, all of us, whether we describe ourselves as news publishers or not, are dealing with a rapidly changing and turbulent information, technical and business environment.

What does this mean for IPTC? IPTC is a news technology standards organization. But it is also unique in that we are composed of news companies from around the world. We know from the membership survey that both of these factors – influence over technical solutions and access to technology peers from competitors, partners, diverse organizations large and small – are very important to current members. In order to prosper as an organization, IPTC needs to preserve these unique benefits to members, but also scale them up. This means that we need to find ways to open up the organization in ways that preserve the value of the IPTC and fit with the mission, but also in ways that are more flexible. We need to continue to move beyond saying that the only thing we work on is standards and instead use standards as a component of the technical solutions we develop, as we are doing with EXTRA and the Video Metadata Hub. We need to work with diverse groups focused on solving specific business and journalistic problems – such as trust in the media – and in helping news companies learn the best ways to work with emerging technologies, whether it is voice assistants, artificial intelligence or virtual reality.

I’m confident that – working together – we can continue to reshape the IPTC to better meet the needs of the membership and to move us further forward in support of solving the business and editorial needs of the news and media industry. I look forward to working with all of you on addressing the challenges in 2018 and beyond.

Stuart Myles is the Director of Information Management at Associated Press.

An updated version 2.26 of NewsML-G2 is available as Developer Release

  •  XML Schemas and the corresponding documentation are updated
  •  The Structure Matrix Excel sheet is updated

Packages of version 2.26 files can be downloaded:

All changes of version 2.26 can be found on that page: http://dev.iptc.org/G2-Approved-Changes

Reminder of an important decision taken for version 2.25 that also applies to version 2.26: the Core Conformance Level will not be developed any further, as all recent Change Requests were in fact aimed at features of the Power Conformance Level; changes to the Core Level were only a side effect.

The Core Conformance Level specifications of version 2.24 will stay available and valid; find them at http://dev.iptc.org/G2-Standards#CCLspecs

Image: Liuzishan

By Johan Lindgren

The Sports Content Working Group of IPTC started in the early 2000s, initially to develop the XML standard SportsML. But the group has evolved to handle many aspects of reporting sports in the news.

The initial big question for news organisations handling sports is to decide if it should be handled as text or as data. The sports articles have, obviously, more in common with articles about other subjects. It is the results, schedules, statistics and standings that provide the dilemma. You can choose to provide the results ready for display on screen or on paper. Or you can provide the results as detailed marked up data and let the receiver handle the formatting, depending on purpose.

In fact, by using both NewsML-G2 and SportsML from IPTC you can provide both variants in parallel, if you so wish. With a NewsML-G2 news item as the wrapper, you provide one rendition of the content with the results as data in SportsML markup, and another rendition with the same results in a displayable format like HTML5.

Vocabularies and Media Topics

Another big issue in handling sports data is knowing all the terms, what they mean and how they are used. The people in the sports group have spent a lot of time on this and provide very extensive vocabularies. Some are found in the Media Topics, maintained by the NewsCodes Working Group of IPTC. The same is true for the new addition to this, called facets. Facets refine the semantics of a Media Topic.

Example: If you try to combine Nordic skiing, female, relay, freestyle, 4×5 km as constituting one combined Media Topic and think of all the variations resulting from alternates to those terms, and then expand that thought to all sports events, the number of Media Topics will be overwhelming. Instead, IPTC chose to minimize the number of Media Topics and instead create a system of facets that qualify these broader topics. So, for example, “male” and “female” can apply to many, many sport competition topics, eliminating the need to create separate Media Topic terms for all of them.

About SportsML

Apart from the topics and their facets there is a huge number of metadata property values maintained by the sports group. These values are listed in 113 vocabularies (they can be downloaded); 37 of them are used for the core of SportsML and the other 76 are used for sport-specific additions. In total there are 1,850 values defined and listed as concepts in 113 knowledge items. The list of metadata values and their explanations is fundamental know-how in sports reporting. You can have names and definitions in several languages.

Example of a code saying the player started the game on the field:

<concept id="spplayerstatusstarter">
  <conceptId qcode="spplayerstatus:starter" />
  <name xml:lang="en-US">starter</name>
  <name xml:lang="en-GB">starter</name>
  <definition xml:lang="en-GB">A member of the lineup that enters the field at the commencement of play.</definition>
</concept>
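As a rough illustration of how a receiver might read such a concept, the following Python sketch extracts the QCode and the language-specific names and definitions. It assumes the concept is parsed on its own, without the surrounding knowledge item and without the XML namespace that real vocabulary files declare; the `read_concept` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The concept from the example above, without the surrounding knowledge item
# and without namespace declarations (an assumption made to keep this short).
CONCEPT_XML = """
<concept id="spplayerstatusstarter">
  <conceptId qcode="spplayerstatus:starter" />
  <name xml:lang="en-US">starter</name>
  <name xml:lang="en-GB">starter</name>
  <definition xml:lang="en-GB">A member of the lineup that enters the field at the commencement of play.</definition>
</concept>
"""

# ElementTree exposes the xml:lang attribute under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def read_concept(xml_text: str) -> dict:
    """Extract the QCode, names and definitions (keyed by language) of one concept."""
    concept = ET.fromstring(xml_text.strip())
    qcode = concept.find("conceptId").get("qcode")
    # A QCode is "<scheme alias>:<code>"; split on the first colon only.
    scheme, _, code = qcode.partition(":")
    return {
        "qcode": qcode,
        "scheme": scheme,
        "code": code,
        "names": {n.get(XML_LANG): n.text for n in concept.findall("name")},
        "definitions": {d.get(XML_LANG): d.text for d in concept.findall("definition")},
    }


concept = read_concept(CONCEPT_XML)
print(concept["qcode"], concept["names"])
```

Keying names and definitions by language tag makes it straightforward to pick the label that suits a given audience, which is the point of providing them in several languages.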

SportsML is used by news organisations around the world both for everyday sport reporting and big events. BBC, for example, built their handling of the Olympic results in London around SportsML. It is also used by organisers of so-called fantasy sports leagues. Even by just using the core you can handle most normal news reporting of all sports events and competitions. There are also plugins for eleven sports, when you want to handle very in-depth data of these sports. And more plugins can be added. There are also ways to extend the standard with your own values or constructs. When developing SportsML the aim has always been to handle things in the core if the things are applicable to more than one sport. But some things are very specific to one sport and will instead be placed in its own schema which is imported and linked in proper places.

To illustrate this we can use this snippet from a soccer game:

 <team-stats score="0" score-opposing="2" event-outcome="speventoutcome:loss">

 <team-stats-soccer line-formation="433">

 <stats-soccer-offensive corner-kicks="2"/>

The first line is general, with the score and outcome. But the other two lines are soccer-specific, with a line-formation and the number of corner-kicks this team took in this game.

SportsML for JSON

Up until now SportsML has mainly been serialized using XML. But with increasing interest in JSON the sports group is working on also providing a schema of SportsML for JSON usage. The work is close to being ready for the first public release. Some details of the schema need to be finalized and then the Working Group will provide samples and some tools. We’re hoping to have this ready to release by early 2018.

The release of 3.0 of SportsML in XML also provided some tools (see our Github repository), mainly to transform between the earlier version, 2.2, and 3.0. One of the big developments in 3.0 was the possibility to handle statistics either in generic structures or in specific structures. So there are tools to transform between the two variants. To show this we can compare the above soccer example with the similar generic sample:

<stat stat-type="spsocstat:line-formation" value="433"/>

<stat class="spct:offense" stat-type="spsocstat:corner-kicks" value="2"/>

As you see the attribute names become type-values in the generic stat-construction.
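The mapping from specific to generic structures can be sketched in a few lines of Python. This is not the transformation tool from the Github repository; it is a simplified illustration in which the closing tags of the soccer snippet, the `ELEMENT_CLASS` lookup and the `to_generic` function are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A closed, well-formed version of the soccer snippet above; the nesting and
# the closing tags are assumptions added so that the fragment can be parsed.
SPECIFIC_XML = """
<team-stats score="0" score-opposing="2" event-outcome="speventoutcome:loss">
  <team-stats-soccer line-formation="433">
    <stats-soccer-offensive corner-kicks="2"/>
  </team-stats-soccer>
</team-stats>
"""

# Hypothetical lookup from a sport-specific element to the generic class value.
ELEMENT_CLASS = {"stats-soccer-offensive": "spct:offense"}
STAT_PREFIX = "spsocstat"  # scheme alias used in the generic sample above


def to_generic(team_stats: ET.Element) -> list[ET.Element]:
    """Turn each soccer-specific attribute into a generic <stat> element."""
    stats = []
    soccer = team_stats.find("team-stats-soccer")
    if soccer is None:
        return stats
    # iter() yields the soccer element itself first, then its descendants.
    for element in soccer.iter():
        stat_class = ELEMENT_CLASS.get(element.tag)
        for attr_name, attr_value in element.attrib.items():
            stat = ET.Element("stat")
            if stat_class:
                stat.set("class", stat_class)
            # The attribute name becomes the code part of the stat-type value.
            stat.set("stat-type", f"{STAT_PREFIX}:{attr_name}")
            stat.set("value", attr_value)
            stats.append(stat)
    return stats


generic = to_generic(ET.fromstring(SPECIFIC_XML.strip()))
for stat in generic:
    print(ET.tostring(stat, encoding="unicode"))
```

The general attributes on the first line (score and outcome) are left untouched, since only the sport-specific parts have a generic counterpart in this example.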

The work in the Sports Content Group is completely done by volunteers. The members of the group work in the news business and contribute to the group as much as their work allows. We welcome all interested persons, e.g. by joining our public discussion forum. The more people who can contribute the better, and there seems to be a never-ending flow of interesting topics when you start talking about sports data.

Johan Lindgren is the Chair of the Sports Content Working Group and a developer at TT Nyhetsbyrån, Sweden.

 

Our current Managing Director Michael Steidl is retiring in the summer of 2018 after 15 years of dedicated service to the IPTC. The IPTC is now seeking applicants to be the next Managing Director of our organization. Feel free to contact Michael (office@iptc.org) or our Chair, Stuart Myles, (chair@iptc.org) if you have any questions.

Job Opportunity: Managing Director of IPTC

For more information about the position, read the full Managing Director profile.

The IPTC is seeking applications for the next Managing Director of our organization. We are the global standards body of the news media and provide the technical foundation for the news ecosystem. Our mission is to simplify the distribution of information. We develop and promote efficient technical standards to improve the management and exchange of information between content providers, intermediaries and consumers.

The Managing Director, working with the Board and Membership, seeks to broaden adoption of IPTC standards, to maximize information sharing between members and to collaborate with partners across the news and media industry. The ideal applicant will have a strong record of working with news technology, presenting a positive image of an organization and administering a membership or non-profit organization.

Learn more about IPTC and read the full Managing Director profile.

Applicants should email a letter of interest, curriculum vitae and contact information for three references to office@iptc.org. We will start considering applications on 1 December 2017.