IPTC is looking for software developers to design, develop, document and test EXTRA, an open source rules-based classification engine for news. First preference will be given to applications received by 21st October 2016, and review will continue until the positions are filled.
“Classification” means assigning one or more categories to the text of a news document. Rules-based classifiers use a set of Boolean rules, rather than machine-learning or statistical techniques, to determine which categories to apply.
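To make the idea of Boolean rules concrete, here is a minimal Python sketch of a rules-based classifier. The rule syntax, the helper names (has, all_of, any_of, none_of) and the sample categories are assumptions made for illustration. They are not EXTRA's rule language.

```python
from dataclasses import dataclass
from typing import Callable

# A condition is any function from the lower-cased document text to True/False.
Condition = Callable[[str], bool]


def has(term: str) -> Condition:
    """True when the term appears in the text (naive substring match)."""
    return lambda text: term.lower() in text


def all_of(*conds: Condition) -> Condition:
    """Boolean AND over the given conditions."""
    return lambda text: all(c(text) for c in conds)


def any_of(*conds: Condition) -> Condition:
    """Boolean OR over the given conditions."""
    return lambda text: any(c(text) for c in conds)


def none_of(*conds: Condition) -> Condition:
    """Boolean NOT: true only when none of the conditions hold."""
    return lambda text: not any(c(text) for c in conds)


@dataclass
class Rule:
    category: str         # category assigned when the condition holds
    condition: Condition


def classify(text: str, rules: list[Rule]) -> list[str]:
    """Return every matching category, so one document may get several."""
    lowered = text.lower()
    matched: list[str] = []
    for rule in rules:
        if rule.condition(lowered) and rule.category not in matched:
            matched.append(rule.category)
    return matched


# Hypothetical rule set, for illustration only.
RULES = [
    Rule("sport", any_of(has("football"), has("olympic"), has("tennis"))),
    Rule("elections", all_of(has("vote"), any_of(has("ballot"), has("polls")))),
    Rule("weather", all_of(has("storm"), none_of(has("brainstorm")))),
]

print(classify("Olympic football final draws record crowd", RULES))  # ['sport']
```

Each rule is evaluated independently, which is how a single story can receive several categories. The NOT clause in the weather rule shows how a rule author can exclude a false positive, here "brainstorm" matching "storm".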
EXTRA is the EXTraction Rules Apparatus, a multilingual open-source platform for rules-based classification of news content. IPTC was awarded a grant of €50,000, covering the entire project, from the first round of Google’s Digital News Initiative (DNI) Innovation Fund to build and freely distribute the initial version of EXTRA.
We are working with news providers to supply sets of news documents and with linguists to write rules to classify the documents. IPTC is looking for qualified developers to create the rules engine to accurately and efficiently categorize the documents using the rules.
Please consult this page for more information and to let us know if you’re interested in being considered.
The IPTC NewsCodes family of controlled vocabularies has a new member: Product Genre.
The Product Genre vocabulary was developed at the request of the broadcast industry. A broad category of terms was needed – one that specifies the kind of content by media product type – in addition to metadata that describes the content. The Product Genre scheme includes terms such as comedy, drama, entertainment, travel and sport.
NewsCodes are sets of concepts, also known as controlled vocabularies or taxonomies, that are created and maintained by the IPTC. They are assigned as metadata values to news objects such as text, photographs, graphics, audio and video files and streams. This allows news metadata to be coded consistently across news providers and over time.
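In IPTC's NewsML-G2 format, a NewsCode value is typically written as a QCode: a scheme alias and a concept code joined by a colon, such as medtop:15000000, which a catalog expands into the concept's full URI. The Python sketch below assumes a hand-written two-entry catalog; real NewsML-G2 documents declare the catalogs they use, and the function name qcode_to_uri is hypothetical.

```python
# Minimal catalog: scheme alias -> scheme URI (a small excerpt, assumed for this example).
CATALOG = {
    "medtop": "http://cv.iptc.org/newscodes/mediatopic/",
    "subj": "http://cv.iptc.org/newscodes/subjectcode/",
}


def qcode_to_uri(qcode: str, catalog: dict[str, str] = CATALOG) -> str:
    """Expand an 'alias:code' QCode into the full concept URI."""
    alias, sep, code = qcode.partition(":")
    if not sep or not code:
        raise ValueError(f"not a QCode: {qcode!r}")
    try:
        return catalog[alias] + code
    except KeyError:
        raise ValueError(f"unknown scheme alias: {alias!r}") from None


print(qcode_to_uri("medtop:15000000"))
# http://cv.iptc.org/newscodes/mediatopic/15000000
```

Because every provider resolves the same alias to the same URI, two systems that never exchanged configuration can still agree on what a coded value means.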
The Product Genre vocabulary was initiated by Andy Read, IPTC delegate and BBC’s Service Development and Delivery Manager for News, who has worked with IPTC for more than 20 years. The idea grew out of feedback from broadcast members, which highlighted the value of forum discussions in driving the development of the data set.
“There was a need to extend the breadth of the controlled vocabularies,” said Read. “The new Product Genre vocabulary codes describe the type of program itself, and help to broaden the program to a wider audience and general TV/broadcast industry.”
NewsCodes vocabularies can be very specific. A broader category like Product Genre allows an entire broadcast program or package to be identified, not just its smaller segments. For example, a 60-minute program about the war in Syria can be coded with a Product Genre, while a minute-long clip within it about a possible chemical attack carries its own, more specific metadata in the context of the larger news program.
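The layering described above can be pictured as a small data model in which the program carries one Product Genre and each clip adds its own subjects. This is a hypothetical Python sketch: the class names, the prodgenre and topic aliases and all code values are placeholders, not IPTC identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    title: str
    subjects: list[str] = field(default_factory=list)  # clip-specific codes


@dataclass
class Program:
    title: str
    product_genre: str                                   # one genre for the whole program
    clips: list[Clip] = field(default_factory=list)


def clip_metadata(program: Program, clip: Clip) -> dict:
    """Combine the program-level genre with the clip's own, narrower subjects."""
    return {
        "program": program.title,
        "product_genre": program.product_genre,
        "clip": clip.title,
        "subjects": list(clip.subjects),
    }


# Placeholder codes: 'prodgenre:' and 'topic:' are hypothetical aliases.
clip = Clip("Possible chemical attack", subjects=["topic:chemical-weapons"])
overview = Program("Syria's war: an overview", "prodgenre:news", [clip])
print(clip_metadata(overview, clip))
```

An advertiser or scheduler can work with the genre alone, while a search system can still reach the clip through its specific subjects.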
“The Product Genre needed to be added to help facilitate use of these codes with IPTC’s NewsML-G2 standards,” said Read.
The new Product Genre vocabulary is also beneficial on the business side, said Jennifer Parrucci, senior taxonomist for The New York Times.
“Advertising is often sold based on the type of program – not necessarily subject tags or more specific terms,” Parrucci said. “The Product Genre vocabulary identifies advertising opportunities at a more comprehensive level.”
The IPTC NewsCodes Working Group, chaired by Parrucci, collaborated to define the vocabulary terms, based on concrete examples and actual TV programs. For each Concept identifier and name, a definition is listed. The notes section gives an example of what that Concept describes, for clarity and accurate use.
Any NewsCode provided by the IPTC can be used at any stage of a news workflow without a royalty fee. However, if IPTC NewsCodes are included in an application, IPTC’s intellectual property and copyright must be explicitly attributed.
Interesting stats and info about the International Press Telecommunications Council’s technical standards for exchange of news information:
1.) The International Press Telecommunications Council publishes 14+ technical standards that are intended for the business-to-business exchange of news among news agencies, other news providers and publishers.
2.) At least one IPTC standard is in use at virtually every newspaper and news web site in the world.
Publishers use IPTC standards to save money and improve the ability of their news products to be used by customers.
3.) IPTC standards for news exchange are available for downloading at no cost – and there are no royalties or fees.
The only source of income for IPTC is membership dues. Membership currently consists of more than 50 organizations and individuals worldwide.
4.) All IPTC standards are designed to be independent of any specific language.
Although our publications are written in English and meetings are conducted in English, every recent standard is usable by any written language that is supported by Unicode.
5.) More than 70 software applications support IPTC Standards.
Software developers seamlessly integrate IPTC standards into their products – often in subtle ways that are not obvious to customers.
It’s an Olympic year for IPTC’s SportsML 3.0 standard, the recently released update to the most comprehensive tech-industry XML format for sports data.
“We figured, why not use the latest technology available?” said Trond Husø, system developer for NTB, who worked on the standard’s update, released in July. “SportsML 3.0’s use of controlled vocabularies for sport competitions and other subjects now provides many benefits, including more flexibility. Storing results is also more convenient.”
SportsML 3.0 is used as a back-end structure by many major news organizations because it is the only open global standard for scores, schedules, standings and statistics. “It saves the time and cost of developing an in-house structure,” said Husø, also a member of IPTC’s Sports Content Working Party.
The Rio Games, which will bring together about 10,500 athletes from 206 countries for 306 events over 17 days, are breaking new ground for big data and for new approaches to managing it. For the first time, the International Olympic Committee (IOC) has used cloud-based solutions for work processes including volunteer recruitment and accreditation.
Consider, too, the experimental technologies and apps launched by key broadcasters and by Olympic Broadcasting Services, the organization responsible for coordinating TV coverage of the Games: virtual reality footage, online streaming, automated reporting, drone cameras, and Super Hi-Vision, which offers 16 times the resolution of HD.
Billions of Olympic spectators worldwide have come to expect real-time results and accurate scores, with a side of historical perspective. Most give little thought to how the information reaches them, whether via tickers on websites, graphic stats on TV screens, or factoids offered by commentators.
Schedules, competitors’ names, bio information, times, rankings, medalists: how does all of this data get served up so quickly and uniformly among networks and news services? And how does it get integrated into existing news systems through standards such as SportsML 3.0?
It starts with the IOC, the non-profit, non-governmental body that organizes the Olympic Games and Youth Olympic Games. The IOC acts as a catalyst for collaboration among all parties involved, from athletes, organising committees and IT providers to broadcast partners and United Nations agencies. It generates revenue for the Olympic Movement through several major marketing efforts, including the sale of broadcast rights.
The IOC produces the Olympic Data Feed (ODF), the repository of live data about past and current Games. The IOC is responsible for communicating the official results, which it distributes in the ODF format.
Paying media partners sign a licensing agreement to use ODF, which allows them to report results through their own channels and to build new apps, services and analysis tools.
The goal of ODF is to define a unified set of messages valid for all sports and several different news systems – so that all partners are receiving the same data, at the same time. It was introduced for the Vancouver Games in 2010 and is an ongoing development effort.
According to the IOC’s website, ODF plays the part of messenger. The data is machine-readable: ODF carries sports information from the moment it is generated to its final destination as messages encoded in Extensible Markup Language (XML). XML is a markup language for encoding structured data in a form that both people and software can read, which makes it a flexible way to share data electronically over the Internet and over corporate networks.
IPTC’s SportsML 3.0 can import data from ODF, and using SportsML to structure ODF data provides a comprehensive approach that covers all sports and competitions worldwide. ODF has identifiers for the sports contested and the awards (gold, silver and bronze medals) given at the Olympic Games; sports outside ODF are identified by SportsML vocabulary terms.
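As a rough illustration of what machine-readable XML results look like to software, the Python sketch below parses a results message with the standard library's xml.etree.ElementTree. The element and attribute names, athletes and times are invented placeholders; real ODF messages follow the IOC's own ODF schemas, which differ from this simplified layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified message layout (not the real ODF schema).
MESSAGE = """\
<ResultMessage sport="swimming" event="Example final" status="OFFICIAL">
  <Result rank="2" athlete="Athlete B" country="BBB" time="48.25"/>
  <Result rank="1" athlete="Athlete A" country="AAA" time="48.10"/>
</ResultMessage>
"""


def parse_results(xml_text: str) -> list[dict]:
    """Turn each <Result> element into a dict, ordered by rank."""
    root = ET.fromstring(xml_text)
    results = [
        {
            "event": root.get("event"),
            "status": root.get("status"),
            "rank": int(r.get("rank")),
            "athlete": r.get("athlete"),
            "country": r.get("country"),
            "time": float(r.get("time")),
        }
        for r in root.findall("Result")
    ]
    return sorted(results, key=lambda row: row["rank"])


for row in parse_results(MESSAGE):
    print(row["rank"], row["athlete"], row["time"])
```

Every receiving partner that runs the same parsing logic against the same message gets identical values, which is the point of a unified feed.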
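The fallback described above, ODF identifiers first and SportsML vocabulary terms for everything else, might look like this in a minimal Python sketch. Both lookup tables, the spsport alias and the function name sport_identifier are hypothetical and stand in for the real ODF code lists and SportsML vocabularies.

```python
# Hypothetical excerpt of ODF sport codes (placeholders for the real ODF code lists).
ODF_SPORTS = {"swimming": "SWM", "athletics": "ATH"}

# Hypothetical SportsML vocabulary terms for sports that ODF does not cover.
SPORTSML_SPORTS = {"cricket": "spsport:cricket", "snooker": "spsport:snooker"}


def sport_identifier(sport: str) -> tuple[str, str]:
    """Return (source, identifier), preferring ODF and falling back to SportsML."""
    key = sport.strip().lower()
    if key in ODF_SPORTS:
        return ("odf", ODF_SPORTS[key])
    if key in SPORTSML_SPORTS:
        return ("sportsml", SPORTSML_SPORTS[key])
    raise KeyError(f"no identifier known for sport {sport!r}")


print(sport_identifier("Swimming"))  # ('odf', 'SWM')
print(sport_identifier("cricket"))   # ('sportsml', 'spsport:cricket')
```

Keeping the lookup in one place means downstream code sees a single identifier per sport, regardless of which source supplied it.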
“SportsML 3.0 provides one structure for the data for developers to work in,” said Husø. “The structure will be the same, even if there are changes to ODF in future Olympic Games; the import and export process of the data will not change.”
Among content providers that use SportsML (various versions) are NTB (Norway), AP Mobile (USA), BBC (UK), ESPN (USA), PA – Press Association (UK), Univision (USA, Mexico), Yahoo! Sports (USA), Austria Presse Agentur (APA) (Austria) and XML Team Solutions (Canada).
SportsML 3.0 is based on its parent standard, NewsML-G2, the backbone of many news systems and a single format for exchanging text, images, video and audio news, event and sports data, and packages of these. SportsML 3.0 is fully compatible with IPTC G2 structures.
Media Topics is an IPTC standard: a 1,100-term taxonomy focused on categorizing text. Released in 2010 as a development of the IPTC Subject Codes, Media Topics is free to use and available in different formats. It can be viewed on the IPTC Controlled Vocabulary server or in a user-friendly tree hierarchy tool.
IPTC creates and maintains taxonomies and controlled vocabularies whose terms are assigned as metadata values to news objects such as text, photographs, graphics, audio and video files and streams. This allows news metadata to be coded consistently across news providers and over time.
“The idea of semantic mapping and being involved in a linked data initiative like Wikidata is a natural step for IPTC,” said Jennifer Parrucci, chair of the IPTC NewsCodes Working Group and senior taxonomist for The New York Times. “When linking an existing taxonomy to another, Wikidata serves as a central point of reference.”
Wikidata is a free, collaborative, multilingual knowledge base that can be read and edited by both humans and machines. It provides centralized storage of, and access to, structured data for all Wikimedia projects, as well as for use on external websites.
In total, about 100 mappings from Media Topics to Wikidata have been applied manually. The mappings use SKOS mapping relationships.
Media Topics started from the Subject Codes vocabulary, reusing its 17 top-level terms while extending the tree from three to five levels. The lower-level terms have been revised and rearranged. Each Media Topic provides a mapping back to one of the Subject Codes.
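A SKOS mapping can be written as a single RDF triple whose predicate is one of the five SKOS mapping properties (exactMatch, closeMatch, broadMatch, narrowMatch, relatedMatch). The Python sketch below emits such a triple in N-Triples syntax. The function name mapping_triple is hypothetical, and the Media Topic and Wikidata identifiers in the example are placeholders, not entries from IPTC's mapping set.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
MEDIA_TOPIC_BASE = "http://cv.iptc.org/newscodes/mediatopic/"
WIKIDATA_BASE = "http://www.wikidata.org/entity/"

# The five SKOS mapping properties (all sub-properties of skos:mappingRelation).
MAPPING_PROPERTIES = ("exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch")


def mapping_triple(media_topic: str, relation: str, wikidata_qid: str) -> str:
    """Return one N-Triples statement linking a Media Topic to a Wikidata item."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation!r}")
    return (
        f"<{MEDIA_TOPIC_BASE}{media_topic}> "
        f"<{SKOS}{relation}> "
        f"<{WIKIDATA_BASE}{wikidata_qid}> ."
    )


# Placeholder identifiers, not a real entry from IPTC's mapping set.
print(mapping_triple("12345678", "closeMatch", "Q123"))
```

Choosing closeMatch rather than exactMatch signals that two concepts are similar enough to link but should not be treated as interchangeable in every context.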
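The structure described above, a tree of up to five levels in which every node also points back to a Subject Code, can be sketched as a small Python data type. The class name, the depth check and the codes used are hypothetical illustrations, not the official Media Topics data format.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LEVELS = 5  # Media Topics allows up to five levels; Subject Codes had three.


@dataclass
class MediaTopic:
    code: str
    label: str
    subject_code: str                      # the Subject Code this topic maps back to
    parent: Optional["MediaTopic"] = None

    def level(self) -> int:
        """1 for a top-level term, 2 for its children, and so on."""
        return 1 if self.parent is None else self.parent.level() + 1

    def path(self) -> list[str]:
        """Labels from the top-level term down to this one."""
        return ([] if self.parent is None else self.parent.path()) + [self.label]


def add_topic(code: str, label: str, subject_code: str,
              parent: Optional[MediaTopic] = None) -> MediaTopic:
    """Create a topic, refusing to go deeper than MAX_LEVELS."""
    topic = MediaTopic(code, label, subject_code, parent)
    if topic.level() > MAX_LEVELS:
        raise ValueError(f"{code} would sit at level {topic.level()}")
    return topic


# Hypothetical codes and labels for illustration only.
top = add_topic("T1", "sport", "S1")
child = add_topic("T2", "competition discipline", "S2", top)
print(child.level(), child.path(), child.subject_code)
```

Storing the Subject Code on every node lets systems that still use the older vocabulary translate any Media Topic without walking the tree.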
The International Press Telecommunications Council (IPTC) is close to finalizing a new recommendation for video metadata: the IPTC Video Metadata Hub.
The Video Metadata Working Group, which comprises members from news organisations, vendors and metadata experts worldwide, is planning to vote on a recommendation of the Video Metadata Hub (VMD Hub) at the IPTC Autumn Meeting, 24 – 26 October 2016, in Berlin. The final Draft #4 has been published for a last round of reviews: http://dev.iptc.org/Video-Metadata.
Because there are several different existing standards for video – for compressing video and audio, file formats and different schemas of metadata properties – IPTC is presenting a “hub” recommendation that covers many use cases and exchange of metadata over multiple standards.
The VMD Hub consists of a single set of video metadata properties that can be expressed in multiple technical standards (namely XMP for metadata embedded in binary video files, and EBU Core for non-embedded metadata stored in sidecar files). These properties can be used to describe the visible and audible content, rights data, administrative details and technical characteristics of a video.
In addition, the VMD Hub supports workflows, exchange of metadata and search functions across other existing standards, and will include mappings to Apple QuickTime, PBCore, MPEG-7 and Schema.org, and possibly more in the future.
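The "one property set, several serializations" idea can be sketched as a lookup table from hub properties to field names in each target standard. The table below is a hypothetical, simplified excerpt: the XMP names are standard Dublin Core properties used in XMP, but the EBU Core paths, the hub property names and the function name express are assumptions, not the mappings published in the VMD Hub drafts.

```python
# Hypothetical, simplified mapping table (not taken from the VMD Hub recommendation).
HUB_MAPPINGS = {
    "title":       {"xmp": "dc:title",       "ebucore": "coreMetadata/title"},
    "description": {"xmp": "dc:description", "ebucore": "coreMetadata/description"},
    "creator":     {"xmp": "dc:creator",     "ebucore": "coreMetadata/creator"},
    "rights":      {"xmp": "dc:rights",      "ebucore": "coreMetadata/rights"},
}


def express(hub_values: dict[str, str], standard: str) -> dict[str, str]:
    """Rename hub properties to the field names of one target standard."""
    out: dict[str, str] = {}
    for prop, value in hub_values.items():
        target = HUB_MAPPINGS.get(prop, {}).get(standard)
        if target is None:
            raise KeyError(f"no {standard} mapping for hub property {prop!r}")
        out[target] = value
    return out


clip = {"title": "Flood footage", "rights": "(c) Example Agency"}
print(express(clip, "xmp"))      # for metadata embedded in the video file
print(express(clip, "ebucore"))  # for a sidecar file
```

The values are entered once against the hub properties; only the final serialization step depends on the target standard.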
“Users of videos of different standards told IPTC they need a common ground in metadata for efficient workflows,” said Michael Steidl, Managing Director of IPTC. “This is what we deliver now with the Video Metadata Hub.”
The IPTC Autumn Meeting will feature a Video Day on 25 October. In addition to the presentation about the VMD Hub, speakers from video makers, video suppliers, video content publishers and system vendors will discuss how video workflows can be improved.
For information about attending the IPTC Autumn Meeting and Video Day, contact us.
IPTC has secured funding and laid the foundations for the language and technical requirements of its EXTRA project, a rules-based classification system, as reported at IPTC’s Summer Meeting 2016 by Stuart Myles, project lead and IPTC Chairman of the Board.
EXTRA is the EXTraction Rules Apparatus, a multilingual open-source platform for rules-based classification of news content. EXTRA will allow newsrooms to automatically annotate news content with high-quality metadata subjects using a predefined set of rules. IPTC was awarded a grant from the first round of Google’s Digital News Initiative Innovation Fund to build and freely distribute the initial version of EXTRA.
The EXTRA project team has delivered a road map for the project to Google’s Digital News Initiative and is finalizing its plans for language requirements and rules, as well as technical requirements and licensing. IPTC will approach existing open source communities, linguists and programmers to facilitate development.
For easy adoption and consistency in the news industry, IPTC is creating rules for tagging documents with its industry standard Media Topics vocabulary, used widely by publishers. IPTC plans to provide example rules for at least two of the languages supported by Media Topics: Arabic, English, French, German and Spanish.
“For small to medium size publishers who are dissatisfied with hand-tagging their content or grappling with complex machine-learning tools, EXTRA is an open-source news classification engine that will let you easily apply rich metadata to breaking news content,” said Myles. “Unlike manual techniques, which can be slow and inconsistent, or traditional statistical methods, which aren’t suitable for breaking news, EXTRA’s rules-based classification will provide fast, consistent and relevant metadata to enrich search, advertising and content analytics.”
IPTC invites other parties to join the development of the EXTRA project. To get involved, contact Myles at email@example.com.
The International Press Telecommunications Council (IPTC) will use a grant from the first round of Google’s Digital News Initiative Innovation Fund to build and freely distribute an initial version of EXTRA: The EXTraction Rules Apparatus, a multilingual open-source platform for rules-based classification of news content.
EXTRA will be a classification system for annotating news documents with high-quality subject tags. Such tags will allow publishers to deliver a variety of valuable services including content recommendations, improved advertising targeting and subject-specific content streams, such as alerts and topic pages.
“By creating a freely available rules-based classification engine, IPTC will help publishers to enhance their content with all sorts of metadata services, including enriched search, intelligent recommendations and precise analytics,” said Stuart Myles, chairman of IPTC.
EXTRA will provide news publishers with several key capabilities: the ability to automatically categorize documents by subject (for example, terrorism, sports, names of celebrities); the ability to author classification rule sets tailored to existing taxonomies; and the ability to classify documents using the industry standard IPTC Media Topics taxonomy. Taxonomies are used by many news organizations to classify their content. Classification is used in various ways, including improving online news navigation through grouping and linking, organizing editorial workflows and enriching search.
So that EXTRA is immediately useful to the news publishing community, IPTC will create different suites of rules in two languages for classifying news documents into the IPTC Media Topics taxonomy, an industry-standard taxonomy used by several leading news providers.
“We hope that the EXTRA project will support a migration in the news publishing community towards a common industry-wide open source platform,” said Michael Steidl, managing director of IPTC. “We believe that a freely available document classification platform will provide great benefit to small-to-medium sized publishers.”
IPTC invites other parties to join the development of EXTRA.
Contact firstname.lastname@example.org to learn more, including how you can get involved.
Over €27m has been offered by Google to 128 projects, large and small, from 23 countries across Europe – each designed to advance innovation in the news industry. DNI is a collaboration between Google and news publishers in Europe to support high quality journalism and encourage a more sustainable news ecosystem through technology.
About IPTC: The IPTC, based in London, brings together the world’s leading news agencies, publishers and industry vendors. It develops and promotes efficient technical standards to improve the management and exchange of information between content providers, intermediaries and consumers. The standards enable easy, cost-effective and rapid innovation. Visit www.iptc.org and follow on Twitter: @IPTC
IPTC Releases Results of 2016 Social Media Sites Photo Metadata Test
Important image metadata is not retained in images after upload to some of the most popular social media sites, according to a study by the International Press Telecommunications Council (IPTC). The missing data includes key copyright and identification information as well as descriptive data about the image.
The IPTC, a consortium of over 50 news agencies and media companies, sets international technical standards for news exchange, including metadata embedded in image files. The recent Social Media Sites Photo Metadata Test repeats a survey conducted in 2013; while improvements were noted, some sites scored lower this time around.
The Social Media Sites Photo Metadata Test evaluated 15 top social media sites and checked whether embedded metadata was retained and displayed when images were uploaded to the sites, and when various versions of each image were downloaded. The results are displayed at www.embeddedmetadata.org/testresults.
Only one social media site, Behance, received favorable results for retaining and displaying embedded data. A few systems retained embedded metadata but failed to use it when displaying metadata on the web site. Ten sites removed at least some metadata when images were downloaded to a desktop environment.
“There are many important reasons to embed and preserve metadata – to protect copyrights, ensure proper licensing, track image use, smooth workflow, and make them searchable on- or offline,” said Michael Steidl, Managing Director of IPTC. “If users provide captions, dates, a copyright notice and the creator within their images, that data shouldn’t be removed when sharing them on social media websites without their knowledge.”
There may be several reasons social media services remove metadata – and some may not be intentional. Test results showed that in some cases, when images were downloaded to a desktop environment, the metadata was preserved if the size of the image remained unchanged. But if the image was rescaled, the metadata was stripped. “The quality assurance of these sites might not be aware that their software strips metadata inadvertently,” said Steidl.
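One way to check whether a site kept embedded metadata is to compare the JPEG header segments of the original file with those of the downloaded copy. The Python sketch below walks the segments of a JPEG and reports whether XMP (an APP1 segment), Exif (also APP1) or a Photoshop Image Resource Block (APP13, where IPTC IIM metadata is conventionally stored) is present. It is a simplified check that stops at the start of the image data and does not decode the metadata itself; the function name metadata_segments is hypothetical.

```python
import struct

XMP_ID = b"http://ns.adobe.com/xap/1.0/\x00"   # APP1 payload prefix for XMP
EXIF_ID = b"Exif\x00\x00"                        # APP1 payload prefix for Exif
PHOTOSHOP_ID = b"Photoshop 3.0\x00"              # APP13 prefix; IPTC IIM is stored inside


def metadata_segments(jpeg: bytes) -> set[str]:
    """Report which metadata blocks appear in a JPEG's header segments."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    found: set[str] = set()
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker in (0xDA, 0xD9):  # start of scan or end of image: headers are over
            break
        # The segment length is big-endian and counts its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(XMP_ID):
            found.add("XMP")
        elif marker == 0xE1 and payload.startswith(EXIF_ID):
            found.add("Exif")
        elif marker == 0xED and payload.startswith(PHOTOSHOP_ID):
            found.add("Photoshop IRB")
        pos += 2 + length
    return found
```

Running it on the uploaded original and on a downloaded (possibly rescaled) copy, then comparing the two sets, shows which metadata blocks a site removed.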
“Because many of the social media sites are essentially free, users become the product, and not necessarily the customers,” said David Riecks, a photographer and metadata consultant who owns ControlledVocabulary.com and worked on the test. “Users are often not aware of these practices. There should be a sweet spot between these social sites preserving all metadata and removing it all. I’d like to see more engineers working together to find solutions.”
The Embedded Metadata Manifesto was launched by IPTC in 2011 to draw attention to the importance of retaining the data embedded in image files. The website www.embeddedmetadata.org also includes the Manifesto’s five guidelines for how metadata should be handled and preserved in digital media.
About IPTC: The IPTC, based in London, brings together the world’s leading news agencies, publishers and industry vendors. It develops and promotes efficient technical standards to improve the management and exchange of information between content providers, intermediaries and consumers. The standards enable easy, cost-effective and rapid innovation and include the Photo Metadata standard, the news exchange formats NewsML-G2, SportsML-G2 and NITF, rNews for marking up online news, the rights expression language RightsML, and NewsCodes taxonomies for categorizing news. Visit www.iptc.org and follow on Twitter: @IPTC
Extensis, a leading developer of software and services for creative professionals and workgroups, has joined IPTC to extend its commitment to advancing standards designed to make working with metadata easier.
“Extensis as system vendor has taken the essential role of enabling companies managing photos to make efficient use of IPTC’s widely used photo metadata standard”, said Michael Steidl, IPTC managing director and lead of the photo metadata work. “IPTC welcomes Extensis as new member of our organization; we will work jointly on improving professional photo workflows”, he added.
Read the Extensis press release.