By Sarah Saunders
Ten years ago, the very first IPTC Photo Metadata Conference in Florence was packed with photographers and picture libraries eager to discuss ways of protecting their work in the digital environment. The image industry has expanded enormously since then. It now faces the additional challenge of vast numbers of images crowding the web, which makes it hard to find the relevant picture and the metadata relating to it.
IPTC Photo Metadata Conference 2017 was designed to look into the future, with a focus on image search using Artificial Intelligence (AI). The ability of computer systems to learn from humans has increased enormously in recent years, and the necessary computing power is now available. The question for the conference was: how far can these systems help the professional image industry sharpen its search capability and gain productivity?
Solution for Auto Tagging in the Image Industry
Kai-Uwe Barthel, professor of visual computing at HTW Berlin University, gave a clear exposition of the history of the field and of the pressing need to create solutions for the picture industry. There are now too many images for classical search systems to handle, but by using neural network analysis, a branch of AI, computer systems can be taught to tag and recognise images in a fraction of the time that manual tagging takes. As most online images are untagged, a combination of human tagging and visual similarity search presents a viable way forward. Barthel and his team have also researched new ways of presenting images, using three-dimensional structures that let users dig into results with large numbers of images visible at one time.
Speakers on the use of AI came from across the industry, presenting solutions that can be put into practice now. The key to success in this area is having enough content for the computers to learn from, and this can be achieved in a number of ways. General AI systems produce good results for general themes such as skies and beaches because they have had the data to learn from. But users can set a system to learn from their own content, so that more specialist content can be tagged if the conditions are right. Computers are learning faster, and need fewer images to learn from than before.
AI systems can be trained to recognise faces, text, colours, composition, scenes and objects, and can also be trained in the aesthetics of image selection; one speaker maintained that twenty images are enough to train a system in a particular brand aesthetic. Speakers also admitted, however, that determining the precise location shown in an image from its content alone had been tested and did not work reliably.
Speakers stressed that the important element in computer learning is an understanding of the nature of the material to be tagged, a skill that artificial intelligence is not about to take over. The benefits in speed and productivity will be enormous, but we’re not yet talking about doing away with human skills altogether!
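To make the idea of combining human tagging with visual similarity search more concrete, here is a minimal sketch, not Barthel's method. It assumes every image already has a numeric feature vector (for example, produced by a neural network) and suggests tags for an untagged image from its most similar human-tagged neighbours, measured by cosine similarity. The function names and the toy three-dimensional vectors are hypothetical stand-ins for real features.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_tags(query_vec, tagged_images, k=2):
    """Suggest tags for an untagged image from its k most similar tagged images.

    tagged_images is a list of (feature_vector, tags) pairs. The suggestions
    are meant for a human editor to confirm, not to be applied blindly.
    """
    ranked = sorted(tagged_images,
                    key=lambda item: cosine(query_vec, item[0]),
                    reverse=True)
    suggestions = []
    for _, tags in ranked[:k]:
        for tag in tags:
            if tag not in suggestions:  # keep first-seen order, drop duplicates
                suggestions.append(tag)
    return suggestions

# Hypothetical toy vectors standing in for real neural-network features.
tagged = [
    ([0.9, 0.1, 0.0], ["beach", "sky"]),
    ([0.8, 0.2, 0.1], ["beach", "sea"]),
    ([0.0, 0.1, 0.9], ["portrait"]),
]
print(suggest_tags([1.0, 0.1, 0.0], tagged))  # ['beach', 'sky', 'sea']
```

Keeping a human in the loop reflects the speakers' point that machine suggestions speed up tagging rather than replace editorial judgement.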
IPTC Video Metadata and Easier Cross Media Distribution
The first afternoon session by IPTC Managing Director Michael Steidl was about the IPTC Video Metadata Hub (VMH), published in 2016 to provide a standard set of fields for use across the varied technologies used in video. Many of the Video Hub fields are equivalent to those in IPTC Photo Metadata Standard, which helps streamline cross media distribution. The VMH can be applied down to the level of video clips, which makes it a useful metadata tool for production, archive and distribution.
Technology That Protects Rights Information in Google Image Search
The IPTC conference was held a day after a CEPIC seminar on Google. Google Image Search scrapes images from their original sites and displays them in its own environment. This is bad for rightsholders, as images can be saved and downloaded directly from Google without reference to the original site. Picture libraries and agencies lose significant traffic to their sites as a result; a survey of German agencies indicated a 50 percent drop in traffic. The recent fine levied by the EU on Google for anti-competitive practices in comparison shopping is encouraging and has paved the way for scrutiny of Google’s handling of online images.
The second presentation of the afternoon presented a solution to the problems raised in the Google seminar. SmartFrame technology allows images to be presented online without the danger of being scraped by Google, because scraping is blocked by technical means. Most of the mechanisms people use to download images, such as right-click, are disabled too. Images can be shared as links, so social media sharing doesn’t lead to an image becoming orphaned and lost in the websphere. And when an image is viewed as a thumbnail in Google, there is a clear indication that it is a copyrighted image, and a link back to the originating site. Rob Sewel, Pixelrights CEO, demonstrated how product items within an image could be linked back to a brand website, offering a way to fund photography in the future through links to paid-for advertising services. The technology could be put to all sorts of uses in both commercial and non-commercial fields, and gives control back to creators and their agents.
The success of this kind of technology, as with all solutions to image grabbing and orphaned images, lies in the uptake of the technology. To be truly protective of copyright, client websites would need to implement a technology like SmartFrame.
The IPTC Photo Metadata Conference 2017 was fascinating from start to finish for the roughly 60 attendees on site, and the level of the presentations was extremely high. The presentations and videos are all available on the website at https://iptc.org/events/photo-metadata-conference-2017/.
Sarah Saunders runs Electric Lane, an independent DAM consultancy specialising in workflow planning, asset retrieval, data management and DAM project management. She works with IPTC’s Photo Metadata Working Group.
IPTC named Bill Kasdorf, longtime publishing executive and VP and Principal Consultant at Apex Content and Media Solutions, as its new Public Relations Chairperson.
IPTC is a consortium of news agencies, publishers and industry vendors that develops and publishes technical specifications and standards to promote the easy, accurate and inexpensive sharing of news and information in all media. Kasdorf’s main goals as Marketing and Public Relations Chairperson are to increase and strengthen the membership of IPTC, and to extend awareness of IPTC’s work to other sectors of publishing beyond news that would benefit from IPTC’s work.
“Although the technical standards developed by IPTC are rooted in the news media sector, the work of IPTC is incredibly important and useful to all areas of publishing and media, as well as related fields such as library science and the cultural heritage sector,” Kasdorf said.
Kasdorf’s experience gives him a broad perspective across the major sectors of the publishing ecosystem – trade books, educational publishing, scholarly and scientific books and journals, magazines, and news. General editor of The Columbia Guide to Digital Publishing, he is active in many professional and standards organizations. He serves on the Steering Committee of the W3C Publishing Business Group and is a member of the W3C Publishing Working Group; he chairs the Content Structure Committee of the Book Industry Study Group; and he is active in the Society for Scholarly Publishing, of which he is a Past President. He serves on the editorial boards of Learned Publishing and the Journal of Electronic Publishing.
In his consulting practice, Kasdorf has served clients globally, including large international publishers such as Pearson, Wolters Kluwer, and Kaplan; scholarly presses such as Harvard, MIT, and Cambridge; aggregators such as VitalSource; and global organizations such as the World Bank, the British Library, and the European Union.
“The PR Chairperson should combine ideas from our membership with needs from up-to-date marketing strategies, and Bill will do this in an excellent way,” said Michael Steidl, Managing Director of IPTC.
“IPTC is at the forefront of the publishing ecosystem in the development and implementation of machine processable rights expressions, as well as photo and video metadata,” Kasdorf said. “We live in a multimedia world, and IPTC is providing essential technologies for making that world work.”
An updated version 2.25 of NewsML-G2 is available as a Developer Release:
- XML Schemas and the corresponding documentation are updated
- the Structure Matrix Excel sheet is updated
Packages of version 2.25 files can be downloaded:
- All XML Schemas plus Structure Matrix from https://www.iptc.org/std/NewsML-G2/NewsML-G2_2.25.zip
- The same without XML Schema documentation in HTML: https://www.iptc.org/std/NewsML-G2/NewsML-G2_2.25-noXMLdocu.zip
- New: in the newsml-g2 repository on GitHub: https://github.com/iptc/newsml-g2
All changes in version 2.25 are listed at http://dev.iptc.org/G2-Approved-Changes
An important decision was taken: the Core Conformance Level will not be developed any further. All recent Change Requests were in fact aimed at features of the Power Conformance Level, and changes to the Core Level were only a side effect.
The Core Conformance Level specifications of version 2.24 will remain available and valid; find them at http://dev.iptc.org/G2-Standards#CCLspecs
After 12 years of collaborative work on establishing and implementing photo metadata standards, IPTC, the global technical standards body of the news media and related industries, announced Adobe Systems Incorporated is joining as a Voting Member. Adobe’s membership was announced at IPTC’s Spring Meeting today in London.
“Adobe is a key player in the media production ecosystem, so we are thrilled to welcome them as a member of the IPTC,” said Stuart Myles, Chairman of the Board of IPTC, and Director of Information Management at Associated Press. “We look forward to working together with Adobe on driving continued improvements in the workflows of photo and video creators around the world.”
“Adobe has a long history of working informally with the IPTC, and we look forward to further success as we participate directly and contribute as a Voting Member,” said Dr. Scott Foshee, Principal Scientist, Adobe. “Our close involvement will not only enable greater coordination between Adobe and the IPTC, but will also allow Adobe to facilitate better coordination across the photography standardization community.”
Photo metadata is key to protecting images’ copyright and licensing information, and to managing digital assets. IPTC’s Photo Metadata Standard, created with contributions by Adobe, is the most widely used because of its universal acceptance among photographers, distributors, news organisations, archivists, and developers. Adobe’s metadata management software, which supports the IPTC standard, is used by Adobe Photoshop, Lightroom, Illustrator, Acrobat, and Premiere.
“Adobe’s implementation has made IPTC photo metadata very popular,” added Michael Steidl, IPTC Managing Director. “For 12 years we have been collaborating on fostering professional use of IPTC photo metadata by photo businesses – building on our success by conducting research and incorporating feedback from users. This membership will open yet more opportunities for better tagging of photos and videos.”
Adobe first adopted IPTC IIM metadata in Photoshop around 1994 and later created the metadata format XMP. In 2004 IPTC and Adobe joined forces to support a consistent use of metadata: The first IPTC Photo Metadata Standard was created jointly. A main goal of the standard was to provide support for photographers and photo editors to use the fields in correct and consistent ways.
Adobe will be a Voting Member of IPTC, reflecting its role as a key player and industry leader. IPTC currently has about 60 members. Its voting members take part in all decisions regarding IPTC standards. Delegates can participate in working parties and groups, request changes, and contribute to the development of standards.
News Classification Rules Being Developed for English and German with IPTC Media Topics
The IPTC has reached the first milestone in EXTRA, the Google/DNI project to build an open source rules engine for news. We are partnering with Infalia PC and have selected the Elasticsearch engine for developing a high-performance, rules-based news classifier. We are licensing an English language news corpus from Reuters and one in German from the Austrian Press Agency for use within the project. We have two linguists creating sample rules for classifying those corpora with IPTC’s Media Topics using the EXTRA engine. The project is on track to deliver a working version of the engine, together with the sample rules, by the summer of 2017.
EXTRA Open Source Rules for News
EXTRA (“EXTraction Rules Apparatus”) is an open source project to classify news text using rules. The engine allows news organizations to precisely identify the categories to which a piece of news belongs by specifying Boolean rules, with sophisticated natural language processing capabilities. Rule-based classification is better suited to breaking news than statistical methods, since it doesn’t require re-training on example news items (which typically take time to produce). Automated classification is generally more consistent and scalable than hand tagging of news. Most machine learning techniques are essentially “black boxes”, whereas rules make it much clearer, and therefore easier to control, why a piece of content is classified in a particular way. For all of these reasons, we believe that the EXTRA rules engine is ideally suited for news classification.
After evaluating a number of open source frameworks, we decided to make Elasticsearch’s percolator technology the foundation for the EXTRA engine. Our testing indicates that Elasticsearch supports indexing a large number of rules. The percolator has performant and scalable support for matching indexed rules against incoming documents, the core task of the EXTRA engine. Elasticsearch has an active open source community, as well as options for commercial support.
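A percolator inverts ordinary search: instead of running one query over many stored documents, it stores many queries (here, classification rules) and runs each incoming document against all of them. The sketch below illustrates that principle in plain Python under stated assumptions. The nested-tuple rule format, the topic labels and the word-set tokenizer are hypothetical simplifications; this is neither the EXTRA rule language nor the Elasticsearch API, and real rules would rely on proper natural language processing.

```python
import re

def tokens(text):
    """Lower-cased word set; a crude stand-in for real linguistic processing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def matches(rule, words):
    """Evaluate a Boolean rule: a term string, or ("AND"|"OR"|"NOT", ...)."""
    if isinstance(rule, str):
        return rule in words
    op, *args = rule
    if op == "AND":
        return all(matches(arg, words) for arg in args)
    if op == "OR":
        return any(matches(arg, words) for arg in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical stored rules: topic label -> Boolean condition over terms.
RULES = {
    "sport": ("AND", ("OR", "match", "goal", "tournament"), ("NOT", "election")),
    "politics": ("OR", "election", "parliament", "minister"),
}

def classify(text, rules=RULES):
    """Percolator-style matching: test one document against every stored rule."""
    words = tokens(text)
    return sorted(topic for topic, rule in rules.items() if matches(rule, words))

print(classify("The minister scored a late goal in the charity match"))
# ['politics', 'sport']
```

In Elasticsearch itself, rules like these would be indexed as queries in a field mapped with the `percolator` type, and each incoming news item would be matched with a `percolate` query; that index-side storage of rules is what lets the approach scale to large rule sets.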
The EXTRA Requirements, Design, API and Rules Language
We have drawn up a detailed set of technical requirements and have created a high-level technical architecture for EXTRA. We have designed the EXTRA API and the rule language. Linguists are writing the rules to classify English and German news using IPTC’s Media Topics taxonomy.
IPTC, Infalia, Google DNI
EXTRA is being developed by the IPTC, an international consortium of news agencies, publishers and system vendors. The project is funded by the Digital News Initiative, Google’s €150 million fund aimed at stimulating innovation amongst European publishers. In 2016, IPTC applied for and won a DNI grant of €50,000 to develop the EXTRA engine. As a development partner, IPTC selected Infalia PC, a spin-out from the Information Technologies Institute of the Centre for Research and Technology Hellas with significant expertise in data analytics and natural language processing.
If you’d like to learn more about the IPTC or the EXTRA project, please contact firstname.lastname@example.org
IPTC’s Photo Metadata Working Group has released the Cultural Heritage Panel plugin for Adobe Bridge, which focuses on fields relevant for images of artwork and other physical objects, such as artifacts, historical monuments, and books and manuscripts.
Sarah Saunders and Greg Reser, experts from the cultural heritage sector, conceived the IPTC Cultural Heritage Panel to address the needs of the photo business and the growing community of museums, art foundations, libraries, and archive organisations. Furthermore, the panel fills a gap: many imaging software products, including Bridge, do not support all the metadata fields of the IPTC Photo Metadata Standard 2016 for artwork or objects.
The artwork or object fields – a special set of metadata fields developed by IPTC a few years ago – describe artworks and objects portrayed in the image (for example, a painting by Leonardo da Vinci). This means that descriptive and rights information about artworks or objects is recorded separately from information about the digital image in which they are shown. Multiple layers of rights and attribution can be expressed – copyright in the photo may be owned by a photographer or museum, while the copyright in the painting is owned by an artist or estate.
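The layering of rights described above can be sketched as a data structure in which the photograph and each artwork it shows carry their own rights information. This is a minimal illustration, loosely modelled on the standard's Artwork or Object structure; the class names, the simplified field names and the sample values are hypothetical, not the standard's actual property identifiers.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ArtworkOrObject:
    """Descriptive and rights data about an object shown in the image (simplified, hypothetical fields)."""
    title: str
    creator: str
    copyright_notice: str

@dataclass
class ImageMetadata:
    """Data about the digital photograph itself, kept separate from the artworks it shows."""
    image_creator: str
    image_copyright_notice: str
    artworks: List[ArtworkOrObject] = field(default_factory=list)

def copyright_layers(meta):
    """Return every rights layer: the photograph first, then each artwork shown."""
    layers = [("image", meta.image_copyright_notice)]
    layers += [(a.title, a.copyright_notice) for a in meta.artworks]
    return layers

# Hypothetical example: a museum's photograph of a painting still in copyright.
photo = ImageMetadata(
    image_creator="Museum Photographer",
    image_copyright_notice="© Example Museum",
    artworks=[ArtworkOrObject("Untitled Landscape", "Example Artist",
                              "© Estate of Example Artist")],
)
for layer, notice in copyright_layers(photo):
    print(layer, "->", notice)
```

Because the artwork list is separate from the image fields, a reuse check can see that licensing the photograph alone may not be enough when the depicted work has its own rights holder.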
The new plugin for Bridge (CC versions up to 2016 and CS6 were tested) allows people to view the image data, and write into these fields using a simple panel, which has been tailor-made for use in the heritage sector. The panel includes fields for artwork/object attributes and also relevant digital image rights.
“The Cultural Heritage Panel will be very useful for people working in the heritage sector in museums and archives,” said Saunders, a consultant specialising in digital imaging and archiving. “It allows them to manage and monitor data about objects and artworks that is embedded in the IPTC XMP fields in the image.”
“The metadata can then be transferred into an organisation’s digital asset management system; the panel helps ease the ingest process,” Reser said.
Reser also noted that the panel helps incorporate more people into workflows, such as freelance photographers, who otherwise may not have access to an organisation’s digital asset management system. The Cultural Heritage Panel allows them to be an efficient part of the process of viewing the metadata included with an image, and adding to it when appropriate.
“IPTC is the most popular schema in embedded metadata,” Reser said. “Over time I bet we’ll see a lot of the cultural heritage fields creep into off-the-shelf programs and software.”
The panel is free, has an easy-to-use interface, and includes key image administration fields. Image caption and keywords can be automatically generated from existing Artwork or Object data.
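How a caption and keywords might be derived from artwork data can be sketched as follows. The panel's actual templates are not described here, so the caption pattern ("Title by Creator (Date)"), the dictionary keys and the sample record are all hypothetical choices made for illustration.

```python
def caption_from_artwork(artwork):
    """Build a caption such as 'Title by Creator (Date)' from artwork fields."""
    parts = [artwork["title"]]
    if artwork.get("creator"):
        parts.append(f"by {artwork['creator']}")
    if artwork.get("date_created"):
        parts.append(f"({artwork['date_created']})")
    return " ".join(parts)

def keywords_from_artwork(artwork, keys=("creator", "style_period", "title")):
    """Collect non-empty artwork values as keywords, without duplicates."""
    keywords = []
    for key in keys:
        value = artwork.get(key)
        if value and value not in keywords:
            keywords.append(value)
    return keywords

# Hypothetical record; field names are illustrative, not the panel's own.
artwork = {
    "title": "Still Life with Lemons",
    "creator": "Example Artist",
    "date_created": "1650",
    "style_period": "Baroque",
}
print(caption_from_artwork(artwork))   # Still Life with Lemons by Example Artist (1650)
print(keywords_from_artwork(artwork))  # ['Example Artist', 'Baroque', 'Still Life with Lemons']
```

Generating these fields from data entered once reduces retyping and keeps captions consistent with the artwork record.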
Download the IPTC Cultural Heritage Panel and User Guide for Adobe Bridge.
I chair the Board of Directors of IPTC, a consortium of news agencies, publishers and system vendors, which develops and maintains technical standards for news, including NewsML-G2, rNews and News-in-JSON. I work with the Board to broaden adoption of IPTC standards, to maximize information sharing between members and to organize successful face-to-face meetings.
We hold face-to-face meetings in several locations throughout the year, although most of the detailed work of the IPTC is now conducted via teleconferences and email discussions. Our Annual General Meeting for 2016 was held in Berlin in October. As well as being the time for formal votes and elections, the AGM is a chance for the IPTC to look back over the last year and to look ahead to what is in store. What follows is my prepared Chairman’s Report for the AGM.
The Only Constant
It is clear that the news industry is experiencing a great degree of change. The business side of news continues to be under pressure. And, in no small part, this is because the technology involved in the creation and distribution of news continues to rapidly evolve.
However, in many ways, this is a golden age of journalism. The demand for news and information has never been higher. The immediate and widespread distribution of news has never been easier.
The IPTC has been around for 51 years. I’ve been a delegate to the IPTC since 2000 and Chairman of the Board since June 2014. I’d like to give my perspective on the changes going on within the news industry and how IPTC has and will respond.
We’re On a Mission
IPTC is rooted in – and foundational to – the news industry. Our open source standards for news technology enable the operations of hundreds of news and media organizations, large and small. IPTC standards are instrumental in the software used to create, edit, archive and distribute news and information around the world.
We are starting to evolve the scope of our work beyond standards, for example via the EXTRA project to build an open source rules-based classification engine. Much of what we do is relevant not only to news agencies and publishers, but also to photographers, videographers, academics and archivists. By bringing together these diverse groups, we can not only create powerful, efficient standards and technologies, but also learn from each other about what works and what does not.
What’s Going On?
- continuing to improve documentation – to make it easier to get going with a standard and simpler to grasp the nuances when you want to expand your implementation
- making our standards more coherent and consistent, as many organizations need to use a combination of them
IPTC is a membership-driven organization. Membership fees represent the vast majority of the revenue for our organization. As the news industry as a whole continues to feel pressure (including downsizing, mergers and, unfortunately, some members going out of business), the IPTC is experiencing downward pressure on its own revenue. So, we are working on ways to reach new members, whilst at the same time ensuring that existing members continue to derive value. We’re also open to exploring new ways of generating revenue which fit with our mission. Let us know your ideas!
What new areas should the IPTC focus on? Many journalists are experimenting with an array of technologies: Augmented Reality, Virtual Reality, 360 degree photos, drones and bots, to name but a few. And let’s not forget about the “Cambrian Explosion” of technologies related to news and metadata on the Web, including AMP, Apple News, Instant Articles, rNews, Schema.org and OpenGraph. How can IPTC help? By negotiating standards? Developing best practices? Navigating the ethics of these technologies?
If you’re not happy, then please tell me!
I Want to Thank You
Finally, I’d like to extend a special thanks to Michael Steidl, Managing Director of the IPTC, who is personally involved in almost every aspect of what we do.
The IPTC has released a comprehensive set of sports controlled vocabularies as a supplement to the SportsML 3.0 sports-data interchange format, which was released in July 2016. These controlled vocabularies (CVs) are in the format of NewsML-G2 Knowledge Items plus RDF variants and are available on IPTC’s CV server at http://cv.iptc.org/newscodes.
There are 113 CVs covering core sports concepts such as event and player status, as well as specialized lists for 11 sports (basketball, soccer, rugby, American football, etc.) covering statistics, player positions, scoring types and more.
“The SportsML 3.0 standard’s semantic tech capabilities are improved greatly by the new controlled vocabularies,” said Trond Husø, system developer for Norwegian news agency NTB, one of the early adopters of SportsML 3.0. “Data can be easily imported, structured, and stored.”
“When building a sports app you spend a lot of prep time defining your terms and building a schema,” said Paul Kelly, news technology consultant and lead for IPTC’s Sports Content Working Group. “By using SportsML 3.0, there is no need to reinvent the wheel.”
“You consider things such as ‘What sort of results and stats do we need?’ and ‘How will our system handle interrupted matches?’ IPTC’s vocabularies can get you on your way because they properly define in a standard format almost all the terminology you would use in a sports application: everything from ‘goals-scored’ to a full enumeration of status codes for sports events,” Kelly said.
For the Summer 2016 Olympics, NTB acquired the rights to distribute the results and data from the International Olympic Committee’s Olympic Data Feed (ODF). NTB then transformed ODF to SportsML 3.0, and then to NITF 3.2. “Using SportsML to structure the ODF’s data is a broad and comprehensive solution to approaching all sports and competitions worldwide,” said Husø, who is also a member of IPTC’s Sports Content Working Group. “SportsML is now a truly flexible and universal format that can incorporate multiple vendor codes and still provide a defense against vendor lock-in.”
“Terms defined in another format such as ODF can easily live beside SportsML terms – as well as any other proprietary format – so that an organisation can build a repository of knowledge of all the different sports-data formats,” Kelly said.
Another advantage to the new SportsML 3.0 standard is that if new concepts are added to a sports vocabulary or modified in it, the data model and the XML Schema don’t change; they stay stable. It also supports all languages for the concept labels.
“A great feature is that we can translate the definitions to Norwegian – without changing or breaking the vocabulary,” said Husø. “If we were to distribute internationally, our domestic receivers could look up the definitions in Norwegian, while the international ones could use the English term.”
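The stability Husø describes comes from separating concept identifiers from their labels: a translation is just another label attached to an existing concept, so nothing in the data model changes. The sketch below assumes a simple in-memory mapping; the concept IDs (including the prefixes), the Norwegian label and the English fallback rule are hypothetical illustrations, not the published IPTC vocabularies.

```python
# Hypothetical vocabulary: concept ID -> {language code: label}.
vocab = {
    "spstat:goals-scored": {"en": "goals scored"},
    "spevstat:postponed": {"en": "postponed"},
}

def add_label(vocab, concept_id, lang, label):
    """Add or replace a label; existing concept IDs and the structure stay unchanged."""
    vocab.setdefault(concept_id, {})[lang] = label

def label_for(vocab, concept_id, lang, fallback="en"):
    """Return the label in the requested language, else the fallback language."""
    labels = vocab[concept_id]
    return labels[lang] if lang in labels else labels[fallback]

add_label(vocab, "spstat:goals-scored", "nb", "scorede mål")
print(label_for(vocab, "spstat:goals-scored", "nb"))  # scorede mål
print(label_for(vocab, "spstat:goals-scored", "de"))  # goals scored
```

Domestic receivers can request Norwegian labels while international ones fall back to English, and both still exchange the same concept identifier.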
IPTC’s SportsML 3.0 standard is a major upgrade from version 2.2, after 12 years of evolution since its first version. The new standard incorporates contributions from sports experts in 12 countries. Its flexible core covers all major sports and events in most news reporting.
Other early adopters of SportsML 3.0 include Univision and the British Press Association, in its new multi-sport API. The major features of SportsML 3.0 include:
- compliance with IPTC’s NewsML-G2 standard
- a flexible core that covers all major sports and events in most news reporting
- plugins for detailed stats in 10+ sports
- a more flexible tournament model
- schedules, scores, standings, statistics, etc.
- choices between specific and generic terms
- controlled vocabularies, semantic tech capabilities
- schema redesign
- many samples and tool support.
Tool support for SportsML 3.0 includes 45 samples from 11 different sports and events, including both classic and SportsML-G2 examples, and both generic and specific examples.
The vocabularies will be maintained by IPTC for future expansion; new sports and terms can be added.
IPTC has published an updated Photo Metadata User Guide, for photographers, photo editors and professionals responsible for in-house metadata workflows, including digital asset management (DAM) systems.
Based on IPTC’s widely used Photo Metadata Standard, the new User Guide contains practical information regarding photo metadata – from photographers familiarizing themselves with basics, to managers in related businesses who have a deeper understanding of implementation of standards and metadata.
A key use of metadata is to describe the content of an image, location and rights information; the guide groups metadata fields according to information types. “The User Guide will help when deciding where metadata should be put about a certain topic, and what data should or should not be filled into a specific field,” said Michael Steidl, managing director of IPTC, and lead of IPTC’s Photo Metadata Working Group.
IPTC sets the industry standard for administrative, descriptive, and copyright information about images. The IPTC Photo Metadata Standard, supported by many software applications, is the most widely used standard because of its universal acceptance among photographers, distributors, news organisations, archivists, and developers.
The Photo Metadata User Guide walks users through the major groups of metadata and, for each IPTC field within a group, provides short guidelines on its use and semantics.
The first section of the guide outlines practical use for a basic understanding of applying photo metadata, and may be most helpful to photographers becoming familiar with adding it to their photos for the first time. Photo metadata is key to protecting photographers’ images, including copyright and licensing information online.
The User Guide addresses typical questions such as:
- What is a minimum set of fields to be used?
- How is metadata preserved?
Five examples of metadata are given, covering independent, staff and agency photographers as well as images of artwork.
Photo metadata is also essential for managing digital assets. Detailed and accurate descriptions about images ensure they can be easily and efficiently retrieved via search, by users or machine-readable code. This results in smoother workflow within organisations, more precise tracking of images, and potential for licensing opportunities.
For professionals responsible for in-house photo metadata workflows and DAM systems, all IPTC metadata fields in the User Guide have been grouped by topic for easy reference: general description, persons, locations, things shown, rights and licensing information, and administrative data.
The User Guide does not focus on the user interface of any specific software, and will be updated regularly to include more details.
The International Press Telecommunications Council (IPTC) released the new Video Metadata Hub Recommendation (VMHub), a comprehensive solution for video metadata management that allows exchange of metadata over multiple existing standards.
The VMHub supports various technical solutions with the key goal of storing and exchanging metadata in a safe and reliable way, with a universal metadata schema.
“Users of videos of different standards told IPTC they need a common ground in metadata for efficient workflows,” said Michael Steidl, managing director of IPTC, at IPTC’s Autumn Meeting in Berlin, during a day devoted to video. “This is what we deliver now with the Video Metadata Hub.”
Standardisation has been challenging because of the diversity of video technologies, including the various approaches to embedding metadata and rights information. There are also many different metadata schemas for video, many of them limited in scope.
“Organisations and individuals can benefit from implementing the VMHub because it helps to streamline workflows, with guidelines for organising metadata of videos from different sources and standards in a common way,” said Steidl, who is also the lead of IPTC’s Video Metadata Working Group.
The VMHub also supports workflows, exchange of metadata, and search functions across existing standards, and provides mappings to Apple QuickTime, PBCore, MPEG-7, Schema.org, and IPTC’s NewsML-G2.
“The Hub also supports organisations switching from an ‘old’ to a ‘new’ standard by providing a stable metadata schema and gives the ability to search across videos from different standards,” Steidl said.
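Searching across videos described in different standards essentially relies on a crosswalk: each source field is mapped onto a common hub property, and queries run against the normalised records. In the minimal sketch below, `pbcoreTitle`/`pbcoreDescription` are PBCore element names and `name`/`description` are Schema.org properties, but the hub property names ("title", "description") and the mapping table are hypothetical and not taken from the VMHub specification.

```python
# Hypothetical crosswalk: (source standard, source field) -> common hub property.
CROSSWALK = {
    ("pbcore", "pbcoreTitle"): "title",
    ("pbcore", "pbcoreDescription"): "description",
    ("schema.org", "name"): "title",
    ("schema.org", "description"): "description",
}

def to_hub(standard, record):
    """Normalise one source record into hub properties; unmapped fields are dropped."""
    hub = {}
    for field_name, value in record.items():
        prop = CROSSWALK.get((standard, field_name))
        if prop is not None:
            hub[prop] = value
    return hub

def search(records, prop, term):
    """Case-insensitive substring search on one hub property across all standards."""
    hits = []
    for standard, raw in records:
        hub = to_hub(standard, raw)
        if term.lower() in hub.get(prop, "").lower():
            hits.append(hub)
    return hits

records = [
    ("pbcore", {"pbcoreTitle": "Harbour at dawn", "pbcoreDescription": "Fishing boats"}),
    ("schema.org", {"name": "Dawn patrol", "description": "Coastguard exercise"}),
    ("schema.org", {"name": "Evening news"}),
]
print([hit["title"] for hit in search(records, "title", "dawn")])
# ['Harbour at dawn', 'Dawn patrol']
```

Because queries target hub properties rather than source fields, an organisation switching standards only has to update the crosswalk entries, not its search logic.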
IPTC’s Video Metadata Working Group – which consists of delegates from news organisations, system vendors and experts in the metadata field – collaborated for two years to review technical elements, rights and administrative information, and metadata terms for describing audio-visual content, to ensure IPTC’s VMHub was a comprehensive solution for video metadata management.
Documentation & Specification
- Specification, technical implementation, and mappings to Apple QuickTime, MPEG-7 (ISO 15938-5), IPTC’s NewsML-G2, PBCore and Schema.org.
- The recommendation documents are available at www.iptc.org/std/videometadatahub/recommendation/1.0, and include specifications of the properties of the metadata schema and their technical implementation by EBU Core, for stand-alone documents, and XMP, for embedded metadata.