The IPTC is happy to announce the latest version of our guidance for mapping between photo metadata standards.

Following our publication of IPTC’s rules for mapping photo metadata between IPTC, Exif and schema.org standards in 2022, the IPTC Photo Metadata Working Group has been monitoring updates in the photo metadata world.

In particular, the IPTC gave support and advice to CIPA while it was working on Exif 3.0, and we have updated our mapping rules to reflect the changes introduced in Exif 3.0.

As well as guidelines for mapping individual properties between the IPTC Photo Metadata Standard (in both the older IIM form and the newer XMP embedding format), Exif and schema.org, we have included some notes on particular considerations for mapping contributor, copyright notice, dates and IDs.
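To give a feel for what a property-level mapping looks like, here is a minimal sketch in Python. The identifiers in the table (IIM dataset numbers, XMP property names, Exif tag names and schema.org properties) are our own illustrative assumptions for two properties only, and the `translate` helper is hypothetical; the mapping guide itself remains the reference for the full rules, including how to handle multiple values and conflicts between formats.

```python
# Illustrative subset only: these identifiers are assumptions chosen for the
# example, not a copy of the IPTC mapping guide.
PROPERTY_MAP = {
    "creator": {
        "iim": "2:80 By-line",
        "xmp": "dc:creator",
        "exif": "Artist",
        "schema_org": "creator",
    },
    "copyright_notice": {
        "iim": "2:116 Copyright Notice",
        "xmp": "dc:rights",
        "exif": "Copyright",
        "schema_org": "copyrightNotice",
    },
}


def translate(record, source, target):
    """Re-key a flat metadata record from one standard's names to another's.

    Keys that are not in PROPERTY_MAP are dropped rather than guessed at.
    """
    lookup = {names[source]: names[target] for names in PROPERTY_MAP.values()}
    return {lookup[key]: value for key, value in record.items() if key in lookup}


exif_record = {"Artist": "Jane Doe", "Copyright": "(c) 2023 Jane Doe", "Make": "ExampleCam"}
print(translate(exif_record, "exif", "schema_org"))
```

In this sketch the unmapped `Make` value is simply left out; a real implementation would need to decide what to do with properties that have no equivalent in the target format.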

The IPTC encourages all developers who previously consulted the out-of-date Metadata Working Group guidelines (which haven’t been updated since 2008 and are no longer published) to use this guide instead.

Screenshot of the IPTC wiki page showing how to read and write IPTC Photo Metadata in JavaScript.

We at IPTC receive many requests for help and advice regarding editing embedded photo and video metadata, and this has only increased with the recent news about the IPTC Digital Source Type property being used to identify content created by a generative AI engine.

In response, we have created some guidance: Developers’ and power users’ guide to reading and writing IPTC Photo Metadata.

This takes the form of a wiki, so that it can be easily maintained and extended with more information and examples.

In its initial form, the documentation focuses on:

In each guide, we advise on how to read and create DigitalSourceType metadata for generative AI images, and also how to read and write the Creator, Credit Line, Web Statement of Rights and Licensor information that is currently used by Google image search to expose copyright information alongside search results.
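As a rough illustration of the kind of recipe the guides contain, the following Python sketch assembles an exiftool command line that writes these properties. It only builds and prints the command; it does not run exiftool. The tag names are assumptions based on exiftool's XMP tag groups and should be checked against the exiftool documentation, and the `exiftool_args` helper, file name and URLs are hypothetical.

```python
import shlex

# URI of the "trained algorithmic media" term in the Digital Source Type vocabulary.
TRAINED_ALGORITHMIC_MEDIA = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)


def exiftool_args(image_path, creator, credit_line, web_statement,
                  licensor_url, ai_generated=False):
    """Build an exiftool argument list (hypothetical helper, not run here)."""
    args = [
        "exiftool",
        f"-XMP-dc:Creator={creator}",                    # Creator
        f"-XMP-photoshop:Credit={credit_line}",          # Credit Line
        f"-XMP-xmpRights:WebStatement={web_statement}",  # Web Statement of Rights
        f"-XMP-plus:LicensorURL={licensor_url}",         # Licensor URL
    ]
    if ai_generated:
        args.append(f"-XMP-iptcExt:DigitalSourceType={TRAINED_ALGORITHMIC_MEDIA}")
    args.append(image_path)
    return args


command = exiftool_args(
    "example.jpg",
    creator="Jane Doe",
    credit_line="Example Agency",
    web_statement="https://example.com/licence",
    licensor_url="https://example.com/acquire-licence",
    ai_generated=True,
)
print(shlex.join(command))
```

Building the arguments as a list, rather than a single string, means the same list can later be passed to `subprocess.run` without shell-quoting problems.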

Showing how IPTC metadata properties are used in Google Images search results.

We hope that these guides will help to demystify image metadata and encourage more developers to include more metadata in their image editing and publishing workflows.

We will add more guidance over the coming months in more programming languages, libraries and frameworks. Of particular interest are guides to reading and writing IPTC Photo Metadata in PHP, C and Rust.

Contributions and feedback are welcome. Please contact us if you are interested in contributing.

Overview of the C2PA trust ecosystem, showing how the C2PA project implements requirements set by both the Content Authenticity Initiative and Project Origin.

The IPTC is proud to announce that after intense work by most of its Working Groups, we have published version 1.0 of our guidelines document: Expressing Trust and Credibility Information in IPTC Standards.

The culmination of a large amount of work over the past several years across many of IPTC’s Working Groups, the document is a guide for news providers on how to embed signals of trust, known as “Trust Indicators”, in their content.

Trust Indicators are ways that news organisations can signal to their readers and viewers that they should be considered as trustworthy publishers of news content. For example, one Trust Indicator is a news outlet’s corrections policy: whether the news outlet provides (and follows) a clear guideline regarding when and how it updates its news content.

The IPTC guideline does not define these trust indicators: they were taken from existing work by other groups, mainly the Journalism Trust Initiative (an initiative from Reporters Sans Frontières / Reporters Without Borders) and The Trust Project (a non-profit founded by Sally Lehrman of UC Santa Cruz).

The first part of the guideline document shows how trust indicators created by these initiatives can be embedded into IPTC-formatted news content, using IPTC’s NewsML-G2 and ninjs standards, which are both widely used for storing and distributing news content.

The second part of the IPTC guidelines document describes how cryptographically verifiable metadata can be added to media content. This metadata may express trust indicators but also more traditional metadata such as copyright, licensing, description and accessibility information. This can be achieved using the C2PA specification, which implements the requirements of the news industry via Project Origin and of the wider creative industry via the Content Authenticity Initiative. The IPTC guidelines show how both IPTC Photo Metadata and IPTC Video Metadata Hub metadata can be included in a cryptographically signed “assertion”.

We expect these guidelines to evolve as trust and credibility standards and specifications change, particularly in light of recent developments in signalling content created by generative AI engines. We welcome feedback and will be happy to make changes and clarifications based on recommendations.

The IPTC sends its thanks to all IPTC Working Groups that were involved in creating the guidelines, and to all organisations who created the trust indicators and the frameworks upon which this work is based.

Feedback can be shared using the IPTC Contact Us form.

The IPTC NewsCodes Working Group has approved an addition to the Digital Source Type NewsCodes vocabulary.

Image used by DALL-E to illustrate outpainting. OpenAI’s caption: “Illustration: August Kamp × DALL·E, outpainted from Girl with a Pearl Earring by Johannes Vermeer”

The new term, “Composite with Trained Algorithmic Media”, is intended to handle situations where the “synthetic composite” term is not specific enough, for example a composite that is specifically made using an AI engine’s “inpainting” or “outpainting” operations.

The full Digital Source Type vocabulary can be accessed from https://cv.iptc.org/newscodes/digitalsourcetype. It can be downloaded in NewsML-G2 (XML) or SKOS (RDF/XML, Turtle or JSON-LD) formats to be integrated into content management and digital asset management systems.

The new term can be used immediately with any tool or standard that supports IPTC’s Digital Source Type vocabulary, including the C2PA specification, the IPTC Photo Metadata Standard and IPTC Video Metadata Hub.

Information on the new term will soon be added to IPTC’s Guidance on using Digital Source Type in the IPTC Photo Metadata User Guide.

A screenshot of an example page from sportschema.org showing how IPTC Sport Schema can be used to represent an Olympic Athletics event.

NEW YORK, NY, 26 JULY 2023: The IPTC today announced the beginning of a public feedback and review period of IPTC Sport Schema, which aims to be “the standard for the next generation of sports data.”

The announcement was made by Paul Kelly, Lead of the IPTC Sports Content Working Group, at the Sports Video Group’s Content Management Forum held at 230 Fifth Penthouse, New York.

“The SVG Content Management Forum is attended by senior tech experts from sports broadcasters and sports leagues from the US and around the world, so it is the perfect place to launch the IPTC Sport Schema,” said Kelly. “Many members of SVG have advised us on our work so far, including organisations such as Warner Bros Discovery, NBC Universal, PGA TOUR, Major League Baseball and Riot Games. Presenting our work at their event is a great way to say thanks for their help.”

While IPTC Sport Schema is not yet an official IPTC standard, the IPTC Sports Content Working Group feels that the schema is solid enough to be published for public feedback.

Sports data for the era of linked data and knowledge graphs

The purpose of the IPTC Sport Schema project is to create a new RDF-based sports data standard, while making the most of the experience the IPTC has gained from the last 20 years of maintaining SportsML, the open XML-based sports data standard used by news and sports organisations around the world.

Another screenshot from sportschema.org showing the full ontology diagram, a generic model that can be used to represent athletes and teams, various competition structures, results and statistics across many sports.

While XML served the industry well for many years, more recently developers and IPTC members have asked the Sports Content Working Group whether a standard would become available in a more modern serialisation format such as JSON, and whether knowledge graph protocols would be supported.

Because it is based on the W3C-standard RDF and OWL specifications, IPTC Sport Schema leverages the wide range of tools and expertise in the world of knowledge graphs, semantic web and linked open data, including the SPARQL query language, the JSON-LD serialisation into JSON format, inference using RDF Schema and OWL, and more.

“Using IPTC Sport Schema, sports leagues can choose to own their data,” said IPTC Managing Director Brendan Quinn. “Content publishers or sports leagues can publish open data on their website if they choose, in a way that can be re-mixed and re-used by others around the world.” IPTC Sport Schema can also be used for a more traditional model of aggregation and syndication by sports statistics providers who add value to the raw data being collected by sports leagues.

Like its ancestor SportsML, IPTC Sport Schema is created as a generic sports data model that can represent results, statistics, schedules and rosters across many sports. “Plugins” for specific sports extend the generic schema with specific statistics elements for 10 sports such as soccer, motor racing, tennis, rugby and esports. But the generic model can be used to handle any sports competition, whether team-based, head-to-head or individual.

As well as IPTC’s SportsML standard, the project is based on previous work by the BBC on its BBC Sport Ontology (some of its creators worked on this project). We have also consulted with and analysed related projects and formats such as OpenTrack and the IOC’s Olympics Data Feed format.

For more information on IPTC Sport Schema, please see the dedicated site sportschema.org or the project’s GitHub repository.

Those who are interested in the details can see an introduction to the IPTC Sport Schema ontology design, the full ontology diagram or the full RDF/OWL ontology documentation.

There may be significant changes to the schema between now and when it is released as a fully endorsed IPTC Standard, so we don’t recommend that it is implemented in production systems yet. But we welcome analysis and experimentation with the model, and look forward to seeing feedback from those who would like to implement it in the real world.

People and organisations who are not IPTC members can give feedback by posting to the IPTC SportsML public discussion group or by using the IPTC Contact Us form.

Microsoft CEO Satya Nadella announcing the new provenance features to Microsoft's Generative AI tools at Microsoft's Build conference on 23 May 2023.

Following Google’s recent announcement that it will signal generative AI content, and the announcement from Midjourney and Shutterstock the day after, Microsoft has now announced that it will also be signalling the provenance of content created by Microsoft’s generative AI tools such as Bing Image Creator.

Microsoft’s efforts go one step beyond those of Google and Midjourney, because they are adding the image metadata in a way that can be verified using digital certificates. This means that not only is the signal added to the image metadata, but verifiable information is added on who added the metadata and when.

As TechCrunch puts it, “Using cryptographic methods, the capabilities, scheduled to roll out in the coming months, will mark and sign AI-generated content with metadata about the origin of the image or video.”

The system uses the specification created by the Coalition for Content Provenance and Authenticity (C2PA), a joint project of Project Origin and the Content Authenticity Initiative.

The 1.3 version of the C2PA Specification specifies how a C2PA Action can be used to signal provenance of Generative AI content. This uses the IPTC DigitalSourceType vocabulary – the same vocabulary used by the Google and Midjourney implementations.
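As a rough sketch of what such an action might look like, the Python below builds the JSON form of an actions assertion. The `c2pa.created` action name and the `digitalSourceType` field reflect our reading of the C2PA actions assertion and should be checked against the specification; the `generative_ai_action` helper is hypothetical. The signing and manifest-embedding steps that make the data verifiable are not shown.

```python
import json

# Base URI of the IPTC Digital Source Type vocabulary.
DIGITAL_SOURCE_TYPE_BASE = "http://cv.iptc.org/newscodes/digitalsourcetype/"


def generative_ai_action(term="trainedAlgorithmicMedia"):
    """Return one action entry carrying a Digital Source Type URI.

    Hypothetical helper: field names are assumptions to verify against
    the C2PA specification.
    """
    return {
        "action": "c2pa.created",
        "digitalSourceType": DIGITAL_SOURCE_TYPE_BASE + term,
    }


# An actions assertion is modelled here as a list of action entries.
actions_assertion = {"actions": [generative_ai_action()]}
print(json.dumps(actions_assertion, indent=2))
```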

This follows IPTC’s guidance on how to use the DigitalSourceType property, published earlier this month.

Mockup shown in the Google blog post depicting an example of how a Midjourney-generated image might look in a Google search results panel.

As a follow-up to yesterday’s news on Google using IPTC metadata to mark AI-generated content, we are happy to announce that generative AI tools from Midjourney and Shutterstock will both be adopting the same guidelines.

According to a post on Google’s blog, Midjourney and Shutterstock will be using the same mechanism as Google – that is, using the IPTC “Digital Source Type” property to embed a marker that the content was created by a generative AI tool. Google will be detecting this metadata and using it to show a signal in search results that the content has been AI-generated.

A step towards implementing responsible practices for AI

We at IPTC are very excited to see this concrete implementation of our guidance on metadata for synthetic media.

We also see it as a real-world implementation of the guidelines on Responsible Practices for Synthetic Media from the Partnership on AI, and of the AI Ethical Guidelines for the Re-Use and Production of Visual Content from CEPIC, the alliance of European picture agencies. Both of these best practice guidelines emphasise the need for transparency in declaring content that was created using AI tools.

The phrase from the CEPIC transparency guidelines is “Inform users that the media or content is synthetic, through labelling or cryptographic means, when the media created includes synthetic elements.”

The equivalent recommendation from the Partnership on AI guidelines is called indirect disclosure:

“Indirect disclosure is embedded and includes, but is not limited to, applying cryptographic provenance to synthetic outputs (such as the C2PA standard), applying traceable elements to training data and outputs, synthetic media file metadata, synthetic media pixel composition, and single-frame disclosure statements in videos”

Here is a simple, concrete way of implementing these disclosure / transparency guidelines using existing metadata standards.

Moving towards a provenance ecosystem

IPTC is also involved in efforts to embed transparency and provenance metadata in a way that can be protected using cryptography: C2PA, the Content Authenticity Initiative, and Project Origin.

C2PA provides a more robust way of declaring the same “Digital Source Type” information, with mechanisms to retrieve the metadata even after the image has been manipulated or the metadata has been stripped from the file.

However, implementing C2PA technology is more complicated, and involves obtaining and managing digital certificates, among other things. Also, C2PA technology has not been implemented by platforms or search engines on the display side.

In the short term, AI content creation systems can use this simple mechanism to add disclosure information to their content.

The IPTC is happy to help any other parties to implement these metadata signals: please contact IPTC via the Contact Us form.

Sundar Pichai, CEO of Google, extolling the benefits of image metadata at Google I/O 2023.

At today’s Google I/O event keynote, Sundar Pichai, CEO of Google, explained how Google will be using embedded IPTC image metadata to signal visual media created by generative AI models.

“Moving forward, we are building our models to include watermarking and other techniques from the start,” Pichai said. “If you look at a synthetic image, it’s impressive how real it looks, so you can imagine how important this is going to be in the future.

“Metadata allows content creators to associate additional context with original files, giving you more information whenever you encounter an image. We’ll ensure every one of our AI-generated images has that metadata.”

The IPTC Photo Metadata section of Google Images’ guidance on metadata has been updated with new guidance on the DigitalSourceType field:

This follows the guidance on IPTC Photo Metadata for Generative AI that was recently published by IPTC.

“AI-Generated” label on Google Images

The above guidance hints at an “AI-generated” label to be used on Google Images in the future. Google recommends that all creators of AI-generated images use the IPTC Digital Source Type property to signal AI-generated content. While Google says that “you may not see the label in Google Images right away”, it appears that it will soon be available in Google Images search results.

AI-generated image of a cute robot sitting at a garden table sketching on a notepad.
Image created by Brendan Quinn using Bing Image Creator. This image file contains digitalsourcetype metadata which was added manually using exiftool.

The IPTC has updated its Photo Metadata User Guide to include some best practice guidelines for how to use embedded metadata to signal “synthetic media” content that was created by generative AI systems.

After our work in 2022 and the draft vocabulary to support synthetic media, the IPTC NewsCodes Working Group, Video Metadata Working Group and Photo Metadata Working Group worked together with several experts and organisations to come up with a definitive list of “digital source types” that includes various types of machine-generated content, or hybrid human and machine-generated media.

Since we published the vocabulary, the work has been picked up by the Coalition for Content Provenance and Authenticity (C2PA) via the use of digitalSourceType in Actions and in the IPTC Photo and Video Metadata assertion. But the primary use case is for adding metadata to image and video files.

Here is a direct link to the new section on Guidance for using Digital Source Type, including examples for how the various terms can be used to describe media created in different formats – audio, video, images and even text.

IPTC recommends that software creating images using trained AI algorithms add the “Digital Source Type” value of “trainedAlgorithmicMedia” to the XMP data packet in generated image and video files. Alternatively, it may be included in a C2PA manifest as described in the IPTC assertion documentation in the C2PA specification.

The official URL for the full vocabulary is http://cv.iptc.org/newscodes/digitalsourcetype, so the complete URI for the recommended Trained Algorithmic Media term is http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia.
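To make this concrete, here is a minimal Python sketch that builds the term URI from the vocabulary base and places it in a bare XMP packet. It assumes the property is serialised as `Iptc4xmpExt:DigitalSourceType` in the IPTC Extension namespace; the helper names are hypothetical. The sketch only produces the XML text and does not embed it into an image or video file, which requires a metadata library or a tool such as exiftool.

```python
import xml.etree.ElementTree as ET

# Vocabulary base URI, as given in the text above.
DIGITAL_SOURCE_TYPE_BASE = "http://cv.iptc.org/newscodes/digitalsourcetype/"

# Namespace URIs for the XMP wrapper, RDF and the IPTC Extension schema.
NS_X = "adobe:ns:meta/"
NS_RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS_IPTC_EXT = "http://iptc.org/std/Iptc4xmpExt/2008-02-29/"


def digital_source_type_uri(term):
    """Return the full URI for a Digital Source Type term ID."""
    return DIGITAL_SOURCE_TYPE_BASE + term


def build_xmp_packet(term):
    """Serialise a minimal XMP packet carrying only DigitalSourceType."""
    ET.register_namespace("x", NS_X)
    ET.register_namespace("rdf", NS_RDF)
    ET.register_namespace("Iptc4xmpExt", NS_IPTC_EXT)

    xmpmeta = ET.Element(f"{{{NS_X}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{NS_RDF}}}RDF")
    description = ET.SubElement(
        rdf, f"{{{NS_RDF}}}Description", {f"{{{NS_RDF}}}about": ""}
    )
    source_type = ET.SubElement(description, f"{{{NS_IPTC_EXT}}}DigitalSourceType")
    source_type.text = digital_source_type_uri(term)
    return ET.tostring(xmpmeta, encoding="unicode")


print(build_xmp_packet("trainedAlgorithmicMedia"))
```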

Other terms in the vocabulary include:

Of course, the original digital source type values covering photographs taken on a digital camera or phone (digitalCapture), scans from negatives (negativeFilm) and images digitised from print (print) are also valid and may continue to be used. We have, however, retired the term “softwareImage”, which is now deemed to be too generic. We recommend using one of the newer terms in its place.

If you are considering implementing this guidance in AI image generation software, we would love to hear about it so we can offer advice and tell others. Please contact us using the IPTC contact form.

Screenshot of the home page of Project Origin's web site, originproject.info.

The IPTC is very happy to announce that it has joined the Steering Committee of Project Origin, one of the industry’s key initiatives to fight misinformation online through the use of tamper-evident metadata embedded in media files.

After the IPTC had worked with Project Origin over a number of years and co-hosted a series of workshops during 2022, the organisation formally invited the IPTC to join the Steering Committee.

Current Steering Committee members are Microsoft, the BBC and CBC/Radio-Canada. The New York Times also participates in Steering Committee meetings through its Research & Development department.

“We were very happy to co-host with Project Origin a productive series of webinars and workshops during 2022, introducing the details of C2PA technology to the news and media industry and discussing the remaining issues to drive wider adoption,” says Brendan Quinn, Managing Director of the IPTC.

C2PA, the Coalition for Content Provenance and Authenticity, took a set of requirements from both Project Origin and the Content Authenticity Initiative to create a technical means of associating media files with information on the origin and subsequent modifications of news stories and other media content.

“Project Origin’s aim is to take the ground-breaking technical specification created by C2PA and make it realistic and relevant for newsrooms around the world,” Quinn said. “This is very much in keeping with the IPTC’s mission to help media organisations to succeed by sharing best practices, creating open standards and facilitating collaboration between media and technology organisations.”

“The IPTC is a perfect partner for Project Origin as we work to connect newsrooms through secure metadata,” said Bruce MacCormack, the CBC/Radio-Canada Co-Lead. 

The announcement was made at the Trusted News Initiative event held in London today, 30 March 2023, where representatives of the BBC, AFP, Microsoft, Meta and many others gathered to discuss trust, misinformation and authenticity in news media.

Learn more about Project Origin by contacting us or viewing the video below: