The IPTC is pleased to announce the full agenda for the 2025 IPTC Photo Metadata Conference, which will be held online on Thursday September 18th from 15.00 to 18.00 UTC. The focus this year is on how image metadata can improve real-world workflows.

We are excited to be joined by the following speakers:

  • Brendan Quinn, IPTC Managing Director, presenting two sessions: IPTC’s AI Opt-Out Best Practices guidelines, and an update on IPTC’s work with C2PA and the Media Provenance Committee
  • David Riecks, Lead of the IPTC Photo Metadata Working Group, presenting two sessions: the latest on IPTC’s proposed new properties for Generative AI, and an update on the Adobe Custom Metadata Panel plugin and how it makes the complete IPTC Photo Metadata Standard available in Adobe products
  • Paul Reinitz, consultant previously with Getty Images, discussing AI opt-out and copyright issues
  • Ottar A. B. Anderson, previously a photographer with the Royal Norwegian Air Force and with over 15 years of experience as a commercial photographer, on proposals for metadata for image archiving and his work on the Digital Object Authenticity Working Group (DOAWG)
  • Jerry Lai, previously a photographer for Getty Images, Reuters and Associated Press and now with Imagn, presenting a case study on using AI for captioning huge numbers of images for Super Bowl LIX
  • Marcos Armstrong, Senior Specialist, Content Provenance at CBC/Radio-Canada, speaking about CBC’s project to map editorial workflows and identify where content authenticity technologies can be used in the newsroom
  • Tim Bray, co-creator of XML and co-founder of OpenText Corporation, among many other accomplishments, speaking on his experiences with C2PA and his ideas for how it can be adopted in the future
Attendees at IPTC’s Photo Metadata Conference 2017 in Berlin.

This year’s conference promises to be a great one, with topics ranging from Generative AI and media provenance technology to the technical details of scanning historical documents, but always with a focus on how new technologies can be applied in the real world.

Registration is free and open to anyone.

See more information at the event page on the IPTC web site or simply sign up at the Zoom webinar page.

We look forward to seeing you there!

The IETF AI Preferences Design Team workshop
IPTC at the IETF AI Preferences Design Team workshop held in London in July 2025. The laptop screen shows the current public draft.

The IPTC participated in a “design team” workshop for the Internet Engineering Task Force (IETF)’s AI Preferences Working Group. Brendan Quinn, IPTC Managing Director attended the workshop in London along with representatives from Mozilla, Google, Microsoft, Cloudflare, Anthropic, Meta, Adobe, Common Crawl and more.

As per the group’s charter, “The AI Preferences Working Group will standardize building blocks that allow for the expression of preferences about how content is collected and processed for Artificial Intelligence (AI) model development, deployment, and use.” The intent is that this will take the form of an extension to the commonly-used Robots Exclusion Protocol (RFC 9309), the document that defines how web crawlers should interact with websites.

The idea is that the Robots Exclusion Protocol would specify how website owners would like content to be collected, and the AI Preferences specification defines the statements that rights-holders can use to express how they would like their content to be used.

The Design Team is discussing and iterating the group’s draft documents: the Vocabulary for Expressing AI Usage Preferences and the “attachment” definition document, Indicating Preferences Regarding Content Usage. The results of the discussions will be taken to the IETF plenary meeting in Madrid next week.

Discussions have been wide-ranging, covering use cases for varying options of opt-in and opt-out; the ability to opt out of generative AI training while still allowing search engine indexing; the difference between preferences for training and preferences for how content can be used at inference time (also known as prompt time or query time, as in RAG or “grounding” use cases); and the varying mechanisms for attaching these preferences to content, i.e. a website’s robots.txt file, HTTP headers and embedded metadata.

The IPTC has already been looking at this area and defined a data mining usage vocabulary in conjunction with the PLUS Coalition in 2023. There is a possibility that our work will change to reflect the vocabulary agreed by the IETF.

The work also relates to IPTC’s recently-published guidance for publishers on opting out of Generative AI training. Hopefully we will be able to publish a much simpler version of this guidance in the future because of the work from the IETF.

ninjs 3.1 schema featuring the new Digital Source Type property

The IPTC is excited to announce the latest updates to ninjs, our JSON-based standard for representing news content metadata. Version 3.1 is now available, along with updated versions 2.2 and 1.6 for those using earlier schemas.

These releases reflect IPTC’s ongoing commitment to supporting structured, machine-readable news content across a variety of technical and editorial workflows.

What is ninjs?

ninjs (News in JSON) is a flexible, developer-friendly format for describing news items in a structured way. It allows publishers, aggregators, and news tech providers to encode rich metadata about articles, images, videos, and more, using a clean JSON format that fits naturally into modern content pipelines.

What’s new in ninjs 3.1, 2.2 and 1.6?

The new releases add a property carrying the IPTC Digital Source Type, which was first introduced in the IPTC Photo Metadata Standard and is now used across the industry to declare the source of media content, including content generated or manipulated by a generative AI engine.

The new property (called digitalSourceType in 3.1 and digitalsourcetype in 2.2 and 1.6, to match the case conventions of each standard version) has the following sub-properties:

  • Name: the name of the digital source type, such as “Created using Generative AI”
  • URI: the official identifier of the digital source type from the IPTC Digital Source Type vocabulary or another vocabulary, such as http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia (the official ID for generative AI content)
  • Literal: an optional way to add new digital source types that are not part of a controlled vocabulary.

IPTC supports multiple versions of ninjs in parallel to ensure stability and continuity for publishers and platforms that depend on long-term schema support.

The new property is part of the general ninjs schema, and so can be used in the main body of a ninjs object to describe the main news item and can also be used in an “association” object which refers to an associated media item.
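For illustration, here is a minimal sketch of how the property might look in a ninjs 3.1 item. The item URI and headline are hypothetical, and the lower-case sub-property spellings (name, uri) are assumed from the list above; the updated ninjs User Guide has the normative examples:

{
  "uri": "https://example.com/news/2025-0001",
  "type": "picture",
  "headline": "Example AI-generated illustration",
  "digitalSourceType": {
    "name": "Created using Generative AI",
    "uri": "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
  }
}

The same structure could appear inside an “association” object to label an associated media item.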

Access the schemas

All versions are publicly available on the IPTC website.

ninjs generator and user guide

The ninjs Generator tool has been updated to cover the latest versions. Fill in the form fields and see what that content looks like in ninjs format. You can switch between the schema versions to see how the schema changes between 1.6, 2.2 and 3.1.

The ninjs User Guide has also been updated to reflect the newly added property.

Why it matters

As the news industry becomes increasingly reliant on metadata for content distribution, discoverability, and rights management, ninjs provides a modern, extensible foundation that supports both human and machine workflows. It’s trusted by major news agencies, technology platforms, and AI developers alike.

Get involved

We welcome feedback from the community and encourage you to share how you’re using ninjs in your own products or platforms. If you would like to discuss ninjs, you can join the public mailing list at https://groups.io/g/iptc-ninjs.

If you’re interested in contributing to the development of IPTC standards, join us!

The Generative AI Opt-Out Best Practices describe how publishers can express opt-out preferences to AI engines.

The IPTC has released a set of guidelines expressing best practices that publishers can follow to express the fact that they reserve data-mining rights on their copyrighted content.

All of the recommended techniques use currently available technologies. While the IPTC is advocating both for better acknowledgement in law of current techniques and for clearer, more stable and more scalable techniques for expressing data-mining opt-out, it is important to remember that opt-out can be expressed today, and that publishers shouldn’t wait for future standards to emerge if they want to control data mining rights on their copyrighted content.

Summary of the recommendations

For full detail, please view the PDF opt-out best practices guidelines. A summary of the guidance is provided below.

  1. Display a plain-language, visible rights reservation declaration for all copyrighted content
    To avoid any misrepresentation, make sure that copyright and rights reservations are plainly displayed to human readers.

  2. Display a rights reservation declaration in metadata tags on copyrighted content
    Using schema.org, the IPTC Photo Metadata Standard and/or IPTC Video Metadata Hub, the same human-readable copyright notice and usage terms should be attached to media content where possible (see the schema.org sketch after this list).

  3. Use Internet firewalls to block AI crawler bots from accessing your content
    To ensure that crawlers which ignore robots.txt and other metadata cannot access your content, you can employ network-level protection that blocks crawler bots before they reach it.

  4. Instruct AI crawler bots using their user agent IDs in your robots.txt file
    Seemingly the simplest method, this is actually one of the most difficult because each AI system’s crawler user-agent must be blocked separately (see the robots.txt sketch after this list).

  5. Implement a site-wide tdmrep.json file instructing bots which areas of the site can be used for Generative AI training
    The Text and Data Mining Reservation Protocol can and should be used, in combination with other techniques (see the TDMRep examples after this list).

  6. Use the trust.txt “datatrainingallowed” parameter to declare site-wide data mining restrictions or permissions
    The trust.txt specification allows a publisher to declare a single, site-wide data mining reservation with a simple command: datatrainingallowed=no. Sites that already use trust.txt should add this parameter if they want to block their entire site from all AI data training.

  7. Use the IPTC Photo Metadata Data Mining property on images and video files
    Announced previously by the IPTC and developed in collaboration with the PLUS Coalition, the Data Mining property allows asset-level control of data mining preferences. An added benefit is that the opt-out preferences travel along with the content, for example when an image supplied by a picture agency is published by one of their customers.

  8. Use the CAWG Training and Data Mining Assertion in C2PA-signed images and video files
    For C2PA-signed content, a special assertion can be used to indicate data mining preferences.

  9. Use in-page metadata to declare whether robots can archive or cache page content
    HTML meta tags can be used to signal to AI crawlers what should be done with content in web pages. We give specific recommendations in the guidelines (see the sketch after this list).

  10. Use TDMRep HTML meta tags where appropriate to implement TDM declarations on a per-page basis
    The HTML meta tag version of TDMRep can be used to convey rights reservations for individual web pages.

  11. Send Robots Exclusion Protocol directives in HTTP headers where appropriate
    X-Robots-Tag headers added to HTTP responses can be used alongside or instead of in-page metadata (see the example after this list).

  12. Use TDMRep HTTP headers where appropriate to implement TDM declarations on a per-URL basis
    TDMRep also has an HTTP version, so we recommend that it is used if the top-level tdmrep.json file cannot easily convey asset-level opt-out restrictions (see the TDMRep examples after this list).
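To make recommendation 2 concrete, here is a minimal sketch of a schema.org declaration embedded in a web page. The URLs and notice text are hypothetical; please see the PDF guidelines for the full recommendations:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "ImageObject",
  "contentUrl": "https://example.com/images/photo.jpg",
  "copyrightNotice": "© 2025 Example Pictures Ltd. All rights reserved.",
  "creditText": "Example Pictures",
  "usageInfo": "https://example.com/licensing-terms"
}
</script>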
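For recommendation 4, a robots.txt file that blocks AI training crawlers might look like the sketch below. GPTBot, Google-Extended and CCBot are real crawler user-agent IDs at the time of writing, but the list is illustrative rather than exhaustive, which is exactly why this method is hard to maintain:

# Each AI crawler must be listed by its own user-agent ID
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /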
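Recommendations 5, 10 and 12 are three forms of the same TDMRep declaration; the sketches below use a hypothetical policy URL. First, the site-wide /.well-known/tdmrep.json file:

[
  {
    "location": "/",
    "tdm-reservation": 1,
    "tdm-policy": "https://example.com/tdm-policy.json"
  }
]

The equivalent per-page HTML meta tags:

<meta name="tdm-reservation" content="1">
<meta name="tdm-policy" content="https://example.com/tdm-policy.json">

And the equivalent per-URL HTTP response headers:

tdm-reservation: 1
tdm-policy: https://example.com/tdm-policy.json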
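For recommendations 9 and 11, the same Robots Exclusion Protocol directives can be expressed as in-page metadata or as HTTP headers. A sketch using the noarchive directive, which is one of several directives covered in the guidelines:

<meta name="robots" content="noarchive">

The equivalent HTTP response header:

X-Robots-Tag: noarchive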

Feedback and comments welcome

The IPTC welcomes feedback and comments on the guidance. We expect to create further iterations of this document in the future as best practices and opt-out technologies change.

Please use the IPTC Contact Us form to provide feedback or ideas on how we could improve the guidance in the future.

China Daily’s article about the new Chinese guidelines.

The news outlet China Daily reported on Friday that China will require all AI-generated content to be labelled from September 1st, 2025.      

China Daily reports:

Chinese authorities issued guidelines on Friday requiring labels on all artificial intelligence-generated content circulated online, aiming to combat the misuse of AI and the spread of false information.    

The regulations, jointly issued by the Cyberspace Administration of China, the Ministry of Industry and Information Technology, the Ministry of Public Security, and the National Radio and Television Administration, will take effect on Sept 1.

A spokesperson for the Cyberspace Administration said the move aims to “put an end to the misuse of AI generative technologies and the spread of false information.” 

According to China Daily, “[t]he guidelines stipulate that content generated or synthesized using AI technologies, including texts, images, audios, videos and virtual scenes, must be labeled both visibly and invisibly” (emphasis added by IPTC). This potentially means that IPTC or another form of embedded metadata must be used, in addition to a visible watermark. 

“Content identification numbers”

The article goes on to state that “[t]he guideline requires that implicit labels be added to the metadata of generated content files. These labels should include details about the content’s attributes, the service provider’s name or code, and content identification numbers.”

It is not clear from this article which particular identifiers should be used. There is currently no globally-recognised mechanism to identify individual pieces of content by identification numbers, although IPTC Photo Metadata does allow for image identifiers to be included via the Digital Image GUID property and the Video Metadata Hub Video Identifier field, which is based on Dublin Core’s generic dc:identifier property.

IPTC Photo Metadata’s Digital Source Type property is the global standard for identifying AI-generated images and video files, being used by Meta, Apple, Pinterest, Google and others, and also being adopted by the C2PA specification for digitally-signed metadata embedded in media files.

According to the article, “Service providers that disseminate content online must verify that the metadata of the content files contain implicit AIGC labels, and that users have declared the content as AI-generated or synthesized. Prominent labels should also be added around the content to inform users.”

Spain’s equivalent legislation on labelling AI-generated content

This follows on from Spain’s legislation requiring labelling of AI-generated content, announced last week.

The Spanish proposal has been approved by the upper house of parliament but must still be approved by the lower house. The legislation will be enforced by the newly-created Spanish AI supervisory agency AESIA.

If companies do not comply with the proposed Spanish legislation, they could incur fines of up to 35 million euros ($38.2 million) or 7% of their global annual turnover.

The IPTC has responded to a multi-stakeholder consultation on the recently-agreed European Union Artificial Intelligence Act (EU AI Act).

Although the IPTC is officially based in the UK, many of our members and staff operate from the European Union, and of course all of our members’ content is available in the EU, so it is very important to us that the EU regulates Artificial Intelligence providers in a way that is fair to all parts of the ecosystem, including content rightsholders, AI providers, AI application developers and end users.

In particular, we drew the EU AI Office’s attention to the IPTC Photo Metadata Data Mining property, which enables rightsholders to inform web crawlers and AI training systems of the rightsholders’ agreement as to whether or not the content can be used as part of a training data set for building AI models.

The points made are the same as the ones that we made to the IETF/IAB Workshop consultation: that embedded data mining declarations should be part of the ecosystem of opt-outs, because robots.txt, W3C TDM, C2PA and other solutions are not sufficient for all use cases. 

The full consultation text and all public responses will be published by the EU in due course via the consultation home page.

 

Tuesday’s IPTC Photo Metadata Conference was a great success. With 12 speakers from the media and software industries and over 200 people registered, it continues to be the largest gathering of photo and image metadata experts globally.

Introduction and welcome, 20 years of IPTC Photo Metadata, Recent work on Photo Metadata at IPTC

We started off with David Riecks and Michael Steidl, co-leads of the IPTC Photo Metadata Working Group, giving an update on what the IPTC has been working on in the areas of photo metadata since the last conference in 2022, along with Brendan Quinn, IPTC Managing Director.

A lot has been happening, including Meta announcing support for IPTC metadata for Generative AI, the launch of the IPTC Media Provenance Committee, and updates to the IPTC Photo Metadata User Guide, including our guidance on how to tag Generative AI content with metadata and how to use the DigitalSourceType field.

Panel 1: AI and Image Authenticity

The first panel saw Leonard Rosenthol of Adobe, Lead of the C2PA Technical Working Group; Dennis Walker of Camera Bits, creators of Photo Mechanic; Dr. Neal Krawetz, Computer security specialist, forensic researcher, and founder of FotoForensics; and Bofu Chen, Founder & CTO of Numbers Protocol speak about image provenance and authenticity, covering the C2PA spec, the problems of fraudulent images, what it’s like to implement C2PA technology in existing software, and how blockchain-based systems could be built on top of C2PA to potentially extend its capabilities.

Session on Adobe’s Custom Metadata Panel

James Lockman, Group Manager, Digital Media Services at Adobe, demonstrated the Custom Metadata Panel, a plugin for several Adobe tools (Bridge, Illustrator, Photoshop and Premiere Pro) that allows the full range of the IPTC Photo Metadata Standard and IPTC Video Metadata Hub, or any other metadata schema, to be edited directly in Adobe’s interface.

Panel 2: AI-powered asset management

Speakers Nancy Wolff, Partner at Cowan, DeBaets, Abrahams & Sheppard, LLP; Serguei Fomine, Founder and CEO of IQPlug; Jeff Nova, Chief Executive Officer at Colorhythm; and Mark Milstein, co-founder and Director of Business Development at vAIsual discussed the impact of AI on copyright, metadata and media asset management.

The full event recording is also available as a YouTube playlist.

Thanks to everyone for coming and special thanks to our speakers. We’re already looking forward to next year!

The IPTC News Architecture Working Group is happy to announce the release of NewsML-G2 version 2.34.

This version, approved at the IPTC Standards Committee Meeting at the New York Times offices on Wednesday 17th April 2024, contains one small change and one additional feature:

Change Request 218, increase nesting of <related> tags: this allows <related> items to contain child <related> items, up to three levels of nesting (see the sketch below). This can be applied to many NewsML-G2 elements:

  • pubHistory/published
  • QualRelPropType (used in itemClass, action)
  • schemeMeta
  • ConceptRelationshipsGroup (used in concept, event, Flex1PropType, Flex1RolePropType, FlexPersonPropType, FlexOrganisationPropType, FlexGeoAreaPropType, FlexPOIPropType, FlexPartyPropType, FlexLocationPropType)

Note that we chose not to allow for recursive nesting because this caused problems with some XML code generators and XML editors.
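As an illustration, here is a minimal sketch of three levels of <related> nesting inside a concept. The URIs are hypothetical and irel:seeAlso is just one possible relationship value:

<concept>
  <conceptId uri="http://example.com/concepts/1"/>
  <related rel="irel:seeAlso" uri="http://example.com/concepts/2">
    <related rel="irel:seeAlso" uri="http://example.com/concepts/3">
      <!-- third and deepest permitted level of nesting -->
      <related rel="irel:seeAlso" uri="http://example.com/concepts/4"/>
    </related>
  </related>
</concept>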

Change Request 219, add dataMining element to <rightsInfo>: In accordance with other IPTC standards such as the IPTC Photo Metadata Standard and Video Metadata Hub, we have now added a new element to the <rightsInfo> block to convey a content owner’s wishes in terms of data mining of the content. We recommend the use of the PLUS vocabulary that is also recommended for the other IPTC standards: https://ns.useplus.org/LDF/ldf-XMPSpecification#DataMining

Here are some examples of its use:

Denying all Generative AI / Machine Learning training using this content:

<rightsInfo>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-AIMLTRAINING"/>
</rightsInfo>

A simple text-based constraint:

<rightsInfo>
  <usageTerms>
    Data mining allowed for academic and research purposes only.
  </usageTerms>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEECONSTRAINT" />
</rightsInfo>

A simple text-based constraint, expressed using a QCode instead of a URI:

<rightsInfo>
  <usageTerms>
    Reprint rights excluded.
  </usageTerms>
  <dataMining qcode="plusvocab:DMI-PROHIBITED-SEECONSTRAINT" />
</rightsInfo>

A text-based constraint expressed in both English and French:

<rightsInfo>
  <usageTerms xml:lang="en">
    Reprint rights excluded.
  </usageTerms>
  <usageTerms xml:lang="fr">
    droits de réimpression exclus
  </usageTerms>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEECONSTRAINT" />
</rightsInfo>

Using the “see embedded rights expression” constraint to express a complex machine-readable rights expression in RightsML:

<rightsInfo>
  <rightsExpressionXML langid="http://www.w3.org/ns/odrl/2/">
    <!-- RightsML goes here... -->
  </rightsExpressionXML>
  <dataMining uri="http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEEEMBEDDEDRIGHTSEXPR"/>
</rightsInfo>

For more information, contact the IPTC News Architecture Working Group via the public NewsML-G2 mailing list.

 

The 2024 IPTC Photo Metadata Conference takes place as a webinar on Tuesday 7th May from 1500 to 1800 UTC. Speakers hail from Adobe (makers of Photoshop), Camera Bits (makers of Photo Mechanic), Numbers Protocol, Colorhythm, vAIsual and more.

First off, IPTC Photo Metadata Working Group co-leads, David Riecks and Michael Steidl, will give an overview of what has been happening in the world of photo metadata since our last Conference in November 2022, including IPTC’s work on metadata for AI labelling, “do not train” signals, provenance, diversity and accessibility.

Next, a panel session on AI and Image Authenticity: Bringing trust back to photography? discusses approaches to the problem of verifying trust and credibility for online images. The panel features C2PA lead architect Leonard Rosenthol (Adobe), Dennis Walker (Camera Bits), Neal Krawetz (FotoForensics) and Bofu Chen (Numbers Protocol).

Next, James Lockman of Adobe presents the Custom Metadata Panel, which is a plugin for Photoshop, Premiere Pro and Bridge that allows for any XMP-based metadata schema to be used – including IPTC Photo Metadata and IPTC Video Metadata Hub. James will give a demo and talk about future ideas for the tool.

Finally, a panel on AI-Powered Asset Management: Where does metadata fit in? discusses the relevance of metadata in digital asset management systems in an age of AI. Speakers include Nancy Wolff (Cowan, DeBaets, Abrahams & Sheppard, LLP), Serguei Fomine (IQPlug), Jeff Nova (Colorhythm) and Mark Milstein (vAIsual).

The full agenda and links to register for the event are available at https://iptc.org/events/photo-metadata-conference-2024/

Registration is free and open to anyone who is interested.

See you there on Tuesday 7th May!

The new guidance has received some press in the Search Engine Optimisation (SEO) world, including a SearchEngineLand article entitled “Google wants you to label AI-generated images used in Merchant Center”.

Google has added Digital Source Type support to Google Merchant Center, enabling images created by generative AI engines to be flagged as such in Google products such as Search, Maps, YouTube and Google Shopping.

In a new support post, Google reminds merchants who want their products listed in Google search results and other Google surfaces that they should not strip embedded metadata, particularly the Digital Source Type field, which can be used to signal that content was created by generative AI.

We at the IPTC fully endorse this position. We have been saying for years that website publishers should not strip metadata from images. The same should apply to tools for maintaining online product inventories, such as Magento and WooCommerce. We welcome contact from developers who wish to learn more about how they can preserve metadata in their images.
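As a quick check that this tag survives a site’s image pipeline, the embedded XMP value can be inspected with a tool such as ExifTool. A minimal sketch, assuming a hypothetical file name (output formatting may vary by version):

# Read the IPTC Digital Source Type embedded in an image
exiftool -XMP-iptcExt:DigitalSourceType product-photo.jpg

# For a generative AI image, the value should be:
#   http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia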

Here’s the full text of Google’s recommendation:

Preserving metadata tags for AI-generated images in Merchant Center
February 2024
If you’re using AI-generated images in Merchant Center, Google requires that you preserve any metadata tags which indicate that the image was created using generative AI in the original image file.

Don't remove embedded metadata tags such as trainedAlgorithmicMedia from such images. All AI-generated images must contain the IPTC DigitalSourceType trainedAlgorithmicMedia tag. Learn more about IPTC photo metadata.

These requirements apply to the following image attributes in Merchant Center Classic and Merchant Center Next:

  • Image link [image_link]
  • Additional image link [additional_image_link]
  • Lifestyle image link [lifestyle_image_link]
Learn more about product data specifications.