The IPTC has released a set of best-practice guidelines that publishers can follow to declare that they reserve data-mining rights on their copyrighted content.
All of the recommended techniques use currently available technologies. The IPTC is advocating both for better acknowledgement in law of current techniques and for clearer, more stable and more scalable techniques for expressing data-mining opt-out. In the meantime, opt-out can be expressed today, and publishers shouldn’t wait for future standards to emerge if they want to control data-mining rights on their copyrighted content.
Summary of the recommendations
For full detail, please view the PDF opt-out best practices guidelines. A summary of the guidance is provided below.
- Display a plain-language, visible rights reservation declaration for all copyrighted content
  To avoid misrepresentation, make sure that copyright and rights reservations are plainly displayed to human readers.
- Display a rights reservation declaration in metadata tags on copyrighted content
  Using schema.org, the IPTC Photo Metadata Standard and/or IPTC Video Metadata Hub, the same human-readable copyright notice and usage terms should be attached to media content where possible.
- Use Internet firewalls to block AI crawler bots from accessing your content
  Because some crawlers ignore robots.txt and other metadata, publishers can employ network-level protection to block crawler bots before they reach the content.
- Instruct AI crawler bots using their user agent IDs in your robots.txt file
  Seemingly the simplest method, this is actually one of the most difficult because each AI system’s crawler user-agent must be blocked separately.
- Implement a site-wide tdmrep.json file instructing bots which areas of the site can be used for Generative AI training
  The Text and Data Mining Reservation Protocol (TDMRep) can and should be used in combination with other techniques.
- Use the trust.txt “datatrainingallowed” parameter to declare site-wide data mining restrictions or permissions
  The trust.txt specification allows a publisher to declare a single, site-wide data mining reservation with a simple command: datatrainingallowed=no. Sites that already use trust.txt should add this parameter if they want to block their entire site from all AI data training.
- Use the IPTC Photo Metadata Data Mining property on images and video files
  Announced previously by the IPTC and developed in collaboration with the PLUS Coalition, the Data Mining property allows asset-level control of data mining preferences. An added benefit is that the opt-out preferences travel along with the content, for example when an image supplied by a picture agency is published by one of their customers.
- Use the CAWG Training and Data Mining Assertion in C2PA-signed images and video files
  For C2PA-signed content, a special assertion can be used to indicate data mining preferences.
- Use in-page metadata to declare whether robots can archive or cache page content
  HTML meta tags can be used to signal to AI crawlers what should be done with content in web pages. We give specific recommendations in the guidelines.
- Use TDMRep HTML meta tags where appropriate to implement TDM declarations on a per-page basis
  The HTML meta tag version of TDMRep can be used to convey rights reservations for individual web pages.
- Send Robots Exclusion Protocol directives in HTTP headers where appropriate
  X-Robots-Tag headers in HTTP responses can be used alongside or instead of in-page metadata.
- Use TDMRep HTTP headers where appropriate to implement TDM declarations on a per-URL basis
  TDMRep also has an HTTP version, so we recommend using it if the top-level tdmrep.json file cannot easily convey asset-level opt-out restrictions.
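As a rough illustration of the file-based and header-based techniques in the list above, the following Python sketch generates a robots.txt fragment, a tdmrep.json document and TDMRep HTTP response headers. It is a minimal sketch, not part of the IPTC guidance: the function names are hypothetical, the crawler user-agent tokens are only examples that should be checked against each vendor's current documentation, and the example.com policy URL and the path patterns are placeholders. The tdmrep.json field names (`location`, `tdm-reservation`, `tdm-policy`) and the header names follow our reading of the TDMRep specification, which should be consulted before deploying anything.

```python
import json

# Example AI crawler user-agent tokens (an assumption for illustration only;
# the list changes over time and must be maintained by the publisher).
AI_CRAWLER_USER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(user_agents):
    """Return robots.txt text that disallows the whole site for each agent.

    Each crawler needs its own User-agent group, which is why this method
    requires ongoing maintenance.
    """
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in user_agents]
    return "\n\n".join(groups) + "\n"


def build_tdmrep(rules):
    """Return tdmrep.json text from (location, reserved, policy_url) tuples.

    The result is intended to be served from /.well-known/tdmrep.json.
    """
    entries = []
    for location, reserved, policy_url in rules:
        entry = {"location": location, "tdm-reservation": 1 if reserved else 0}
        if reserved and policy_url:
            # An optional policy URL only makes sense when rights are reserved.
            entry["tdm-policy"] = policy_url
        entries.append(entry)
    return json.dumps(entries, indent=2)


def build_tdm_headers(policy_url=None):
    """Return per-URL TDMRep response headers that reserve TDM rights."""
    headers = {"tdm-reservation": "1"}
    if policy_url:
        headers["tdm-policy"] = policy_url
    return headers


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLER_USER_AGENTS))
    print(build_tdmrep([
        ("/news/*", True, "https://example.com/tdm-policy.json"),
        ("/press-releases/*", False, None),
    ]))
    print(build_tdm_headers("https://example.com/tdm-policy.json"))
```

A site that already publishes a trust.txt file would additionally add the single line `datatrainingallowed=no` to it, as described in the list above. None of these declarations prevents access by itself, which is why the guidance pairs them with network-level blocking.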
Feedback and comments welcome
The IPTC welcomes feedback and comments on the guidance. We expect to create further iterations of this document in the future as best practices and opt-out technologies change.
Please use the IPTC Contact Us form to provide feedback or ideas on how we could improve the guidance in the future.