On Thursday, Google announced that it will be extending its usage of AI content labelled using the IPTC Digital Source Type vocabulary.
We have previously shared that Google uses IPTC Photo Metadata to signal AI-generated and AI-edited media, for example labelling images edited with the Magic Eraser tool on Pixel phones.
In a blog post published on Friday, John Fisher, Engineering Director for Google Photos and Google One, posted that “[n]ow we’re taking it a step further, making this information visible alongside information like the file name, location and backup status in the Photos app.”
This is based on IPTC’s Digital Source Type vocabulary, which was updated a few weeks ago to include new terms such as “Multi-frame computational capture sampled from real life” and “Screen capture”.
Google already surfaces Digital Source Type information in search results via the “About this image” feature.
Also, the human-readable label for the term http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia was clarified to be “Created using Generative AI” and similarly the label for the term http://cv.iptc.org/newscodes/digitalsourcetype/compositeWithTrainedAlgorithmicMedia was clarified to be “Edited with Generative AI.” These terms are both used by Google.
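As a rough illustration of how a tool might turn these two terms into the labels quoted above, here is a minimal Python sketch. The URIs and label text come from this post; the dictionary name and the `ai_label` helper are hypothetical, and reading the Digital Source Type value out of an image file is assumed to happen elsewhere.

```python
from typing import Optional

# The two Digital Source Type terms and labels quoted in this post.
# Other terms in the vocabulary are deliberately left out of this sketch.
DIGITAL_SOURCE_TYPE_LABELS = {
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia":
        "Created using Generative AI",
    "http://cv.iptc.org/newscodes/digitalsourcetype/compositeWithTrainedAlgorithmicMedia":
        "Edited with Generative AI",
}


def ai_label(digital_source_type: Optional[str]) -> Optional[str]:
    """Return the human-readable AI label for a term URI, or None if it is not one of the two terms above."""
    if digital_source_type is None:
        return None
    return DIGITAL_SOURCE_TYPE_LABELS.get(digital_source_type.strip())
```

A caller would pass in whatever Digital Source Type value it found in the image metadata and show the returned label only when it is not `None`.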
The IPTC has responded to a multi-stakeholder consultation on the recently-agreed European Union Artificial Intelligence Act (EU AI Act).
Although the IPTC is officially based in the UK, many of our members and staff operate from the European Union, and of course all of our members’ content is available in the EU, so it is very important to us that the EU regulates Artificial Intelligence providers in a way that is fair to all parts of the ecosystem, including content rightsholders, AI providers, AI application developers and end users.
In particular, we drew the EU AI Office’s attention to the IPTC Photo Metadata Data Mining property, which enables rightsholders to inform web crawlers and AI training systems of the rightsholders’ agreement as to whether or not the content can be used as part of a training data set for building AI models.
The points made are the same as the ones that we made to the IETF/IAB Workshop consultation: that embedded data mining declarations should be part of the ecosystem of opt-outs, because robots.txt, W3C TDM, C2PA and other solutions are not sufficient for all use cases.
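To make the idea of an embedded data mining declaration more concrete, here is a minimal Python sketch of how a crawler might look for the Data Mining property in a file's XMP packet. It rests on several assumptions that are not stated in this post: that the property is serialised as `plus:DataMining`, that its values are PLUS vocabulary URIs beginning with `http://ns.useplus.org/ldf/vocab/`, and that any value starting with `DMI-PROHIBITED` should be treated conservatively as "not cleared for training". The function names are hypothetical, and a real implementation would use a proper XMP parser rather than a byte-level pattern match.

```python
import re
from typing import Optional

# Assumed prefix of the PLUS Data Mining vocabulary URIs (not taken from this post).
PLUS_VOCAB = "http://ns.useplus.org/ldf/vocab/"

# Naive pattern: matches the property written either as an XML attribute
# or as a simple element inside the embedded XMP packet.
_DATA_MINING = re.compile(
    rb'plus:DataMining\s*=\s*"([^"]*)"'
    rb'|<plus:DataMining>\s*([^<\s]+)\s*</plus:DataMining>'
)


def data_mining_value(file_bytes: bytes) -> Optional[str]:
    """Return the raw Data Mining value found in the file bytes, or None if absent."""
    match = _DATA_MINING.search(file_bytes)
    if match is None:
        return None
    raw = match.group(1) if match.group(1) is not None else match.group(2)
    return raw.decode("utf-8", errors="replace")


def training_prohibited(value: Optional[str]) -> bool:
    """Conservative reading (an assumption): any DMI-PROHIBITED* value blocks training use."""
    return value is not None and value.startswith(PLUS_VOCAB + "DMI-PROHIBITED")
```

Because the declaration travels inside the file, a check like this still works when the image has been separated from the page or site it was published on.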
The full consultation text and all public responses will be published by the EU in due course via the consultation home page.
The IPTC has signed a liaison agreement with the Japanese camera-makers organisation and creators of the Exif metadata standard, CIPA.
CIPA members include all of the major camera manufacturers, including Nikon, Canon, Sony, Panasonic, FUJIFILM and more. Several software vendors who work with imaging are also members, including Adobe, Apple and Microsoft.
CIPA publishes guidelines and standards for camera manufacturers and imaging software developers. The most important of these from an IPTC point of view is the Exif standard for photographic metadata.
The IPTC and CIPA have had an informal relationship for many years, staying in touch regularly regarding developments in the world of image metadata. Given that the two organisations manage two of the most important standards for embedding metadata into image and video files, it’s important that we keep each other up to date.
Now the relationship has been formalised, meaning that the organisations can request to observe each other’s meetings, exchange members-only information when needed, and share information about forthcoming developments and industry requirements for new work in the field of media metadata and in related areas.
The news has also been announced by CIPA. According to the news post on CIPA’s website, “CIPA has signed a liaison agreement regarding the development of technical standards for metadata attached to captured image with International Press Telecommunications Council (IPTC), the international organization consists of the world’s leading news agencies, publishers and industry vendors.”
The Internet Architecture Board (IAB), a Committee of the Internet Engineering Task Force (IETF) which decides on standards and protocols that are used to govern the workings of Internet infrastructure, is having a workshop in September on “AI Control”. Discussions will include whether one or more new IETF standards should be defined to govern how AI systems work with Internet content.
As part of the lead-up to this workshop, the IAB and IETF have put out a call for position papers on AI opt-out techniques.
Accordingly, the IPTC Photo Metadata Working Group, in association with partner organisation the PLUS Coalition, submitted a position paper discussing in particular the Data Mining property which was added to the IPTC Photo Metadata Standard last year.
In the paper, the IPTC and PLUS set out their position that data mining opt-out information embedded in the metadata of media files is an essential part of any opt-out solution.
Here is a relevant section of the IPTC submission:
We respectfully suggest that Robots.txt alone is not a viable solution. Robots.txt may allow for communication of rights information applicable to all image assets on a website, or within a web directory, or on specific web pages. However, it is not an efficient method for communicating rights information for individual image files published to a web platform or website; as rights information typically varies from image to image, and as the publication of images to websites is increasingly dynamic.
In addition, the use of robots.txt requires that each user agent must be blocked separately, repeating all exclusions for each AI engine crawler robot. As a result, agents can only be blocked retrospectively — after they have already indexed a site once. This requires that publishers must constantly check their server logs, to search for new user agents crawling their data, and to identify and block bad actors.
In contrast, embedding rights declaration metadata directly into image and video files provides media-specific rights information, protecting images and video resources whether the site/page structure is preserved by crawlers — or the image files are scraped and separated from the original page/site. The owner, distributor, or publisher of an image can embed a coded signal into each image file, allowing downstream systems to read the embedded XMP metadata and to use that information to sort/categorise images and to comply with applicable permissions, prohibitions and constraints.
IPTC, PLUS and XMP metadata standards have been widely adopted and are broadly supported by software developers, as well as in use by major news media, search engines, and publishers for exchanging images in a workflow as part of an “operational best practice.” For example, Google Images currently uses a number of the existing IPTC and PLUS properties to signal ownership, licensor contact info and copyright. For details see https://iptc.org/standards/photo-metadata/quick-guide-to-iptc-photo-metadata-and-google-images/
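The per-agent limitation of robots.txt described in the submission can be illustrated with Python's standard `urllib.robotparser` module. This is only a sketch: the crawler names `ExampleAIBot` and `NewAIBot` and the `example.com` URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks one known AI crawler by name (hypothetical agent name).
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Check whether the given user agent may fetch the URL under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


IMAGE_URL = "https://example.com/photos/image-001.jpg"

# The crawler the publisher already knows about is blocked...
blocked_known = not is_allowed("ExampleAIBot", IMAGE_URL)

# ...but a crawler that appears later under a new name is not, until the
# publisher notices it and adds another User-agent block.
allowed_new = is_allowed("NewAIBot", IMAGE_URL)
```

Note also that the rules apply to paths, not to individual image files with their own rights terms, which is the gap that embedded metadata is meant to fill.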
The paper in PDF format can be downloaded from the IPTC site.
Thanks to David Riecks, Margaret Warren and Michael Steidl from the IPTC Photo Metadata Working Group and to Jeff Sedlik from PLUS for their work on the paper.
Media consultant and IPTC Individual Member Denise Durand Kremer gave a presentation on IPTC Photo Metadata at the Seminário Fototeca Brasileira – the Brazilian Photo Library Seminar.
Over three days, more than 80 people got together to discuss the idea of a national photo library for Brazil. Denise was invited by the Collection and Market group Acervo e Mercado to talk about her experience as an iconographic researcher and about the IPTC standard for photographic metadata.
Photographers, teachers, researchers, archivists and public managers from institutions such as the Museu da Imagem e do Som de São Paulo – MIS (Museum of Image and Sound of São Paulo), Funarte, Instituto Moreira Salles, Zumví and Arquivo Afro Fotográfico participated in the event.
The meeting ended with a commitment from the Executive Secretary of the Ministry of Culture to set up a working group to take the idea forward.
The seminar was recorded and will be available on SescTV.
Update, 6 August 2024: The video has now been released publicly. You can view Denise’s section below (in Brazilian Portuguese):
Thanks very much Denise for spreading the word about IPTC standards in Brazil!
Last week CEPIC, the “centre of the picture industry”, held its annual Congress in Juan-les-Pins in France. This was the second time that the event was held at the Palais des Congrès in Juan-les-Pins near Antibes, which is proving to be a great venue for the Congress with many repeat visitors who also attended last year.
IPTC was well represented at the event, with Managing Director Brendan Quinn presenting on two panels and many other IPTC members involved in presentations, panels or simply attending the event. IPTC members either presenting or in attendance included Google, Shutterstock, Getty Images, ANSA, IMATAG, PA Media / Alamy, TT, dpa, Adobe, Xinhua, Activo, APA, European Pressphoto Agency EPA and possibly more that we have missed!
One of the highlights for us was the Provenance and AI panel moderated by Anna Dickson, who until recently was with Google but is now with another IPTC member, Shutterstock. Presenters on this panel included Katharina Familia Almonte, product manager at Google; Andy Parsons, Director of the Content Authenticity Initiative (although Brendan ended up presenting his slides, as it was 4.30am for Andy in New York at the time!); Mathieu Desoubeaux, CEO of IPTC startup member IMATAG; and Brendan Quinn, Managing Director of IPTC.
Brendan used the opportunity to introduce the IPTC Media Provenance Committee and the work that the IPTC is doing on creating a C2PA Trust List for media organisations. Brendan put out a call for other media organisations who may be interested in joining the next cohort of certificate holders who will be able to obtain a certificate stored on the trust list and use it to cryptographically sign their media content. The discussion went on to look at the issues around using both C2PA and watermarking technology to protect image content.
Later in the day came the panel “Where Law and Technology Meet”, moderated by Lars Modie (CEPIC, previously of IBL Bildbyrå / TT in Sweden). The speakers were Serguei Fomin (IQPlug, IPTC CEPIC representative), Brendan Quinn (IPTC), Franck Bardol (University of Geneva), Nancy Wolff (partner at the intellectual property, media and entertainment law firm of Cowan, DeBaets, Abrahams & Sheppard, LLP and DMLA counsel) and Katherine Briggs of Australian agency Envato, which was recently acquired by Shutterstock.
At this panel, Brendan outlined IPTC’s recent guidance on metadata for generative AI images. This includes the Digital Source Type property but also guidance on using the Creator, Contributor, and Data Mining properties to signal ownership and rights licensing information associated with images, particularly for engines that “scrape” web content to train generative AI models.
Many stimulating conversations always make the CEPIC Congress a valuable event for us to attend, and we are already looking forward to next year’s instalment.
Last week, the IPTC Spring Meeting 2024 brought media industry experts together for three days in New York City to discuss many topics including AI, archives and authenticity.
The meeting was hosted by both The New York Times and Associated Press. Over 50 attendees from 14 countries participated in person, with another 30+ delegates attending online.
As usual, the IPTC Working Group leads presented a summary of their most recent work, including a new release of NewsML-G2 (version 2.34, which will be released very soon); forthcoming work on ninjs to support events, planned news coverage and live streamed video; updates to NewsCodes vocabularies; more evangelism of IPTC Sport Schema; and further work on Video Metadata Hub, the IPTC Photo Metadata Standard and our emerging framework for a simple way to express common rights statements using RightsML.
We were very happy to hear many IPTC member organisations presenting at the Spring Meeting. We heard from:
- Anna Dickson of recently-joined member Google talked about their work with IPTC in the past and discussed areas where we could collaborate in the future
- Aimee Rinehart of Associated Press presented AP’s recent report on the use of generative AI in local news
- Scott Yates of JournalList gave an update on the trust.txt protocol
- Andreas Mauczka, Chief Digital Officer at Austria Press Agency APA presented on APA’s framework for use of generative AI in their newsroom
- Drew Wanczowski of Progress Software gave a demonstration of how IPTC standards can be implemented in Progress’s tools such as Semaphore and MarkLogic
- Vincent Nibart and Geert Meulenbelt of new IPTC Startup Member Kairntech presented on their recent work with AFP on news categorisation using IPTC Media Topics and other vocabularies
- Mathieu Desoubeaux of IPTC Startup Member IMATAG presented their work, also with AFP, on watermarking images for tracking and metadata retrieval purposes
In addition we heard from guest speakers:
- Jim Duran of the Vanderbilt TV News Archive spoke about how they are using AI to catalog and tag their extensive archive of decades of broadcast news content
- John Levitt of Elvex spoke about their system which allows media organisations to present a common interface (web interface and developer API) to multiple generative AI models, including tracking, logging, cost monitoring, permissions and other governance features which are important to large organisations using AI models.
- Toshit Panigrahi, co-founder of TollBit spoke about their platform for “AI content licensing at scale”, allowing content owners to establish rules and monitoring around how their content should be licensed for both the training of AI models and for retrieval-augmented generation (RAG)-style on-demand content access by AI agents.
- We also heard an update about the TEMS – Trusted European Media Data Space project.
We were also lucky enough to take tours of the Associated Press Corporate Archive on Tuesday and the New York Times archive on Wednesday. Valerie Komor of AP Corporate Archives and Jeff Roth of The New York Times Archival Library (known to staffers as “the morgue”) both gave fascinating insights and stories about how the two archives preserve the legacy of these historically important news organisations.
Brendan Quinn, speaking for Judy Parnall of the BBC, also presented an update of the recent work of C2PA and Project Origin and introduced the new IPTC Media Provenance Committee, dedicated to bringing C2PA technology to the news and media industry.
On behalf of all attendees, we would like to thank The New York Times and Associated Press for hosting us, and especially to thank Jennifer Parrucci of The New York Times and Heather Edwards of The Associated Press for their hard work in coordinating use of their venues for our meeting.
The next IPTC Member Meeting will be the 2024 Autumn Meeting, which will be held online from Monday September 30th to Wednesday October 2nd, and will include the 2024 IPTC Annual General Meeting. The Spring Meeting 2025 will be held in Western Europe at a location still to be determined.
The 2024 IPTC Photo Metadata Conference takes place as a webinar on Tuesday 7th May from 1500 – 1800 UTC. Speakers hail from Adobe (makers of Photoshop), Camera Bits (makers of Photo Mechanic), Numbers Protocol, Colorhythm, vAIsual and more.
First off, IPTC Photo Metadata Working Group co-leads, David Riecks and Michael Steidl, will give an overview of what has been happening in the world of photo metadata since our last Conference in November 2022, including IPTC’s work on metadata for AI labelling, “do not train” signals, provenance, diversity and accessibility.
Next, a panel session on AI and Image Authenticity: Bringing trust back to photography? discusses approaches to the problem of verifying trust and credibility for online images. The panel features C2PA lead architect Leonard Rosenthol (Adobe), Dennis Walker (Camera Bits), Neal Krawetz (FotoForensics) and Bofu Chen (Numbers Protocol).
Next, James Lockman of Adobe presents the Custom Metadata Panel, which is a plugin for Photoshop, Premiere Pro and Bridge that allows for any XMP-based metadata schema to be used – including IPTC Photo Metadata and IPTC Video Metadata Hub. James will give a demo and talk about future ideas for the tool.
Finally, a panel on AI-Powered Asset Management: Where does metadata fit in? discusses the relevance of metadata in digital asset management systems in an age of AI. Speakers include Nancy Wolff (Cowan, DeBaets, Abrahams & Sheppard, LLP), Serguei Fomine (IQPlug), Jeff Nova (Colorhythm) and Mark Milstein (vAIsual).
The full agenda and links to register for the event are available at https://iptc.org/events/photo-metadata-conference-2024/
Registration is free and open to anyone who is interested.
See you there on Tuesday 7th May!
The IPTC Photo Metadata Working Group has updated the IPTC Photo Metadata User Guide, including guidance for accessibility and for tagging AI-generated images with metadata.
The updates to the User Guide are across several areas:
- A guide to using the accessibility fields added in IPTC Photo Metadata Standard version 2021.1 (Alt Text (Accessibility) and Extended Description (Accessibility)) has been added
- A new section with guidance for applying metadata to AI-generated images has been added
- Guides for new fields added: Event Identifier, Product/Identifier, Contributor, Data Mining
- The Metadata Usage Examples section has been updated to reflect some of the recently-added fields
- The guidance on fields and topics has generally been reviewed and updated
Please let us know if you spot any other areas of the user guide that should be updated or if you have suggestions for more guidance that we could give.
Google has added Digital Source Type support to Google Merchant Center, enabling images created by generative AI engines to be flagged as such in Google products such as Google Search, Maps, YouTube and Google Shopping.
In a new support post, Google reminds merchants who wish their products to be listed in Google search results and other products that they should not strip embedded metadata, particularly the Digital Source Type field which can be used to signal that content was created by generative AI.
We at the IPTC fully endorse this position. We have been saying for years that website publishers should not strip metadata from images. This should also include tools for maintaining online product inventories, such as Magento and WooCommerce. We welcome contact from developers who wish to learn more about how they can preserve metadata in their images.
Here’s the full text of Google’s recommendation: