We were proud to be involved in last week’s Metadata Exchange for News interoperability demo, organised by DPP (formerly known as the Digital Production Partnership).
DPP’s “Metadata Exchange for News” is an industry initiative aimed at making the news production process easier.
The DPP team looked around for existing standards on which to base their work, and when they found IPTC’s NewsML-G2, they realised that it exactly matched their requirements. NewsML-G2’s generic PlanningItem and NewsItem structure meant that it could easily be used to manage news production workflows with no customisation required.
We were treated to a demo of a full news production workflow in the DPP’s offices at ITV in London on February 6th.
A full news production workflow
As you can see from the diagram, the workflow involves these steps:
- An editor creates a planning record for a news item using Wolftech’s planning system, describing metadata for the planned story
- The system sends the planning item as NewsML-G2 to Sony’s XDCAM Air system which converts it to Sony’s proprietary planning metadata and sends it directly to a camera
- XDCAM Air retrieves the footage from the camera and links it to the planning metadata using the NewsML-G2 IDs; the linked footage and metadata are then retrieved from XDCAM Air by some simple custom web services
- The web services send NewsML-G2 NewsItem metadata along with the MP4 video file to Ooyala’s Flex Media Platform via an Amazon Web Services S3 bucket
- Ooyala Flex Media Platform sends the media and metadata to the platforms that require it, in this case the Reuters Connect video browsing and distribution platform.
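The key step in this workflow is that the footage carries the planning item's NewsML-G2 ID, so downstream systems can join the clip back to its planning metadata. Here is a minimal Python sketch of that join, assuming a heavily trimmed `planningItem` (not schema-valid) and a hypothetical `planning_guid` field on the clip record. The demo systems' actual internal formats were not described.

```python
import xml.etree.ElementTree as ET

# Namespace used by NewsML-G2 documents.
NAR = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Illustrative, trimmed planningItem; a real one carries much more metadata.
PLANNING_XML = """<planningItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:example.com:20190206:plan-001" version="1">
  <contentMeta>
    <headline>Council budget vote</headline>
  </contentMeta>
</planningItem>"""


def read_planning_item(xml_text: str) -> dict:
    """Extract the guid and headline from a planningItem."""
    root = ET.fromstring(xml_text)
    headline = root.findtext("nar:contentMeta/nar:headline", default="", namespaces=NAR)
    return {"guid": root.get("guid"), "headline": headline}


def link_clip(clip: dict, plans: dict) -> dict:
    """Attach planning metadata to a clip via the planning guid it carries.

    `planning_guid` is a hypothetical field standing in for however the
    camera system stores the NewsML-G2 ID alongside the footage.
    """
    plan = plans.get(clip["planning_guid"])
    if plan is None:
        raise KeyError(f"no planning item for {clip['planning_guid']}")
    return {**clip, "headline": plan["headline"]}


plan = read_planning_item(PLANNING_XML)
plans = {plan["guid"]: plan}
clip = {"file": "clip0001.mp4", "planning_guid": plan["guid"]}
print(link_clip(clip, plans))
```

Because the join relies only on a shared identifier, each product in the chain can keep its own internal metadata model.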
The NewsML-G2 integrations were built for the demo but the idea is that they will soon become standard features of the products involved. All parties reported that implementing NewsML-G2 was fast and fairly painless!
Thanks to all involved and special thanks to Abdul Hakim of DPP for leading the project and organising the demo day.
Look out for an IPTC Webinar on this topic soon!
Thanks to everyone who attended our first webinar on Thursday, with Brendan Quinn providing an introduction to IPTC, explaining what we do, where we have come from and where we are going.
For those who missed it, you can view it on demand by registering via this site.
Please let us know what you thought! Your feedback is always welcome, and we would particularly like to hear ideas for future webinars.
All feedback can be sent to email@example.com.
This report was presented by Stuart Myles, IPTC Chairman, at the IPTC Annual General Meeting in Toronto, Canada on October 17 2018.
IPTC has had a good year – the 53rd year for the organization!
We’ve updated key standards, including NewsML-G2, the Video Metadata Hub and the Media Topics, and launched RightsML 2.0, a significant upgrade to the way machine-processable rights for news and media are expressed.
Of course, IPTC standards are a means, not an end. The value of the standards is the easier exchange, consumption and handling of news and media by organizations large and small around the world. So it is important that we continue to focus on making our standards straightforward to use and on having them adopted as widely as possible. I think we are making progress on the usability front, such as moving away from zipped PDFs towards actual HTML web pages for documentation of NewsML-G2. Over the last year, we’ve continued to work with other organizations – W3C, Europeana and MINDS – to develop standards, increase adoption and, perhaps most importantly, open up IPTC to other perspectives. And we have had a huge win in the recognition of key photo metadata by Google Images. But we clearly need to do more for both usability and adoption. During the course of this meeting, we’ve had some good discussion about what more we can do in both areas, and I encourage all members to help spread the word about IPTC standards and to suggest ways we can accelerate adoption.
The nature of news and media also continues to evolve. On the one hand, new forms of storytelling are emerging, such as Augmented Reality and Virtual Reality. On the other, data is increasingly used to power stories, both data-driven and data-supported. By data-driven stories, I mean journalists reviewing large databases of information and creating stories based on the trends they find. By data-supported stories, I mean content creators using visually interesting graphics to support their content. The automated production, curation and consumption of news and media is likely to increase for the foreseeable future, driven both by technological improvements and by the seductive economics of replacing people with algorithms. And economics is not the only force driving these changes and challenges, just as robot journalists are no longer limited to writing fill-in-the-blank text stories. Synthetic media – such as “deep fakes” – can produce increasingly convincing photo, video and audio that can be indistinguishable from “real” media. Inevitably, the existence and debunking of these fakes will be used to deny legitimate reporting, further eroding trust in media. All of these trends – AR, VR, data-powered journalism and dealing with trust, credibility and misinformation – are topics which IPTC has discussed over the last few years, but we have not developed any tracks of work to address them. In part, this is because they are, by definition, outside the areas that our member organizations traditionally deal in, and so are quite difficult to tackle in terms of establishing standards.
However, even within the context of standards, IPTC is opening up to new forms of experimentation. As we heard on Monday, the joint project between IPTC and MINDS, to allow for the identification of audience and interest metadata, has led to the introduction of structures within NewsML-G2 to support rapid prototyping and experimentation. I see this as a positive move, with great potential to accelerate the work we do and to help keep it lightweight and relevant.
Of course, IPTC has had significant changes of its own over the last year. We bid goodbye to Michael Steidl as our Managing Director of 15 years, and welcomed Brendan Quinn as our new Managing Director this summer. We’re grateful that we continue to benefit from Michael’s skills and experience, as he has remained the Chairman of the Photo and Video Working Groups. And I think that Brendan has made a great start in his new role in helping us keep the IPTC moving forward.
As part of the handover from Michael to Brendan, we decided to scan a lot of the old paper documents (link available to members only), including various types of IPTC newsletter, dating back to 1967, two years after the organization was founded. I thought I would look back to what IPTC was up to in the year 2000, the year I became a delegate to the IPTC, back when I worked for Dow Jones.
And there I am in the photo at the top of the page. Or, at least, the back of my head. Some things are quite reminiscent of this week’s meeting – the birth of NewsML, a focus on improved communications, cooperation with other organizations e.g. MPEG-7.
Then I thought I would look back on IPTC in 1968, the year I was born:
Some things were similar to today, such as a focus on fine technical details like Alphabet Number 5, and a plan to go to Lisbon the following year for a meeting. However, the focus in those days was mainly on lobbying against tariffs and satellite monopolies.
So I think it is fair to say that the IPTC has never been just a standards body. It is also, more broadly, a community of practice. We are a group of people from around the world who have a common interest in news and media technology. The process of sharing information and experiences with the group, through these face-to-face meetings and the online development of standards, means that the members of IPTC learn from each other, and so have an opportunity to develop professionally and personally. I hope you will agree that yesterday’s discussion of news search and classification was an excellent example of an exchange of experiences, both good and bad, which can help many of us avoid problems and seize opportunities, and so accelerate our work.
I think it is helpful for us to recognize that IPTC is a community which continues to evolve, as the interests, goals and membership of the organization change. I’m confident that – working together – we can continue to reshape the IPTC to better meet the needs of the membership and to move us further forward in support of solving the business and editorial needs of the news and media industry. I look forward to working with all of you on addressing the challenges in 2019 and beyond.
This is the report of Day 3 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 1 and the report from Day 2. All the presentations are available to IPTC members in the IPTC Members Only Zone.
Day 3 of IPTC Autumn Meetings always includes the Annual General Meeting, where all Voting Members can have their say in the future of the organisation. This time new Managing Director Brendan Quinn gave his first MD’s report, alongside Stuart Myles’ Chairman’s Report (which will be posted to the IPTC blog soon). Materials from the AGM are available to members in the IPTC Members Only Zone.
Rounding out the discussions for the three days, we had some broad-ranging and future-facing conversations regarding News Credibility projects, where Stuart Myles took us on a tour of the wide range of projects and initiatives around misinformation, the credibility of news and news sources, and the perceived problems of “fake news.” IPTC and some of its members are helping several organisations with their efforts in this area, such as the W3C Credible Web Community Group and the Journalism Trust Initiative.
We also had a discussion on funding opportunities and potential IPTC projects, which was an internal, members-only discussion.
Lastly, speaking about the future, we had Michael Young from Civil Media speak to us about their plans to use blockchain technologies to power small newsrooms and fulfil their broad goal to “power sustainable journalism throughout the world.” A lot of focus has been on Civil’s Initial Coin Offering, which closed underfunded and will be returning investors’ money, but they have many other activities, including a suite of WordPress-based plugins allowing news providers to join the Civil ecosystem and pledge openness, fairness and transparency according to the Civil Foundation’s constitution. Mike explained how blockchain-based voting and decision-making mean that members can be rewarded for pointing out breaches of the constitution, and bad actors can be punished or even removed from the network entirely.
The event ended with a few of us attending the Canadian Journalism Foundation’s event with journalism pundits Vivian Schiller, Jeff Jarvis, Jay Rosen and Matthew Ingram, talking about misinformation and misuse of social media (video recording available via the above link), and ten of us went on a networking and team bonding trip to Niagara Falls and to a local winery on the Thursday.
Overall it was a great Autumn Meeting which set the scene and built the foundation for many more great IPTC meetings to come!
This is the report of Day 2 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 1 and the report from Day 3. All the presentations are available to IPTC members in the IPTC Members Only Zone.
Day 2 of the IPTC Autumn 2018 Meeting in Toronto was a deep dive into search and classification. Many of our members are working hard to make their content accessible quickly and easily to their customers, and user expectations are higher than ever, so search is a key part of what they do.
First up we had Diego Ceccarelli from Bloomberg talking through their search architecture. Users of Bloomberg terminals have very high expectations that they will see stories straight away: They have 16m queries and 2m new stories and news items per day, with requirements for a median query response time of less than 200ms and for new items to be available in search results in less than 100ms. And as Diego says, “with huge flexibility comes huge complexity.” For example, because customers expect to see the freshest content straight away, the system has no caching at all!
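To give a rough sense of scale, here are Bloomberg's daily figures averaged evenly over 24 hours. That spread is an assumption: real traffic is bursty, so peak rates will be well above these averages.

```latex
\frac{16\times 10^{6}\ \text{queries/day}}{86\,400\ \text{s/day}} \approx 185\ \text{queries/s},
\qquad
\frac{2\times 10^{6}\ \text{items/day}}{86\,400\ \text{s/day}} \approx 23\ \text{items/s}
```

With no caching, every one of those queries hits the live index. Meanwhile, new items keep arriving at roughly the rate above, and each must become searchable within the 100ms target.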
To achieve this, the Bloomberg team use Apache Solr – in fact they have 3 members of staff dedicated to working on Solr full-time, and have contributed a huge amount of code back to the project, including their machine-learning-based “learning to rank” module which can be trained to rank a set of search results in a nuanced way. Bloomberg also worked with an agency to develop open source code used to monitor a stream of incoming stories against queries, used for alerting. Other topics Diego raised included clustering of search results, balancing relevance and timeliness, crowdsourcing data to train ranking systems, combining permissions into search results, and more – a great talk!
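Alerting inverts the usual search problem: instead of running one query against many stored documents, each incoming story is checked against many stored queries. The toy Python sketch below illustrates that idea with AND-of-terms queries and a term index used to pick candidate queries. It is not Bloomberg's code, and the query format is hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SavedQuery:
    query_id: str
    required_terms: frozenset  # every term must appear in the story


def tokenize(text: str) -> set:
    """Lower-case word tokens; a real system would use the search engine's analyzer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class Monitor:
    """Toy reverse search: incoming stories are matched against stored queries."""

    def __init__(self) -> None:
        # Term -> queries containing it, so each story only checks
        # queries that share at least one term with it.
        self.by_term = {}

    def register(self, query: SavedQuery) -> None:
        for term in query.required_terms:
            self.by_term.setdefault(term, []).append(query)

    def match(self, story_text: str) -> list:
        tokens = tokenize(story_text)
        candidates = {q for t in tokens for q in self.by_term.get(t, [])}
        return sorted(q.query_id for q in candidates if q.required_terms <= tokens)


monitor = Monitor()
monitor.register(SavedQuery("q-oil", frozenset({"oil", "opec"})))
monitor.register(SavedQuery("q-fed", frozenset({"fed", "rates"})))
print(monitor.match("OPEC agrees oil output cut"))  # ['q-oil']
```

The candidate pre-filter matters at scale: most stored queries share no terms with a given story and are never evaluated.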
Our heads already reeling with all the information we learned from Bloomberg, we then heard from another search legend, Boerge Svingen, one of the founders of FAST Search in Norway and now Director of Engineering at the New York Times. He spoke about how NYT re-architected their search platform to be based around Apache Kafka, a “distributed log streaming” platform that keeps a record of every article ever published by the Times (since 1851!) and can replay all of them to feed a new search node in around half an hour. The platform is so successful that it is used to feed the “headless CMS” (see yesterday’s report) based on GraphQL which is used to render pages on nytimes.com for all types of devices. Boerge and his team use Protocol Buffers as their schema to keep everything light and fast. More information is in Boerge’s slide deck, available to IPTC members.
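The property that makes this design work is that the log is append-only and ordered. A new search node can therefore rebuild its full state by replaying the log from the beginning, with later entries overwriting earlier ones. The Python sketch below uses an in-memory list in place of Kafka and plain strings in place of Protocol Buffers messages. Those substitutions, and the class names, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class LogEntry:
    offset: int
    article_id: str
    body: Optional[str]  # None marks a deletion (a "tombstone")


class ArticleLog:
    """In-memory stand-in for an append-only, ordered log (no Kafka client)."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def append(self, article_id: str, body: Optional[str]) -> int:
        offset = len(self._entries)
        self._entries.append(LogEntry(offset, article_id, body))
        return offset

    def read_from(self, offset: int = 0) -> Iterator[LogEntry]:
        yield from self._entries[offset:]


def build_index(log: ArticleLog) -> Dict[str, str]:
    """Replay the whole log from offset 0; later entries overwrite earlier ones."""
    index: Dict[str, str] = {}
    for entry in log.read_from(0):
        if entry.body is None:
            index.pop(entry.article_id, None)
        else:
            index[entry.article_id] = entry.body
    return index


log = ArticleLog()
log.append("a1", "First edition story")
log.append("a2", "Draft story")
log.append("a2", "Corrected story")
log.append("a1", None)
print(build_index(log))  # {'a2': 'Corrected story'}
```

A node that is already running would keep its last offset and call `read_from` from there, rather than replaying everything.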
Next up was Chad Schorr talking about search at Associated Press, discussing their Elasticsearch implementation on Amazon Web Services. Using a devops approach based on “immutable infrastructure” means that the architecture is now very solid and well-tested. Chad was very open about the issues and problems AP had while implementing the project, and we had a great discussion about how other organisations can avoid the same problems.
Then Robert Schmidt-Nia from DPA talked about their implementation of a content repository (in effect another “headless CMS”!) based on serialising NewsML-G2 into JSON, using a serverless architecture built on AWS Lambda functions, S3 for storage, SQS queues and Elasticsearch. Robert described how the entire project was built in three months by one and a half developers and ended up with only 500 lines of code! It can now be used to provide services to DPA customers that could not be provided before, including subsets of content based on metadata, such as all Olympics content.
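Here is a minimal Python sketch of the two ideas in that architecture: flattening a NewsML-G2 item into a JSON document, then selecting a subset by subject metadata. The input fragment is trimmed and not schema-valid. The JSON field names and the `example:olympics` QCode are hypothetical, since DPA's actual mapping was not described. In a serverless setup, a function like `to_json_doc` might run inside a Lambda handler, but that wiring is omitted here.

```python
import json
import xml.etree.ElementTree as ET

NAR = "{http://iptc.org/std/nar/2006-10-01/}"

# Trimmed, illustrative newsItem (not schema-valid).
NEWS_XML = """<newsItem xmlns="http://iptc.org/std/nar/2006-10-01/"
    guid="urn:newsml:example.com:20181016:item-42" version="3">
  <contentMeta>
    <headline>Skater wins gold</headline>
    <subject qcode="medtop:15000000"/>
    <subject qcode="example:olympics"/>
  </contentMeta>
</newsItem>"""


def to_json_doc(xml_text: str) -> dict:
    """Flatten a few NewsML-G2 properties into a JSON-friendly dict (hypothetical field names)."""
    root = ET.fromstring(xml_text)
    meta = root.find(f"{NAR}contentMeta")
    return {
        "guid": root.get("guid"),
        "version": int(root.get("version")),
        "headline": meta.findtext(f"{NAR}headline"),
        "subjects": [s.get("qcode") for s in meta.findall(f"{NAR}subject")],
    }


def subset(docs: list, qcode: str) -> list:
    """Select documents tagged with a given subject QCode."""
    return [d for d in docs if qcode in d["subjects"]]


doc = to_json_doc(NEWS_XML)
print(json.dumps(doc, indent=2))
print([d["guid"] for d in subset([doc], "example:olympics")])
```

Once items are plain JSON documents, the same structure can be written to object storage and indexed in a search engine without further transformation.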
Next, Solveig Vikene and Roger Bystrøm from Norway’s news agency NTB spoke about and gave a live demo of their new image archive search product. They demonstrated how photographers can pre-enter metadata so that they can send their photos to the wire a few seconds after taking them on the camera. Some functions like global metadata search and replace and a feature-rich query builder made their system look very impressive.
Veronika Zielinska from Associated Press spoke about AP’s rule-based text classification systems. She showed the complexity of auto-tagging content (down to disambiguating between two US Republican Congressmen both called Mike Rogers!) and the subtlety of AP’s terms (distinguishing between “violent crime” events and the social issue of “domestic violence”), and therefore the need to create and maintain a rules-based system manually.
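To show how a hand-written rule can use context to disambiguate two people with the same name, here is a toy Python sketch. The rule format, the tag names and the choice of state names as context terms are illustrative assumptions, not AP's actual rules.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    tag: str
    trigger: str              # phrase that must appear
    context_any: tuple = ()   # if given, at least one of these must also appear
    context_none: tuple = ()  # none of these may appear


def contains(text: str, phrase: str) -> bool:
    """Case-insensitive whole-phrase match."""
    return re.search(rf"\b{re.escape(phrase)}\b", text, re.IGNORECASE) is not None


def classify(text: str, rules: list) -> list:
    """Return the tags of every rule whose trigger and context conditions hold."""
    tags = []
    for rule in rules:
        if not contains(text, rule.trigger):
            continue
        if rule.context_any and not any(contains(text, c) for c in rule.context_any):
            continue
        if any(contains(text, c) for c in rule.context_none):
            continue
        tags.append(rule.tag)
    return tags


# Same name, different people: context decides which (hypothetical) tag applies.
RULES = [
    Rule("person:mike-rogers-alabama", "Mike Rogers",
         context_any=("Alabama",), context_none=("Michigan",)),
    Rule("person:mike-rogers-michigan", "Mike Rogers",
         context_any=("Michigan",), context_none=("Alabama",)),
]

print(classify("Alabama Rep. Mike Rogers spoke on Tuesday", RULES))
```

A story that mentions the name without any disambiguating context gets no tag, which is usually safer than guessing. Every such decision is a rule someone has to write and maintain.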
Stuart Myles then took us on a tour through AP’s automated image classification activities, looking at whether commercial tools are yet up to the task of classifying news content, the value of assembling good training sets but the difficulties in doing so, and the benefits of starting with a relatively small taxonomy that is easier for machine learning systems to understand.
Dave Compton talked us through the Thomson Reuters Knowledge Items used by the OpenCalais classifier, and how they use the PermID system to unify concepts across their databases of people, organisations, financial instruments and much more. Dave described how these are represented as NewsML-G2 Knowledge Items and mapped to Media Topics where possible.
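Here is a minimal Python sketch of the two-step lookup described above. Several database-specific keys resolve to one unified concept ID, and that concept maps to a Media Topic only where a mapping exists. Every identifier except the Media Topic QCode for "sport" (`medtop:15000000`) is hypothetical. Real PermIDs look different.

```python
from typing import Dict, Optional, Tuple

# Hypothetical keys: two databases refer to the same concept differently.
LOCAL_TO_PERM: Dict[str, str] = {
    "topics-db:sports": "perm:100",
    "newsroom-tags:SPO": "perm:100",
    "topics-db:crypto": "perm:101",
}

# Media Topic mappings exist only for some concepts ("where possible").
PERM_TO_MEDIA_TOPIC: Dict[str, str] = {
    "perm:100": "medtop:15000000",  # Media Topic "sport"
}


def resolve(local_id: str) -> Tuple[str, Optional[str]]:
    """Return the unified concept ID and its Media Topic QCode, if one is mapped."""
    perm_id = LOCAL_TO_PERM[local_id]  # raises KeyError for unknown local IDs
    return perm_id, PERM_TO_MEDIA_TOPIC.get(perm_id)


print(resolve("newsroom-tags:SPO"))  # ('perm:100', 'medtop:15000000')
print(resolve("topics-db:crypto"))   # ('perm:101', None)
```

Returning `None` rather than forcing a nearest match keeps unmapped concepts visible, so they can be reviewed.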
On that subject, Jennifer Parrucci of the New York Times, chair of the IPTC NewsCodes Working Group, gave an update on the latest activities of the group. These include the ongoing Media Topic definitions review, adding new Media Topic terms suggested by the Swedish media industry, and work with the schema.org team on mapping between schema.org and Media Topics terms.
As you can see, it was a very busy day!
- All XML Schemas plus full documentation (about 60 MB) from https://www.iptc.org/std/NewsML-G2/NewsML-G2_2.28.zip
- The same without XML Schema documentation in HTML (about 3 MB) from https://www.iptc.org/std/NewsML-G2/NewsML-G2_2.28-noXMLdocu.zip
- From the newsml-g2 repository on GitHub as a Release: https://github.com/iptc/newsml-g2
Please note that the XML examples have been temporarily removed as we have not yet updated them to 2.28. The pack will be updated when the examples are brought up to date.
Update on 6 November: the examples have now been updated to 2.28 and are available at the above links. Enjoy!
Details of the changes made in version 2.28 can be found on http://dev.iptc.org/G2-Approved-Changes.
In summary the changes are:
- Add a new element derivedFromValue. Previously we could say that elements were derived from a concept using the derivedFrom element, but if a system created a new property based on another existing property, such as a slugline, there was no way of representing it.
- Add a new element metadataCreator to itemMeta. This allows us to represent NewsML-G2 items that have had metadata created by a third-party person or system, without having to specify the creator on each metadata property individually.
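Here is a minimal Python sketch of how a consumer might use the two additions. It assumes that a creator stated on an individual property takes precedence over the item-level `metadataCreator`, and that `derivedFromValue` is reduced to the name of the source property. Both are our simplifications for illustration, not the specification's wording. The classes are hypothetical in-memory stand-ins, not a NewsML-G2 parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Property:
    name: str
    value: str
    creator: Optional[str] = None             # creator stated on this property, if any
    derived_from_value: Optional[str] = None  # source property name (simplified stand-in)


@dataclass
class Item:
    metadata_creator: Optional[str]           # stand-in for itemMeta/metadataCreator
    properties: List[Property] = field(default_factory=list)

    def effective_creator(self, prop: Property) -> Optional[str]:
        """Assumed precedence: the property's own creator, else the item-level one."""
        return prop.creator if prop.creator is not None else self.metadata_creator


item = Item(
    metadata_creator="tagger:auto-v2",
    properties=[
        Property("headline", "Council budget vote", creator="editor:jsmith"),
        Property("slugline", "BUDGET-VOTE", derived_from_value="headline"),
    ],
)
for p in item.properties:
    print(p.name, item.effective_creator(p), p.derived_from_value)
```

The slugline here carries no creator of its own, so it is attributed to the item-level metadata creator, and it records that it was derived from the headline.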
The NewsML-G2 Implementation Guidelines are available at https://www.iptc.org/std/NewsML-G2/guidelines.
Note on Power and Core Conformance Levels
As a reminder of an important decision taken for NewsML-G2 version 2.25, which also applies to version 2.28: the Core Conformance Level will not be developed any further, because all recent Change Requests were in fact aimed at features of the Power Conformance Level; changes to the Core Level were only a side effect.
The Core Conformance Level specifications of version 2.24 will stay available and valid. Find them at http://dev.iptc.org/G2-Standards#CCLspecs
This is the report of Day 1 of the IPTC Autumn 2018 Meeting in Toronto. See the report from Day 2 and the report from Day 3. All the presentations are available to IPTC members in the IPTC Members Only Zone.
This week we are in Toronto for the IPTC Autumn Meeting. Unfortunately the weather is not as warm as it was last week but we are still enjoying ourselves immensely and learning a lot from each other!
All presentations are available to members on the members-only event page.
After an introduction from Chair Stuart Myles, we heard an update from Michael Steidl, chair of the Video Metadata and Photo Metadata Working Groups. Michael updated us on work promoting the IPTC Video Metadata Hub standard, talking to manufacturers and software vendors at events like IBC in Amsterdam, and pulling together use cases and success stories from existing users of the standard.
On the IPTC Photo Metadata Standard, Michael shared the news that Google Images now displays IPTC Photo Metadata, and the press coverage we have received since then. We are also working on new technical features in the standard, such as metadata for regions within images. We’re looking for use cases and requirements for storing metadata against regions, so if you have any input, please let Michael, or IPTC Managing Director Brendan Quinn, know!
Dave Compton of Refinitiv (formerly the Financial & Risk business of Thomson Reuters), chair of the NewsML-G2 Working Group, gave an update on recent progress and work towards NewsML-G2 version 2.28, which will be released soon. It will incorporate features to meet the requirements of auto-tagging systems, and a new experimental namespace for potential updates to NewsML-G2 that aren’t yet ready to be added to the full specification.
The experimental extension to NewsML-G2 is already in use by Gerald Innerwinkler of APA and Robert Schmidt-Nia of DPA. They presented an update on a current project between IPTC and MINDS International looking at metadata for suggesting news stories to users based on psychological and emotional characteristics, plus properties like the likely timeliness for different types of user. Based on the Limbic Map concept from marketing theory, the new proposals are being tested right now.
Chair of the Sports Content Working Group, Johan Lindgren of TT in Sweden, presented an update on SportsML and the work on SportsJS, which is nearing a final version now that JSON Schema will soon be able to support some new properties that we need to validate sports content.
Stuart Myles appeared again in his role as chair of the Rights Working Group, updating us on RightsML and where we can take it in the future, including the potential to use RightsML as the basis of blockchain-based rights management systems.
Then we had a focus on “new-generation editorial systems” including a great presentation from Peter Marsh of new IPTC member NEWSCYCLE Solutions on the history and state of the art of content management systems from Tandem-based SII workstations in the 1980s, all the way through to the current wave of headless CMSs as illustrated by this project by The Economist.
Stephane Guerrilot of AFP finished day one presenting AFP’s new-generation system, Iris, which enables AFP customers and partners to search for stories, video and images.
Stay tuned for a report on Day Two!
Rights-related photo metadata can now be accessed directly in Google Image Search results, thanks to a joint effort by IPTC, Google and CEPIC, the Center of the Picture Industry.
Google, the IPTC and CEPIC worked together closely to determine the best way to incorporate metadata in Google search results of images to identify an image’s author and rights holder.
When users see an image in a Google search result, they can click the “image credits” link to see the image’s creator and credit information, read from IPTC embedded metadata. Over the coming weeks, copyright notice metadata will also be added.
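In the IPTC-IIM form of embedded metadata, the creator, credit line and copyright notice mentioned above are datasets 2:80, 2:110 and 2:116. Each dataset starts with the marker byte 0x1C, followed by the record number, the dataset number and a two-byte big-endian length. The Python sketch below parses a raw IIM block under stated assumptions. It handles standard lengths only, decodes text as UTF-8 (real files declare their character set separately), and does not locate the block inside a JPEG file. It also ignores XMP, which can carry the same fields. This is not how Google reads metadata. The Quick Guide mentioned below is the reference for what Google Images actually uses.

```python
import struct

# IIM Application Record (record 2) dataset numbers.
CREATOR, CREDIT_LINE, COPYRIGHT_NOTICE = 80, 110, 116


def parse_iim(data: bytes) -> dict:
    """Collect record-2 datasets from a raw IPTC-IIM block.

    Only standard (non-extended) dataset lengths are handled; repeated
    datasets, such as multiple creators, are kept in order.
    """
    fields = {}
    pos = 0
    while pos + 5 <= len(data):
        if data[pos] != 0x1C:  # every dataset starts with the tag marker
            raise ValueError(f"bad tag marker at byte {pos}")
        record, dataset, length = struct.unpack(">BBH", data[pos + 1:pos + 5])
        if length & 0x8000:
            raise NotImplementedError("extended dataset lengths not handled")
        value = data[pos + 5:pos + 5 + length]
        if record == 2:
            fields.setdefault(dataset, []).append(value.decode("utf-8"))
        pos += 5 + length
    return fields


def encode_dataset(dataset: int, text: str) -> bytes:
    """Build one record-2 dataset; used here only to create sample input."""
    raw = text.encode("utf-8")
    return struct.pack(">BBBH", 0x1C, 2, dataset, len(raw)) + raw


sample = (encode_dataset(CREATOR, "Jane Doe")
          + encode_dataset(CREDIT_LINE, "Example Agency")
          + encode_dataset(COPYRIGHT_NOTICE, "(c) 2018 Example Agency"))
info = parse_iim(sample)
print(info[CREATOR], info[CREDIT_LINE], info[COPYRIGHT_NOTICE])
```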
“Embedded IPTC photo metadata has an essential role for photos posted on a website,” said Michael Steidl, lead of IPTC’s Photo Metadata and Video Metadata Working Groups. “These fields make it easy for people searching for images to see who an image’s creator and copyright owner are. We encourage all parties who post images on the web to fill in these IPTC fields.”
Photo metadata is vital to guarding images’ licensing and copyright information online, and essential for managing digital assets.
The IPTC Photo Metadata Standard is the most widely used specification for describing photos, due to its universal acceptance among news outlets, photographers and photo agencies, libraries, museums and other related organisations. Most major photo software vendors support IPTC’s Photo Metadata Standard.
In a recent blog post, Google Images product manager Ashutosh Agarwal said this change will help promote “a healthy visual content ecosystem.”
Brendan Quinn, Managing Director of IPTC, said “we are looking forward to continuing our work with Google on IPTC Photo Metadata and other areas. We have a ton of ideas on how we can work together and are looking forward to using our standards to make the web more searchable and more accountable.”
IPTC has prepared a Quick Guide to IPTC Photo Metadata and Google Image Search to help users, developers and site administrators understand what they need to do to ensure that their metadata is shown in Google Image Search results.
For more detailed help with questions and implementation of IPTC’s Photo Metadata standard, see our IPTC Metadata User Guide.
Publishers, broadcasters, news and photo agencies and tool vendors are encouraged to join IPTC to work with us and Google on future projects. See the Participate pages for information on joining and working with IPTC.
“For years, the professional photography community has relied on IPTC metadata as the cornerstone of copyright protection,” said Andrew Fingerman, CEO of PhotoShelter, a provider of digital asset management tools for photographers and brands. “As assets change hands, pass through organisations, and are published with greater frequency, IPTC metadata provides the basis for identifying the creator and rights owner. This major step by Google and IPTC will help everyone discover, identify, and trace copyright. We applaud this collaboration!”
For more information:
Join the public IPTC Photo Metadata groups.io Group
Join IPTC: Membership Information
Google Blog: Images Rights Metadata In Google Images
It is now just over 80 days since I took on the IPTC Managing Director role on the 1st of June, so I thought it would be a good time to reflect on my experience so far.
I actually started my IPTC “life” in April at the IPTC 2018 Spring Meeting in Athens, Greece – thankfully my previous project allowed me to go to Athens for a few days to meet everyone and see first hand how an IPTC face-to-face meeting works. Everyone was very friendly and welcoming, and I look forward to seeing many familiar and new faces at the Autumn 2018 meeting in Toronto – the hotel booking link will be released in the next few days so keep an eye out! And if you’re not an IPTC member but you’re interested in speaking or attending, see the call for participation for the IPTC Toronto meeting that we released recently and please do get in touch.
Having worked for many media companies over the years – building content management and syndication systems at Fairfax Media and the BBC, working on long and short term projects for various media organisations (Associated Press, TV3 Ireland, BBC Worldwide and Newsworks) and co-founding a startup in the industry (NewsFixed, since acquired by Paydesk) – I have a broad background in the technology side of the media industry. So it’s great to work with some of those organisations plus many more, helping to set the standards that bring the industry together.
Working with the previous MD Michael Steidl has been a breeze. His care and attention to detail meant that handover was very easy, and hopefully I can continue to uphold the high standards that he has set. I wish him the best of luck in enjoying his retirement, and am very thankful that he has offered to stay on as chair of the IPTC Photo Metadata and Video Metadata working groups!
When most people in the industry think of IPTC, they probably think of the technical standards or the controlled vocabularies – but I really see IPTC as a group of people from companies across the news and media industry who are working together to solve the sorts of problems that can only be solved by working together. I really hope that we can continue to work together to solve more problems in the future. If you have ideas, please get in touch – I can be reached at firstname.lastname@example.org.
I also plan to get out to many industry events and conferences to meet and learn from people from the industry who may or may not be IPTC members. Thanks to those I have already met at the IPTC Photo Metadata Conference co-located with CEPIC in Berlin in May, the Henry Stewart Digital Asset Management conference in London in June, and I’m looking forward to going to the IBC Conference in Amsterdam in September.
Taking on the role with IPTC has coincided with another change in my life: as the MD role is remote, my wife and I took the opportunity to move to my wife’s home town, the tech hotspot of Tallinn, Estonia. We’re having a great time getting settled here, and if you ever happen to be in Tallinn, please get in touch, I would love to show you around!
IPTC Managing Director
After a successful IPTC Spring Meeting in Athens and IPTC Photo Metadata Conference in Berlin (see Sarah Saunders’ write-up of the event), we’re currently hard at work planning the IPTC Autumn 2018 Meeting. This year’s Autumn meeting and AGM will be held from 15-17 October in Toronto, Canada.
We’re currently considering topics for the agenda and organisations to invite, so we’re inviting suggestions from members and the industry. Recent topics discussed at IPTC Meetings include 360-degree images, rights management, blockchain and the media industry, non-XML news standards such as NinJS and much more. We want to focus on how we can use technology and technical standards to make the news and media industry function more smoothly.
Registration opens in August. We look forward to seeing as many members as possible attending.
If you’re interested in joining IPTC so that you can attend, or if you’re interested in presenting some relevant work to some of the top technical specialists in the media industry, please submit a short abstract of your presentation topic to Brendan Quinn, IPTC Managing Director at email@example.com.