Day 3 of the Lisbon meeting was all about metadata and controlled vocabularies, rights, and a look to the future of IPTC’s work plan.
We started with Jennifer Parrucci, Senior Taxonomist at The New York Times and lead of the IPTC NewsCodes Working Group, who gave an update on the group’s activities over the past six months. We have been focussing on updating our core subject taxonomy, Media Topics, including updates to term labels and definitions, and on integrating and updating mappings to Wikidata entities that were kindly provided by Thad Guidry from the schema.org community.
Integrating Wikidata mappings was an interesting challenge, as we didn’t always have good mappings. For example, for “arts, culture, entertainment and media” there is no Wikidata entity broad enough to encompass all of those terms. But for the leaves of our tree, most terms had mappings, and for those that didn’t we will be suggesting new terms in Wikidata to accommodate them. We will also look at updating the mappings from Wikidata back to Media Topics now that we have updated the mappings in the other direction. Brendan Quinn presented some new tools used for managing NewsCodes internally, plus a new web tree browser view of Media Topics which will be launched very soon.
Translating Media Topics is another hot issue, with a recent contribution from the Swedish media that is now available as a Swedish language version of Media Topics. We have made it easier to find the language translations in the NewsCodes browser, and have also added some new terms suggested by the Swedish media consortium, which will be using the new Swedish translation of Media Topics as its categorisation system for sharing content in the future. We realise that nearly ten years after moving from SubjectCodes to Media Topics as the standard IPTC subject classification, we still don’t support as many languages in Media Topics as we do in SubjectCodes, so we want to make it as easy as possible to perform translations. Our discussion was based on the useful idea that anything with an existing translation in SubjectCodes can be taken directly into a Media Topics translation, and that we can use the SubjectCode and Wikidata mappings to extract suggested terms to get a translation team started. We have interest in creating Media Topics translations in Portuguese (for both Portugal and Brazil) and Chinese. If you are interested in helping with translations, please let us know.
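The bootstrapping idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tooling the working group uses: the identifiers, the labels and the function names `suggest_translations` and `untranslated` are all made up for the example, and real mapping data would come from the published NewsCodes.

```python
from typing import Dict, List, Optional

# Hypothetical sample data: these IDs and labels are placeholders,
# not real Media Topics or SubjectCodes entries.
MEDIATOPIC_TO_SUBJECTCODE: Dict[str, str] = {
    "medtop:A": "subj:A",
    "medtop:B": "subj:B",
    "medtop:C": "subj:C",
}

# Existing translations of SubjectCodes labels in the target language.
SUBJECTCODE_LABELS_PT: Dict[str, str] = {
    "subj:A": "PT label A",
    "subj:B": "PT label B",
}


def suggest_translations(
    mapping: Dict[str, str],
    existing_labels: Dict[str, str],
) -> Dict[str, Optional[str]]:
    """Return a suggested label for each Media Topic, or None if the
    mapped SubjectCode has no translation yet."""
    return {
        media_topic_id: existing_labels.get(subject_code_id)
        for media_topic_id, subject_code_id in mapping.items()
    }


def untranslated(suggestions: Dict[str, Optional[str]]) -> List[str]:
    """List the Media Topics a translation team still has to handle by hand."""
    return sorted(topic for topic, label in suggestions.items() if label is None)


suggestions = suggest_translations(MEDIATOPIC_TO_SUBJECTCODE, SUBJECTCODE_LABELS_PT)
todo = untranslated(suggestions)
```

Here `suggestions` gives translators a starting point for every term that already has a SubjectCodes translation, and `todo` is the remainder. A Wikidata-based lookup could fill some of those gaps in the same way, given a similar mapping table.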
Johan Lindgren from TT in Sweden spoke about the project that led to the Swedish translations and also discussed how they are approaching the handling of entities (names and organisations). This prompted a wider discussion, led by Stuart Myles, of how to handle lists of entities and whether IPTC should be working on a standard or a best practice document in that area. We also discussed the idea of a taxonomy for describing images in a stylistic way (such as “happy”, “blue”, or “outdoors”) as opposed to describing the content. Such a standardised controlled vocabulary could be useful to image libraries and AI classification engines. This is an area of active work for us and more information will be available in the coming months. If you want to help, talk to us!
Invited guest Carlos Amaral from local company Priberam demonstrated their text mining and visualisation system created in partnership with Deutsche Welle and other broadcasters for use in browsing stories according to subject, image, extracted entities and keywords.
Stéphane Guérillot from AFP presented his new API for retrieving news content, which led to more discussion of whether IPTC should be standardising an API that could be used by multiple news providers to share their content.
Michael Steidl spoke on RightsML, and Blaise Galinier from the BBC talked about their current project looking at viewing news content based on rights. Two key insights from Blaise’s talk: first, any demonstration of what is or isn’t usable is always based on the particular user and the context in which they want to use a piece of media. Second, it’s not enough to show a journalist what they can and can’t use; they need to know why a piece of content is “red”, “green” or “amber”.
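To make those two insights concrete, here is a minimal Python sketch of a rights check that depends on the user and context and always returns a reason alongside the colour. It is not RightsML and not the BBC’s system: the types `UsageContext` and `RightsDecision`, the function `check_usage` and the rules inside it are assumptions invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Set


class Status(Enum):
    GREEN = "green"
    AMBER = "amber"
    RED = "red"


@dataclass(frozen=True)
class UsageContext:
    """Who wants to use the content, and where (hypothetical fields)."""
    user: str
    territory: str
    platform: str


@dataclass(frozen=True)
class RightsDecision:
    """A colour plus the explanation the journalist needs to see."""
    status: Status
    reason: str


def check_usage(
    allowed_territories: Set[str],
    allowed_platforms: Set[str],
    context: UsageContext,
) -> RightsDecision:
    # Made-up rule: an unlicensed territory is a hard stop.
    if context.territory not in allowed_territories:
        return RightsDecision(
            Status.RED,
            f"{context.user}: not licensed for territory {context.territory!r}",
        )
    # Made-up rule: an unlisted platform needs manual clearance.
    if context.platform not in allowed_platforms:
        return RightsDecision(
            Status.AMBER,
            f"{context.user}: platform {context.platform!r} needs clearance",
        )
    return RightsDecision(
        Status.GREEN,
        f"{context.user}: territory and platform are both licensed",
    )


decision = check_usage({"PT"}, {"web"}, UsageContext("reporter", "PT", "broadcast"))
```

The same piece of content yields different answers for different contexts, and `decision.reason` carries the “why” that a bare colour would hide.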
Everyone had a great time at the 2019 Spring Meeting, and we’re already planning the next one in Ljubljana, Slovenia in October. Members: please save the dates 14 – 16 October 2019. If you’re not a member but you would like to present at the meeting, please get in touch!