The IPTC has long worked with organisations on schemas for representing news and media content in all of their forms.
Back in early 2022, the IPTC started hosting the BBC Ontologies, a set of semantic web vocabularies created between 2012 and 2014 that can be used to describe news content, sports, TV and radio programmes and more. When the BBC stopped hosting them in late 2021, IPTC offered to host them on the BBC’s behalf.
“I’m very grateful to the IPTC for providing hosting for these ontologies while we perform some maintenance on their former home,” said Jeremy Tarling, Head of Content Metadata for the BBC, at the time. “For those BBC ontologies relevant to IPTC’s mission we would be keen to discuss longer-term arrangements for their hosting and ownership.”
Since then we have added the SNaP Ontology, a similar semantic web ontology created by the UK’s national news agency PA Media (known at the time as the Press Association). The SNaP Ontology was similarly left without a home after the PA brand change.
“We are delighted for the SNaP Ontologies to find their home with the IPTC and its community,” said Steve Robinson, Director of Technology, PA Media Group. “It is our hope that these ontologies, complemented by other member contributions, will support the IPTC’s continued evolution of digital news standards.”
While neither of these standards is being actively developed, we at the IPTC think that they should remain accessible to researchers, architects and developers who may want to draw upon their concepts and vocabularies in the future.
In fact, the BBC Sport Ontology is being used as one of the sources of inspiration for IPTC’s forthcoming sports data ontology, which will be announced soon.
With that in mind, the IPTC is willing to host other data schemas and specifications, especially those that are no longer hosted by their creators. If you have suggestions for resources that we should host in our third party area, please let us know.
The IPTC has released a comprehensive set of sports controlled vocabularies as a supplement to the SportsML 3.0 sports-data interchange format, which was released in July 2016. These controlled vocabularies (CVs) are in the format of NewsML-G2 Knowledge Items plus RDF variants and are available on IPTC’s CV server at http://cv.iptc.org/newscodes.
There are 113 CVs representing core sports concerns such as event and player status, as well as specialized lists for 11 sports (basketball, soccer, rugby, American football, etc.) covering statistics, player positions, scoring types, etc.
“The SportsML 3.0 standard’s semantic tech capabilities are improved greatly by the new controlled vocabularies,” said Trond Husø, system developer for Norwegian news agency NTB, one of the early adopters of SportsML 3.0. “Data can be easily imported, structured, and stored.”
“When building a sports app you spend a lot of prep time defining your terms and building a schema,” said Paul Kelly, news technology consultant and lead for IPTC’s Sports Content Working Group. “By using SportsML 3.0, there is no need to reinvent the wheel.”
“You consider things such as ‘What sort of results and stats do we need?’ and ‘How will our system handle interrupted matches?’ IPTC’s vocabularies can get you on your way because they properly define in a standard format almost all the terminology you would use in a sports application: Everything from ‘goals-scored’ to a full enumeration of status codes for sports events,” Kelly said.
For the Summer 2016 Olympics, NTB acquired the rights to distribute the results and data from the International Olympic Committee’s Olympic Data Feed (ODF). NTB then transformed ODF to SportsML 3.0, and then to NITF 3.2. “Using SportsML to structure the ODF’s data is a broad and comprehensive solution to approaching all sports and competitions worldwide,” said Husø, who is also a member of IPTC’s Sports Content Working Group. “SportsML is now a truly flexible and universal format that can incorporate multiple vendor codes and still provide a defense against vendor lock-in.”
“Terms defined in another format such as ODF can easily live beside SportsML terms – as well as any other proprietary format – so that an organisation can build a repository of knowledge of all the different sports-data formats,” Kelly said.
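As a rough illustration of the kind of repository Kelly describes, the sketch below keeps vendor-specific codes alongside a shared term so that either can be looked up from the other. The class, the scheme names and the codes (`odf`, `GOALS`, `vendor-x`, `G`) are hypothetical placeholders chosen for this example; they are not taken from ODF, SportsML or any IPTC vocabulary.

```python
from collections import defaultdict


class TermRepository:
    """Hypothetical in-memory store linking vendor codes to a shared term."""

    def __init__(self):
        # (scheme, code) -> shared term
        self._shared = {}
        # shared term -> {scheme: code}
        self._aliases = defaultdict(dict)

    def add_alias(self, shared_term, scheme, code):
        """Record that `code` in `scheme` means the same as `shared_term`."""
        self._shared[(scheme, code)] = shared_term
        self._aliases[shared_term][scheme] = code

    def resolve(self, scheme, code):
        """Return the shared term for a vendor code, or None if unknown."""
        return self._shared.get((scheme, code))

    def code_in(self, shared_term, scheme):
        """Return the code a given scheme uses for a shared term, or None."""
        return self._aliases.get(shared_term, {}).get(scheme)


repo = TermRepository()
# Placeholder codes: the real vendor codes would come from each feed's documentation.
repo.add_alias("goals-scored", "odf", "GOALS")
repo.add_alias("goals-scored", "vendor-x", "G")

print(repo.resolve("odf", "GOALS"))
print(repo.code_in("goals-scored", "vendor-x"))
```

Because every vendor code points at one shared term, a new feed can be added by registering its aliases, without touching the code that consumes the shared terms.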
Another advantage to the new SportsML 3.0 standard is that if new concepts are added to a sports vocabulary or modified in it, the data model and the XML Schema don’t change; they stay stable. It also supports all languages for the concept labels.
“A great feature is that we can translate the definitions to Norwegian – without changing or breaking the vocabulary,” said Husø. “If we were to distribute internationally, our domestic receivers could look up the definitions in Norwegian, while the international ones could use the English term.”
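The language lookup Husø describes could be sketched roughly as follows. The XML is a simplified, hand-written fragment in the general shape of a NewsML-G2 Knowledge Item, not a copy of a published IPTC vocabulary: the `example:` scheme alias, the concepts and the Norwegian label are illustrative assumptions, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# NewsML-G2 namespace and the standard xml:lang attribute name.
NAR = "http://iptc.org/std/nar/2006-10-01/"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Simplified, illustrative Knowledge Item (assumed content, not a real CV).
SAMPLE = """\
<knowledgeItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <conceptSet>
    <concept>
      <conceptId qcode="example:goals-scored"/>
      <name xml:lang="en">goals scored</name>
      <name xml:lang="no">scorede mål</name>
    </concept>
    <concept>
      <conceptId qcode="example:postponed"/>
      <name xml:lang="en">postponed</name>
    </concept>
  </conceptSet>
</knowledgeItem>
"""


def load_labels(xml_text):
    """Map each concept's QCode to a {language: label} dictionary."""
    root = ET.fromstring(xml_text)
    labels = {}
    for concept in root.iter(f"{{{NAR}}}concept"):
        concept_id = concept.find(f"{{{NAR}}}conceptId")
        if concept_id is None:
            continue
        names = {}
        for name in concept.findall(f"{{{NAR}}}name"):
            names[name.get(XML_LANG, "")] = (name.text or "").strip()
        labels[concept_id.get("qcode")] = names
    return labels


def label_for(labels, qcode, lang, fallback="en"):
    """Return the label in `lang`, else in `fallback`, else None."""
    names = labels.get(qcode, {})
    return names.get(lang) or names.get(fallback)


labels = load_labels(SAMPLE)
print(label_for(labels, "example:goals-scored", "no"))
print(label_for(labels, "example:postponed", "no"))
```

Adding a label in another language, or a whole new concept, only changes the vocabulary data; the lookup code and the surrounding schema stay the same, which is the stability the article describes.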
IPTC’s SportsML 3.0 standard is a major upgrade from version 2.2 and follows 12 years of evolution since the first version. The new standard incorporates contributions from sports experts in 12 countries. Its flexible core covers all major sports and events in most news reporting.
Other early adopters of SportsML 3.0 include Univision and the British Press Association, which uses it in its new multi-sport API. Major features of SportsML 3.0 include:
- compliance with IPTC’s NewsML-G2 standard
- a flexible core that covers all major sports and events in most news reporting
- plugins for detailed stats in 10+ sports
- a more flexible tournament model
- schedules, scores, standings, statistics, etc.
- choices between specific and generic terms
- controlled vocabularies, semantic tech capabilities
- schema redesign
- many samples and tool support.
Tool support for SportsML 3.0 includes 45 samples from 11 different sports and events, including both classic and SportsML-G2 examples, and both generic and specific examples.
The vocabularies will be maintained by IPTC for future expansion; new sports and terms can be added.
For more information on SportsML 3.0:
- SportsML 3.0 Standard, including Zip package
- SportsML 3.0 Specification Documents
- NewsML-G2 Standard
Contact: Trond Husø @trondhuso, Trond.Huso@ntb.no