IPTC has secured funding for EXTRA, its rules-based classification project, and has laid the foundation for the project’s language and technical requirements. Stuart Myles, project lead and IPTC Chairman of the Board, reported the progress at IPTC’s Summer Meeting 2016.
EXTRA is the EXTraction Rules Apparatus, a multilingual open-source platform for rules-based classification of news content. EXTRA will allow newsrooms to automatically annotate news content with high-quality subject metadata using a predefined set of rules. IPTC was awarded a grant from the first round of Google’s Digital News Initiative Innovation Fund to build and freely distribute the initial version of EXTRA.
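To make "rules-based classification" concrete, here is a minimal Python sketch under stated assumptions: each rule maps a few keyword patterns to one topic label, and a story receives every label whose rule matches. The `Rule` class, the `classify` function and the `example:` labels are hypothetical illustrations only; they are not EXTRA's actual rule syntax and not real Media Topics codes.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: if any pattern matches, assign the topic label."""
    topic: str                # placeholder label, NOT a real Media Topics code
    patterns: tuple[str, ...]  # literal keywords or phrases


def classify(text: str, rules: list[Rule]) -> set[str]:
    """Return every topic whose rule matches the text (case-insensitive, whole words)."""
    tags: set[str] = set()
    for rule in rules:
        for pattern in rule.patterns:
            # re.escape treats the keyword literally; \b restricts matches to whole words
            if re.search(rf"\b{re.escape(pattern)}\b", text, flags=re.IGNORECASE):
                tags.add(rule.topic)
                break  # one matching pattern is enough for this rule
    return tags


# Hypothetical rule set for illustration only
RULES = [
    Rule("example:election", ("election", "ballot", "polling station")),
    Rule("example:football", ("football", "goalkeeper", "penalty kick")),
]

if __name__ == "__main__":
    story = "Voters queued at every polling station as the election entered its final hour."
    print(sorted(classify(story, RULES)))
```

Because every rule is explicit, the same story always receives the same tags, which is the consistency the article contrasts with hand-tagging. A production rule set would likely need richer logic, such as negation, proximity and language-specific stemming, than this keyword sketch shows.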
The EXTRA project team has delivered a road map for the project to Google’s Digital News Initiative and is finalizing its plans for language requirements and rules, as well as for technical requirements and licensing. IPTC will approach existing open-source communities, linguists and programmers to help with development.
For easy adoption and consistency in the news industry, IPTC is creating rules for tagging documents with its industry standard Media Topics vocabulary, used widely by publishers. IPTC plans to provide example rules for at least two of the languages supported by Media Topics: Arabic, English, French, German and Spanish.
“For small to medium size publishers who are dissatisfied with hand-tagging their content or grappling with complex machine-learning tools, EXTRA is an open-source news classification engine that will let you easily apply rich metadata to breaking news content,” said Myles. “Unlike manual techniques, which can be slow and inconsistent, or traditional statistical methods, which aren’t suitable for breaking news, EXTRA’s rules-based classification will provide fast, consistent and relevant metadata to enrich search, advertising and content analytics.”
IPTC invites other parties to join the development of the EXTRA project. To get involved, contact Myles at firstname.lastname@example.org.
For developers: http://dev.iptc.org/Topic-EXTRA
Road map and project description: https://iptc.github.io/extra/