NewsCodes Glossary

This is a short glossary of terms often used in the context of NewsCodes. Terms are listed in alphabetical order.

Code:
A character sequence which forms a member of a controlled vocabulary. Each code represents a concept.

Concept:
Anything that one may wish to refer to, e.g. Diplomacy, Paris, the Euro, OECD, the Japanese language, the IMF, Oil, Madonna, Olympic Games. Concept thus has a broader meaning here than usual, because we are dealing with the idea of Paris rather than with Paris itself, the idea of Oil rather than Oil itself, and so on. Concepts fall into two broad categories: named entities and generic (or abstract) concepts. A concept may be represented by one or more codes.

Controlled Vocabulary:
A set of codes managed by some authority (e.g. a person or an organisation), which employs some mechanism (e.g. an XML Schema, a Web page, an RFC, or an IPTC NewsML-G2 Knowledge Item) to maintain this set. Each code in a controlled vocabulary represents a concept.

Generic (or abstract) concept:
Any concept that represents a generic topic rather than a named entity, e.g. Diplomacy, Art, Science, Country Music, Forest, or Global Warming.

Globally Unique Identifier (GUID):
An identifier that is unique, unambiguous, and persistent. Being unique and unambiguous means that there is a 1:1 relationship between the identifier and the identified concept. Being persistent means that the identifier never changes over time and is never reused as an identifier for another concept, even if the original concept disappears.

Knowledge Item (NewsML-G2 Knowledge Item):
An XML format of the IPTC for exchanging one or more controlled vocabularies. The outer wrapper is a NewsML-G2 Knowledge Item instance; inside it, a conceptSet element acts as the inner wrapper and delivers a set of concept elements. Each concept delivers the NewsCode in the qcode attribute of the conceptId element, and the concept’s name and definition in the correspondingly named elements.
Its functional sibling in NewsML 1.x is a TopicSet.
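
As an illustration of the structure described above, the following minimal Python sketch reads the concepts from a Knowledge Item. The sample document is heavily abbreviated and is not a complete, schema-valid Knowledge Item; the qcode "ex:001", the name, and the definition are made up for this example, and the function name read_concepts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by NewsML-G2 documents.
NS = {"nar": "http://iptc.org/std/nar/2006-10-01/"}

# Abbreviated sample: a real Knowledge Item carries more attributes and
# metadata. The concept below is invented for illustration only.
SAMPLE = """<knowledgeItem xmlns="http://iptc.org/std/nar/2006-10-01/">
  <conceptSet>
    <concept>
      <conceptId qcode="ex:001"/>
      <name xml:lang="en">Example concept</name>
      <definition xml:lang="en">A made-up concept used for illustration.</definition>
    </concept>
  </conceptSet>
</knowledgeItem>"""


def read_concepts(xml_text):
    """Return one dict per concept element found in the conceptSet."""
    root = ET.fromstring(xml_text)
    concepts = []
    for concept in root.findall("nar:conceptSet/nar:concept", NS):
        concept_id = concept.find("nar:conceptId", NS)
        concepts.append({
            # The code is carried by the qcode attribute of conceptId.
            "qcode": concept_id.get("qcode") if concept_id is not None else None,
            "name": concept.findtext("nar:name", default="", namespaces=NS),
            "definition": concept.findtext("nar:definition", default="", namespaces=NS),
        })
    return concepts


if __name__ == "__main__":
    for entry in read_concepts(SAMPLE):
        print(entry["qcode"], "-", entry["name"])
```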

Metadata:
Data which asserts something about some other data.

Named entity:
A named entity may be a person, place, event, organisation, product name, object name or any other news-related real-life entity.

Ontology:
See Taxonomy.

QCode:
A special IPTC format, introduced with NewsML-G2, for expressing the code of a concept. A QCode typically consists of a string, then a colon, and finally another string. Because the NewsML-G2 standard requires globally unique identifiers, which are potentially long strings, the major goals of QCodes are to shorten them and to make visible the controlled vocabulary to which the code pertains. In short, the format of a QCode is “short name for the controlled vocabulary”:“code of the concept”, e.g. subj:06011000
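
The two-part format can be sketched in Python as follows. This is a minimal illustration, not an IPTC-provided implementation: the function names are hypothetical, and the mapping from the short name "subj" to a full vocabulary URI is assumed here for illustration only. In practice such mappings are supplied alongside the document rather than hard-coded.

```python
from typing import Dict, Tuple


def split_qcode(qcode: str) -> Tuple[str, str]:
    """Split a QCode into (vocabulary short name, code) at the first colon."""
    alias, sep, code = qcode.partition(":")
    if not sep or not alias or not code:
        raise ValueError("not a QCode: %r" % qcode)
    return alias, code


# Assumed mapping from short names to full vocabulary URIs (illustrative).
SCHEME_URIS: Dict[str, str] = {
    "subj": "http://cv.iptc.org/newscodes/subjectcode/",
}


def expand_qcode(qcode: str, scheme_uris: Dict[str, str] = SCHEME_URIS) -> str:
    """Expand a QCode to a long identifier by prepending the vocabulary URI."""
    alias, code = split_qcode(qcode)
    return scheme_uris[alias] + code


if __name__ == "__main__":
    print(split_qcode("subj:06011000"))
    print(expand_qcode("subj:06011000"))
```

Splitting at the first colon only means that a code which itself contains a colon stays intact in the second part.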

Taxonomy:
In a broad sense, taxonomy is the science of classification, but the term is often taken to mean a particular classification. In the context of the NewsCodes, a taxonomy is a collection of concepts with associated codes. A taxonomy may support typed relationships between concepts. Such a taxonomy is sometimes known as an ontology or thesaurus.

Thesaurus:
See Taxonomy.

TopicSet (NewsML 1.x TopicSet):
An XML format of the IPTC for exchanging controlled vocabularies. The outer wrapper is a NewsML 1.x instance; inside it, a TopicSet acts as the inner wrapper and delivers a set of Topic elements. Each Topic delivers the NewsCode in the FormalName element, together with the name of the concept, its definition as “Explanation”, and some further administrative attributes. Its functional sibling in NewsML-G2 is a Knowledge Item.

Type (of a concept):
A concept type allows the logical grouping of all similar concepts, regardless of the vocabulary the concepts belong to. Examples of concept types might be: Person, Organisation, Language, Business Sector, News Subject or Geography. A concept type is itself a concept and, as such, is represented by a code in a controlled vocabulary.

Vocabulary:
A set of codes. A vocabulary can be either controlled (see Controlled Vocabulary) or uncontrolled, meaning that terms are added and deleted at random.