Technical corrigendum #1 to this standard is available.
1.1 Organization of the document
The structure of this document is as follows. Clauses 2-4 specify the terms, abbreviations, symbols and conventions used throughout the document. Clauses 5-11 contain the definitions of the description tools standardized by ISO/IEC 15938-3, grouped by the visual features they are associated with: basic structures and containers in Clause 5, continuing through color, texture, shape and motion to localization in Clause 10. Clause 11 contains the remaining, unclassified items.
Each description tool is described by the following subclauses:
- Syntax: Normative DDL specification of the Ds or DSs.
- Binary Syntax: Normative binary representation of the Ds or DSs.
- Semantics: Normative definition of the semantics of all the components of the corresponding D or DS.
1.2 Overview of Visual Description Tools
This part of ISO/IEC 15938 specifies tools for description of visual content, including still images, video and 3D models.
These tools are defined by their syntax, given both in DDL and as a binary representation, and by the semantics associated with the syntactic elements.
They enable description of the visual features of the visual material, such as color, texture, shape and motion, as well as localization of the described objects in the image or video sequence. An overview of the visual description tools is shown in Figure 1.
The basic structure description tools comprise five tools that support the visual descriptions defined in Clauses 6-11. They are categorized into two groups: descriptor containers and basic supporting tools. The former consists of three data types: Grid Layout, which provides efficient representations of visual features on grids; Time Series, which represents temporal arrays of several descriptions; and Multiple View, which describes a 3D object using several pictures captured from different view angles.
The latter contains two tools: Spatial2DCoordinateSystem, used to specify the 2D coordinate system, and Temporal Interpolation, which indicates the interpolation method between two samples on a time axis.
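As a non-normative illustration of what interpolating between two samples on a time axis can mean, the sketch below uses linear interpolation. The choice of linear interpolation, the function name interpolate_linear and its parameters are assumptions made for this example only; they are not taken from the normative syntax or semantics of the Temporal Interpolation tool.

```python
def interpolate_linear(t, t0, v0, t1, v1):
    """Linearly interpolate a value at time t between samples (t0, v0) and (t1, v1).

    Hypothetical helper for illustration; not part of ISO/IEC 15938-3.
    """
    if t1 == t0:
        raise ValueError("the two samples must lie at different times")
    if not (t0 <= t <= t1):
        raise ValueError("t must lie between the two sample times")
    # Fraction of the interval [t0, t1] that has elapsed at time t.
    alpha = (t - t0) / (t1 - t0)
    return v0 + alpha * (v1 - v0)


if __name__ == "__main__":
    # Value halfway between a sample of 10.0 at t=0 and a sample of 20.0 at t=1.
    print(interpolate_linear(0.5, 0.0, 10.0, 1.0, 20.0))
```

A description that records only the samples together with the chosen interpolation method lets a consumer reconstruct intermediate values in this way, rather than storing a value for every time instant.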
The remaining description tools, except for the Face Recognition descriptor, are associated with visual features and are grouped into five feature categories: Color, Texture, Shape, Motion and Localization.
The color description tools include four color descriptors that represent different aspects of color features: representative colors (Dominant Color), color distribution (Scalable Color), and spatial distribution of colors (Color Layout and Color Structure). They also include two supporting tools, Color Space and Color Quantization, which are used in Dominant Color, and an extension of Scalable Color to a group of frames or pictures (GoF/GoP Color). All the color descriptors can be extracted from arbitrarily shaped regions.
The texture description tools facilitate browsing (Texture Browsing) and similarity retrieval (Homogeneous Texture and Edge Histogram) using the texture of a still or moving image region. All the texture descriptors can be extracted from arbitrarily shaped regions.
The shape description tools include two descriptors that characterize different shape features of a 2D object or region. The Region Shape descriptor captures the distribution of all pixels within a region, and the Contour Shape descriptor characterizes the shape properties of the contour of an object. The Shape3D descriptor provides an intrinsic shape characterization of 3D mesh models.
The motion description tools include four descriptors that characterize various aspects of motion. The Camera Motion descriptor specifies a set of basic camera operations such as panning and tilting. The motion of a key point (pixel) from a moving object or region can be characterized by the Motion Trajectory descriptor. The Parametric Motion descriptor characterizes the evolution of an arbitrarily shaped region over time in terms of a 2D geometric transformation. Finally, the Motion Activity descriptor captures the pace of the motion in the sequence, as perceived by the viewer. All motion descriptors except for Camera Motion can be extracted from arbitrarily shaped regions.
The localization description tools can be used to indicate regions of interest in the spatial (Region Locator) and spatio-temporal (Spatio Temporal Locator) domains.
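The grouping described in this subclause can be summarized as a simple lookup table. The sketch below is non-normative: the dictionary VISUAL_TOOLS and the helper category_of are hypothetical names introduced for illustration, and the strings are the informal tool names used in this overview, not the identifiers of the normative DDL.

```python
# Informal tool names from the overview, grouped by category.
# Illustrative only; these are not the normative DDL identifiers.
VISUAL_TOOLS = {
    "Basic structures": [
        "Grid Layout", "Time Series", "Multiple View",
        "Spatial2DCoordinateSystem", "Temporal Interpolation",
    ],
    "Color": [
        "Dominant Color", "Scalable Color", "Color Layout", "Color Structure",
        "Color Space", "Color Quantization", "GoF/GoP Color",
    ],
    "Texture": ["Texture Browsing", "Homogeneous Texture", "Edge Histogram"],
    "Shape": ["Region Shape", "Contour Shape", "Shape3D"],
    "Motion": [
        "Camera Motion", "Motion Trajectory",
        "Parametric Motion", "Motion Activity",
    ],
    "Localization": ["Region Locator", "Spatio Temporal Locator"],
    "Other": ["Face Recognition"],
}


def category_of(tool_name):
    """Return the category a tool is listed under, or None if it is unknown."""
    for category, tools in VISUAL_TOOLS.items():
        if tool_name in tools:
            return category
    return None


if __name__ == "__main__":
    for category, tools in VISUAL_TOOLS.items():
        print(f"{category}: {', '.join(tools)}")
```

The "Other" entry reflects the statement that the Face Recognition descriptor is the one remaining tool not assigned to a feature category.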