Preface
This is the second edition of CSA Standard CAN/CSA-Z243.4.1, Canadian Alphanumeric Ordering Standard. It supersedes the preliminary (1992) edition, Z243.4.1, known as Canadian Alphanumeric Ordering Standard for Character Sets of CSA Standard CAN/CSA-Z243.4. Standardization in this area is highly desirable because the variety of coding schemes has made it difficult to produce consistently ordered lists across different system architectures. This Standard focuses on the repertoire of graphic characters using the Latin alphabet.
CSA wishes to thank Alain LaBonté of the Secrétariat du Conseil du trésor du Québec for his valuable contribution in the preparation and technical editing of the French and English versions of this Standard.
This Standard was prepared by the Subcommittee on Coded Character Sets, under the jurisdiction of the Technical Committee on Information Technology and the CSA Steering Committee on Information Technology, and was formally approved by the Technical Committee. It has been approved as a National Standard of Canada by the Standards Council of Canada.
Scope
1.1 Applicability
This Standard defines the alphanumeric lexical sequence for the English and French languages, corresponding to Canadian cultural expectations. It is intended for general-purpose sorting of alphanumeric strings using the character repertoire of CSA Standards CAN/CSA-Z243.4 and T500, wherever human intervention is involved or sorted results are presented to users.
1.2 Ordering vs Classification
Sorting is based on the rules of word ordering rather than on telephone directory classification. However, because telephone directory sorting depends heavily on the application and would use word ordering as its lowest common denominator, telephone directory sorting could also use this Standard as a base.
1.3 Target Repertoires
The sort tables are defined for the complete repertoire of graphic characters defined in CSA Standards CAN/CSA-Z243.4 and T500. Because teletex applications must use a character composition technique to code the character repertoire of CSA Standard T500, this Standard defines complementary sort tables for that Standard. The complementary tables use precomposed characters that are equivalent to valid sequences from CSA Standard T500. These precomposed characters are taken from ISO/IEC Standard 10646-1, where available. Where CSA Standard T500 coding was used initially, it is assumed that the data has already been validated and converted into precomposed characters.
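The prerequisite conversion into precomposed characters can be illustrated by analogy. The following is a minimal sketch only: it does not model CSA Standard T500 coding, and instead assumes the input is already expressed as universal character set combining sequences, which Python's standard unicodedata module can compose (normalization form NFC). The function name precompose is hypothetical.

```python
import unicodedata


def precompose(text: str) -> str:
    """Replace base-letter + combining-mark sequences with precomposed
    characters where the universal character set provides one.

    Assumption: this uses Unicode NFC normalization as a stand-in for the
    T500-to-precomposed conversion that the Standard presupposes.
    """
    return unicodedata.normalize("NFC", text)


# "e" followed by COMBINING ACUTE ACCENT (U+0301), used twice
decomposed = "e\u0301te\u0301"
composed = precompose(decomposed)

# The composed string uses LATIN SMALL LETTER E WITH ACUTE (U+00E9)
composed_code_points = [f"U{ord(ch):04X}" for ch in composed]
```

After composition the string has three characters instead of five, so each accented letter can be looked up as a single entry in a sort table.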
1.4 Coding Independence
Because different bit combinations are used to represent the same graphic characters, this Standard makes no reference to these bit combinations. Instead, all tables refer to the names of precomposed characters, each accompanied by its equivalent bit combination in the universal character set (ISO/IEC Standard 10646-1) as an ultimate reference. This could permit extension of the scope to other standard character sets or coding schemes adopted by other international organizations or specific equipment manufacturers.
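As an illustration of why a coding-independent reference is useful, the sketch below encodes one graphic character under three codings and shows that each decodes back to the same universal character set value. The choice of codings (Python's latin-1, cp850, and utf-8 codecs) is an assumption made for the example; none of them is named by this Standard.

```python
# One graphic character: LATIN SMALL LETTER E WITH ACUTE
CHAR = "\u00e9"

# Codings chosen for illustration only (assumption, not from the Standard)
ENCODINGS = ["latin-1", "cp850", "utf-8"]

# The bit combinations differ from one coding to the next
bit_combinations = {name: CHAR.encode(name) for name in ENCODINGS}

# Decoding each with its own coding yields one and the same UCS value
ucs_values = {
    name: ord(raw.decode(name)) for name, raw in bit_combinations.items()
}

for name in ENCODINGS:
    print(f"{name:8} bytes={bit_combinations[name].hex()} "
          f"UCS=U{ucs_values[name]:04X}")
```

A sort table keyed on the universal character set value therefore applies unchanged to data in any of these codings, once the data has been decoded.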
1.5 Computerized String Sequences
This standard alphanumeric lexical sequence is the sequence that computerized, general-purpose, alphanumeric sort programs will produce. It is also the order that indexed files in computers or ordered lists in databases will follow when alphanumeric data is involved. All computerized comparisons in which the results are expected to be consistent with this Standard will be made in accordance with the order prescribed by this Standard.
1.6 Character Names
Because the names used in ISO/IEC Standard 10646-1 have changed, and because these names are not unique (different languages are used in international Standards as well as in Canadian Standards), referring to universal character set bit combinations rather than to character names as unique identifiers appears to be a more stable, and hence preferred, strategy. Each identifier has the form Uxxxx, where xxxx represents the hexadecimal value of the bit combination used for an equivalent character in ISO/IEC Standard 10646-1.
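The Uxxxx identifier form can be handled mechanically. The following minimal sketch converts between an identifier and the character it designates; the function names char_from_identifier and identifier_from_char are hypothetical, and the sketch assumes exactly four hexadecimal digits, as the form Uxxxx suggests.

```python
import re

# "U" followed by exactly four hexadecimal digits (assumed from "Uxxxx")
_IDENTIFIER = re.compile(r"U([0-9A-Fa-f]{4})")


def char_from_identifier(identifier: str) -> str:
    """Return the character designated by an identifier such as 'U00E9'."""
    match = _IDENTIFIER.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not of the form Uxxxx: {identifier!r}")
    return chr(int(match.group(1), 16))


def identifier_from_char(ch: str) -> str:
    """Return the Uxxxx identifier for a single character."""
    code_point = ord(ch)
    if code_point > 0xFFFF:
        raise ValueError("value does not fit in four hexadecimal digits")
    return f"U{code_point:04X}"
```

For example, char_from_identifier("U00E9") returns the character é, and identifier_from_char("é") returns "U00E9", whatever name a given language edition assigns to that character.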