- ### JavaScript port of Markus Kuhn's wcwidth() implementation
-
- The following explanation comes from the original C implementation:
-
- This is an implementation of wcwidth() and wcswidth() (defined in
- IEEE Std 1003.1-2001) for Unicode.
-
- http://www.opengroup.org/onlinepubs/007904975/functions/wcwidth.html
- http://www.opengroup.org/onlinepubs/007904975/functions/wcswidth.html
-
- In fixed-width output devices, Latin characters all occupy a single
- "cell" position of equal width, whereas ideographic CJK characters
- occupy two such cells. Interoperability between terminal-line
- applications and (teletype-style) character terminals using the
- UTF-8 encoding requires agreement on which character should advance
- the cursor by how many cell positions. No established formal
- standards exist at present on which Unicode character shall occupy
- how many cell positions on character terminals. These routines are
- a first attempt at defining such behavior based on simple rules
- applied to data provided by the Unicode Consortium.
-
- For some graphical characters, the Unicode standard explicitly
- defines a character-cell width via the definition of the East Asian
- Fullwidth (F), Wide (W), Halfwidth (H), and Narrow (Na) classes.
- In all these cases, there is no ambiguity about which width a
- terminal shall use. For characters in the East Asian Ambiguous (A)
- class, the width choice depends purely on a preference of backward
- compatibility with either historic CJK or Western practice.
- Choosing single-width for these characters is easy to justify as
- the appropriate long-term solution, as the CJK practice of
- displaying these characters as double-width comes from historic
- implementation simplicity (8-bit encoded characters were displayed
- single-width and 16-bit ones double-width, even for Greek,
- Cyrillic, etc.) and not any typographic considerations.
-
- Much less clear is the choice of width for the Not East Asian
- (Neutral) class. Existing practice does not dictate a width for any
- of these characters. It would nevertheless make sense
- typographically to allocate two character cells to characters such
- as for instance EM SPACE or VOLUME INTEGRAL, which cannot be
- represented adequately with a single-width glyph. The following
- routines at present merely assign a single-cell width to all
- neutral characters, in the interest of simplicity. This is not
- entirely satisfactory and should be reconsidered before
- establishing a formal standard in this area. At the moment, the
- question of which Not East Asian (Neutral) characters should be
- represented by double-width glyphs cannot yet be answered by
- applying a simple rule from the Unicode database content. Setting
- up a proper standard for the behavior of UTF-8 character terminals
- will require a careful analysis not only of each Unicode character,
- but also of each presentation form, something the author of these
- routines has so far avoided doing.
-
- http://www.unicode.org/unicode/reports/tr11/
-
- Markus Kuhn -- 2007-05-26 (Unicode 5.0)
-
- Permission to use, copy, modify, and distribute this software
- for any purpose and without fee is hereby granted. The author
- disclaims all warranties with regard to this software.
-
- Latest version: http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c
-
-
-
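The rules described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the code of this port: the names `sketchWcwidth`, `sketchWcswidth`, `COMBINING_SAMPLE`, and `WIDE_RANGES` are hypothetical, the combining-character table is a two-entry sample of the much longer table in the C original, and the wide ranges reflect the Unicode 5.0-era data the explanation refers to.

```javascript
// Assumption: a tiny sample of zero-width combining ranges. The original
// C implementation uses a far longer table generated from Unicode data.
const COMBINING_SAMPLE = [
  [0x0300, 0x036f], // Combining Diacritical Marks
  [0x200b, 0x200f], // zero-width space, joiners, directional marks
];

// East Asian Wide (W) and Fullwidth (F) ranges, Unicode 5.0 era.
const WIDE_RANGES = [
  [0x1100, 0x115f],   // Hangul Jamo initial consonants
  [0x2e80, 0xa4cf],   // CJK Radicals .. Yi (U+303F is excluded below)
  [0xac00, 0xd7a3],   // Hangul Syllables
  [0xf900, 0xfaff],   // CJK Compatibility Ideographs
  [0xfe30, 0xfe6f],   // CJK Compatibility Forms
  [0xff00, 0xff60],   // Fullwidth Forms
  [0xffe0, 0xffe6],   // Fullwidth signs
  [0x20000, 0x2fffd], // Supplementary Ideographic Plane
  [0x30000, 0x3fffd], // Tertiary Ideographic Plane
];

// True if the code point falls inside any [low, high] pair.
function inRanges(cp, ranges) {
  return ranges.some(([low, high]) => cp >= low && cp <= high);
}

// Width in cells of a single code point: -1 for control characters,
// 0 for NUL and combining characters, 2 for wide, 1 for everything else
// (including the Ambiguous and Neutral classes).
function sketchWcwidth(cp) {
  if (cp === 0) return 0;
  if (cp < 0x20 || (cp >= 0x7f && cp < 0xa0)) return -1;
  if (inRanges(cp, COMBINING_SAMPLE)) return 0;
  if (cp !== 0x303f && inRanges(cp, WIDE_RANGES)) return 2;
  return 1;
}

// Width of a whole string, or -1 if it contains a control character.
// for...of iterates by code point, so surrogate pairs are handled.
function sketchWcswidth(str) {
  let total = 0;
  for (const ch of str) {
    const w = sketchWcwidth(ch.codePointAt(0));
    if (w < 0) return -1;
    total += w;
  }
  return total;
}
```

With these definitions, `sketchWcswidth("a中")` evaluates to 3 (one cell for the Latin letter, two for the ideograph), and `sketchWcwidth(0x2003)` (EM SPACE) evaluates to 1, which is the single-cell treatment of Neutral characters that the explanation calls not entirely satisfactory.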