Crate unicode_width
Determine displayed width of char and str types according to Unicode Standard Annex #11 and other portions of the Unicode standard. See the Rules for determining width section for the exact rules.
This crate is #![no_std].
use unicode_width::UnicodeWidthStr;

let teststr = "Hello, world!";
// Width outside an East Asian context.
let width = UnicodeWidthStr::width(teststr);
println!("{}", teststr);
println!("The above string is {} columns wide.", width);
// Width in an East Asian (CJK) context, where Ambiguous characters count as 2.
let width = teststr.width_cjk();
println!("The above string is {} columns wide (CJK).", width);
§Rules for determining width
This crate currently uses the following rules to determine the width of a character or string, in order of decreasing precedence. These may be tweaked in the future.
- Emoji presentation sequences have width 2.
- Outside of an East Asian context, text presentation sequences have width 1 if their base character:
  - Has the Emoji_Presentation property, and
  - Is not in the Enclosed Ideographic Supplement block.
- The sequence "\r\n" has width 1.
- Lisu tone letter combinations consisting of a character in the range '\u{A4F8}'..='\u{A4FB}' followed by a character in the range '\u{A4FC}'..='\u{A4FD}' have width 1.
- In an East Asian context only, <, =, or > have width 2 when followed by '\u{0338}' COMBINING LONG SOLIDUS OVERLAY.
- '\u{115F}' HANGUL CHOSEONG FILLER has width 2.
- The following have width 0:
  - Characters with the Default_Ignorable_Code_Point property.
  - Characters with the Grapheme_Extend property.
  - The following 8 characters, all of which have NFD decompositions consisting of two Grapheme_Extend characters:
    - '\u{0CC0}' KANNADA VOWEL SIGN II,
    - '\u{0CC7}' KANNADA VOWEL SIGN EE,
    - '\u{0CC8}' KANNADA VOWEL SIGN AI,
    - '\u{0CCA}' KANNADA VOWEL SIGN O,
    - '\u{0CCB}' KANNADA VOWEL SIGN OO,
    - '\u{1B3B}' BALINESE VOWEL SIGN RA REPA TEDUNG,
    - '\u{1B3D}' BALINESE VOWEL SIGN LA LENGA TEDUNG, and
    - '\u{1B43}' BALINESE VOWEL SIGN PEPET TEDUNG.
  - Characters with a Hangul_Syllable_Type of Vowel_Jamo (V) or Trailing_Jamo (T).
  - The following Prepended_Concatenation_Marks: '\u{A8FA}' DEVANAGARI CARET.
- Characters with an East_Asian_Width of Fullwidth or Wide have width 2.
- Characters fulfilling all of the following conditions have width 2 in an East Asian context, and width 1 otherwise:
  - Has an East_Asian_Width of Ambiguous, or has a canonical decomposition to an Ambiguous character followed by '\u{0338}' COMBINING LONG SOLIDUS OVERLAY, or is '\u{0387}' GREEK ANO TELEIA, and
  - Does not have a General_Category of Modifier_Symbol, and
  - Does not have a Script of Latin, Greek, or Cyrillic, or is a Roman numeral in the range '\u{2160}'..='\u{217F}'.
- All other characters have width 1.
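The precedence order above can be pictured as a match whose arms are tried from top to bottom. The following is a toy sketch only, not the crate's implementation: sketch_char_width and sketch_str_width are hypothetical names, and the hard-coded ranges are a handful of sample characters for each rule rather than the full Unicode property tables. Emoji, Lisu, and solidus-overlay sequences are left out.

```rust
/// Toy width of one character. `cjk` selects the East Asian context.
/// Arms are ordered by decreasing precedence, mirroring the rule list.
fn sketch_char_width(c: char, cjk: bool) -> usize {
    match c {
        // HANGUL CHOSEONG FILLER has width 2.
        '\u{115F}' => 2,
        // Width 0 samples: combining diacritics (Grapheme_Extend),
        // ZERO WIDTH JOINER, and KANNADA VOWEL SIGN II.
        '\u{0300}'..='\u{036F}' | '\u{200D}' | '\u{0CC0}' => 0,
        // Width 0: Hangul vowel and trailing jamo in the Hangul Jamo block.
        '\u{1160}'..='\u{11FF}' => 0,
        // Width 2 samples: CJK Unified Ideographs, Hangul syllables,
        // and fullwidth ASCII variants.
        '\u{4E00}'..='\u{9FFF}' | '\u{AC00}'..='\u{D7A3}' | '\u{FF01}'..='\u{FF60}' => 2,
        // Ambiguous samples (INVERTED EXCLAMATION MARK, SECTION SIGN):
        // width 2 in an East Asian context, 1 otherwise.
        '\u{00A1}' | '\u{00A7}' => {
            if cjk {
                2
            } else {
                1
            }
        }
        // All other characters have width 1.
        _ => 1,
    }
}

/// Toy width of a string: "\r\n" counts as one column, everything else
/// is the sum of the per-character widths.
fn sketch_str_width(s: &str, cjk: bool) -> usize {
    let mut total = 0;
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\r' && chars.peek() == Some(&'\n') {
            chars.next(); // consume the '\n' of the pair
            total += 1;
            continue;
        }
        total += sketch_char_width(c, cjk);
    }
    total
}
```

Under these assumptions, sketch_str_width("a\r\nb", false) is 3, and a string of three CJK ideographs is 6 in either context. The sequence rule sits in the string function because it needs to look at two characters at once, which a per-character lookup cannot do.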
§Canonical equivalence
Canonically equivalent strings are assigned the same width (CJK and non-CJK).
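One way to see how the zero-width rule for vowel and trailing jamo supports this property is to compare a precomposed Hangul syllable with its canonical decomposition. The sketch below is illustrative and does not call the crate: toy_width, toy_str_width, and decompose_hangul are hypothetical helpers, the width table covers Hangul only, and the decomposition uses the arithmetic Hangul algorithm from the Unicode Standard.

```rust
/// Toy width covering Hangul only: leading jamo and precomposed syllables
/// are Wide (2), vowel and trailing jamo are 0, anything else is 1.
fn toy_width(c: char) -> usize {
    match c {
        '\u{1100}'..='\u{115F}' | '\u{AC00}'..='\u{D7A3}' => 2,
        '\u{1160}'..='\u{11FF}' => 0,
        _ => 1,
    }
}

/// Sum of the toy per-character widths.
fn toy_str_width(s: &str) -> usize {
    s.chars().map(toy_width).sum()
}

/// Canonical decomposition of a precomposed Hangul syllable into jamo.
/// Returns None for any character outside U+AC00..=U+D7A3.
fn decompose_hangul(c: char) -> Option<String> {
    const S_BASE: u32 = 0xAC00;
    const L_BASE: u32 = 0x1100;
    const V_BASE: u32 = 0x1161;
    const T_BASE: u32 = 0x11A7;
    const V_COUNT: u32 = 21;
    const T_COUNT: u32 = 28;
    const S_COUNT: u32 = 11172;

    let s_index = (c as u32).checked_sub(S_BASE)?;
    if s_index >= S_COUNT {
        return None;
    }
    let l = L_BASE + s_index / (V_COUNT * T_COUNT);
    let v = V_BASE + (s_index % (V_COUNT * T_COUNT)) / T_COUNT;
    let t = T_BASE + s_index % T_COUNT;

    let mut out = String::new();
    out.push(char::from_u32(l)?);
    out.push(char::from_u32(v)?);
    // A trailing index of zero means the syllable has no final consonant.
    if t != T_BASE {
        out.push(char::from_u32(t)?);
    }
    Some(out)
}
```

For '\u{D55C}' the decomposition is "\u{1112}\u{1161}\u{11AB}". The precomposed form has toy width 2, and the decomposed form has 2 + 0 + 0 = 2, so both spellings occupy the same number of columns.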
Constants§
- The version of Unicode that this version of unicode-width is based on.
Traits§
- Methods for determining displayed width of Unicode characters.
- Methods for determining displayed width of Unicode strings.
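The example at the top of this page calls the string trait in two ways: as a path, UnicodeWidthStr::width(teststr), and as a method, teststr.width_cjk(). Both work because the methods live on a trait implemented for str, and the trait must be in scope for the method form. The sketch below shows the same pattern with a hypothetical stand-in trait, SketchWidth, whose placeholder rule counts one column per char. It is not part of unicode-width and does not reproduce its width rules.

```rust
/// Hypothetical stand-in for a width trait; not part of unicode-width.
trait SketchWidth {
    fn sketch_width(&self) -> usize;
}

impl SketchWidth for str {
    /// Placeholder rule: one column per char.
    fn sketch_width(&self) -> usize {
        self.chars().count()
    }
}

/// Calls the same trait method through a path and through method syntax.
fn both_call_styles(s: &str) -> (usize, usize) {
    let via_path = SketchWidth::sketch_width(s);
    let via_method = s.sketch_width();
    (via_path, via_method)
}
```

With this placeholder rule, both_call_styles("Hello, world!") returns (13, 13). The two spellings always agree because they resolve to the same implementation.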