Radified Community Forums
http://radified.com/cgi-bin/yabb2/YaBB.pl
UTF-16 character encoding http://radified.com/cgi-bin/yabb2/YaBB.pl?num=1315676938 Message started by Rad on Sep 10th, 2011 at 12:48pm |
Title: UTF-16 character encoding Post by Rad on Sep 10th, 2011 at 12:48pm
Trying to get a handle on UTF-16 character encoding, which is used by JavaScript (ECMAScript).
http://en.wikipedia.org/wiki/UTF-16/UCS-2 From the 'Definitive Guide': Quote:
I know about 'surrogate pairs' and how they work. That's not the problem. From the Wiki link above: Quote:
and Quote:
My question is > where are the 16 bits? (Or maybe > ARE THERE 16 bits?)
When I think of a 'bit', I think of "0 or 1" (zero or one): http://en.wikipedia.org/wiki/Bit Right? So, when I think of 16 bits, I think of something like (for example) > 0110100111001001 (.. should be 16 characters there). But maybe my thinking is incorrect.
For example, these UTF-16 codepoints have 4 hexadecimal characters. So .. 2 to the 4th power = 16 (2x2x2x2). Is *that* what they mean by "16-bits"?
It would seem that way, but the book says this (regarding an explanation of surrogate pairs): uh, I can't find the character, but the text references the natural number 'e'. (It's sort of a forward-leaning e.) Anyway, it says: Quote:
I realize this is a codepoint and not the UTF-16 encoding for this particular codepoint. But the two seem to parallel each other in other ways. Now, if my 2-to-the-4th-power theory were correct, the above (I believe) would be 32-bit .. no? (i.e. 2 to the 5th power).
By way of comparison, a regular 16-bit codepoint (for the letter/number pi) is > 0x03c0 (one less digit/character than the one given for natural e above).
Anyway, I don't need to know *everything* about UTF-16 character encoding, but I would like to know > where are the 16 bits they are talking about in the values for UTF-16? |
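For reference, the bit counting in the question can be checked with a short JavaScript sketch. Each hexadecimal digit stands for 4 bits, so a 4-digit value such as 0x03c0 occupies exactly 16 binary digits. The helper names `hexToBits` and `significantBits` are made up for illustration, and the value 0x1d452 is an assumption: it is the code point of the mathematical italic small e (U+1D452), which looks like the character the book describes.

```javascript
// Expand a hexadecimal string into binary, four bits per hex digit.
function hexToBits(hex) {
  return [...hex]
    .map((digit) => parseInt(digit, 16).toString(2).padStart(4, "0"))
    .join(" ");
}

// Count the bits a value needs once leading zeros are dropped.
function significantBits(value) {
  return value.toString(2).length;
}

// pi, 0x03c0: 4 hex digits x 4 bits = 16 bits
console.log(hexToBits("03c0"));        // 0000 0011 1100 0000

// Assumed code point for the italic e: 5 hex digits, too wide for 16 bits
console.log(hexToBits("1d452"));       // 0001 1101 0100 0101 0010
console.log(significantBits(0x1d452)); // 17
console.log(significantBits(0xffff));  // 16, the largest value one 16-bit unit can hold
```

So "16-bit" counts binary digits, not hex digits: four hex digits are four groups of 4 bits. A fifth hex digit pushes the value past 16 bits, which is where surrogate pairs come in.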
Title: Re: UTF-16 character encoding Post by MudCrab on Sep 12th, 2011 at 8:59pm
I'm not an expert on bits and encoding, but I think the 16 bits is just the base value -- it uses 16 bits in the BMP. The surrogates use a total of 32 bits.
2 bytes (16 bits) = 1 BMP code point
4 bytes (32 bits) = 1 non-BMP code point
The 17-bit code point is possibly referring to 17 bits being required for the value, not the storage.
Paul |
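The value-versus-storage distinction above can be sketched in JavaScript. This is a minimal illustration of the standard UTF-16 surrogate-pair arithmetic, not code from the book; `toUtf16Units` is a hypothetical helper name, and 0x1d452 (mathematical italic small e) is assumed to be the 17-bit code point under discussion. Input validation, such as rejecting values inside the surrogate range, is left out.

```javascript
// Encode one Unicode code point as an array of 16-bit UTF-16 code units.
function toUtf16Units(codePoint) {
  if (codePoint <= 0xffff) {
    return [codePoint];                   // BMP: stored in one 16-bit unit
  }
  const offset = codePoint - 0x10000;     // leaves a value of at most 20 bits
  const high = 0xd800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits -> low surrogate
  return [high, low];                     // non-BMP: two 16-bit units (32 bits)
}

const hex = (units) => units.map((u) => u.toString(16));

console.log(hex(toUtf16Units(0x03c0)));   // [ '3c0' ]
console.log(hex(toUtf16Units(0x1d452)));  // [ 'd835', 'dc52' ]

// JavaScript strings count 16-bit units, so the same split shows up in length.
console.log(String.fromCodePoint(0x03c0).length);   // 1
console.log(String.fromCodePoint(0x1d452).length);  // 2
```

The code point needs 17 bits as a number, but UTF-16 never stores it in 17 bits: it is either one 16-bit unit or a pair of them.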
Title: Re: UTF-16 character encoding Post by Rad on Sep 15th, 2011 at 1:16pm
yeah, i got that much. Here's some further info:
Quote:
|
Title: Re: UTF-16 character encoding Post by Rad on Sep 15th, 2011 at 1:19pm
cont'd from above:
Quote:
|