At our September 2003 meeting, Bruce Haddon discussed the ins
and outs of Unicode, the mother-of-all character set encodings.
We all know about scripts that run left-to-right, and
some that run right-to-left or top-to-bottom, but how about
those such as Arabic, whose letters are cursive and must join
one to the next as if the pen never leaves
the paper? Or how about sequences of left-to-right characters
(like numerals) that are embedded in right-to-left text?
Unicode covers it all, and it turns out that Unicode sits
at a very interesting intersection of computer science,
phonetics, sociology, and archaeology. For example, if you
were going to store text in ancient Egyptian hieroglyphs
on a computer, how would you do it? Unicode, of course.
Bruce's talk touched on this, and on many of the subtleties
of encoding all known languages in a single, common
character set description.
Bruce wrote a comprehensive book review
(HTML, PDF)
of The Unicode Standard, Version 4.0 (Addison-Wesley) that
is almost a summary of his talk; his presentation slides are
also available (HTML, PDF 10MB).
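
To make the question about left-to-right numerals embedded in right-to-left text a little more concrete, here is a minimal sketch (not taken from Bruce's talk) that uses Python's standard unicodedata module to look up the bidirectional class Unicode assigns to each character. The helper name bidi_classes and the sample string are made up for illustration.

```python
import unicodedata

def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class.

    Hypothetical helper for illustration only. Common classes include
    'L' (left-to-right), 'R' (right-to-left), 'AL' (Arabic letter),
    'EN' (European number) and 'WS' (whitespace).
    """
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# Hebrew letters (right-to-left) followed by ordinary digits.
sample = "\u05e9\u05dc\u05d5\u05dd 123"

for ch, cls in bidi_classes(sample):
    print(f"U+{ord(ch):04X} {cls}")
```

The Hebrew letters report class R while the digits report EN, which is the kind of per-character property a renderer can use to decide that the numerals still read left-to-right inside right-to-left text.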