Ben Humphreys

Computational linguistics researcher at Kyoto University, focussing on machine translation. Also learning Japanese, Korean, French and other badassery.
(Japanese version)

October 28, 2011 at 1:17pm

"Why doesn't the Unicode Standard adopt a compositional model for encoding Han ideographs?" →

Found something interesting on the Unicode FAQ:

The compositional nature of the script makes it attractive to propose a compositional encoding model, such as can be used for Hangul. Such a mechanism would result in the savings of thousands of code points and relieve the IRG from the burden of having to examine potential candidates for encoding.

Unfortunately, there are some difficulties involved with a compositional model for Han.
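
For comparison, here is a rough sketch of the Hangul model the FAQ refers to. A precomposed syllable can be computed arithmetically from its conjoining jamo, so the 11,172 modern syllables (19 × 21 × 28) never need to be considered one by one. The constants follow the Hangul syllable composition arithmetic in the Unicode Standard; the function names `compose_hangul` and `decompose_hangul` are my own, for illustration only, and the sketch does no input validation.

```python
import unicodedata

# Constants from the Unicode Hangul syllable composition arithmetic.
S_BASE = 0xAC00   # first precomposed syllable, U+AC00
L_BASE = 0x1100   # first leading consonant jamo
V_BASE = 0x1161   # first vowel jamo
T_BASE = 0x11A7   # one before the first trailing consonant jamo
V_COUNT = 21      # number of vowels
T_COUNT = 28      # number of trailing consonants, plus one for "none"


def compose_hangul(lead, vowel, trail=None):
    """Combine conjoining jamo into one precomposed syllable (illustrative helper)."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


def decompose_hangul(syllable):
    """Split a precomposed syllable back into jamo using canonical decomposition."""
    return unicodedata.normalize("NFD", syllable)


# U+1112 (hieut) + U+1161 (a) + U+11AB (nieun) form the syllable U+D55C.
syllable = compose_hangul("\u1112", "\u1161", "\u11AB")
print(f"U+{ord(syllable):04X}", syllable)
print([f"U+{ord(j):04X}" for j in decompose_hangul(syllable)])
```

Nothing comparable exists for Han: there is no small, fixed inventory of components and positions that would make this kind of arithmetic possible, which is where the difficulties mentioned in the FAQ begin.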