Chinese Buddhist Texts: Character Creation Discussion FAQ
#1
Post Gateway
Origin: received by the Lion's Roar BBS (cctwin.ee.ntu.edu.tw, board: BudaTech)
~---------- Forwarded message ----------
Date: Sat, 2 Sep 1995 13:29:57 +0800 (CST)
From: David Chiou <b83050@cctwin.ee.ntu.edu.tw>
Subject: Chinese Characters FAQ (about Buddhism)

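The following are the more important of the documents on Chinese internal
codes that I have recently collected from various sources. The choice of
internal code and the problem of user-created characters are currently the
bottleneck in the input of Buddhist texts, so I offer this information for
everyone's reference.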

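ps. If anyone on this mail alias works at a monastery, or is keenly
    interested in news about Chinese input (internal codes, character
    creation, and so on), please reply and let me know, so that I can add
    you to the mail alias for Buddhist organizations engaged in text input.
    Some technical questions about Chinese internal codes will not be
    posted to the present mail alias.
    (At present it is only carbon-copied to the Center for Buddhist Studies
      at National Taiwan University, to a few venerables such as Ven. Ziyan
      of Xiangguang Temple and Ven. Guoguang of Nongchan Temple, and to the
      accounts of a few especially enthusiastic members.)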


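What follows is the key FAQ on Chinese internal codes as they relate to
Buddhist texts:
(Last time I forwarded several dozen related messages to the Buddhist
  organizations mentioned above for their reference. Anyone who is
  particularly interested may ask me for more detailed documents, or
  simply join the list of Buddhist organizations.)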


=========================================================================
Date: Sat, 13 May 1995 10:07:34 +0800
From: Shann Wei-Chang <shann@math.ncu.edu.tw>
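About the author: Professor Shann Wei-Chang of the Department of Mathematics,
          National Central University, is keenly interested in classical
          Chinese studies, knows UNIX systems very well, and has taken part
          in online discussions of internal codes for many years.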
Subject: internal code
 
Dagang,
 
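I have just read your report on the rare-character problems met in
inputting Buddhist texts. As you know, I have been on the CCNET and
CHPOEM mailing lists for a long time, and we often discuss problems of
this kind. There is in fact no settled consensus on a solution, and my
own views have changed over time (whether they have grown more mature
only time will tell).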
 
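Let me tell you what I currently think, for your reference. First, I do
not like Big-5 or the crowd that designed it; it is a textbook case of
bad money driving out good. But as I have come to recognize and
compromise with the facts (which probably has to do with age), I have
begun to admit that any Chinese electronic file meant to circulate
widely must be compatible with Big-5: directly compatible, with no code
conversion or special handling.
 
Outputting special characters is easier than inputting them (input for
search, for instance). But an electronic document normally carries only
character codes, not the glyphs (the bitmap binary file or other
formats). If the document circulates on floppy disk or CD-ROM the
problem is smaller, but we always hope that the same document can be
put on the network with very few changes. At that point the document
and the reader program are two separate matters. This is where the most
effort is needed.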
 
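My current idea is to use Big-5 as the base code and, where a rare
character occurs, to set it off with an escape sequence, as is done in
the HZ code popular with students overseas, in the Japanese JIS
standard, and in the EUC supported by most UNIX workstations. If the
reader program at the user's end cannot recognize the escape sequence,
or has no corresponding glyph, the reader may see a string of garbage
characters, but such characters should usually be few and should not
affect the content of the article as a whole. Which strings should serve
as escape sequences? Our national CNS code is already registered
internationally, and we should follow that standard as far as we can.
Where we cannot, we should use the power of mass communication on the
network, together with political lobbying, to get the escape sequences
we choose made standard. As for how rare characters should be coded, we
should likewise first consult the standard interchange code published
by the National Bureau of Standards in 1992. The layout of this code
conforms to international standards. It currently has seven planes,
with plenty of room for expansion, and each plane holds 94*94 code
positions as the international standard prescribes (two bytes, each byte
is between 33 and 126, decimal inclusive). The characters chosen for
planes 1 and 2 are essentially the same as in Big-5, with a few
(perhaps all) of its errors corrected. Planes 3 to 7 define the codes
and glyphs of more than thirty thousand rare characters, alternative
forms, variant forms, and some odd characters that appear only in
fortune-tellers' name lore. Planes 8 to 16 are empty, and plane 12 is
user defined.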
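 
As a rough illustration of the 94*94 plane layout mentioned above, the
following sketch maps a two-byte code (each byte between 33 and 126) to a
position within its plane and back. It is not taken from the CNS standard
itself; the function names are made up for the example.

```python
# Sketch only: models the 94 x 94 plane layout described above.
# The function names are illustrative, not part of any CNS tooling.

FIRST, LAST = 33, 126          # each byte lies in 33..126 inclusive
ROW_SIZE = LAST - FIRST + 1    # 94 possible values per byte
PLANE_SIZE = ROW_SIZE * ROW_SIZE


def cell_index(b1: int, b2: int) -> int:
    """Return the 0-based position of a two-byte code within its plane."""
    for b in (b1, b2):
        if not FIRST <= b <= LAST:
            raise ValueError(f"byte {b} is outside {FIRST}..{LAST}")
    return (b1 - FIRST) * ROW_SIZE + (b2 - FIRST)


def code_bytes(index: int) -> tuple[int, int]:
    """Inverse of cell_index: recover the two bytes from a plane position."""
    if not 0 <= index < PLANE_SIZE:
        raise ValueError("index outside the plane")
    row, col = divmod(index, ROW_SIZE)
    return FIRST + row, FIRST + col


print(PLANE_SIZE)        # 8836 code positions per plane
print(7 * PLANE_SIZE)    # 61852 positions across the seven defined planes
```

Seven planes of 8836 positions each leave ample room for the 48027
characters that the 1992 edition is said to define.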
 
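I do not know enough to judge whether the characters in planes 3 to 7
are complete or properly ordered, because I do not recognize any of
them. If the sutras contain characters that cannot be found there, I
suggest not using plane 12, but instead using the political weight of
Buddhist organizations to claim a plane, say plane 13, as a plane for
rare religious characters. Anything called user defined is bound to end
up as a useless mess.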
 
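As for inputting rare characters, corresponding Chinese input software
and fonts obviously have to be developed. On X Window there is already
an established way of doing this, and there should be no technical
difficulty on other systems either.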
 
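I do not know what our government has been doing. For all Taiwan's
self-styled standing as a computer kingdom, our national interchange
code was not published until 1986, and it was so poorly communicated
that nobody in the market paid it any attention (ignoring the
government seems to be a trait shared by modern Chinese on both sides
of the strait). Even now, I think, many people in the field have never
heard of the standard, or have heard of it but never considered using
it. The Institute for Information Industry and some public agencies,
however, have begun to use it (perhaps under compulsion), and some
foreign companies have begun to support it, since it is after all an
internationally registered national standard code.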
 
I wrote this in a hurry and there are some wrong characters, but they
are not easy to fix in this editor; please forgive me.
 
-Shann
 
========================================================================
Date: Mon, 28 Aug 1995 22:57:15 +0800 (CST)
From: David Chiou <b83050@cctwin.ee.ntu.edu.tw>
Subject: Recommend Chinese Code -- CNS
 
 
 
The text below is an introduction to the various internal codes, taken
from the Hanazono University Zen studies WWW site:
http://www.iijnet.or.jp/iriz/irizhtml/irizhome.htm
 
(For some of the important passages I add my own translation as I go,
  but I cannot guarantee that it is free of errors. The original text
  prevails in all cases.)
 
     _________________________________________________________________
   
Chinese character codes: an update
Exploring Chinese internal codes: a revised edition
 
    by Christian Wittern
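    About the author: a senior staff member of the Zen studies center at
              Hanazono University, Kyoto, Japan (the publisher of the
              journal "Electronic Bodhidharma"). The center began its
              worldwide liaison work on the digitization of Buddhist
              texts before 1992, and it may be called the largest such
              liaison network in the world today.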
   
     _________________________________________________________________
   
    Summary
    
   This article presents an update to Christian Wittern's and Urs App's
   articles concerning Chinese character codes (Electronic Bodhidharma
   No. 3). In those articles, Urs App argued that database creators must
   make the most crucial distinction between master data and user data.
   Master data should be of the highest quality, recording even minute
   detail like studio recording equipment. User data, on the other hand,
   must conform to what codes and equipment we presently have. Christian
   Wittern's article compared different codes and concluded that CCCII, a
   very large Taiwanese code that also includes Japanese and Korean
   letters, seems to be the best choice for the master data set of
   Chinese text databases.
 
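   Summary
   
   This article improves on the reviews of Chinese internal codes by
   Christian Wittern and Urs App (published in issue 3 of the journal
   "Electronic Bodhidharma"). In that issue, Urs App states that database
   builders must make a very, very important decision about master data
   and user data. Master data must be of the highest quality, like video
   equipment that records every minute of footage; user data, on the
   other hand, must go along with whichever internal codes we currently
   have.
 
   Christian Wittern's article compares several different internal codes
   and concludes that "CCCII (a very large Taiwanese internal code that
   also includes Japanese and Korean characters) seems to be the best
   choice for the master data of Chinese internal codes."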
 
 
   We shelled out US $ 2000 for a CCCII board, only to discover that both
   the code itself and its implementation are seriously flawed. We thus
   had to continue using Big-5 for all practical purposes while looking
   for better solutions. Finally, Christian decided that the only
   practical approach at this time was to build on Big-5 (and other
   national codes such as JIS) and extend them through code references
   that are both stable and portable. His ingenious approach forms the
   basis of the IRIZ KanjiBase and its encoding scheme -- a scheme which
   will be as useful after the introduction of Unicode as it proves to be
   right now. (U.A.)
 
   We spent US$2000 on a CCCII board, only to find that both the code
   itself and its accompanying equipment are seriously flawed. In
   practice, therefore, we had to go on using the Big-5 internal code
   while continuing to look for a better solution. In the end, Christian
   decided that the only practical approach for now is to build on Big-5
   (and the JIS code prevalent in Japan) and to extend them by means of
   "code references" that are both stable and portable. This clever
   proposal gave rise to the foundation of the IRIZ KanjiBase and to its
   "transcoder", one that will be as useful after Unicode is introduced
   as we are finding it to be now.
 
     _________________________________________________________________
   
     * Some kanji codes for computers
         1. Japanese JIS Codes
         2. Taiwanese Big5
         3. Taiwanese CNS
         4. CCCII and EACC
         5. Unicode
 
     * Some kanji internal codes on computers:
         1. Japanese JIS internal code
         2. Taiwanese Big-5 internal code
         3. CNS internal code of Taiwan's National Bureau of Standards
         4. CCCII internal code and the EACC program
         5. Unicode
 
     * More information is available at ifcss.org in Ross Patterson's
       document CJK Codes and in Ken Lunde: Understanding Japanese
       Information Processing p35ff.
 
     * More useful information is available at ifcss.org(.jp): Ross
       Patterson's article "CJK Codes", and Ken Lunde's "Understanding
       Japanese Information Processing", p35ff.
 
     _________________________________________________________________
   
Development of kanji codes for computers
The development of kanji internal codes for computers
 
  Japanese JIS Codes
Japanese JIS code
   
   The first character code designed to make the processing of
   ideographic characters on computers possible was the JIS C 6226-1978.
   It was developed according to the guidelines laid down in the ISO
   standard 2022-1973 and became the model for most other code standards
   used today in East Asia (the most notable exception is Big5). Covering
   approximately 6500 characters, this standard has been revised twice,
   in 1983 and 1990, when the assignment of some characters was changed
   and a few characters were added. Revising a standard is about the
   worst thing a standards body can do, and it has caused much grief and
   headache among manufacturers and users alike. Today we finally have
   fonts that bear the year of the standard they cover in their name, so
   that users can know which version is encoded in that font and select
   it accordingly. Our texts and tools are based on the latest version.
   
   The 1990 version has become known as JIS X 0208-1990 and, together
   with an additional set of 5800 characters (JIS X 0212), has been the
   basis of the Japanese contribution to Unicode.
   
   The JIS code is almost never used in computers as it was defined;
   rather, some changes are made in the way the code numbers are
   represented. This is necessary to allow JIS to be mixed with ASCII
   characters and, as in the case of Shift-JIS (or MS-Kanji, the most
   popular encoding on personal computers), with earlier Japanese
   encodings of half-width kana. East Asian text is thus most frequently
   based on a multibyte encoding, a character stream that contains a
   mixture of characters represented by one single byte and of characters
   represented by two bytes.
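   
   To make the idea of a mixed one-byte and two-byte stream concrete, here
   is a minimal sketch that walks a Shift-JIS byte stream and splits it
   into character units. The lead-byte ranges are the commonly documented
   Shift-JIS ones and ignore vendor extensions; the function names are
   invented for this example.

```python
# Sketch: split a Shift-JIS byte stream into one-byte and two-byte units.
# Vendor-specific extensions are ignored; segment() is a made-up name.

def is_lead_byte(b: int) -> bool:
    """True if b starts a two-byte Shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def segment(data: bytes) -> list[bytes]:
    """Split data into the byte units that each encode one character."""
    units = []
    i = 0
    while i < len(data):
        width = 2 if is_lead_byte(data[i]) else 1
        units.append(data[i:i + width])
        i += width
    return units


text = "abc漢字ｱ"   # three ASCII letters, two kanji, one half-width kana
stream = text.encode("shift_jis")
print([len(u) for u in segment(stream)])   # [1, 1, 1, 2, 2, 1]
```

   A reader that does not know which bytes are lead bytes cannot find the
   character boundaries, which is one reason conversions between vendor
   variants are so error-prone.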
   
   In addition to the characters in the national standard, many Japanese
   vendors have added their own private characters to JIS, making the
   conversion between these different encodings difficult beyond belief.
   
  Big5
(Chinese Big-5 code)
   
   
   There are different legends about the beginnings of Big5; some say
   that the code had been developed for an integrated application with 5
   parts, and others say it was an agreement of five big vendors in the
   computer industry. No matter which one is true (and it might as well
   be something else), the Taiwanese government did not recognize the
   need for a practical encoding of Chinese characters in time.
   Government agencies had apparently been involved also in the
   development of Big5, but it was only in 1986 that an official code was
   announced, a time by which Big5 was already a de facto standard with
   numerous applications in daily use.
 
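   There are many versions of the story of how the Big-5 internal code
   began: some say it was produced by an application package that
   integrated five parts, others that it was agreed jointly by five
   large computer vendors. Whichever story is true, the Taiwanese
   government did not grasp the importance of, and need for, a Chinese
   internal code in time. Although government agencies evidently also
   took part in developing Big-5, the official internal code was not
   formally announced until 1986, by which time Big-5 was already the
   standard adopted by a great many everyday applications.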
 
 
   Big5 defines 13051 Chinese characters, arranged in two parts according
   to their frequency of usage. The arrangement within these parts is by
   number of strokes, then Kangxi radical. As Big5 was apparently
   developed in a great hurry, some mistakes were made in the stroke
   count (and thus placement) of characters, and two characters are
   represented twice. On the other hand, some frequently used characters
   were left out and were later implemented by individual companies.
 
   All implementations agree on the core part of Big5, but different
   extensions by individual vendors acquired much weight, most notably in
   the case of the ETEN Chinese system that was very popular in the late
   eighties and early nineties. As there is no document that defines Big5
   apart from the documentation provided by the vendors with their
   products, it is impossible to single out one standard Big5. This was
   actually a big problem in the process of designing Unicode -- and it
   remains one even today.
 
   (This passage describes the major problem that Big5 cannot be unified
     into a single standard, which remains the case today and will also
     cause trouble when Unicode is drawn up.)
 
   
 
  CNS X-11643-1986 and CNS X-11643-1992
(National Bureau of Standards CNS X-11643-1986 and CNS X-11643-1992)
   
   This is the Chinese National Code for Taiwan. In the form published in
   1992, it defines the glyph-shape, stroke count and radical heading for
   48027 characters. For all these characters a reference font in a 40 by
   40 grid ( and for most of them also in 24 by 24 grid ) is available
   from the issuing body. These characters are assigned to 7 levels with
   the more frequent at the lower levels and the variant forms at the two
   top levels. The whole architecture reserves space for five more
   standard levels, and four levels are reserved for non-standard, private
   encoding, bringing the total to 16 levels, with a hypothetical space
   for roughly 120 000 ideographs. On top of the currently defined ones,
   one more level with about 7000 characters is currently under revision
   and expected to be published in the course of 1995. This will bring
   the total number of assigned characters to roughly 55000.
 
   This is Taiwan's national standard code. In the form published in
   1992 it defines the glyph-shape, stroke count, and radical heading of
   48027 Chinese characters. For all of these characters a corresponding
   font on a 40 x 40 grid (and for most of them a 24 x 24 font as well)
   accompanies the published standard.
 
   These characters are assigned to seven planes, with the most
   frequently used characters in the lower planes and the variant forms
   in the top two. The design of the national standard code reserves
   five more standard planes and four non-standard, private-use planes,
   for a total of 16 planes and a hypothetical space for roughly
   120 000 characters.
 
   On top of the highest plane currently defined (plane 7), one more
   plane (with about 7000 characters) is under review and is planned for
   publication in 1995. This will bring the number of assigned Chinese
   characters to nearly 55000.
 
 
   The overall structure has already been outlined; but how does the CNS
   code relate to other code sets in use in East Asia, e.g. the Korean
   KSC, the Japanese JIS, and the mainland Chinese GB? And what about
   Unicode?
 
   ³o¾ãÅ骺µ²ºc¤w¸g³Q¤Äµe¥X¨Ó¤F¡C¦ý¬O CNS ½X»P¨ä¥¦ªF¨È©Ò¥Îªº¤º½X
   ¡]¨Ò¦pÁú°ê KSC ½X¡B¤é¥» JIS ½X¡B¤¤°ê¤j³°Â²Åé GB ½Xµ¥¡^¦³¤°»ò
   Ãö«Y©O? ©M Unicode ªºÃö«Y¤S¦p¦ó©O?
   
 
   The answer to this is somewhat disappointing: Although CNS defines
   roughly eight times the number of characters, more than three hundred
   characters present in the Japanese JIS are still missing from the CNS.
   In relation to GB, the CNS misses roughly 1800 simplified characters.
   With this it is also clear that the CNS code will miss quite a number
   of Unicode Han characters. Upon closer examination, the reason is soon
   obvious: CNS in its higher levels occasionally defines some
   abbreviated forms, but in general it does not include characters
   created as a result of the modern character reforms. I consider this a
   serious drawback and an obstacle to a true universal character set.
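   But this seems to h [...]
   [...] handle this requirement. Practical work has shown how important
   it is to keep using a familiar working environment (with its fonts,
   editors, and so on). I therefore now advocate using one of the
   internationally current internal codes (Taiwanese Big5 or Japanese
   JIS) together with the IRIZ KanjiBase, as a better approach than
   adopting CCCII.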
 
 
     _________________________________________________________________
   
   
   
    1. Before launching large database projects, one ought to find out
       what has already been done in the area and study its qualities and
       defects. Often one learns much by asking programmers and database
       designers what they would do differently if they could start all
       over again. In the field of Buddhist studies, the Electronic
       Buddhist Text Initiative tries to help in this coordination and
       learning process.
       
     This may sound trite, but it is a fact that even major projects in
     the field are unaware of what is happening elsewhere, and sometimes
     even in their own institution. On the recent field trip organized by
     the Electronic Buddhist Text Initiative, we found for example that
     the people managing the Chinese University of Hong Kong concordance
     project were not aware of the very similar effort in Oslo; and a
     long-time resident scholar at the Academia Sinica found out through
     us that important materials for a Chinese text he has been
     translating are on his institute's computer. That electronic
     versions of a text exist does not mean much in itself; one must
     evaluate data quality, accessibility, and suitability for one's
     project.
    2. One must classify data input projects by the amount of data
       involved and their destination. Thus one must distinguish between
       small amounts of data and large amounts of data, data destined for
       individual users or small groups and data destined for large user
       groups and institutions, etc. The present guidelines apply to
       large input projects that contain many full-form Chinese
       characters and are aimed at a large and diverse group of users.
       
     Failure to make such distinctions may lead to inadequate demands for
     data quality, search strategies, etc. For example, certain automatic
     or half-automatic methods of scanner input can be quite useful and
     efficient for an individual user prepared to spend a substantial
     amount of time for data correction; but the very same method may
     prove totally inadequate for large-scale institutional data input
     because of the high cost of error correction. Similarly, a
     relatively high number of mistakes may not bother some users but is
     unacceptable for data that are to be distributed to other users.
     Again, the use of many self-defined characters can be acceptable for
     individuals but not for institutions.
    3. It is of the greatest importance to make basic decisions at the
       beginning of a project and to discuss them with specialists. In
       making these decisions, both present and future possibilities of
       use must be kept in mind. This applies particularly to the choice
       of source text, text editing, annotation, basic data characteristics
       (character encoding, data format, non-standard character handling,
       etc.), and hard/software environments. Such questions must be
       discussed by a team of specialists at the outset of a large
       project, i.e. before the main input activity starts, and an action
       plan should be approved by the whole team.
       
     Failure to do this can result in a gigantic waste of money. Several
     Chinese text databases I know of started out with little planning;
     mostly they were designed to fit the hardware and software
     environment of some years ago at a specific location. Later, when
     trying to convert the data to present requirements and for use by
     other institutions, they found that automatic conversion was not
     possible or corrupted the data set. Prior planning and consultation
     with specialists could have prevented this. Another example: tagging
     data during the input or correction / editing process can improve
     the value of a database enormously, for example in making it
     possible to look for all plant names or place names in the whole
     Pali canon. Doing something like this at a later point would be
     another major enterprise that could have been avoided through
     careful planning.
    4. If the electronic text is (or may at a later point in time be)
       destined for international users and a variety of hardware and
       software environments, it is necessary to make a basic data set
       (master data set) that can later be automatically converted into
       any necessary code or format. It is important to treat this master
       data set as a separate entity whose input conditions, character
       code, hardware environment, etc. can be very different from that
       of the eventual user, just as studio quality music recording and
       editing equipment is different from the reproduction equipment of
       the consumer.
       
     With Chinese text, the difference shows particularly in the way rare
     characters and different national standards are handled.
     Institutions that do not separate master data and user data
     invariably produce data that follow the low standards of character
     codes now used on PCs (JIS, GB, BIG-5, etc.; see the article in this
     number by C. Wittern). Of the institutions visited on the recent
     field trip, those who did not distinguish between master and user
     data all suffer from data quality problems which will become even
     more serious as larger codes become available. Those who were wise
     enough to make this distinction are: the libraries of Taiwan
     National University and Hong Kong University of Science and
     Technology (both use master data in CCCII code and user data in
     BIG-5) and the Chinese Academy of Social Sciences (master data in
     their own 45,000 character code, user data in various formats). Just
     like master tapes in the music business, master data must be of such
     quality that it can be used in many different environments, present
     and future. Most of the Chinese text data so far input in Japan,
     Korea, and mainland China will have about as much future as the
     recording of a concert made on a Walkman.
    5. In order to assure such convertibility and adaptability, the
       master data must contain the greatest possible amount of
       information. This is an important factor of data quality. In the
       case of Chinese, Korean, or Japanese data (or any other text set
       that ma[...]ip, we
     met programmers who admitted that they have never actually used the
     database they have been working on for years...
    9. Databases are made for users; therefore the wishes, working
       environment, and likely working habits of users must be carefully
       studied and respected. For example, most users search while
       writing a paper or book; therefore it must be possible to use the
       database concurrently with a word processing program. Any large
       text database should also let the user attach notes and tags to
       the main text. Such notes should also be searchable, printable
       (together with the text or separately), savable as separate files
       with location tags, and portable to updated versions of the
       electronic text. Search engines must also be adapted to many
       users' needs. They must therefore be flexible and adaptable to a
       variety of users' preferences (just like word processing programs)
       rather than hard-coded. Search results should be viewable,
       printable, and savable to file in a variety of formats according
       to the user's wishes. Since the main aim of databases is the
       retrieval of information, such retrieval should be carefully
       planned with many options for the user.
       
     In projects whose input takes many years of work, one must make
     programmers produce multiple test versions of search software and
     have scholars and other prospective users evaluate it even while
     input is going on. If necessary, data structure decisions have to be
     reevaluated. Users should have a say in all important software
     decisions, and programmers should assist users to evaluate test
     versions and to formulate their wishes by telling them about
     alternative possibilities.
    Author: Urs App
    Last updated: 95/04/23
    
 
 
==========================================================================
Date: Mon, 24 Jul 1995 23:39:11 +0800
From: Shann Wei-Chang <s[...]
[...] judging from his documents, there seems to be no entirely
optimistic solution. It is indeed vexing.
 
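For the next week or two I will be devoting all my effort to writing a
user manual for Chinese TeX, and after that I have to help the student
assistants and the computer center write accounting scripts. Ah, there
is no point saying more; in short, I would very much like to help but I
really am unable to.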
 
>     However, Mr. Liu Ming-Wei, the engineer at ETen, says that he
> will only go ahead with modifying the program once a certain number of
> Buddhist organizations support this extension idea, so as not to end
> up having worked for nothing.
> 
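>     Seen this way, won't this improved version of Big-5 be a problem?
> Impractical? Ordinary users will therefore still be using the Big-5
> version... So this version can neither provide "all" the created
> characters as CCCII, Unicode, etc. can, nor circulate as widely as
> Big-5; it seems it can only serve as a stopgap?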
 
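I do not quite understand this passage. Wittern has already made the
problems of CCCII very clear (I was not so clear about them before; I
only thought, on theoretical grounds, that it was not a good idea, and
now Wittern has given very definite technical data showing that it is
not). But I do not think Unicode can provide all the created characters
either. It is, after all, a character table of fixed size, 256*256, so
the number of characters that can be created has an upper limit.
Moreover, the whole world has to share this code, so surely not all the
space for created characters can be given to us? Also, when you speak of
a stopgap, what do you mean? The improved Big-5? But did you not just
say that ETen cannot release it for use at present?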
 
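I very much agree with what Wittern's article (or the one by the other
author; in any case the one you attached) says: data should be divided
into internal code (master data) and external code (user data). If you
accept this idea, then you can choose the most suitable character code
for producing the master data right away, without even having to heed
any standard code. My personal suggestion (and I feel rather sheepish
making so many suggestions as someone who is not taking part in the
work) is the same as before: use CNS as far as possible, define the
missing characters yourselves, and use escape codes to mark your
specially created characters. On the PC there are many character
creation programs available to you, and on UNIX everyone can simply use
the X Window bitmap or BDF format. Once the internal code has been
built, mapping it to the external code is only a matter of a table.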
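 
The point that mapping master data to user data is only a matter of a
table might be sketched as follows. This is a hypothetical illustration:
the "&X-nnnn;" reference syntax and the table contents are invented here
and are not the CNS, ETen, or KanjiBase format.

```python
# Sketch only. The "&X-nnnn;" reference syntax and the table contents are
# invented for this example; they are not any real encoding's format.
import re

REFERENCE = re.compile(r"&X-([0-9A-F]{4});")

# Hypothetical mapping from a self-defined code to what a user-side file
# should contain: a character the target code has, or a visible fallback.
USER_TABLE = {
    "0001": "\u4f5b",        # a character the user-side code already has
    "0002": "[rare-0002]",   # no equivalent: emit a readable placeholder
}


def to_user_data(master: str, table: dict[str, str]) -> str:
    """Replace every reference in the master text using the lookup table."""
    def lookup(match):
        # Unknown references are left untouched rather than dropped.
        return table.get(match.group(1), match.group(0))
    return REFERENCE.sub(lookup, master)


master = "A&X-0001;B&X-0002;C&X-0003;"
print(to_user_data(master, USER_TABLE))
```

Because the master text itself is never altered, a different table can
later produce user data for another target code from the same source.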
 
 
>    May I ask whether documents input in the CNS standard can be viewed
>    under BIG-5?
 
Both bytes of a standard CNS code are low bytes (0 .. 127), as the ISO
standard requires. CNS uses escape codes to switch between planes, so it
is fundamentally quite different from Big-5. On the PC, however, ETen
offers a CNS code, by which it means shift-CNS (like shift-JIS). It uses
only planes 1 and 2 of CNS: for plane 1 it shifts the first byte into
128...255, and for plane 2 it shifts both bytes. Strictly speaking,
therefore, the CNS code that ETen provides is not standard either.
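 
The shift-CNS scheme described above could be sketched like this. It is
only a reading of the description in this letter, not ETen's actual
implementation, and the function names are made up for the example.

```python
# Sketch of "shift-CNS" as this letter describes it; this is a reading of
# that description, not ETen's actual implementation.

def shift_cns(plane: int, b1: int, b2: int) -> bytes:
    """Encode a CNS plane 1 or 2 code (bytes in 33..126) as shift-CNS."""
    if not (33 <= b1 <= 126 and 33 <= b2 <= 126):
        raise ValueError("CNS bytes must lie in 33..126")
    if plane == 1:
        return bytes([b1 | 0x80, b2])          # only the first byte is shifted
    if plane == 2:
        return bytes([b1 | 0x80, b2 | 0x80])   # both bytes are shifted
    raise ValueError("only planes 1 and 2 are covered")


def unshift_cns(pair: bytes) -> tuple[int, int, int]:
    """Recover (plane, b1, b2) from a two-byte shift-CNS unit."""
    hi, lo = pair
    if not hi & 0x80:
        raise ValueError("first byte of a shift-CNS unit must be >= 128")
    plane = 2 if lo & 0x80 else 1
    return plane, hi & 0x7F, lo & 0x7F
```

Under this reading, the high bit of the second byte is what tells the
two planes apart, so no escape code is needed between them.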
 
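Moreover, the first two planes of CNS and Big-5 are not related by an
order-preserving one-to-one mapping, so even shift-CNS is not the same
as Big-5. Last year I spent at least an afternoon working out the
differences between Big-5 and CNS planes 1 and 2 and pinning down the
errors in Big-5. I wrote a report and posted it to CCNET-L, but I do not
have time to dig out the draft now.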
 
There is, however, a program called betty that can convert shift-CNS to
Big-5 on the fly (and vice versa), but it runs only on UNIX.
 
>    Could you tell me how one can obtain a CNS Chinese system?
 
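You have stumped me. Apart from the shift-CNS on ETen I have not seen
any other implementations, and that is of course not a PD program. I
would guess that the Institute for Information Industry and certain
government agencies must have such software, but it has no foothold at
all in the marketplace, so ordinary users never see this kind of
product. On UNIX I think I know how to implement a shift-CNS Chinese
environment together with CXTERM, and as for handling the escape codes
for self-created characters, I think the betty program could be
modified to implement that. Betty's author is at Tsing Hua University (I
hope he has not graduated yet) and could be asked for guidance.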
 
-Shann
 
 
/End of lin
Fri Mar 29 17:30:18 1996

National Taiwan University Lion's Roar Buddhist Studies BBS  http://buddhaspace.org