                                                Dr. Supachai   Tangwongsan
                                                Dr. Damras     Wongsawang
                                                Miss Jiraporn  Kiatpibool

       Many people have said, "This software would be another brave new
world." It is the first of its kind in the international study of
Buddhism.

       Mahidol University Computing Center (MUCC) is very proud to
present the world's first complete digital edition of the Buddhist
scripture, the Tipitaka, a collection of scriptures representing the
collected teachings and sayings of the Buddha, together with its
commentary, the Atthakatha. The Tipitaka is important as the root and
basic reference for all teachings and explanations of Buddhism, the
standard for measuring the teachings presented as Buddhism, a record of
beliefs, religions, traditions and events of many centuries past, an
invaluable source of reference material relating to other fields of

      Since the commencement of the Buddhist era over 2500 years ago, there
have been continuous efforts to preserve and maintain the Tipitaka so that
it remains a religious heritage for coming generations. Various media,
depending on the technology of the age, have been used to preserve its
contents. First came the method known as "Mukhapatha", memorization and
transmission by word of mouth; later, engraved stone, leaf, cloth, paper,
etc. were used to store the contents of the scriptures. With more advanced
technology, various computer storage devices have been used, e.g. hard
disk, optical disk and finally CD-ROM, a disc 120 mm wide and 1.25 mm
thick. What is more, a single CD-ROM carries the entire Thai and Romanized
Pali text of the Tipitaka (45 volumes), the Atthakatha (55 volumes) and
special scriptures (15 volumes), totalling more than 450 million
characters. The CD-ROM is small, light, relatively cheap and needs very
little care. Also, data on a CD-ROM is safe from viruses, which are a
problem with other computer media. BUDSIR (BUDdhist Scriptures Information
Retrieval) on CD-ROM, as named by the University and the first of its
kind, will be available globally, making the study and research of
Buddhism virtually

        MUCC also developed the program BUDSIR, which aids in the
search and retrieval of the contents of the digital edition of the
scriptures and their commentary. Development of BUDSIR began as a
project to develop a computerized version of the Tipitaka in honour of His
Majesty the King's Ratchamangklaphisek Ceremony (the celebration of
the Longest Royal Enthronement Anniversary) and the celebration of His
Majesty the King's 60th birthday. BUDSIR II, the first Romanized version
of the Tipitaka, was developed in September 1989, providing another
channel through which the study of Buddhism is accessible to the
international community. BUDSIR III was developed in April 1990,
allowing more complex search queries using the mathematical concept of
Boolean algebra. His Majesty the King Bhumibhol Adulyadej The Great
continued to support the computerization of the Buddhist scriptures
and their commentary, and BUDSIR IV was developed in November 1991. It
includes 45 volumes of the Tipitaka and 70 volumes of the Atthakatha
and related scriptures, in both the Thai and Romanized Pali versions,
and is thus the most complete. BUDSIR IV was developed to store the
scriptures on a hard disk, which was found to be prone to virus attacks
that often caused loss of information. BUDSIR on CD-ROM was therefore
developed and was completed in July 1994.

          BUDSIR's internal structure was developed using mature and
efficient information retrieval techniques of the kind usually used in
large databases, and was designed for ease of use by users of all levels
of competence.


          Anyone pursuing a particular subject in the Tipitaka and
Atthakatha, which contain tremendous amounts of information, must
overcome not only the barrier of the Pali language but also the
overwhelming amount of information scattered widely under a variety of
headings within a volume. It is therefore extremely difficult to
retrieve the information in question accurately and exhaustively. An
attempt has been made to store the entire Tipitaka and Atthakatha in
digital form so that any research needing access to this huge
database will be greatly facilitated.

        BUDSIR is unique in its accuracy, speed and completeness. It can
retrieve any word (including compounds), phrase or stretch of text that
can be found in the Buddhist Scriptures. Moreover, this digital edition is
also capable of searching both the Tipitaka and Atthakatha simultaneously,
showing the results in two separate windows so that they can be studied
and compared.


     The Buddhist Scriptures included in the Digital Tipitaka and Atthakatha
consist of 115 volumes, or 50,189 pages of text. The data can be divided
into two groups as follows:

     1. The Pali Tipitaka in Thai script, Siamrattha version, 45 volumes with
a total of 24 million characters. After computerized transliteration into
Romanized script, the size becomes 31 million characters.

     2. The Atthakatha, commentary and other important scriptures, 70
volumes with a total of 37 million characters comprising:

        a. The Atthakatha: 55 volumes,

        b. The text used in Thai monastic Pali examinations and two essential
scriptures: The Milindapanha and The Bhikkhu Patimokkha-Pali.

     After computerized transliteration into Romanized script, the size
of this second group increases to 47 million characters.

     The data was prepared with a Pali text editor developed by MUCC.
The data from each volume was entered twice and verified by a computer
program that pinpointed any discrepancies between the two versions;
these were corrected until the two versions were identical. The work was
done by eighty typists, each working at a rate of thirty Pali words a
minute, or on average 15 pages a day.
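
     The double-entry check described above can be sketched as follows.
This is only an illustration under stated assumptions, not MUCC's actual
verification program: the function name find_discrepancies and the
word-by-word comparison are hypothetical, and the sample text is made up
for the example.

```python
from itertools import zip_longest


def find_discrepancies(first_entry, second_entry):
    """Compare two independently typed versions line by line, word by word.

    Returns a list of (line_number, word_number, first_word, second_word)
    tuples. A word missing from one version is reported as None.
    """
    discrepancies = []
    lines = zip_longest(first_entry.splitlines(),
                        second_entry.splitlines(),
                        fillvalue="")
    for line_no, (line_a, line_b) in enumerate(lines, start=1):
        words = zip_longest(line_a.split(), line_b.split())
        for word_no, (word_a, word_b) in enumerate(words, start=1):
            if word_a != word_b:
                discrepancies.append((line_no, word_no, word_a, word_b))
    return discrepancies


# Hypothetical sample: the second typist dropped a letter in line 1.
typist_a = "namo tassa bhagavato\narahato sammasambuddhassa"
typist_b = "namo tasa bhagavato\narahato sammasambuddhassa"

for line_no, word_no, a, b in find_discrepancies(typist_a, typist_b):
    print(f"line {line_no}, word {word_no}: {a!r} vs {b!r}")
```

     An operator would correct the flagged words and rerun the comparison
until it reports nothing, which corresponds to the two versions being
identical.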


     The database structure of the Digital Edition of Buddhist Scriptures is
essentially an inverted file similar to that in the STAIRS system on the
IBM mainframe. The system is composed of three main groups of data
files: (1) the Text-block file, (2) the Dictionary file, and (3) the Inverted
file.

     The Text-block file is a computerized collection of all the data from
115 printed volumes of the Tipitaka and Atthakatha.

     The Dictionary file is a collection of all lexical items found in the
Tipitaka and Atthakatha. The lexical items are arranged in a B-tree
structure, with pointers cross-referring to the hierarchical orders
on the tree.
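
     As a rough illustration of what an ordered dictionary of lexical
items allows, the sketch below uses a sorted list with binary search as
a stand-in for a B-tree (it is not a B-tree implementation, and not
BUDSIR's code). The class name, the sample words and the pointer values
are hypothetical.

```python
import bisect


class Dictionary:
    """Sorted word list standing in for the B-tree of lexical items.

    Each word maps to a pointer; here the pointer is a hypothetical
    offset into an inverted file.
    """

    def __init__(self, entries):
        # entries: mapping of word -> pointer
        self._words = sorted(entries)
        self._pointers = [entries[w] for w in self._words]

    def lookup(self, word):
        """Return the pointer for an exact word, or None if absent."""
        i = bisect.bisect_left(self._words, word)
        if i < len(self._words) and self._words[i] == word:
            return self._pointers[i]
        return None

    def with_prefix(self, prefix):
        """Return all words starting with `prefix`, in sorted order."""
        start = bisect.bisect_left(self._words, prefix)
        end = start
        while end < len(self._words) and self._words[end].startswith(prefix):
            end += 1
        return self._words[start:end]


# Hypothetical sample entries.
dictionary = Dictionary({"dhamma": 0, "dhammo": 40, "buddha": 80, "sangha": 120})
print(dictionary.lookup("dhammo"))
print(dictionary.with_prefix("dhamm"))
```

     Like a B-tree, the sorted order gives logarithmic exact lookup and
makes words sharing a prefix contiguous; a real B-tree additionally
keeps the structure efficient on disk.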

     The Inverted file is a list of occurrences of all the words found
in the Text-block file. Each word is cross-referenced from the
Dictionary file. The occurrence code consists of the volume number, page
number, line number, word number and, when applicable, a flag to indicate
the last word of the line or the page. This facilitates data management in
searching, particularly for adjacent words, and searching via Boolean
operators in a future version.
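
     A minimal sketch of an inverted file with occurrence codes of this
shape is given below. It is an assumption-laden illustration rather than
BUDSIR's actual file layout: the names Occurrence, build_inverted_file,
find_adjacent and pages_with are hypothetical, only an end-of-line flag
is modelled, and adjacency is checked within a single line.

```python
from collections import defaultdict, namedtuple

# Occurrence code: volume, page, line, word position, end-of-line flag.
Occurrence = namedtuple("Occurrence", "volume page line word last_in_line")


def build_inverted_file(text_blocks):
    """Index every word of an iterable of (volume, page, line, text) tuples."""
    inverted = defaultdict(list)
    for volume, page, line, text in text_blocks:
        words = text.split()
        for position, word in enumerate(words, start=1):
            last = position == len(words)
            inverted[word].append(Occurrence(volume, page, line, position, last))
    return inverted


def find_adjacent(inverted, first, second):
    """Occurrences of `first` immediately followed by `second` on one line."""
    following = {(o.volume, o.page, o.line, o.word)
                 for o in inverted.get(second, [])}
    return [o for o in inverted.get(first, [])
            if (o.volume, o.page, o.line, o.word + 1) in following]


def pages_with(inverted, word):
    """Set of (volume, page) pairs containing `word`, for Boolean queries."""
    return {(o.volume, o.page) for o in inverted.get(word, [])}


# Hypothetical sample text blocks.
blocks = [
    (1, 1, 1, "namo tassa bhagavato"),
    (1, 1, 2, "arahato sammasambuddhassa"),
    (2, 5, 1, "tassa bhagavato dhammo"),
]
index = build_inverted_file(blocks)

print(find_adjacent(index, "tassa", "bhagavato"))
print(pages_with(index, "tassa") & pages_with(index, "namo"))    # AND
print(pages_with(index, "tassa") - pages_with(index, "dhammo"))  # AND NOT
```

     Because each occurrence carries its word number, an adjacent-word
search needs only the two occurrence lists and never touches the text
itself. The end-of-line flag is what would let a fuller version match a
phrase that continues onto the next line.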


1. Inherent B-TREE Architecture

         Since the B-tree is known as one of the most efficient structures
for heavily accessed databases, BUDSIR is crafted on this superb

2. Several Efficient Search Methodologies

        BUDSIR features two efficient search methods. The user can
launch a search using a word or phrase keyword, or using volume/page/item

3. Dual Windows Display

        BUDSIR independently displays the Tipitaka and the Atthakatha
in separate windows. The user can freely choose which window displays
which text.

4. Graphics-Mode Operation

        BUDSIR runs entirely in graphics mode; there is no need to
modify the video graphics adapter to display the characters.

5. Pull-Down Menus and Mouse Support

        Any feature can be accessed using hot keys, pull-down menus or
a mouse.

6. Printing

        BUDSIR supports every de facto standard 9-pin and 24-pin
dot-matrix printer, as well as HP LaserJet and compatible printers.

7. Saving a Scripture Passage to Disk

        BUDSIR allows the user to save any passage displayed on the
screen to disk for private use. Text files saved by BUDSIR can be
edited with any general text editor.


     To run smoothly, BUDSIR requires equipment meeting the following
specifications:

1.      An IBM PC, AT or PS/2 computer, or a true compatible, with an
        Intel 80386 or 80486 microprocessor,

2.      At least 2 megabytes of RAM,

3.      A Super VGA color graphics adapter and a matching monitor,

4.      A standard CD-ROM drive for reading data on a CD-ROM,

5.      A hard disk drive with a capacity of at least 5 MB for BUDSIR's
        temporary working area,

6.      A keyboard and a Microsoft-compatible mouse,

7.      A floppy disk drive,

8.      A printer,

9.      MS-DOS version 5 or higher.

Macintosh users can also run BUDSIR IV on Macintosh computers, e.g.,
Mac II, LC, Classic, Quadra, Power PC, etc., using the SoftWindows (or
SoftAT or SoftPC) emulator program and operating system version 7.0 or
higher.


Authors' Address : Mahidol University Computing Center, Faculty of Science,

                          Rama VI Rd., Bangkok 10400, THAILAND

                          Tel : (662) 247-0333, FAX : (662) 246-7308,

                          Email : budsir@mahidol.ac.th
