thai-language.com
Internet resource for the Thai language

Asian Script Converter

Posted: Sun May 22, 2011 7:39 pm
by vinodhrajan
Hi Guys,

I have developed an Asian Script Converter which converts between all major South Asian scripts, including all the mainland Indian scripts, Sinhala, and the East Asian scripts Thai, Burmese & Khmer (and, as an offshoot, Urdu too).

The converter can be accessed at: http://www.virtualvinodh.com/aksharamukha

The character matrix for all these scripts can be found here: http://www.virtualvinodh.com/character-matrix

Thai-specific notes & options are provided at: http://www.virtualvinodh.com/thai

Please try the converter. I would like to have your feedback & comments on the Converter, especially with respect to Thai (and other East Asian scripts).

Thanks all.

V

Re: Asian Script Converter

Posted: Sun May 22, 2011 11:43 pm
by Richard Wordingham
You've done enough research to know that Thai is hard.

One case you haven't (fully) covered is clusters plus preposed vowels. For example, HK draupadI should yield เทฺราปที , not ทฺเราปที / ทเราปะที. There's a relevant article in Thai at http://www.huso.buu.ac.th/thai/web/pers ... 2chap5.htm - I haven't studied it myself. One problem is that initial and intervocal clusters behave differently, so that for example you correctly convert HK buddho to พุทฺโธ. I've seen some slightly surprising placements of the preposed vowels in Pali in Thai script - something like intervocalic -svo- becoming -โสฺว- if I remember correctly.
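To make the two placements concrete, here is a minimal Python sketch (an illustration only, not the converter's actual code). The function name `render_cluster` and its parameters are hypothetical; it simply assembles the code points in the two orders discussed above, using phinthu (U+0E3A) as the cluster mark.

```python
PHINTHU = "\u0e3a"  # Thai phinthu, the dot that marks a vowelless consonant

def render_cluster(c1, c2, pre="", post="", split=False):
    """Spell the cluster c1+c2 with a vowel that has a preposed part.

    pre   -- the preposed vowel sign (e.g. sara e, sara o)
    post  -- any part of the vowel written after the consonants
    split -- True: the preposed sign goes between c1 and c2
             False: the preposed sign goes before the whole cluster
    """
    if split:
        return c1 + PHINTHU + pre + c2 + post
    return pre + c1 + PHINTHU + c2 + post

# Initial cluster of draupadI: the vowel sign precedes both consonants.
print(render_cluster("ท", "ร", pre="เ", post="า"))

# Medial cluster of buddho: the vowel sign sits between the consonants.
print(render_cluster("ท", "ธ", pre="โ", split=True))
```

With these inputs the first call gives the เทฺรา of เทฺราปที and the second gives the ทฺโธ of พุทฺโธ.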

Visarga isn't lost - it survives as sara a (ะ).

You'll have fun when you add the Tham script (Thai name tua mueang) - sometimes you have to use a consonant sign instead of choeng plus consonant.

Re: Asian Script Converter

Posted: Mon May 23, 2011 2:44 am
by vinodhrajan
Hi Richard,

Thanks a lot for your response.

http://tipitaka.org/thai/

Here, Buddho is written as พุโทฺธ instead of พุทฺโธ.

Is it a standard rule for initial & intervocalic clusters to behave differently?

Or can we standardize on the โทฺธ behavior for clusters in all positions, as at http://tipitaka.org/thai/?

Are there any other references for Pali transliterated into Thai? If we can compare them, we can come up with a rule of thumb for transliterating clusters into Thai.

V

Re: Asian Script Converter

Posted: Mon May 23, 2011 3:37 am
by vinodhrajan
I have found quite a few Thai print editions here:

http://hall.worldtipitaka.org/node/240199

All of which spell Buddho as "พุทฺโธ". (Tena Samayena Buddho [...] in the first line)

http://www.flickr.com/photos/dhammasoci ... 959624647/
http://www.flickr.com/photos/dhammasoci ... 959624647/
http://www.flickr.com/photos/dhammasoci ... 959624647/
http://www.flickr.com/photos/dhammasoci ... 959624647/
http://www.flickr.com/photos/dhammasoci ... 959624647/

As you suggested, I suppose it suffices to change only the initial-cluster formation rule for the vowels <e, o, ai, au>.

V

Re: Asian Script Converter

Posted: Mon May 23, 2011 10:59 pm
by Richard Wordingham
For the initial clusters, the vowel comes before the two consonants. This seems to be the rule even if the cluster results from elision. (The Thai script seems to have no avagraha.)

For medial clusters, the rules are more complicated. I haven't been able to find a statement of the rules, so I'm having to look at examples. Some of the rules are fairly clear:

1) Nasal plus oral stop is split by a vowel.
2) Nasal geminates are split.
3) มฺห is split. On the other hand, มฺย might not be split - but I only have one example so far.
4) Homorganic stop clusters are split by the vowel.
5) Oral stop plus semivowel (e.g. ตฺร, พฺย) is not split by a vowel (muta cum liquida).

An interesting example of the interaction of these rules is ภุญฺเชฺย.

ยฺย is a nasty case - sometimes the vowel comes before, sometimes afterwards.

I intend to do a corpus search on the texts from http://www.learntripitaka.com/ . Unfortunately, it's not free from typographical errors, and they used a non-standard coding to get round rendering problems that are now largely history.
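The five medial-cluster rules above can be sketched as a small Python function. This is only a sketch under stated assumptions: the grouping of Thai letters into Pali stop series, nasals and semivowels is the conventional one and is assumed here, the names `stop_place` and `medial_split` are made up for illustration, and cases the rules do not settle (such as ยฺย or มฺย) return None rather than a guess.

```python
# Assumed mapping of Thai letters to Pali consonant classes.
STOPS = {
    "velar": "กขคฆ",
    "palatal": "จฉชฌ",
    "retroflex": "ฏฐฑฒ",
    "dental": "ตถทธ",
    "labial": "ปผพภ",
}
NASALS = set("งญณนม")
SEMIVOWELS = set("ยรลว")

def stop_place(c):
    """Return the place of articulation of an oral stop, or None."""
    for place, letters in STOPS.items():
        if c in letters:
            return place
    return None

def medial_split(c1, c2):
    """Apply rules 1-5 to a medial cluster c1+c2.

    True  -> the preposed vowel goes between c1 and c2
    False -> the preposed vowel goes before the whole cluster
    None  -> the rules as listed do not decide this cluster
    """
    p1, p2 = stop_place(c1), stop_place(c2)
    if c1 in NASALS and p2 is not None:   # rule 1: nasal + oral stop
        return True
    if c1 in NASALS and c1 == c2:         # rule 2: nasal geminate
        return True
    if (c1, c2) == ("ม", "ห"):            # rule 3: mh
        return True
    if p1 is not None and p1 == p2:       # rule 4: homorganic stops
        return True
    if p1 is not None and c2 in SEMIVOWELS:  # rule 5: stop + semivowel
        return False
    return None

print(medial_split("ท", "ธ"))  # the cluster in buddho
print(medial_split("ช", "ย"))  # the second cluster in ภุญฺเชฺย
```

Returning None for undecided clusters keeps the sketch from asserting more than the examples support.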

Re: Asian Script Converter

Posted: Tue May 24, 2011 10:41 am
by Rick Bradford
Advertised in this week's Matichon is a book called อักษรไทย มาจากไหน? by สุจิตต์ วงษ์เทศ.

Re: Asian Script Converter

Posted: Tue May 24, 2011 5:19 pm
by vinodhrajan
Richard Wordingham wrote:
For the initial clusters, the vowel comes before the two consonants. This seems to be the rule even if the cluster results from elision. (The Thai script seems to have no avagraha.)

For medial clusters, the rules are more complicated. I haven't been able to find a statement of the rules, so I'm having to look at examples. Some of the rules are fairly clear:

1) Nasal plus oral stop is split by a vowel.
2) Nasal geminates are split.
3) มฺห is split. On the other hand, มฺย might not be split - but I only have one example so far.
4) Homorganic stop clusters are split by the vowel.
5) Oral stop plus semivowel (e.g. ตฺร, พฺย) is not split by a vowel (muta cum liquida).

An interesting example of the interaction of these rules is ภุญฺเชฺย.

ยฺย is a nasty case - sometimes the vowel comes before, sometimes afterwards.

I intend to do a corpus search on the texts from http://www.learntripitaka.com/ . Unfortunately, it's not free from typographical errors, and they used a non-standard coding to get round rendering problems that are now largely history.


To summarize:

Only the following are not split by the vowel:

1) Word-initial clusters [in every case] - เทฺว

2) Clusters with semivowels [ra/ya] - นิเทฺร, วินฺไทฺย. Does this include the other semivowels [la & va]?

3) yya can be standardized to be not split based on above rule.

In all other cases, the Cluster is split.
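As a sketch, the summary above reduces to a two-condition test in Python. The function name `is_split` and the `semivowels` parameter are hypothetical; the parameter defaults to ร and ย only, since whether ล and ว belong in the set is the open question raised in point 2.

```python
def is_split(c1, c2, word_initial, semivowels=frozenset("รย")):
    """Decide whether a preposed vowel splits the cluster c1+c2.

    Follows the simplified summary: word-initial clusters and clusters
    ending in a listed semivowel stay together; everything else is split.
    """
    if word_initial:
        return False          # point 1: word-initial clusters
    if c2 in semivowels:
        return False          # points 2 and 3: C + ra/ya, including yya
    return True               # all other clusters

print(is_split("ท", "ว", word_initial=True))    # as in เทฺว
print(is_split("ท", "ร", word_initial=False))   # as in นิเทฺร
print(is_split("ท", "ธ", word_initial=False))   # as in พุทฺโธ

# Passing a wider set shows the effect of also counting la and va:
print(is_split("ท", "ว", word_initial=False, semivowels=frozenset("รยลว")))
```

Keeping the semivowel set as a parameter means the open question can be settled later without touching the logic.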

If these are the final rules, I will make the necessary changes to the converter so that it displays the Thai clusters correctly.

Thanks again for your suggestions.

V

Re: Asian Script Converter

Posted: Tue May 24, 2011 6:36 pm
by Richard Wordingham
(This should have been posted this morning (GMT) - it doesn't answer this afternoon's questions. I'm still researching the rules.)

It's looking more and more complicated. It seems that practice is not uniform. In one section, one mostly gets ตุมฺเห and in another one mostly gets ตุเมฺห.
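One way to quantify such variation is to count both orders for a given consonant pair. The following Python sketch assumes the text is already in standard Unicode order (the corpus mentioned earlier uses a non-standard coding and would need converting first); the function name `count_orders` and the sample string are made up for illustration and are not corpus data.

```python
import re

PHINTHU = "\u0e3a"            # Thai phinthu
PREPOSED = "[\u0e40-\u0e44]"  # the five Thai preposed vowel signs

def count_orders(text, c1, c2):
    """Count how often the cluster c1+c2 is split or unsplit by a preposed vowel."""
    c1, c2 = re.escape(c1), re.escape(c2)
    # split:   c1, phinthu, vowel sign, c2
    split = re.findall(c1 + PHINTHU + PREPOSED + c2, text)
    # unsplit: vowel sign, c1, phinthu, c2
    unsplit = re.findall(PREPOSED + c1 + PHINTHU + c2, text)
    return {"split": len(split), "unsplit": len(unsplit)}

# Invented sample containing both spellings of tumhe.
sample = "ตุมฺเห ตุเมฺห ตุมฺเห"
print(count_orders(sample, "ม", "ห"))
```

Running the count per section of a corpus would show whether the two spellings really cluster by section.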

Re: Asian Script Converter

Posted: Sun May 29, 2011 11:32 am
by vinodhrajan
I found a multi-script version of the Tripitaka here. It seems to be very authentic. They have even faithfully reproduced the Sinhala Pali conjuncts in the Tripitaka. Except for Thai, they use a non-Unicode font for the other scripts.

http://budsir.mahidol.ac.th/

I did a search myself for some clusters.

It seems that all of the conventions described earlier are followed. Some additional conventions: conjuncts ending in -h, such as mhe, nhe, etc., are not split; hme is also not split.

V

Re: Asian Script Converter

Posted: Thu Jun 02, 2011 12:45 am
by Richard Wordingham
The rules seem to be, in order of priority:

1) Word initial clusters are not split.
2) Geminates are split.
3) Nasal or Indic semivowel (ย ร ล ว) or + is not split.
4) Consonant + v/y is not split.
5) / + consonant is not split.
6) Oral stop + / is not split.
7) Other combinations are split.

There are a lot of exceptions to Rules 3 to 6, but they seem to be random. Rule 2 isn't infallible either.

I had worried about the apparent sequence ชฺเย that I had seen, but it turned out to be due to misspellings of -ปจฺจเยน as -ปจฺเยน (four times!) and -ปจจฺเยน (once). Likewise, the apparent sequence นฺโห turned out to be a misspelling of เสฺนโห as เสนฺโห!

Copyright © 2024 thai-language.com. Portions copyright © by original authors, rights reserved, used by permission; Portions 17 USC §107.