thai-language.com
Internet resource for the Thai language

How do I recognize where Thai words begin and end?

The Thai language does not use spaces between the words in a sentence. This poses unique challenges for the beginning student whose first language does utilize word spacing, but with practice, reading Thai becomes second nature.

If you reflect on how you read English, for example, you might be surprised to learn that we recognize words as wholes rather than by parsing the individual letters from which they are composed. Doing the latter would make reading very much slower! Rest assured that the same will hold for Thai. Like a first-grader, you begin by following each individual letter, but once you can read at a more advanced grade-school level, you will be recognizing entire words, and you won't even notice that there are no spaces between them.

Because Thai is "designed" to be written without spaces between the words, it will be easier to read Thai than it is to read this:
Englishwritingprovidesfewercluesaboutwordbreaking.
That's because Thai has rules such as the following:
  • The preposed vowels (เ แ โ ใ ไ) start a syllable.
  • ะ ends a syllable, unless it is followed by a consonant carrying the symbol อ์, as in the word เคราะห์. These exceptions are rare. (The symbol อ์ is called การันต์ /gaaM ranM/, or gaaran.)
  • Except for European loan words (such as กอล์ฟ /gaawpH/), gaaran ends a syllable.
  • A syllable starting with ใ or ไ is an open syllable.
  • อั and อ็ do not appear over a syllable-final consonant.
  • Sometimes two consonants form an initial cluster together; a tone mark, if any, will appear on the second consonant of such a cluster.
  • อำ ends a syllable.
These rules go a long way, and will be helpful until you start finding word boundaries by recognizing whole words. Having said all this, Thai does use spaces in some cases. See Bryan's article on Spacing in the Thai Language for more information.
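
Several of the rules above are mechanical enough to sketch in code. The following Python is a minimal illustration, not a real syllable breaker: it applies only the rules for preposed vowels, ะ, อำ, and gaaran, it ignores the loan-word exception for gaaran (so กอล์ฟ would be split wrongly), and it does not attempt the cluster or open-syllable rules. The function names `boundary_hints` and `split_at_hints` are made up for this example.

```python
# Illustrative sketch only: marks places where the rules above suggest a
# syllable boundary. All names here are hypothetical, not from any library.

PREPOSED_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")  # เ แ โ ใ ไ
SARA_A = "\u0e30"   # ะ
SARA_AM = "\u0e33"  # ำ
GAARAN = "\u0e4c"   # ์ (the mark written above a silenced consonant)


def is_consonant(ch):
    """True for the Thai consonants ก (U+0E01) through ฮ (U+0E2E)."""
    return "\u0e01" <= ch <= "\u0e2e"


def boundary_hints(text):
    """Return the set of indices i where a boundary is hinted before text[i]."""
    hints = set()
    n = len(text)
    for i, ch in enumerate(text):
        if ch in PREPOSED_VOWELS:
            # A preposed vowel starts a syllable.
            hints.add(i)
        elif ch == SARA_A:
            # ะ ends a syllable unless a consonant carrying gaaran follows.
            silenced_next = (
                i + 2 < n
                and is_consonant(text[i + 1])
                and text[i + 2] == GAARAN
            )
            if not silenced_next:
                hints.add(i + 1)
        elif ch in (SARA_AM, GAARAN):
            # อำ and gaaran end a syllable (loan words are NOT handled here).
            hints.add(i + 1)
    # The start and end of the text are trivially boundaries; drop them.
    hints.discard(0)
    hints.discard(n)
    return hints


def split_at_hints(text):
    """Cut the text at every hinted boundary."""
    cuts = [0] + sorted(boundary_hints(text)) + [len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:]) if a < b]
```

With these definitions, `split_at_hints("จะไป")` returns `['จะ', 'ไป']`, while เคราะห์ comes back in one piece because its ะ is followed by a consonant carrying gaaran. The hints mark only some boundaries; syllables that none of these rules touch are left joined.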

Thai Word- and Sentence-Breaking

Because Thai doesn't use spaces between words, automatically separating Thai text into words has been a long-standing challenge in computational linguistics. A further challenge is identifying sentence boundaries in Thai text because, as Bryan's article points out, Thai also uses spaces for various functions within a sentence, so a given space may or may not indicate the end of a sentence. For more information on these problems, see the following academic papers I've (co-)authored on the subject:

Glenn Slayden and Elias Luqman. 2010. Derivative Sentence Breaking for Moore Alignment.
Glenn Slayden, Mei-Yuh Hwang, and Lee Schwartz. 2010. Large-Scale Thai Statistical Machine Translation. Microsoft Technical Report MSR-TR-2010-41. Redmond: Microsoft Corporation.
Glenn Slayden, Mei-Yuh Hwang, and Lee Schwartz. 2010. Thai Sentence-Breaking for Large-Scale SMT. Proceedings of the 1st Workshop on South and Southeast Asian Natural Language Processing, Beijing, China, August 2010, pp. 8-16. COLING 2010 Organizing Committee.
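
To make the word-breaking task concrete, here is a minimal sketch of one classic baseline, greedy dictionary-based longest matching. It is not the method used in the papers listed above. The function name and the tiny lexicon are hypothetical, and the input is the unspaced English sentence from earlier on this page, lowercased and without its final period.

```python
# Illustrative baseline only: greedy left-to-right longest matching against
# a word list. The lexicon below is a toy, made up for this example.

def longest_match_segment(text, lexicon):
    """Split text into tokens, always taking the longest lexicon word
    that matches at the current position."""
    max_len = max(map(len, lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until one matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                break
        else:
            # Nothing matched: emit a single unknown character and move on.
            j = i + 1
        tokens.append(text[i:j])
        i = j
    return tokens


LEXICON = {
    "english", "writing", "provides", "fewer", "clues",
    "about", "word", "break", "breaking",
}

words = longest_match_segment(
    "englishwritingprovidesfewercluesaboutwordbreaking", LEXICON
)
print(" ".join(words))
```

The greedy choice is the weakness of this approach: when a long lexicon entry happens to match across a true word boundary, the segmenter takes it and cannot back up, which is one reason the problem is harder than it first appears.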

This article is by Glenn Slayden, based on material by Richard Wordingham.
Last updated: November 5, 2010
Copyright © 2024 thai-language.com. Portions copyright © by original authors, rights reserved, used by permission; Portions 17 USC §107.