thai-language.com: Internet resource for the Thai language

LKB system back online

Posted: Wed Oct 14, 2009 10:25 pm
by Glenn Slayden
This morning I gave a talk to my department on my work on this website and future plans for it, including my forthcoming master's thesis work, which will probably be on extending the computational grammar of the Thai language that I started last spring. As part of this talk, I resurrected the experimental Thai syntax parsing system I was working on back then.

The sentences included in the system's vocabulary are listed here, along with links to a version of the dictionary entry that includes live syntactic parsing of the sentence. A green "bead" on that page indicates either:
  • a sentence that is expected to parse (i.e. is syntactically correct), and for which the system found one or more parses (green highlight), or
  • a sentence that is not expected to parse (i.e. that is syntactically incorrect), and which the system correctly failed to parse (white highlight).

Yellow highlighting indicates grammatical sentences that the system was not able to parse (false negative).
Red highlighting indicates ungrammatical sentences that the system found one or more parses for (false positive).
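
The four cases above amount to comparing whether a sentence is expected to parse with whether the parser actually returned anything. A minimal sketch of that mapping follows; the function name and the color strings are hypothetical, chosen for illustration, and are not taken from the site's actual code.

```python
def highlight(expected_to_parse: bool, num_parses: int) -> str:
    """Map a test sentence's outcome to the highlight colors described above.

    Hypothetical helper for illustration only.
    """
    parsed = num_parses > 0
    if expected_to_parse and parsed:
        return "green"   # grammatical, and at least one parse was found
    if not expected_to_parse and not parsed:
        return "white"   # ungrammatical, and correctly rejected
    if expected_to_parse and not parsed:
        return "yellow"  # false negative: grammatical, but no parse found
    return "red"         # false positive: ungrammatical, but parsed anyway
```

The first two outcomes are the ones the green "bead" marks as correct behavior; the last two are the error cases.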

If the backend connection to my Linux system which runs the LKB parsing engine is working, then clicking on the id number in the left column brings you to a special dictionary page which includes a live re-parse of the sentence. If parses are found, then this section includes three parts:
  • one or more parse trees showing how the parts of the sentence fit together into a syntax tree
  • the meaning of the sentence, expressed in Minimal Recursion Semantics (MRS) (Copestake 1999)
  • full Attribute-Value Matrices (AVMs) for each parse.

Much of this information is technical and requires familiarity with the HPSG grammar formalism, but over time I expect to make the information more accessible. At this point, I'm just documenting--for the curious--an obscure backwater of this website which one day may become more prominent.

Re: LKB system back online

Posted: Tue Nov 10, 2009 9:12 pm
by Thomas
Glenn Slayden wrote:here.

WOW! Simply wow (this could also be a very interesting introduction to Thai grammar). Will your thesis be published once you have your master's degree? I'm very eager to read it!

Re: LKB system back online

Posted: Tue Nov 10, 2009 10:18 pm
by Glenn Slayden
Thanks for your interest, Dr. Stock. I have a lot of work in progress that will take some time to develop for the website, but rest assured that this is definitely my long-term goal.

Glenn

Re: LKB system back online

Posted: Mon Oct 31, 2011 7:39 am
by keetper
Good day, Glenn Slayden!
This is really worthy work to look forward to. Keep us in the loop when your master's degree is defended. Good luck!

Re: LKB system back online

Posted: Sun May 05, 2013 10:56 am
by Glenn Slayden
keetper wrote:Keep us in the loop when your master's degree is defended.


Well, it was defended last fall, but that thesis ended up getting bogged down in the grammar-supporting toolset, rather than the development of the Thai grammar itself. If you're still interested, the paper can be found here.

Meanwhile, I'm hoping that the actual further development of the Thai grammar proper (which is still in "toy" status at present) will now be my Ph.D. work! (As it turns out, developing a competent analytical grammar of a natural language is hard. :shock: Who knew.)

Also meanwhile, a graduate seminar I'm currently taking is advancing my fledgling work on Thai-English semantic transfer. If it were not for the seminar, this work--though long planned--would probably have been delayed until after the Thai grammar was upgraded somewhat. So for the moment I'm sketching out a new formalism within which one will be able to author a set of declarative correspondences between English and Thai semantic structures (yes, there does appear to be some additional tool work involved!).

Putting the pieces together, given:
(1.) an upgraded computational HPSG grammar of Thai (ongoing work, currently the grammar is still in "toy" status);
(2.) the English Resource Grammar (Flickinger et al. 2000), which is a highly-developed, open-source HPSG grammar of English (emphatically not a toy);
(3.) an efficient HPSG parser (done!);
(4.) an efficient HPSG generator (done!);
(5.) a processing engine for the declarative transfer rule formalism (demo project for current seminar, but a real version will eventually be needed); and
(6.) a set of Thai-English declarative "transfer rules" written to this formalism (which I'm also mocking-up at present);

...one theoretically has all the ingredients for a fairly competent rule-based Thai-English and English-Thai machine translation system :!:
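
The way these ingredients fit together can be sketched as a three-stage pipeline: parse the source sentence into a semantic structure, apply transfer rules to get a target-language semantic structure, then generate a target sentence from it. The sketch below is a toy under loud assumptions: every name in it is hypothetical, a semantic structure is reduced to a plain list of predicate labels (a real system would use something like MRS), and a transfer "rule" is just a one-to-one label mapping. It is not the formalism or the engine described above.

```python
from typing import Callable, Dict, List, Optional

# Toy stand-in for a semantic structure: a list of predicate labels.
Semantics = List[str]


def make_transfer(rules: Dict[str, str]) -> Callable[[Semantics], Optional[Semantics]]:
    """Build a transfer step from a declarative source-to-target mapping."""
    def transfer(source: Semantics) -> Optional[Semantics]:
        # Fail as a whole if any predicate has no rule: no guessing.
        if any(pred not in rules for pred in source):
            return None
        return [rules[pred] for pred in source]
    return transfer


def translate(
    sentence: str,
    parse: Callable[[str], Optional[Semantics]],
    transfer: Callable[[Semantics], Optional[Semantics]],
    generate: Callable[[Semantics], Optional[str]],
) -> Optional[str]:
    """Run parse -> transfer -> generate; return None if any stage fails."""
    source = parse(sentence)
    if source is None:
        return None
    target = transfer(source)
    if target is None:
        return None
    return generate(target)


# Toy components (all hypothetical) so the pipeline can be exercised.
toy_lexicon = {"a": "SRC_A", "b": "SRC_B"}
toy_surface = {"TGT_A": "x", "TGT_B": "y"}


def toy_parse(sentence: str) -> Optional[Semantics]:
    words = sentence.split()
    if not words or any(w not in toy_lexicon for w in words):
        return None  # sentence is outside the toy grammar
    return [toy_lexicon[w] for w in words]


def toy_generate(sem: Semantics) -> Optional[str]:
    if any(p not in toy_surface for p in sem):
        return None
    return " ".join(toy_surface[p] for p in sem)


toy_transfer = make_transfer({"SRC_A": "TGT_A", "SRC_B": "TGT_B"})
```

With these toy pieces, `translate("a b", toy_parse, toy_transfer, toy_generate)` yields a result, while an input containing a word outside the toy lexicon yields `None` rather than a guess.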

Note that, unlike the Bing and Google translation functions, which rely on purely statistical techniques for machine translation (SMT), the planned system is entirely based on hand-written analytical grammars. These "precision grammars" are so called because they offer precision at the expense of recall. SMT systems are an example of systems with great recall: we've all had the experience of getting gibberish translations from SMT systems, but their high recall means that at least you got something (regardless of whether it's usable or not).

Contrast this with a system characterized by high precision (such as the one I'm working on): some of the translation inputs you submit may be more likely to return nothing, but at least when you do get a result, you'll know that it's guaranteed to be grammatically correct. In 2013, there are very few, if any, widely known public systems that offer rule-based MT to the public for free general use. It will be interesting to see if this is still the case when it comes to pass that I'm finally able to launch reasonably competent Thai-English and English-Thai translation on this website.
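
The trade-off described here can be made concrete with two ratios. In the sketch below, `coverage` is the share of inputs that return any output at all (the loose sense in which "recall" is used above), and `precision` is the share of returned outputs judged acceptable. The class and function names are hypothetical, and the acceptability judgment is assumed to come from a human evaluator.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Outcome:
    returned: bool    # did the system produce any translation?
    acceptable: bool  # was the output judged usable? (ignored if nothing returned)


def coverage(outcomes: List[Outcome]) -> Optional[float]:
    """Fraction of inputs for which the system returned something."""
    if not outcomes:
        return None
    return sum(o.returned for o in outcomes) / len(outcomes)


def precision(outcomes: List[Outcome]) -> Optional[float]:
    """Fraction of returned outputs that were acceptable."""
    returned = [o for o in outcomes if o.returned]
    if not returned:
        return None  # undefined when the system returned nothing at all
    return sum(o.acceptable for o in returned) / len(returned)
```

For instance, a system that answers every input but is acceptable only half the time has coverage 1.0 and precision 0.5, while one that answers half the inputs, all acceptably, has coverage 0.5 and precision 1.0.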

It will also be interesting to see how the advantages and disadvantages of each approach (statistical versus rule-based MT) become weighted vis-a-vis particular language pairs. For example, consider the English-Thai pair versus, say, English-Spanish. Though each pairing will surely exhibit its own distinct linguistic challenges, I don't have an intuition about which MT approach might fare better for which pairing. This is ironic since I've essentially bet the past 6 years of my life on the idea that rule-based MT will be more successful for the former pairing.

Re: LKB system back online

Posted: Sat Nov 22, 2014 2:55 pm
by Richard Wordingham
Glenn Slayden wrote:If the backend connection to my Linux system which runs the LKB parsing engine is working, then clicking on the id number in the left column brings you to a special dictionary page which includes a live re-parse of the sentence. If parses are found, then this section includes three parts:
  • one or more parse trees showing how the parts of the sentence fit together into a syntax tree
  • the meaning of the sentence, expressed in Minimal Recursion Semantics (MRS) (Copestake 1999)
  • full Attribute-Value Matrices (AVMs) for each parse.

Is the backend connection supposed to be working nowadays? I fear it may have stopped and not been restored for lack of public interest.

Glenn Slayden wrote:Putting the pieces together, given:
(1.) an upgraded computational HPSG grammar of Thai (ongoing work, currently the grammar is still in "toy" status);
(2.) the English Resource Grammar (Flickinger et al. 2000), which is a highly-developed, open-source HPSG grammar of English (emphatically not a toy);
(3.) an efficient HPSG parser (done!);
(4.) an efficient HPSG generator (done!);
(5.) a processing engine for the declarative transfer rule formalism (demo project for current seminar, but a real version will eventually be needed); and
(6.) a set of Thai-English declarative "transfer rules" written to this formalism (which I'm also mocking-up at present);

...one theoretically has all the ingredients for a fairly competent rule-based Thai-English and English-Thai machine translation system :!:

Where does the guessing go in this system? For example, I was wondering how your demonstration system translates เขา ไป ซื้อ ดอกไม้ ที่ ตลาด และ ไป เยี่ยม เพื่อน as 'She bought flowers at the market and went to visit a friend'. There's a noticeable amount of cultural knowledge in choosing 'she' over 'he', which it seems your parser does use, and 'she' v. 'they' and the choice of English tense would be statistical guesses in a full system, and may well be in your mock-up.

Glenn Slayden wrote:As it turns out, developing a competent analytical grammar of a natural language is hard. :shock: Who knew?

I didn't get the impression that developing one for English was easy, and there've been many people working on it. I still wonder whether they can cope with a political comment like "Colourless green ideas long dreamed furiously, but now they're on the march".

Copyright © 2024 thai-language.com. Portions copyright © by original authors, rights reserved, used by permission; Portions 17 USC §107.