THE AUTOMATIC ANALYSIS OF THE GEORGIAN SENTENCE

Authors

  • Oleg Kapanadze
  • Nana Kapanadze, Ivane Javakhishvili Tbilisi State University

Abstract

Until recently, most basic research in Natural Language Technology (NLT) has been carried out on “major” languages: predominantly English, but also German, Japanese, Chinese, French, and Spanish. Meanwhile, Low-Density Languages (LDL) compete to take advantage of modern digital technologies implemented in high-quality computing systems. As a result, the long-term viability of languages that NLT does not specifically support is at risk, which can lead to their digital extinction. This paper presents an effort to develop computational applications for Georgian, in order to narrow the gap with technologically well-equipped languages and to reduce the current scarcity of language resources for Georgian text processing. Georgian is well known as a language with rich inflectional morphology and very few fixed structures at the sentence level; languages of this type are called Morphologically Rich and Less-Configurational (MR&LC). The paper addresses issues in developing crucial NLT tools for MR&LC Georgian: we discuss the development of a Feature-Based Context-Free Grammar (FCFG) and a Feature Grammar parser for the less-resourced Georgian language. Generative lexicalised parsing models, the mainstay of probabilistic parsing, do not perform as well when applied to languages with free word order or rich morphology. Drawing on the syntactic valency of the verb and on language-specific features such as productive morphology, we designed a prototype FCFG parser for automatic syntactic chunking (shallow parsing) of the Georgian clause, which we present here.
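
The idea of letting verb valency and case morphology, rather than word position, drive clause analysis can be illustrated with a minimal sketch. It is not the FCFG or the parser described in the paper: the three-word lexicon, the rough Latin transliterations, the feature names (`cat`, `case`, `valency`) and the function names are all assumptions made for illustration, and the feature check is plain equality matching rather than full unification.

```python
# Toy lexicon (illustrative assumption, not taken from the paper).
# Forms are rough transliterations; glosses are approximate.
LEXICON = {
    "katsma": {"cat": "N", "case": "erg"},  # 'man', ergative
    "tsigni": {"cat": "N", "case": "nom"},  # 'book', nominative
    "datsera": {                            # 'wrote'
        "cat": "V",
        # Valency frame: each slot constrains the features of its filler.
        "valency": {
            "subj": {"cat": "N", "case": "erg"},
            "obj": {"cat": "N", "case": "nom"},
        },
    },
}


def matches(features, constraint):
    """True if every feature the constraint requires has that value."""
    return all(features.get(k) == v for k, v in constraint.items())


def analyse_clause(tokens):
    """Assign each nominal to a valency slot of the verb, ignoring order."""
    entries = [(tok, LEXICON[tok]) for tok in tokens]
    verbs = [(tok, f) for tok, f in entries if f["cat"] == "V"]
    if len(verbs) != 1:
        raise ValueError("expected exactly one verb in the clause")
    verb, verb_feats = verbs[0]
    analysis = {"pred": verb}
    for tok, feats in entries:
        if feats["cat"] == "V":
            continue
        # Take the first free slot whose constraints the token satisfies.
        for slot, constraint in verb_feats["valency"].items():
            if slot not in analysis and matches(feats, constraint):
                analysis[slot] = tok
                break
        else:
            raise ValueError(f"no valency slot licenses {tok!r}")
    missing = set(verb_feats["valency"]) - set(analysis)
    if missing:
        raise ValueError(f"unfilled valency slots: {sorted(missing)}")
    return analysis


if __name__ == "__main__":
    print(analyse_clause(["katsma", "tsigni", "datsera"]))
    print(analyse_clause(["tsigni", "datsera", "katsma"]))
```

Both calls return the same role assignment, because the slots are filled from case features and not from position. A feature-based context-free grammar would express the same constraints as grammar rules with feature structures and would also build constituents, which this sketch does not attempt.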

Published

2023-11-11

How to Cite